Tips for debugging your new programming language

August 11, 2021 on Drew DeVault's blog

Say you’re building a new (compiled) programming language from scratch. You’ll inevitably have to debug programs written in it, and worse, many of these bugs will lead you into deep magic as you uncover problems with your compiler or runtime. And as you find yourself diving into the arcane arts, your tools may be painfully lacking: how do you debug code written in a language for which debuggers and other tooling simply have not been written yet?

In the implementation of my own programming language, I have faced this problem many times, and developed, by necessity, some skills around debugging with crippled tools that lack any awareness of the language. Of course, the ultimate goal is to build out first-class debugging support, but we must have a language in the first place before we can write tools to debug it. If you find yourself in this situation, here are my recommendations.

First, I’ll echo the timeless words of Brian Kernighan:

The most effective debugging tool is still careful thought, coupled with judiciously placed print statements.

— Unix for Beginners (1979)

Classic debugging techniques are of heightened importance in this environment: first seek to isolate the problem code, then to understand the problem code, then form, and test, a hypothesis — usually with a thoughtful print statement. Often, this is enough.

Unfortunately, you may have to fire up gdb. gdb is often painful in the best of situations, but if you have to use it without debug symbols, you may find yourself shutting off the computer and seeking out rural real estate on which you can establish a new career in farming. If you can stomach it, I can offer some advice.

First, you’re going to be working in assembly, so make sure you’re familiar with how it works. I would recommend keeping the ISA manual and your ABI specification handy. If you’re smart and your language sets up stack frames properly (this is easy, do it early), you should at least have a backtrace, breakpoints at functions, and globals, though all of these will be untyped. You can write C casts to add some ad-hoc types to examine data in your process, like “print *(int *)$rdi”.
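To see why properly maintained stack frames buy you a backtrace, here is a rough C sketch of the frame-pointer chain on x86-64. It assumes the conventional prologue (push %rbp; mov %rsp,%rbp) and a runtime whose entry point zeroes %rbp so the chain ends in NULL. The names `struct frame` and `walk_frames` are made up for this illustration, and gdb's own unwinder is more elaborate than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * The two words a conventional x86-64 prologue leaves at the address
 * held in %rbp: the caller's saved %rbp, then the return address that
 * the call instruction pushed.
 */
struct frame {
	const struct frame *caller; /* saved %rbp of the caller */
	uintptr_t return_addr;      /* where execution resumes in the caller */
};

/*
 * Hypothetical unwinder: follow the chain of saved frame pointers
 * starting at fp, storing up to max return addresses in out.
 * Returns the number of frames visited. Assumes the outermost frame
 * has a NULL saved frame pointer.
 */
size_t walk_frames(const struct frame *fp, uintptr_t *out, size_t max)
{
	size_t n = 0;
	assert(out != NULL);
	while (fp != NULL && n < max) {
		out[n++] = fp->return_addr;
		fp = fp->caller;
	}
	return n;
}
```

Each return address collected this way can be matched against your symbol table to name the calling function. If your compiler skips the prologue in some functions, the chain breaks at that point and the backtrace goes with it.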

You’ll also get used to the ‘x’ command, which eXamines memory. The command format is “x/NFU”, where N is the number of objects, F is the display format (hexadecimal, unless you specify otherwise or have used a different format earlier in the session), and U is the unit size: w for word (4 bytes, an int), g for giant word (8 bytes, a long), and h and b for halfword (2 bytes, a short) and byte, respectively. For example, “x/8g $rdi” will interpret rdi as an address where 8 longs are stored and print them out in hexadecimal. Of particular use is the “i” format, for “instruction”, which will disassemble from the given address:

(gdb) x/8i $rip
=> 0x5555555565c8 <rt.memcpy+4>:	mov    $0x0,%eax
   0x5555555565cd <rt.memcpy+9>:	cmp    %rdx,%rax
   0x5555555565d0 <rt.memcpy+12>:	jae    0x5555555565df <rt.memcpy+27>
   0x5555555565d2 <rt.memcpy+14>:	movzbl (%rsi,%rax,1),%ecx
   0x5555555565d6 <rt.memcpy+18>:	mov    %cl,(%rdi,%rax,1)
   0x5555555565d9 <rt.memcpy+21>:	add    $0x1,%rax
   0x5555555565dd <rt.memcpy+25>:	jmp    0x5555555565cd <rt.memcpy+9>
   0x5555555565df <rt.memcpy+27>:	leave  

You can set breakpoints on the addresses you find here (e.g. “b *0x5555555565d0”), and step through one instruction at a time with the “si” command.
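Returning to the ‘x’ command for a moment: if the unit letters feel abstract, the following C sketch shows roughly what “x/8g $rdi” amounts to. It treats a raw address as a run of 8-byte objects and prints each one in hexadecimal. The helper name `examine_giant` and its one-value-per-line layout are inventions for this example, not how gdb is implemented, and gdb's real output also shows addresses and symbol names.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format count 8-byte objects starting at addr as
 * hexadecimal, one per line, into out. Returns the number of bytes
 * written (excluding the terminating NUL), or -1 if out is too small.
 */
int examine_giant(const void *addr, size_t count, char *out, size_t outsz)
{
	size_t used = 0;
	assert(addr != NULL && out != NULL);
	if (outsz == 0) {
		return -1;
	}
	out[0] = '\0';
	for (size_t i = 0; i < count; i++) {
		uint64_t word;
		/* memcpy sidesteps alignment and aliasing issues on raw memory */
		memcpy(&word, (const unsigned char *)addr + i * sizeof(word),
			sizeof(word));
		int n = snprintf(out + used, outsz - used,
			"0x%016" PRIx64 "\n", word);
		if (n < 0 || (size_t)n >= outsz - used) {
			return -1; /* output would have been truncated */
		}
		used += (size_t)n;
	}
	return (int)used;
}
```

Swapping `uint64_t` for `uint32_t`, `uint16_t`, or `uint8_t` gives the equivalent of the w, h, and b units. A helper along these lines can also double as a print-statement fallback inside your runtime when gdb is not an option.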

I also tend to do some silly workarounds to avoid having to read too much assembly. If I want to set a breakpoint in some specific place, I might do the following:

fn _break() void = void;

export fn main() void = {
    // ...some code...

    // Point of interest
    let x = y[z * q];
    _break();
    somefunc(x);

    // ...some code...
};

Then I can instruct gdb to “b _break” to break when this function is called, use “finish” to step out of the call frame, and I’ve arrived at the point of interest without having to rely on line numbers being available in my binary.

Overall, this is a fairly miserable process which can take 5-10× longer than normal debugging, but with these tips you should at least find your problems solvable. Good motivation to develop better debugging tools for your new language, eh? A future blog post might go over some of this with DWARF and possibly how to teach gdb to understand a new language natively. In the meantime, good luck!
