Tips for debugging your new programming language

Say you’re building a new (compiled) programming language from scratch. You’ll inevitably have to debug programs written in it, and worse, many of these problems will lead you into deep magic, as you uncover problems with your compiler or runtime. And as you find yourself diving into the arcane arts, your tools may be painfully lacking: how do you debug code written in a language for which debuggers and other tooling simply has not been written yet?

In the implementation of my own programming language, I have faced this problem many times, and developed, by necessity, some skills around debugging with crippled tools that may lack an awareness of your language. Of course, the ultimate goal is to build out first-class debugging support, but we must have a language in the first place before we can write tools to debug it. If you find yourself in this situation, here are my recommendations.

First, I’ll echo the timeless words of Brian Kernighan:

The most effective debugging tool is still careful thought, coupled with judiciously placed print statements.

— Unix for Beginners (1979)

Classic debugging techniques are of heightened importance in this environment: first seek to isolate the problem code, then to understand the problem code, then form, and test, a hypothesis — usually with a thoughtful print statement. Often, this is enough.

Unfortunately, you may have to fire up gdb. gdb is often painful in the best of situations, but if you have to use it without debug symbols, you may find yourself shutting off the computer and seeking out rural real estate on which you can establish a new career in farming. If you can stomach it, I can offer some advice.

First, you’re going to be working in assembly, so make sure you’re familiar with how it works. I would recommend keeping the ISA manual and your ABI specification handy. If you’re smart and your language sets up stack frames properly (this is easy, do it early), you should at least have a backtrace, breakpoints at functions, and globals, though all of these will be untyped. You can write C casts to add some ad-hoc types to examine data in your process, like “print *(int *)$rdi”.

You’ll also get used to the ‘x’ command, which eXamines memory. The command format is “x/NT”, where N is the number of objects, and T is the object type: w for word (int), g for giantword (long), and h and b for halfword (short) and byte, respectively: “x/8g $rdi” will interpret rdi as an address where 8 longs are stored and print them out in hexadecimal. Of particular use is the “i” format, for “instruction”, which will disassemble from the given address:

(gdb) x/8i $rip
=> 0x5555555565c8 <rt.memcpy+4>:	mov    $0x0,%eax
   0x5555555565cd <rt.memcpy+9>:	cmp    %rdx,%rax
   0x5555555565d0 <rt.memcpy+12>:	jae    0x5555555565df <rt.memcpy+27>
   0x5555555565d2 <rt.memcpy+14>:	movzbl (%rsi,%rax,1),%ecx
   0x5555555565d6 <rt.memcpy+18>:	mov    %cl,(%rdi,%rax,1)
   0x5555555565d9 <rt.memcpy+21>:	add    $0x1,%rax
   0x5555555565dd <rt.memcpy+25>:	jmp    0x5555555565cd <rt.memcpy+9>
   0x5555555565df <rt.memcpy+27>:	leave

You can set breakpoints on the addresses you find here (e.g. “b *0x5555555565d0”), and step through one instruction at a time with the “si” command.

I also tend to do some silly workarounds to avoid having to read too much assembly. If I want to set a breakpoint in some specific place, I might do the following:

fn _break() void = void;

export fn main() void = {
    // ...some code...

    // Point of interest
    let x = y[z * q];
    _break();
    somefunc(x);

    // ...some code...
};

Then I can instruct gdb to “b _break” to break when this function is called, use “finish” to step out of the call frame, and I’ve arrived at the point of interest without having to rely on line numbers being available in my binary.

Overall, this is a fairly miserable process which can take 5-10× longer than normal debugging, but with these tips you should at least find your problems solvable. Good motivation to develop better debugging tools for your new language, eh? A future blog post might go over some of this with DWARF and possibly how to teach gdb to understand a new language natively. In the meantime, good luck!

Tips for debugging your new programming language August 11, 2021 on Drew DeVault's blog

Articles from blogs I read Generated by openring

Disappointing phones

Trudging Through Nonsense

Binary Dependencies: Identifying the Hidden Packages We All Depend On