JOSH: Excellent. Okay. I am first going to remind you how terrible the debugging experience can be and then I'm going to introduce you to a future of rainbows and unicorns and you'll love it. Let's definitely so when you're using a source code debugger. Often you have two main concerns, either you interrupt the program too late and you miss the parts of it that made it important for resolving the problem, in which case you have to restart, or you stop it too early and then you waste time getting to the parts that do matter. And if you end up making a mistake, you still have to restart all over again. This is unpleasant. This is even worse if your error isn't actually consistent because then if you restart there's no guarantee that you'll actually catch the error again. It's a really bad cycle. So let's talk about the specter of non-determinism in programs. So what that means is that your program relies on factors besides the initial state of the program and any initial inputs provided to it in order to decide how it behaves. So here are two simple example programs that both, initialize a random number generator and then take the first random number and then decide what to do based on it. The one on the left is deterministic. That means that if you provide the same seed number on the command line to it, every time it will behave exactly the same way. The one on the right initializes the random number generator based on the current time and that means that if you run the program over and over, it'll have a different seed every time. That is non-deterministic. So other causes include things like user input or the way the kernel schedules your tasks, or, you know, external signals, things like this. So I want to tell you about a tool called rr which stands for "record and replay." And what that does it actually records the non-deterministic behavior of your program, it notices what is non-deterministic, records that, and later allows you to replay previous executions in an exact same way in a deterministic fashion, which is pretty cool. So in practice, that means that your debugger is an Elefant. That means that it can now remember all previous knowledge of a presume and that means that if you restart, everything can be exactly the same any non-deterministic behavior will behave exactly the same way in subsequent restarts of the process. That means that memory addresses will be exactly the same. And that means that, like, any watchpoints, any breakpoints that you leave will still be valid any subsequent times you run your debugger. also you can travel in time, and that's really cool. So let's see what that means in practice. So I have a very simple web server which, occasionally, when you refresh it, instead of showing you a directory listing that actually has a bad request and that's no good. But it's also mysterious. So if you look at the output, you notice that it sometimes prints out a useful message saying, "There was a problem." So let's go and take a look at this replay of the recording that I took of this in the past. And this can be hooked up to any GDB front end. So I'm going to do it in emacs. So first I'll set the breakpoint that printed out the error. AUDIENCE MEMBER: Bigger font? JOSH: I don't think I can. It's a video. So we're sitting on the break point. Right there. So this happens after the actual error has occurred. So that doesn't necessarily help us because we want to know what caused it. So let's go backwards in time. We're now at the previous if-statement. So it's based on a variable called error flag. So let's take a look at where that's stored in memory and we get a pointer out of it. And we'll then set a watchpoint on this pointer, which means, "We'll get a breakpoint whenever that value's changed." Now, let's run the program backwards and see what happens. And we go to the line that changes the zero to one. And that's really cool. So now we can move around in time in program state and we can go forwards to see what happens afterwards or we can go backwards again and we can keep going backwards. If we want to dig into what happened before us, we want to step into functions, so we can go back in time, and a bit more. But we decide we're not actually interested in that, we can reverse finish and go back to the color. So from here, I think we'll step back a little bit more to see what the problem was. Not the if-statement, that's just getting us into where the error was flagged. Let's just go to try_parse here. Eventually we notice there's this mallet call, an allocation that turns out that it failed. And that says, close connection immediately. So that looks like our problem. And so we were able to figure out the cause based on the effects, which is a radical notion. And if you rerun the debugger, the watch point is still valid and he can see that it goes forward. Every time you restart your replay session. So, that is what rr looks like using it. So when you think, like, how's this happening? So there's a few tricks. One is that all multithreading gets scheduled into one single thread so that means that you can control how the actual interruption of -- preemption of threads happens. You'll have to trust me on that, there are reasons. And then almost all non-deterministic behavior here ends up happening through the system calls the kernel which means that the program can intercept the system call and notice the resulting output when recording and then when it's replaying, when it has the system call again, ignore it and then just replace the output you would get with the recorded one and then you use something called hardware performance counters to get a specific point in time that you can associate with these events and in this manner, you then have a trace of programming behavior over time that you can just replay over and over, which is neat. So why is rr especially in particular is that it has special low overhead compared to other tools. It's nothing new. Like, this isn't an idea that's been around since the earlier 2000's, and there's a bunch of tools out there. This one is a new one that's aiming for production quality. We use it at FireFox regularly. It's of new use to us good it can't deal with giant programs and also we don't want any kernel modules, we don't want any hardware. There is any stock 32 or 64 bit Linux system that can do this with it. And it's all open source, so it's pretty fun. So what else can you do with this? You can travel back in time. It's fun. But imagine if all your testing infrastructure ran in rr. If you have less than two times overhead, often that means 1.2 times. That means a ten minute suite will now take two times. Every time you notice a flaky test that occasionally fails, you can be just like oh, I'll just grab the recording from another server and log in there and take a look whenever I have the time. That's pretty nice. And also this works in optimize builds in that if you're ever debugging something in GDB, and this object that I'm trying to inspect optimizes out, you can figure out where that that value came from and eventually you can point a point where it's not optimize out. So additionally this is super helpful for debugging products. For audio video stuff, we have an engineer who's debugging wireframe drops. And when you have GDB, you can say, when was the last time we ran this code? If you were running GDB normally, maybe it was 60 seconds ago. But with rr, it has a fixed point in time that would be the same point in time, and you can actually run logic against that. So that's it. This is a really fun tool to play with, and record and replay makes my life so much better when I use it. And I love it. Come talk to me more about how it works. Thank you. MAGGIE: How amazing were all of these? AUDIENCE MEMBER: Super amazing!