>> Yeah, I know. It sounds like not really the best. So we have two talks, both of which are going to be great, and then we're going to do ice cream. And Paul Khuong is going to talk first. He's going to talk about LZ77. How do you pronounce that? LZ77? Refactors program traces. (applause) >> All right. All right. I'm Paul Khuong. I wrote about -- this blog post, about how you can apply LZ77, and it's a compression algorithm for strings, and use it in a compilation context. Now I'm going to talk about something that's more fundamental, which is why would you want to do that. So first I think the first thing to do is what are traces? So here's a program. On your left... So this is your typical BASIC program. Print hi, then your next statement is looped back to the previous statement. The corresponding trace would be just history of what the program did. So, for instance, here would be an endless sequence of print hi. And what's interesting about program traces is that compiling them is super easy and really fun. So all sorts of interesting transformations that are really hard in general become trivial. Not trivial, but simple, like polynomial, even linear type, usually, once you're working with program traces or expression trees. But there's a general reason for that. Basically, dynamic programming always works, but that's getting awful close to my thesis topic, so let's move on. So my dream compiler would be something like -- you feed in a program, you enumerate all the possible traces, all the possible execution traces for that program, you run your magic on the traces, you make it super tight, and then you compress everything back into object code. Sadly, that can't work, if only because the number of traces is... Very large. Often pretty much infinite. So we would spend a lot of time with the first step. So when we think about general compilation passes -- it's that we kind of try to squash the compression into magic phases, so that something, something, morphism, and you compress, they annihilate one another, and you're left with a souped up version of your magic transformation pass. It's cool. But I don't see how to do that in general, and it's a lot of hard work. What if instead we could brute force our way through that? So if you take your typical recursive function, F, here, and you want to make it quicker, what's the easiest way? You just increase the size of your base case. So for instance, here you could say -- well, all right. I'm doing F, cause F, cause F. Instead I could just kind of squish together the bottom two levels. I get program trace. Do some magic on that, make it a lot quicker, and call that F prime, and now instead of having F cause F cause F, have F cause F prime, that is super quick. So hopefully the whole program will be a lot quicker as well. So yeah. This only works for recursing. If you try to do it on loops or tail recursion, you will fail. And it also works on mu benchmarks. You take these ten lines of code and turn it into something like a dozen or a hundred kilo bytes of machine code, that's going to blow your cache, everything will be slow, and that's the only thing you do in a program that runs matrix multiplications. Not practical. And yet, clearly there's some stricture here. You had a very small program, generated huge output from it, so there must be some redundancy. I traced some stuff there and tried to identify use. Sadly, what happened is mostly you would find loops with two iterations. Not very useful. Not enough compression either. You could also try to find duplicated patches and turn that into subroutines. That worked kind of, ish. But it would often get complex between patches, so that you get subroutines, but you would have hidden away the rest of your redundancy. And that was where I got stuck five years ago, until one morning, this March, as I was going through the RSS feed for archive, as one does in the morning, I stumbled on this. Artur Jez, which is called a really simple approximation of smallest grammar. So in that paper, what Jez tries to do is -- you take a string and you want to convert it into a very tiny grammar. As small as possible. So the basic way to do it would be to say -- well, take your first string here. It's abcdabf. And you just pair terminals to get new terminals. So you have this first pair, ab, call that G, second pair, cd, call that F, and third pair, ab... Oh, I've seen this before. This is still G. Finally F, and now you get a smaller string with a mix of terminals and non-terminals. GH, GF, and you do that again. So that's kind of interesting. Because if you go back to program traces, you can say... Well, non-terminals, called letters, are actually statements. Vm code, machine instructions. Sorry, that's terminals. Then when you create new production rules, that's a new subroutine, and when you have non-terminals, that's actually a call to your subroutine. However, doing this naively, just pairing stuff when you see them, doesn't always work. For instance here, you have abcabcda. There's some redundancy here. Repeated abc. Yet if you pair things, you just get ab. Ca. Bc. Da. And there is not a single repetition there. Not cool. So... What Jez said is well... We've really got algorithms to find redundant patches and strings. LZ77 is a very good one. It's also very simple. And what it does is it tries to describe a string in terms of what you've seen before. So if you have abcabc, you would describe it as abc and a backreference of length 3, starting at a. So you have abc, you define a new function, and rather than repeating abc, you call the function that emits abc. You can also do loops, so you have dedede. What happens is you actually emit de, and then you copy that for a length of 4 characters. That's kind of strange. I only have 2 characters. What happens is you basically end up reading what you just wrote before. So that's a repeat loop. So yeah, you could try to take your LZ77 compression or factorization, as they call it, and turn it into a program. Sadly, that does not work, because the tracers that -- the patches that it refers to make themselves be generated by mixing multiple patches together, and nothing nests correctly, and you're screwed. So what Jez says is... All right. We can't use the factorization directly, but we can use it as an auricle to help us determine when we should or should not pair characters or make subroutines. So you find the factorization and pair things up, except when that would break into another factor. So if you have abcabc, you would do -- all right, I'm going to pair ab, call that d, c, I cannot pair with the next character, because it's a different patch, so I'll leave it alone, then start again, ab. All right. That's D again. And then c. So the nice thing here is that you get some redundancy, and even better, you preserved it, so you can do it again, because now abc has become DCDC. So if you do that, until finally you get a single subroutine call. So Jez does a lot of work to show how that's within a factor log(n) of optimal. I'm not concerned with that. But what I'm going to do is use it on real programs. So here's what happens when you try to draw something in PostScript and you don't know for loops. So you have this line that draws a square. And you just repeat it 8 times, 'til you have this ring of squares. So let's see what happens when I push that into my program, to see if it could find redundancy, and yes, it does. It has factored in my 157 token program into 58 plus 14 curly braces that don't really count. So the only sad thing here is that it doesn't really make sense. It found a loop, but it's kind of weirdly rotated. It ends by pushing something to the stack. But yeah. So... To summarize, if you had nothing to do on a rainy day, consider optimizing traces. It's really fun. Instead of being uber-hard, it's just a nice logic puzzle. And compression is basically about finding hidden structures, even in program code. Thank you. (applause)