>> Awesome, thank you. So I'm Allison Kaptur, as Julia mentioned. I'm a facilitator here at Hacker School, so if you have questions about Hacker School, I would love to talk to you about those. Here are some places you can find me: GitHub and Twitter. Today we're going to see how a 1,500-line switch statement powers your Python. A quick comment about the word "your": I'm going to talk about CPython today, which is the reference implementation. If you don't know which Python you use, it's probably CPython. I'm also going to talk about Python 2.7. Many details are slightly different across versions and implementations, but the main ideas hold.

So to get to this 1,500-line switch statement, we're going to go deep into the internals of the Python interpreter. When you run a Python program, you can think of there being four steps: lexing, parsing, compiling, and interpreting. Today we're only going to talk about the interpreter. We'll spend just a minute here on what it means for a Python program to be compiled, since Python is not a quote-unquote compiled language. Most languages that people don't think of as compiled do include a compile step; it just doesn't do as much work as a compiled language's compiler would. Another way to think of that is that the interpreter is left with a lot of work to do. So Python compiles down to bytecode, and bytecode is the interpreter's internal representation of your Python. You'll also hear bytecode referred to as an intermediate representation, because it sits between Python, the top level, and the low-level implementation.

So let's make this concrete. Let's get our hands on some bytecode. Here's a function we can use to poke around. Python exposes a boatload of internal implementation details at run time, so this is actually not that hard. This is a simple function: it takes two arguments, mods one against the other, and returns the result. We'll get a handle on the function object, the code object, and the bytecode. The bytecode is not the same as the code object; the bytecode is one level down. So let's take a look. Just printing the thing was not especially enlightening, but one thing we can already tell is that bytecode is a series of bytes. It looks weird because some of them are printable and others are not. We can do a little bit better by looking at the ordinals instead of the printable or non-printable values. Now we have a series of numbers, which is slightly more intelligible.

There's a great tool in the standard library called dis, which is a bytecode disassembler. You want a disassembler any time you have assembly or some other low-level instruction set that you want to make human-readable. dis never targets a machine, so you very rarely see it in a production system; there are some exceptions, many of them are bad ideas, and I'd love to talk about them later. Running our function through the disassembler, we get this, which is more intelligible. Breaking it down: the first column is the line number, the second column is the index into the bytecode, and the third column is the instruction name, mapped to be more human-readable. So instead of saying 124 we say load_fast, and instead of saying 22 we say binary_modulo. The fourth column is the argument to the instruction, and it's not the number of arguments, it's the argument itself. The fifth column is a hint about what that argument actually means. The fifth column and the first column are not technically part of the bytecode, but they're on the code object.
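Roughly what that poking around looks like in a Python 2.7 session; this is a sketch rather than the exact slide code:

    import dis

    def mod(a, b):
        return a % b

    code_obj = mod.func_code          # the code object hanging off the function
    bytecode = code_obj.co_code       # the raw bytecode: a plain str in Python 2
    print repr(bytecode)              # some bytes are printable, some are not
    print [ord(byte) for byte in bytecode]
    # the same bytes as numbers, something like [124, 0, 0, 124, 1, 0, 22, 83]
    dis.dis(mod)                      # the human-readable disassembly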
So one thing you might notice is that binary_modulo does not take any arguments, and that might break our mental model of what a binary operation is: we think it takes two arguments. To find out why it behaves this way, we need to dive into the virtual machine. You'll sometimes hear people use the terms virtual machine and interpreter interchangeably, and that's a reasonable thing to do when it comes to Python. A virtual machine is just a program that is pretending to be a computer. Most of the work the virtual machine does is manipulating the virtual machine's stack, so when we look at the code we'll see lots of instructions that push to the stack, pop from the stack, and manipulate the stack in other ways.

Here's an example of how our function mod works. We load_fast the two arguments, a and b, pushing them onto the stack; then we binary_modulo, which pops the top two things off the stack, performs the calculation, and pushes the result back onto the stack. This is why binary_modulo doesn't take arguments: it's a binary operation that mods whatever happens to be on top of the stack. If the stack is not in the right state, this returns garbage.

I promised you a 1,500-line switch statement. This is it. This is the main loop of the CPython interpreter in all its glory. It's effectively taking each instruction, using the switch statement to find the right thing to do, doing that thing, and moving on to the next instruction. Here it is zoomed in a little bit, in case the last slide was hard to read. This file is ceval.c. It's really fun to read and you should go check it out later. I love this because it seems like such an obvious solution in some ways. You can imagine yourself being Guido van Rossum and thinking: okay, I have a string of numbers, each number maps to an instruction, a set of things that should happen, and I'm writing C, so maybe for now I'll write a switch statement and there'll just be a lot of cases. And then that's actually the solution, and it works, which is really surprising. 1,500 lines is a lot, and not all C compilers can actually handle it. So in this file, ceval.c, there's a switch you can flip... I shouldn't say switch, a flag you can flip if your compiler chokes, and in the middle the loop falls through into a second switch statement, so you can break it in half if you need to. This is really the code.

So back to that bytecode. Let's take a look at the instructions we generated, in particular load_fast and binary_modulo; we won't have time for the others. load_fast is the first instruction that's defined, and it's the most commonly executed bytecode in most Python programs. The C is somewhat hard to read, depending on whether you know C or not. I don't. You can see that we've got a macro, GETLOCAL, which takes the argument to the instruction, the oparg, and then we push the result of that, whatever it does, onto the stack. How about this UnboundLocalError? Hands up if you've raised an UnboundLocalError while programming Python. This happens if you have something like a += 1 in a function body when a isn't local: the assignment makes the compiler treat a as local, so the load finds nothing assigned yet. And here's the code that generates that error. So cool.
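A tiny illustration of that error; the variable name here is made up for the example:

    counter = 0

    def bump():
        # The store in '+=' makes 'counter' local to bump, so the read on the
        # same line is a load_fast that finds nothing assigned yet.
        counter += 1
        return counter

    bump()   # UnboundLocalError: local variable 'counter' referenced before assignment

And to make the stack-machine-plus-switch idea concrete, here's a toy dispatch loop in pure Python 2.7. The function name run_mod_bytecode is made up, and it handles only the three opcodes our mod function needs; the real loop in ceval.c has a case for every opcode:

    import dis

    LOAD_FAST     = dis.opmap['LOAD_FAST']      # 124
    BINARY_MODULO = dis.opmap['BINARY_MODULO']  # 22
    RETURN_VALUE  = dis.opmap['RETURN_VALUE']   # 83

    def mod(a, b):
        return a % b

    def run_mod_bytecode(code_obj, args):
        """Toy interpreter loop: dispatch on each opcode, manipulate a stack."""
        stack = []                      # the virtual machine's value stack
        fast_locals = list(args)        # positional arguments become the "fast" locals
        ops = [ord(byte) for byte in code_obj.co_code]
        i = 0
        while i < len(ops):
            op = ops[i]
            if op == LOAD_FAST:         # push the oparg-th local onto the stack
                oparg = ops[i + 1] + (ops[i + 2] << 8)   # two-byte argument
                stack.append(fast_locals[oparg])
                i += 3
            elif op == BINARY_MODULO:   # no argument: pop two values, push v % w
                w = stack.pop()
                v = stack.pop()
                stack.append(v % w)
                i += 1
            elif op == RETURN_VALUE:    # pop the result and hand it back
                return stack.pop()
            else:
                raise ValueError("this toy doesn't know opcode %d" % op)

    print run_mod_bytecode(mod.func_code, (7, 3))   # prints 1, same as mod(7, 3)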
So here's binary_modulo. This now fits with our mental model: we pop the first two things off the stack. (In practice we're using a slightly different macro, TOP, which reads the top of the stack without popping it; don't worry about that.) We do some stuff, calculate the result, and push the result back onto the stack.

One thing you hear about Python is that it's a dynamic language, and one of the things that dynamic means in this context is that a lot is happening at run time and not much is happening at compile time. The compiler doesn't know very much, and the interpreter has to do a lot of work. So let's look at actually calling this function. This is the normal way you would expect to call it; it's pretty obvious what's going on. How about this? Guesses for what this will return? It's almost like you all expect this is a trick question. Exactly: !!Con. You've seen this before, but probably in this format: when you do print with the percent symbol, you're actually modding two strings against each other.

>> What?!

>> Yes! Or, in this case, a string and a tuple. So if we look back at binary_modulo, it's slightly less obvious than it appears to be. The first thing we do is check whether v is a string, and dispatch to string formatting if it is. That's a speed optimization; we don't strictly have to do it, and the compiler's ignorance is even deeper than that. This is Python, so we can write a __mod__ method on a custom object that defines what it means to calculate a modulus, and I've written something that's side-effecting, because hey, why not. The interpreter resolves what it means to mod one thing against another at run time, and the compiler that emitted that binary_modulo instruction has no idea what's going to happen.

So there's this great paper by Russell Power and Alex Rubinsteyn, "How Fast Can We Make Interpreted Python?" Their conclusion, using the strategies they employed, was "not very." One of the reasons this is so challenging is that you have no type information, so you have no idea what's going to happen at a very fundamental level. Even something that appears obvious, say we're repeating a calculation, doing binary_modulo of a and b twice, right next to each other in the code, so surely we can do that calculation once and store the result... you can't, because you don't know what that bytecode is actually going to do.

There's so much more to this, but ten minutes is not a lot of time. This first link is a really awesome blog about the Python internals; I go back and read it about every six months and learn new things every time. I'm also working on a Python interpreter written in pure Python with Ned Batchelder. If you'd like to contribute to that, I'd love to talk to you. We're currently stuck on nested generators, which is... I don't know the solution to that yet. And here's my blog. So thank you!

(applause)
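For reference, here's a small Python 2.7 sketch of those two modulo surprises, string formatting with % and a user-defined __mod__; the Surprise class is made up for illustration:

    def mod(a, b):
        return a % b

    # A str on the left of % means string formatting: same binary_modulo instruction.
    print mod("%s%s", ("!!", "Con"))    # prints !!Con

    class Surprise(object):
        """Made-up class: __mod__ decides what % means for its instances."""
        def __mod__(self, other):
            print "side effect! modding against %r" % (other,)
            return 42

    # The compiler emitted the same binary_modulo instruction; only at run time
    # does the interpreter discover it should call Surprise.__mod__.
    print mod(Surprise(), 7)            # prints the side-effect line, then 42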