Getting Started with Disassembling Python Code

KonfHub
Sep 13, 2019
Python is of course your friend!

Many mainstream languages, including Java, .NET languages like C#, and Python, compile to an intermediate language known as bytecode. These languages have interpreters (e.g., the Python interpreter) or virtual machines (e.g., the Java Virtual Machine) that execute the bytecodes. The bytecodes are either generated on the fly when the source is run (as in the case of Python, which also caches them in .pyc files for imported modules) or stored in a file format ahead of time (the class file format for Java, and assemblies containing CIL, the Common Intermediate Language, for .NET).

Learning how to read bytecodes can be surprisingly useful, though only a few developers (even expert-level ones) ever try it. The primary benefit is insight into how the language and its features work “under the hood”. In more practical terms, understanding bytecodes can help you debug complex scenarios, give insights into performance issues, and help you write better code that exploits the power of the language.

In this brief blog post, let’s get started with disassembling Python’s bytecode.

What is “disassembling”? It’s the opposite of assembling. Assembling is the process of converting assembly code to machine code (for an actual machine or a virtual machine); disassembling is the reverse: converting machine code back to readable assembly code.

Let’s start with an example. Fire up your Python interpreter (I am using Python 3.7 in this blog post) and try reading the actual bytecodes:

>>> def some_fun():
...     a = 10
...     b = 20
...     c = 30
...     return a * b + c
...
>>> some_fun.__code__
<code object some_fun at 0x1075e9420, file "<stdin>", line 1>
>>> some_fun.__code__.co_code
b'd\x01}\x00d\x02}\x01d\x03}\x02|\x00|\x01\x14\x00|\x02\x17\x00S\x00'

We have defined a simple function named some_fun that evaluates the expression a * b + c and returns the result.

The __code__ attribute holds “the code object representing the compiled function body”. Its co_code attribute is “a string representing the sequence of bytecode instructions” (in Python 3 it is a bytes object, as the b'...' prefix shows). That’s certainly not human-readable, so we are going to use the “dis” module to see a human-readable version of the bytecode.
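Before turning to dis, it can help to poke at the code object a little more. The following is a small sketch rather than anything definitive: it re-defines the same some_fun in a script and prints a few attributes of its code object. The variable names code and raw are my own, and the exact contents of co_consts and co_code vary between Python versions.

```python
def some_fun():
    a = 10
    b = 20
    c = 30
    return a * b + c

code = some_fun.__code__

# In Python 3, co_code is an immutable bytes object, not a text string.
raw = code.co_code
print(type(raw).__name__, len(raw))

# Names of the local variables, in slot order.
# Instructions such as STORE_FAST and LOAD_FAST refer to these slots.
print(code.co_varnames)

# The constants table of the code object (contents vary by Python version).
print(code.co_consts)
```

The raw bytes are still opaque on their own, which is where a disassembler comes in.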

>>> import dis
>>> dis.dis(some_fun)
  2           0 LOAD_CONST               1 (10)
              2 STORE_FAST               0 (a)

  3           4 LOAD_CONST               2 (20)
              6 STORE_FAST               1 (b)

  4           8 LOAD_CONST               3 (30)
             10 STORE_FAST               2 (c)

  5          12 LOAD_FAST                0 (a)
             14 LOAD_FAST                1 (b)
             16 BINARY_MULTIPLY
             18 LOAD_FAST                2 (c)
             20 BINARY_ADD
             22 RETURN_VALUE
>>>

First, import the “dis” module. The call dis.dis(some_fun) disassembles some_fun and shows its bytecode. Let’s take a closer look at what it shows.

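The first column shows the line number of the source code that the following instructions were compiled from. The second column shows the bytecode index (the byte offset of the instruction within co_code). The third column is the name of the actual bytecode, such as LOAD_CONST or BINARY_MULTIPLY. Where an instruction takes an argument, the fourth column shows it, followed in parentheses by the value or name it refers to.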
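The same columns can also be read programmatically. As a rough sketch, dis.get_instructions() yields one Instruction object per bytecode, with fields that correspond to what dis.dis() prints. The rows list and the print layout are my own choices, and the instruction names and offsets you see depend on your Python version (newer releases use different opcodes than 3.7).

```python
import dis

def some_fun():
    a = 10
    b = 20
    c = 30
    return a * b + c

# Collect (offset, opname, arg, argrepr) for every instruction.
# These mirror the index, name, argument and parenthesised columns.
rows = [
    (ins.offset, ins.opname, ins.arg, ins.argrepr)
    for ins in dis.get_instructions(some_fun)
]

for offset, opname, arg, argrepr in rows:
    # arg is None for instructions that take no argument.
    arg_text = "" if arg is None else str(arg)
    print(f"{offset:>4} {opname:<20} {arg_text:>4} {argrepr}")
```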

The Python virtual machine is stack-based. To understand this intermediate format, think about the postfix equivalent of the infix expression “a * b + c”: that would be “a b * c +”, and that’s what the following bytecodes achieve:

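             12 LOAD_FAST                0 (a)
             14 LOAD_FAST                1 (b)
             16 BINARY_MULTIPLY
             18 LOAD_FAST                2 (c)
             20 BINARY_ADD
             22 RETURN_VALUE

Bytecodes can take arguments. For example, LOAD_FAST here takes the local variable a (at index 0 in the function’s local variables) as its argument. The opcode itself is one byte (it is a “byte”code, as the name indicates) and the argument is one byte as well, so “LOAD_FAST 0 (a)” occupies 2 bytes in total. In Python 3.7 every instruction occupies 2 bytes; instructions such as BINARY_MULTIPLY that take no argument simply leave the argument byte as zero. The current bytecode index is 12, and adding 2 gives 14, the index of the next bytecode, “LOAD_FAST 1 (b)”. You can see the same pairs in the raw co_code shown earlier: |\x00 is LOAD_FAST 0 and |\x01 is LOAD_FAST 1.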
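To make the stack-based evaluation concrete, here is a toy evaluator for just the handful of instructions in the listing above. It is only an illustration, not how CPython is implemented: the run function and the program list are hypothetical, and for readability the sketch looks locals up by name rather than by slot number.

```python
def run(instructions, local_vars):
    """Evaluate a tiny subset of bytecode-like instructions on a stack."""
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_FAST":
            # Push the value of a local variable onto the stack.
            stack.append(local_vars[arg])
        elif opname == "BINARY_MULTIPLY":
            # Pop two operands, push their product.
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif opname == "BINARY_ADD":
            # Pop two operands, push their sum.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RETURN_VALUE":
            # The result is whatever is on top of the stack.
            return stack.pop()
        else:
            raise ValueError(f"unsupported instruction: {opname}")
    raise RuntimeError("program ended without RETURN_VALUE")

# Postfix form of a * b + c, i.e. "a b * c +".
program = [
    ("LOAD_FAST", "a"),
    ("LOAD_FAST", "b"),
    ("BINARY_MULTIPLY", None),
    ("LOAD_FAST", "c"),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]

result = run(program, {"a": 10, "b": 20, "c": 30})
print(result)  # 230
```

Notice that the operators never name their operands: they always work on whatever the earlier instructions left on the stack, which is exactly why the postfix order matters.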

Everything in a Python program compiles to bytecode. Looking at the bytecodes can “demystify” how some features work. Here is an example:

>>> def square_fun(a):
...     return a * a
...
>>> dis.dis(square_fun)
  2           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                0 (a)
              4 BINARY_MULTIPLY
              6 RETURN_VALUE
>>> square_fun
<function square_fun at 0x1076d8598>
>>> square_lambda = lambda a: a * a
>>> dis.dis(square_lambda)
  1           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                0 (a)
              4 BINARY_MULTIPLY
              6 RETURN_VALUE
>>> square_lambda
<function <lambda> at 0x1076141e0>

This example is short and sweet, and conveys a powerful message: despite the differences in syntax, a function and a lambda compile to the same bytecode! As promised earlier, this is the kind of insight into how the language works under the hood that reading bytecode gives you.
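If you would rather not compare the two listings by eye, the check can be done in code. This is a minimal sketch: the helper ops is a name I made up, and it compares only the instruction names and arguments, which I would expect to match on any CPython version even though the opcodes themselves differ from release to release.

```python
import dis

def square_fun(a):
    return a * a

square_lambda = lambda a: a * a

def ops(func):
    """Return the (opname, arg) pairs for a function's bytecode."""
    return [(ins.opname, ins.arg) for ins in dis.get_instructions(func)]

# Compare the two instruction sequences directly.
same_ops = ops(square_fun) == ops(square_lambda)
print(same_ops)

# The code objects still differ in metadata such as the recorded name.
print(square_fun.__code__.co_name)     # square_fun
print(square_lambda.__code__.co_name)  # <lambda>
```

So the difference between the two lies in the metadata attached to the code object, not in the instructions that get executed.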

So what are you waiting for? Fire up your interpreter, play around, and explore Python bytecodes. Trust me, it’s an adventure full of fun!

References / Further Reading:

  1. The Python data model (for info like __code__): https://docs.python.org/3/reference/datamodel.html
  2. List of Python bytecodes: https://docs.python.org/2/library/dis.html#bytecodes
  3. The dis disassembler for Python bytecodes: https://docs.python.org/2/library/dis.html
  4. Understanding python bytecodes: https://www.synopsys.com/blogs/software-security/understanding-python-bytecode/
  5. Videos on bytecodes: https://www.youtube.com/watch?v=cSSpnq362Bk and https://www.youtube.com/watch?v=GNPKBICTF2w

(Written by: Ganesh Samarthyam, CoFounder, KonfHub Technologies LLP)
