Python internals

GAMR1520

Markup languages and scripting

Dr Graeme Stuart

Reminder

Even the simplest value in Python is a complex object. If we create an integer, we are instantiating an object of type int. This gives us access to a lot of built-in behaviour ("magic").

a = 1

pyObject
id	0x7fa8655e00f0
type	<class 'int'>
value	1
refs	1

A pyObject
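The fields shown above can be inspected from Python itself. This is a minimal sketch; the address printed by id() will differ on every run, and treating it as a memory address is a CPython detail.

```python
a = 1

# id() returns an integer identifying the object (its address in CPython)
print(hex(id(a)))

# type() returns the class the object was instantiated from
print(type(a))                  # <class 'int'>

# Even a plain integer carries methods
print(a.bit_length())           # 1
print(a + 1 == a.__add__(1))    # True
```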

The memory allocated to our objects is managed for us by the Python interpreter. Objects are cleared from memory once they are no longer accessible to our program. This is achieved by counting references: each object records how many references point to it.

Reference counting

a = "hello"

flowchart LR;
  subgraph python variables
    a
  end
  subgraph One reference to pyObject
    hello["hello (1)"]
  end
  a -- 0x7f3b40df00f0 ---> hello

b = a

flowchart LR;
  subgraph python variables
    a
    b
  end
  subgraph Two references to pyObject
    hello["hello (2)"]
  end
  a -- 0x7f3b40df00f0 ---> hello
  b -- 0x7f3b40df00f0 ---> hello
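The count can be observed with sys.getrefcount. This sketch uses a fresh object() rather than a string, on the assumption that a string literal may also be referenced by the compiled code, which would muddy the numbers. The absolute values are a CPython detail (the call itself adds a temporary reference), so only the difference is shown.

```python
import sys

a = object()                    # a fresh object with one name bound to it
before = sys.getrefcount(a)

b = a                           # bind a second name to the same object
after = sys.getrefcount(a)

print(after - before)           # 1
print(a is b)                   # True
```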

Reference counting

a = "hello"
b = a
a = "world"

flowchart LR;
  subgraph python variables
    a
    b
  end
  subgraph One reference each
    world["world (1)"]
    hello["hello (1)"]
  end
  a -- 0x7f3b40d83f70 ---> world
  b -- 0x7f3b40df00f0 ---> hello

A new pyObject is created at location 0x7f3b40d83f70 containing the string 'world'. The variable a points to the new object. The variable b still points to the original object.
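A short check of the same sequence, comparing identities rather than printing addresses (which change from run to run). The name original_id is introduced here for illustration only.

```python
a = "hello"
b = a
original_id = id(a)             # remember which object 'a' started with

a = "world"                     # rebinds 'a'; the 'hello' object is untouched

print(b)                        # hello
print(id(b) == original_id)     # True
print(a is b)                   # False
```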

Reference counting

a = "hello"
b = a
a = "world"
b = "world"

flowchart LR;
  subgraph python variables
    a
    b
  end
  subgraph 'hello' has no more references
    world["world (2)"]
    hello["hello (0)"]
  end
  a -- 0x7f3b40d83f70 ---> world
  b -- 0x7f3b40d83f70 ---> world

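The pyObject at 0x7f3b40df00f0, containing the string 'hello', no longer has any references. In CPython, an object is normally deallocated as soon as its reference count reaches zero, and the memory is freed for reuse. A separate garbage collector handles objects that reference each other in cycles.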
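The moment an object disappears can be watched with a weak reference, which does not add to the count. Strings cannot be weakly referenced, so this sketch swaps in a small hypothetical Greeting class. The immediate clean-up shown is CPython behaviour; other interpreters may free the object later.

```python
import weakref

class Greeting:
    """Hypothetical stand-in for a string, used because str cannot be weakly referenced."""
    def __init__(self, text):
        self.text = text

a = Greeting("hello")
b = a
watcher = weakref.ref(a)        # observes the object without keeping it alive

a = Greeting("world")
alive_after_first = watcher() is not None    # 'b' still refers to 'hello'

b = a
alive_after_second = watcher() is not None   # nothing refers to 'hello' any more

print(alive_after_first)        # True
print(alive_after_second)       # False on CPython
```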

Reference counting implementation detail

Simple objects such as Boolean values and small integers (in CPython, -5 to 256) are created once and reused, so they are never garbage collected.

id(100)
id(100)

These would be expected to generate different identifiers. Each statement should create a value in memory, return its location, and immediately drop the reference count to zero.

140397168938320
140397168938320

We get the same identifier, indicating that the same location in memory was used for both objects, even though no variable was created referencing the object.

This is an implementation detail and should not be relied upon in code.
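The reuse can be made visible by building the same small integer twice at runtime and comparing identities. A minimal sketch, assuming CPython; the names small_a and small_b are for illustration only.

```python
# int("100") builds the value at runtime, so the compiler cannot merge the two
small_a = int("100")
small_b = int("100")

print(small_a == small_b)       # True: same value
print(small_a is small_b)       # True on CPython: the same cached object
```

Use == to compare values in real code; `is` is only used here to expose the caching.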

Reference counting implementation detail

If we create larger values, the behaviour is as expected. Objects are created on demand and destroyed as soon as they are no longer referenced.

id(1000000000)
id(1000000000)

Each statement creates a separate object in memory, which is discarded immediately.

140397166989520
140397166989936

The numbers are different; this is the behaviour you should assume. Objects are removed from memory very quickly once they are no longer referenced.
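Identifiers are only guaranteed to be unique while both objects are alive, so a freed address may be handed out again. A safer check keeps both objects alive and compares them directly. A minimal sketch, assuming CPython; big_a and big_b are illustrative names.

```python
# Two large integers built at runtime and kept alive at the same time
big_a = int("1000000000")
big_b = int("1000000000")

print(big_a == big_b)           # True: same value
print(big_a is big_b)           # False on CPython: two separate objects
```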

Machine code and byte code

Code is a form of data too. Once compiled, instructions are reduced to machine code. The CPU steps through the sequence of instructions by incrementing a counter (the program counter). Jump instructions change the counter value, which is how branches and loops work.
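The idea of a counter stepping through instructions can be sketched with a toy interpreter. This is not how a CPU or CPython is implemented; the instruction names and the run function are invented for illustration.

```python
def run(program, value):
    """Execute a list of (name, argument) instructions and return the result."""
    stack = []
    counter = 0
    while True:
        name, arg = program[counter]
        counter += 1                    # default: move on to the next instruction
        if name == "LOAD_INPUT":
            stack.append(value)
        elif name == "LOAD_CONST":
            stack.append(arg)
        elif name == "JUMP_IF_FALSE":
            if not stack.pop():
                counter = arg           # a jump overwrites the counter
        elif name == "RETURN":
            return stack.pop()
        else:
            raise ValueError(f"unknown instruction: {name}")

# Hypothetical program: return "some value" if the input is truthy, else None
program = [
    ("LOAD_INPUT", None),               # 0
    ("JUMP_IF_FALSE", 4),               # 1
    ("LOAD_CONST", "some value"),       # 2
    ("RETURN", None),                   # 3
    ("LOAD_CONST", None),               # 4
    ("RETURN", None),                   # 5
]

print(run(program, True))               # some value
print(run(program, False))              # None
```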

In interpreted languages like Python, the interpreter consumes bytecode. Every Python program is compiled to bytecode. We will look at a few examples to get a sense of what is going on.

def conditional_return(a):
    if a:
        return "some value"

OFFSET  OPCODE  OPNAME             ARG   ARGVAL
0       124     LOAD_FAST          0     a
2       114     POP_JUMP_IF_FALSE  4     8
4       100     LOAD_CONST         1     some value
6       83      RETURN_VALUE       None  None
8       100     LOAD_CONST         0     None
10      83      RETURN_VALUE       None  None
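A table like this can be produced with the standard dis module. This is a sketch of one way to do it, not necessarily how the table above was generated. Opcode numbers, names and offsets change between Python versions, so the output on your machine will probably differ.

```python
import dis

def conditional_return(a):
    if a:
        return "some value"

# Collect one row per bytecode instruction
rows = [
    (ins.offset, ins.opcode, ins.opname, ins.arg, ins.argval)
    for ins in dis.get_instructions(conditional_return)
]

for row in rows:
    print(*row, sep="\t")
```

Calling dis.dis(conditional_return) prints a similar listing in a single step.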