GAMR1520: Markup languages and scripting

Python internals

Dr Graeme Stuart


Reminder

Even the simplest value in Python is a complex object. If we create an integer, we are instantiating an object of type int. This gives us access to a lot of magic.

a = 1
pyObject
    id:    0x7fa8655e00f0
    type:  <class 'int'>
    value: 1
    refs:  1

Figure: A pyObject

The memory allocated to our objects is managed for us by the Python interpreter. Objects will be cleared from memory once they are no longer accessible to our programme. This is achieved by counting references.
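As a rough illustration of reference counting, the standard sys.getrefcount function reports the count for an object. This is a minimal sketch assuming CPython. The absolute numbers vary between versions (the call itself may hold a temporary reference), so the example only compares the count before and after a second name is bound. The function name count_change is made up for this example.

```python
import sys


def count_change():
    """Return how much the reference count grows when a second name is bound."""
    a = []                        # a new list object, referenced by the name a
    before = sys.getrefcount(a)   # absolute value is version dependent
    b = a                         # bind a second name to the same object
    after = sys.getrefcount(a)
    return after - before


print(count_change())
```

Only the difference is meaningful here: binding b adds exactly one reference.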


Reference counting

a = "hello"
[Diagram: the variable a holds 0x7f3b40df00f0, the address of the pyObject 'hello', which has one reference.]
b = a
[Diagram: the variables a and b both hold 0x7f3b40df00f0, the address of the pyObject 'hello', which has two references.]

Reference counting

a = "hello"
b = a
a = "world"
[Diagram: a holds 0x7f3b40d83f70, the address of 'world'; b holds 0x7f3b40df00f0, the address of 'hello'. Each pyObject has one reference.]

A new pyObject is created at location 0x7f3b40d83f70 containing the string 'world'. The variable a points to the new object. The variable b still points to the original object.
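The rebinding described above can be checked with the is operator, which compares object identity rather than value. This is a small sketch; the names same_before and same_after are made up for the example.

```python
a = "hello"
b = a
same_before = a is b      # both names refer to one object

a = "world"               # rebind a; the object b refers to is untouched
same_after = a is b       # a now refers to a different object

print(same_before, same_after, b)
```

Rebinding a never changes b, because assignment moves a name, not the object it pointed to.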


Reference counting

a = "hello"
b = a
a = "world"
b = "world"
[Diagram: a and b both hold 0x7f3b40d83f70, the address of 'world', which has two references. 'hello' has no more references.]

The pyObject at 0x7f3b40df00f0, containing the string 'hello', no longer has any references. Its memory can now be reclaimed for reuse; in CPython this happens as soon as the reference count reaches zero.
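One way to observe an object disappearing is a weak reference, which refers to an object without adding to its reference count. This is a sketch assuming CPython, where an object is freed as soon as its last reference goes; other interpreters may delay this. Payload is a hypothetical stand-in class, used because str objects cannot be weakly referenced.

```python
import weakref


class Payload:
    """Hypothetical stand-in for any object we want to watch."""


a = Payload()
watcher = weakref.ref(a)              # does not add to the reference count
b = a                                 # two names now refer to the object

a = None                              # one reference remains (b)
still_alive = watcher() is not None

b = None                              # no references remain
freed = watcher() is None             # the weak reference now returns None

print(still_alive, freed)
```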


Reference counting implementation detail

Simple objects like Boolean values and small integers (in CPython, -5 to 256) are automatically reused and are not garbage collected.

id(100)
id(100)

These would be expected to generate different identifiers. Each statement should create a value in memory, return its location, and immediately drop the reference count to zero.

140397168938320
140397168938320

We get the same identifier, indicating that the same location in memory was used for both objects, even though no variable was created referencing the object.

This is an implementation detail and should not be relied upon in code.


Reference counting implementation detail

If we create larger objects, the behaviour is as expected. They will be created on demand and destroyed quickly.

id(1000000000)
id(1000000000)

Each statement creates a separate object in memory, which is discarded immediately.

140397166989520
140397166989936

The numbers are different, which is what you should assume. Data in memory are erased very quickly once no longer referenced.
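The two behaviours above can be compared directly by keeping both objects alive and testing identity with is. This is a sketch assuming CPython; as noted, the caching is an implementation detail. The values are built with int() at run time so the compiler cannot merge identical literals.

```python
# int() is called at run time, so each call asks the interpreter for an object.
small_a = int("100")
small_b = int("100")
large_a = int("1000000000")
large_b = int("1000000000")

small_shared = small_a is small_b   # CPython reuses its cached small integer
large_shared = large_a is large_b   # two live objects, so two identities

print(small_shared, large_shared)
```

In ordinary code, compare numbers with == and leave is for identity checks such as is None.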


Machine code and byte code

Code is a form of data too. Once compiled, instructions can be reduced to machine code. The CPU manages the flow through the sequence of instructions by incrementing a counter. Some instructions can be used to change the counter value.

In interpreted languages like Python, the interpreter consumes byte code. Every Python programme is compiled to byte code. We will look at a few examples to get a sense of what is going on.

def conditional_return(a):
    if a:
        return "some value"

       OPCODE OPNAME               ARG   ARGVAL
     0    124 LOAD_FAST            0     a
     2    114 POP_JUMP_IF_FALSE    4     8
     4    100 LOAD_CONST           1     some value
     6     83 RETURN_VALUE         None  None
     8    100 LOAD_CONST           0     None
    10     83 RETURN_VALUE         None  None
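Tables like the one above can be produced with the standard dis module. The sketch below is one possible way to print them; show_bytecode is a made-up helper name. The exact opcodes and numbers depend on the Python version, so newer interpreters will print different rows from those shown in these slides.

```python
import dis


def conditional_return(a):
    if a:
        return "some value"


def show_bytecode(func):
    """Print one row per instruction: offset, opcode, name and argument."""
    for ins in dis.get_instructions(func):
        print(f"{ins.offset:>6} {ins.opcode:>6} {ins.opname:<20} {ins.argrepr}")


show_bytecode(conditional_return)
```

Calling dis.dis(conditional_return) prints a similar listing without any helper.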

A function that does nothing

Here’s a function that does nothing.

The pass statement does nothing, but it is syntactically necessary: a function body cannot be empty, and omitting it raises an IndentationError.

def nothing():
    pass

…and the resulting byte code.

       OPCODE OPERATION            ARGUMENT
     0    100 LOAD_CONST           None
     2     83 RETURN_VALUE         

This loads None and returns it. Functions always return a value and they will return None by default.
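The default return value is easy to confirm by capturing the result of the call. A minimal check:

```python
def nothing():
    pass


result = nothing()
print(result)           # None
print(result is None)   # True
```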


Returning a value

Here we add a return statement.

def return_something():
    return "some value"

…and the resulting byte code.

       OPCODE OPERATION            ARGUMENT
     0    100 LOAD_CONST           'some value'
     2     83 RETURN_VALUE         

Simple: the byte code loads our value (a literal, so it’s a constant) and returns it.
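Constants are stored alongside the byte code in the function's code object, which can be inspected through the __code__ attribute. This is a small sketch; the exact contents and order of co_consts vary between Python versions, so only membership is relied on here.

```python
def return_something():
    return "some value"


constants = return_something.__code__.co_consts
print(constants)   # a tuple that includes 'some value'
```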


Assignment

Here’s something a bit more advanced (!).

def assign():
    a = 1

…and the resulting byte code.

       OPCODE OPERATION            ARGUMENT
     0    100 LOAD_CONST           1
     2    125 STORE_FAST           a
     4    100 LOAD_CONST           None
     6     83 RETURN_VALUE         

It loads the constant (literal) 1 and stores it in variable a. It then loads None and returns it.


Returning the value

This function does a tiny bit more: it returns the value of a.

def assign_and_return():
    a = 1
    return a

…and the resulting byte code.

       OPCODE OPERATION            ARGUMENT
     0    100 LOAD_CONST           1
     2    125 STORE_FAST           a
     4    124 LOAD_FAST            a
     6     83 RETURN_VALUE         

We can see that the code does the same, but it loads from a rather than loading None.


Returning an argument

This function receives a single argument and returns it.

def return_argument(a):
    return a

…and the resulting byte code.

       OPCODE OPERATION            ARGUMENT
     0    124 LOAD_FAST            a
     2     83 RETURN_VALUE         

The variable a is already available. So it just loads it and returns it.
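Arguments are treated as local variables, which is why LOAD_FAST can fetch a straight away. The code object records this, as the sketch below shows.

```python
def return_argument(a):
    return a


code = return_argument.__code__
print(code.co_argcount)   # 1
print(code.co_varnames)   # ('a',)
```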


Addition

What happens when we add numbers?

def return_argument_plus_one(a):
    return a + 1

The byte code includes a new BINARY_ADD code.

       OPCODE OPERATION            ARGUMENT
     0    124 LOAD_FAST            a
     2    100 LOAD_CONST           1
     4     23 BINARY_ADD           
     6     83 RETURN_VALUE         

We load the value a, load the constant 1, add them (BINARY_ADD pops the top two values from the stack and pushes their sum) and return.
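The instructions in the table above operate on a stack of values. The toy interpreter below is a rough sketch of that idea, not CPython's real evaluation loop: the run function and its (operation, argument) tuples are invented for illustration, with operation names borrowed from the tables in these slides.

```python
import operator

# Toy model only: the names mirror the opcodes in the tables above.
BINARY_OPS = {
    "BINARY_ADD": operator.add,
    "BINARY_MULTIPLY": operator.mul,
    "BINARY_SUBTRACT": operator.sub,
}


def run(instructions, local_vars):
    """Execute (operation, argument) pairs using a list as the value stack."""
    stack = []
    for operation, argument in instructions:
        if operation == "LOAD_FAST":
            stack.append(local_vars[argument])   # push a local variable
        elif operation == "LOAD_CONST":
            stack.append(argument)               # push a constant
        elif operation in BINARY_OPS:
            right = stack.pop()                  # top of stack is the right operand
            left = stack.pop()
            stack.append(BINARY_OPS[operation](left, right))
        elif operation == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unsupported operation: {operation}")


plus_one = [
    ("LOAD_FAST", "a"),
    ("LOAD_CONST", 1),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
print(run(plus_one, {"a": 41}))   # 42
```

The same toy handles BINARY_MULTIPLY and BINARY_SUBTRACT, so longer instruction lists can be traced in the same way.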


Multiple arguments

A similar example, with two arguments.

def return_argument_product_minus_one(a, b):
    return a * b - 1

The resultant byte code loads the arguments a and b, multiplies them, loads the constant 1, subtracts it and returns the result.

       OPCODE OPERATION            ARGUMENT
     0    124 LOAD_FAST            a
     2    124 LOAD_FAST            b
     4     20 BINARY_MULTIPLY      
     6    100 LOAD_CONST           1
     8     24 BINARY_SUBTRACT      
    10     83 RETURN_VALUE         

Conditionals

def conditional_return1(a, b):
    if a:
        return a
    else:
        return b

The byte code loads a and then includes a POP_JUMP_IF_FALSE instruction, which jumps to offset 8 if the loaded value (i.e. a) is falsy. There are two paths, each ending in a RETURN_VALUE instruction.

       OPCODE OPERATION            ARGUMENT
     0    124 LOAD_FAST            a
     2    114 POP_JUMP_IF_FALSE    to 8
     4    124 LOAD_FAST            a
     6     83 RETURN_VALUE         
     8    124 LOAD_FAST            b
    10     83 RETURN_VALUE         

More efficient?

Taking advantage of the or operator gives a more compact function.

def conditional_return2(a, b):
    return a or b

The byte code is shorter (four instructions rather than six), using JUMP_IF_TRUE_OR_POP.

       OPCODE OPERATION            ARGUMENT
     0    124 LOAD_FAST            a
     2    112 JUMP_IF_TRUE_OR_POP  to 6
     4    124 LOAD_FAST            b
     6     83 RETURN_VALUE         
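Before preferring the shorter version, it is worth confirming that both functions behave the same. The sketch below compares them over a few truthy and falsy inputs; the sample values are arbitrary.

```python
def conditional_return1(a, b):
    if a:
        return a
    else:
        return b


def conditional_return2(a, b):
    return a or b


# A mix of falsy and truthy values for a.
samples = [(0, "fallback"), ("", "fallback"), ([], "fallback"),
           (5, "fallback"), ("x", "fallback")]
results_match = all(
    conditional_return1(a, b) == conditional_return2(a, b) for a, b in samples
)
print(results_match)   # True
```

Note that or returns one of its operands, not necessarily a Boolean.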

Looping

A simple while loop decrements a until it is no longer greater than 10. This leads to some moderately complex byte code.

def looping1(a):
    while a > 10:
        a -= 1
    return a

       OPCODE OPERATION            ARGUMENT
     0    124 LOAD_FAST            a
     2    100 LOAD_CONST           10
     4    107 COMPARE_OP           >
     6    114 POP_JUMP_IF_FALSE    to 24
     8    124 LOAD_FAST            a
    10    100 LOAD_CONST           1
    12     56 INPLACE_SUBTRACT     
    14    125 STORE_FAST           a
    16    124 LOAD_FAST            a
    18    100 LOAD_CONST           10
    20    107 COMPARE_OP           >
    22    115 POP_JUMP_IF_TRUE     to 8
    24    124 LOAD_FAST            a
    26     83 RETURN_VALUE         

Alternative loop

A slightly different approach produces slightly different bytecode to achieve the same result.

def looping2(a):
    while True:
        if a <= 10:
            return a
        a -= 1

       OPCODE OPERATION            ARGUMENT
     0      9 NOP                  
     2    124 LOAD_FAST            a
     4    100 LOAD_CONST           10
     6    107 COMPARE_OP           <=
     8    114 POP_JUMP_IF_FALSE    to 14
    10    124 LOAD_FAST            a
    12     83 RETURN_VALUE         
    14    124 LOAD_FAST            a
    16    100 LOAD_CONST           1
    18     56 INPLACE_SUBTRACT     
    20    125 STORE_FAST           a
    22    113 JUMP_ABSOLUTE        to 2
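Although the byte code differs, the two loops should return the same values. A quick check over a range of inputs (the range is an arbitrary choice) might look like this:

```python
def looping1(a):
    while a > 10:
        a -= 1
    return a


def looping2(a):
    while True:
        if a <= 10:
            return a
        a -= 1


agree = all(looping1(n) == looping2(n) for n in range(-5, 30))
print(agree)          # True
print(looping1(25))   # 10
print(looping1(3))    # 3
```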

Accessing data over HTTP

This week we will look at getting data over HTTP. The urllib.request module allows us to trigger HTTP requests. This allows us to grab files from web servers.

from urllib.request import urlopen

url = 'http://gamr1520.github.io/GAMR1520/exercises/2.1.html'
response = urlopen(url)
data = response.read().decode('utf-8')

print(data[:95])
<!DOCTYPE html>
<html lang="en">
<head>
<title>Files and folders</title>
<meta charset="utf-8">

TBH, I usually use the third-party requests library for anything but the simplest examples. But urllib works perfectly for simple requests.
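For anything beyond a quick experiment it may help to close the connection reliably and to handle failures. The function below is a sketch of one way to do that with urllib; fetch_text and its timeout default are made up for this example.

```python
from urllib.request import urlopen


def fetch_text(url, timeout=10):
    """Return the decoded body of url, or None if the request fails."""
    try:
        # The with block closes the connection even if reading fails.
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except OSError as error:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        print(f"Request failed: {error}")
        return None
```

It could then be called with the url from the example above, for instance data = fetch_text(url), checking for None before using the result.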


JSON APIs

Using the same approach and the json module, we can load data from JSON APIs, for example swapi, the Star Wars API.

import json
from urllib.request import urlopen

url = 'https://swapi.py4e.com/api/'
response = urlopen(url)
data = json.loads(response.read().decode('utf-8'))
print(json.dumps(data, indent=2))
{
  "people": "https://swapi.py4e.com/api/people/",
  "planets": "https://swapi.py4e.com/api/planets/",
  "films": "https://swapi.py4e.com/api/films/",
  "species": "https://swapi.py4e.com/api/species/",
  "vehicles": "https://swapi.py4e.com/api/vehicles/",
  "starships": "https://swapi.py4e.com/api/starships/"
}
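The decode step can be skipped, because json.load reads directly from the file-like response. This is a sketch; fetch_json is a hypothetical helper name and the timeout value is arbitrary.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Request url and parse the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        # json.load reads from the response object and accepts UTF-8 bytes.
        return json.load(response)
```

With this helper, fetch_json('https://swapi.py4e.com/api/') should return the dictionary printed above, and each value in it is a URL that can be passed to fetch_json in turn.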

Graphical user interfaces

import tkinter as tk

class MyApplication(tk.Tk):
    def __init__(self):
        super().__init__()
        self.title('My Application')
        self.intro1 = tk.Label(text="This is just some text", font=('Helvetica', 18))
        self.intro2 = tk.Label(text="and some more", font=('Helvetica', 12))
        self.intro1.grid(pady=(20, 0), padx=50)
        self.intro2.grid(pady=(0, 20))

app = MyApplication()
app.mainloop()

A basic application


Complex interfaces and custom widgets

We will build up to a fairly complex interface.

[Screenshot: step_04d]


Phase test this week

The phase test will cover all the stuff we’ve looked at in the first two weeks.

You will be asked to read simple programmes and understand them.


Last year’s phase test results

The phase test is a small component and is intended to test your basic grasp of the foundational concepts we have been looking at so far. Last year the vast majority of students had no problem with it.

You will have no problems as long as you are engaging with the materials and asking questions until you understand.

[Figures: phase test results 2023/24; phase test classifications 2023/24]

Make sure you understand this

def formatted_list(items, title="list", ch='*', pad=4):
    width = max([len(i) for i in items + [title]]) + pad
    hline = ch * width
    result = [hline, title, hline] + items + [hline]
    result = [f"{ch}{i.center(width)}{ch}" for i in result]
    return "\n".join(result)

shopping = ['apples', 'bananas', 'cherries']
formatted_shopping = formatted_list(shopping, title='fruit', pad=8)
print(formatted_shopping)
******************
*     fruit      *
******************
*     apples     *
*    bananas     *
*    cherries    *
******************
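One way to check your understanding is to compute the intermediate values by hand and compare them with what Python gives. The sketch below repeats the first few steps of formatted_list for the shopping example; the variable names lengths and first_row are added for illustration.

```python
items = ['apples', 'bananas', 'cherries']
title, ch, pad = 'fruit', '*', 8

lengths = [len(i) for i in items + [title]]    # [6, 7, 8, 5]
width = max(lengths) + pad                     # 8 + 8 = 16
hline = ch * width                             # sixteen asterisks
first_row = f"{ch}{hline.center(width)}{ch}"   # eighteen asterisks

print(lengths, width, len(first_row))
```

The extra character added to each side explains why the printed box is 18 characters wide when width is 16.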

Thanks for listening

Any questions?

We should have plenty of time for questions and answers.

Just ask, you are probably not the only one who wants to know.
