GAMR1520: Markup languages and scripting

python logo

Lab 2.3: Object-oriented python

Part of Week 2: Files, functions and classes

General setup

For all lab exercises you should create a folder for the lab somewhere sensible.

Assuming you have a GAMR1520-labs folder, you should create a GAMR1520-labs/week_2 folder for this week and a GAMR1520-labs/week_2/lab_2.3 folder inside that.

Create a collection of scripts. If necessary, use folders to group related examples.

GAMR1520-labs
└─ week_2
    └─ lab_2.3
        ├─ experiment1.py
        └─ experiment2.py

Try to name your files better than this, the filename should reflect their content. For example, string_methods_.py, conditionals.py or while_loops.py.

Make sure your filenames give clues about what is inside each file. A future version of you will be looking back at these, trying to remember where a particular example was.

General approach

As you encounter new concepts, try to create examples for yourself that prove you understand what is going on. Try to break stuff, its a good way to learn. But always save a working version.

Modifying the example code is a good start, but try to write your own programmes from scratch, based on the example code. They might start very simple, but over the weeks you can develop them into more complex programmes.

Think of a programme you would like to write (don't be too ambitious). Break the problem down into small pieces and spend some time each session trying to solve a small problem.

In this set of exercises we will introduce the class keyword and explore dunder methods such as __init__ and __str__ as well as the instance argument (self) which is implicitly passed to class methods.

Table of contents

Classes are type definitions

Remember that all values are objects and that all objects have a type. For example, 'hello' and [1, 2, 3] are values. 'hello' is an instance of the str type. [1, 2, 3] is an instance of the list type.

flowchart TD; subgraph types ["Types"] list str int end subgraph instances [Instances] hello["'hello'"] world["'world'"] listA["[1, 2, 3]"] listB["['a', 'b', 'c']"] listC["[2.2, 'hello', 4]"] 1 2 end 1 --> int 2 --> int hello --> str world --> str listA --> list listB --> list listC --> list

The type of a value will determine what kind of things it can do. For example, a list like [1, 2, 3] will have an append() method. Whereas a str like 'hello' will have an upper() method.

Classes in python are definitions of new data types from which we can create instances. Our starting point will look like this:

class MyClass:
    pass

For an empty class the pass keyword is necessary to avoid an IndentationError in the compound statement. It literally does nothing but represent an empty code block.

The code above is a compound statement using the class keyword containing only the (capitalised) name of our MyClass class. This is how classes are defined.

We can now refer directly to the MyClass class, for example, we can print it.

class MyClass:
    pass

print(MyClass)
<class '__main__.MyClass'>

The output shows the default representation for a MyClass instance indicating that the class is defined in the __main__ module.

In python the file you execute is known as the __main__ module. Double underscores indicate that __main__ is part of the python infrastructure.

We create an instance of our MyClass class by calling it with parentheses, just like we would call a function.

class MyClass:
    pass

my_instance = MyClass()

It’s very important to be clear on the difference between a class and an instance. It should become clear with practice.

The parentheses indicate we are calling our class and can pass values as arguments. Only classes, functions and methods can be called in this way. In fact, as we shall see later, we are calling the class constructor method which can be used to customise our instances. This is really important and we will talk about it a lot later. For now, we will concentrate on the class itself.

If we print the instance, we can see it is a MyClass object. We can also pass our instance to the type() function, just like with any value.

class MyClass:
    pass

my_instance = MyClass()

print(my_instance)
print(type(my_instance))
<__main__.MyClass object at 0x7f4f03e60280>
<class '__main__.MyClass'>

We can see that our my_instance variable points to an object which has the type MyClass.

This is the default representation, we shall see later that we can customise how our types are represented.

You can check that the location of the object is the same as it’s id like this.

print(f'{id(my_instance):#x}')
0x7f4f03e60280

Notice the same number is included when we print the instance.

Attributes

In practice, classes are a way of organising code into namespaces containing attributes. We can assign attributes to our class using dot notation.

class MyClass:
    pass

# Using dot notation
MyClass.apples = 'hello'  # setting an attribute on the class

# more dot notation
print(MyClass.apples)     # This is found and printed
print(MyClass.bananas)    # This will raise an error
hello
Traceback (most recent call last):
  File "path/to/simple_attributes.py", line 9, in <module>
    print(MyClass.bananas)    # This will raise an error
AttributeError: type object 'MyClass' has no attribute 'bananas'

The first print statement works fine but the second raises an AttributeError because the attribute MyClass.bananas was not found.

The MyClass.apples attribute is not a normal variable. It is defined within the namespace provided by the class. It can only be accessed via the class (and, as we shall see, via it’s instances).

Python modules (i.e. files) are also namespaces in the same way. Its essentially a mechanism for avoiding name clashes across modules and classes.

Instance attributes

Instances are also namespaces, a lot like classes. However, there is one major difference. Instances can access attributes from their class.

If an attribute is not found on an instance, it will be looked up on the class. In this way, classes provide attributes that are shared across all instances.

class MyClass:
    pass

# create instances
my_instance = MyClass()
another_instance = MyClass()

# Set some attributes on the class and an instance
MyClass.apples = 'hello'
my_instance.apples = 'different value'
my_instance.bananas = 123

# Make sure you follow this
print(my_instance.apples)       # found on the instance
print(my_instance.bananas)      # found on the instance
print(another_instance.apples)  # found on the class
print(another_instance.bananas) # not found on either
different value
123
hello
Traceback (most recent call last):
  File "path/to/instance_attributes.py", line 17, in <module>
    print(another_instance.bananas) # not found on either
AttributeError: 'MyClass' object has no attribute 'bananas'

Notice the output includes three printed values followed by an error. An AttributeError is raised because the another_instance.bananas attribute is not defined on the instance and is not defined on the class.

Default attributes

Classes can have attributes included in their definition. Anything we put inside the code block will be part of the class definition and will be available to all instances of the class.

Remember that naming things in a meaningful way helps to make our code easy to read and understand. Our MyClass example is a bit pointless. We need to define something meaningful.

We will create a class to represent a coordinate, i.e. a 2-dimensional (x, y) location on a plane. Our Coordinate class will be very simple. It will define two numerical values, x and y.

Classes can be defined with attributes like this:

class Coordinate:
    x = 0
    y = 0

Adding attributes like this into the class definition provides us with default values or constants that individual instances can optionally override. We can access the class-level attributes using dot notation as usual and they provide default values for our instances.

class Coordinate:
    x = 0
    y = 0

c1 = Coordinate()
c2 = Coordinate()

c1.x = 2
c2.y = -3

print(f"{c1.x=}, {c1.y=}")
print(f"{c2.x=}, {c2.y=}")
c1.x=2, c1.y=0
c2.x=0, c2.y=-3

Remember, when we try to access an attribute on an instance, if the attribute is not found, then the class will be checked.

The problem here is that every instance of our coordinate class is created with the same initial values for x and y. To avoid this issue, we need to define a custom constructor method.

Methods

A function defined within a class is known as a method. As we have seen, functions in python have a type and are just like any other data (though importantly, they are callable).

def my_function():
    pass

print(my_function)
<function my_function at 0x7f8e5a766680>

See defining custom functions for a reminder

Functions defined inside class definitions are known as methods. We can define a method within our Coordinate class.

class Coordinate:
    x = 0
    y = 0

    def my_method():
        pass

print(Coordinate.my_method)
<function Coordinate.my_method at 0x7f8e5a7660e0>

There is one main difference from normal functions when we define custom methods on a class.

When called from an instance, a method will automatically receive a copy of the instance as the first argument.

This is crucial, and is best explained with some examples.

What happens when we try to call Coordinate.my_method() from an instance?

class Coordinate:
    x = 0
    y = 0

    def my_method():
        pass

c1 = Coordinate()
c1.my_method()
TypeError: Coordinate.my_method() takes 0 positional arguments but 1 was given

This is unexpected, so this must be important. Pay attention.

The error is telling us that the method “takes 0 positional arguments”.

We can agree with this much, because we did not specify any arguments.

However, it goes on to say “but 1 was given”.

So apparently our method was passed one argument, even though we passed nothing when we called it.

c1.my_method()  # We passed no arguments in this line

This extra argument is a reference to the instance c1. It is automatically added when we call a method attribute from an instance using dot notation.

Remember this. Remember this. Remember this. Remember this.

Let’s update our code to have a look at the argument. We’ll make two instances to see the impact.

class Coordinate:
    x = 0
    y = 0

    def my_method(arg):
        print(arg)

c1 = Coordinate()
c2 = Coordinate()

c1.my_method()
c2.my_method()
<__main__.Coordinate object at 0x7fc3e3cd0310>
<__main__.Coordinate object at 0x7fc3e3cd02e0>

Notice the my_method() function now takes one argument (arg) and prints it.

The code no longer raises an error because we are handling the single argument that was passed automatically. The output is showing us that we are printing the two Coordinate instances. We can see that the two instances are different (because they have different locations in memory).

This is a (the?) core mechanism of classes and instances in python, all methods on a class automatically receive a copy of the instance as an additional first argument when called from an instance using dot notation (e.g. c1.my_method()).

By convention, python programmers always name this argument self to avoid confusion.

What this means is that every method defined on a class should add self as the first argument.

The above code is for demonstration purposes only. Never use a different variable name for this. Always use self.

So every method on an instance can use self to access and modify the instance attributes.

All subsequent arguments are treated as with normal functions.

class Coordinate:
    x = 0
    y = 0

    def my_method(self, *args, **kwargs):
        print(args, kwargs)


c1 = Coordinate()
c2 = Coordinate()

c1.my_method(1, 2, 3)
c2.my_method(hello='world')
(1, 2, 3) {}
() {'hello': 'world'}

The advantage of this is that methods all have access to instance attributes (including other methods). This allows for methods that modify our instance data.

Here’s an example of a method Coordinate.invert() that swaps the values of x and y.

class Coordinate:
    x = 0
    y = 0

    def invert(self):
        self.x, self.y = self.y, self.x


c1 = Coordinate()
c1.x = 10
c1.y = 5

print(c1.x, c1.y)
c1.invert()
print(c1.x, c1.y)
10 5
5 10

We can see how this would be useful. We can define classes with methods which describe how to manage their instance data.

Class constructors

Notice that above we are setting the instance attributes manually after we create the instance.

c1 = Coordinate()
c1.x = 10
c1.y = 5

This is very awkward and annoying. Taking three lines of code to create our simple object. This could be done with simple variables just as easily. It makes us wonder, what is the point of classes?

We want to define a custom constructor method to allow us to pass arguments when we create a new Coordinate instance.

c1 = Coordinate(10, 5)

This syntax would be much more convenient, capturing the two literal values in one line of code to create a single object containing both values as attributes. But it won’t work automatically.

Trying this produces a reasonable error.

TypeError: Coordinate() takes no arguments

Our constructor method will need to allocate the provided arguments to the x and y attributes. So far, there is no way for it to know how to do this automatically.

All classes define a constructor method called __init__, which does nothing by default. When we create an instance of a class, we are implicitly calling this __init__ method.

Our class can be rewritten with a custom __init__ method to allow the more convenient syntax. Our new constructor takes the automatic self argument plus two more arguments, x, and y.

We no longer need the class attributes x and y since all our instances will define them directly on self.

class Coordinate:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def invert(self):
        self.x, self.y = self.y, self.x

Inside the class definition we have added a new __init__ method. Python automatically recognises functions with this name as the constructor for the class.

When we create new instances of our Coordinate class, python automatically calls __init__ with a fresh instance as the first argument. We can now pass in arguments for x and y like this.

origin = Coordinate(0, 0)
c1 = Coordinate(3, 8)
c2 = Coordinate(-4, 2)

print(f'({origin.x}, {origin.y})')
print(f'({c1.x}, {c1.y})')
print(f'({c2.x}, {c2.y})')
(0, 0)
(3, 8)
(-4, 2)

We can see our instances have been initialised with the x and y values (e.g. 0 and 0 in the first example). These correspond to the x and y arguments of the __init__ method. Inside the body of the method, we are setting attributes self.x and self.y.

Changing how instances are displayed

When we call print() on Coordinate instances, we get output like this.

origin = Coordinate(0, 0)
print(origin)
<__main__.Coordinate object at 0x7fec1d6d4280>

The print() function automatically calls str() on anything is is asked to print.

The terminal can only display strings. This is also why input always returns a string.

We can see that passing any object into the str() constructor will generate a string representation of that object.

print([str(i) for i in (1.1, True, Coordinate(0, 0))])
['1.1', 'True', '<__main__.Coordinate object at 0x7fabd3cafc10>']

Wait… how does the str() function know how to convert my custom object into a string?

The str() constructor will call the __str__ method of any object which is passed to it.

Remember, everything is an object.

The object’s __str__ method must return a string representation of the object.

Think about this…

Anything can be converted to a string and anything can be passed into the print() method because all classes have a default __str__ method.

We can demonstrate this by simply calling the __str__ method on some objects, they all have one.

print([i.__str__() for i in (1.1, True, Coordinate(0, 0))])
['1.1', 'True', '<__main__.Coordinate object at 0x7fabd3cafc10>']

Note this is i.__str__(), not str(i). But they are equivalent because str(i) calls i.__str__().

This is really nice for us as developers. Our objects are responsible for telling the str() constructor what a string representation should be. So, we can define how our classes are converted into strings.

The default __str__ implementation is something like this:

def __str__(self):
    md = self.__class__.__module__
    cls = self.__class__.__name__
    return f"<{md}.{cls} object at {id(self):#x}>"

It provides enough information to work out what something is and where it is defined, but its not very convenient if we want to see the instance attributes of our Coordinate.

We can give our class a customised __str__ method to return a customised string.

class Coordinate:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def invert(self):
        self.x, self.y = self.y, self.x

    def __str__(self):
        return f'({self.x}, {self.y})'

for x in range(3):
    for y in range(3):
        print(Coordinate(x, y))
(0, 0)
(0, 1)
(0, 2)
(1, 0)
(1, 1)
(1, 2)
(2, 0)
(2, 1)
(2, 2)

Now anywhere we print a Coordinate instance, it will be appear as a pair of numbers in brackets.

The __repr__ method

There is a related method __repr__ which is very similar to __str__ but is intended to produce a more explicit string representation of an object that indicates the type as well as the values of instance attributes. Just as the built-in str() function calls __str__, there is a similar built-in repr() function which will call __repr__ and return the resulting string.

You can think of __str__ as returning a user-friendly representation whilst __repr__ returns a developer-friendly representation.

If we implement this then we can add some more context to the string.

class Coordinate:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def invert(self):
        self.x, self.y = self.y, self.x

    def __str__(self):
        return f'({self.x}, {self.y})'

    def __repr__(self):
        return f"Coordinate(x={self.x}, y={self.y})"


for x in range(3):
    for y in range(3):
        print(repr(Coordinate(x, y)))
Coordinate(x=0, y=0)
Coordinate(x=0, y=1)
Coordinate(x=0, y=2)
Coordinate(x=1, y=0)
Coordinate(x=1, y=1)
Coordinate(x=1, y=2)
Coordinate(x=2, y=0)
Coordinate(x=2, y=1)
Coordinate(x=2, y=2)

The output shows a more explicit representation of our object (which could otherwise be confused with a tuple).

We have followed the convention of representing our instances as the code that would create them. This works nicely for simple classes, though sometimes it can be useful to use a different approach.

This can also be generated within an f-string using the !r formatting.

for x in range(3):
    for y in range(3):
        print(f"{Coordinate(x, y)!r}")

The above code produces the same result

Operators

The exact same idea is used to implement operators.

Consider this simple expression.

a - b

Python actually interprets expressions like this as function calls.

a.__sub__(b)

In python, the above two examples are exactly equivalent.

Obviously, the first one is a neater way of doing it and is always preferred when we want to do subtraction. The key point here is that our Coordinate object can define how it interacts with operators. Currently, it cannot be part of a subtraction operation.

a = Coordinate(1, 2)
b = Coordinate(2, 1)

a - b
Traceback (most recent call last):
  File "path/to/coordinate.py", line 15, in <module>
    a - b
TypeError: unsupported operand type(s) for -: 'Coordinate' and 'Coordinate'

But, we can implement a Coordinate.__sub__(self, other) method to define how our instances interact. It will take the usual self argument (a in this case), plus a second argument other which will be the second operand in the expression (b in this case).

Again, examples should make it clear what is happening. In this case we can define a method that returns a new Coordinate instance with calculated x and y values.

class Coordinate:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def invert(self):
        self.x, self.y = self.y, self.x

    def __sub__(self, other):
        return Coordinate(self.x - other.x, self.y - other.y)

    def __str__(self):
        return f'Coordinate({self.x}, {self.y})'

    def __repr__(self):
        return f"Coordinate(x={self.x}, y={self.y})"

c1 = Coordinate(1, 2)
c2 = Coordinate(3, 4)
c3 = c1 - c2

print(f"{c1} - {c2} = {c3}")
(1, 2) - (3, 4) = (-2, -2)

We can implement addition, multiplication, division etc. in the same way using __mul__, __add__, __truediv__ and many other special methods which each correspond to a specific operator.

Add operator methods to the Coordinate class

Add the operator methods __add__, __mul__ and __truediv__.

Check they produce the following outputs.

a = Coordinate(2, 1)
b = Coordinate(10, 5)

print(a + b)
print(b - a)
print(a * b)
print(b / a)
(12, 6)
(8, 4)
(20, 5)
(5.0, 5.0)

Instances of our new type can be passed around and used in our programmes just like any value now.

Implement a Line class

We can now use our Coordinates class to build another level of complexity.

Create a Line class which takes two Coordinate instances into its constructor and saves them (e.g.c1, and c2).

Implement a Line.length() method to calculate the distance between the points. Implement a Line.slope() method to calculate the gradient between the points.

Check they produce the following outputs.

c1 = Coordinate(1, 3)
c2 = Coordinate(2, 4)
line = Line(c1, c2)

print(f"{line = }\n{line.gradient() = }\n{line.length() = }")
line = Line((1, 3) -> (2, 4))
line.gradient() = 1.0
line.length() = 1.4142135623730951

Hint: try using the math.hypot function.

Our Coordinate class returns new instances from methods such as __add__ and __sub__. If we wanted to develop a more efficient and fully mutable object, we need to return self in these methods after making a modification in order to yield the modified instance which (depending on the details) can be more efficient than generating a whole new instance.

We can also implement augmented addition (+=) and subtraction (-=) with __iadd__ and __isub__. These will also usually need to mutate the instance and return self.

Create a class for a shopping list

Create a ShoppingList class. Your class should have a list of items (strings) and a title as the main instance attributes. These should be passed in as arguments to the __init__ constructor.

You might add some other optional keyword arguments.

Use code from previous exercises to add a __str__ method that prints a formatted version of the list.

The list should have methods to allow it to be updated (i.e. __add__ and __sub__ methods and/or the __iadd__ and __isub__ methods).

Hint: since our list is mutable, it should return self at the end of these methods.

We should be able to use the class as follows.

shopping = ShoppingList('apples', 'bananas', 
                        title='fruit', pad=10)
shopping += 'cherries'
shopping -= 'apples'
print(shopping)
********************
*      fruit       *
********************
*     bananas      *
*     cherries     *
********************

You may find it simpler to take a list as the first argument

shopping = ShoppingList(['apples', 'bananas'], 
                        title='fruit', pad=10)

Though it is easy enough to collect all the positional arguments together and generate a list from them.

If you get stuck, ask your tutor or the module leader.