Storing data in files provides a simple persistence mechanism.
Data can be stored in simple text files, e.g. shopping.txt.
apples
bananas
cherries
We can use comma-separated files, e.g. shopping.csv, for more structure.
name, quantity
apples, 5
bananas, 6
cherries, 25
JSON format is even more structured. Here’s an example shopping.json.
[{"name": "apples", "quantity": 5},
{"name": "bananas", "quantity": 6},
{"name": "cherries", "quantity": 25}]
Accessing files with open() and close()
Write mode, 'w'
file = open('shopping.txt', 'w')
file.write("apples")
file.write("bananas")
file.write("cherries")
file.close()
Read mode, 'r'
file = open('shopping.txt', 'r')
print(file.read())
file.close()
applesbananascherries
Using with, a context manager
A context manager is an object that defines the context of a with statement.
The built-in function open() is the most commonly used context manager in Python.
It manages both opening and closing a file so the context of the code block includes an open file handle.
with open('shopping.txt', 'w') as file:
    file.write("apples")
    file.write("bananas")
    file.write("cherries")
This code will always automatically close your files and should always be used when accessing files.
with open('shopping.txt', 'r') as file:
    print(file.read())
applesbananascherries
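The real benefit of the context manager is that the file is closed no matter how the block exits. A small sketch (not from the notes above) demonstrating this with a deliberate exception:

```python
# The context manager closes the file even if an exception is
# raised inside the block.
try:
    with open('shopping.txt', 'w') as file:
        file.write("apples")
        raise ValueError("something went wrong")
except ValueError:
    pass

print(file.closed)  # True: the file was closed despite the exception
```

A bare open()/close() pair would have left the file open here, because close() is never reached when an exception interrupts the code.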
What is this file object?
The open() function returns different object types, depending on the mode.
for mode in ['r', 'w', 'rb', 'wb']:
    with open('shopping.txt', mode) as f:
        print(f"{mode}: {f}")
r: <_io.TextIOWrapper name='shopping.txt' mode='r' encoding='UTF-8'>
w: <_io.TextIOWrapper name='shopping.txt' mode='w' encoding='UTF-8'>
rb: <_io.BufferedReader name='shopping.txt'>
wb: <_io.BufferedWriter name='shopping.txt'>
Notice that text file objects are the default and that they default to utf-8 encoding.
In most cases the default read ('r') and write ('w') modes are all you will ever need.
However, if you are working with binary file formats, the binary modes ('rb' and 'wb') are available.
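A minimal sketch of binary mode, using a hypothetical file 'data.bin' (not part of the examples above). In binary mode, write() takes bytes rather than str, and read() returns bytes:

```python
# Write raw bytes out and read them back in binary mode.
data = bytes([0, 1, 2, 255])

with open('data.bin', 'wb') as f:
    f.write(data)

with open('data.bin', 'rb') as f:
    loaded = f.read()

print(loaded)  # b'\x00\x01\x02\xff'
```

Notice the binary file objects have no encoding, since no text decoding takes place.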
In contrast, text-based formats and markup languages such as HTML are all about human-readability.
Creating line-oriented files
Notice that the result we got when we wrote data into our 'shopping.txt'
file was all in one line.
applesbananascherries
We can add literal newline characters into our files to write line-oriented data.
shopping = ['apples', 'bananas', 'cherries']
with open('shopping.txt', 'w') as file:
    for item in shopping:
        file.write(f"{item}\n")
This produces a result which is clearly easier to parse.
apples
bananas
cherries
Using print() with files
The print() function takes an optional file argument.
shopping = ['apples', 'bananas', 'cherries']
with open('shopping.txt', 'w') as file:
    for item in shopping:
        print(item, file=file)
This produces an equivalent result.
apples
bananas
cherries
This is because print() takes an end argument which defaults to a newline character, '\n'.
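Since print() supplies the newlines for us, the whole list can even be written in a single call using the sep argument (a sketch using the same shopping list):

```python
shopping = ['apples', 'bananas', 'cherries']

with open('shopping.txt', 'w') as file:
    # sep joins the items with newlines, end supplies the final one
    print(*shopping, sep='\n', file=file)
```

This produces exactly the same three-line file as the loop above.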
Parsing line-oriented data
With a line-oriented file, we can extract lines one at a time using the readline() method.
Each call will read data from the file up to and including the next '\n' character.
If we print the output, then we end up adding an extra newline character.
with open('shopping.txt', 'r') as file:
    apples = file.readline()

print(apples)
apples
(notice the extra blank line)
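There are a couple of easy ways to avoid the doubled newline. A sketch (it rewrites shopping.txt first so it runs on its own):

```python
# Recreate the file from the earlier example.
with open('shopping.txt', 'w') as file:
    file.write("apples\nbananas\ncherries\n")

with open('shopping.txt', 'r') as file:
    apples = file.readline()

# Either suppress print's own newline...
print(apples, end="")
# ...or strip the trailing '\n' from the string itself.
print(apples.rstrip('\n'))
```

Both calls print a single line with no extra blank line after it.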
Calling readlines() will generate a list of all the line-oriented data within a file, with the '\n' characters intact.
with open('shopping.txt', 'r') as file:
    data = file.readlines()

print(data)
['apples\n', 'bananas\n', 'cherries\n']
Files are iterable
Files are iterable, so we can convert them to a list by passing a file object into the list() constructor.
with open('shopping.txt', 'r') as file:
    shopping = list(file)

print(shopping)
['apples\n', 'bananas\n', 'cherries\n']
Looping is also possible and keeps the memory footprint lower, since only one line is held in memory at a time.
with open('shopping.txt', 'r') as file:
    for line in file:
        print(line, end="")
apples
bananas
cherries
Again, in both cases the newline characters are kept.
Parsing data in other ways
To remove the newline characters we can call str.split('\n') on the entire file contents.
with open('shopping.txt', 'r') as file:
    shopping = file.read().split('\n')

print(shopping)
['apples', 'bananas', 'cherries', '']
However, because our file ends with a newline character, the result includes a trailing empty string (and an empty file would give a single-item list containing one empty string).
This is why the string method splitlines() exists.
with open('shopping.txt', 'r') as file:
    shopping = file.read().splitlines()

print(shopping)
['apples', 'bananas', 'cherries']
Using pathlib
Many complex issues arise when managing files and folders for a cross-platform application.
An excellent tool in the Python standard library is the pathlib module.
The core pathlib tool is the Path class, which provides an object-oriented interface representing locations in the filesystem.
from pathlib import Path
my_path = Path('folder1', 'folder2', 'filename.txt')
print(repr(my_path))
print(repr(my_path.absolute()))
PosixPath('folder1/folder2/filename.txt')
PosixPath('/home/graeme/Teaching/GAMR1520/folder1/folder2/filename.txt')
Notice we are using repr() above to print the string representation of the path object.
Note that the result is platform-dependent and the file may or may not exist at this point.
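Path objects also expose the useful components of the path itself. A sketch (none of these attributes require the file to exist):

```python
from pathlib import Path

my_path = Path('folder1', 'folder2', 'filename.txt')

print(my_path.name)    # filename.txt
print(my_path.stem)    # filename
print(my_path.suffix)  # .txt
print(my_path.parent)  # the containing folder as a new Path
```

These attributes save a lot of error-prone string slicing when working with filenames.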
mkdir() and touch()
We can see that the file doesn’t exist.
my_path = Path('folder1', 'folder2', 'filename.txt')
print(my_path.exists())
False
We can create the containing folders and the file.
my_path.parent.mkdir(parents=True, exist_ok=True)
my_path.touch()
print(my_path.exists())
True
iterdir(), is_dir() and is_file()
We can iterate over the file system.
from pathlib import Path
here = Path('.')
for p in here.iterdir():
    if p.is_dir():
        print(str(p).ljust(20), "(folder)", sep="\t")
    if p.is_file():
        print(str(p).ljust(20), "(file)", sep="\t")
shopping.txt (file)
list.txt (file)
string.txt (file)
folder1 (folder)
my_experiments (folder)
script.py (file)
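Related to iterdir() is glob(), which filters by a pattern. A sketch, using a hypothetical scratch folder so it runs on its own:

```python
from pathlib import Path

folder = Path('glob_demo')  # hypothetical scratch folder
folder.mkdir(exist_ok=True)
(folder / 'a.txt').touch()
(folder / 'b.txt').touch()
(folder / 'notes.md').touch()

# '*.txt' matches only the text files; rglob() would also
# recurse into any subfolders.
txt_files = sorted(folder.glob('*.txt'))
print([p.name for p in txt_files])  # ['a.txt', 'b.txt']
```

This avoids writing the is_file() and suffix checks by hand.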
Joining paths
The Path.joinpath() method can be used to create new Path objects.
root = Path('folder')
my_path = root.joinpath('filename.txt')
Alternatively, we can use the slash operator (/) to join paths.
This is equivalent to the above.
root = Path('folder')
my_path = root / 'filename.txt'
As we shall see, this makes manipulating files and generating directory structures very simple.
Path.open()
Conveniently, the Path object includes an open() method which calls the built-in open() function for us.
So we can create and populate multiple files with a simplified loop.
root = Path('folder')
root.mkdir()
all_lists = {
    'fruit': ['apples', 'bananas', 'cherries'],
    'colours': ['aquamarine', 'blue', 'cyan'],
    'animals': ['armadillo', 'baboon', 'cat']
}
for title, my_list in all_lists.items():
    my_path = root / f'{title}.txt'
    with my_path.open('w') as my_file:
        for item in my_list:
            print(item, file=my_file)
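For simple cases like this, Path also provides write_text() and read_text(), which open and close the file for us. A sketch with a hypothetical scratch file:

```python
from pathlib import Path

path = Path('colours.txt')  # hypothetical scratch file
path.write_text('aquamarine\nblue\ncyan\n')

print(path.read_text().splitlines())  # ['aquamarine', 'blue', 'cyan']
```

These are convenient one-liners when the whole file content is a single string.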
What if I have more complex data?
animals.py
animals = [{'name': 'Anteater', 'description': 'Eats ants'},
           {'name': 'Bear', 'description': 'Grizzly'},
           {'name': 'Chimp', 'description': 'Chump'},
           {'name': 'Dog', 'description': 'Friend'}]
animals.csv
name,description
Anteater,Eats ants
Bear,Grizzly
Chimp,Chump
Dog,Friend
animals.json
[{"name": "Anteater", "description": "Eats ants"}, {"name": "Bear", "description": "Grizzly"}, {"name": "Chimp", "description": "Chump"}, {"name": "Dog", "description": "Friend"}]
Write as CSV
The csv module takes some getting used to, but is really very simple.
from pathlib import Path
from csv import DictWriter
animals = [{'name': 'Anteater', 'description': 'Eats ants'},
           {'name': 'Bear', 'description': 'Grizzly'},
           {'name': 'Chimp', 'description': 'Chump'},
           {'name': 'Dog', 'description': 'Friend'}]
path = Path('animals.csv')
with path.open('w', newline='') as file:
    writer = DictWriter(file, fieldnames=['name', 'description'])
    writer.writeheader()
    writer.writerows(animals)
Write as JSON
from pathlib import Path
import json
animals = [{'name': 'Anteater', 'description': 'Eats ants'},
           {'name': 'Bear', 'description': 'Grizzly'},
           {'name': 'Chimp', 'description': 'Chump'},
           {'name': 'Dog', 'description': 'Friend'}]
path = Path('animals.json')
with path.open('w') as file:
    json.dump(animals, file, indent=2)
JSON to CSV
Say you need to read from a JSON file and write into a CSV file.
This requires json.load() and a csv.DictWriter object.
A single with statement can open both files.
from pathlib import Path
from csv import DictWriter
import json
inpath = Path('animals.json')
outpath = Path('animals_from_json.csv')
with inpath.open('r') as infile, outpath.open('w', newline='') as outfile:
    animals = json.load(infile)
    writer = DictWriter(outfile, fieldnames=['name', 'description'])
    writer.writeheader()
    writer.writerows(animals)
We need to write the CSV header row before we write all the data.
CSV to JSON
Converting in the other direction requires a csv.DictReader object and json.dump().
Notice the csv.DictReader object will use the header row to determine the field names.
Also, because it's iterable, we can just pass it into list() to load all the data.
from pathlib import Path
from csv import DictReader
import json
inpath = Path('animals.csv')
outpath = Path('animals_from_csv.json')
with inpath.open('r', newline='') as infile, outpath.open('w') as outfile:
    animals = list(DictReader(infile))
    json.dump(animals, outfile, indent=2)
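One way to sanity-check conversions like these is a round trip: write the data out, read it back, and compare. A sketch with a hypothetical scratch file (the comparison works here because every value was a string to begin with; csv always reads values back as strings):

```python
from csv import DictReader, DictWriter
from pathlib import Path

animals = [{'name': 'Anteater', 'description': 'Eats ants'},
           {'name': 'Bear', 'description': 'Grizzly'}]

path = Path('roundtrip.csv')  # hypothetical scratch file

# Write the data out as CSV...
with path.open('w', newline='') as file:
    writer = DictWriter(file, fieldnames=['name', 'description'])
    writer.writeheader()
    writer.writerows(animals)

# ...then read it back in and compare.
with path.open('r', newline='') as file:
    loaded = list(DictReader(file))

print(loaded == animals)  # True
```

If the data contained numbers, the round trip would come back as strings and the comparison would fail, which is exactly the kind of thing this check catches.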