Reading and Writing to Files
Reading and writing data to a file can be a complex task. Python, like most programming languages, can load data from external sources and save data to files. Reading can be carried out with the built-in open() function together with the read() and close() methods of the file object it returns or, for data with a fixed structure (usually tabular), with ready-made functions that support this process. First, we will analyze reading data with the built-in tools, and then we will show an example of a ready-made solution.
To read a text file, you need to execute three commands:
open() to connect to the file (nothing is loaded)
read() to read the entire contents of the file into one variable as text
close() to close the file after reading is complete
In addition, reading or writing requires locating the correct file or, in the case of writing, creating a new one.
As an example, we'll load a small text file from the same directory as our script.
# -*- coding: utf-8 -*-
f = open('file.txt')
contents = f.read()
f.close()
contents
#Out: 'This is the content of the file\nLine 2\nLine 3\n\nThere is an empty line above.\n'
The open() function connects to the file but does not load the data; it only assigns a file object to the variable f. This is because the data set can be very large, and the programmer should keep control over how it is loaded. The read() method reads the entire content of f into the variable contents. The file is then closed.
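The open, read, close sequence described above can also be written with a with statement, which closes the file automatically when the block ends, even if reading raises an error. The following is a minimal sketch; the temporary directory and the sample contents are assumptions made only so that the example runs on its own.

```python
import os
import tempfile

# Create a small sample file so the example is self-contained
# (the location and the contents are illustrative assumptions).
path = os.path.join(tempfile.mkdtemp(), 'file.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('This is the content of the file\nLine 2\nLine 3\n\n'
            'There is an empty line above.\n')

# The with statement closes the file as soon as the block is left.
with open(path, encoding='utf-8') as f:
    contents = f.read()

print(f.closed)    # True: no explicit f.close() call was needed
print(contents)
```

With this form there is no risk of forgetting close(), which is why it is often preferred over calling the three steps by hand.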
The '\n' character within the variable marks the end of a line. To break the text into separate lines, use the split() method, passing the character on which the text should be split.
contents.split('\n')
#Out: ['This is the content of the file',
'Line 2',
'Line 3',
'',
'There is an empty line above.',
'']
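As a hedged aside, the trailing empty string in the result comes from the final '\n' in the text. The str.splitlines() method is one alternative that does not produce it. The sketch below uses a literal string in place of the file contents, which is an assumption made to keep the example self-contained.

```python
# Literal text standing in for the contents read from the file.
contents = ('This is the content of the file\nLine 2\nLine 3\n\n'
            'There is an empty line above.\n')

by_split = contents.split('\n')        # keeps a trailing '' element
by_splitlines = contents.splitlines()  # no trailing '' element

print(len(by_split), len(by_splitlines))   # 6 5
print(by_split[-1] == '')                  # True
```

Note that the empty line inside the text is preserved by both methods; only the element after the last newline differs.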
For large files, it is better to use a line-by-line reading procedure. Reading line by line is also an operation used when we want to read specific lines from a file:
f = open('file.txt')
f.readline()
f.readline()
f.readline()
f.close()
#Out: 'This is the content of the file\n'
#Out: 'Line 2\n'
#Out: 'Line 3\n'
This way we can control which lines are saved for further processing.
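Line-by-line reading can also be done by iterating over the file object, which yields one line at a time without loading the whole file. The sketch below keeps only selected lines; the sample file and the set of wanted line numbers are hypothetical choices made for illustration.

```python
import os
import tempfile

# Sample file created only so that the example runs on its own.
path = os.path.join(tempfile.mkdtemp(), 'file.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('This is the content of the file\nLine 2\nLine 3\n\n'
            'There is an empty line above.\n')

wanted = {0, 2}   # zero-based numbers of the lines to keep (hypothetical)
selected = []

with open(path, encoding='utf-8') as f:
    # enumerate() pairs each line with its position in the file.
    for number, line in enumerate(f):
        if number in wanted:
            selected.append(line.rstrip('\n'))

print(selected)   # ['This is the content of the file', 'Line 3']
```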
By default, the open() function opens a file in text mode for reading, so calling open(file) is equivalent to open(file, 'rt'). Files can also be opened in other modes: binary mode ('b'), write mode ('w', which overwrites existing content), append mode ('a', which writes at the end of the file), or exclusive creation mode ('x', which creates a new file and fails if the file already exists).
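The modes listed above can be tried out in a short sketch. The file name notes.txt and the temporary directory are assumptions made for illustration only.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'notes.txt')  # hypothetical file

# 'w' creates the file, or overwrites it if it already exists.
with open(path, 'w', encoding='utf-8') as f:
    f.write('first line\n')

# 'a' appends to the end of the existing file.
with open(path, 'a', encoding='utf-8') as f:
    f.write('second line\n')

# 'x' refuses to open a file that already exists.
try:
    open(path, 'x')
    created = True
except FileExistsError:
    created = False

# 'rb' returns raw bytes instead of text.
with open(path, 'rb') as f:
    raw = f.read()

# 'rt' is the default: text mode for reading.
with open(path, 'rt', encoding='utf-8') as f:
    text = f.read()

print(created)     # False
print(type(raw))   # <class 'bytes'>
print(text)
```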
Reading and Processing Tabular Files
Since Python loads the data as a single text variable, data intended for calculations has to be transformed appropriately, usually into tabular form, and the data types have to be converted from text to numeric. The split() method, already introduced, and type conversion functions are used for this purpose. The procedure for reading the data is as follows:
1. A connection to the file is established and its content is read (the content is displayed).
2. The data is split into a list of text strings on the '\n' character, and then each line is split into an inner list on the comma (the separator in csv). The result is displayed as a nested list.
3. The connection is closed.
4. The first line (header) is removed from the list.
5. The last, empty line is removed from the list (a trailing empty line is common in text files).
f = open('tab.csv') #1
data = f.read()
data
data = data.split('\n') #divide by lines
data = [l.split(',') for l in data] #2
data
f.close() #3
header = data.pop(0) #4
data.pop() #5
Out: 'Street, House_no, code, city\nGrasshopers, 20, W1S L20, Grassfields\nLadybirds, 15, Z81 S22, Sunshines\nBeetles, 10, UL1 SW2, Beetlejuices\n'
Out: [['Street', ' House_no', ' code', ' city'],
['Grasshopers', ' 20', ' W1S L20', ' Grassfields'],
['Ladybirds', ' 15', ' Z81 S22', ' Sunshines'],
['Beetles', ' 10', ' UL1 SW2', ' Beetlejuices'],
['']]
Out: ['']
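The steps above leave a leading space in most fields, because the file separates values with a comma followed by a space. A possible variant, sketched below, strips each field and skips empty lines while splitting. The literal string stands in for the text read from tab.csv, which is an assumption made to keep the example self-contained.

```python
# Text as it would be returned by f.read() for the sample file.
raw = ('Street, House_no, code, city\n'
       'Grasshopers, 20, W1S L20, Grassfields\n'
       'Ladybirds, 15, Z81 S22, Sunshines\n'
       'Beetles, 10, UL1 SW2, Beetlejuices\n')

# Split into lines, skip empty ones, then split on commas and
# strip surrounding whitespace from every field.
rows = [[field.strip() for field in line.split(',')]
        for line in raw.split('\n') if line]

header = rows.pop(0)   # remove the header row from the data

print(header)    # ['Street', 'House_no', 'code', 'city']
print(rows[0])   # ['Grasshopers', '20', 'W1S L20', 'Grassfields']
```

Because empty lines are filtered out during splitting, the separate step that removes the last empty element is not needed in this variant.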
The data is stored as a list of lists, where each nested list is a single row holding values of different types. Since it is not convenient to operate on rows containing mixed types of data, the nested lists can be transformed into columnar form (transposed) using the zip() function. The data is passed with *, i.e. the list is unpacked into separate arguments: data[0], data[1], ..., data[n-1].
data = zip(*data)
trans = list(data)
trans
Out: [('Grasshopers', 'Ladybirds', 'Beetles'),
(' 20', ' 15', ' 10'),
(' W1S L20', ' Z81 S22', ' UL1 SW2'),
(' Grassfields', ' Sunshines', ' Beetlejuices')]
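To make the role of * more concrete, the sketch below compares the unpacked call with one that passes the rows explicitly. The rows are copied from the output shown earlier; the variable names unpacked and explicit are introduced here for illustration.

```python
data = [['Grasshopers', ' 20', ' W1S L20', ' Grassfields'],
        ['Ladybirds', ' 15', ' Z81 S22', ' Sunshines'],
        ['Beetles', ' 10', ' UL1 SW2', ' Beetlejuices']]

# zip(*data) passes every row as a separate argument...
unpacked = list(zip(*data))
# ...which is the same as listing the rows by hand.
explicit = list(zip(data[0], data[1], data[2]))

print(unpacked == explicit)   # True
print(unpacked[0])            # ('Grasshopers', 'Ladybirds', 'Beetles')
```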
In the last step, the second column (the house numbers) can be cast to the integer type. Here, the map() function comes in handy:
numbers = tuple(map(int, trans[1]))
numbers
Out: (20, 15, 10)
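Two details of this conversion are worth a short sketch: int() ignores surrounding whitespace, so the leading spaces do not need to be removed first, and a column that is not numeric raises a ValueError. The variable names below are hypothetical, and the generator expression is shown only as an equivalent alternative to map().

```python
house_numbers = (' 20', ' 15', ' 10')

# int() accepts strings with surrounding whitespace.
with_map = tuple(map(int, house_numbers))
with_generator = tuple(int(value) for value in house_numbers)

print(with_map)                     # (20, 15, 10)
print(with_map == with_generator)   # True

# A value such as a postal code cannot be converted to an integer.
try:
    int(' W1S L20')
    converted = True
except ValueError:
    converted = False

print(converted)   # False
```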
Using the csv reader Function