Computer files, or resources for recording discrete data, are ubiquitous in Python programs. File handling in Python treats files as either textual or binary, and Python itself places no limit on the size of the files it can work with; the practical limits come from your operating system, file system and available memory. In this post, we will discuss textual files, while subsequent posts will cover binary files and how Python handles them. Seven basic functions for handling textual files are discussed.
The Built-in Python Open File Function
The built-in Python open file function is the first function you will encounter when you want to open any sort of file in Python. It opens a file and returns a file object. The syntax for the Python open file function is:
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
but for working on textual files, we will focus on the parameters file and mode.
The file parameter to the open file function is the pathname of the file, either absolute or relative to the current working directory. The pathname format depends on the file system of the operating system. If the file is opened for reading and cannot be found, the function raises a FileNotFoundError exception.
The mode parameter specifies the mode in which the file is to be opened. The default mode is 'r', which means open for reading text. Other values are 'w', meaning truncate the file and open it for writing, and 'a', meaning open for appending to the end of the file. These can be combined with 'b', meaning open in binary mode, and '+', meaning open for updating (reading and writing). If you want to write to the file without truncating it, use the combination mode 'r+', which starts writing at the beginning of the file and overwrites the characters already there. If you want to write starting from the end, use 'a'.
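To make the modes a little more concrete, here is a minimal sketch of how they behave when the file is missing and what kind of data they hand back. The temporary directory and the file name not_there.txt are assumptions made for illustration, so nothing in your working directory is touched.

```python
import os
import tempfile

# Work in a throwaway directory (an assumption for this sketch).
with tempfile.TemporaryDirectory() as tmp_dir:
    missing = os.path.join(tmp_dir, 'not_there.txt')  # hypothetical file name

    # Reading modes need the file to exist already.
    try:
        open(missing, 'r')
        read_failed = False
    except FileNotFoundError:
        read_failed = True

    # 'a' (like 'w') creates the file when it is missing.
    with open(missing, 'a') as fobj:
        pass
    created_by_append = os.path.exists(missing)

    # Text mode hands back str; adding 'b' hands back bytes.
    with open(missing, 'w') as fobj:
        fobj.write('eba')
    with open(missing, 'r') as fobj:
        as_text = fobj.read()
    with open(missing, 'rb') as fobj:
        as_bytes = fobj.read()

print(read_failed, created_by_append, as_text, as_bytes)
# True True eba b'eba'
```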
Now, let’s use some examples.
I will be working with the following text file, eba.txt, that is in my working directory. The contents of the file are:
Nothing beats a plate of eba
as we generally want to eat it
but it often gets stuck in our throats
where it is eaten without a good soup
that is oily and makes the eba
which is a gelly and very hard 
to move smoothly down our throats
You can also download the file here or copy and paste it if you want to use it so you can get the same results as I did.
Now, let’s just open the file without doing any reading or writing. Those functions will come later. After opening the file, we will then close it. It is good practice to always close your files.
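fobj = None
try:
    fobj = open('eba.txt', 'r')
except FileNotFoundError:
    print("File doesn't exist.")
else:
    print('File opened successfully and file object created.')
finally:
    # Close only if the file was actually opened; otherwise fobj is still None.
    if fobj is not None:
        fobj.close()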
A more Pythonic way to do the above, that is, open the file resource and have it closed automatically, is to use a with statement:
with open('eba.txt', 'r') as fobj:
    print('File opened successfully.')
If the file opened successfully, the print will run, but if not, you will get a traceback ending in a FileNotFoundError.
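If you would like to see the automatic closing for yourself, the following small sketch checks the closed attribute of the file object inside and outside the with block, including when an error happens midway. The temporary file sample.txt is an assumption for illustration only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')  # hypothetical file name
    with open(path, 'w') as fobj:
        fobj.write('eba\n')

    with open(path, 'r') as fobj:
        closed_inside = fobj.closed   # still open inside the block
    closed_after = fobj.closed        # closed as soon as the block ends

    # The file is closed even if the block is left through an exception.
    try:
        with open(path, 'r') as fobj2:
            raise ValueError('something went wrong mid-read')
    except ValueError:
        closed_after_error = fobj2.closed

print(closed_inside, closed_after, closed_after_error)
# False True True
```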
So you now know how to open textual files. What remains is to do something with them while they are open. The remaining sections cover that.
The Python Read File Function
The syntax for the Python read file function is:
read(size=-1)
where size specifies the maximum number of characters to read from the file (we are dealing with textual files here; in binary mode it would be bytes). The default is -1, which means read the entire contents of the file and return them as a single string. If size is specified, read returns at most size characters, or fewer if the end of the file is reached first. Be careful when reading a whole file this way, because a very large file could use up your system memory.
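One way around the memory concern is to call read with a size repeatedly until it returns an empty string. The helper name read_in_chunks, the chunk size and the temporary file below are all assumptions for this sketch, not part of any library.

```python
import os
import tempfile

SAMPLE = 'Nothing beats a plate of eba\n'

def read_in_chunks(path, chunk_size=16):
    """Yield a text file piece by piece, at most chunk_size characters each."""
    with open(path, 'r') as fobj:
        while True:
            chunk = fobj.read(chunk_size)
            if chunk == '':  # read() returns an empty string at end of file
                break
            yield chunk

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')  # hypothetical file name
    with open(path, 'w') as fobj:
        fobj.write(SAMPLE)
    chunks = list(read_in_chunks(path, chunk_size=10))

print(chunks)
```

Only one chunk is held in memory at a time, so the same loop works for files far larger than this sample.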
So, on to some examples. Remember that we are using the eba.txt file whose contents I posted above.
Suppose we want to read only the first 20 characters from the file. We will use this code:
with open('eba.txt', 'r') as fobj:
    s = fobj.read(20)
    print(s)
Our output would be:
Nothing beats a plat
Just the first 20 characters in the file. Later, I will show you how to read from any position with random access to the file.
The Python Readline Function
The syntax for the Python readline function is:
readline(size=-1)
and unlike the read function, it reads and returns one line from the stream. It starts with the first line and moves on by one line with every call. If you specify size, at most size characters of the line will be read. Where a line ends is determined by the newline parameter of the Python open file function. With the default, newline=None, the line endings '\n', '\r' and '\r\n' are all recognized and translated to '\n' when reading. For most uses, the default is okay.
Now, an example to illustrate. Imagine we had this code to read just the first line from the text file given.
with open('eba.txt', 'r') as fobj:
    s = fobj.readline()
    print(s)
The output we will get on the terminal is:
Nothing beats a plate of eba
This is the first line of the eba.txt file.
You will notice when you run the above that a blank line is printed after the text. This is because the string returned by readline still ends with the '\n' it read from the file, and print adds a newline of its own. You could remove that trailing newline by calling the strip function on the string object returned by the Python readline function (strip removes all leading and trailing whitespace; use rstrip('\n') if you only want to drop the newline). Compare the output for the code below using the strip function with that for the code above without it on your machine.
with open('eba.txt', 'r') as fobj:
    s = fobj.readline()
    print(s.strip())
Using the strip function on the string now strips away the trailing newline and gives a cleaner rendering.
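The size argument and the end-of-file behaviour of readline can be sketched as follows. The two-line temporary file is an assumption standing in for eba.txt.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')  # hypothetical file name
    with open(path, 'w') as fobj:
        fobj.write('Nothing beats a plate of eba\n'
                   'as we generally want to eat it\n')

    with open(path, 'r') as fobj:
        part = fobj.readline(7)    # at most 7 characters of the first line
        rest = fobj.readline()     # the remainder of that same line
        second = fobj.readline()   # the next full line
        at_end = fobj.readline()   # nothing left, so an empty string

print(repr(part))    # 'Nothing'
print(repr(rest))    # ' beats a plate of eba\n'
print(repr(second))  # 'as we generally want to eat it\n'
print(repr(at_end))  # ''
```

An empty string from readline is the usual signal that the end of the file has been reached; a blank line in the file comes back as '\n' instead.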
The Python Readlines Method
The Python readlines method is in the plural because it reads multiple lines. The syntax for readlines is:
readlines(hint=-1)
which states that the readlines function reads and returns a list of lines from the stream. The hint parameter lets you limit how much is read. It is not a line count but an approximate size: no more lines are read once the total length of the lines read so far exceeds hint. The default is to read all the lines and return them as a list. Please use this function carefully. If your file is very large, it could use up your system memory, because the whole list of lines has to be built in memory before it is returned.
Here is an example to show how the readlines method works.
with open('eba.txt', 'r') as fobj:
    s_list = fobj.readlines()
    print(s_list)
Which gives the following list as output:
['Nothing beats a plate of eba\n',
'as we generally want to eat it\n',
'but it often gets stuck in our throats\n',
'where it is eaten without a good soup\n',
'that is oily and makes the eba\n',
'which is a gelly and very hard \n',
'to move smoothly down our throats']
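The hint parameter of readlines is easiest to understand by trying a couple of values. This is only a sketch on a three-line temporary file (an assumption for illustration); the exact cut-off at boundary values may differ slightly between Python implementations, so the hints here are chosen well away from the line lengths.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')  # hypothetical file name
    with open(path, 'w') as fobj:
        fobj.write('Nothing beats a plate of eba\n'            # 29 characters
                   'as we generally want to eat it\n'          # 31 characters
                   'but it often gets stuck in our throats\n')  # 39 characters

    with open(path, 'r') as fobj:
        everything = fobj.readlines()     # no hint: all three lines
    with open(path, 'r') as fobj:
        first_only = fobj.readlines(10)   # the first line already exceeds 10
    with open(path, 'r') as fobj:
        first_two = fobj.readlines(40)    # 29 + 31 exceeds 40 after two lines

print(len(everything), len(first_only), len(first_two))
# 3 1 2
```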
It is recommended that you avoid using readlines on large files because there are other ways to read all the lines from your files without the same impact on memory. One of them is to use a Python for loop to iterate through the file object. This works because a Python file object is itself an iterable that yields one line at a time.
The lines above could be printed with the following for loop:
with open('eba.txt', 'r') as fobj:
    # Each pass of the loop reads just one line from the file.
    for line in fobj:
        print(line)
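Because the loop only ever holds one line at a time, it is a natural fit for running totals. As a minimal sketch, assuming a small temporary file in place of eba.txt, here is how you might count the lines and words of a file without building a list of its contents.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')  # hypothetical file name
    with open(path, 'w') as fobj:
        fobj.write('Nothing beats a plate of eba\n'
                   'as we generally want to eat it\n')

    line_count = 0
    word_count = 0
    with open(path, 'r') as fobj:
        for line in fobj:                    # one line in memory at a time
            line_count += 1
            word_count += len(line.split())  # split() ignores the trailing '\n'

print(line_count, word_count)
# 2 13
```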
So far we have only been reading from files. Now, we want to write to them, and for that we will use the Python write to file method.
The Python Write to File Method
The syntax for the Python write to file method is:
write(s)
which writes the string, s, to the file and returns the number of characters written.
The ability to write to the stream or file depends on whether it supports writing. To make this possible, we need to open the file in a writable mode when calling the open file function (explained above) to create the file object. The writable modes are:
Mode | Description |
r+ | Update the file, i.e. read and write to the file. When the write function is called, it writes the specified string, s, starting at the beginning of the file, overwriting the characters already there. |
w | It truncates the file first and then writes the string, s, to the file. You lose all your former file contents with this mode. |
a | Append the string, s, to the end of the file. The text goes directly after the last existing character, so if the file does not end with a newline, it continues the last line. If you want it to start on a new line, you need to add a newline character at the beginning of the string, s. |
Now, compare the following code snippets on your machine and see how they run (each one changes eba.txt, so work on a copy if you want to keep the original):
with open('eba.txt', 'a') as fobj:
    s = fobj.write('This line was written.')
with:
with open('eba.txt', 'w') as fobj:
    s = fobj.write('This line was written.')
and with:
with open('eba.txt', 'r+') as fobj:
    s = fobj.write('This line was written.')
You will notice that the way the contents of the file, eba.txt, were written differs based on the mode passed to the open function. The Python write to file method is one of the methods you will most often use when working with files.
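If you would rather not touch eba.txt while experimenting, the comparison can be sketched on a scratch file that is reset before each mode is tried. The file name copy.txt, the starting text and the results dictionary are assumptions for illustration.

```python
import os
import tempfile

ORIGINAL = 'Nothing beats a plate of eba\n'
results = {}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'copy.txt')  # hypothetical file name
    for mode in ('a', 'w', 'r+'):
        # Reset the file so every mode starts from the same contents.
        with open(path, 'w') as fobj:
            fobj.write(ORIGINAL)

        with open(path, mode) as fobj:
            written = fobj.write('EBA!')  # number of characters written

        with open(path, 'r') as fobj:
            results[mode] = (written, fobj.read())

for mode, (written, contents) in results.items():
    print(mode, written, repr(contents))
# a 4 'Nothing beats a plate of eba\nEBA!'
# w 4 'EBA!'
# r+ 4 'EBA!ing beats a plate of eba\n'
```

Notice that 'r+' does not insert text: the four new characters replace the first four characters of the original line.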
The Python Seek Function
With this method, you can change the current stream position so that when you call the Python read file or Python write to file methods, they don't carry out those operations from the start of the file, which is the default. The syntax for the Python seek method is:
seek(offset, whence=SEEK_SET)
where offset is the position you want the stream to go to, counted from the reference point whence. The default, SEEK_SET, counts from the start of the file. The seek method returns the new position of the stream.
For example, if you want to skip the first 35 characters of the eba.txt file and then output the next 55 characters, you could use seek to set the current stream position to 35 and then do a read with size 55. This works for a plain ASCII file like eba.txt, where character positions and byte positions coincide; in general, textual files only guarantee offsets of 0 or values previously returned by the tell method. Here is the code:
with open('eba.txt', 'r+') as fobj:
    num = fobj.seek(35)
    s = fobj.read(55)
    print(s)
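The output you would get from the eba.txt file is:
generally want to eat it
but it often gets stuck in ou
Showing just the stretch of text that was read; the line break between the two lines counts as a character too. The exact first and last characters can shift slightly depending on the line endings in your copy of the file.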
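A safe way to move around a textual file is to pair seek with the tell method, which reports the current position so you can come back to it later. The sketch below assumes a small temporary file written with newline='\n' so that it behaves the same on every platform; the variable names are illustrative only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')  # hypothetical file name
    # newline='\n' keeps the line endings identical on every platform.
    with open(path, 'w', newline='\n') as fobj:
        fobj.write('Nothing beats a plate of eba\n'
                   'as we generally want to eat it\n')
    size_on_disk = os.path.getsize(path)

    with open(path, 'r') as fobj:
        fobj.readline()
        bookmark = fobj.tell()         # remember where line 2 starts
        second_line = fobj.readline()

        fobj.seek(0)                   # rewind to the start of the file
        first_word = fobj.read(7)

        fobj.seek(bookmark)            # jump back to the remembered spot
        second_again = fobj.readline()

        end_pos = fobj.seek(0, os.SEEK_END)  # go to the end of the file

print(first_word)                   # Nothing
print(second_line == second_again)  # True
print(end_pos == size_on_disk)      # True
```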
The last method we will consider is truncate.
The Python Truncate File Method
With the Python truncate file method, you are able to change the size of the file. The syntax for the truncate file method is:
truncate(size=None)
where size is the new size of the file in bytes. Where size is not specified, the file is cut off at the current position of the stream, so everything after that position is discarded. If size is less than the original file size, the file is shortened; if it is greater, the file is extended, and on most systems the new area is filled with zero bytes. For the Python truncate file method to work, the file must support updating or writing, which means opening it in a writable mode as described above.
Like the write method, truncate changes the file on disk and returns a number, in this case the new size of the file.
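Here is a minimal sketch of both ways of calling truncate, with an explicit size and with the size taken from the stream position. The temporary file and its contents are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')  # hypothetical file name
    with open(path, 'w') as fobj:
        fobj.write('Nothing beats a plate of eba')

    # Explicit size: keep only the first 7 bytes.
    with open(path, 'r+') as fobj:
        new_size = fobj.truncate(7)
    with open(path, 'r') as fobj:
        after_size = fobj.read()

    # No size: cut the file off at the current stream position.
    with open(path, 'r+') as fobj:
        fobj.seek(3)
        fobj.truncate()
    with open(path, 'r') as fobj:
        after_position = fobj.read()

print(new_size, after_size, after_position)
# 7 Nothing Not
```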
So, I have given you ideas on what you can do with your files and file objects. The next post will be on how to handle directories in Python. Please watch out for it, and subscribe to this blog so you can get regular updates when I post new articles.
Happy pythoning.