5 Python Directory Handling Techniques

Directories and files are essential resources for most programs. Having discussed python’s file handling methods, it makes sense to look at python’s directory handling routines as well. In this post, I will describe 5 routine ways you can handle directories using the functions python provides in the os module, such as the python make directory method and the get working directory method.

To be able to run any of the commands in this post, you first need to import the os module into your interpreter. To do that you use the code: import os.

Getting the python current directory

The current directory is the directory from which the python interpreter is operating. It depends on how you launch your interpreter or your editor. To know your current working directory is easy. You just need to call the python get current working directory method, os.getcwd(). Here is an example:


import os

working_dir = os.getcwd()
print(working_dir)

The code above prints the current working directory on your terminal.

You might desire to change your current working directory. Maybe you want to do some experiment on some programs and want to run them on a directory you intend to delete later; I do that all the time. Changing the current working directory is easy with python. You use the python change working directory method, whose syntax is: os.chdir(path). It states that you have to provide a path as an argument for the directory you want to switch to. Path should be based on the path specification of your operating system. It is wise to make path a string in all cases.

An example will suffice.


import os

os.chdir('C:\\Users\\Michael\\Desktop\\')
print(os.getcwd())

Notice above that I escaped each backslash by doubling it. This is because the backslash is the escape character in python string literals. When I ran the above code, it changed my working directory to ‘C:\Users\Michael\Desktop\’. I am working on a Windows 10 computer; if you are using Unix, Linux or Mac, use a path in your own system’s format instead.
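If you only want to work in another directory for a while, it helps to remember where you started and switch back afterwards. Here is a minimal sketch of that idea. The names original_dir and scratch_dir are my own, and I use a temporary directory from the tempfile module so the example runs on any machine.

```python
import os
import tempfile

original_dir = os.getcwd()              # remember where we started

with tempfile.TemporaryDirectory() as scratch_dir:
    os.chdir(scratch_dir)               # switch into the scratch directory
    print(os.getcwd())                  # now reports the scratch directory
    os.chdir(original_dir)              # switch back before the scratch directory is deleted

print(os.getcwd() == original_dir)
```

Switching back before the with block ends matters, because some systems will not delete a directory that is still the current working directory.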

Creating New Directories with Python

There are occasions you want to create new directories, or what some call make new directories in python. Python can do this very easily when you use the right methods. There are two methods provided in python: the python make directory method, os.mkdir, and the python make directory recursive method, os.makedirs, which acts recursively by creating more than one directory as long as the directories do not already exist.

The syntax of the os.mkdir method is: os.mkdir(path, mode=0o777, *, dir_fd=None) where path is the name of the directory you want to create. You can leave the other keyword defaults as is because on some systems the mode parameter is just ignored and directory file descriptors, dir_fd, are not implemented.

Supposing we want to create a directory called, new_dir, we could try the following:


import os

try:
    os.mkdir('new_dir')
except FileExistsError:
    print('Directory already exists.')
else:
    print('Directory created successfully.')

I used a try statement to handle the case where the directory already exists, because os.mkdir raises a FileExistsError in that case. That gives peace of mind.

The os.makedirs method is also used to create directories but it does this recursively. That means it creates any missing intermediate directories along the path as well as the last one. The syntax of the method is: os.makedirs(name, mode=0o777, exist_ok=False), where name is the path of the directory you want to create. It has one argument that the python make directory method lacks and that is worth mentioning: the exist_ok keyword argument. By default, a FileExistsError is raised if the final directory in name already exists; set exist_ok to True if you want the call to succeed quietly in that case. Let’s use an example:


import os

try:
    os.makedirs('new_dir\\second_dir\\third_dir', exist_ok=True)
except FileExistsError:
    print('Directory already exists.')
else:
    print('Directory created successfully.')

If you run the above on your machine, it creates the directories second_dir and third_dir (remember new_dir has already been created) and prints: ‘Directory created successfully.’ The existing new_dir is not a problem because os.makedirs only creates the directories that are missing. Because I set the exist_ok argument to True, you can even run the code a second time, when third_dir already exists, without getting a FileExistsError. The exist_ok argument comes in handy.
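To see what exist_ok changes, you can call os.makedirs on the same path more than once. The sketch below is only an illustration: it builds the tree inside a temporary directory, and the variable names are my own.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    tree = os.path.join(base, 'new_dir', 'second_dir', 'third_dir')

    os.makedirs(tree)                    # first call creates all three levels
    created = os.path.isdir(tree)

    try:
        os.makedirs(tree)                # leaf exists and exist_ok defaults to False
    except FileExistsError:
        raised_without_flag = True
    else:
        raised_without_flag = False

    os.makedirs(tree, exist_ok=True)     # same call again, now accepted quietly

print(created, raised_without_flag)
```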

How to remove a directory in python

The methods under this category come in handy when you no longer need a directory. You can programmatically remove directories using python with the python remove directory method, os.rmdir, and the python remove directory recursive method, os.removedirs. The latter removes directories recursively. I didn’t tell you in the earlier post on file handling, but you can also remove files if you want to using the python remove file method, os.remove. I will describe all three here.

To remove a single directory, you use the python remove directory, os.rmdir, method. The syntax of the method is: os.rmdir(path, *, dir_fd=None). Path is the name of the directory you want to remove. This method cannot remove directory trees or directories that are not empty; if you try, it raises an OSError exception. If the directory does not exist, it raises a FileNotFoundError.

I tried to remove the new_dir created earlier, which has child directories, like this:


import os

try:
    os.rmdir('new_dir')
except OSError:
    print('Directory not empty.')
else:
    print('Directory successfully removed.')

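It printed out: ‘Directory not empty.’ That means I cannot remove a directory with child directories using this method. Not to worry, the second method, the python remove directory recursive method, os.removedirs, can do that.

The syntax of the os.removedirs method is: os.removedirs(name) where name is the path of the leaf directory you want to remove. It removes that leaf directory first and then works upwards, removing each parent directory named in the path until it meets one that cannot be removed, for example because it is not empty.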

In the example below, I wanted to remove all the directories and sub-directories we created when making directories.


import os

try:
    os.removedirs('new_dir\\second_dir\\third_dir')
except OSError:
    print('Directory not empty.')
else:
    print('Directory successfully removed.')

It ran successfully and printed: ‘Directory successfully removed.’ To ensure it doesn’t raise an OSError exception, you should make sure that the leaf directory, third_dir, is empty i.e it doesn’t contain any files.
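The upward pruning done by os.removedirs stops at the first parent that is not empty. Here is a small sketch of that behaviour, run inside a temporary directory; the file keep.txt is a made-up name that I add only to keep new_dir from being empty.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    leaf = os.path.join(base, 'new_dir', 'second_dir', 'third_dir')
    os.makedirs(leaf)

    # A file inside new_dir means new_dir is not empty.
    with open(os.path.join(base, 'new_dir', 'keep.txt'), 'w') as fobj:
        fobj.write('keep me')

    os.removedirs(leaf)                  # removes third_dir, then prunes upwards

    leaf_gone = not os.path.exists(leaf)
    second_gone = not os.path.exists(os.path.join(base, 'new_dir', 'second_dir'))
    new_dir_kept = os.path.isdir(os.path.join(base, 'new_dir'))

print(leaf_gone, second_gone, new_dir_kept)
```

Here third_dir and second_dir are removed, but new_dir survives because it still holds keep.txt.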

Now, let’s show the bonus method on how to remove a file.

The method for removing files is the python remove file, os.remove, method. The syntax is: os.remove(path, *, dir_fd=None). Path is the name of the file. On Windows, the method raises an error if the file is in use. Note that the file name, path, can be an absolute path or a path relative to the current working directory.

In this example here, I want to remove a file that was used when we discussed the file handling methods in an earlier post:


import os

os.remove('eba.txt')
if os.path.isfile('eba.txt'):
    print('File not removed.')
else:
    print('File removed.')

It ran successfully and printed: ‘File removed.’
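If the file might not be there, os.remove raises a FileNotFoundError, which you can catch just like the directory errors above. The sketch below tries to remove the same file twice; the file name eba_copy.txt and the temporary directory are my own choices so that the real eba.txt is left alone.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, 'eba_copy.txt')
    with open(target, 'w') as fobj:
        fobj.write('temporary contents')

    outcomes = []
    for attempt in range(2):             # second attempt finds nothing to remove
        try:
            os.remove(target)
        except FileNotFoundError:
            outcomes.append('File not found.')
        else:
            outcomes.append('File removed.')

print(outcomes)
```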

How to rename a directory

We can programmatically rename a file or directory in python. There are methods for both single file or directory, or multiple files or directories. The python rename method, os.rename, works for single file or directory, while the python rename recursive, os.renames, method works recursively.

The syntax for the os.rename method is: os.rename(src, dst, *, src_dir_fd=None, dst_dir_fd=None) where src means the source file or directory, and dst means the new name you intend to give the source. What happens when dst already exists depends on the operating system: on Windows a FileExistsError is always raised, while on Unix an existing file or an empty directory may be silently replaced and other cases raise an OSError or one of its subclasses. It is safest to make sure dst does not already exist.

Here is an example of usage:


import os

try:
    os.mkdir('new_dir')
    print('Directory created successfully.')
    print('Now attempting to rename it.')
    os.rename('new_dir', 'old_lady')
except FileExistsError:
    print('Directory already exists.')
except OSError:
    print('Couldn\'t rename the directory.')
else:
    print('new_dir changed successfully to old_lady.')

In the example above, I first created a new directory, new_dir, and when it went successfully without raising an error, I then attempted to rename it from new_dir to old_lady. If old_lady already exists, it will raise an OSError exception which I would handle by printing out: ‘Couldn’t rename the directory’ but if it doesn’t exist already, the renaming would run successfully, (which happened) and then print out: ‘new_dir changed successfully to old_lady.’

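Now we can do this recursively. What if we have created a directory tree with an empty leaf directory and want to rename every level of it? We would use the python rename recursive, os.renames, method.

The syntax of the os.renames method is: os.renames(old, new) where old refers to the old path and new refers to the new path. It works like os.rename, except that it first creates any intermediate directories needed for the new path and, after the rename, removes the directories of the old path that have been left empty.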

Let’s take an example from above again. This time, we want to rename all the directories and sub-directories.


import os

try:
    os.makedirs('new_dir\\second_dir\\third_dir', exist_ok=True)
    print('Directories created successfully.')
    print('Attempting renaming of the directories created.')
    os.renames('new_dir\\second_dir\\third_dir', 'my_first\\my_second\\my_third')
except FileExistsError:
    print('Directories already exists.')
except OSError:
    print('Couldn\'t rename the directories.')
else:
    print('Renamed all three directories recursively.')

From the above, you could see that I first created a directory tree, new_dir\second_dir\third_dir, and when it was created successfully, I attempted to rename all the directories in the same try statement. If you do not have the necessary permissions to rename the directory, then the operation will fail. But if the permissions are available and the directories exist as stated in the names for the old directory, then they will be renamed and the code will print: ‘Renamed all three directories recursively.’
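You can check what os.renames leaves behind by looking at both the old and the new paths afterwards. This is a minimal sketch that runs inside a temporary directory, with variable names of my own choosing.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    old = os.path.join(base, 'new_dir', 'second_dir', 'third_dir')
    new = os.path.join(base, 'my_first', 'my_second', 'my_third')
    os.makedirs(old)

    os.renames(old, new)                 # creates my_first/my_second, moves the leaf

    new_exists = os.path.isdir(new)
    old_root_pruned = not os.path.exists(os.path.join(base, 'new_dir'))

print(new_exists, old_root_pruned)
```

The old tree disappears entirely here because every directory in it was empty once the leaf had been moved.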

You can be creative and try out your own examples to see how it will run.

How to list all the files and nested directories of a directory

I am using windows, so Linux or Unix users pardon me if my example is Windows based. If on windows you want to list the contents of a directory, you use the command ‘dir’ on the command line and it gives you a listing. You can do the same with python. Python has two methods for doing so: a python list directory, os.listdir, method and an optimized python scan directory, os.scandir, method.

It is recommended that you use the python scan directory, scandir, method for most cases, but let me show a working example of the python list directory method. The syntax of the python list directory method is: os.listdir(path='.') where path is the name of the directory whose contents you want to list. The path parameter is optional and where omitted, it defaults to the current working directory. The method returns a list of all the files and directories that are contained in the directory named path.

Here is an example:


import os

dir_list = os.listdir()
for file in dir_list:
    if os.path.isfile(file):
        print(f'{file} is a file.')
    else:
        print(f'{file} is a directory.')

The above code first returns a list of all the files and directories in the current working directory as dir_list. Then I iterate through the list in a for loop and print out whether an item is a file or a directory. This gives you a listing that is similar to the windows ‘dir’ command.

Now, for the optimized python scan directory, scandir, method. The syntax of the optimized scan directory method is os.scandir(path='.') where path is the name of the directory. Scandir returns an iterator which yields objects that correspond to the files or nested directories in path. From each object yielded you can get the entry’s name and whether it is a file or a directory. (If you want a refresher on iterators or on python generators that yield objects, see the earlier posts on those topics.) Because these objects carry file type and attribute information, code that needs it can run faster, provided the operating system supplies this information when the directory is scanned.

Since the iterator produced by the python scan directory method holds a resource, you need to release it when you are done by calling its close method, close(), but you could do this better by using a with statement.

In the example below, we will list the contents of the current working directory again, but this time showing how to do it with the python scan directory method working as an iterator.


import os

with os.scandir() as my_dir:
    for item in my_dir:
        if item.is_dir(follow_symlinks=False):
            print(f'{item.name} is a directory.')
        else:
            print(f'{item.name} is a file.')

I used the with statement so that python will automatically close the iterator as soon as the operation ends. You will notice that each object, item, yielded has attributes of its own. In this example, the item object has the name attribute in item.name, and also the is_dir method in item.is_dir. This is because the objects are os.DirEntry objects. In the item.is_dir method, I switched the follow_symlinks parameter to False so that symbolic links are not followed; a symbolic link that points to a directory is then not reported as a directory itself. This way, only real directories are listed as directories.
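The os.DirEntry objects can tell you more than the name. As a rough sketch, the code below also asks each entry for its size through the stat method. The directory contents (sub_dir and note.txt) are made up for the example and live in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    os.mkdir(os.path.join(base, 'sub_dir'))
    with open(os.path.join(base, 'note.txt'), 'w') as fobj:
        fobj.write('hello')

    listing = {}
    with os.scandir(base) as entries:
        for entry in entries:
            if entry.is_dir():
                listing[entry.name] = 'directory'
            else:
                # stat() returns an os.stat_result; st_size is the size in bytes
                listing[entry.name] = f'file, {entry.stat().st_size} bytes'

print(listing)
```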

Now you have been equipped to use python’s directory handling functions. Go experiment with what you can do with them.

Happy pythoning.

7 Important File Handling Functions In Python

Computer files, or resources for recording discrete data, are ubiquitous in programming. File handling in python treats files as either textual or binary files, and python itself places no limit on the size of the files it can work with. In this post, we will be discussing textual files while in subsequent posts we will discuss binary files and how python handles them. Seven basic functions for handling textual files are discussed.

The Built-in Python Open File function

The built-in python open file function is the first function you will encounter when you want to open any sort of file in python. It is used to open a file and it returns a file object. The syntax for the python open file function is open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None) but for working on textual files, we will focus on the parameters file and mode.

The file parameter to the open file function represents the pathname of the file, either absolute or relative to the current working directory. The pathname depends on the file system of the operating system. If the file is not found on the call to open file, the function raises a FileNotFoundError exception.

The mode parameter specifies the mode in which the file is to be opened. The default mode is ‘r’ which means open for reading text. Other values are ‘w’ meaning open for truncating and writing to the file, and ‘a’ meaning open for appending to the end of the file. Other modes are ‘b’ meaning open binary file and ‘+’ meaning open for updating (reading and writing). If you want to write to the file without truncating it, then use the combination mode, ‘r+’, which starts writing at the beginning of the file and overwrites what is there. If you want to write starting from the end, then use ‘a’.

Now, let’s use some examples.

I will be working with the following text file, eba.txt, that is in my working directory. The contents of the file are:

Nothing beats a plate of eba
as we generally want to eat it
but it often gets stuck in our throats
where it is eaten without a good soup
that is oily and makes the eba
which is a gelly and very hard 
to move smoothly down our throats

You can also download the file here or copy and paste it if you want to use it so you can get the same results as I did.

Now, let’s just open the file without doing any reading or writing. Those functions will come later. After opening the file, we will then close it. It is good practice to always close your files.


try:
    fobj = open('eba.txt', 'r')
except FileNotFoundError:
    print('File doesn\'t exist.')
else:
    print('File opened successfully and file object created.')
    # close the file here, where we know it was actually opened
    fobj.close()

A more pythonic way to do the above, that is, open the file resource and then close it automatically would be writing the following line of code:


with open('eba.txt', 'r') as fobj:
    print('File opened successfully.')

If the file opened successfully, the print will run but if not, you will get a stack trace about a FileNotFoundError.

So you now know how to open textual files. What remains is to do something with them while they are open. The remaining functions deal with that.

The Python Read File Function

The syntax for the python read file function is read(size=-1) where size specifies how much to read from the file. For a file opened in text mode, as we are doing here, size is counted in characters; for a binary file it is counted in bytes. The default is -1, which means read all the contents of the file and return them as a single string. If size is specified, then at most size characters are returned. Be careful when reading a whole file at once because if the file is very large, it could use up your system memory.

Now for some examples. Remember we are using the eba.txt file whose contents I posted above.

Suppose we want to read only 20 characters from the file. We will use this code:


with open('eba.txt', 'r') as fobj:
    s = fobj.read(20)
    print(s)

Our output would be:

Nothing beats a plat

Just the first 20 characters in the file. Later, I will show you how to read from any position with random access to the file.
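The difference between characters and bytes only shows up when the text contains characters that take more than one byte. The sketch below is my own illustration: it writes three accented letters to a made-up file, accents.txt, using the UTF-8 encoding, and then reads two characters back.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, 'accents.txt')
    with open(path, 'w', encoding='utf-8') as fobj:
        fobj.write('\u00e9' * 3)         # three 'e acute' letters, two bytes each in UTF-8

    size_in_bytes = os.path.getsize(path)
    with open(path, 'r', encoding='utf-8') as fobj:
        chunk = fobj.read(2)             # asks for two characters, not two bytes

print(size_in_bytes, len(chunk))
```

The file is six bytes long on disk, yet read(2) gives back two whole characters.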

The Python Readline function

The syntax for the python readline function is readline(size=-1) and unlike the read function, it reads and returns one line from the stream. It starts with the first line. You can customize it by specifying size and then at most size characters will be read. Where a line ends is determined by the newline parameter of the python open file function. With the default, newline=None, lines ending in ‘\n’, ‘\r’ or ‘\r\n’ are all recognized and the line ending is handed back to you as ‘\n’. For most purposes, the default newline is okay.

Now, to use some examples to illustrate. Imagine we had this code to just read the first line from the text file given.


with open('eba.txt', 'r') as fobj:
    s = fobj.readline()
    print(s)

The output we will get on the terminal is:

Nothing beats a plate of eba

This is the first line of the eba.txt file.

You will notice when you run the above that an extra blank line is printed after the line. This is because the string returned by readline still ends with the ‘\n’ character and print adds a newline of its own. You could remove that extra newline by calling the strip function on the string object returned by the python readline function. Compare the output for the code below using the strip function and that for the code above without the strip function on your machine.


with open('eba.txt', 'r') as fobj:
    s = fobj.readline()
    print(s.strip())

Using the strip function on the string now strips away the added newline and gives a more beautiful rendering.

The Python Readlines Method

The python readlines method is in plural because it reads multiple lines. The syntax for readlines is readlines(hint=-1), and it reads and returns a list of lines from the stream. The hint parameter lets you limit how much is read: no more lines are read once the total size of the lines read so far exceeds hint. The default is to read all the lines and return them as a list. Please, use this function carefully. If your file is very large, it could have a detrimental effect on your system memory. This is because to return the lines, it first needs to create a list of all the lines and this takes memory space.

An example to show how the readlines method works.


with open('eba.txt', 'r') as fobj:
    s_list = fobj.readlines()
    print(s_list)

Which gives the following list as output:


['Nothing beats a plate of eba\n', 
'as we generally want to eat it\n', 
'but it often gets stuck in our throats\n', 
'where it is eaten without a good soup\n', 
'that is oily and makes the eba\n', 
'which is a gelly and very hard \n', 
'to move smoothly down our throats']
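The hint parameter is easier to understand with a tiny file. In this sketch, which uses a made-up file called three_lines.txt in a temporary directory, each line is three characters long including its newline, and I pass a hint of 4.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, 'three_lines.txt')
    with open(path, 'w') as fobj:
        fobj.write('aa\nbb\ncc\n')       # three lines of three characters each

    with open(path, 'r') as fobj:
        some_lines = fobj.readlines(4)   # stops once the total read passes 4
    with open(path, 'r') as fobj:
        all_lines = fobj.readlines()     # no hint: every line

print(some_lines)
print(all_lines)
```

Whole lines are always returned, so the total can overshoot the hint: here two lines (six characters) come back for a hint of 4.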

It is recommended that you avoid using readlines because there are other ways to go about reading all the lines from your files without impacting on memory. One of them is to use a python for loop to iterate through the file object. This is because a python file object is already an iterable.

The above could be achieved with the following for loop code:


with open('eba.txt', 'r') as fobj:
    for line in fobj:
        # each line already ends with a newline, so tell print not to add another
        print(line, end='')

We have been reading and reading from files. Now, we want to write to files. We will now use the python write to file method.

The Python Write to File Method

The syntax for the python write to file method is write(s) which specifies writing the string, s, to the file and returning the number of characters written.

The ability to write to the stream or file depends on whether it supports writing. To make this possible, we need to specify this support when opening the file and creating a file object. This is made possible by specifying the writable mode on the open file function (the open file function was explained above). The writable modes are:

r+	Update the file i.e read and write to the file. When the write function is called on a freshly opened file, it writes the specified string, s, at the beginning of the file, overwriting the characters that were there.
w	It truncates the file first and then writes the string s to the file. You lose all your former file contents with this mode.
a	Append the string, s, to the end of the file. It writes onto the last line. If you want it to write to a new line at the end, you need to add a newline character at the beginning of the string, s.

Now, compare the following codes on your machine and see how they run:


with open('eba.txt', 'a') as fobj:
    s = fobj.write('This line was written.')

with:


with open('eba.txt', 'w') as fobj:
    s = fobj.write('This line was written.')

and with:


with open('eba.txt', 'r+') as fobj:
    s = fobj.write('This line was written.')

You will notice that the way the contents of the file, eba.txt, were written differs based on the specified mode of the open function. The python write to file method is one of the methods you will most often use when working with files.
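If you would rather not overwrite your eba.txt while experimenting, here is a sketch that runs the same comparison on a throwaway file. The helper write_with_mode, the file name sample.txt and the text written are all my own inventions for the illustration.

```python
import os
import tempfile

def write_with_mode(mode):
    """Create a two-line file, write 'XXXX' to it in the given mode, return the result."""
    with tempfile.TemporaryDirectory() as base:
        path = os.path.join(base, 'sample.txt')
        with open(path, 'w') as fobj:
            fobj.write('first line\nsecond line')
        with open(path, mode) as fobj:
            fobj.write('XXXX')
        with open(path, 'r') as fobj:
            return fobj.read()

results = {mode: write_with_mode(mode) for mode in ('a', 'w', 'r+')}
for mode, text in results.items():
    print(mode, repr(text))
```

With ‘a’ the new text lands at the end, with ‘w’ it is all that is left, and with ‘r+’ it replaces the first four characters.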

The Python seek function

With this method, you can change the current stream position so that when you call the python read file or python write to file methods, they don’t carry out those operations from the start of the file, which is the default for a file opened in ‘r’ or ‘r+’ mode. The syntax for the python seek method is seek(offset, whence=SEEK_SET) where offset is the position you want the stream to go to and whence says what offset is measured from; the default, SEEK_SET, means the start of the file. The seek method returns the new position of the stream.

For example, if you want to skip the first 35 bytes or characters of the eba.txt file and then output the next 55, you could change the current stream position using seek to be 35 and then do a read with size 55. Here is the code:


with open('eba.txt', 'r+') as fobj:
    num = fobj.seek(35)
    s = fobj.read(55)
    print(s)

The output you would get from the eba.txt file is:

generally want to eat it
but it often gets stuck in ou

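Showing just those 55 characters. The exact starting point can differ by a character depending on how the line endings are stored in your copy of the file, because seek counts positions in the file as it is stored on disk.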
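For text files, a dependable way to get a position for seek is to ask the file object for one with the tell method and go back to it later. Below is a minimal sketch of that bookmark idea, using a made-up file, lines.txt, in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, 'lines.txt')
    with open(path, 'w') as fobj:
        fobj.write('line one\nline two\nline three\n')

    with open(path, 'r') as fobj:
        fobj.readline()                  # skip the first line
        bookmark = fobj.tell()           # position at the start of line two
        second = fobj.readline()
        fobj.seek(bookmark)              # jump back to the bookmark
        again = fobj.readline()
        fobj.seek(0)                     # offset 0 is always the start of the file
        first = fobj.readline()

print(repr(second), repr(again), repr(first))
```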

The last method we will consider is truncate.

The Python truncate file method

With the python truncate file method, you are able to change the size of the file. The syntax for the truncate file method is truncate(size=None) where size is the new size of the file in bytes. Where size is not specified, the file is cut off at the current position of the stream, so everything after that position is discarded. If size is less than the original file size, the file is truncated, but if it is higher than the original, the file is extended and, on most systems, the new area is filled with zeros. For the python truncate file method to be operational, the file must support updating or writing, which you arrange by opening the file in a writable mode as described above.

Like the write method, the truncate method returns a number: in this case, the new size of the file.
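Here is a short sketch of truncate at work. The file digits.txt and its contents are made up for the example, and the file is opened in ‘r+’ mode so that it can be updated without being emptied first.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, 'digits.txt')
    with open(path, 'w') as fobj:
        fobj.write('0123456789')

    with open(path, 'r+') as fobj:
        new_size = fobj.truncate(4)      # keep only the first four bytes

    with open(path, 'r') as fobj:
        remaining = fobj.read()

print(new_size, remaining)
```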

So, I have given you ideas on what you can do with your files and file objects. The next post will be on how to handle python directories. Please, watch out for it. And subscribe to this blog so you can get regular updates when I post new articles.

Happy pythoning.

Utilizing Python reduce and accumulate functions as accumulators

Accumulators have a notable reputation in computing history. The earliest machines by Gottfried Leibniz and Blaise Pascal were based on the concept of accumulators. If you are familiar with your python functions, you would know that the python sum function acts as an accumulator when it comes to addition. But I would like to explain two functions in this post that you can use as accumulators for any operation. These functions are the python reduce function from the functools module and the python accumulate function from the itertools module.

The basic function of these two functions is that they take a function and an iterable as arguments, and sometimes an initializer, and then successively carry out the operations of the function on two items in the iterable at a time, storing the result in a variable, and then doing the operation on the next item, storing the result and so on and so forth until you get to the final item and then output the final result. They have different ways of working though, which I will explain.

First, I will start with the python reduce function.

The python reduce function.

The syntax of the python reduce function is functools.reduce(function, iterable[, initializer]) and what it does is to apply the function to the items of the iterable from left to right, and it eventually reduces the iterable to a single value. The function returns the accumulated value of the result returned by the operation of the function that serves as its argument. The function must take only two arguments. The initializer parameter is optional. I will explain it below.

Let’s take the simplest accumulator, the sum function using a lambda function, and see how we can use it to illustrate how the reduce function works.
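As a minimal sketch of such a call, assuming the list [1, 2, 3] that the walkthrough that follows uses, it could look like this. The variable name total is my own.

```python
from functools import reduce

# Add up the items of the list two at a time, from left to right.
total = reduce(lambda x, y: x + y, [1, 2, 3])
print(total)
```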

What the reduce function does is that it is using the lambda function to sum up the items of the iterable, this time, a list. First, it takes 1, the first item and binds it to x, then 2 and binds it to y, then it adds x + y, i.e 1 + 2 and binds the result, 3, to x again. Next it takes 3, and binds it to y, and adds x + y which this time is 3 + 3 which equals 6 and then it gives you the total result. So, you can now visualize how the successive addition is carried out.

You can click the following links if you want a refresher on python lambda functions or on python iterables.

As you can see from the syntax above, sometimes you can supply an initializer to the reduce function. The initializer is placed before the items of the iterable in the calculation, so it is the first value bound to x. And if the iterable is empty, the initializer will serve as the default.

Let’s use an example with an initializer and see how it runs. This time we want our initializer to be 4.
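A minimal sketch of this, assuming the same list [1, 2, 3] as before and passing 4 as the third argument, could be written as follows. The variable names are my own, and the second call is there only to show the empty-iterable case mentioned earlier.

```python
from functools import reduce

# The initializer, 4, is used as the starting value of x.
total = reduce(lambda x, y: x + y, [1, 2, 3], 4)
print(total)

# With an empty iterable, the initializer itself is returned.
default_only = reduce(lambda x, y: x + y, [], 4)
print(default_only)
```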

You can see from the example above that the result of the summation of the list becomes 10. This is because we used an initializer of 4. What is happening here is that when reduce runs, it first binds the initializer to x, therefore x becomes 4. Then it binds y to 1 and sums them to give 5 and binds this result to x. It then binds y to 2 in the list, the next item, sums x and y to give 7 and binds this result to x. It then binds 3, the next item in the list, to y and then sums x and y to give 10, the final result. It then returns 10.

Simple, isn’t it? Very easy and fascinating. But don’t be in a hurry. It gets more fascinating when you realize that you can carry out operations on just anything you want. I used a sum to get you acquainted with this. Any program that needs to accumulate successive results can use the reduce function.

Let’s take an interesting amortization example. If I owe $1000 and I pay off $100 annually at an interest rate of 5%, how much would I be owing at the end of four years? Reduce can help you get the result quickly. Let’s see how.
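Here is one way such a calculation could be sketched. I am assuming that the 5% interest is charged on the balance at the end of each year and the $100 payment is then taken off; a different payment schedule would need a different lambda. The names payments and balance are my own.

```python
from functools import reduce

payments = [100, 100, 100, 100]          # one payment per year for four years

# Assumption: each year the balance grows by 5%, then the payment is subtracted.
balance = reduce(lambda owed, payment: owed * 1.05 - payment, payments, 1000)

print(f'Amount owing after four years: {balance:.2f}')
```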

From the code above, you can see that I used an initializer of 1000 and the iterable was a list with the regular payments as the items.

Now, the question comes: since accumulators store successive results before giving the total result, can we get those results before the total? Yes, we can. That is where the python accumulate function from the itertools module comes in.

The python accumulate function.

In fact, you can say that the python reduce and accumulate functions are cousins except for one difference: python accumulate gives you the ability to get the result of successive operations instead of having to wait for the final result. It acts like a generator in this instance.

The syntax of the python accumulate function is: itertools.accumulate(iterable[, func, *, initial=None]). As you can see from the syntax, the python accumulate function uses an iterable to create an iterator and applies the function to each of the elements in the iterator. That is what gives it the behavior of a generator. To get refreshers on these two concepts, you can check out this post on iterators, and also this post on generators. Just like for the python reduce function, the function used in the python accumulate method should be a function that accepts only two items and operates on these two items. Python accumulate method also takes an initializer, the initial argument, which is optional.

So, using the accumulate function, let’s do our amortization again but this time returning the results of successive accumulation instead of waiting for the final or total accumulation.
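A sketch of how this could look is given below. It rests on the same assumption as before, that 5% interest is added each year before the $100 payment is subtracted, and it puts the opening balance of 1000 as the first item of the cashflow list. The names cashflow and amount_owing_list are used for illustration.

```python
from itertools import accumulate

cashflow = [1000, 100, 100, 100, 100]    # opening balance, then four yearly payments

# Assumption: each year the balance grows by 5%, then the payment is subtracted.
amount_owing_list = list(
    accumulate(cashflow, lambda owed, payment: owed * 1.05 - payment)
)

# The first value is just the opening balance, so start printing from index 1.
for i in range(1, len(amount_owing_list)):
    print(f'Balance at end of year {i}: {amount_owing_list[i]:.2f}')
```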

If you read the code above, you will notice that I cast the iterator returned by the python accumulate function to a list so that I can print out each of the results. Also, one feature of the accumulate function is that the first value it yields is just the first item in the cashflow list, so during the iteration of the amount owing list, I ignored this first item. Apart from those two points, the results are similar to what the python reduce function gave us, with one difference: this time, we can see the balance due at the end of each year rather than wait until the end of the fourth year.

If you notice when the yearly balance printed, each of the amounts was to two decimal places. I did that with a nice python string formatting syntax, {amount_owing_list[i]:.2f}. To learn how, you can read an earlier post on python string formatting Part 1 and python string formatting Part 2 and you would be sure to be able to do it yourself.

So, that’s it. You can see that python as a language has powerful capabilities. Go experiment with it. Have fun with python.

See you at the next post. If you want to receive new post updates, just subscribe with your email. Happy pythoning.
