
5 Python Directory Handling Techniques

Directories and files are crucial to any programmer whose programs need resources on disk. That is why, after discussing python’s file handling methods, it makes sense to look at python’s directory handling methods or routines as well. In this post, I will describe 5 routine ways to handle directories using the methods python provides, such as the python make directory method and the python get working directory method.


To be able to run any of the commands in this post, you first need to import the os module into your interpreter. To do that you use the code: import os.

Getting the python current directory

The current directory is the directory from which the python interpreter is operating. It depends on how you launch your interpreter or your editor. Finding out your current working directory is easy. You just need to call the python get current working directory method, os.getcwd(). Here is an example:


import os

working_dir = os.getcwd()
print(working_dir)

The code above prints the current working directory on your terminal.

You might want to change your current working directory. Maybe you want to experiment with some programs and run them in a directory you intend to delete later; I do that all the time. Changing the current working directory is easy with python. You use the python change working directory method, whose syntax is: os.chdir(path). You have to provide, as the argument, the path of the directory you want to switch to. The path should follow the path conventions of your operating system. It is wise to pass the path as a string in all cases.

An example will suffice.


import os

os.chdir('C:\\Users\\Michael\\Desktop\\')
print(os.getcwd())

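Notice above that I escaped each backslash by doubling it. This is because the backslash is a special character in python strings; a raw string such as r'C:\Users\Michael\Desktop' is another way to write such a path. When I ran the above code, it changed my working directory to ‘C:\Users\Michael\Desktop\’. Note that I am working on a Windows 10 computer; if you are using Unix, Linux or Mac, use forward slashes and a path that exists on your system.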
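If you would rather not hard-code a Windows path, you can try the same idea with a throwaway directory. What follows is only a minimal sketch: it assumes the standard tempfile module, and the variable names original_dir, scratch and switched are mine.

```python
import os
import tempfile

original_dir = os.getcwd()               # remember where we started

# TemporaryDirectory creates a scratch directory and deletes it afterwards.
with tempfile.TemporaryDirectory() as scratch:
    os.chdir(scratch)
    # realpath resolves symbolic links so the comparison is reliable.
    switched = os.path.realpath(os.getcwd()) == os.path.realpath(scratch)
    print('Now working in:', os.getcwd())
    os.chdir(original_dir)               # leave before the directory is removed

print('Back in:', os.getcwd())
```

Switching back before the with block ends matters, because a directory that is still your working directory cannot always be deleted.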

Creating New Directories with Python

There are occasions when you want to create new directories, or what some call making new directories, in python. Python can do this very easily when you use the right methods. There are two methods provided in python: the python make directory method, os.mkdir, and the python make directory recursive method, os.makedirs, which acts recursively by creating any missing parent directories along the path as well.

The syntax of the os.mkdir method is: os.mkdir(path, mode=0o777, *, dir_fd=None) where path is the name of the directory you want to create. You can leave the other keyword defaults as they are because on some systems the mode parameter is simply ignored and directory file descriptors, dir_fd, are not implemented.

Suppose we want to create a directory called new_dir. We could try the following:


import os

try:
    os.mkdir('new_dir')
except FileExistsError:
    print('Directory already exists.')
else:
    print('Directory created successfully.')        

I wrapped the call in a try statement to handle the case where the directory already exists. This is because a FileExistsError is raised if the directory already exists. That gives peace of mind.

The os.makedirs method is also used to create directories but it does this recursively. That means you can use it to create a whole chain of nested directories in one call, including any intermediate directories that are missing. The syntax of the method is: os.makedirs(name, mode=0o777, exist_ok=False). Name is the path of the leaf directory you want to create. It has one argument that the python make directory method lacks and that is worth mentioning: the exist_ok keyword argument. By default, a FileExistsError is raised if the target directory already exists; set exist_ok to True if you want the call to succeed quietly in that case. Let’s use an example:


import os

try:
    os.makedirs('new_dir\\second_dir\\third_dir', exist_ok=True)
except FileExistsError:
    print('Directory already exists.')
else:
    print('Directory created successfully.')        

If you run the above on your machine, it creates the directories second_dir and third_dir (remember new_dir has already been created) and prints: ‘Directory created successfully.’ Existing parent directories such as new_dir are never a problem for os.makedirs. What exist_ok=True adds is that running the same code a second time, when third_dir itself already exists, still succeeds instead of raising a FileExistsError. The exist_ok argument comes in handy.
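To see the effect of exist_ok for yourself, here is a small sketch that works inside a temporary directory so nothing is left behind. The use of tempfile and os.path.join (which builds the path with the right separator for your system) and the names base, target and raised are my own choices for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, 'new_dir', 'second_dir', 'third_dir')

    os.makedirs(target)                  # creates all three levels
    os.makedirs(target, exist_ok=True)   # leaf already exists: no error

    try:
        os.makedirs(target)              # exist_ok defaults to False
    except FileExistsError:
        raised = True
    else:
        raised = False

    print('FileExistsError without exist_ok:', raised)
```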

How to remove a directory in python

The methods under this category come in handy when you no longer need a directory. You can programmatically remove directories using python with the python remove directory method, os.rmdir, and the python remove directory recursive method, os.removedirs. The latter removes directories recursively. I didn’t tell you in the earlier post on file handling, but you can also remove files if you want to using the python remove file method, os.remove. I will describe all three here.

To remove a single directory, you use the python remove directory method, os.rmdir. The syntax of the method is: os.rmdir(path, *, dir_fd=None). Path is the name of the directory you want to remove. With this method, you cannot remove directory trees or directories that are not empty; if you try, it raises an OSError exception. If the directory does not exist, it raises a FileNotFoundError, which is a subclass of OSError.

I tried to remove the new_dir created earlier, which still has child directories, like this:


import os

try:
    os.rmdir('new_dir')
except OSError:
    print('Directory not empty.')
else:
    print('Directory successfully removed.')        

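It printed out: ‘Directory not empty.’ That means I cannot remove a directory with child directories using this method. (Because FileNotFoundError is a subclass of OSError, the same message would also appear if new_dir did not exist at all.) Not to worry, the second method, the python remove directory recursive method, os.removedirs, can handle the whole chain.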
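If you want the message to tell a missing directory apart from a non-empty one, you can catch FileNotFoundError before the more general OSError. This is a sketch only; the helper name try_rmdir and the temporary directory layout are hypothetical and not part of the os module.

```python
import os
import tempfile

def try_rmdir(path):
    """Try to remove one empty directory and report what happened."""
    try:
        os.rmdir(path)
    except FileNotFoundError:            # must come before OSError
        return 'Directory does not exist.'
    except OSError:
        return 'Directory not empty (or cannot be removed).'
    return 'Directory successfully removed.'

with tempfile.TemporaryDirectory() as base:
    parent = os.path.join(base, 'new_dir')
    child = os.path.join(parent, 'second_dir')
    os.makedirs(child)

    results = [
        try_rmdir(parent),                         # still contains second_dir
        try_rmdir(os.path.join(base, 'missing')),  # never created
        try_rmdir(child),                          # empty leaf directory
    ]

for message in results:
    print(message)
```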

The syntax of the os.removedirs method is: os.removedirs(name) where name is the name of the directory you want to remove.

In the example below, I wanted to remove all the directories and sub-directories we created when making directories.


import os

try:
    os.removedirs('new_dir\\second_dir\\third_dir')
except OSError:
    print('Directory not empty.')
else:
    print('Directory successfully removed.')        

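It ran successfully and printed: ‘Directory successfully removed.’ The method removes the leaf directory, third_dir, first and then works upwards, removing second_dir and new_dir as long as each one is left empty. To ensure it doesn’t raise an OSError exception, you should make sure that the leaf directory, third_dir, is empty, i.e. it doesn’t contain any files.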
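The upward pruning stops at the first parent that is not empty. The sketch below shows this inside a temporary directory; the file keep.txt and the names base, leaf and remaining are only there for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    leaf = os.path.join(base, 'new_dir', 'second_dir', 'third_dir')
    os.makedirs(leaf)

    # A file next to new_dir keeps the base directory non-empty,
    # so the pruning stops there.
    with open(os.path.join(base, 'keep.txt'), 'w') as fobj:
        fobj.write('placeholder')

    os.removedirs(leaf)                  # removes third_dir, second_dir, new_dir
    remaining = os.listdir(base)
    print(remaining)
```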

Now, let’s show the bonus method on how to remove a file.

The method for removing files is the python remove file method, os.remove. The syntax is: os.remove(path, *, dir_fd=None). Path is the name of the file. On Windows, the method raises an error if the file is still in use. Note that path can be absolute or relative; a relative path is resolved against the current working directory.

In this example here, I want to remove a file that was used when we discussed the file handling methods in an earlier post:


import os

os.remove('eba.txt')
if os.path.isfile('eba.txt'):
    print('File not removed.')
else:
    print('File removed.')    

It ran successfully and printed: ‘File removed.’

How to rename a directory

We can programmatically rename a file or directory in python. There are two methods for this. The python rename method, os.rename, renames a single file or directory, while the python rename recursive method, os.renames, also creates any intermediate directories needed for the new path and removes the old ones that are left empty.

The syntax for the os.rename method is: os.rename(src, dst, *, src_dir_fd=None, dst_dir_fd=None) where src means the source file or directory, and dst means the new name you intend to give the source. The dst or new name should not already exist; otherwise the operation may raise an OSError exception, or one of its subclasses, depending on the operating system used.

Here is an example of usage:


import os

try:
    os.mkdir('new_dir')
    print('Directory created successfully.')
    print('Now attempting to rename it.')
    os.rename('new_dir', 'old_lady')
except FileExistsError:
    print('Directory already exists.')
except OSError:
    print('Couldn\'t rename the directory.')
else:
    print('new_dir changed successfully to old_lady.')    

In the example above, I first created a new directory, new_dir, and when that went through without raising an error, I attempted to rename it from new_dir to old_lady. If the rename fails, for example because old_lady already exists and cannot be replaced, it raises an OSError exception, which I handle by printing out: ‘Couldn’t rename the directory.’ If it succeeds (which is what happened), the code prints out: ‘new_dir changed successfully to old_lady.’

Now we can do this recursively. What if we have created a directory tree with an empty leaf directory and want to move it to a new tree of names? We would have to use the python rename recursive method, os.renames.

The syntax of the os.renames method is: os.renames(old, new) where old refers to the old name of the directory or directories and new refers to their new names.

Let’s take an example from above again. This time, we want to rename all the directories and sub-directories.


import os

try:
    os.makedirs('new_dir\\second_dir\\third_dir', exist_ok=True)
    print('Directories created successfully.')
    print('Attempting renaming of the directories created.')
    os.renames('new_dir\\second_dir\\third_dir', 'my_first\\my_second\\my_third')
except FileExistsError:
    print('Directories already exists.')
except OSError:
    print('Couldn\'t rename the directories.')
else:
    print('Renamed all three directories recursively.')                

From the above, you can see that I first created a directory tree, new_dir\second_dir\third_dir, and when it was created successfully, I attempted to rename all the directories in the same try statement. The os.renames method creates my_first and my_second, moves third_dir to my_third, and then removes the old new_dir and second_dir because they are left empty. If you do not have the necessary permissions to rename the directory, then the operation will fail. But if the permissions are available and the directories exist as stated in the old path, then they will be renamed and the code will print: ‘Renamed all three directories recursively.’
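To check what os.renames leaves behind, you can inspect the directory afterwards. The following sketch does that in a temporary directory; tempfile, os.path.join and the variable names are my own additions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    old = os.path.join(base, 'new_dir', 'second_dir', 'third_dir')
    new = os.path.join(base, 'my_first', 'my_second', 'my_third')
    os.makedirs(old)

    os.renames(old, new)

    new_exists = os.path.isdir(new)                              # new tree is there
    old_root_exists = os.path.exists(os.path.join(base, 'new_dir'))  # old tree pruned
    top_level = sorted(os.listdir(base))
    print(top_level)
```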

You can be creative and try out your own examples to see how it will run.

How to list all the files and nested directories of a directory

I am using Windows, so Linux or Unix users, pardon me if my example is Windows based. If on Windows you want to list the contents of a directory, you use the command ‘dir’ on the command line and it gives you a listing. You can do the same with python. Python has two methods for doing so: the python list directory method, os.listdir, and the optimized python scan directory method, os.scandir.

It is recommended that you use the python scan directory method, os.scandir, for most cases, but let me first show a working example of the python list directory method. The syntax of the python list directory method is: os.listdir(path='.') where path is the name of the directory whose contents you want to list. The path parameter is optional and, where omitted, it defaults to the current working directory. The method returns a list of the names of all the files and directories that are contained in the directory named path.

Here is an example:


import os

dir_list = os.listdir()
for file in dir_list:
    if os.path.isfile(file):
        print(f'{file} is a file.')
    else:
        print(f'{file} is a directory.')    

The above code first gets a list of the names of all the files and directories in the current working directory as dir_list. Then I iterate through the list in a for loop and print out whether each item is a file or a directory. This gives you a listing that is similar to the output of the Windows ‘dir’ command.
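One thing to watch out for: os.listdir returns bare names, so when you list a directory other than the current one you have to join each name to the directory before testing it with os.path.isfile. Here is a sketch of that; the helper describe and the sample entries docs and eba.txt are hypothetical.

```python
import os
import tempfile

def describe(directory):
    """Return (name, kind) pairs for every entry in directory."""
    listing = []
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)   # listdir gives bare names
        kind = 'file' if os.path.isfile(full_path) else 'directory'
        listing.append((name, kind))
    return listing

with tempfile.TemporaryDirectory() as base:
    os.mkdir(os.path.join(base, 'docs'))
    with open(os.path.join(base, 'eba.txt'), 'w') as fobj:
        fobj.write('Nothing beats a plate of eba\n')
    listing = describe(base)

for name, kind in listing:
    print(f'{name} is a {kind}.')
```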

Now, for the optimized python scan directory method, os.scandir. Its syntax is os.scandir(path='.') where path is the name of the directory. Scandir returns an iterator which yields objects that correspond to the files or nested directories in path. From each yielded object you can get the entry’s name and find out whether it is a file or a directory. (If you want a refresher on iterators or on python generators, see the earlier posts on those topics.) Because these objects can carry file type and attribute information, code that needs it runs faster, provided the operating system supplies this information while scanning the directory.

Since the iterator produced by the python scan directory method holds a system resource, you should release it when you are done, either by calling its close method, scandir.close(), or, better, by using a with statement.

In the example below, we will list the contents of the current working directory again, but this time showing how to do it with the python scan directory method working as an iterator.


import os

with os.scandir() as my_dir:
    for item in my_dir:
        if item.is_dir(follow_symlinks=False):
            print(f'{item.name} is a directory.')
        else:
            print(f'{item.name} is a file.')      

I used the with statement so that python will automatically close the iterator as soon as the operation ends. You will notice that each object yielded, item, has attributes and methods of its own. In this example, the item object has the name attribute, item.name, and the is_dir method, item.is_dir. This is because the objects are os.DirEntry objects. In the call to item.is_dir, I set the follow_symlinks parameter to False so that symbolic links are not followed; a symbolic link that points to a directory is then reported as a file rather than as a directory. This way only real directories are listed as directories.
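The os.DirEntry objects offer more than name and is_dir. As a sketch, here is how you might collect file sizes with the is_file and stat methods; the sample entries eba.txt and soups and the dictionary sizes are made up for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    with open(os.path.join(base, 'eba.txt'), 'w') as fobj:
        fobj.write('eba')
    os.mkdir(os.path.join(base, 'soups'))

    sizes = {}
    with os.scandir(base) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                sizes[entry.name] = entry.stat().st_size   # size in bytes
            else:
                print(f'{entry.name} is a directory.')

print(sizes)
```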

Now you have been equipped to use python’s directory handling functions. Go experiment with what you can do with them.

Happy pythoning.

7 Important File Handling Functions In Python

Computer files, or resources for recording discrete data, are ubiquitous in python programs. File handling in python treats files as either textual or binary files, and python itself places no limit on the size of the files it can work with. In this post, we will be discussing textual files, while in subsequent posts we will discuss binary files and how python handles them. Seven basic functions for handling textual files are discussed.


The Built-in Python Open File function

The built-in python open file function is the first function you will encounter when you want to open any sort of file in python. It is used to open a file and it returns a file object. The syntax for the python open file function is open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None) but for working on textual files, we will focus on the parameters file and mode.

The file parameter to the open file function represents the pathname of the file, either absolute or relative to the current working directory. The pathname depends on the file system of the operating system. If the file is not found when it is opened for reading, the function raises a FileNotFoundError exception.

The mode parameter specifies the mode in which the file is to be opened. The default mode is ‘r’ which means open for reading text. Other values are ‘w’ meaning open for writing, truncating the file first, and ‘a’ meaning open for appending to the end of the file. Other modes are ‘b’ meaning open as a binary file and ‘+’ meaning open for updating (reading and writing). If you want to write to the file without truncating it, then use the combination mode, ‘r+’, which starts writing at the beginning of the file, overwriting what is there. If you want to write starting from the end, then use ‘a’.

Now, let’s use some examples.

I will be working with the following text file, eba.txt, that is in my working directory. The contents of the file are:

Nothing beats a plate of eba
as we generally want to eat it
but it often gets stuck in our throats
where it is eaten without a good soup
that is oily and makes the eba
which is a gelly and very hard 
to move smoothly down our throats

You can also download the file or copy and paste its contents if you want to use it so you can get the same results as I did.

Now, let’s just open the file without doing any reading or writing. Those functions will come later. After opening the file, we will then close it. It is good practice to always close your files.


try:
    fobj = open('eba.txt', 'r')
except FileNotFoundError:
    print('File doesn\'t exist.')
else:
    print('File opened successfully and file object created.')
    # Close here: fobj only exists if open() succeeded.
    fobj.close()

A more pythonic way to do the above, that is, open the file resource and then close it automatically would be writing the following line of code:


with open('eba.txt', 'r') as fobj:
    print('File opened successfully.')

If the file opened successfully, the print will run but if not, you will get a stack trace about a FileNotFoundError.

So you now know how to open textual files. What remains is to do something with them while they are open. The remaining functions deal with that.

The Python Read File Function

The syntax for the python read file function is read(size=-1) where size specifies how much to read from the file: the number of characters for a textual file, or the number of bytes for a binary file. The default is -1, which means read all the contents of the file and return them as a single string of characters (we are dealing with textual files here). If size is specified, then at most size characters are returned. Be careful when reading a whole file this way because if the file is very large, it could use up your system memory.

So, for some examples. Remember that we are using the eba.txt file whose contents I posted above.

Suppose we want to read only the first 20 characters from the file. We will use this code:


with open('eba.txt', 'r') as fobj:
    s = fobj.read(20)
    print(s)

Our output would be:

Nothing beats a plat

Just the first 20 characters in the file. Later, I will show you how to read from any position with random access to the file.
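The size argument is also one way to work through a large file piece by piece instead of loading it all at once. The sketch below reads a small sample file in chunks of 20 characters; the temporary file, the chunk size and the names text and chunks are my own choices for the illustration.

```python
import os
import tempfile

text = 'Nothing beats a plate of eba\nas we generally want to eat it\n'

with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, 'eba.txt')
    with open(path, 'w') as fobj:
        fobj.write(text)

    chunks = []
    with open(path, 'r') as fobj:
        while True:
            chunk = fobj.read(20)        # at most 20 characters per call
            if not chunk:                # empty string means end of file
                break
            chunks.append(chunk)

print(chunks[0])
```

Each call continues from where the previous one stopped, so joining the chunks gives back the whole text.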

The Python Readline function

The syntax for the python readline function is readline(size=-1) and unlike the read function, it reads and returns one line from the stream. It starts with the first line. You can customize it by specifying size, and then at most size characters of the line will be read. The end of a line is determined by the newline parameter of the python open file function. The default is to recognise the usual line endings and translate them to ‘\n’. For most uses, the default newline is okay.

Now, to use some examples to illustrate. Imagine we had this code to just read the first line from the text file given.


with open('eba.txt', 'r') as fobj:
    s = fobj.readline()
    print(s)

The output we will get on the terminal is:

Nothing beats a plate of eba

This is the first line of the eba.txt file.

You will notice when you run the above that an extra blank line is printed after the line. This is because the string returned by readline still ends with ‘\n’ and print adds a newline of its own. You could remove the extra newline by calling the strip function on the string object returned by the python readline function. Compare the output for the code below using the strip function and that for the code above without the strip function on your machine.


with open('eba.txt', 'r') as fobj:
    s = fobj.readline()
    print(s.strip())
 

Using the strip function on the string now strips away the trailing newline and gives a more beautiful rendering.

The Python Readlines Method

The python readlines method is in plural because it reads multiple lines. The syntax for readlines is readlines(hint=-1); it reads and returns a list of lines from the stream. The hint parameter lets you limit how much is read: no more lines are read once the total size of the lines read so far exceeds hint. The default is to read all the lines and return them as a list. Please, use this function carefully. If your file is very large, it could have a detrimental effect on your system memory. This is because to return the lines, it first needs to create a list of all the lines and this takes memory space.

An example to show how the readlines method works.


with open('eba.txt', 'r') as fobj:
    s_list = fobj.readlines()
    print(s_list)

Which gives the following list as output:


['Nothing beats a plate of eba\n', 
'as we generally want to eat it\n', 
'but it often gets stuck in our throats\n', 
'where it is eaten without a good soup\n', 
'that is oily and makes the eba\n', 
'which is a gelly and very hard \n', 
'to move smoothly down our throats']
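To see how the hint parameter behaves, here is a small sketch. It uses io.StringIO as an in-memory stand-in for a textual file, which is my own choice to keep the example self-contained; the three short lines are made up.

```python
import io

# Three lines of five characters each (four letters plus the newline).
stream = io.StringIO('aaaa\nbbbb\ncccc\n')

# Reading stops once the total size of the lines read exceeds the hint of 6:
# the first line brings the total to 5, the second to 10.
first_lines = stream.readlines(6)
rest = stream.readlines()                # no hint: everything that is left

print(first_lines)
print(rest)
```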

It is recommended that you avoid using readlines because there are other ways to go about reading all the lines from your files without impacting on memory. One of them is to use a python for loop to iterate through the file object. This is because a python file object is already an iterable.

The above could be achieved with the following for loop code:


with open('eba.txt', 'r') as fobj:
    for line in fobj:
        print(line)

We have been reading and reading from files. Now, we want to write to files. We will now use the python write to file method.

The Python Write to File Method

The syntax for the python write to file method is write(s), which writes the string, s, to the file and returns the number of characters written.

The ability to write to the stream or file depends on whether it supports writing. To make this possible, we need to specify this support when opening the file and creating a file object. This is made possible by specifying the writable mode on the open file function (the open file function was explained above). The writable modes are:

r+ Update the file, i.e. read and write to the file. When the write function is called right after opening, it writes the specified string, s, at the beginning of the file, overwriting the characters already there.
w It truncates the file first and then writes the string, s, to the file. You lose all your former file contents with this mode.
a Append the string, s, to the end of the file. It continues on the last line. If you want it to write to a new line at the end, you need to add a newline character at the beginning of the string, s.

Now, compare the following code snippets on your machine and see how they run:


with open('eba.txt', 'a') as fobj:
    s = fobj.write('This line was written.')

with this one:


with open('eba.txt', 'w') as fobj:
    s = fobj.write('This line was written.')

and with this one:


with open('eba.txt', 'r+') as fobj:
    s = fobj.write('This line was written.')

You will notice that the way the contents of the file, eba.txt, were written differs based on the mode passed to the open function. The python write to file method is one of the methods you will most often use when working with files.
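If you would rather not overwrite your eba.txt while comparing the modes, you can run the comparison on a scratch file. This is a sketch only; the helper write_with_mode, the file name sample.txt and the starting contents '0123456789' are hypothetical.

```python
import os
import tempfile

def write_with_mode(mode):
    """Write 'ABC' to a fresh file containing '0123456789' using the given mode."""
    with tempfile.TemporaryDirectory() as base:
        path = os.path.join(base, 'sample.txt')
        with open(path, 'w') as fobj:
            fobj.write('0123456789')     # same starting contents every time
        with open(path, mode) as fobj:
            fobj.write('ABC')
        with open(path, 'r') as fobj:
            return fobj.read()

results = {mode: write_with_mode(mode) for mode in ('a', 'w', 'r+')}
for mode, contents in results.items():
    print(mode, '->', contents)
```

With 'a' the new text lands at the end, with 'w' it replaces everything, and with 'r+' it overwrites the first three characters.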

The Python seek function

With this method, you can change the current stream position so that when you call the python read file or python write to file methods, they don’t have to carry out those operations from the start of the file, which is the default. The syntax for the python seek method is seek(offset, whence=SEEK_SET) where offset is the position you want the stream to go to and whence is the reference point for offset; the default, SEEK_SET, means the start of the file. The seek method returns the new position of the stream.

For example, if you want to read the eba.txt file starting from offset 35 and then output the next 55 characters, you could change the current stream position using seek to be 35 and then do a read with size 55. Note that for textual files the python documentation only guarantees seek offsets that are zero or were returned by tell(), and the characters you land on can differ between systems because of different line endings. Here is the code:


with open('eba.txt', 'r+') as fobj:
    num = fobj.seek(35)
    s = fobj.read(55)
    print(s)

The output you would get from the eba.txt file is:

generally want to eat it
but it often gets stuck in ou

Showing just those 55 characters.
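A portable way to jump around a textual file is to remember a position with the tell method and return to it later with seek. Here is a minimal sketch of that, using a temporary copy of the first two lines of eba.txt; the names bookmark, second_line and again are mine.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, 'eba.txt')
    with open(path, 'w') as fobj:
        fobj.write('Nothing beats a plate of eba\n'
                   'as we generally want to eat it\n')

    with open(path, 'r') as fobj:
        fobj.readline()                  # skip the first line
        bookmark = fobj.tell()           # position at the start of line two
        second_line = fobj.readline()
        fobj.seek(bookmark)              # go back to the bookmark
        again = fobj.readline()          # reads line two a second time

print(again.strip())
```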

The last method we will consider is truncate.

The Python truncate file method

With the python truncate file method, you are able to change the size of the file. The syntax for the truncate file method is truncate(size=None) where size is the new size of the file in bytes. Where size is not specified, the file is cut off at the current position of the stream. If size is less than the original file size, the file is truncated, but if it is greater, the file is extended and the contents of the new area depend on the platform (on most systems it is filled with zero bytes). For the python truncate file method to work, the file must be opened in a writable mode as described above.

Like the write method, the truncate method needs a writable file and returns a number, in this case the new size of the file.
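Here is a short sketch of truncate in action on a scratch file. The temporary file and the names new_size and remaining are my own; the text is the first line of eba.txt.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, 'eba.txt')
    with open(path, 'w') as fobj:
        fobj.write('Nothing beats a plate of eba\n')

    with open(path, 'r+') as fobj:       # r+ so the file is writable
        new_size = fobj.truncate(7)      # keep only the first 7 bytes

    with open(path, 'r') as fobj:
        remaining = fobj.read()

print(new_size, remaining)
```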

So, I have given you ideas on what you can do with your files and file objects. The next post will be on how to handle python directories. Please, watch out for it. And subscribe to this blog so you can get regular updates when I post new articles.

Happy pythoning.

Utilizing Python reduce and accumulate functions as accumulators

Accumulators have a notable reputation in computing history. The earliest machines by Gottfried Leibniz and Blaise Pascal were based on the concept of accumulators. If you are familiar with your python functions, you would know that the python sum function acts as an accumulator when it comes to addition. But I would like to explain two functions in this post that you can use as accumulators for any operation. These functions are the python reduce function from the functools module and the python accumulate function from the itertools module.


The basic idea behind these two functions is that they take a function and an iterable as arguments, and sometimes an initializer. They apply the function to the first two items in the iterable and store the result; they then apply it to that result and the next item, store the new result, and so on until the final item is reached and the final result is produced. They have different ways of working though, which I will explain.

First, I will start with the python reduce function.

The python reduce function.

The syntax of the python reduce function is functools.reduce(function, iterable[, initializer]). It applies the function to the items of the iterable from left to right, and eventually reduces the iterable to a single value. It returns the accumulated result of the function that serves as its first argument. That function must take exactly two arguments. The initializer parameter is optional. I will explain it below.

Let’s take the simplest accumulator, summation written as a lambda function, and see how we can use it to illustrate how the reduce function works.
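The original listing is not reproduced here, so what follows is a minimal sketch consistent with the walk-through in this section; the variable names numbers and total are my own.

```python
from functools import reduce

numbers = [1, 2, 3]

# x holds the running result, y the next item from the list.
total = reduce(lambda x, y: x + y, numbers)
print(total)  # 6
```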

What the reduce function does here is use the lambda function to sum up the items of the iterable, this time the list [1, 2, 3]. First, it takes 1, the first item, and binds it to x, then takes 2 and binds it to y. It adds x + y, i.e. 1 + 2, and binds the result, 3, to x again. Next it takes 3, the last item, and binds it to y, and adds x + y, which this time is 3 + 3 and equals 6. That is the total result it gives you. So, you can now visualize how the successive addition is carried out.

You can click the following links if you want a refresher on python lambda functions or on python iterables.

As you can see from the syntax above, you can sometimes supply an initializer to the reduce function. The initializer is placed before the items of the iterable, so it becomes the first value passed to the function. And if the iterable is empty, the initializer serves as the result.

Let’s use an example with an initializer and see how it runs. This time we want our initializer to be 4.
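Again the original listing is missing here, so this is a sketch that matches the description: the same list, [1, 2, 3], summed with an initializer of 4. The name total_with_initializer is hypothetical.

```python
from functools import reduce

numbers = [1, 2, 3]

# The third argument, 4, is bound to x before the first item is read.
total_with_initializer = reduce(lambda x, y: x + y, numbers, 4)
print(total_with_initializer)  # 10
```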

You can see from the example above that the result of the summation of the list becomes 10. This is because we used an initializer of 4. What is happening here is that when reduce runs, it first binds the initializer to x, therefore x becomes 4. Then it binds y to 1 and sums them to give 5 and binds this result to x. It then binds y to 2, the next item in the list, sums x and y to give 7 and binds this result to x. It then binds 3, the last item in the list, to y and sums x and y to give 10, the final result. It then returns 10.

Simple, isn’t it? Very easy and fascinating. But don’t be in a hurry. It gets more fascinating when you realize that you can carry out operations on just anything you want. I used summation to get you acquainted with this. Any program that needs to accumulate successive results can use the reduce function.

Let’s take an interesting amortization example. If I owe $1000 and I pay off $100 annually at an interest rate of 5%, how much would I be owing at the end of four years? Reduce can help you get the result quickly. Let’s see how.
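The original code for this example is not shown here. The sketch below is one way to write it, assuming that each year the 5% interest is added to the balance first and the $100 payment is then subtracted; the names payments and balance are my own.

```python
from functools import reduce

payments = [100, 100, 100, 100]          # one payment per year for four years

# bal is the running balance, pmt the payment for that year.
# Assumption: interest is applied before the payment is deducted.
balance = reduce(lambda bal, pmt: bal * 1.05 - pmt, payments, 1000)

print(f'Balance after four years: ${balance:.2f}')
```

Under that assumption the balance comes to $784.49 after the fourth payment.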

From the code above, you can see that I used an initializer of 1000 and the iterable was a list with the regular payments as the items.

Now, the question comes: since accumulators store successive results before giving the total result, can we get those intermediate results before the total? Yes, we can. That is where the python accumulate function from the itertools module comes in.

The python accumulate function.

In fact, you can say that the python reduce and accumulate functions are cousins except for one difference: python accumulate gives you the ability to get the result of successive operations instead of having to wait for the final result. It acts like a generator in this instance.

The syntax of the python accumulate function is: itertools.accumulate(iterable[, func, *, initial=None]). As you can see from the syntax, the python accumulate function uses an iterable to create an iterator and applies the function to each of the elements in the iterator. That is what gives it the behavior of a generator. To get refreshers on these two concepts, you can check out this post on iterators, and also this post on generators. Just like for the python reduce function, the function used in the python accumulate method should be a function that accepts only two items and operates on these two items. Python accumulate method also takes an initializer, the initial argument, which is optional.

So, using the accumulate function, let’s do our amortization again but this time returning the results of successive accumulation instead of waiting for the final or total accumulation.
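The original listing is missing here as well, so this is a sketch along the lines described in this section. It assumes the same convention as before (interest first, then the payment), and that the cashflow list starts with the amount owed followed by the four payments; the names cashflow and amount_owing_list follow the text.

```python
from itertools import accumulate

# First item is the opening balance, the rest are the yearly payments.
cashflow = [1000, 100, 100, 100, 100]

amount_owing_list = list(
    accumulate(cashflow, lambda bal, pmt: bal * 1.05 - pmt)
)

# Skip index 0, which is just the opening balance returned unchanged.
for i in range(1, len(amount_owing_list)):
    print(f'Balance at the end of year {i}: ${amount_owing_list[i]:.2f}')
```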

If you read the code above, you will notice that I cast the iterator returned by the python accumulate function to a list so that I can print out each of the results. Also, one feature of the accumulate function is that the first item it returns is the first item in the cashflow list unchanged, so during the iteration of the amount owing list, I ignored this first item. Apart from those two points, the results are similar to those from the python reduce function, with one difference: this time, we can see the balance due at the end of each year rather than wait until the end of the fourth year.

If you notice, when the yearly balance was printed, each of the amounts was shown to two decimal places. I did that with a nice python string formatting syntax, {amount_owing_list[i]:.2f}. To learn how, you can read an earlier post on python string formatting Part 1 and python string formatting Part 2 and you would be sure to be able to do it yourself.

So, that’s it. You can see that python as a language has powerful capabilities. Go experiment with it. Have fun with python.

See you at the next post. If you want to receive new post updates, just subscribe with your email. Happy pythoning.

Using Python Regex To Validate Roman Numerals

Python regex, sometimes called python regular expressions, are expressions written in python that are made to match a specific pattern in a string. They are a widely used feature in the world of UNIX and are provided by many programming languages. Python is not left out. One of the advantages of using python regex is that with just one pattern you can validate a whole class of input, which is what we will be doing in this post. It also keeps your code cleaner because it usually involves fewer lines of code, and it saves you the stress of writing numerous lines of if else statements.


If you want a guide to regular expressions in python and some of the functions that come with python regex, I encourage you to read this post, which describes the basic syntax, and then this other post on the method we will be using, the python re match method.

In today’s post, we are going to show how to use python regex to validate Roman numerals based on their rules.

Roman Numerals and Its Rules

Roman numerals are a numeral system that originated in ancient Rome. They were popular and remained the usual way of writing numbers in Europe even down to the late middle ages. The system uses letters of the Latin alphabet to represent numbers, and these letters are combined according to set rules. In the modern usage of Roman numerals, seven letters are used to designate numbers and they are:

Symbol Value
I 1
V 5
X 10
L 50
C 100
D 500
M 1000

Some of the rules for writing valid Roman numerals which we will be using for validation are:

  1. The Roman numerals I, X and C can be repeated up to 3 times in succession to form the numbers but repetition of V, L, or D is invalid.
  2. To form numbers, a digit of lower value can be placed before or after a digit of higher value. The digits of lower value that can be placed before a higher one are I, X, and C.
  3. Add up all the digits in a group when a digit of lower value is placed after, or to the right of, a digit of higher value. Digits of the same value placed together are also added.
  4. Subtract the value of the lower digit from the value of the higher digit when the digit of lower value is placed before, or to the left of, the digit of higher value. Note that V is never written to the left of X.

So, now that we have the rules we need to form the python regular expressions, let’s do the Roman numerals validation which is the juicy part.

Validating any Roman numeral

When you run the code below, you need to input a string as a Roman numeral when you are prompted. You will get a result indicating whether the string is a valid Roman numeral or not. If it is an invalid Roman numeral, you will get a message that says: “Invalid Roman Numeral” but if it is valid, you will get a message that says: “Your roman numeral was valid. Welcome.”

Now, let’s run it and have fun. After you have tried running it, I will give a brief explanation of the lines of code. Note that this code takes only 8 lines. If I had used python if else statements to check every rule, it would have taken many more lines and would not be as clean.

Now that you have taken some time running the above code and seeing how it works, let me explain some of the parts. I think I don’t need to explain the python re match method because you have read about it from the link I gave above. So, I will just explain the pattern.
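
In case the embedded code does not show for you, here is a minimal sketch of such a validator. It is not the original 8-line snippet: the function name check_roman and the sample inputs are assumptions for illustration, while the pattern and the two messages are the ones quoted in this post.

```python
import re

# Pattern quoted in this post.
regex_pattern = r"^(?=[MDCLXVI])M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$"


def check_roman(numeral):
    """Return the post's success or failure message for one candidate string."""
    if re.match(regex_pattern, numeral):
        return "Your roman numeral was valid. Welcome."
    return "Invalid Roman Numeral"


# Hypothetical sample inputs; an interactive version would read one string with input().
for candidate in ["MCMXCIX", "XIV", "IIII", "VX"]:
    print(candidate, "->", check_roman(candidate))
```

To make it interactive, you could pass input("Enter a Roman numeral: ") to check_roman instead of looping over sample strings.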

The key to the pattern matching above is the python regex pattern which is denoted as:

regex_pattern = r"^(?=[MDCLXVI])M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$"

The ^ symbol that starts the pattern states that we should start from the beginning of the string while the corresponding $ symbol at the end says that we end at the end of the string. So, we presume that each string passed to the code is a single Roman numeral and nothing else; otherwise it will be reported as invalid. Now after the ^ symbol is a lookahead assertion, (?=[MDCLXVI]). Read up this blog post on python lookahead assertions if you want a refresher.

What the python lookahead assertion does is look ahead from the beginning of the string and require that the next symbol be an M, D, C, L, X, V or I. Yes, the only symbols allowed to start the string are the seven symbols of the Roman numerals and nothing else. This also rejects an empty string, which the rest of the pattern would otherwise match. Note that a python lookahead assertion does not consume any characters. So, right now, we have not captured any match.

The next part of the pattern matches the thousands place. I denote this with the pattern: M*. It states that for the thousands place in the number, we need to match M zero or more times. If the number is less than a thousand, then M appears zero times; otherwise M appears one or more times, so we get a match either way. Unfortunately, I cannot guarantee that this pattern is meaningful beyond 3999, because from 4000 we need a very special thousand Roman numeral symbol which the pattern cannot cover. But you can try 1999 (MCMXCIX) and see that it matches. Because of the limitation in the thousands place, we could replace M* above with M{0,3} to state that we cannot go beyond 3999.
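
As a small sketch of that last point, the code below compares the pattern from this post with a variant that uses M{0,3}. The variable names open_ended and capped are mine, chosen only for illustration.

```python
import re

# The pattern from this post, and a variant that caps the thousands place at three Ms.
open_ended = r"^(?=[MDCLXVI])M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$"
capped = r"^(?=[MDCLXVI])M{0,3}(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$"

for numeral in ["MMMCMXCIX", "MMMM"]:
    accepted_open = bool(re.match(open_ended, numeral))
    accepted_capped = bool(re.match(capped, numeral))
    print(numeral, accepted_open, accepted_capped)
```

Both patterns accept MMMCMXCIX (3999), but only the M* version accepts MMMM.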

The next part to match is the hundreds place, from 100 to 999. I denote the hundreds place with the (C[MD]|D?C{0,3}) pattern. What this pattern says is that for a hundreds place match, either C (100) should be to the left of M (1000) or D (500), or C should come after an optional D (500), with no more than three consecutive Cs.

The next is the tens place, which runs from 10 to 99. The pattern for it is: (X[CL]|L?X{0,3}). This states that the tens place can either be an X (10) before a C (100) or L (50), or X can come after an optional L (50), in which case there can be no more than 3 consecutive Xs.

The next is the units place, which is between 1 and 9. Remember there is no 0 in Roman numerals. The pattern for it is: (I[XV]|V?I{0,3}). What the pattern states is that the units place is denoted either by an I (1) appearing to the left of an X (10) or V (5), or by I appearing to the right of an optional V (5), in which case it appears not more than 3 times.
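
To see the three places at work, here is a short sketch that prints what each capturing group matched for 1999. The lookahead and M* are not capturing groups, so only the hundreds, tens and units groups are reported.

```python
import re

regex_pattern = r"^(?=[MDCLXVI])M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$"

match = re.match(regex_pattern, "MCMXCIX")
# groups() returns one string per capturing group, in order.
hundreds, tens, units = match.groups()
print("hundreds:", hundreds)
print("tens:", tens)
print("units:", units)
```

A group that has nothing to match, as in MM, comes back as an empty string.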

Well, that is it. Enjoy validating your Roman numerals with this simple tool.

I hope you do leave a comment about your results.

Happy pythoning.

Using The Python String Format Method: Format Specifications Part 2

In an earlier post, I showed how to use field names and conversion fields to format values in replacement fields. Today, I will continue that discussion by showing how to use format specifications, the optional and last feature of replacement fields, in the python string format method.

The format specification for python string format method
 

The format specifications represent how the value in the replacement field should be presented. They include details such as the width of the field, its alignment, padding, conversion etc. Each value type is given its own specification. Also note that each format specification can include nested replacement fields but the level of nesting should not be deep.

You use a colon, :, to denote the start of a format specification. The format specification has 8 flags and I will describe each of them in the order in which they must appear. Note that each of the flags is optional.

  1. The fill flag

    This flag comes first. Use it to specify the character that fills any unused space when the value is presented. Any character can be used as the fill character and, if it is omitted, it defaults to a space. Note that a curly brace cannot be a fill character unless the curly brace is in a nested replacement field. The fill character only takes effect when the value is narrower than the specified width of the replacement field, so you use it together with other flags.

  2. The align flag

    This represents the alignment of the value of the object. You can left align, right align, center, or pad between the sign and the digits. The different options are presented below:

    < Used for left alignment of the value in the available space. The default for most objects.
    > Used for right alignment of the value in the available space. The default for numbers.
    = Forces the padding to be placed after the sign but before the digits. Only valid for numeric types. If you precede the field width (explained below) with 0, then this becomes the default.
    ^ Forces the value to be centered within the available space.

    To make the alignment option meaningful, you must specify a minimum field width. Here are some examples. They all come with minimum field width of 20.

  3. The sign flag

    This is only used for numeric values. The various options are:

    + Use a sign for both positive and negative values.
    - Use a sign only for negative values (this is the default behavior).
    Space Show a leading space for positive numbers and a minus sign on negative numbers.

    Here are some examples.

  4. The alternate flag, #

    Use this flag when you are doing value conversion and you want the alternate form to be used. It is valid for integers, floats, decimal and complex types. We will come back to this when we get to the type flag and show how the alternate forms can be specified.

  5. The grouping flag

    The grouping flag specifies the character to be used as a thousands separator. It has two options:

    _ Use an underscore as a thousands separator for integer types when ‘d’ is specified as the type flag (to be explained later) and for floating point types. When the type flag for integer types is either ‘b’, ‘o’, ‘x’, or ‘X’, the separator is inserted after every four digits.
    , Use a comma as the thousands separator. You could use the ‘n’ type flag instead if you want a locale aware separator.

    Now some examples. I included the third example with ‘b’ as a type flag. ‘b’ as type flag means convert value to base 2. This will be explained below under type flags.

  6. The precision flag

    The precision flag is a decimal number that indicates how many digits should be displayed after the decimal point for a floating point value that has the type flag ‘f’ or ‘F’, or before and after the decimal point for a floating point value that has the type flag ‘g’ or ‘G’. Note that precision is not allowed for integer types. If the value is a non-numeric type, then this indicates the maximum field size of the replacement field.

    Now for some examples. Notice how it truncates the string type, s, when the precision is smaller than the number of characters.

  7. The type flag

    The type flag determines how the data should be presented. The type flag is specified for string types, integer types, and floating point types.

    For string types: The available options are...

    s The default type for strings and may be omitted
    None The same as ‘s’

    For integer presentation types: The options are...

    b Outputs the number in binary format
    c Converts the value to the corresponding Unicode character before printing.
    d Decimal format. Outputs the number in base 10.
    o Octal format. Outputs the number in base 8.
    x Hex format. Output the number in base 16 using lower case letters for digits above 9
    X Hex format. Output the number in base 16 using Upper case letters for digit above 9.
    n Decimal format. The same as ‘d’ but it uses locale aware setting to insert appropriate thousands separator for the locale.
    None Same as ‘d’

    Note that integers can also be formatted with the floating point presentation types listed below, except for ‘n’ and None. When you do that, the integer is converted to a floating point number before it is formatted.

    Now, let’s use some examples.

    When discussing the alternate flag, #, I stated that there are times when you want alternate conversion forms to be specified. For example, for binary, octal and hexadecimal outputs the alternate flag, #, adds the prefix ‘0b’, ‘0o’, or ‘0x’ to the output. Let’s show this with examples.

    The alternate flag can also be applied to floats and complex numbers.

    Now finally, the options for floating point presentation types are:

    e Exponent notation. Print the number in scientific notation using the exponent, e, to denote it. The default precision is 6.
    E Exponent notation. Print the number in scientific notation using the exponent, E, to denote it.
    f Displays the number in fixed point notation. The default precision is 6.
    F Fixed point notation, just like ‘f’ but converts nan to NAN and inf to INF.
    g This is the general format. Uses fixed point or scientific format depending on the magnitude of the number.
    G General format, but in uppercase.
    n Same as ‘g’ but is locale aware in inserting appropriate thousands separator.
    % Percentage. Multiplies the number by 100 and displays it in fixed format, ‘f’, with a percent sign (%) following it.
    None Similar to ‘g’ except that fixed point notation when used has at least one digit past the decimal point.

    The following examples use precision 2 and then the default of 6.
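
In case the embedded examples do not show for you, the flags in the list above can be sketched in one small script. The helper show and the sample values are my own assumptions for illustration; each entry pairs a format specification with a value.

```python
def show(spec, value):
    """Format one value with str.format using the given format specification."""
    return ('{:' + spec + '}').format(value)


examples = [
    # fill and align with a minimum field width of 20
    ('*<20', 'python'), ('*>20', 'python'), ('*^20', 'python'),
    # = puts the padding between the sign and the digits
    ('=+8', 42),
    # sign options
    ('+', 5), (' ', 5), ('-', -5),
    # grouping, including 'b' where the separator comes after every four digits
    (',', 1234567), ('_', 1234567), ('_b', 255),
    # precision on floats and on a string
    ('.2f', 3.14159), ('.3g', 3.14159), ('.3', 'python'),
    # integer type flags, without and with the alternate flag
    ('b', 10), ('#b', 10), ('o', 10), ('#o', 10), ('x', 255), ('#x', 255), ('c', 65),
    # float type flags with precision 2, then the default precision of 6
    ('.2e', 1234.5678), ('e', 1234.5678), ('.2f', 1234.5678), ('f', 1234.5678),
    ('.2%', 0.256),
]

for spec, value in examples:
    print(f'{spec!r:>8} applied to {value!r:>10} gives {show(spec, value)!r}')
```

Running it prints one line per specification, so you can compare, say, 'x' with '#x' or '.2e' with 'e' side by side.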

I hope you get creative in using these format specifications. They are very helpful when representing values. Note that python’s literal string formatting method, f-strings, uses the same format specifications as the python string format method described here, so in most cases you can interchange the two.
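
As a quick sketch of that interchangeability, the same format specification is used below with the format method and with an f-string. The variable name amount and its value are made up for illustration.

```python
amount = 1234.5678  # hypothetical value

# The specification after the colon is identical in both forms.
via_format = 'Balance: {:>12,.2f}'.format(amount)
via_fstring = f'Balance: {amount:>12,.2f}'

print(via_format)
print(via_fstring)
print(via_format == via_fstring)
```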

Using The Python String Format Method Like A Pro Part 1

How you format your text is important in text processing and python is not left out, giving you several options to make your output appear presentable. I decided to delve into the issue of python formatting in today’s post while reading some code. I appreciated the way the author applied python string formatting. So, I decided to devote two posts to string formatting because I believe my readers would be interested in it.

python string format method makes output presentable
 

In python you format your output using the format method of the string class. It is also called the python str.format method (or python string format method) to differentiate it from python’s literal f-strings. A format string contains two kinds of content: literal text and replacement fields. Replacement fields are surrounded by curly braces, {}, and refer to objects that have to be formatted, while literal text is whatever you want to leave unchanged in the output. So, what we are interested in are replacement fields.

To give you an idea of what replacement fields are, read and run the following code:
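
In case the embedded code does not show for you, a minimal version, using the same name and age variables as the rest of this post, could look like this:

```python
name = 'Michael'
age = 29

# The two pairs of curly braces are the replacement fields.
message = 'Hello, your name is {} and your age is {}'.format(name, age)
print(message)
```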

You will see that in the string part of the python format method in the code above, there are two curly braces and they serve as replacement fields whose values are provided by the parameters, name and age, of the python format method. We are going to be discussing how you can format your output based on the replacement fields and parameters.

The syntax of the python string format method

The syntax of the python string format method is: template.format(p0, p1, k0=v0, k1=v1) where template refers to the string you want to format. As I said before, the template consists of both literal text and replacement fields. Replacement fields are denoted by whatever is in curly braces, {}. The arguments p0 and p1 are positional arguments while k0 and k1 are keyword arguments. Positional and keyword arguments are used to insert values into the replacement fields in the template. We will cover all these and give you ideas on how to use them.
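
As a small sketch of that syntax, here is a template that takes two positional arguments and one keyword argument. The sentence and the values are made up for illustration.

```python
# {0} and {1} pick positional arguments; {country} picks the keyword argument.
template = '{0} is {1} years old and lives in the {country}.'
message = template.format('Michael', 29, country='USA')
print(message)
```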

The replacement fields have three optional features: field names, conversion fields that are preceded by an exclamation point, !, and format specifications. Today’s post will cover how to specify the field names and conversion fields while the next post will be on format specifications.

The field names in the string replacement fields.

The replacement field starts with an optional field name. The field name refers to the object whose value is to be inserted. The object is specified in the parameter of the format method. The field name is either a number or a keyword.

  1. Where the field name is a number

    An example to illustrate this is below:

    name = 'Michael'
    age = 29
    print('Hello, your name is {0} and your age is {1}'.format(name, age))

    You can see that in the template above, there are two curly braces or replacement fields. The first has the number 0 and the second has the number 1. The curly brace with 0 refers to the first positional argument passed to the format method, which here is the variable, name, while the curly brace with 1 refers to the second positional argument, which is the variable, age.

    If you so desire, you can choose to leave out the numbering of the curly braces and python will insert them on your behalf. Like this:

    name = 'Michael'
    age = 29
    print('Hello, your name is {} and your age is {}'.format(name, age))

  2. Where the field name is a keyword

    The python string format method also lets you pass keyword arguments as parameters, in which case the replacement fields must name the keywords. An example is below:

    print('Hello, your name is {name} and your age is {age}'.format(name='Michael', age=29))

    You can see now that I have inserted the keywords into the curly braces because the parameters are keyword arguments.

    Using keywords as arguments is super powerful. It gives you the ability to change the ordering of the parameters in the replacement fields. For example, instead of following the ordering of the positional arguments, I could order the replacement fields as it suits my fancy:

    Check out the code above and the one before it. See how I interchanged the ordering of the keyword arguments in the replacement fields. We could try another example to show you how powerful this is.

    print('In {country}, there are {number} million people speaking {language}.'.format(language='English', number=300, country='USA'))

    With keyword arguments you are not constrained to any sort of ordering. You choose how you want it to be. You can check out this post if you want a refresher on positional and keyword arguments.

    Note: What if you want to have the brace as a literal text in the template? Simple, just double brace it.

    print('This is doubling the braces {{{name}}} for {name}'.format(name='Michael'))

    I doubled the braces around the first replacement field. When you run it, you will notice that the braces now appear literally in the output.

    Now, what if your parameters are lists or an object with attributes whose value you want to show on output? The next two sections below will show you how.

  3. Where the parameter to format is a list

    To make the output appear as you want it to, you can pass the list as a keyword argument or as a positional argument. As a keyword argument, you name the list in the parameter and index it in the replacement field. As a positional argument, you refer to the list by its position number in the replacement field and index that instead.

    What python does when you specify it either way is to call the __getitem__() method of the list. I discussed this method in an earlier post on sequences.

  4. When the object has attributes with values

    When the object in the parameter has an attribute whose value you want to format, you can directly name the attribute in the replacement field. The method get_fruit shows how. What 0.index and 0.fruit do is call the getattr() function on the object, self, in order to get the required value. For this I created a fruit class with a class attribute, index, so that whenever a fruit is created it is tagged with an index (instead of creating a list) and then the index is incremented to tag the next fruit.
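
In case the embedded examples from the list above do not show for you, here is a sketch of three of them: reordered keyword fields, indexing a list, and reading attributes. The Fruit class is my reconstruction from the description above, not the original code, and the fruit names are made up.

```python
# Keyword fields can appear in any order in the template.
print('Your age is {age} and your name is {name}'.format(name='Michael', age=29))

# A list passed as a keyword argument, then as a positional argument.
fruits = ['mango', 'apple', 'banana']
by_keyword = 'First: {fruits[0]}, last: {fruits[2]}'.format(fruits=fruits)
by_position = 'First: {0[0]}, last: {0[2]}'.format(fruits)
print(by_keyword)
print(by_position)


class Fruit:
    index = 0  # class attribute used to tag each new fruit

    def __init__(self, fruit):
        self.fruit = fruit
        self.index = Fruit.index  # tag this fruit with the current index
        Fruit.index += 1          # then move on for the next fruit

    def get_fruit(self):
        # {0.index} and {0.fruit} read attributes of the first positional argument.
        return 'Fruit {0.index} is {0.fruit}'.format(self)


mango = Fruit('mango')
apple = Fruit('apple')
print(mango.get_fruit())
print(apple.get_fruit())
```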

Be creative. Play with your own objects to test how format calls attributes from the replacement field.

I think that’s all for field names. After the field name comes an optional conversion field.

Syntax of the conversion field

The conversion field is optional, but if specified, it is preceded by an exclamation point, !, to differentiate it from the field name. It causes type conversion before any formatting of the replacement fields takes place. But one may ask – doesn’t every object have a default __format__() method? Yes, they do. But the creators of python realized that sometimes you want to force a specific string representation of an object.

There are three types of specifiers for the conversion field: !s, !r, and !a specifiers.

  1. The !s specifier

    The !s conversion specifier gives you a string representation of the object in the replacement field. What it does is call str() on the object in the replacement field, converting it to a string. This is the default string formatting.

  2. The !r specifier

    You can use this when you want the true string representation of an object, and not just its output as a string. By default, this representation contains information about the object such as its type and address. This specifier calls repr() on the object.

  3. The !a specifier

    This specifier also outputs the true string representation of an object but it replaces all non-ascii characters with \x, \u or \U escapes. This specifier calls ascii() on the object. It works like the !r specifier if you have no non-ascii characters in the object.

Here is an example illustrating all three types. Notice how the object type appeared in the output for !r and !a.
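
In case the embedded example does not show for you, here is a sketch with a hypothetical Fruit class that defines both __str__ and __repr__. The class and the name are my own; the letter ñ is written as the escape \u00f1 so the file stays ascii.

```python
class Fruit:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

    def __repr__(self):
        return 'Fruit(name={!r})'.format(self.name)


pepper = Fruit('jalape\u00f1o')  # hypothetical name with one non-ascii character

as_str = '{!s}'.format(pepper)     # calls str()
as_repr = '{!r}'.format(pepper)    # calls repr()
as_ascii = '{!a}'.format(pepper)   # calls ascii(), which escapes the non-ascii letter

print(as_str)
print(as_repr)
print(as_ascii)
```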

As another illustration, you can compare the output of the !s and !r in a string with quotes showing or not showing.
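
A minimal sketch of that comparison, using a made-up string:

```python
text = 'python'

plain = '{!s}'.format(text)    # str() leaves the quotes out
quoted = '{!r}'.format(text)   # repr() keeps the quotes

print(plain)
print(quoted)
```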

In my use of the conversion fields, I have found that making them optional has served me well. So, they just come in for special cases of formatting.

Now, the third and last feature of the replacement field is the format specification, which is explained in this post. This is where the real juice of replacement fields is stored.

Light Trapping Nano-Antennas That Could Change The Application Of Technology

Travelling at a speed of 186,000 mi/s, light is extremely fast. Even Superman, the fastest creature on Earth, cannot travel at the speed of light. Humans have shown several times that they can control the direction of light by passing it through a refractive medium. But is it possible to trap light in a medium and change its direction just as you can trap sound in an echo device? Until now that possibility was theoretical, but new research has shown that it could be practical. Since light is useful for information exchange and so many other purposes, the ability to control light, trap it or even change its direction could have several applications in science and technology.

outline from light trapping device
 

In a recent paper published in “Nature Nanotechnology”, some Stanford scientists working at the lab of Jennifer Dionne, an associate professor of materials science and engineering at Stanford University, demonstrated an approach to manipulating light that can significantly slow the speed of light and also change its direction at will. The researchers structured silicon chips into fine nanoscale bars and these bars were used to trap light. Later, the trapped light was released or redirected.

One challenge the researchers faced was that the silicon chips were transparent boxes. Light can be trapped in boxes but it is not so easy to do if the light is free to enter and leave at will just as you find in transparent boxes.

Another challenge faced by the researchers was in manufacturing the resonators. The resonators consist of a silicon layer atop a wafer of transparent sapphire. The silicon layer is extremely thin and it can trap light very effectively and efficiently. It was preferred because it has low absorption in the near-infrared spectrum, which was the light spectrum that the scientists were interested in. This region is very difficult to visualize due to inherent noise but it has useful applications in the military and technology industry. Underneath the silicon layer is a bottom layer of transparent sapphire arranged in wafers. Then a nano-antenna was constructed through this sapphire using an electron microscopic pen. The difficulty in etching the pattern with the microscopic pen lies in the fact that if there is an imperfection, it will be difficult for the structure to direct light because the sapphire layer is transparent.

The experiment would be a failure if the box of silicon allowed light to leak. There should be no possibility of that. Designing the structure on a computer was the easy part, but the researchers discovered that the difficulty lay in manufacturing the system because it has a nano-scale structure. Eventually they had to go for a trade-off: a design that gave good light trapping performance but could still be made with existing manufacturing methods.

The usefulness of the application

The researchers have over the years tinkered with the design of the device because they were trying to achieve significant quality factors. They believed that this application could have important ramifications in the technological industry if it was made practical. A quality factor is a measure that describes the resonance behavior involved in trapping light, and in this case it is proportional to the lifetime of the light.

According to the researchers, the quality factor demonstrated by the device was close to 2,500. If you compare this to similar devices, one could say that the experiment was very successful because that is two orders of magnitude, or 100 times, higher than previous devices.

According to Jennifer Dionne at Stanford University, achieving a high quality factor in the design of the device puts it in a strong position to become practical in many technology applications. Some of these applications include those in quantum computing, virtual reality and augmented reality, light-based Wi-Fi, and also the detection of viruses like SARS-CoV-2.

An example of how this technology could be applied is in biosensing. A biosensor is an analytical device used for the detection of biomolecules that combines a biological component with a physicochemical component. A single molecule is so small that it is essentially invisible, but if light is used as a biosensor and passed over the molecule hundreds or even thousands of times, then the chances of creating a detectable scattering effect are increased, thereby making the molecule discernible.

According to Jennifer Dionne, her lab is working on applying the light device to the detection of Covid-19 antigens and of antibodies produced by the body. Antigens are molecules produced by viruses that trigger an immune response while antibodies are proteins produced by the immune system in response to the antigens. The ability to detect a single virus or a very low concentration of many different antibodies comes from the light-molecule interaction created by the device. The nanoresonators are designed to work independently so that each nano-antenna can detect different types of antibodies simultaneously.

The areas of application of this technology are immense. Only the future can reveal the possibilities when other scientists start experimenting with what was discovered. I think this innovation is a game changer.

Material for this post was taken from the Stanford University website.

A Concise Guide To Python Loops

In a post this week, while discussing control flow in python, I wrote about repetitive control structures in python which consist of the python while and for loops. But I received some text message where a reader said my post was not concise enough; that I left off some features of python loops. I agreed with him. This was because my focus was just in showing how control structures work in python and not on showing all the features of python loops. So, in this post, I have decided to write a concise guide on python loops.

python while loop and python for loop
 

As I said earlier in the other post, when you want to repeatedly execute some block of code you use loops. In python, you can either use a python for loop or a python while loop. After showing examples of both loops, I will then concisely explain the situations in which each loop can be used, and what makes them similar and different.

The python for loop

In order to be more concise and cover all situations, I will use the syntax of the documentation reference in defining a python for loop.


for_stmt ::=  "for" target_list "in" expression_list ":" suite
              ["else" ":" suite]

This python for loop syntax states that a for loop is denoted by the “for” keyword. On evaluation of expression_list, the iterable that will be used in the for loop, an iterator is created over all the items to be used in the looping construct. Then for each iteration, the target_list is bound to the next item from that iterator, and it can be used in the suite, which is the block of code that is to be repeated. A python for loop can have an optional else clause. The else clause, when present, runs when the loop has completed all its iterations without being ended by a break statement.

Now a picture is worth a thousand words. Let’s illustrate the syntax above with an often used syntax:


# please note that the else clause is optional
for variable in iterable:
    block of code to execute
else:
    block of code when for clause ends    

Just as in the documentation’s syntax, the variable is assigned each item in the iterable during an iteration until the iteration ends. Most times, the variable is used in the block of code to execute.

Let’s show how the iteration works using a python for loop example with an iterable, this time a list, and printing out each iteration of the list to show how they are passed to variable.
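
In case the embedded code does not show for you, a minimal sketch with a made-up fruits list could be:

```python
fruits = ['mango', 'apple', 'banana']  # hypothetical list

# On each pass, item is bound to the next fruit in the list.
for item in fruits:
    print(item)
```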

When you run the code above, you can see that each item in the list is printed out in the block of code. This is because for each iteration, the item variable is bound to the first fruit, then the next fruit, and so on.

What if for each iteration we want to do something with the items, like multiply each item in the sequence by a number? Here is code that shows you how.

You can see that the python for loop iterates through each of the numbers, multiplies it, and prints out the result.
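
As a sketch, assuming a list of five numbers and a multiplier of 2 (both made up for illustration):

```python
numbers = [1, 2, 3, 4, 5]  # hypothetical numbers
doubled = []

for number in numbers:
    product = number * 2   # do something with each item
    doubled.append(product)
    print(product)
```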

Now, let’s show how the often ignored else clause can be used. When the loop finishes its iteration we can specify an else clause with a block of code that will be executed. The else clause in a for loop is similar to the else clause in a try statement and not to the else clause in an if statement.
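
A minimal sketch of both outcomes follows: the first loop runs to completion, so its else block runs, while the second loop ends with break, so its else block is skipped. The list and the flag names are assumptions for illustration.

```python
fruits = ['mango', 'apple', 'banana']  # hypothetical list

finished = False
for item in fruits:
    print(item)
else:
    # Runs because the loop was not ended by break.
    finished = True
    print('No more fruits.')

skipped = True
for item in fruits:
    if item == 'apple':
        break
else:
    # Not reached, because the loop above ends with break.
    skipped = False
```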

I told you this promises to be a concise guide. So, I will show one more example of the use of an else clause. What if we had a for loop that breaks out when a condition is satisfied, but has to execute another block of code only when it does not break out? For example, imagine having some numbers in a range, like 1 to 10, and writing a function that states whether each number is a prime or a composite. A prime has no factors other than 1 and the number itself. So, using this property we will divide each number by every integer from 2 up to, but not including, that number and use the result of remainder division to state whether the number is a prime or a composite. I will use two loops for this example. See how the else clause is used to achieve this effect.

Notice that the else block is triggered when the second for loop goes to completion because the number is not a composite number. Anything that is not a composite is prime. But if it is a composite number, we break out of the inner loop and return to the outer enclosing loop to start another outer iteration.
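
In case the embedded code does not show for you, here is a sketch of the idea. The function name classify is my own, and I start from 2 because 1 is neither prime nor composite.

```python
def classify(limit):
    """Label every number from 2 to limit as 'prime' or 'composite'."""
    results = {}
    for number in range(2, limit + 1):
        for factor in range(2, number):
            if number % factor == 0:
                # A factor was found, so stop checking this number.
                results[number] = 'composite'
                break
        else:
            # The inner loop finished without break: no factor was found.
            results[number] = 'prime'
    return results


for number, label in classify(10).items():
    print(number, 'is a', label, 'number')
```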

There is something I introduced in the above code which I have not talked about. That is, using the python range function in a for loop to iterate over numbers. Yes, the range and for loops come in very handy when you want to iterate over numbers in python for loops. Use them at your convenience. The syntax of a range function is: range(start, stop[, step]). The start is the number to start the iteration from. The stop is the number at which to stop the iteration. Stop is not included in the iteration. The step signifies how you pick the numbers, maybe you want to pick every second number from the start of the range etc. When there is only one argument to the range function, it is understood to refer to the stop. The default for start, which is 0, is assumed, and the default for step, which is 1, is assumed. When there are two positional arguments, it is understood that the first is the start and the second is the stop. Then the default for the step, 1, is used. Note that you cannot make step to be zero or it will give a ValueError. To illustrate, let’s use examples.
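
A short sketch of the one, two and three argument forms, and of the error raised for a zero step (the variable names are mine):

```python
one_arg = list(range(5))           # stop only: start defaults to 0, step to 1
two_args = list(range(2, 6))       # start and stop; stop is not included
with_step = list(range(0, 10, 2))  # every second number
print(one_arg)
print(two_args)
print(with_step)

try:
    range(0, 5, 0)
except ValueError as error:
    zero_step_error = str(error)
    print('ValueError:', zero_step_error)
```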

The range function is a handy tool to use with the for loop when you are dealing with numbers. You use it to create a sequence of numbers to be used in the iteration directly, as we did in the for loop above. Or you can use it to create a set of integer indices that can be used on the sequence itself. Let’s show an example of using it to create indices for use in the iterable. Here you create the argument for the range by calling the len function on the iterable. I discussed how powerful the len function is in an earlier post. Now, for an example.
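A minimal sketch of that idea follows. The contents of the fruits list are made up for illustration, and the visited list is only there to record what the loop saw.

```python
# A hypothetical list of fruits to index into.
fruits = ['mango', 'orange', 'banana']

visited = []
# range(len(fruits)) produces the indices 0, 1 and 2.
for item in range(len(fruits)):
    print(item, fruits[item])
    visited.append((item, fruits[item]))
```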

The item variable above takes integer values created by the range function, and these serve as indices into the fruits list.

Finally, before I end the discussion on for loops, I have to tell you that for loops can be nested. I showed an example in the loop above that looks for prime numbers. But another example of a nested for loop would not be bad.
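As one possible illustration, here is a small sketch of a nested for loop that builds a multiplication table. The example and its names (table, row, column) are my own choice.

```python
# Build a 3 x 3 multiplication table with two nested for loops.
table = []
for row in range(1, 4):
    products = []
    # The inner loop runs in full for every pass of the outer loop.
    for column in range(1, 4):
        products.append(row * column)
    table.append(products)
    print(products)
```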

I promised to be concise, right? So, let’s take on python while loops.

What are python while loops?

The while loop or while statement is used for repeated execution of a block of code as long as an expression is True. Here is the syntax of a python while loop according to the documentation:


while_stmt ::=  "while" assignment_expression ":" suite
                ["else" ":" suite]

You can see from the above that you begin the while loop with the while keyword, followed by the expression you want to evaluate as True or False. The expression does not have to be a Boolean itself; Python tests the truth value of whatever it evaluates to. As long as the expression is true, the while loop will continue executing the statements in the suite or block of code. Also, notice that a while loop has an optional else clause. The else clause holds a block of code that runs once when the expression evaluates to false; it is skipped if the loop is ended early by a break statement.

The common syntax for python while loop used by many authors is:


# the else clause is often not used. Optional
while condition_is_true:
    body_of_code
else:
    body_of_code    

This is the flow of control of a while loop. First, the condition is checked. If it is False, the body of the while loop is not executed and flow of control moves to the else clause, if there is one, and then to the next statement in the code. If the condition is True, the body of code in the while loop is executed and the condition is checked again; this repeats until the condition becomes False.
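The flow of control can be seen in a small sketch like the one below. The names events, remaining and skipped are made up for illustration.

```python
# The body runs while the condition is true; the else clause runs
# once when the condition finally becomes false.
events = []
remaining = 3
while remaining > 0:
    events.append(remaining)
    remaining -= 1
else:
    events.append('done')
print(events)

# If the condition is false at the start, the body never runs.
skipped = []
while False:
    skipped.append('never runs')
print(skipped)
```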

Note that if the body of code in the while loop is executed and the condition does not become False during the time the loop is running, you will enter an infinite loop. That is, a loop that never ends. If you enter an infinite loop, just press CTRL+C on your machine and it will stop execution of the loop. But you can prevent infinite loops.

How to prevent infinite python while loops.

To prevent infinite loops occurring in your python while loops, a common approach is to use a counter in the condition or Boolean expression. You initialize the counter before the while loop and increment or decrement it in the body of the loop so that the condition eventually becomes False.

Now, let’s do all this with some examples. First, showing the use of a counter in the condition of the loop to test for True.
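The original listing is not reproduced here, so this is a minimal sketch of a counter-controlled loop over the numbers 1 to 5. What the body does with each number (printing and recording it) is an assumption on my part.

```python
# Initialize the counter before the loop.
counter = 1
printed = []

# Keep looping while we have not gone beyond 5.
while counter < 6:
    print(counter)
    printed.append(counter)
    # Move the counter towards the point where the condition is False.
    counter += 1
```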

Let’s take some points from the code above. But first, I will encourage you to run it to see that it works. Before the loop, we initialized the counter to 1, our starting number. Then we used the counter to test that we have not gone beyond the last number, 5, by saying we want the condition to be True while the counter is less than 6. Then in the body of the while loop we incremented the counter so that on each iteration the counter keeps moving towards 5, and when it moves beyond 5 to 6, the condition becomes False so we exit the loop. What if we had not incremented the counter in the body of the loop? We would have entered an infinite loop. If we had not initialized the counter before the loop condition, we would have gotten a NameError exception. I want you to test these two error conditions on your own machine.

Now, if we do not want to use a counter in the condition, we could use a variable that is bound to a Boolean value instead. We need to initialize that variable as well before the while loop is entered. Then in the body of the while loop we flip the variable so that the condition becomes False when we want the loop to stop execution. Here is an example.
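Again, the original listing is not shown, so this is only a sketch of the idea with a made-up flag called keep_running, covering the same numbers 1 to 5.

```python
# A Boolean variable controls the loop instead of a comparison.
keep_running = True
number = 1
printed = []

while keep_running:
    print(number)
    printed.append(number)
    number += 1
    # Flip the flag once we have gone past the last number, 5.
    if number > 5:
        keep_running = False
```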

Notice that I am doing the same calculation but this time using a variable that is bound to a Boolean value. We initialized the variable to True before the while loop and used it as the condition. Then in the body of the loop, when our stopping condition is satisfied and we want the loop to stop execution, we switched the variable so that the condition becomes False.

There are two more statements about loops that need to be considered. The python break and continue statements. But I don’t want to repeat myself. Programmers aim to reuse code, so I would encourage you to go to the post on control flow where I discussed break and continue statements in loops. I believe I was very concise in explaining those concepts in that post. You can find the explanation at the end of the post.

Now finally, the similarities between for loops and while loops along with their differences.

Similarities and differences between python while and for loops.

First, their similarities. The basic similarity between a while loop and a for loop is that you can end either loop early via a break statement. Whenever a break statement is executed in either loop, it stops execution of the innermost loop enclosing the break statement.

There are three differences I have noticed between the two loops:

A python for loop usually runs over a finite iterable, so you can often know in advance how many iterations it will perform. A while loop runs for as long as its condition is true, which might be forever, so you might not be able to tell in advance how many iterations it will go through.

Although both loops can use a counter, the counter for a while loop must be initialized before the loop and then incremented or decremented in the body of the loop.

You can rewrite a python for loop using a python while loop but you might not be able to rewrite a while loop using a for loop except in some cases.
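To illustrate the last point, here is a sketch of the same work done with a for loop and then with a while loop. The letters list and the other names are made up for the example.

```python
letters = ['a', 'b', 'c']

# Version 1: a for loop walks through the list directly.
from_for = []
for letter in letters:
    from_for.append(letter.upper())

# Version 2: the same loop rewritten as a while loop. The counter
# must be initialized before the loop and incremented in the body.
from_while = []
index = 0
while index < len(letters):
    from_while.append(letters[index].upper())
    index += 1

print(from_for == from_while)
```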

So, I can now rest in peace. That is my concise guide to loops in python.

Happy pythoning.

An Unstructured, Random Python Cipher That Seems Unbreakable

Today, we will be cracking codes with python. While researching this post, I came upon an article about Caesar’s cipher. Caesar’s cipher encrypts a message by mapping each letter of the original alphabet to the letter a fixed number of positions (the key) to its left or right. The author said that Caesar’s cipher, which was one of the earliest forms of cryptography, could be broken by a brute force method. I said to myself: “That’s cool. It could be broken because Caesar’s cipher has a key with a structure. What if the key has no definite structure?” So, I decided to write a program that is inspired by Caesar’s cipher but with a random key that has no structure. Rather than use the python chr and ord functions, I decided that a better way for my concept to work was to randomize a translation table. But to have a random translation table, I needed to first create it.

 

keys to a cipher in python

How do you create a translation table that has no structure when mapping from source to destination strings and is random? Well, before I begin explaining how, I should explain the functions we are going to use. The functions are python’s randint, maketrans and translate functions.

The python randint function.

The python randint function is a random number generator and one of the functions of the python random module. It returns a random integer each time it is called. To use it, you have to import the python random module. The syntax of the randint function is random.randint(a, b), where randint generates an integer between a and b inclusive. If you want in-depth coverage of the python randint function and other functions of the python random module, you would do well to read an earlier post on the subject. So in my solution today, I will be using the randint function to generate python random numbers.
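A quick sketch of randint in use is shown below. The bounds 0 and 25 match the indices used later in this post, while the dice rolls are just a made-up second example.

```python
import random

# A random integer between 0 and 25, both ends included.
index = random.randint(0, 25)
print(index)

# Ten hypothetical dice rolls, each between 1 and 6 inclusive.
rolls = [random.randint(1, 6) for _ in range(10)]
print(rolls)
```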

The next function is the maketrans function.

What is the python maketrans function?

Python has two types of maketrans functions: the static bytes.maketrans method and the static str.maketrans method. The former belongs to bytes objects and the latter to string objects. Both are used to make translation tables for mapping characters.

  • Bytes.maketrans:
  • The syntax is bytes.maketrans(from, to). It maps each byte in the from bytes object to the byte at the same position in the to bytes object, making a translation table to be used by the python translate function. From and to must be bytes objects with the same length. To create a translation table that maps ‘a’ to ‘e’, ‘b’ to ‘f’, and ‘c’ to ‘g’, in bytes, we could write the following code:

    
    original = b'abc'
    end = b'efg'
    translation_table = bytes.maketrans(original, end)
    

    When we have a translation table, the work of doing the actual translation is nearly complete.

  • Str.maketrans:
  • The syntax is str.maketrans(x[, y[, z]]) where y and z are optional arguments. When using the python str.maketrans function you are making translation tables that map python characters or Unicode ordinals to other python characters, Unicode ordinals or None. Note that a Unicode ordinal is the integer code that identifies a character. For example, ordinal 97 is character ‘a’ while ordinal 98 is character ‘b’.

    When only x is used as the argument in the python str.maketrans method, you must supply a dictionary to str.maketrans to make a translation table. Note that every translation table returned by str.maketrans is a dictionary that maps the source to the destination. Here is an example:

    Like before, we are mapping ‘a’ to ‘e’, ‘b’ to ‘f’, and ‘c’ to ‘g’. In the translation table, the Unicode ordinals for the characters are used to identify the characters.

    What if you specify two parameters to maketrans, i.e. x and y? When you do so, both x and y must be python strings of equal length. The first string, x, contains the characters to be replaced, and y contains their replacements in the same order. Maketrans will create a translation table mapping each character in x, the source, to the character in y at the same index. An example is below:

    If you want some characters to be mapped to None in the translation table, then you have to specify the third argument, z, when calling str.maketrans. Any character in z is mapped to None. Here is an example where d is mapped to None. Any character mapped to None is deleted during the translation of the actual message.
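The three ways of calling str.maketrans described in this section can be sketched as follows. The variable names are my own, and the mappings are the ones used in the text: ‘a’ to ‘e’, ‘b’ to ‘f’, ‘c’ to ‘g’, and ‘d’ to None.

```python
# One argument: a dictionary mapping source to destination.
table_from_dict = str.maketrans({'a': 'e', 'b': 'f', 'c': 'g'})
print(table_from_dict)       # {97: 'e', 98: 'f', 99: 'g'}

# Two arguments: two strings of equal length, matched by index.
table_from_strings = str.maketrans('abc', 'efg')
print(table_from_strings)    # {97: 101, 98: 102, 99: 103}

# Three arguments: every character in the third string maps to None.
table_with_delete = str.maketrans('abc', 'efg', 'd')
print(table_with_delete)     # {97: 101, 98: 102, 99: 103, 100: None}
```

Notice that the keys are always Unicode ordinals, whichever form you use.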

So, I believe you now understand how to create translation tables and you know that the translation tables use the Unicode ordinals for the characters. Therefore, instead of specifying characters, you could just write out the Unicode ordinals if you know them.

The next step is to do the translation. You use the python translate function to do the actual translation.

How to use the python translate function.

To do the actual translation from the translation table, you use the python translate function. There are two types of python translate functions, bytes.translate and str.translate, but I suggest you stick to just str.translate because most of the messages you will be translating will be python strings.

The syntax for str.translate is str.translate(table) where str is the message you want to translate and table is the translation table you will be using to do the mapping of the characters in the message. Notice from above that the translation table is a dictionary of Unicode ordinals to Unicode ordinals, strings, or None.

What the translate function does is take each character in the message and look for its corresponding key in the translation table. If the key exists, it replaces that character in the message with its value in the translation table. If the character does not exist in the translation table, the character in the message is left as it is. If in the translation table the character is mapped to None, it is deleted from the message.

Now, that’s a mouthful. Let’s illustrate all the above with examples.

First, let’s translate a message containing the characters in the source above. For example, supposing our message is ‘abccbaaaab’, how would it be translated?
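A minimal sketch of this first translation follows, using the two-string form of str.maketrans from earlier. The variable names are my own.

```python
# Map 'a' to 'e', 'b' to 'f' and 'c' to 'g'.
table = str.maketrans('abc', 'efg')

message = 'abccbaaaab'
translated = message.translate(table)
print(translated)    # efggfeeeef
```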

When you run the above you would notice that ‘abccbaaaab’ is translated to be ‘efggfeeeef’ since we are replacing all ‘a’ with ‘e’, all ‘b’ with ‘f’ and all ‘c’ with ‘g’.

Let’s take another example where some characters in the message do not appear in the translation table and also some characters in the translation table are mapped to None.
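Here is a sketch of such a case. The message ‘abde’ is my assumption, chosen so that it has one character mapped to None (‘d’) and one character that is missing from the table (‘e’).

```python
# 'a', 'b', 'c' are replaced; 'd' is mapped to None and so deleted.
table = str.maketrans('abc', 'efg', 'd')

message = 'abde'
translated = message.translate(table)
print(translated)    # efe
```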

If you look closely at the code, you will notice that the message has four characters but the translation has just three characters. ‘a’ and ‘b’ were translated as per the translation table to ‘e’ and ‘f’ respectively. In the translation table, ‘d’ is mapped to None so in the translation it is removed. There is an ‘e’ in the message, but there is no key ‘e’ in the translation table, so the ‘e’ character is left as is, untranslated.

So, we have what it takes for us to do our unstructured, random python cipher.

This is the source code. You can run the code before understanding the logic stated below just to see how it works. Run it more than once and see that each time you get a different encryption scheme.

This is the logic behind the code. We first create our source string for the translation table: all the lower-case letters of the alphabet. We then cast this source string to a list, values_list, which holds the values we are going to draw from for the destination string or replacement string. Since the alphabet has 26 letters, we enter a loop in line 7 that runs 26 times, producing one replacement character per iteration. On each iteration, we create a random index between 0 and 25. The index variable serves as an index into values_list. When we have a random index, we check whether that index still has a value in values_list (line 9). If it has a value, we append that value to destination_list, so the first value appended becomes the replacement for ‘a’ in the source string, the second for ‘b’, and so on. After placing that value in destination_list, we replace it in values_list with None, to tell the code that this index has been used. Each time the index lands on a value in values_list that is None (we have already used it), the code moves one step at a time through values_list, modulo 26, until it finds an unused value. When it finds one, it stops looking (lines 12-19).

This stepping through values_list makes the arrangement of the replacement string unstructured, without an obvious pattern, so the key cannot be described by a simple formula such as the fixed shift in Caesar’s cipher. I was thinking about this when I wrote a blog post on the unbreakable code and internet security that describes one-way functions. One-way functions are functions that are easy to compute in one direction but very hard to reverse. Making my replacement strings unstructured was my attempt to mimic this behavior.

When the destination_list is completely populated, we then convert the list to a string (line 20). So, right now, we have our source and destination or replacement strings, which make it possible to create the translation tables for encryption and decryption (lines 24-28). With the translation tables created, we do the actual encryption and decryption using a specified message and voila, it works (lines 31-36).
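The listing that the line numbers above refer to is not reproduced here, so they will not match the sketch below. This is only my reconstruction of the steps just described. The names source, values_list and destination_list come from the description, while encryption_table, decryption_table and the sample message are assumptions.

```python
import random
import string

# Source string: the 26 lower-case letters.
source = string.ascii_lowercase
# Letters still available as replacements; used ones become None.
values_list = list(source)
destination_list = []

for _ in range(len(source)):
    # Pick a random index between 0 and 25.
    index = random.randint(0, 25)
    # If that letter is already used, step forward (wrapping around
    # modulo 26) until an unused letter is found.
    while values_list[index] is None:
        index = (index + 1) % 26
    destination_list.append(values_list[index])
    values_list[index] = None   # mark this letter as used

# Convert the list of replacements to a string.
destination = ''.join(destination_list)

# One table for each direction.
encryption_table = str.maketrans(source, destination)
decryption_table = str.maketrans(destination, source)

message = 'happy pythoning'     # hypothetical message
encrypted = message.translate(encryption_table)
decrypted = encrypted.translate(decryption_table)

print(destination)
print(encrypted)
print(decrypted)
```

The inner while loop always ends because at least one unused letter is left in values_list on every pass of the outer loop. Characters that are not lower-case letters, such as the space, are left untouched by the translation.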

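You will notice that each time you run the code, you will get a different encryption scheme because the translation tables are randomized. That makes the key hard to guess by brute force: there are up to 26! possible tables, compared with the 25 shifts of Caesar’s cipher. What anyone using the code would have to do is run it once, save the translation tables in a database and use those saved tables for encryption and decryption. One weak point of my code is protecting the translation tables from hackers getting hold of them; otherwise I think it would be very difficult to hack this scheme by guessing the key. Bear in mind, though, that a scheme in which each letter always maps to the same replacement is a simple substitution cipher, and such ciphers are known to be vulnerable to letter-frequency analysis on longer messages.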

I challenge anyone to hack it without peeking at the translation tables.

If you want the source code for the unstructured, random python cipher, you can download it here.
