
Leveraging the Power of Loops with Loop Invariants

Have you ever been in a situation where it was hard to figure out what was wrong with your loop because it either ran forever without terminating or gave results you did not expect? One way to get a handle on your loops, and to show that they correctly produce the desired result, is to prove them with loop invariants.


What is a loop invariant?

A loop invariant is a proof technique used to show that the algorithm for a loop is correct. Many people think it is a very hard concept because it is usually expressed in mathematical terms, not realizing that it is not only simple but also gives you power over your loops. To derive a loop invariant, you should already have the algorithm for the loop, and because of this, loop invariant proofs are an intuitive way of proving an algorithm correct: the programmer already has a base to start from.

A loop invariant starts with a statement about a loop. Let’s call that statement L. In a loop invariant proof, we prove the following things about L:

a. L is true before the loop is entered.

b. L is true at the end of each iteration of the loop.

c. When the loop exits, L is still true.

Then, when the loop terminates, the truth of L combined with the termination condition should give us the goal we had in mind when formulating L.

Now let’s illustrate the properties above visually to show how they relate to an example loop:

# the Loop Invariant, L, must be true here
while ( TEST CONDITION ):
    # top of the loop
    ...
    # bottom of the loop
    # the Loop Invariant, L, must be true here
# Termination + Loop Invariant = Goal

Now, people sometimes confuse the loop invariant with the test condition that terminates the loop. They are two different things and should not be used interchangeably.

Now that we have the properties of a loop invariant, let us illustrate with example code how a loop invariant can be used to justify the correctness of an algorithm.

For example, consider the function below, smallest, which finds the smallest index at which the target, value, occurs in the sequence, S.

def smallest(S, value):
    ''' returns the smallest index at which
    value is found in sequence, S '''
    n = len(S)
    j = 0  # loop invariant true
    while j < n:
        if S[j] == value:
            return j  # match found at index j
        j += 1  # loop invariant true
    return -1  # loop invariant true

Now look at the algorithm and ask yourself: what can be true before the loop, true at the end of each iteration of the loop, and also true when the loop terminates (when j >= n)? I labelled the points where the loop invariant holds with comments to help you. You can see that before the start of the loop, we have not found the smallest index for value. If an iteration of the loop ends, it means we still have not found the smallest index for value. And if the loop eventually terminates and returns -1, we still have not found the smallest index for value. Therefore, the loop invariant here is L: the smallest index for value has not been found.
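To make the invariant concrete, here is a sketch (the _checked suffix and the assert statements are my additions, not part of the original function) that restates L more precisely as "value does not occur at any index below j" and checks it at the three points listed above:

```python
def smallest_checked(S, value):
    ''' same search as smallest, with the loop invariant
    L: "value does not occur at any index below j"
    checked at runtime with assert '''
    n = len(S)
    j = 0
    assert value not in S[:j]  # L is true before the loop is entered
    while j < n:
        if S[j] == value:
            return j
        j += 1
        assert value not in S[:j]  # L is true at the end of each iteration
    assert value not in S  # L at termination: value occurs nowhere in S
    return -1

print(smallest_checked([4, 7, 9, 7], 7))  # prints 1
print(smallest_checked([4, 7, 9], 5))     # prints -1
```

If any assert ever fired, it would tell us exactly where the loop's logic broke.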

So, at loop termination, where j >= n, we have reached our goal, which is to show that there is no index for value in S. But if there is one, we have also reached another goal, which is to show that the smallest index, j, for value exists. The invariant also helps us walk through the logic of the loop to show that, whichever outcome occurs, we will reach our goal.

Why would this prove that the algorithm is correct? Because in all the places where the loop invariant is true, it assures us that the loop will terminate rather than run endlessly, and that it will return the expected index, j, for value, if value is present in the sequence, S.

I believe you have gotten the drift of loop invariants and how they help in proving loop algorithms.

Let’s do one more example, this time using a for loop. In the code below, we want to prove that the function, sum_values, will always return the sum of the items in the sequence, S, provided the items are ints.

def sum_values(S):
    ''' returns the sum of the items in S
    where items are ints '''
    total = 0  # loop invariant true
    for item in S:
        total += item  # loop invariant true
    return total  # loop invariant true

Now, what would be the loop invariant? Look carefully at the code. Before the loop is entered, we declare a variable total and assign it the value 0. Now, total is supposed to give us the total sum of the items in S. So, before the loop is entered, total is 0, which is the correct sum of the items processed so far (none). This is true. When we enter the loop, at the end of each iteration, total gives us the sum of all the items seen up to that point. This is also true at the end of each iteration. So, the loop invariant, that total gives the sum of the items processed so far, holds. When the loop exits, that is, when the items in S are exhausted, total holds the sum of all the items in S. The invariant is true here as well. So, we have proved that our algorithm is correct: total, which the function returns, will always give us the sum of the items in the sequence, S.
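The same invariant can be checked at runtime. In this sketch (the _checked suffix, the processed counter and the assert statements are my additions for illustration), the invariant is stated as "total equals the sum of the items processed so far":

```python
def sum_values_checked(S):
    ''' same summation as sum_values, with the invariant
    "total equals the sum of the items processed so far"
    checked at runtime with assert '''
    total = 0
    processed = 0  # how many items we have consumed
    assert total == sum(S[:processed])  # true before the loop
    for item in S:
        total += item
        processed += 1
        assert total == sum(S[:processed])  # true after each iteration
    assert total == sum(S)  # true at termination
    return total

print(sum_values_checked([3, 5, 7]))  # prints 15
```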

Interesting, not so? Yes, and a very good way to prove your loops. I tell you, I have not had loops that give false results or run into an infinite loop since I discovered loop invariants. That is why I thought it pertinent to share them with you. It would be selfish of me not to.

I hope you enjoyed this post and would like to use loop invariants to prove your algorithms. You will surely enjoy doing so. If you would like to receive regular updates from my blog, just subscribe by email. Hope to see you soon.

Happy pythoning.

Check If Python List Contains Unique Elements Or Items

While going through a Reddit group post on python recently, I saw a question that was similar to the element uniqueness problem and the way the answer was formulated struck me as odd. I thought the implementation was not optimal or efficient, so I volunteered an answer that I knew was better.


The question was: how do I check to see if my python list contains elements or items that are unique or not duplicates?

And the response code that was given was this:

def unique_bad(S):
    ''' S a list or sequence
    returns True if all the elements are unique
    '''
    for i in range(len(S)):
        for j in range(i+1, len(S)):
            if S[i] == S[j]:
                return False
    return True

There is nothing wrong with the code because it really does the job, but it does the job badly. How do we know that? Let’s take the worst-case scenario for the code. The code has two loops: the outer loop with the i variable and the inner loop with the j variable. In the worst case, the outer loop runs n times, where n is the length of the sequence. Across all those outer iterations, the inner loop performs 1 + 2 + 3 + … + (n−1) = n(n−1)/2 iterations in total. So, in the asymptotic sense, the worst-case running time for this code is O(n²). This is not really too bad; a quadratic running time for a program is fair enough. But I called it bad because there is a better way to implement this if we understand the nature of the problem.

You can see that for each outer iteration, i, the code looks for a duplicate in the inner iteration, j, and this is very repetitive. In the fortunate case where two duplicates lie side by side, the code exits early, but if not, it takes much longer. So, taking this idea further: why don’t we arrange things so that if there are duplicates, they lie side by side? This would let us write code with a shorter worst-case running time.

In fact, we can do so by first sorting the list. When we sort the list, if there were duplicates they would lie side by side and then when we iterate through it looking for duplicates, we would catch duplicates very early.

That is what this better code seeks to do.

def unique_better(S):
    ''' S a list or sequence
    returns True if all the elements are unique
    '''
    new = sorted(S)
    for j in range(1, len(new)):
        if new[j-1] == new[j]:
            return False
    return True

You can see that I sorted the python list or sequence first and then iterated through it looking for adjacent duplicates. Now the loop runs in linear time rather than quadratic time. If you look through python’s documentation, you will discover that the worst-case running time of the sorted() function is O(n log n) while the loop takes O(n) time, so the code runs in O(n log n) time, which is far better and more effective than the earlier code that runs in quadratic time.
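As an aside not covered in the post: when the items are hashable, a set gives an even simpler check, since a set keeps only one copy of each item. This is a sketch of that idea:

```python
def unique_set(S):
    ''' returns True if all the items in S are unique;
    requires the items to be hashable '''
    # the set discards duplicates, so the lengths differ
    # exactly when S contains a duplicate
    return len(set(S)) == len(S)

print(unique_set([1, 2, 3]))     # prints True
print(unique_set([1, 2, 2, 3]))  # prints False
```

Building the set takes O(n) time on average, though unlike the sorting approach it will not work on unhashable items such as lists.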

So a little understanding of the nature of the problem can lead one to writing better code.

I just wanted to point this out because the question on Reddit intrigued me. It was an example of some of the problems that sorting can easily solve if only the nature of the problem was understood.

Happy pythoning.

Emulating Lists In Your Python Classes With MutableSequence

Python provides for ways we can emulate the native data structures in our own defined classes. In this post, I will show you one way you can emulate a list. A list is a mutable sequence that has items which are ordered by index.


In my post on operator overloading and dunder methods, I showed you a way a class can behave like a list. But we can extend that concept a little further by implementing a much simpler emulator, because in that post I was concentrating on illustrating operator overloading and dunder methods, not on making the emulation simple.

The recommended way to emulate a list in your python classes is to inherit from the MutableSequence class, which is found in the collections.abc module. This class gives your classes the power to act like lists, with all the methods and functions that you find on a list. But since this class is an abstract base class (I discussed abstract base classes in the python abstraction post), there are some abstract methods that first need to be implemented by your class. These abstract methods are __getitem__(), __setitem__(), __delitem__(), __len__(), and insert(). You will notice that most of these are dunder methods, and they are what give your class the functionality of a list.

I will use an example to illustrate this process, a Student class, that inherits from MutableSequence. A student object has a name and a list of grades.

So, the constructor for the Student class will go like this:

def __init__(self, name):
    self.name = name
    self.grades = []

Then to implement the __getitem__() abstract dunder method from the MutableSequence ABC, we used this code:

def __getitem__(self, key):
    ''' retrieves an item by its index, key '''
    return self.grades[key]

It returns the grade or item at the given key or index.

Then to implement the __setitem__() abstract dunder method, we used this code:

def __setitem__(self, key, value):
    ''' set the item at index, key, to value '''
    checkvalue(value)
    self.grades[key] = value

The method is passed the key and value we want to modify, and before it does so, it first checks whether the value is an int. The checkvalue function is defined within the module. If the value is not an int, checkvalue raises a TypeError; if no error is raised by the check, __setitem__ proceeds to modify the value at the given key. Here is the checkvalue code:

def checkvalue(value):
    if not isinstance(value, int):
        raise TypeError('Pass in an int object')

Now to implement the __delitem__() abstract dunder method, we used this code:

def __delitem__(self, key):
    ''' removes the item at index, key '''
    del self.grades[key]

As you can see it simply removes the grade or item at the given key.

Then to implement the __len__() method we used the following code:

def __len__(self):
    return len(self.grades)

The statement in the body of the __len__() dunder method just calls on the built-in len method on our grades list and returns the result.

Now finally, we will implement the insert() abstract method to complete the inheritance using this code:

def insert(self, key, value):
    ''' add an item, value, at index, key '''
    checkvalue(value)
    self.grades.insert(key, value)

Using this method you can now insert a grade at any given position in the grades list of the Student class.

Now, our Student class has implemented all the required abstract methods from the MutableSequence class. That makes it a list emulator. So, with all this done, we can use student objects as lists.

Below is a complete code that you can run showing just how it was done. In the driver code, the student object called on the append, reverse, pop and sum functions which are all native list object functions. You can read the driver code and then run it to see how beautifully the student object, michael, now acts like a list with the above mentioned functions.
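In case the embedded code does not render here, below is the class assembled from the snippets above, together with a small driver. The driver values (the student name and the particular grades) are my own illustrative choices:

```python
from collections.abc import MutableSequence


def checkvalue(value):
    if not isinstance(value, int):
        raise TypeError('Pass in an int object')


class Student(MutableSequence):
    ''' a student with a name and a list of grades,
    emulating a list by inheriting from MutableSequence '''

    def __init__(self, name):
        self.name = name
        self.grades = []

    def __getitem__(self, key):
        return self.grades[key]

    def __setitem__(self, key, value):
        checkvalue(value)
        self.grades[key] = value

    def __delitem__(self, key):
        del self.grades[key]

    def __len__(self):
        return len(self.grades)

    def insert(self, key, value):
        checkvalue(value)
        self.grades.insert(key, value)


# driver code: append, reverse, pop and sum all come for free
# from the MutableSequence mixin methods and the iteration protocol
michael = Student('Michael')
michael.append(90)
michael.append(75)
michael.append(88)
michael.reverse()
print(list(michael))  # prints [88, 75, 90]
print(sum(michael))   # prints 253
print(michael.pop())  # prints 90
```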

You can download the code here, student_mutablesequence.py, so that you can try out other list functions to see how it perfectly emulates a list.

Now, you have one more weapon in your arsenal that shows the wonderful capability of python to give power to user classes through various means. In fact, you can emulate any container data type in python with the abstract base classes in the collections.abc module. Check it out and experiment with doing so. You will surely find it useful one day.

Happy pythoning.

Operator Overloading and Dunder methods in Python

In python, operator overloading means giving an extended meaning to an operator beyond its native meaning. An example that comes to mind is the + operator. You must have noticed that it has several meanings depending on the type of its operands. If the operands are ints or floats, as in 2 + 3, then the + operator means add the two numbers together; but if the operands are strings, as in 'dog' + 'cat', then the + operator means concatenate the two strings. You can also use the + operator on lists or any other sequence type. This is what operator overloading is all about. Now, would it not be nice if we could carry out this same functionality in our newly created classes? For that, python provides dunder methods, which make operator overloading possible in newly created classes and not only in built-in data types.


Dunder methods (the name is short for "double underscore") are methods that have a double underscore before the name and a double underscore after the name. If you read my post on OOP in classes and objects, you saw one when I implemented the __init__() method. Some people prefer to call dunder methods special methods or magic methods. This set of methods enables you to enrich your newly created classes so that they can emulate the behavior of the built-in data types.

Would it not be nice to be able to create a class that emulates all the fancy methods of the list class, or even the string class? That would make your new class not only more flexible but also a very interesting object.

Below is a list of some dunder methods in the python data model. I will be implementing them with examples. If you want the complete list, you can consult the python data model reference.

Common Syntax    Dunder or Special Method Forms
a + b            a.__add__(b); RHS: b.__radd__(a)
a - b            a.__sub__(b); RHS: b.__rsub__(a)
a / b            a.__truediv__(b); RHS: b.__rtruediv__(a)
v in a           a.__contains__(v)
iter(a)          a.__iter__()
next(a)          a.__next__()
str(a)           a.__str__()

To illustrate the table above before we begin creating example classes, let’s take the + operator and its dunder method, __add__(). Imagine you created a new class and you want to be able to add instances of the class together. If you do not implement the __add__() dunder method, python will raise an error. But the moment you implement __add__(), whenever python sees a statement like a + b, where a and b are instances of the new class, it searches for a definition of the dunder method __add__() and, when it finds it, calls it. Python gives deference to the object on the left side of the operator: it first makes a call of the form a.__add__(b), where a is the calling object and b the parameter. Where object a doesn’t have an __add__() method, python then looks to the right side of the operator, at object b, to see whether b has the special method, using the alternative form b.__radd__(a). Notice the difference in the names for the left-hand-side call and the right-hand-side call. When the right-hand side doesn’t implement the dunder method either, python raises an error.

This protocol for calling dunder methods means that even where the right-side and left-side operands are different types, the operation can be executed, provided one of them implements the dunder method for that operator. The distinction between __add__() and __radd__() also makes it possible to execute an operation for a class that is non-commutative.
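Here is a minimal sketch of that protocol (the Grade class and its scores are hypothetical, not taken from the post’s downloadable code). Putting an int on the left exercises __radd__(), because int’s own __add__() gives up on a Grade operand:

```python
class Grade:
    ''' a grade wrapping an int score (illustrative sketch) '''
    def __init__(self, score):
        self.score = score

    def __add__(self, other):
        # handles grade + grade and grade + int
        if isinstance(other, Grade):
            return Grade(self.score + other.score)
        return Grade(self.score + other)

    def __radd__(self, other):
        # handles int + grade, after int's __add__ returns NotImplemented
        return Grade(other + self.score)


print((Grade(70) + Grade(20)).score)  # prints 90
print((5 + Grade(70)).score)          # prints 75
```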

As you can see from the list I provided above, special or dunder methods are not restricted to operator overloading. There are functions built into the native data types that we want our new classes to emulate: for example, the ability of our new classes to behave like sequences, or to have string representations. Dunder methods exist for that too. For example, given the statement str(a), which produces a string representation of object a, new classes cannot automatically support this behavior, but implementing the dunder method __str__() enables them to. Also, remember that an object can be an iterable if it implements the __iter__() method, such that it can participate in for loops.

In order not to burden you with explanations, I will illustrate all the above with example classes: a Student and Grades class. A student object is simply an identity for a student with a collection of grades. The list data structure is used to hold this collection. A grade object is just a grade that is an underlying int. In our Grades instance objects, we want to be able to add two grade objects, and divide two grade objects when looking for the average grade. We also want to have a string representation of grades so it can be printed. For a student object, we want the student object to act like an iterator so it can participate in for loops, we want a means of printing out the contents of the grades list of any student and also be able to run the ‘in’ operator on a student to find out if a grade is in the list of grades. Instances of the student and grade objects cannot do this unless we implement the dunder methods for them.

So, using examples, we will show how to implement all the desired behavior for the grade and student objects. But first I would like you to see how the Student and Grades classes were defined here.

You can download the class definitions here if you desire an in-depth look, student_grade.py.

Following is a concise explanation of the several operator overloading we are doing and the dunder methods that were implemented.

1. Overloading the + operator using dunder method, __add__()

To be able to add grades together for a student so that we can get the sum of that student’s grades, we need to overload the + operator. We do this using __add__() dunder method, or special method. In the Student class, I defined a sum_of_grades() method that accumulates all the grades for a student and when adding grades together it calls the dunder method of the Grades class.

The __add__() dunder method (lines 56-57) just takes the scores of two grade objects, sums them, and returns the sum. Just that. In the sum_of_grades() method, lines 35-42, we create a total variable that is also a grade object, which is used to accumulate the values. This method, after adding the grades together, returns the total.
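If the downloadable file is not to hand, here is a sketch reconstructed from the description above (the class names follow the post, but the exact method bodies and the sample scores are my guesses, not the author’s lines 35-42 and 56-57):

```python
class Grades:
    ''' a grade with an underlying int score '''
    def __init__(self, score):
        self.score = score

    def __add__(self, other):
        # sums the underlying scores of two grade objects
        return Grades(self.score + other.score)


class Student:
    def __init__(self, name, grades):
        self.name = name
        self.grades = grades  # a list of Grades objects

    def sum_of_grades(self):
        # accumulate into a Grades object; + calls Grades.__add__
        total = Grades(0)
        for grade in self.grades:
            total = total + grade
        return total


michael = Student('Michael', [Grades(90), Grades(75)])
print(michael.sum_of_grades().score)  # prints 165
```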

With this example, I believe that now you should have a basic understanding of how operator overloading and dunder methods are implemented.

Let’s take more examples.

2. Overloading the division operator, /, with the __truediv__() dunder method

Another operation we want to perform on the student grades is to get the average grade after summing them up. That requires a division operation. To be able to do this with our grade objects, we need to overload the division operator, /, using the dunder method __truediv__().

Let’s take an example. Run the code below to see how it goes.
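In case the embedded code does not render, here is a sketch of the idea (the Grades class and the sample scores are reconstructed from the description, not copied from the post’s file):

```python
class Grades:
    ''' a grade with an underlying int score '''
    def __init__(self, score):
        self.score = score

    def __truediv__(self, other):
        # called for grade / grade; returns the plain quotient
        return self.score / other.score


total = Grades(180)   # e.g. the sum of two grades
count = Grades(2)     # the number of grades
print(total / count)  # prints 90.0
```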

The code that makes this possible is the implementation of the __truediv__() method in lines 59-61, where we just return the division of two grades. Then in the Student class definition, the method average_grade, when doing the actual division, calls on the dunder method, with one grade object as the numerator and another as the denominator.

3. Dunder method, __str__() for string representation of objects

In order to have string representations of our grade and student objects, we have to implement the __str__() dunder method. This makes it possible to print grade objects or student objects to the screen. When print is called on a student or grade object, it casts the object to a string using str(object), and this cast looks for the __str__() dunder method on the object. That is why we needed to implement this dunder method in our objects.

Let’s give an example of how this was done in the code. First, I will print out one grade object and another student object.
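A sketch of such a __str__() implementation (the exact format string and the sample subject and score are my choices, not the author’s):

```python
class Grades:
    ''' a grade with a subject and an int score '''
    def __init__(self, subject, score):
        self.subject = subject
        self.score = score

    def __str__(self):
        # print() and str() both end up calling this method
        return f'{self.subject}: {self.score}'


math_grade = Grades('Math', 95)
print(math_grade)       # prints Math: 95
print(str(math_grade))  # prints Math: 95
```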

If you look at the last two lines of the driver code, we called print on math_grade, a Grades object, and also on michael, a Student instance. Print then called their dunder methods and printed out what was implemented there. Lines 44-45 are the implementation of the dunder method for Student instances, while lines 63-64 are the implementation for Grades instances.

4. Dunder method to make the Student instance objects iterators

In my post on iterators, I explained that for an object to be an iterator (that is, to act like a container that gives out its values when needed), it needs to implement the __next__() and __iter__() dunder methods. Any object that implements these two methods can participate in for loops and can be treated like an iterable, because all iterators are iterables. In fact, many of the capabilities of containers become possible for such an object, and containers are at the heart of python data structures.

So, to make Student instances iterators, I implemented the __next__() dunder method (lines 25-32) and the __iter__() dunder method, which for an iterator simply returns self (lines 22-23).

Now our student objects can participate in for loops like native data structures. We can also use the ‘in’ operator on our student objects.

The last five lines of the driver code show how I used the student object, michael, in a for loop and also used the ‘in’ operator.
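A sketch of the iterator part (the grade values are illustrative and the _index attribute is my own bookkeeping, not necessarily the post’s lines 22-32): __iter__() returns self, and __next__() hands out one grade per call, raising StopIteration when the grades are exhausted:

```python
class Student:
    ''' a student whose instances are iterators over their grades '''
    def __init__(self, name, grades):
        self.name = name
        self.grades = grades
        self._index = 0  # position of the next grade to hand out

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self.grades):
            raise StopIteration
        grade = self.grades[self._index]
        self._index += 1
        return grade


michael = Student('Michael', [90, 75, 88])
for grade in michael:
    print(grade)  # prints 90, then 75, then 88
```

Note that an iterator like this is exhausted after one pass; you would need a fresh object (or a reset of _index) to loop over the grades again.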

This brings us to an interesting point I would like you to be aware of. Container-like classes in python get some behavior by default when it comes to dunder methods. If you look at the list of operators and their dunder methods above, you will see that the ‘in’ operator has a dunder method, __contains__(), but we did not have to implement it in our Student class. This is because when we gave the Student class iterator functionality, making it container-like, python could fall back on iteration to support the ‘in’ operator by default.

If you read through the referenced data model, you will also find other defaults that some classes have. I really cannot go on describing all the operator overloading capabilities and dunder methods that exist; it is a very long list. I recommend you consult the data model pages, whose link is above, if you want to find other examples.

I hope I increased your knowledge of operator overloading and dunder methods in this post. If you want to receive regular updates like this, just subscribe to the blog.

Happy pythoning.

Python Modules And Libraries: The Essentials

The pillar of modular programming in python is the module. A module, in its simplest form, is any script that has the .py extension. Each of these files has classes, functions and definitions which can be used by anyone who imports the module. Another name for modules in python programming is libraries. You must be familiar with several python libraries or modules like the os, math, numpy, matplotlib and random modules, because I have written articles on some of them. In this post, I will dwell on the essentials of python modules and libraries along with their benefits.


The benefits of python modules and libraries.

Imagine having to write all the code yourself for any task you want to carry out, whenever the function is not built in. That would be time consuming, not so? Modules increase our productivity by allowing us to concentrate on the specific parts of our code that no one has written before. Since a lot of programmers are writing code all the time, we can leverage our own productivity by using their code rather than reinventing the wheel.

Also, python modules make bugs easier to isolate. When classes, functions and definitions live within their own namespaces in a module, you can easily isolate a bug in your code, rather than bundling everything into one namespace in one file.

Modules also increase the readability of code. It is easier to read code that has fewer lines and imports from other modules than code that has so many lines that your eyes get tired. This is what modularity is about.

Getting Python modules list in your computer.

To get a listing of all the available python modules on your machine, issue the command help('modules') at the python interpreter shell, and it will print a list of all the python modules available on your machine.

You would get a list of module names similar to the one on my machine.



How to use modules in your own programs

Suppose you have a module or python library whose functions or classes you want to use. Imagine you are doing some math calculations and want to use the pi definition and the sine function, and you know that a module named math has those definitions and functions. How do you use the module?

To do so, we use the import keyword. All you would have to do is to import the math module this way:

import math

Here is the syntax to import any module into your programs:

import module_name

Where module_name is the module you are importing into your program. When you import a module, python loads the definitions from the module into the program’s namespace. In doing so, it gives you the ability to use the module’s definitions in your program. For example, after importing the math module, I would want to use the pi definition as well as the square root function. This is what I would do:
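A small program using the qualified names might look like this:

```python
import math

# the definitions are qualified with the module name, math
print('Pi is:', math.pi)
print('The square root of 36 is:', math.sqrt(36))
```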

Notice in the program above that I used the definition for pi and the square root function, sqrt(), that are included in the math module, without having to write them myself. That is the power that modularization gives you.

Notice that in the code above, I first qualified the definitions with the module name. This prevents name conflicts if I need to use the same name in my program. But where you are sure there will be no name conflict, you can use pi or sqrt() without qualification by importing them specifically from the math module this way:

from math import pi, sqrt

print('Pi is:', pi)
print('The square root of 36 is:', sqrt(36))

Notice that when importing from the math module I used the keyword ‘from’ and then specifically imported only the pi definition and the sqrt function. This let me use them without qualification.

Most times, in order to avoid ambiguity, I prefer qualifying definitions rather than not qualifying them.

There are times when the name of a module is very long and we don’t want to type such a long name often. Then we can assign the module an alias, which serves as a shorthand. For example, in my xml parsing program from another post, the module name was very long, so I aliased it with a shorthand of the form:

import xml.etree.ElementTree as etree

tree = etree.ElementTree(etree.fromstring(xml))

You can see that each time I wanted to use the module as a qualifier, I used the shorthand rather than the complete long name. Let’s use a shorthand for our math module:
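A shorthand for the math module could look like this (the alias m is my own choice):

```python
import math as m

# the alias m now stands in for the full module name
print('Pi is:', m.pi)
print('The square root of 36 is:', m.sqrt(36))
```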

Importing Modules that are not in your machine

What if a module you want to use is not already installed on your machine, but you know its name? Not to worry, because python has a convenient and easy way of installing modules onto your machine. The program is called pip.

To install a module named module_name onto your machine, just use the following command in your terminal:

python -m pip install module_name

And pip will get the process running. It will even give you a report of how the installation is progressing.

How to create your own modules

You can easily create your own modules with their own definitions and functions. All you need to do is write the definitions and functions in a file, name the file whatever you wish, and give it the .py extension. That file can now be used as a module. If you want another program to use the definitions in the module, just save the file in a directory on your PYTHONPATH, or in the same directory as the program that will use it. That’s all.

Let me walk you through the process.

Suppose you wrote a cool factorial function, my_factorial(), and you want to be able to reuse it in subsequent programs. You could make it part of a module named fact_func.py, save that file in a directory where a program can find it, and then all the program has to do is import the fact_func module and use the my_factorial function.
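The module file could look like the sketch below (the body of my_factorial is my own illustrative implementation, since the original was not shown):

```python
# fact_func.py -- save this file where your program can find it
def my_factorial(n):
    ''' returns n! for a non-negative int n '''
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

# in another program, in the same directory, you would then write:
#     import fact_func
#     print(fact_func.my_factorial(5))  # prints 120
```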

When commands in a module should not be executed

There are times when you have written a module and you don’t want programs importing it to execute some statements or functions, except when the module is run directly. For example, this might apply to commands for testing the module. Python has a handy way of telling importing programs not to execute some commands: you put those statements in a separate section of the file, delimited by the statement:

if __name__ == '__main__':
    '''statements in this section
    are not executed when the module
    is imported'''

Notice how I used the if statement above to delimit the section whose statements python will not execute when the module is imported.

Let’s do this with an example. Take our factorial function: after writing it, I would want to write a function that tests it to make sure it is working properly. I don’t want any program importing this module to run the test function, but I want to keep it for the record in case I need to make changes to the factorial function in the future. So I could do the following:
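Continuing the fact_func.py sketch from above (the test function body is my own illustration), the module would become:

```python
# fact_func.py
def my_factorial(n):
    ''' returns n! for a non-negative int n '''
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


def test_my_factorial():
    ''' sanity checks for my_factorial '''
    assert my_factorial(0) == 1
    assert my_factorial(5) == 120
    print('all tests passed')


if __name__ == '__main__':
    # runs only when this file is executed directly,
    # never when the module is imported
    test_my_factorial()
```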

Notice that I created a separate section for testing, beginning with the if… statement above, and ran the test in that section. When anyone imports this module, the test will not run; it executes only when the module is used directly, which is exactly what I want when debugging or rewriting the function.

What are the best python modules

I often read online about the top ten python modules and libraries everyone should know. That is just a fallacy and clickbait. There is no top ten. Modules are created to fill a void where one exists. You choose a module because its definitions and functions help you in your program; if it has the definitions and functions you need, it is a module you would use, otherwise you have no business with it. So, the top module for any programmer is the one that gets his task done best.

But when it comes to data science, there are some modules you should be aware of because they do the job best, like matplotlib for data visualization and plotting, and numpy for multi-dimensional arrays and numerical programming.

So, I hope I helped increase your python knowledge with this post.

Happy pythoning.

Python Abstraction: A Guide

The notion of abstraction comes from the ability to distil a complicated system into its fundamental parts. It involves hiding the complexity and showing only the fundamentals. Take a car, for example. A car has several functionalities when you enter it: a steering wheel, brakes, seats, lights, etc. When abstracting a car, you break it down into these component parts and give them names that describe each part and its functionality within the whole, the car.

python abstraction

In python, when you specify a data type as an abstract data type, you specify the type of data that is stored, the operations supported on it, and the types of the parameters of those operations. This is called the public interface of the abstract data type. Python gives you a great deal of latitude in specifying an interface, because interface specification in python uses duck typing. That means data types are not checked at compile time, and there are no formal requirements that abstract classes be declared. Instead, the programmer assumes that an object supports a known set of behaviors; if it does not, the interpreter raises an error at runtime. Duck typing is attributed to the poet James Whitcomb Riley, who stated that “when I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.”

Mechanism for supporting abstract data types.

Abstraction and interfaces in python are generally implemented using abstract classes. Formally, python provides an abstract base class (ABC). An abstract base class cannot be instantiated, but it defines one or more common methods that all subclasses of the class must have. You realize an ABC by declaring one or more concrete classes that inherit from the abstract base class while providing implementations of the methods declared by the ABC. To declare a class as an abstract base class, the class needs to inherit from ABC in the abc module, the module that provides formal support for ABCs.

When an ABC needs to be an interface, it just provides the method names without any method body. Each such method is decorated with @abstractmethod to tell the python interpreter to look for the implementation of that method in the subclasses, or concrete classes, inheriting from the ABC.

Let’s give an example of data abstraction in python. Say a bank provides three methods of payment – payment by credit card, payment by mobile phone, and payment through social media like whatsapp. The banking app could provide a Payment ABC that declares an abstract method, payment, which need only be implemented by the concrete classes inheriting from the Payment class.

Let’s code the Payment class ABC as a python abstract class.

    
from abc import ABC, abstractmethod

class Payment(ABC):

    def print_receipt(self):
        print('Receipt printed for payment', 
                  self.__class__.__name__)

    @abstractmethod
    def payment(self):
        pass 

You can see from the code above that I first imported ABC from the abc module. The Payment class then inherits from the ABC class in that module, making it a python abstract class. The Payment class has two methods: a concrete method, print_receipt, that applies to all instances of the class including subclasses, and an abstract method, payment, which will have to be implemented by the concrete classes inheriting from it. I decorated the abstract method with the @abstractmethod decorator to tell python what to expect. Notice that in the print_receipt method, I reference the class name of the object calling the method so that we can tell which subclass we are printing the receipt for.

Now, let’s write concrete classes that inherit from the Payment class and see how they implement the payment method, abstracting its functionality so they can define their own way of payment.
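Here is a sketch of what those concrete classes could look like. The subclass names and print messages are my own illustration, and the Payment ABC from above is repeated so the snippet is self-contained:

```python
from abc import ABC, abstractmethod

class Payment(ABC):  # the Payment ABC defined above
    def print_receipt(self):
        print('Receipt printed for payment',
              self.__class__.__name__)

    @abstractmethod
    def payment(self):
        pass

# hypothetical concrete classes; names and messages are my own illustration
class CreditCardPayment(Payment):
    def payment(self):
        print('Paid by credit card')

class MobilePayment(Payment):
    def payment(self):
        print('Paid by mobile phone')

class WhatsappPayment(Payment):
    def payment(self):
        print('Paid through whatsapp')

# each object calls its own implementation of the abstract method
for p in (CreditCardPayment(), MobilePayment(), WhatsappPayment()):
    p.payment()
    p.print_receipt()
```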

You can see now that each subclass only needs to implement the method declared as an abstract method. If they don’t, python will give an error. After implementing that class, they behave differently while still being recognized as members of the Payment class.

Whenever an object that implements the abstract method invokes it, the object’s own implementation is called and executed. That’s how python abstraction works. You don’t have to peer at what is happening under the hood, because all you need to know is that mobile payment, credit card payment, and whatsapp payment work.

I believe you now have a good working knowledge of how abstraction in python works and can implement your own abstract classes and interfaces. Data abstraction in python allows your class to have several objects doing the same thing in their own distinct way. You need not know how it works but just that it works. Isn’t data abstraction in python beautiful? It’s a cool feature of the language.

In fact, python abstraction is one of the OOP concepts every programmer needs to know. I have covered other OOP concepts like inheritance and polymorphism in other posts.

Happy pythoning.

Constructing An XML Parser In Python

XML, or extensible markup language, is a markup language defining a set of rules for encoding documents such that they are both human-readable and machine-readable. The World Wide Web Consortium has published a set of standards that define XML. You can reference the specifications here. Although XML was initially designed for documents, its use cases have grown to include several other types of media and files.

python xml parser

 

A well-formed XML document would, among other things, include elements and attributes. An element, like in HTML, is a logical document component that begins with a start-tag and ends with an end-tag. A start-tag is denoted as <tag_name> while the end-tag is denoted as </tag_name>. An empty tag is a combination of both and is denoted as <tag_name />. An element could also have attributes within its start-tag or empty tag. Attributes are name-value pairs, and each name can have only one value. An example of an element with an attribute is <subtitle lang='en'>, where the subtitle element has the lang attribute with the 'en' value. At the top of the XML document is the root element, which is the entry point into the document.

Now, in our code that parses XML we will only be dealing with elements and attributes.

To parse XML, python has an API for doing that. The module that implements the API is the xml.etree.ElementTree module, so you would have to import this module into your python file to use the API.

What the xml.etree.ElementTree module contains

This module is a simple API for parsing and creating XML in python. Although it is robust, it is not secure against maliciously constructed data. So, take note. Among several classes of interest, for our parsing activity we will be concentrating on two classes in this module – ElementTree which represents the whole XML document as a tree, and Element which represents a single node in the tree.

To import an XML document you could import it from a file or pass it as a string.

To import it from a file use the following code:

    
import xml.etree.ElementTree as etree
tree = etree.parse('data.xml')
root = tree.getroot()

while to get it directly from a variable as a string use the following code:

    
import xml.etree.ElementTree as etree
xml = 'data as string'
root = etree.fromstring(xml)

The root variable above refers to the root element in the XML document.

The ElementTree constructor

We will be using the ElementTree constructor to get to the root of our XML document, so it is worth mentioning here. The syntax of the constructor is xml.etree.ElementTree.ElementTree(element=None, file=None). The constructor can accept an element, which serves as the root element, or you can pass it a file that contains the XML document. What it returns is the XML document as a tree that can be interacted with.

One interesting method of this class is getroot(). When you call this method on an ElementTree instance, it returns the root element of the XML document. We will use the root element as our doorway into the XML document, so take note of this method because we will be using it in our parsing code below.

That’s all we need from ElementTree class. The next class we will need is the Element class.

Objects of the Element class.

This class defines the Element interface. Its constructor is xml.etree.ElementTree.Element(tag, attrib={}, **extra). We will not be creating any elements ourselves, just using the attributes and methods, but you can see from the constructor definition that an XML document element has two things: a tag and a dictionary of attributes. Objects of this class represent every element in the XML document.

Some interesting attributes and methods we will be using from this class are:

a. Element.attrib: This returns a dictionary that represents the attributes of the said element. What is included in the dictionary are name-value pairs of attributes in the Element or what some call Node in an XML document.

b. Element.iter(tag=None): this is the iterator for an element. It recursively iterates through the element’s children and gives you all of them, even the children of its children, in depth-first order. You can filter the results by providing a tag argument that specifies the tag whose elements you want to receive. If you do not want the children recursively but only the first-level children of an element, use the next method below.

c. list(element): casting an element to a list returns a list of all the first-level children of the element. This replaces the former Element.getchildren() method, which is now deprecated.

So, I believe you now have a simple introduction into some of the features of the xml.etree.ElementTree module. Now, let’s implement this knowledge by parsing some XML documents.

The XML document we are going to parse is a feed for a blog. The XML document is given below:

    
<feed xml:lang='en'>
        <title>SolvingIt?</title>
        <subtitle lang='en'>
               Programming and Technology Solutions
                     </subtitle>
        <link rel='alternate' type='text/html' 
         href='https://emekadavid-solvingit.blogspot.com' />
        <updated>2020-09-12T12:00:00</updated>
        <entry>
            <author>
                <name>Michael Odogwu</name>
                <uri>
                https://emekadavid-solvingit.blogspot.com
                </uri>
            </author>
        </entry>
    </feed>   

You can reference this document in the code while reading the code. You can see that the XML document has elements or nodes and the root tag is named feed. The elements also have attributes.

The first task is to find the score of the XML document. The score of the XML document is the sum of the scores of its elements, and the score of an element is the number of attributes it has.

The second task is to find the maximum depth of the XML document. That is, given an XML document, we need to find the maximum level of nesting in it.

So, here is the code that prints out the score and maximum depth of the XML document above. I want you to run the code and compare the result with what you would have calculated yourself. After running the code, the next section explains the relevant points in the code, along with a link to download the script if you want to take an in-depth look at it.
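Here is a sketch of such a script, reconstructed from the line-by-line walkthrough that follows; I have laid it out so that the line references in the explanation (line 1, lines 3-8, lines 12-20, lines 23-35, and so on) match this listing:

```python
import xml.etree.ElementTree as etree

def get_attr_number(node):
    # sum each element's attribute count over the whole tree
    total = 0
    for i in node.iter():
        total += len(i.attrib)
    return total

maxdepth = 0

def depth(elem, level):
    # recursively record the deepest level of nesting
    global maxdepth
    level += 1
    if level > maxdepth:
        maxdepth = level
    for child in list(elem):
        # list(elem) gives the first-level children
        depth(child, level)

xml = '''\
<feed xml:lang='en'>
    <title>SolvingIt?</title>
    <subtitle lang='en'>Programming and Technology Solutions</subtitle>
    <link rel='alternate' type='text/html'
     href='https://emekadavid-solvingit.blogspot.com' />
    <updated>2020-09-12T12:00:00</updated>
    <entry>
        <author>
            <name>Michael Odogwu</name>
            <uri>https://emekadavid-solvingit.blogspot.com</uri>
        </author>
    </entry>
</feed>'''
tree = etree.ElementTree(etree.fromstring(xml))
root = tree.getroot()

print('score:', get_attr_number(root))

depth(root, -1)
print('maximum depth:', maxdepth)
```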

Now, for an explanation of the relevant sections of the code. I will use the lines in the code above to explain it.

Line 1: We import the module, xml.etree.ElementTree and name it etree.

Lines 23-35: The XML document.

Line 36, 37: Using the fromstring method of the module, we import the xml document and pass it to the ElementTree constructor which then constructs a tree of the document. Then from the tree created we get the root element (or node) so that we can parse the document starting from the root element.

Line 39: We pass the root element to our function, get_attr_number, that calculates the score of the XML document.

Lines 3-8: What the get_attr_number function does is that it takes the root element or node and recursively iterates through it using node.iter() to get all the children, even the nested children. For each child element, it calculates the score for that child by finding out the length of the attribute dictionary in it, len(i.attrib) and then adds this score to the total score. It then returns the total score as the total variable.

Next is to find the maximum depth. In the XML tree, we take the root element, feed, to be a depth of 0. Take note.

Lines 41,42: Here the depth function is called, passing it the root element of the tree and the default level is noted as -1. Then maxdepth, a global variable, is printed out after the depth function has finished execution. I now describe the depth function.

Lines 12-20: When this function is called, it increases the level count by 1 and checks to see if the level is greater than the maxdepth variable in order to update maxdepth. Then for each node or element, if that element has children, list(elem), it calls the function, depth, recursively.

You can download the above code here, xmlparser.py.

Now, I believe you understand how the code works. I want you to be creative. Think of use cases of how you can use this module with other XML functions like creating XML documents, or writing out your own XML documents and parsing them in the manner done above. You can also check out another parser I wrote, this time an HTML parser.

Happy pythoning.

Validating Credit Cards With Python Regex

We have been exploring python regex in previous posts because it is an interesting area of programming. It gives me an adrenaline surge when I get a pattern right. I usually use regex101.com to confirm my python regex patterns.

 

python regex

Because I have touched extensively on the syntax of python regex in previous posts, like on the syntax of python regex and the methods that are used in python regex, I will go straight to describing today’s task.

In today’s coding challenge, we want to validate credit card numbers. We know credit cards are 16 digits long, but what goes into their creation? For a credit card number to be valid, it needs to have the following characteristics:

1. It must start with 4, 5, or 6 (assuming that is the requirement for our bank MZ. Other banks have different requirements but we can scale)

2. It must contain exactly 16 digits.

3. It must only consist of digits, i.e, 0 – 9.

4. It may have digits in groups of 4 separated by a single hyphen, '-'.

5. It must not use any other separator except the hyphen.

6. It must not have 4 or more consecutive repeated digits.

Now, that list of requirements is long, but we can scale to it. Here is code that does just that. I would like you to read the code and then run it to see the output. Later, I will explain each of the regex patterns and what the code is doing.
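Here is a sketch of that code, reconstructed from the patterns and line references explained later in this post; the function name validate_cards and the sample card numbers are my own choices:

```python
import re

def validate_cards(credit_cards):
    valid_structure = r"[456]\d{3}(-?\d{4}){3}$"    # starts 4/5/6, 16 digits
    no_four_repeats = r"((\d)-?(?!(-?\2){3})){16}"  # no digit 4+ times in a row
    patterns = (valid_structure, no_four_repeats)

    for card in credit_cards:
        if all(re.match(p, card) for p in patterns):
            print(card, 'Valid')
        else:
            print(card, 'Invalid')

validate_cards(['4123456789123456', '5123-4567-8912-3456',
                '61234-567-8912-3456', '5133-3367-8912-3456'])
```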

Now that you have read and run the code, let me walk you through its lines. But first, let me explain some relevant details in regex that will help you understand the code, that is, the python regex metacharacters that will be of help.

? This is called the optional match. This tells python to match 0 or 1 repetitions of the preceding regular expression.
{m} This tells python to match exactly m repetitions of the preceding regular expression.
[abc] Used to indicate a set of characters. Any character within the set is matched. This example matches either a, b, or c that are within the set.
(…) This matches whatever regular expression is inside the parenthesis and indicates the start and end of a group. The contents of the group can be retrieved after the match has been performed and can be matched later in the string with the \number special sequence.
\number Matches the contents of the group with this number. Group numbering starts from 1.
(?!...) matches if … doesn’t match next. This is the negative lookahead assertion. For example Isaac(?!Asimov) will match Isaac only if it is not followed by Asimov.

Now that I have explained all the relevant meta characters you need to understand the code, let’s go through the code, starting first with the patterns for the match.

On lines 4 and 5, you can see that I wrote two patterns we will be using to do the matches.

Line 4, the valid_structure pattern: r"[456]\d{3}(-?\d{4}){3}$". First it indicates a set of characters to start off the pattern, [456]. That means the start of the pattern should be a 4, 5, or 6, based on our credit card characteristics. This should be followed by exactly 3 digits; \d indicates a digit. After this we have a group which consists of an optional hyphen, -, and then exactly 4 digits. This group should be matched three times. When we add up the digits, this comes to 16. So, with the valid_structure pattern, we have satisfied nearly all the requirements except the requirement that there should be no 4 consecutive repeats of any digit.

That is where the no_four_repeats pattern comes in, on line 5. The pattern is r"((\d)-?(?!(-?\2){3})){16}". Now let’s go through it. First we did a global grouping; this is group 1. Then we grouped the digit at the start of the pattern; it becomes group 2. The pattern says that a digit may be followed by a hyphen. Then we did a negative lookahead assertion in a group. In the negative lookahead assertion we said that what follows should not be the same digit, each with an optional hyphen before it, exactly three more times. In other words, if four consecutive repeats of a digit exist in the string, there is no match. Finally, the whole group is repeated exactly 16 times, covering all 16 digits. If you want a refresher on negative lookahead assertions, you can check this blog post on assertions in regular expressions.

The rest of the function following from the patterns is self-explanatory. We pack the patterns into a tuple in line 6. From lines 8 to 12 we search for a match based on the patterns and the list of credit cards that was passed.

I hope you did find the code interesting. It was beautiful matching credit card numbers. I hope to bring you more like this if I find any challenge that seems interesting.

If you would like to receive instant updates when I post new blog articles, subscribe to my blog by email.

Happy pythoning.

Techniques for packing and unpacking in python

When I first started learning python, one of the cool things I learned was packing and unpacking of sequences like tuples. I fell in love with them so much that I ended up using them all the time. They were a feature I had found only in python and javascript.

python packing and unpacking

 

Packing and unpacking, also called destructuring, i.e. the ability to extract multiple values from data stored in objects, has lots of techniques that I have discovered over the years. I will highlight some in this post.

1. Packing and unpacking of sequences

As a convenient way to assign variables with a single line of code, automatic packing is an often used technique that can serve you well. It usually is applied to sequences like tuples, lists, and ranges. For a refresher on sequences and how they are a subset of iterables, see this post.

For example, we could automatically unpack a list into three variables, x, y, and z.

We could do a similar thing with a tuple, unpacking the tuple and assigning its values.
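A quick sketch of both cases (the values are made up):

```python
# unpacking a list into three variables
x, y, z = [10, 20, 30]
print(x, y, z)  # 10 20 30

# the same technique with a tuple
a, b, c = ('red', 'green', 'blue')
print(a, b, c)  # red green blue
```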

Most often where programmers apply this automatic packing and unpacking technique is when they want to return multiple values from a function. Take this function for quotients and remainder that returns a tuple.
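A sketch of such a function; the name quot_rem is my own choice:

```python
def quot_rem(a, b):
    # return both the quotient and the remainder as a tuple
    return a // b, a % b

# the returned tuple is automatically unpacked into q and r
q, r = quot_rem(17, 5)
print(q, r)  # 3 2
```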

You can see that when the call returns the output, a tuple, is automatically packed into two variables. You get to see this often in code and this is a cool feature from python.

The packing feature has its dual companion, the unpacking feature for sequences. I highlighted one above but let me show you another with unpacking in tuples.
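For example, a range unpacks just like a list or tuple:

```python
# the three values produced by the range are assigned in order
x, y, z = range(3)
print(x, y, z)  # 0 1 2
```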

See how python nicely unpacks the range and assigns it immediately to the variables on the left.

One thing you need to note when packing and unpacking with sequences is that the number of variables on the left should match the number of items you want to unpack or pack on the right.

You do not only pack and unpack on the right side of the assignment operator. You can also pack and unpack on the left. Take this example.
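A sketch, using a starred variable on the left (the values are made up):

```python
# 'a' takes the first item; the starred 'd' packs the rest into a list
a, *d = (1, 2, 3, 4)
print(a)  # 1
print(d)  # [2, 3, 4]
```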

I first assigned the first element to variable ‘a’ and then packed the remaining items in the tuple into ‘d’. Just use your creativity.

You can unpack in a for loop. Remember, for loops need an iterable. That is why unpacking can be deployed there also. Take this list of tuples and see how one can unpack the tuples in a for loop.
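A sketch, with made-up subject/score pairs:

```python
pairs = [('maths', 90), ('physics', 85), ('chemistry', 70)]
# each tuple is unpacked into subject and score on every iteration
for subject, score in pairs:
    print(subject, score)
```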

This technique of unpacking in a for loop is what you are doing when you call the dictionary.items() method with a syntax like this:
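That syntax looks like this (the dictionary contents are made up):

```python
grades = {'maths': 'A', 'physics': 'B'}
# dict.items() yields (key, value) tuples that are unpacked in the loop
for subject, grade in grades.items():
    print(subject, grade)
```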

This technique of packing and unpacking has given rise to the tradition of simultaneous assignment in python, which has come to replace the need for a temporary variable when you want to swap the values of two variables. For example, in the old way of swapping, one would do it this way:

    
temp = y
y = x 
x = temp

This has come to be replaced with the more convenient and readable simultaneous assignment syntax that goes like this:

x, y = y, x

Don’t you just love the way python is evolving? I just adore it.

Now, packing and unpacking is not restricted to sequences. It extends to the arena of function arguments and function calls.

2. Packing and unpacking in functions

There are several techniques for packing and unpacking in functions and I will highlight some of the common ones.

a. Python argument tuple packing and unpacking

To carry out a packing of the arguments that are passed to the function, you just precede the formal parameter with an asterisk, *. Then any value or object that is passed to the function will be packed into a tuple.

Here is an example.
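A sketch; the function name average is my own choice:

```python
def average(*args):
    # whatever is passed is packed into the tuple 'args'
    return sum(args) / len(args)

print(average(1, 2, 3))         # 2.0
print(average(10, 20, 30, 40))  # 25.0
```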

Because this argument packing can accept multiple types, you would be wise to assert the types you desire in your function.

Python argument tuple packing has its counterpart, python argument tuple unpacking. This is when on the calling side of the function you precede the object with an asterisk. The object would usually be a sequence or iterable. Then when the object is passed to the function it is unpacked.

Take this example:
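A sketch; the function name describe is my own choice:

```python
def describe(x, y, z):
    # returns the arguments so we can see how they arrived
    return x, y, z

point = (4, 5, 6)
# the tuple is unpacked into the three formal arguments
print(describe(*point))  # (4, 5, 6)
```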

Note that the number of formal arguments should equal the number of items in the object being unpacked.

But who says we cannot do packing and unpacking at the same time? Yes, you can do both in the same function call.
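For instance (a sketch): the list below is unpacked at the call site and immediately repacked into args inside the function.

```python
def collect(*args):
    # the unpacked values are repacked into the tuple 'args'
    return args

nums = [1, 2, 3]
print(collect(*nums))  # (1, 2, 3)
```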

Let’s go to another data structure that is not a sequence – a dictionary.

b. Python argument dictionary packing and unpacking.

Just as you can do packing and unpacking for sequences, you can also do the same for dictionaries, but in the case of dictionaries you use double asterisks, **, to indicate that you are dealing with key=value pairs. While reading the python documentation, or even pylint output if your IDE has pylint installed, you will often find formal arguments specified as **kwargs. The **kwargs formal argument just says that the keyword arguments passed will be packed into a dictionary.

Let’s take an example of a common case.
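A sketch; the function name show_profile and its keywords are my own choices:

```python
def show_profile(**kwargs):
    # the keyword arguments are packed into the dictionary 'kwargs'
    print(kwargs)
    return kwargs

show_profile(name='Ada', language='python')  # {'name': 'Ada', 'language': 'python'}
```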

The keywords passed in the function call can be any number; the function will pack all of them into a dictionary.

Now let’s illustrate the argument dictionary unpacking. This occurs at the opposite end, at the function call. This is when the object passed is preceded by double asterisk. This tells the function that the object, which is a dictionary, should be unpacked.

One final example.
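A sketch using a function f and a dictionary names:

```python
def f(x, y, z):
    return x, y, z

names = {'x': 'A', 'y': 'A+', 'z': 'C'}
# the dictionary is unpacked into keyword arguments
print(f(**names))  # ('A', 'A+', 'C')
```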

f(**names) is just a shorthand for the explicit keyword arguments f(x='A', y='A+', z='C').

There are so many techniques which have sprung up from the principle of packing and unpacking iterables. If you have some you want to share which I didn’t mention, make a comment below and I will be sure to update this post.

Thanks. Happy pythoning.

Complete Methods For Python List Copy

After my post on python shallow and deep copy, a reader asked me: “You can also copy a list with list slicing, and it is fast. Which should I use?”

Well, I decided to dedicate a post on the numerous ways you can copy a python list and then evaluate their timing to find out which is the fastest in order to give a concise answer.

python list copy

 

So, here are the different methods one after the other.

1. The built-in python list.copy method

This method is built-in for lists. Because it is built-in, like everything built into python, it should be fast. I just love using this method whenever I need to copy a list. But the timing will tell us which method is most efficient.

An example of how you can use it to copy a list is:
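A sketch (the list contents are made up):

```python
names = ['ada', 'guido', 'tim']
names2 = names.copy()   # independent shallow copy
names.append('raymond') # changing the original...
print(names)   # ['ada', 'guido', 'tim', 'raymond']
print(names2)  # ['ada', 'guido', 'tim'] -- ...leaves the copy untouched
```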

As you can see from the code, names2 is independent of names after copying. So, it gives the desired behavior. But I need to tell you a caveat: list.copy() does a shallow copy; it cannot recursively copy nested lists. Too bad.

2. Slicing the entire list

Yes, I said it again. When you slice the entire list, you eventually copy everything into another object. This method is so cool. The syntax is listname[:]. That’s all you need to do to copy.

Let’s try this with an example.
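A sketch:

```python
names = ['ada', 'guido', 'tim']
names2 = names[:]    # slicing the entire list copies it
names[0] = 'barry'   # mutate the original
print(names2)  # ['ada', 'guido', 'tim'] -- unchanged
```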

Yes, it is extremely convenient. It worked just as we expected, producing an independent list as output even when the original was changed. Like the first method, copying a python list by slicing is also a shallow copy. Too bad.

3. Using the built-in list constructor, list()

This is just like creating a list from another list. The syntax is list(originallist). It returns a new object, a list.

Here is an example.
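A sketch:

```python
names = ['ada', 'guido', 'tim']
names2 = list(names)   # the constructor builds a new list object
print(names2 == names, names2 is names)  # True False
```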

4. Use the generic shallow copy method.

For the generic shallow copy method, you need to import the copy module: import copy. Then call the copy method of the module on the original list: copy.copy(originallist). I talked all about how to do this in the post on python shallow copy and deep copy. You can reference it for a refresher.

Here is an example.
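A sketch:

```python
import copy

names = ['ada', 'guido', 'tim']
names2 = copy.copy(names)   # generic shallow copy
names.append('raymond')
print(names2)  # ['ada', 'guido', 'tim']
```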

So, as we expected, the returned list, names2, was independent of the original list, names. But as the name says, it does a shallow copy. That means it cannot copy recursively: where we have a nested list, it does not copy deep down but returns references to the nested items.

5. The generic deep copy method

This is the last method, and the one I use whenever I have lists in my custom classes and need to copy them. This method copies deep down, even to the items in a nested list. It is also a method of the copy module. You can read all about it in the link I gave above.

Let’s do an example.

I really need to do one more example with this method, to show that it copies deep down even to nested lists.
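A sketch with a nested list (contents made up):

```python
import copy

scores = [['ada', 90], ['guido', 85]]
scores2 = copy.deepcopy(scores)   # copies the nested lists too
scores[0][1] = 0                  # mutate a nested item in the original
print(scores)   # [['ada', 0], ['guido', 85]]
print(scores2)  # [['ada', 90], ['guido', 85]] -- unaffected
```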

As you can see from the above nested list, when we change one of the nested items in the original list, the copy did not reflect that change to show that it was not copying by reference but copying deep down to the values.

Now that you are familiar with all the different ways to copy a list, which is the most time efficient?

First, I will have to tell you that if you have a nested list or an array, the only method you can use is the python deep copy method. That is the only method that copies everything in the nested list or array without leaving any references.

Now, for the other types of lists, that is, lists that are not nested, all the methods can be used so we will now try to find out which is more time efficient by timing their processes.

Which method is more time efficient?

To test it out, you have to run the code below and see for yourself.
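Here is a sketch of such a timing comparison using the timeit module; the list size and repetition count are arbitrary choices:

```python
import timeit

# arbitrary list size and repetition count, just for comparison
setup = 'import copy; original = list(range(1000))'
methods = {
    'list.copy()':     'original.copy()',
    'slicing [:]':     'original[:]',
    'list()':          'list(original)',
    'copy.copy()':     'copy.copy(original)',
    'copy.deepcopy()': 'copy.deepcopy(original)',
}
for name, stmt in methods.items():
    t = timeit.timeit(stmt, setup=setup, number=100)
    print(f'{name:16s} {t:.5f} seconds')
```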

You will notice that the built-in python list copy method was faster than all the other methods of copying a list. That’s why I love using any function or method that is built in specifically for a data type or data structure. List slicing comes in a close second place, although I would not want to use list slicing on a very large list.

That’s it. I hope you did enjoy this post. Watch out for other interesting posts. Just update via your email and they will surely land right in your inbox.

Happy pythoning.

Visualizing ‘Regression To The Mean’ In Python

Let’s take a philosophical bent to our programming and consider something related to research. I decided to consider regression to the mean because I have found that topic fascinating.

regression to the mean python

 

What is regression to the mean?

Regression to the mean, sometimes called reversion towards the mean, is a phenomenon in which, if a sample point of a random variable is extreme or close to an outlier, a future point is likely to be closer to the mean or average on further measurement. Note that the variable under measure has to be random for this effect to play out and be significant.

Sir Francis Galton first described this phenomenon while observing hereditary stature, in his paper “Regression towards mediocrity in hereditary stature.” He observed that parents who were taller than the community average tended to have children who were shorter, closer to the community’s average height.

Since then, this phenomenon has been described in other fields of life where randomness or luck is also a factor.

For example, if a business has a highly profitable quarter in one year, in the coming quarter it is likely not to do as well. If one medical trial suggests that a particular drug or treatment is outperforming all other treatments for a condition, then in a second trial the outperforming drug or treatment is more likely to perform closer to the mean.

But regression to the mean should not be confused with the gambler’s fallacy, which states that if an event has occurred more frequently than normal in the past, then it is less likely to happen in the future, even when it has been established that such events are independent and the past does not determine the future.

I was thinking about regression to the mean while coding some challenge that involved tossing heads and tails along with calculating their probability, so I decided to add a post on this phenomenon.

This is the gist of what we are looking for in the code. Suppose we have a coin that we flip a set number of times, recording the fraction of heads for those flips. We repeat this for several trials. Then we look for the trials whose fraction was extreme and find out whether the trial after each extreme regressed towards the mean. Note that the expected fraction of heads for a fair coin is 0.5, because the probability that the coin comes up heads is ½ and the probability it comes up tails is also ½.
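As a quick sanity check before diving into the post's code (this sketch is my own, and `average_heads` is a hypothetical name), we can confirm that the fraction of heads over many flips settles near 0.5:

```python
import random

def average_heads(num_flips, seed=42):
    # fraction of heads in num_flips fair-coin tosses;
    # a fixed seed makes the sketch reproducible
    rng = random.Random(seed)
    heads = sum(1 for _ in range(num_flips) if rng.choice('HT') == 'H')
    return heads / num_flips

print(average_heads(10_000))  # close to 0.5
```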

So after collecting each extreme along with the trial that comes after it, we want to see whether those following trials regressed towards the mean or not. We do this visually by plotting a graph of the extremes and the trials after the extremes.

So, here is the complete code. I will explain the graph that accompanies the code after you run it, and then provide a line-by-line explanation of the code.

After you run the above code, you will get a graph that looks like the one below.

regression to mean python


We drew a line across the 0.5 mark on the y-axis to show when the points cross the average line. From the graph you can see that on several occasions, when there is an extreme above or below the average line, the next trial results in a flip fraction that moved towards the mean line, except for one occasion when it did not. So, what is happening here? Because the coin flip is a random event, it tends to exhibit this phenomenon.

Now, let me explain the code I used to draw the visuals. There are two functions here, one that acts as the coin flip function and the other to collect the extremes and subsequent trials.

First, the code for the coin flip.

    
import random

def flip(num_flips):
    ''' assumes num_flips a positive int
        returns the fraction of heads in num_flips tosses '''
    heads = 0
    for _ in range(num_flips):
        if random.choice(('H', 'T')) == 'H':
            heads += 1
    return heads/num_flips

The function, flip, takes as argument the number of times the coin should be tossed. For each flip, which is simulated with a random choice, it checks whether the outcome was a head or a tail. If it is a head, it increments the heads counter, and finally it returns the fraction of heads over all the flips.

Then the next function, regress_to_mean.

    
import matplotlib.pyplot as plt

def regress_to_mean(num_flips, num_trials):
    # get fractions of heads for each trial of num_flips
    frac_heads = []
    for _ in range(num_trials):
        frac_heads.append(flip(num_flips))
    # find trials with extreme results and for each 
    # store it and the next trial
    extremes, next_trial = [], []
    for i in range(len(frac_heads) - 1):
        if frac_heads[i] < 0.33 or frac_heads[i] > 0.66:
            extremes.append(frac_heads[i])
            next_trial.append(frac_heads[i+1])
    # plot results 
    plt.plot(extremes, 'ko', label = 'Extremes')
    plt.plot(next_trial, 'k^', label = 'Next Trial')
    plt.axhline(0.5)
    plt.ylim(0,1)
    plt.xlim(-1, len(extremes) + 1)
    plt.xlabel('Extremes example and next trial')
    plt.ylabel('Fraction Heads')
    plt.title('Regression to the mean')
    plt.legend(loc='best')
    plt.savefig('regressmean.png')
    plt.show()

This function is the heart of the code. It flips the coin a set number of times for a set number of trials, accumulating the average of each trial in a list. Then it finds out which of the averages are extremes or outliers. When it finds an outlier, it adds it to the extremes list and adds the trial that follows to the next_trial list. Finally, we use matplotlib to draw the visual: a plot of the extremes and next_trial values, with a horizontal line marking the mean so the viewer can better see which direction the next trial is expected to move when there is an extreme.
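As a complement to the plot, here is a non-plotting variant (my own sketch, and `regression_rate` is a hypothetical name not from the post) that counts numerically how often the trial after an extreme lands strictly closer to 0.5:

```python
import random

def flip(num_flips, rng=random):
    # fraction of heads in num_flips fair-coin tosses
    heads = sum(1 for _ in range(num_flips) if rng.choice('HT') == 'H')
    return heads / num_flips

def regression_rate(num_flips, num_trials, seed=0):
    # among trials with an extreme fraction of heads, count how many
    # were followed by a trial strictly closer to the mean of 0.5
    rng = random.Random(seed)
    frac_heads = [flip(num_flips, rng) for _ in range(num_trials)]
    regressed = total = 0
    for i in range(len(frac_heads) - 1):
        if frac_heads[i] < 0.33 or frac_heads[i] > 0.66:
            total += 1
            if abs(frac_heads[i + 1] - 0.5) < abs(frac_heads[i] - 0.5):
                regressed += 1
    return regressed, total

regressed, total = regression_rate(20, 1000)
print(f'{regressed} of {total} extremes regressed toward the mean')
```

With short trials (20 flips each), extremes are common, and most of them are followed by a trial closer to the mean.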

I hope you enjoyed the code. You can run it on your machine or download it to study, regress_to_mean.py.

Thanks for your time. I hope you do leave a comment.

Happy pythoning.
