Search

Check If Python List Contains Unique Elements Or Items

While going through a Reddit group post on python recently, I saw a question that was similar to the element uniqueness problem and the way the answer was formulated struck me as odd. I thought the implementation was not optimal or efficient, so I volunteered an answer that I knew was better.

python unique elements in list

 

The question was: how do I check to see if my python list contains elements or items that are unique or not duplicates?

And the response code that was given was this:

    
def unique_bad(S):
    ''' S a list or sequence 
    returns True if all the elements are unique
    '''
    for i in range(len(S)):
        for j in range(i+1, len(S)):
            if S[i] == S[j]: 
                return False
    return True

There is nothing wrong with the code because it really does the job, but it does the job badly. How do we know that? Let’s take the worst case scenario for the code. The code has two loops, the outer loop with the i variable and the inner loop with the j variable. Depending on the length of the sequence, the outer loop takes at least n-1 iterations in the worst case. The inner loop takes n(n+1)/2 iterations in the worst case owing from the fact that it does 1+2+3…+n-1+n iterations after the outer loop. So, in the asymptotic sense, the worst case running time for this code is O(n2). This is not really too bad. A quadratic running time for a program is fair enough. But I called it bad because there is a better way to implement this if we know the nature of the problem.

You can see that for each outer iteration, i, the code looks for a duplicate in the inner iteration, j, and this is very repetitive. In the fortunate case where two duplicates lie side by side, the code would exit early but if not, it would take a longer time. So, taking from this idea, why don’t we make it that if there were duplicates they should lie side by side. This could help us to write code that takes a shorter worst case running time.

In fact, we can do so by first sorting the list. When we sort the list, if there were duplicates they would lie side by side and then when we iterate through it looking for duplicates, we would catch duplicates very early.

That is what this better code seeks to do.

    
def unique_better(S):
    ''' S a list or sequence 
    returns True if all the elements are unique
    '''
    new = sorted(S)
    for j in range(1, len(new)):
        if new[j-1] == new[j]:
            return False
    return True        

You can see that I sorted the python list or sequence first and then iterated through the list looking for duplicates. Now the loop runs in linear time rather than quadratic time. If you look through python’s documentation you would discover that the worst case running time of sorted function is O(n long n) while the loop is O(n) time so the code runs in O(n log n) time which is far better and more effective than the earlier code which runs in quadratic time.

So a little understanding of the nature of the problem can lead one to writing better code.

I just wanted to point this out because the question on Reddit intrigued me. It was an example of some of the problems that sorting can easily solve if only the nature of the problem was understood.

Happy pythoning.

Emulating Lists In Your Python Classes With MutableSequence

Python provides for ways we can emulate the native data structures in our own defined classes. In this post, I will show you one way you can emulate a list. A list is a mutable sequence that has items which are ordered by index.

python mutablesequence

 

In my post on operator overloading and dunder methods, I showed you a way a class can behave like a list. But we can extend that concept a little further by implementing a much simpler emulator. This is because in the post I was concentrating on illustrating operator overloading and dunder methods and not on making the emulation simpler.

The recommended way to emulate a list in your python defined classes is to inherit from the MutableSequence class which is found in the collections.abc module. This class will give your classes the power to act like lists with all the methods and functions that you can find in a list. But since this class is an abstract base class (I discussed abstract base classes on the python abstraction post), there are some abstract methods that first need to be implemented by your defined class. These abstract methods are __getitem__(), __setitem__(), __delitem__(), __len__(), and insert. You will notice that these are dunder methods which make your class to have the functionality of a list.

I will use an example to illustrate this process, a Student class, that inherits from MutableSequence. A student object has a name and a list of grades.

So, the constructor for the Student class will go like this:

    
def __init__(self, name):
        self.name = name
        self.grades = []

Then to implement the __getitem__() abstract dunder method from the MutableSequence ABC, we used this code:

    
def __getitem__(self, key):
        ''' retrieves an item by its index, key'''
        return self.grades[key]

It returns the grade or item at the given key or index.

Then to implement the __setitem__() abstract dunder method, we used this code:

    
def __setitem__(self, key, value):
        ''' set the item at index, key, to value '''
        checkvalue(value) 
        self.grades[key] = value

The method is passed the key and value we want to modify and before it does so, it first checks to see whether the value is an int. The checkvalue function is defined within the module. If the value is not an int, checkvalue returns a TypeError but if no error was returned on the check, then __setitem__ proceeds to modify the value at the given key. Here is the checkvalue code:

    
def checkvalue(value):
        if not isinstance(value, int):
            raise TypeError('Pass in an int object')

Now to implement the __delitem__() abstract dunder method, we used this code:

    
def __delitem__(self, key):
        ''' removes the item at index, key '''
        del self.grades[key]

As you can see it simply removes the grade or item at the given key.

Then to implement the __len__() method we used the following code:

    
def __len__(self):
        return len(self.grades)

The statement in the body of the __len__() dunder method just calls on the built-in len method on our grades list and returns the result.

Now finally, we will implement the insert() abstract method to complete the inheritance using this code:

    
def insert(self, key, value):
        ''' add an item, value, at index, key. '''
        checkvalue(value)
        self.grades.insert(key, value)

Using this method you can now insert a grade at any given position in the grades list of the Student class.

Now, our Student class has implemented all the required abstract methods from the MutabeSequence class. That makes it a list emulator. So, with all these done, we can now use the student objects as a list.

Below is a complete code that you can run showing just how it was done. In the driver code, the student object called on the append, reverse, pop and sum functions which are all native list object functions. You can read the driver code and then run it to see how beautifully the student object, michael, now acts like a list with the above mentioned functions.

You can download the code here, student_mutablesequence.py, so that you can try out other list functions to see how it perfectly emulates a list.

Now, you have one more arsenal that shows you the wonderful capability of python to give power to user classes through various means. In fact, you can emulate any container data type in python with the abstract base classes in the collections.abc module. Check it out and experiment with doing so. You will surely find it useful one day.

Happy pythoning.

Is There Life On Venus? Or Are We Alone?

Recently, it was reported in the media that phosphine has been discovered in the acidic clouds of planet Venus. According to scientists, phosphine is a biosignature for life; that means it can only be produced by living organisms. This discovery raises several questions in the minds of many, some of which are: “Are we alone? How substantial is this evidence?” Only time will answer this question when the research is well investigated. According to Carl Sagan, a famous astronomer and astrobiologist: “extraordinary claims require extraordinary evidence.” And that certainly applies to this situation.

alien life on venus

 

What type of planet is Venus?

Planet Venus is the second planet from the sun in our solar system. As you would expect, the temperatures there are very high. Temperatures can range up to 900 degrees Fahrenheit on the surface; so hot that it could melt lead. There are heavy clouds that swirl around the planet that are so acidic that we could not even measure them using our pH scale on earth. That is why Venus is referred to as a hell scape. It would be difficult to imagine life as we know it being on Venus.

Despite that fact, astronomers have voiced that possibility in the past. It was formerly proposed by Carl Sagan, an astronomer, and Harold Morowitz, a biologist, that there was a possibility that microbes could be existing in its acidic clouds; swirling around the planet. So, to confirm this idea, probes were sent to Venus to check out the planet and true to type, those probes melted on entering the planet. So, scientists have concentrated their search for life on the planet to browsing its clouds for microbial life.

What do we know about the gas, phosphine?

Phosphine gas that was discovered on the clouds of Venus is a toxic and explosive molecule with a lingering odor of garlic and dead fish. The gas was discovered on planet Venus at temperatures that were close to that on planet Earth. But the discovery was not much. The researchers describe it as: “finding some tablespoonfuls in an Olympic size swimming pool.” Yet, that amount is enough to pique our curiosity. This is because of how the gas is made here on Earth.

Phosphine gas on earth is made from either of two paths: as a natural byproduct of life, or it is manufactured artificially to produce fumigants and other biochemicals. As a byproduct of life, it is made by oxygen-hating microbes who live in swamps and marshes. It has been noticed by scientists that all living beings contain these microbes and they have called this gas the “biosignature of life”. So, with the reputation phosphine has earned, finding it on planet Venus raises a possibility: “could there be alien life on planet Venus, even if it is restricted to gas-eating microbes?”

Wise to be cautious

The data that has been collected on the presence of phosphine gas in Venus is not substantial to make astrobiologists certain that there could be alien life on planet Venus, although one could say that the potential is there – just potential. The gas could be coming from something else rather than life. An international team of researchers have set out to simulate possibilities for the existence of the gas, modeling scenarios like lightning strikes and meteors bombarding the clouds to see if such could produce any amount of phosphine on the surface of the planet but they came up short. Therefore, one can say that this detection is extraordinary. If nothing else can explain it, then alien life could be the answer. But considering the nature of Venus – a harsh place for life to inhabit – it would take a really strongly acid-loving microbe to be living in those clouds.

That is why scientists are not saying there is alien life yet. The astronomy community has gone this path before of proclaiming that there is alien life only to be disappointed. So, they would rather be cautious and optimistic rather than put a foot out. Also there are still details about the research that needs to be explored.

First, other researchers need to give credence to the claim that the gas is really phosphine. Venus clouds are surrounded by sulfur dioxide and this could influence the readings. Also, observations of the Venetian atmosphere would have to be done to confirm the existence of phosphine gas.

If it is really confirmed that the gas detected is phosphine, then the next step for researchers is to determine the source of the gas. This is really important for an hypothesis to be drawn. It would be foolhardy to run to conclusions at an early stage and say the source is biology. Other possibilities have to be explored and confirmed. If in reality scientists agree that the source is biology, then Venus would have to be explored and discovered. Missions would be sent to Venus to discover where on the clouds the microbes could be existing and if they could lead us to other areas on Venus. The microbe-hunting missions have to be well planned to prevent contaminating the Venetian clouds.

As it is, the data gathered so far cannot answer these questions. Therefore, we will have to wait and see what future research would turn up. If we could find habitable life on other planets, it would help man understand his place in the universe. It would help us understand what it means to be alive; what conditions prompt life and how we can extend it. There is a possibility that Venus would be a planet of future interest to astronomers and astrobiologists as they explore the recent findings about phosphine on the planet.

But right now, we don’t have any definite answer to the question: “Are we alone on the solar system?” We might never have an answer. But exploring the possibilities will open up new vistas of knowledge and expand our ability to solve some of the pressing challenges of planet earth.

The video below is an interesting news commentary on this discovery. Enjoy it.


 

Operator Overloading and Dunder methods in Python

In python, operator overloading means giving an extended meaning to an operator beyond its native meaning. An example that comes to mind is the + operator. You must have noticed that it has several meaning depending on the type of its operands. If the operands are ints or floats, like in ‘2+3’ then the + operator means add both ints together, but if the operands are strings like in ‘dog’+’cat’ then + operator means to concatenate both strings together. You can also use the + operator on lists or any other sequence type. This is what operator overloading is all about. Now, would it not be nice if we can carry out this same functionality on our newly created classes? So, python has created dunder methods that make operator overloading possible on newly created classes and not only on built-in data types.

python operator overloading

 

Dunder methods (the full name is double underscore) are a set of methods that have a double underscore preceding the name and a double underscore after the name. If you read my post on OOP in classes and objects, you must have seen it when I implemented one with the __init__() method. Some people prefer to call dunder methods special methods or magic methods. These set of methods enable you to enrich your newly created classes so that they can emulate the behavior of the built-in data types.

Would it not be nice to be able to create a class that would emulate all the fancy methods of the list class, or even the string class? That would make your new class not only more flexible but a very interesting object.

Below is a list of some dunder methods in the python data model. I will be implementing them with examples. If you want the complete list, you can consult the python data model reference.

Common Syntax Dunder or Special Method Forms
a + b a.__add__(b); RHS: b.__radd__(a)
a - b a.__sub__(b); RHS: b.__rsub__(a)
a / b a.__truediv__(b); RHS: b.__rtruediv__(a)
v in a a.__contains__(v)
iter(a) a.__iter__()
next(a) a.__next__()
str(a) a.__str__()

To illustrate the table above before we begin creating example classes, let’s take for example the + operator and its dunder method, __add__(). Imagine you created a new class and you want to be able to add attributes of the class together. If you do not implement the __add__() dunder method or special method, python will raise an error. But the moment you implement the __add__() method, whenever python sees a statement like a + b where a and b are instances of the new class, it will then search for a definition of the dunder method, __add__(), and when it finds it, it implements it. Python gives deference to the object on the left side of the operator such that when it is looking for the definition, it first operates like a call of the form a.__add__(b) where a becomes the calling object and b becomes the parameter. Where object a doesn’t have an __add__() method, it then looks to the right side of the operator, at object b, to see whether b has the special method by using the alternative form, b.__radd__(a). Notice the difference in the names for the left hand side call and the right hand side call. When the right hand side call doesn’t implement the dunder method, python will then throw an error.

This special way of calling dunder methods makes it so that even where the right side and left side operands are different objects, provided they implement the dunder method for that operator, the operation can be executed. This distinction between __add__() and __radd__() makes it possible that where an operation for a class is non-commutative, it can be executed.

As you can see from the list I provided above, special or dunder methods are not restricted to operator overloading. There are special functions that are built-in to the native data types that we want our new classes to emulate. Like for example, the ability of our new classes to behave like sequences, or to have string representations. Dunder methods exists for that. For example, given the statement, str(a), which is a string representation of object a, new classes cannot automatically implement this behavior but the dunder method, __str__() when implemented will enable new classes to emulate string representation. Also, remember that an object can be an iterable if it implements the __iter__() method such that it can participate in for loops.

In order not to burden you with explanations, I will illustrate all the above with example classes: a Student and Grades class. A student object is simply an identity for a student with a collection of grades. The list data structure is used to hold this collection. A grade object is just a grade that is an underlying int. In our Grades instance objects, we want to be able to add two grade objects, and divide two grade objects when looking for the average grade. We also want to have a string representation of grades so it can be printed. For a student object, we want the student object to act like an iterator so it can participate in for loops, we want a means of printing out the contents of the grades list of any student and also be able to run the ‘in’ operator on a student to find out if a grade is in the list of grades. Instances of the student and grade objects cannot do this unless we implement the dunder methods for them.

So, using examples, we will show how to implement all the desired behavior for the grades and student objects. But first I will like you to see how the Student and Grades classes were defined here.

You can download the class definitions here if you desire an in-depth look, student_grade.py.

Following is a concise explanation of the several operator overloading we are doing and the dunder methods that were implemented.

1. Overloading the + operator using dunder method, __add__()

To be able to add grades together for a student so that we can get the sum of that student’s grades, we need to overload the + operator. We do this using __add__() dunder method, or special method. In the Student class, I defined a sum_of_grades() method that accumulates all the grades for a student and when adding grades together it calls the dunder method of the Grades class.

The __add_() dunder method (lines 56-57) just takes the grades of two grade objects and sums them up and returns the sum. Just that. In the sum_of_grades() method, lines 35-42, we create a total variable that is also a grade object which is used to accumulate the values. This method, after adding the grades together, returns the total grades.

With this example, I believe that now you should have a basic understanding of how operator overloading and dunder methods are implemented.

Let’s take more examples.

2. Overloading the division operator, /, with the __truediv__() dunder method

Another operation we want to perform on the student grades is to get the average grade after summing them up. That requires a division operation. Now to be able to do this in our grade objects, we need to overload the division operator, /, using the dunder method __true__div().

Let’s take an example. Run the code below to see how it goes.

One code that makes this possible is the implementation of __truediv__() method in lines 59-61. Where we just return the division of two grades. Then in the Student class definition, the method, average_grade, while doing the actual division, calls on the dunder method when a grade object is the numerator and another grade object is the denominator.

3. Dunder method, __str__() for string representation of objects

In order to carry out string representation of our grade and student objects, we have to implement __str__() dunder method. This makes it possible that we can print out to the screen grade objects or student objects. When print is called on a student or grade object, it casts the object to a string using str(object) and then this casting looks for the dunder method associated with the object. That is why we needed to implement this dunder method in our objects.

Let’s give an example of how this was done in the code. First, I will print out one grade object and another student object.

If you look at the last two lines in the driver code, we called print on math_grade, a Grades object, and also on michael, a Student instance. Print then called their dunder method and printed out what was implemented in the dunder method. Lines 44-45 is the implementation of the dunder method for Student instance objects while lines 63-64 is the implementation of the dunder method for Grades instance objects.

4. Dunder method to make the Student instance objects iterators

In my post on iterators, I made you to understand that for an object to be an iterator (that is act like a container that gives out its values when needed), it needs to implement the __next__() and __iter__() dunder methods. Any object that implements these two methods can participate in for loops, and can be treated like iterables because all iterators are iterables etc. In fact, many of the capabilities of containers will be possible for such an object and you know containers are one of the hearts of python data structures.

So, to make the Student instance objects iterators, I implemented the __next__() dunder method (lines 25-32) and the __iter__() dunder method which by default should return self in lines 22-23.

Now our student objects can participate in for loops like native data structures. We can also use the ‘in’ operator on our student objects.

The last five lines of the driver code shows how I used the student object, michael, in a for loop and also used the ‘in’ operator.

This brings us to an interesting area I would like you to be aware of. Containers in python are given some defaults when it comes to dunder methods. If you look at the list of operators and their dunder methods above, you would see that the ‘in’ operator has a dunder method, __contains__() but we did not have to implement this dunder method in our Student class. This is because when we made the Student class define functionality for iterators, making it like a container class, it then by default implemented the __contains__() dunder method.

If you read through the referenced data model, you will also find other instances of defaults that some classes can have. I really cannot go on describing all the operator overloading capabilities and dunder methods that exists. It is a very long list. I would recommend you reference the data model pages which link is above if you want to find out other examples.

I hope I increased your knowledge of operator overloading and dunder methods in this post. If you want to receive regular updates like this, just subscribe to the blog.

Happy pythoning.

Python Modules And Libraries: The Essentials

The pillar of modular programming in python is the module. A module, in its simplest form, is any script that has the .py extension attached to it. Each of these files has classes, functions and definitions which can be used by anyone who is using the module. Another name for modules in python programming are libraries. You must be familiar with several python libraries or modules like the os, math, numpy, matplotlib and random module because I have written articles on some of them. In this post, I will dwell on the essentials of python modules and libraries along with their benefits.

python modules and libraries

 

The benefits of python modules and libraries.

Imagine having to write every code yourself for any task you want to carry out if the function is not built-in. That would be time consuming, not so? Modules increase our productivity by allowing us to concentrate on the specific parts of our code which no one has written before. Since a lot of programmers are writing code all the time, we can leverage on our own productivity by being able to use their code rather than reinvent the wheel.

Also, python modules increase bug reporting. When classes, functions and definitions are defined within their namespaces in a module, you can easily isolate a bug in your code rather than bundling everything into one namespace in one file.

Modules also increases readability of code. It is easier to read a code that has fewer lines and is importing from other modules than to read a code that has so much lines that your eyes get tired. This is what modularity is about.

Getting Python modules list in your computer.

To get a listing of all the available python modules in your machine, at the python interpreter shell you can issue the command: help(‘modules’) and it will give you a list of all available python modules that are on your machine.

You would get a list such as the one from my machine whose graphic is below:

python modules and libraries


How to use modules in your own programs

For example you have a module or python library whose functions or classes you want to use. Imagine you are doing some math calculations and want to be able to use the pi definitions and sine functions and you know that a module named math has those definition and functions, how do you implement the module?

To do so, we use the import keyword. All you would have to do is to import the math module this way:

    
import math

Here is the syntax to import any module into your programs:

    
import module_name

Where module_name is the module you are importing into your program. When you import a module, python program loads the definitions from the module into the program’s namespace. In doing so, it gives you the ability to use definitions from the module in your program. For example, in the math module after importing it, I would want to use the pi definition as well as the square root definition. This is what I would do.

Notice in the program above that I used the definitions for pi and the square root function, sqrt(), that are included in the math module without having to write them myself. That is the power that modularization gives you.

Notice that in the code above, I first qualified the definitions in the module with the module name. This is to prevent name conflicts if I needed to use the same name in my program. But where you are sure there would be no name conflict, you could make it possible to use only pi or sqrt() without qualifying them by specifically importing them from the math module this way:

    
from math import pi, sqrt 

print('Pi is:', pi)

print('The square root of 36 is:', sqrt(36))

Notice that when importing the math module I used the keyword: ‘from’ and then specifically imported only the pi and sqrt function. It enabled me to use them without qualifying these definitions.

Most times, in order to avoid ambiguity, I rather prefer qualifying definitions rather than not qualify them.

There are times when the name of the module is very long and we don’t want to be using such a long name often. Then, we can assign an alias for the module name which will serve as a short hand notation. For example, in my xml parsing program from another post, the module name was very long, so I had to alias it using a short hand of the form:

    
import xml.etree.ElementTree as etree 

tree = etree.ElementTree(etree.fromstring(xml))

You can see that each time I wanted to use that module as qualifier, I used the shorthand rather than the complete long name. Let’s use a short hand for our math module:

Importing Modules that are not in your machine

What if a module you want to use is not already installed on your machine but you know the name of the module? Not to worry because python has a convenient way of installing modules into your machines that is easy. The program is called pip.

To import a module with module_name onto your machine, just use the following command on your terminal:

    
python -m pip3 install module_name

And pip will get the process running. It will even give you a report of how it is installing the said module in the background.

How to create your own modules

You can easily create your own modules with their definitions and functions. All you need to do is just write the definitions and functions in a file, name the file whatever name you wish and let the extension be .py. That’s all. That file can now be used as a module. If you want another program to use the definitions in the module, just save this file in the PYTHONPATH directory or save the file in the same directory as the program that will use it. That’s all.

Let me walk you through the process.

Supposing you wrote this cool factorial function, my_factorial(), and you want to be able to reuse them in subsequent programs. Then you could make it part of a module and name it fact_func.py. Then, save it in a directory where a program would use it. Then all you have to do is ask the program to import fact_func module and then you can use the my_factorial function.

When commands in a module should not be executed

There are times when you have written a module and you don’t want any program importing the module to have access to some definitions or functions when it is imported except when the module is used directly. For example, this might apply to some commands for testing the module. Python has a handy way for asking programs importing a module not to execute some commands. What you do is to insert those statements or commands in a separate section of the file. This section is delimited by the statement:

    
if __name__ == '__main__':
    '''statements in this section
    are not executed when the module
    is imported'''

Notice how I used the if statement above to delimit the section that instructs python not to execute any statement or function in that section.

Let’s do this with examples. For example, let’s take our factorial function. After writing the factorial function, I would want to write a function that tests the factorial function to make sure it is working properly. I don’t want any other program importing this module to see this test function but I want to keep it for record purposes in case I need to make changes to the factorial function in the future, then I could do the following:

Notice that I created a separate section for testing it beginning with the if… statement above. Then ran that test at that section. When anyone imports this module, they would not be able to see the test; I wouldn’t want anyone to see it except when I am debugging or rewriting the function, or if the module is used directly and not when imported.

What are the best python modules

I often read online about the top ten python modules and libraries everyone should know. That is just a fallacy and click bait. There is no top ten. Modules are created to fill a void where one exists. You choose a module provided the definitions and functions can help you in your program. If such a module has definitions and functions that you need, then it is a module that you would use otherwise you have no business with it. So, the top module for any programmer is the module that he would use to get a task done better.

But when it comes to data science, there are some modules which do the work best you have to be aware of like matplotlib for data visualization and plotting, as well as numpy for multi-dimensional arrays and numerical programming.

So, I hope I helped increase your python knowledge with this post.

Happy pythoning.

Matched content