
Generating Power Through Python Generator Functions And Iterators

In my post on python iterators, I mentioned that one limitation of using python iterators in user defined objects is that they allow only one pass: once an iterator has raised the StopIteration exception, it cannot be restarted. To conveniently overcome that limitation and give you programming power, the creators of python designed a special kind of function that yields values and, when called, produces an object that is both an iterable and an iterator. That kind of function is a python generator.

Python generators are like this steel frame
 

In this post, I will describe what a generator is and the advantages conferred on your programming when you use generators.

What are python generators?

Python generators are a special class of python functions that make the task of writing iterators very simple. While a regular python function computes a value and returns it, calling a python generator returns an iterator that produces a stream of values. To get up to speed with iterators, you can read this post on python iterators. In regular functions you use a return statement to return a value, but in python generators you use a yield statement to indicate each element that is to be produced in the series.

The simple definition of a generator is any function that contains a yield keyword.

Let’s illustrate this with examples.

For example, take the function of computing the factors of a number.


def factors(n):
    ''' returns all the factors of n as a list '''
    results = []
    for k in range(1, n+1):
        if n % k == 0:
            results.append(k)
    return results

In the function, factors, given a number we divide it by every number between 1 and that number. Whenever any number divides it without a remainder, that number is a factor and we store that number in the results list. At the end of the iterative division, we return the results list containing all the factors of the given number.

I want you to notice the following deficiencies of regular functions like this.
1. We had to populate the results list with all the numbers, waiting until everything was complete, and then hold all the values in memory. That takes up time and memory space.
2. When the function returned its results, all the local variables in the namespace of the function were discarded. We cannot get them again unless we call the function another time.
3. We cannot pause and resume the function if we want to.

What if we had a function that has the ability to overcome the deficiencies above and has the ability to be iterable? That is where a python generator comes in. Now, let’s use a generator to compute the factors of a number this time around. Take note of where the yield keyword is placed in the generator function.


def factors(n):
    for k in range(1, n+1):
        if n % k == 0:
            yield k

Note that in the generator function, the return statement has been replaced by a yield statement. Also, we do not need to populate a results list with all the factors as we did in the regular function. Because calling the generator function produces an iterator, each factor is yielded to that iterator only when it is asked for. This also means that we can use a call to the generator function directly in a for loop.

Take the following code as an example.
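A minimal sketch of such a loop, assuming the factors generator function defined above. The variable name gen_iterator is chosen here to match the discussion that follows.

```python
def factors(n):
    for k in range(1, n + 1):
        if n % k == 0:
            yield k

# Calling the generator function runs none of its body yet;
# it only creates a generator iterator.
gen_iterator = factors(20)

# The for loop calls iter() and next() on gen_iterator behind the scenes.
for factor in gen_iterator:
    print(factor)

# A generator iterator returns itself from __iter__().
print(iter(gen_iterator) is gen_iterator)
```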

Notice that the generator function produced a gen_iterator that was itself an iterable since it implements the __iter__() method and also the __next__() method. The for loop was only calling those methods automatically and receiving each result from the generator iterator, which in turn gets it from the yield statement in the generator function.

A python generator function can contain more than one yield statement, and it yields the value following each yield statement in turn. Taking a cue from our generator function, we can optimize it with more than one yield statement, owing to the fact that the quotient of dividing a number by one of its factors is also a factor, so we only need to test values up to the square root of the number.
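One way to write this optimization, offered as a sketch rather than the only possible version:

```python
def factors(n):
    k = 1
    while k * k < n:          # test only values below the square root of n
        if n % k == 0:
            yield k           # k is a factor ...
            yield n // k      # ... and so is the matching quotient
        k += 1
    if k * k == n:            # a perfect square has one unpaired factor
        yield k

print(list(factors(100)))  # [1, 100, 2, 50, 4, 25, 5, 20, 10]
```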

Notice that while the first implementation yielded the factors in sequential order, this second implementation, although it is more optimized, did not. It shows that a generator function remembers where it was when it yields a result and resumes operation from where it left off. The big difference between a yield and a return statement is that when the return statement is executed, all local variables of the function are discarded, but when the yield statement is executed, the execution of the generator is suspended and all local variables in its namespace are preserved. It then resumes execution from where it stopped when the caller next calls the __next__() method of the generator iterator.

In the code example above, I let the for loop call the __iter__() and __next__() methods of the generator iterator automatically. I would like you to see a demonstration of how this works step by step, so I will use a command line invocation of a python generator function. For example, say we have a generator function that generates ints up to a given number. Let us see how it does this with yield.

 

a command line example of python generator

You can see from the command line screenshot above that when we called ints_gen(3) in order to yield 3 integer values, it created an iterator. We know that iterators are defined by the __next__() method. So, when we call the next function on the iterator, it yields each of the 3 integers one after the other until it gets to the end and then raises the StopIteration exception, which every python iterator raises on getting to the end of its iteration. This is just a simplification of how the generator function works with an intermediate generator iterator.
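The session in the screenshot can be approximated as follows. The body of ints_gen is an assumption here (it counts from 0 up to, but not including, n); only the name and the call ints_gen(3) come from the text above.

```python
def ints_gen(n):
    # Assumed body: yield the integers 0, 1, ..., n - 1 one at a time.
    for i in range(n):
        yield i

gen = ints_gen(3)        # creates the generator iterator, runs nothing yet
first = next(gen)        # runs the body up to the first yield
second = next(gen)       # resumes after the first yield
third = next(gen)

try:
    next(gen)            # nothing is left to yield
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)  # 0 1 2 True
```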

One thing to note too is that generator functions can also contain a return statement. When control flow reaches the return statement, the generator iterator raises the StopIteration exception, ending all processing of values.
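As an illustration, here is a sketch of a generator that uses return to stop early. The function name factors_up_to and its limit parameter are made up for this example.

```python
def factors_up_to(n, limit):
    """Yield the factors of n in order, stopping at the first one above limit."""
    for k in range(1, n + 1):
        if n % k == 0:
            if k > limit:
                return   # ends the generator; the next next() raises StopIteration
            yield k

print(list(factors_up_to(100, 10)))  # [1, 2, 4, 5, 10]
```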

User defined classes with generators.

According to the documentation, writing your own user defined classes that act as generators can be a messy issue. You can make a workaround by reflecting on the fact that generator functions produce iterators, and the __iter__() method also produces iterators. So, what you do is make the class an iterable by implementing the __iter__() method as a generator function, that is, by letting __iter__() yield its results. Here is a python generator example as a workaround.
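A sketch of this workaround, using a hypothetical Factors class that wraps the factor computation from earlier in this post:

```python
class Factors:
    """An iterable whose __iter__() method is itself a generator function."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # Each call creates a fresh generator iterator.
        for k in range(1, self.n + 1):
            if self.n % k == 0:
                yield k

twelve = Factors(12)
print(list(twelve))  # [1, 2, 3, 4, 6, 12]
print(list(twelve))  # [1, 2, 3, 4, 6, 12] again
```

Because every for loop calls __iter__() anew, an instance supports as many passes as you like, which a plain iterator does not.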

Happy pythoning.

Using A Python Iterator To Get Data

In the last post about python iterables, we discussed what it means to be an iterable: being able to participate in the for loop and implementing the __iter__() method to create iterator objects. There is another related concept in python that takes this ability to participate in python for loops a bit further: the concept of being an iterator. This is very important because people often confuse being a python iterable with being a python iterator.

fractals are like python iterators
 

In fact, when you implement the __iter__() method (making an object an iterable), you are enabling your object to participate in python for loops or to be used to retrieve a stream of data. But that is not enough because, as I showed you in the user defined class in the last post, you need to implement one more method, the __next__() method, to complete the process. You need the __next__() method because __iter__(), which makes your object an iterable, just returns an iterator object, while __next__() is what makes it possible to access the elements through that iterator object and what defines that object as an iterator. So, with this we are ready to define what it means to be an iterator.

What it means to be a python iterator

To be a python iterator, an object just needs to implement the __next__() method. This method returns the next value in the iteration, updates the object's state so that it points to the following value, and signals when there are no more elements in the stream by raising the StopIteration exception. That is it. An iterator is just able to remember what it is doing while retrieving a stream of data.

Python recommends that any object that implements the __next__() method should also implement the __iter__() method and, when doing so, return the object itself. This makes python iterators also python iterables. Remember that fact because that is where many people get confused. We covered this in the post on iterables.

In summary, iterators are iterables that participate in for loops or in functions like map, zip etc. which need iterables, and that also remember where they are when retrieving items.

Now that we have a definition, let’s take examples. Several built-in datatypes support iteration like lists and dictionaries, so we will use them for examples.

See what happens when you call iter() (which invokes the __iter__() method) on a dictionary object and then call next() (which invokes the __next__() method) on the iterator it returns. The dictionary will serve as our example.
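A sketch of such a session, with a made-up fruits dictionary:

```python
fruits = {'mango': 1, 'orange': 3, 'pineapple': 7}

fruits_iter = iter(fruits)   # calls fruits.__iter__()

print(next(fruits_iter))     # mango
print(next(fruits_iter))     # orange
print(next(fruits_iter))     # pineapple

try:
    next(fruits_iter)        # no keys are left
except StopIteration:
    print('no more keys')
```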

As you can see from the code above, the iterator stepped through the dictionary's keys, one key each time it was passed to the next function.

You can do the same thing with any native python iterable. They are not iterators themselves, but calling iter() on them gives you one.

Python has made it that when you carry out a python for loop, the process of calling iter(object) and next() is automated, so you don’t really notice what is happening under the hood.
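Roughly speaking, a for loop behaves like the while loop in the sketch below. This is a simplification for illustration, not the exact machinery the interpreter uses.

```python
numbers = [10, 20, 30]
collected = []

# Equivalent in spirit to: for item in numbers: collected.append(item)
iterator = iter(numbers)         # the for loop calls iter() once
while True:
    try:
        item = next(iterator)    # then next() on every pass
    except StopIteration:        # and stops quietly at the end
        break
    collected.append(item)

print(collected)  # [10, 20, 30]
```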

You should note that once the StopIteration exception is raised for an iterator, it must continue to raise that exception on subsequent calls to the next method. This is because what you have in memory is now an exhausted iterator that behaves like an empty container. To start all over again and get the stream back, you call iter() afresh on the underlying object if it is a container like a list or dictionary; if it is not, there is nothing to do but to use a python generator. This is why you do not often see hand-written python iterators: python generators come in handy when you need multiple passes over a non-container object. We will discuss python generators in the next post because they are interesting python functions, so just watch out for it.
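A small sketch of this behaviour with a list (the variable names are illustrative):

```python
letters = ['a', 'b']

first_pass = iter(letters)
print(list(first_pass))    # ['a', 'b']
print(list(first_pass))    # [] because the iterator is exhausted

second_pass = iter(letters)   # ask the container for a fresh iterator
print(list(second_pass))   # ['a', 'b']
```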

User defined python iterators

Iterators that you define yourself in code just need to implement __iter__() which produces an iterator object and __next__() which helps you to traverse the elements in the stream of data. That’s just that; what I have been saying all along. I touched on this in the iterables discussion. This is some code that could be used to produce a user defined iterator that is based on the list datatype.
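A sketch of what such a class could look like. The name ListIterator and its attributes are hypothetical.

```python
class ListIterator:
    """A one-pass iterator over a copy of the given items."""

    def __init__(self, items):
        self._items = list(items)
        self._index = 0              # remembers where we are in the stream

    def __iter__(self):
        return self                  # an iterator returns itself

    def __next__(self):
        if self._index >= len(self._items):
            raise StopIteration      # signals the end of the stream
        item = self._items[self._index]
        self._index += 1
        return item

stream = ListIterator(['mango', 'orange'])
for fruit in stream:
    print(fruit)
```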

As I said before, one deficiency of iterators is that they only support one pass. If you attempt a second pass at them, they behave like empty containers. You can try it out and see for yourself. Because of this limitation on having only one pass, when I want to access the items in an object as a stream, I just use them as an iterable using python for loop. But when I want to be able to generate values, I use a generator.

Some things you can do with iterators are to materialize the iterator object as a tuple, list etc, do sequence unpacking on them, or even use the max and min functions on them.
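For instance, as a sketch with made-up values (each operation consumes its iterator, so a fresh one is created every time):

```python
squares = [1, 4, 9, 16]

as_tuple = tuple(iter(squares))      # materialize as a tuple
first, *rest = iter(squares)         # sequence unpacking
largest = max(iter(squares))
smallest = min(iter(squares))

print(as_tuple, first, rest, largest, smallest)
```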

Happy pythoning.

Python Iterables Are Not Just About Sequences

Lots of times when I read code, I see people thinking that python iterables are just about python sequences like python lists, tuples, or strings. The most common culprit is the python list. When they want to create a custom class that is iterable, they would rather make the underlying data structure a list in order to make use of the methods that are supported by sequences. I want to use this post to make you understand that python iterables are not just sequences. Iterables include many more objects than just sequences.

 

First, what is an iterable?

The definition of a python iterable.

Basically, a python iterable is any object that you can loop over. The object can be a python sequence like a list, string, tuple, or bytes object, or it can be a python collection like a dictionary, set, or frozenset, or it can even be a file object. These are all objects that are capable of returning their members or elements one item at a time. If you so desire, you can define your own user defined objects and make them iterable. I will show you how in later examples of loops in python.

Also, on a practical level, you can define a python iterable as anything that can appear in a for loop. I really don’t need to give an example here, but think of anything you have put after the in keyword of a for loop in your code and that object is an iterable. The list goes on and on. Also, anything that zip accepts as an argument, or that map accepts after its function argument, is a python iterable. Therefore, knowing how the for loop operates, we can give a technical definition of an iterable.

Technically, an iterable is any object whose class implements the __iter__() special method or if you want to specify sequence semantics, which implements the __getitem__() special method. You really need to implement __iter__() method for your iterable when you need a generalized python iterator. But if you want to play with a sequence type in your object, then all you need to do is implement the __getitem__() special method.

As I do in all my posts, let’s illustrate the definitions above with examples. Let’s first give examples of python iterables that are not python sequences. We’ll be using the practical definition: ability to participate in python for loops.

First, we’ll show that python dictionaries are iterables. When you use a for loop directly on a dictionary, python iterates over its keys, but the dictionary has a powerful method, items, that lets you iterate over the keys and values at the same time.
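A sketch with a made-up dictionary of fruit prices:

```python
prices = {'mango': 1, 'orange': 3}

# Looping directly over the dictionary gives its keys.
for fruit in prices:
    print(fruit)

# The items method gives key and value pairs together.
for fruit, price in prices.items():
    print(fruit, price)
```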

File objects are also iterables. You can replace the ‘eba.txt’ file in the code below with any text file of your choice. All I wanted to show was that the file handling object, fh, is a python iterable since it can participate in a for loop.


with open('eba.txt') as fh:
    for line in fh:
        # each line already ends with a newline, so suppress print's own
        print(line, end='')

Then finally what you must be familiar with, python sequences. All sequence types are iterables. But not all iterables are sequence types as we have noted above.

Python strings are iterable sequences.
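For example, a for loop over a string visits its characters one at a time (the word used here is arbitrary):

```python
word = 'mango'
characters = []
for ch in word:
    characters.append(ch)

print(characters)  # ['m', 'a', 'n', 'g', 'o']
```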

Lists and tuples are sequence types and also iterable. In fact, all sequence types are iterable, so they work in python for loops too.
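A short sketch with a made-up list and tuple:

```python
fruit_list = ['mango', 'orange', 'melon']
fruit_tuple = ('pineapple', 'guava')

for fruit in fruit_list:
    print(fruit)

for fruit in fruit_tuple:
    print(fruit)

# Sequences can also be reached by index, which other iterables cannot promise.
print(fruit_list[0], fruit_tuple[-1])  # mango guava
```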

All the types above that are iterable are built-in data types. What about user defined types? I said above that user defined types can be made iterable. How? By making them implement the __iter__() method or, if you desire sequence semantics, the __getitem__() method. Let’s use the __iter__() method first because later I will show you how to implement the __getitem__() method.

The Fruits class below uses a python list as the underlying data structure. We implemented the __iter__() method, which returns an iterator object; here that object is the instance itself. Any object that acts as its own iterator returns itself from this method. To enable the for loop to access each of the items in the iterator object, we need to implement another method, the __next__() method. The __next__() method defines an object as a python iterator, and iterators are also iterables. What the __next__() method below does is just to go through the items using an index, which is also a data attribute, and return the item at that index. When it gets to the end of the list, it raises the StopIteration exception, and the python for loop then stops asking for more items.
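A sketch of a Fruits class along these lines. The attribute names fruits and index are assumptions; only the class name and the two special methods come from the description above.

```python
class Fruits:
    def __init__(self, *names):
        self.fruits = list(names)   # the underlying list
        self.index = 0              # position of the next item to return

    def __iter__(self):
        return self                 # the instance is its own iterator

    def __next__(self):
        if self.index >= len(self.fruits):
            raise StopIteration     # tells the for loop to stop
        fruit = self.fruits[self.index]
        self.index += 1
        return fruit

basket = Fruits('mango', 'orange', 'melon')
for fruit in basket:
    print(fruit)
```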

One thing I want you to note from the above code is that all the built-in iterables implement the __iter__() method; that is why when you explicitly call iter(object) on them, they will give you an iterator object. You can read on python iterators here.

Now, let me discuss one special type of iterable: sequences.

What are sequences?

Sequences are iterables but they support looping through the items in the sequence using indices. So, everything you do with a python iterable, you can also do with a python sequence. That is why when you read code, you wonder if everyone thinks only sequences are iterables. For an object to be a sequence, it must implement the __getitem__() and __len__() special methods. I discussed using the __len__() special method on user defined objects in another post. So, all python sequences like lists, tuples, strings implement these two methods.

Let’s give an example of a user defined object that acts as an iterable by mimicking the sequence semantics. Here, the Fruits class implements the __getitem__() and __len__() methods.
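A sketch of such a class. The private attribute name _fruits is an assumption.

```python
class Fruits:
    def __init__(self, *names):
        self._fruits = list(names)

    def __getitem__(self, index):
        # Raises IndexError past the end, which is how a for loop knows to stop.
        return self._fruits[index]

    def __len__(self):
        return len(self._fruits)

basket = Fruits('mango', 'orange', 'melon')
for fruit in basket:          # works without __iter__()
    print(fruit)
print(len(basket), basket[1])  # 3 orange
```

Unlike the iterator version, this object can be looped over any number of times, because each for loop starts again from index 0.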

This code is not different from the earlier user defined code except that this time it is implementing the __getitem__() method rather than the __iter__() method.

If you ask me which you should use for user defined objects, the __iter__() method or the __getitem__() method, the answer is: it depends on what you want to do with your user defined objects. It is rare to see implementations of the __iter__() method because of the limitations associated with iterators and because python has made it possible to produce iterators easily using generators. But it is common to implement the __getitem__() method if you want your object to behave like a sequence, or even a collection.

And if you want further functionality, you could make collections.abc.Sequence the base class for your class. When you do this, you get further functionality that sequences have, like finding the count of an item, getting the index of an item, finding out if an item is in the object etc.

Let me give some examples. First, I imported the Sequence abstract base class from collections.abc. Then, I made it the parent class to my class, Fruits. I also implemented the __contains__() special method to be able to use the “in” operator on instances of the class. Here is some code that shows the added functionality of my new Fruits class, beyond what the earlier Fruits class above could do.
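A sketch of how this could look. The attribute name _fruits and the sample data are assumptions; count, index and reversed come from the Sequence base class as mixin methods.

```python
from collections.abc import Sequence

class Fruits(Sequence):
    def __init__(self, *names):
        self._fruits = list(names)

    def __getitem__(self, index):
        return self._fruits[index]

    def __len__(self):
        return len(self._fruits)

    def __contains__(self, item):
        # Sequence already supplies a default; this one defers to the list.
        return item in self._fruits

basket = Fruits('mango', 'orange', 'mango')
print(basket.count('mango'))     # 2
print(basket.index('orange'))    # 1
print('melon' in basket)         # False
print(list(reversed(basket)))    # ['mango', 'orange', 'mango']
```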

We cannot end the discussion without noting how python iterables support lazy evaluation of values.

Lazy evaluation in python.

According to the Wikipedia article on lazy evaluation strategy, this is a technique which delays the evaluation of an expression until the value is needed and which also avoids repeated evaluations. Python supports the lazy evaluation technique in iterables. For example, with the built-in python range function, we don’t need to explicitly produce all the values in the range; instead they are produced as and when needed. This helps us to save memory. For example, take the following call to range over a million items. I decided I didn’t need to print any values beyond 100, so I break out of the loop when i reaches 101.


for i in range(1000000):
    if i == 101:
        break
    print(i)

If python did not use the lazy evaluation technique while producing the items, it would have produced a million items while I only required the values up to 100.

Lazy evaluation is also seen when you are using the items or values methods of a dictionary. Python only gets each key or value when you ask for it. It doesn’t populate memory with a separate copy of all the keys and values. Here is a python iteration over dictionary keys and values.


fruits_dict = {'mango': 1, 'orange': 3, 'pineapple': 7, 'melon': 4}
for key, value in fruits_dict.items():
    print(key, value)

Lazy evaluation saves time and memory space. This is one feature that is very powerful in python programming.

But if, on the other hand, you do need all the values from the iterator that is created during lazy evaluation, you can just convert it to a list or tuple. For example, using the range from earlier, if I really needed all the million items, instead of retrieving them one at a time, I can convert it to a list and get everything at once.


range_list = list(range(1000000))

You can read the documentation glossary entry on iterables and use them to your pleasure. Happy pythoning.
