
Generating Power Through Python Generator Functions And Iterators

In my post on python iterators, I mentioned that one limitation of using python iterators in user defined objects is that they do not allow more than one pass once you have encountered the StopIteration exception. To conveniently overcome that limitation and give you programming power, the creators of python designed an object that is not only an iterable and an iterator, but is also produced by a function that yields values. That object is a python generator.

[Image: Python generators are like this steel frame]

In this post, I will describe what a generator is and the advantages generators confer on your programming.

What are python generators?

Python generators are a special class of python functions that make the task of writing iterators very simple. While a regular python function computes a value and returns it, a python generator returns an iterator that yields a stream of values. To get up to speed with iterators, you can read this post on python iterators. In a regular function you use a return statement to return a value, but in a python generator you use a yield statement to indicate each element that is to be returned in the series.

The simple definition of a generator function is any function that contains the yield keyword.

Let’s illustrate this with examples.

For example, take the task of computing the factors of a number.


def factors(n):
    ''' returns all the factors of n as a list '''
    results = []
    for k in range(1, n+1):
        if n % k == 0:
            results.append(k)
    return results

In the factors function, given a number, we divide it by every integer between 1 and that number. Whenever an integer divides it without a remainder, that integer is a factor and we append it to the results list. At the end of the iterative division, we return the results list containing all the factors of the given number.
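For instance, calling it on 12 gives all six factors of 12:

>>> factors(12)
[1, 2, 3, 4, 6, 12]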

I want you to notice the following deficiencies of regular functions like this one:

1. We had to populate the results list with all the values, waiting until the computation was complete, and then hold all of those values in memory. That takes up time and memory space.
2. When the function returned its results, all the variables used in the namespace of the function were garbage collected or thrown away. We cannot get them back unless we call the function another time.
3. We cannot pause and resume the function if we want to.

What if we had a function that could overcome the deficiencies above and also act as an iterable? That is where a python generator comes in. Now, let’s use a generator to compute the factors of a number this time around. Take note of where the yield keyword is placed in the generator function.


def factors(n):
    for k in range(1, n+1):
        if n % k == 0:
            yield k

Note that in the generator function, the return statement has been replaced by a yield statement. Also, we no longer need to populate a results list with all the factors like we did in the regular function; because the generator function produces an iterator that walks through the values, we yield each factor as it is needed. What this means is that, since a python generator function produces a generator iterator, we can use the generator function in a for loop.

Take the following code as an example.
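A minimal sketch of what that code could look like (the gen_iterator name here is just illustrative):

gen_iterator = factors(12)   # calling the generator function returns a generator iterator
print(gen_iterator)          # e.g. <generator object factors at 0x...>

for factor in gen_iterator:  # the for loop calls __iter__() and __next__() for us
    print(factor)            # prints 1, 2, 3, 4, 6, 12 in turn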

Notice that the generator function produced a generator iterator (gen_iterator) that is itself an iterable, since it implements both the __iter__() and the __next__() methods. The for loop was simply calling those methods automatically and yielding the results from the generator iterator, which in turn yields the results from the generator function.

A python generator function can contain more than one yield statement, and it yields the value following each yield statement in turn. Taking a cue from our generator function, we can optimize it with more than one yield statement, owing to the fact that the quotient of dividing a number by one of its factors is also a factor, and by testing candidate values only up to the square root of the number.
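Here is one way such an optimized generator might look; this is a sketch, not the only possible implementation:

def factors(n):
    ''' yields the factors of n, testing candidates only up to the square root of n '''
    k = 1
    while k * k < n:
        if n % k == 0:
            yield k          # k is a factor
            yield n // k     # and so is its cofactor n // k
        k += 1
    if k * k == n:           # if n is a perfect square, yield its root only once
        yield k

For 12, this yields 1, 12, 2, 6, 3, 4 rather than the ascending order of the first version.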

Notice that while the first implementation yielded the factors in ascending order, this second implementation, although more optimized, did not. It goes to show that a generator function remembers where it was in the scheme of things when it yields a result and resumes operation from where it left off. This highlights the big difference between a yield and a return statement: when a return statement is executed, all variables are discarded from the function, but when a yield statement is executed, the state of execution of the generator is suspended and all local variables in its namespace are preserved. It then resumes execution from where it stopped on the next invocation of the generator, when the caller calls the __next__() method of the generator iterator.

In the code example above, I let the for loop call the __iter__() and __next__() methods of the generator iterator automatically. I would like you to see a visual demonstration of how this works, so I will use a command line invocation of a python generator function. For example, say we have a generator function that generates ints up to a given number. Let us see how it does this with yield.

[Screenshot: a command line example of a python generator]
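A sketch of what such an interactive session could look like; the ints_gen definition below is an assumption based on its description:

>>> def ints_gen(n):
...     ''' yields the integers from 1 up to n '''
...     for i in range(1, n + 1):
...         yield i
...
>>> it = ints_gen(3)
>>> it
<generator object ints_gen at 0x...>
>>> next(it)
1
>>> next(it)
2
>>> next(it)
3
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration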

You can see from the command line session above that when we called ints_gen(3) in order to yield 3 integer values, it created an iterator. We know that iterators are defined by the __next__() method. So, when we call the next function on the iterator, it yields each of the 3 integers one after the other until it gets to the end and then raises the StopIteration exception, which every python iterator raises on getting to the end of its iteration. This is just a simplification of how the generator function works with an intermediate generator iterator.

One thing to note too is that generator functions can also have a return statement; they do not preclude one. A generator function with a return statement will raise the StopIteration exception when control flow reaches the return statement, ending all processing of values.
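For instance, a generator can use return to cut the stream short; the limited_factors name below is just a hypothetical illustration:

def limited_factors(n, stop_at):
    ''' yields the factors of n, but stops early once a candidate exceeds stop_at '''
    for k in range(1, n + 1):
        if k > stop_at:
            return           # ends the generator; the caller sees StopIteration
        if n % k == 0:
            yield k

Here, list(limited_factors(12, 5)) would give [1, 2, 3, 4].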

User defined classes with generators

According to the documentation, writing your own user defined classes that act as generators can be a messy issue. You can work around this by reflecting on the fact that generator functions produce iterators, and the __iter__() method also produces iterators. So, what you do is make the class you want to behave like a generator an iterable whose __iter__() method yields its results. Here is a python generator example of this workaround.
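A minimal sketch of the workaround, using a hypothetical Factors class:

class Factors:
    ''' an iterable whose __iter__() method is a generator that yields factors '''
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # because this method contains yield, calling iter() on a Factors
        # object returns a generator iterator automatically
        for k in range(1, self.n + 1):
            if self.n % k == 0:
                yield k


for factor in Factors(12):
    print(factor)            # prints 1, 2, 3, 4, 6, 12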

Happy pythoning.

Using A Python Iterator To Get Data

In the last post about python iterables, we discussed what it means to be an iterable – being able to participate in the for loop and implementing the __iter__() method to create iterator objects. There is another related concept in python that takes this ability to participate in python for loops a bit further: the concept of being an iterator. This is very important because people often confuse what it means to be a python iterable with what it means to be a python iterator.

[Image: fractals are like python iterators]

In fact, when you implement the __iter__() method (making an object an iterable), you are enabling your object to participate in python for loops or to be used to retrieve a stream of data. But that is not enough: as I showed you in the user defined class in the last post, you need to implement one more method, the __next__() method, to complete the process. The reason you need the __next__() method is that __iter__(), which makes your object an iterable, just returns an iterator object, while implementing __next__() makes it possible for you to access the elements in the iterator object and defines that object as an iterator. So, with this we are ready to define what it means to be an iterator.

What it means to be a python iterator

To be a python iterator, an object just needs to implement the __next__() method. This method returns the next value in the iteration, updates the object's state so that it points to the following value, and signals when there are no more elements in the stream by raising the StopIteration exception. That is it. An iterator is just able to remember what it is doing while retrieving a stream of data.

Python recommends that any object that implements the __next__() method should also implement the __iter__() method and, when doing so, return the object itself. This means that python iterators are also python iterables. Remember that fact, because that is where many people get confused. We covered this in the post on iterables.

In summary, iterators are iterables that can participate in for loops or in functions like map and zip, which need iterables, and that also remember where they are while retrieving items from the object.

Now that we have a definition, let’s look at some examples. Several built-in datatypes, like lists and dictionaries, support iteration, so we will use them for the examples.

See what happens when you call iter() (which invokes the __iter__() method) and then next() (which invokes the __next__() method) on a dictionary object, which we will use for our example.
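A minimal interactive sketch of this, assuming a small sample dictionary:

>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> it = iter(d)        # invokes d.__iter__() and returns an iterator
>>> next(it)            # invokes it.__next__()
'a'
>>> next(it)
'b'
>>> next(it)
'c'
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration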

As you can see from the code above, iterating over the dictionary yields its keys, one for each call to the next function.

You can do the same thing with any native python iterable; they were all built to produce iterators.

Python has made it so that when you run a python for loop, the process of calling iter(object) and then next() is automated, so you really don’t realize what is happening under the hood.
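Roughly speaking, a for loop over an iterable behaves like this sketch (the names are illustrative):

colours = ['red', 'green', 'blue']

# what "for colour in colours: print(colour)" does under the hood
it = iter(colours)               # calls colours.__iter__()
while True:
    try:
        colour = next(it)        # calls it.__next__()
    except StopIteration:        # raised when the stream is exhausted
        break
    print(colour)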

You should note that once the StopIteration exception is raised for an object, it must continue to raise that exception on subsequent calls to the next method. This is because what you have in memory is an empty container or iterator. To make the object start all over again and return the stream, you need to call the iter method afresh if it is a container object like a list or dictionary; if it is not, there is nothing to do but use a python generator. This is why you do not often see python iterators being used directly: python generators come in handy when you need multiple passes over a non-container iterator object. We will discuss python generators in the next post because they are interesting python functions, so just watch out for it.
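A short sketch of that behaviour with a list, which is a container object:

>>> it = iter([1, 2, 3])
>>> list(it)                # the first pass drains the iterator
[1, 2, 3]
>>> next(it)                # the exhausted iterator keeps raising StopIteration
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> list(iter([1, 2, 3]))   # a fresh call to iter() on the container starts over
[1, 2, 3]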

User defined python iterators

Iterators that you define yourself in code just need to implement __iter__(), which produces an iterator object, and __next__(), which lets you traverse the elements in the stream of data. That is just what I have been saying all along, and I touched on it in the iterables discussion. Below is some code that could be used to produce a user defined iterator that is based on the list datatype.
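A sketch of what such a class could look like; the ListIterator name is just illustrative:

class ListIterator:
    ''' a user defined iterator that walks through a list one item at a time '''
    def __init__(self, data):
        self._data = list(data)
        self._index = 0

    def __iter__(self):
        return self              # iterators return themselves from __iter__()

    def __next__(self):
        if self._index >= len(self._data):
            raise StopIteration  # signal that the stream is exhausted
        value = self._data[self._index]
        self._index += 1
        return value


for item in ListIterator(['a', 'b', 'c']):
    print(item)                  # prints a, b and c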

As I said before, one deficiency of iterators is that they only support one pass. If you attempt a second pass at them, they behave like empty containers. You can try it out and see for yourself. Because of this limitation of having only one pass, when I want to access the items in an object as a stream, I just use the object as an iterable in a python for loop. But when I want to be able to regenerate values, I use a generator.

Some things you can do with iterators are to materialize the iterator object as a tuple, list, etc., do sequence unpacking on them, or even use the max and min functions on them.
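For example (each call to iter() below builds a fresh iterator, since a single pass consumes it):

>>> tuple(iter([4, 7, 1]))        # materialize an iterator as a tuple
(4, 7, 1)
>>> a, b, c = iter([4, 7, 1])     # sequence unpacking on an iterator
>>> a, b, c
(4, 7, 1)
>>> max(iter([4, 7, 1])), min(iter([4, 7, 1]))
(7, 1)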

Happy pythoning.
