Search

Python Iterables Are Not Just About Sequences

Lots of times when I read code, I see people thinking that python iterables are just about python sequences like python lists, tuples, or strings. The most culprit are python lists. When they want to create a custom class that is iterable, they would rather make the underlying data structure a list in order to make use of the methods that are supported by sequences. I want to use this post to make you understand that python iterables are not just sequences. Iterables include a whole lot of objects than just sequences.

 

First, what is an iterable?

The definition of a python iterable.

Basically, a python iterable is any object that you can loop over. The object can be a python sequence like lists, strings, tuples, bytes, or they can be python collections like dictionaries, sets, frozensets, or they can even be file objects. These are all objects that are capable of returning their members or elements one item at a time. If you so desire, you can define your own user defined objects and can make them an iterable. I will show you how in later examples of loops in python.

Also, on a practical level, you can define a python iterable as anything that can appear in a for loop. I really don’t need to give an example here but think of anything you have put on the right side of a for loop in your code and that object is an iterable. The list goes on and on. Also, anything that you can put as an argument to the zip and map functions are python iterables. Therefore, knowing how the for loop operates, we can give a technical definition of an iterable.

Technically, an iterable is any object whose class implements the __iter__() special method or if you want to specify sequence semantics, which implements the __getitem__() special method. You really need to implement __iter__() method for your iterable when you need a generalized python iterator. But if you want to play with a sequence type in your object, then all you need to do is implement the __getitem__() special method.

As I do to in all my posts, let’s illustrate the definitions above with examples. Let’s first give examples of python iterables that are not python sequences. We’ll be using the practical definition: ability to participate in python for loops.

First, we’ll show that python dictionaries are iterables. Using for loop directly on the dictionary, python iterates over the dictionary based on the keys, but it has a powerful method, items, that can help one to iterate over the keys and values at the same time.

File objects are also iterables. You can replace the ‘eba.txt’ file in the code below with any text file of your choice. All I wanted to show was that the file handling object, fh, is a python iterable since it can participate in a for loop.


with open('eba.txt') as fh:
    for line in fh:
        print(line)

Then finally what you must be familiar with, python sequences. All sequence types are iterables. But not all iterables are sequence types as we have noted above.

Python strings are iterable sequences.

Lists and tuples are sequence types and also iterable. In fact, all sequence types are iterable. They give examples of loops in python.

All the types above that are iterable are custom data types. What about user defined types? I said above that user defined types can be made iterable. How? By making them implement the __iter__() method or if you desire sequence semantics, the __getitem__() method. Let’s use the __iter__() method because later I will show you how to implement the __getitem__() method.

The Fruits class below uses a python list as the underlying data structure. We implemented the __iter__() method which returns an iterator object, itself. All implementers of this method will return themselves as iterator objects. To enable the for loop to access each of the items in the iterator object, we need to implement another method, the __next__() method. The __next__() method defines an object as a python iterator and iterators are also iterables. What the __next__() method below does is just to go through each of the items using their index, which is also a data attribute, and returning each of the items with that index. When it gets to the end of the list, it returns the Stopiteration exception to the python for loop which then stops asking for more items.

One thing I want you to note from above code is that all the built-in iterables implement the __iter__() method that is why when you explicitly call iter(object) on them, they will give you an iterator object. You can read on python iterators here.

Now, let me discuss on one special type of iterable and those are sequences.

What are sequences?

Sequences are iterables but they support looping through the items in the sequence using indices. So, everything you do with a python iterable, you can also do with a python sequence. That is why when you read code, you wonder if everyone thinks only sequences are iterables. For an object to be a sequence, it must implement the __getitem__() and __len__() special methods. I discussed using the __len__() special method on user defined objects in another post. So, all python sequences like lists, tuples, strings implement these two methods.

Let’s give an example of a user defined object that acts as an iterable by mimicking the sequence semantics. Here, the Fruits class implements the __getitem__() and __len__() methods.

This code is not different from the earlier user defined code except that this time it is implementing the __getitem__() method rather than the __iter__() method.

If you ask me, which should I use, the __iter__() method or the __getitem__() method for user defined objects? The answer is – it depends on what you want to do with your user defined objects. It is rare to see implementations of the __iter__() method because of the limitations associated with iterators and python has made it possible to produce iterators easily using generators. But it is common to implement the __getitem__() method if you want your object to behave like a sequence, or even a collection.

And if you want further functionality, you could make collections.abc.Sequence the base class for your class. When you do this, then you could be able to carry out further functionalities that sequences have like find out the count of an item, get the index of an item, find out if an item is in the object etc.

Let me give some examples. First, I imported the Sequence abstract base class from collections.abc. Then, I made it the parent class to my class, Fruits. I also implemented the __contains__() special method to be able to use the “in” operator on instances of the class. Here is some code that shows the added functionality of my new Fruits class that is separate from what the native Fruits class above could do.

We cannot end the discussion without noting how python iterables support lazy evaluation of values.

Lazy evaluation in python.

According to Wikipedia.org article on lazy evaluation strategy, this is a technique which delays the evaluation of an expression until the value is needed and which also avoids repeated evaluations. Python supports the lazy evaluation technique in iterables. For example, with the built-in python range function, we don’t need to explicitly produce all the values that are needed for the range but instead use them as when needed due to the lazy evaluation technique. This helps us to save memory. For example, take the following call to range to evaluate a million items. I decided I didn’t need to print the values beyond the 100th, so I decided to break the loop at the 101th item.


for i in range(1000000):
    if i == 101:
        break
    print(i)

If python did not use the lazy evaluation technique while producing the items, it could have produced a million items while I only required just the first 100.

Lazy evaluation is also seen when you are using the items or values functions of a dictionary. Python would only get the key or value based on when and whether you need them or not. It doesn’t just populate memory with all the keys and values. Here is a python iteration over dictionary keys and values.


fruits_dict = {'mango': 1, 'orange': 3, 'pineapple': 7, 'melon': 4}
for key, value in fruits_dict.items():
    print(key, value)

Lazy evaluation saves time and memory space. This is one feature that is very powerful in python programming.

But if on the other hand, you do need all the values from the iterator that is created during lazy evaluation, you can just cast it to a list or tuple. For example, using the range earlier, If I really needed all the million items, instead of retrieving them one at a time, I can cast it to a list and get everything at once.


range_list = list(range(1000000))

You can read up the documentation glossary on iterables and use them to your pleasure. Happy pythoning.

No comments:

Post a Comment

Your comments here!

Matched content