
Complete Methods For Python List Copy

After my post on python shallow and deep copy, a reader asked me: you can also copy a list with list slicing and it is fast. Which should I use?

Well, I decided to dedicate a post to the numerous ways you can copy a python list and then time them to find out which is the fastest, in order to give a concise answer.

python list copy

 

So, here are the different methods one after the other.

1. The built-in python list.copy method

This method is built into lists themselves: every list has a copy() method (Python 3.3 and later). Other sequences such as tuples and strings do not have it. Because it is built-in, I expect it to be fast, like most things built into python. I just love using this method whenever I need to copy a list. But let the timing tell us which is the most efficient.

An example of how you can use it to copy a list is:
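A minimal sketch of this, assuming a small list of made-up names (names and names2 are just illustrative variables):

```python
# names and names2 are illustrative; any list behaves the same way.
names = ['Rose', 'David', 'Mary']
names2 = names.copy()      # a new list object holding the same items

names.append('Paul')       # change the original after copying

print(names)               # ['Rose', 'David', 'Mary', 'Paul']
print(names2)              # ['Rose', 'David', 'Mary']
```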

As you can see from the code, names2 was independent of names after copying. So, it gives the desired behavior. But I need to tell you a caveat. list.copy() does a shallow copy; it does not recursively copy nested lists. Too bad.

2. Slicing the entire list

Yes, I said it again. When you slice the entire list, you copy every item into a new list object. This method is so cool. The syntax is listname[:]. That’s all you need to do to copy.

Let’s try this with an example.
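A minimal sketch of slicing to copy, again with made-up names as the sample data:

```python
# Illustrative data; the slice [:] covers the whole list.
names = ['Rose', 'David', 'Mary']
names2 = names[:]          # full slice returns a new list

names.append('Paul')       # change the original after copying

print(names)               # ['Rose', 'David', 'Mary', 'Paul']
print(names2)              # ['Rose', 'David', 'Mary']
```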

Yes, it is extremely convenient. It worked just as we expected, producing an independent list as output even when the original was changed. Like the first method, slicing to copy a python list is also a shallow copy. Too bad.

3. Using the built-in list constructor, list()

This is just like creating a list from another list. The syntax is list(originallist). It returns a new object, a list.

Here is an example.
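A minimal sketch with the same made-up names (the variable names are only illustrative):

```python
# list() builds a new list from any iterable, including another list.
names = ['Rose', 'David', 'Mary']
names2 = list(names)

names.append('Paul')       # change the original after copying

print(names)               # ['Rose', 'David', 'Mary', 'Paul']
print(names2)              # ['Rose', 'David', 'Mary']
```

Like the first two methods, this is a shallow copy.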

4. The generic shallow copy method

For the generic shallow copy method, you need to import the copy module: import copy. Then call the copy function of the module on the original list: copy.copy(originallist). I talked all about how to do this in the post on python shallow copy and deep copy. You can reference it for a refresher.

Here is an example.
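A minimal sketch using copy.copy, with illustrative names as the data:

```python
import copy

names = ['Rose', 'David', 'Mary']
names2 = copy.copy(names)  # generic shallow copy from the copy module

names.append('Paul')       # change the original after copying

print(names)               # ['Rose', 'David', 'Mary', 'Paul']
print(names2)              # ['Rose', 'David', 'Mary']
```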

So, it worked as we expected. The returned list, names2, was independent of the original list, names. But as the name says, it does a shallow copy. That means it does not copy recursively. Where we have a nested list, it does not copy deep down but returns references to the nested items.

5. The generic deep copy method

This is the last method, and the one I use whenever I have lists in my custom classes and need to copy them. This method copies deep down, even to the nested items in a nested list. It is also a function of the copy module. You can read all about it in the post I mentioned above.

Let’s do an example.
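A minimal sketch on a flat list first, using made-up names:

```python
import copy

names = ['Rose', 'David', 'Mary']
names2 = copy.deepcopy(names)   # recursive copy; same result as a shallow copy here

names.append('Paul')            # change the original after copying

print(names)                    # ['Rose', 'David', 'Mary', 'Paul']
print(names2)                   # ['Rose', 'David', 'Mary']
```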

I really need to do one more example with this method, to show that it copies deep down even to nested lists.
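A sketch of the nested case, assuming a made-up list of [name, grade] pairs. A shallow copy is included only for contrast:

```python
import copy

# Each inner list is a nested, mutable item.
students = [['Rose', 'C'], ['David', 'B']]
shallow = students.copy()          # inner lists are shared with the original
deep = copy.deepcopy(students)     # inner lists are copied too

students[0][1] = 'A'               # change a nested item in the original

print(shallow)   # [['Rose', 'A'], ['David', 'B']]  change leaked through
print(deep)      # [['Rose', 'C'], ['David', 'B']]  unaffected
```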

As you can see from the above nested list, when we changed one of the nested items in the original list, the copy did not reflect that change. That shows it was not copying by reference but copying deep down to the values.

Now that you are familiar with all the different ways to copy a list, which is the most time efficient?

First, I will have to tell you that if you have a nested list and you want the copy to be fully independent of the original, the only method you can use is the python deep copy method. That is the only method here that copies everything in the nested list without leaving any shared references.

Now, for the other types of lists, that is, lists that are not nested, all the methods can be used so we will now try to find out which is more time efficient by timing their processes.

Which method is more time efficient?

To test it out, you have to run the code below and see for yourself.
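A minimal timing sketch using the standard timeit module. The list size and the number of repetitions are arbitrary choices of mine, and the numbers you get will depend on your machine and python version:

```python
import copy
import timeit

names = list(range(1000))   # hypothetical test data, a flat list

# Map a label to a zero-argument callable that performs one copy.
methods = {
    'list.copy()': lambda: names.copy(),
    'slicing [:]': lambda: names[:],
    'list()': lambda: list(names),
    'copy.copy()': lambda: copy.copy(names),
    'copy.deepcopy()': lambda: copy.deepcopy(names),
}

timings = {}
for label, func in methods.items():
    # Total seconds taken by 200 calls of this copy method.
    timings[label] = timeit.timeit(func, number=200)

# Print the methods from fastest to slowest.
for label, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f'{label:16} {seconds:.6f} s')
```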

You will notice that the built-in python list copy method was slightly faster than all the other methods of copying a list when I ran it. That’s why I love using any function or method that is built-in specifically for a data type or data structure. List slicing comes in at a close second place, although I personally would not reach for list slicing if I have a very large list.

That’s it. I hope you enjoyed this post. Watch out for other interesting posts. Just subscribe by email and they will land right in your inbox.

Happy pythoning.

Visualizing ‘Regression To The Mean’ In Python

Let’s take a philosophical bent to our programming and consider something related to research. I decided to consider regression to the mean because I have found that topic fascinating.

regression to the mean python

 

What is regression to the mean?

Regression to the mean, sometimes called reversion towards the mean, is a phenomenon in which, if one sample of a random variable is extreme or close to an outlier, the next sample is likely to be closer to the mean or average. Note that the variable being measured has to involve randomness for this effect to play out and to be significant.

Sir Francis Galton first described this phenomenon when he was observing hereditary stature, in his paper “Regression towards mediocrity in hereditary stature.” He observed that parents who were taller than the community average tended to have children who were shorter than themselves, that is, closer to the community average height.

Since then, this phenomenon has been described in other fields of life where randomness or luck is also a factor.

For example, if a business has a highly profitable quarter, in the next quarter it is likely not to do as well. If one medical trial suggests that a particular drug or treatment is outperforming all other treatments for a condition, then in a second trial it is more likely that the outperforming drug or treatment will perform closer to the mean.

But regression to the mean should not be confused with the gambler’s fallacy. The gambler’s fallacy is the mistaken belief that if an event has occurred more frequently than normal in the past, it is less likely to happen in the future, even where it has been established that the past does not determine the future, i.e. the events are independent.

I was thinking about regression to the mean while coding some challenge that involved tossing heads and tails along with calculating their probability, so I decided to add a post on this phenomenon.

This is the gist of what we are looking for in the code. Suppose we have a coin that we flip a set number of times and record the fraction of flips that came up heads. We repeat this for several trials. Then we look for the trials whose fraction was extreme and find out if the trial after that extreme regressed towards the mean. Note that the mean of the flip of a coin is 0.5 because the probability that a fair coin will come heads is ½ and the probability it will come tails is also ½.

So after collecting the extremes along with the trial that comes after it, we will want to see if the trials were regressing towards the mean or not. We do this visually by plotting a graph of the extremes and the trials after the extremes.

So, here is the complete code. I will explain the graph that accompanies the code after you run it and then provide a detailed explanation of the code by lines.

After you run the above code, you will get a graph that looks like that below.

regression to mean python


We drew a line across the 0.5 mark on the y-axis to show the average. From the graph you will see that on several occasions, when there is an extreme above or below the average line, the next trial produces a fraction of heads that moves towards the mean line, except for one occasion when it did not. So, what is happening here? Because each trial is random and independent of the one before it, an extreme result is most likely to be followed by a more ordinary one.

Now, let me explain the code I used to draw the visuals. There are two functions here, one that acts as the coin flip function and the other to collect the extremes and subsequent trials.

First, the code for the coin flip.

    
import random

def flip(num_flips):
    ''' assumes num_flips a positive int '''
    heads = 0
    for _ in range(num_flips):
        if random.choice(('H', 'T')) == 'H':
            heads += 1
    return heads/num_flips

The function, flip, takes as argument the number of times the coin should be tossed. Then for each flip, which is done randomly, it finds out if the outcome was a head or a tail. If it is a head, it adds one to the heads variable, and finally it returns the fraction of flips that were heads.

Then the next function, regress_to_mean.

    
import matplotlib.pyplot as plt

def regress_to_mean(num_flips, num_trials):
    # get fractions of heads for each trial of num_flips
    frac_heads = []
    for _ in range(num_trials):
        frac_heads.append(flip(num_flips))
    # find trials with extreme results and for each 
    # store it and the next trial
    extremes, next_trial = [], []
    for i in range(len(frac_heads) - 1):
        if frac_heads[i] < 0.33 or frac_heads[i] > 0.66:
            extremes.append(frac_heads[i])
            next_trial.append(frac_heads[i+1])
    # plot results 
    plt.plot(extremes, 'ko', label = 'Extremes')
    plt.plot(next_trial, 'k^', label = 'Next Trial')
    plt.axhline(0.5)
    plt.ylim(0,1)
    plt.xlim(-1, len(extremes) + 1)
    plt.xlabel('Extremes example and next trial')
    plt.ylabel('Fraction Heads')
    plt.title('Regression to the mean')
    plt.legend(loc='best')
    plt.savefig('regressmean.png')
    plt.show()

This function is the heart of the code. It flips the coin a set number of times for a set number of trials, accumulating the fraction of heads for each trial in a list. Then it finds out which of the fractions is an extreme or outlier. When it gets an outlier, it adds it to the extremes list, and then adds the next trial to the next_trial list. Finally, we use matplotlib to draw the visuals. The visual is a plot of the extremes and next_trial figures with a horizontal line showing the average, so the viewer can better see in what direction the next trial moves when there is an extreme.
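If you want a number rather than a picture, the same idea can be sketched without matplotlib. The helper count_regressions below is hypothetical (it is not part of the original code), and the 15 flips, 40 trials, and fixed seed are arbitrary assumptions:

```python
import random

def flip(num_flips):
    '''Return the fraction of heads in num_flips fair coin tosses.'''
    heads = sum(random.choice(('H', 'T')) == 'H' for _ in range(num_flips))
    return heads / num_flips

def count_regressions(num_flips, num_trials, low=0.33, high=0.66):
    '''Return (number of extremes, number of those whose next trial
    is closer to 0.5 than the extreme was).'''
    frac_heads = [flip(num_flips) for _ in range(num_trials)]
    extremes = closer = 0
    # Pair every trial with the trial that follows it.
    for current, following in zip(frac_heads, frac_heads[1:]):
        if current < low or current > high:
            extremes += 1
            if abs(following - 0.5) < abs(current - 0.5):
                closer += 1
    return extremes, closer

random.seed(0)   # fixed seed so repeated runs give the same numbers
extremes, closer = count_regressions(15, 40)
print(extremes, 'extremes,', closer, 'followed by a trial closer to 0.5')
```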

I hope you enjoyed the code. You can run it on your machine or download it to study it, regress_to_mean.py.

Thanks for your time. I hope you do leave a comment.

Happy pythoning.

Python Shallow Copy And Deep Copy

Sometimes while programming, in order to prevent having side effects when we want to change an object, we need to create a copy of that object and mutate the copy so that we can later use the original. Python provides methods that we can use to do this. In this post, I will describe the shallow copy and deep copy methods of python that you can effectively use to copy objects even recursively.

python shallow copy and deep copy

 

Many programmers think that the assignment operator makes a copy of an object. It is really deceptive. When you write code like this:

object2 = object1

You are not copying but aliasing. That is, object2 gets a reference to the same object that object1 refers to. Aliasing could seem intuitive to use, but the caveat is that if you mutate the object through any one of the names, the change shows up through every name that references it. Let’s take an example.
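A minimal sketch of aliasing, with made-up names as the data:

```python
names = ['Rose', 'David', 'Mary']
second_names = names            # aliasing: both names point to one list
second_names.append('Paul')     # mutate through the second name

print(names)                    # ['Rose', 'David', 'Mary', 'Paul']
print(names is second_names)    # True
```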

You could see that I made an aliasing between second_names and names in line 2 so that they both reference the same object. When I appended a name to second_names, it reflected in names because they are both referencing the same object.

Sometimes, we don’t want this behavior. When we make a copy, we want it to be independent of the original. That is where python shallow copy and deep copy operations come in. To make this work, we need to import the copy module: import copy.

How Python shallow copy works with examples

The syntax for python shallow copy is copy.copy(x). The x in the argument is the original object you want to copy. I need to state here that copying is only useful for mutable objects such as lists, dictionaries, and sets. For immutable objects such as tuples and strings, copy.copy simply returns the original object.

Let’s take an example of how python shallow copy works on a list.
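A minimal sketch on a list, with made-up names as the data:

```python
import copy

names = ['Rose', 'David', 'Mary']
second_names = copy.copy(names)   # shallow copy: a new outer list

names.append('Paul')              # add items to the original only
names.append('Grace')

print(names)          # ['Rose', 'David', 'Mary', 'Paul', 'Grace']
print(second_names)   # ['Rose', 'David', 'Mary']
```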

You can see that the copy, second_names, remained unchanged even after we added items to the original.

Let’s take an example on how python shallow copy works on a dictionary.
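A minimal sketch on a dictionary, assuming a made-up mapping of names to grades:

```python
import copy

grades = {'Rose': 'C', 'David': 'B'}
second_grades = copy.copy(grades)   # shallow copy: a new dictionary

grades['Mary'] = 'A'                # add a key to the original only

print(grades)          # {'Rose': 'C', 'David': 'B', 'Mary': 'A'}
print(second_grades)   # {'Rose': 'C', 'David': 'B'}
```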

You can see that the python shallow copy function works on a dictionary just as we expected.

You can also do copy on sets; they are mutable iterables. If you want a refresher on iterables, you can check this post.

There is one weakness of python shallow copy. As the name implies, it does not copy deep down. It copies only items at the surface. If in a list or dictionary we have nested items, it will reference them like in the aliasing operation rather than copy them.

Let’s use an example to show this.
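A minimal sketch of the weakness, assuming a made-up nested list of [name, grade] pairs:

```python
import copy

students = [['Rose', 'C'], ['David', 'B']]
second_students = copy.copy(students)   # only the outer list is copied

students[0][1] = 'A'                    # change Rose's grade in the original

print(students)          # [['Rose', 'A'], ['David', 'B']]
print(second_students)   # [['Rose', 'A'], ['David', 'B']]
```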

Now, you can see that we changed Rose’s grade in the original from ‘C’ to an ‘A’ and the change was reflected in the copy. Too bad! That is behavior we don’t want. This is because python shallow copy does not go deep down, that is, it does not copy recursively. We need another type of copy to make both lists or dictionaries independent. That is where python deep copy comes in.

How python deep copy works

Python deep copy will create a new object from the original object and recursively will add all the objects found in the original object to the new object without passing them by reference. That’s cool. That makes our new nested objects copy effectively.

The syntax for deep copy is copy.deepcopy(x[, memo]). The x in the argument is the original object that has to be copied while memo is a dictionary that keeps track of what has already been copied. The memo is very useful in order to avoid recursive copying loops and to stop deep copy from copying the same object more than once. I find myself using the memo often when I am implementing deep copy in my custom classes.

Now, let’s take an example of python deep copy on a list, a nested list precisely, and see how it performs.
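A minimal sketch with the same kind of made-up nested list of [name, grade] pairs:

```python
import copy

students = [['Rose', 'C'], ['David', 'B']]
second_students = copy.deepcopy(students)   # inner lists are copied as well

students[0][1] = 'A'                        # change Rose's grade in the original

print(students)          # [['Rose', 'A'], ['David', 'B']]
print(second_students)   # [['Rose', 'C'], ['David', 'B']]
```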

You can see now that the original nested list was changed without affecting the copy.

That goes to show you the power of python as a programming language.

We can take this concept further by showing how to implement shallow copy and deep copy in python using custom classes. All you really need to do is implement two special methods in your classes for whichever you need. If you need to use python shallow copy in your class, implement the __copy__() special method and if you need to use python deep copy, just implement the __deepcopy__() special method.

Let’s show this by examples.
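A sketch of what such classes could look like. The attribute names, the sample students, and the body of __deepcopy__ are my assumptions, not necessarily the original listing:

```python
import copy

class Student:
    def __init__(self, name, grade, dept):
        self.name = name
        self.grade = grade
        self.dept = dept

    def __str__(self):
        return f'{self.name} ({self.dept}): {self.grade}'

class Faculty:
    def __init__(self, name, students=None):
        self.name = name
        self.students = students if students is not None else []

    def __deepcopy__(self, memo):
        # Create the new object without calling __init__, and register it
        # in memo before copying the children so cycles are handled.
        new_faculty = Faculty.__new__(Faculty)
        memo[id(self)] = new_faculty
        new_faculty.name = self.name
        new_faculty.students = copy.deepcopy(self.students, memo)
        return new_faculty

# Driver code: build a faculty, deep copy it, then change the original.
science = Faculty('Science', [Student('Rose', 'C', 'Physics'),
                              Student('David', 'B', 'Chemistry')])
science_copy = copy.deepcopy(science)

science.students[0].grade = 'A'    # only the original should change

for student in science_copy.students:
    print(student)
```

Passing memo on to the inner copy.deepcopy call is what lets the copy module recognize objects it has already copied.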

In the code above we defined a Student class with each student having a name, grade and dept. Then we defined a Faculty class that aggregates a list of students. Then in the Faculty class we implemented the __deepcopy__() special method in order to be able to recursively copy the list of students. Finally, in the driver code, we created the objects for the classes and then copied the faculty object to a new faculty object to see how it runs, printing out the students in the new faculty object.

That’s cool. Just love this code. I hope you enjoyed yourself. I would love to receive feedback as comments.

Happy pythoning.

Python Print() Function: How it works

One of the most ubiquitous and often used functions in python is the print function. We use it to see the output of our code and even for debugging. So, it is important that we understand how it works.

python print

 

In its essential form what the python print function does is to take a given object, convert it to a string object and print the value out to the standard output, or what is called the screen. It can even send the output to a file.

The python print syntax

The python print function, despite its wide ranging value, has a simple syntax. The syntax of the python print function is print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False). I will be explaining each of the arguments in this post. So, just take note of the syntax.

Usually when you want to print something to the screen you provide the python print function with an object or several object arguments. If you don’t specify other parameters, the function prints each of the arguments to the screen, each separated by a single space, and after all the arguments are printed it moves to a new line. Let’s illustrate this with an example and explain how it relates to the syntax.
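A minimal sketch with five objects. The name and age values are made up for illustration:

```python
name = 'Rose'
age = 25
# Five objects are passed; str(age) + '.' keeps the period next to the number.
print('My name is', name, 'and I am', str(age) + '.', 'Hello!')
# prints: My name is Rose and I am 25. Hello!
```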

When you run the code above, you will see that it nicely prints out each of the objects passed to the python print function. Here is what happened. I passed it five objects and it printed them out, each separated by a space. The separation by a space comes from the sep parameter in the syntax above. The sep means separator. By default its value is a single space. Notice that I converted one of the objects to a string and joined the period to it before printing. This is a trick to make the period sit right next to the value instead of being separated from it by a space. Very cool. We can change the value of the separator. I will highlight it in the separator section below.

Now what happens if we print without passing an object. Let me give an example following from our example above.
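A minimal sketch, repeating the made-up name and age values and adding a bare print() call on line five:

```python
name = 'Rose'
age = 25
print('My name is', name, 'and I am', str(age) + '.', 'Hello!')
# The next call passes no objects, so only the end value is written.
print()
print('Done.')
```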

You can see that I repeated the earlier code. But on line five I wrote a print call without giving it any argument or object. If you look at the output on the screen, you will see that it produced an empty line. Yes, without any argument the python print function just writes what is in the end parameter, and since the default is a newline, ‘\n’, it creates a new line.

Now let’s see how we can customize the working of the python print function using the keyword parameters outlined in the syntax.

Customizing python print with the sep keyword

The sep keyword separates each of the objects in the python print function based on its value. The default is a single space. That means if you use the default, as outlined above, each of the objects when printed out will be separated by a space.

What if we want another separator on python print, like we want a colon, :, to separate each of the objects to be printed. Here is code that could do it.
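A minimal sketch, with made-up names as the objects:

```python
# sep replaces the default single space between the objects.
print('Rose', 'David', 'Mary', sep=':')
# prints: Rose:David:Mary
```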

If you watch the output to the screen, you could see that each of the objects that was passed to the python print function now has a colon separator between them.

You could create any separator of your imagination. Most times when I have specific ways to print an output it could call for my customizing the separator.

Customizing python print with the end keyword.

The end keyword is another parameter that we could use to customize the python print function. As I highlighted above where I called print without objects, the default for the end keyword is a newline, ‘\n’, which is written after the objects are printed. That means python print ends each call with a newline. When I want subsequent print calls to continue on the same line, I customize the end keyword. You just replace the default with a space character, ‘ ’, so that the output of the next call follows on the same line.

For example, you have code you want printed in the same line. Here is the code that could do it.
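A minimal sketch with three print calls whose output ends up on one line (the text is made up):

```python
# end=' ' writes a space instead of a newline after each call.
print('I feel cool', end=' ')
print('using python.', end=' ')
print('It is the best programming language.')
# prints: I feel cool using python. It is the best programming language.
```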

You can see that by customizing the end parameter to a space, I have made all the objects in the python print function print without newline to the same line.

How to print to file using file keyword

Usually when you call the python print function, it prints to standard output, that is, the screen. That is the default. You can customize it to print to a file by passing a writable file object as the value of the file parameter. For details on how to open, read, and make files writable, see this blog post.

Now, let’s take an example. This time instead of printing to the screen we will be printing to a file or writing to a file. Here is the code:

    
text = ('I feel cool using python.'
        '\nIt is the best programming language')
with open('new_file.txt', 'w') as source_file:
    print(text, file=source_file)

You can run it on your machine. When you do, rather than getting the text message to your screen, it will print to the file, new_file.txt. If new_file.txt doesn’t exist, it will create one.

One thing to note about file objects passed to the file keyword – you cannot use binary mode file objects. This is because python print function converts all its objects to the str class (strings) before passing them to the file. So, note this and if you want to write to binary mode file objects, use the write methods that are built-in for file objects.
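A small sketch of that restriction. It uses io.BytesIO as a stand-in for a file opened in binary mode, which is my choice so that the example needs no file on disk:

```python
import io

binary_file = io.BytesIO()      # stands in for a file opened with 'wb'
try:
    print('I feel cool using python.', file=binary_file)
    failed = False
except TypeError as error:
    # print() hands a str to write(), which a binary file rejects.
    failed = True
    print('print() refused the binary file:', error)

# Encoding the text yourself and calling write() works.
binary_file.write('I feel cool using python.\n'.encode('utf-8'))
```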

You must really be feeling empowered with all the cool features in python print function. I am. You can subscribe to my blog or leave a comment below. I feel happy when I believe I have made an impact.

Happy pythoning.

Simulating A Random Walk In Python

Deterministic processes are what every programmer is familiar with when starting out in their journey. In fact, most beginner books on programming will teach you deterministic processes. These are processes where for an input, you always get the same output. But when you get into industry, you find out that most times stochastic processes are the norm when finding solutions to problems. Stochastic processes can give different results for the same input.

random walk python

 

In this post, I will be simulating a stochastic process, a drunkard’s walk, which is an example of a random walk.

Random walks are interesting simulation models for the following reasons:

  1. They are widely used in industry and interesting to study.
  2. They show us how simulation works in practice and can be used to demonstrate how to structure abstract data types.
  3. They usually involve producing plots which are interesting and as they say, a picture is worth a thousand words.

So, let’s go to the simulation exercise. It is interesting to find out how far a drunk would be from his starting position if he takes a number of steps within a given space of time. Would he have moved farther after that time, would he still be close to the origin, or where would the drunk be? Simulation lets us get a general idea of the drunk’s position. We’ll imagine that for each movement, the drunk can take one step either in the north, south, east, or west direction. That means he has four choices to choose from for each step.

To model the drunk’s walk after some time, we will be using three classes representing objects that define his position relative to the origin: Location, Field, and Drunk classes.

The Location class defines his location relative to the origin. We could write code for the class this way:

    
class Location(object):

    def __init__(self, x, y):
        ''' x and y are numbers '''
        self.x, self.y = x, y

    def move(self, delta_x, delta_y):
        ''' delta_x and delta_y are numbers '''
        return Location(self.x + delta_x, self.y + delta_y)

    def get_x(self):
        return self.x

    def get_y(self):
        return self.y

    def dist_from(self, other):
        ox, oy = other.x, other.y
        x_dist, y_dist = self.x - ox, self.y - oy
        return (x_dist**2 + y_dist**2)**0.5

    def __str__(self):
        return '<' + str(self.x) + ', ' + str(self.y) + '>'

Each location has an x and y coordinate representing the x and y-axis. When the drunk moves and changes his location, we could return a new Location object to signify this. Also using the location class, we can calculate the distance of the drunk from another location, and most possibly the origin.

The second class we need to define is the Field class. This class will allow us to add multiple drunks to the same field. It is a mapping of drunks to their locations. Code could be written this way for it:

    
class Field(object):

    def __init__(self):
        self.drunks = {}

    def add_drunk(self, drunk, loc):
        if drunk in self.drunks:
            raise ValueError('Duplicate drunk')
        else:
            self.drunks[drunk] = loc

    def move_drunk(self, drunk):
        if drunk not in self.drunks:
            raise ValueError('Drunk not in field')
        x_dist, y_dist = drunk.take_step()
        current_location = self.drunks[drunk]
        # use move method of Location to get new location
        self.drunks[drunk] = current_location.move(x_dist, y_dist)

    def get_loc(self, drunk):
        if drunk not in self.drunks:
            raise ValueError('Drunk not in field')
        return self.drunks[drunk]            

As you can see, the Field class is a mapping of drunks to locations. When we move a drunk, his location reflects this move and we take note of the current location. Also, we can use this class to find out the location of any drunk.

The last class of interest is the Drunk class. The Drunk class embodies all the drunks we will be playing with. It is a common class or parent class as all other drunks will inherit from this class.

    
import random

class Drunk(object):

    def __init__(self, name=None):
        ''' Assumes name is a string '''
        self.name = name

    def __str__(self):
        if self.name is not None:
            return self.name
        return 'Anonymous'

What the Drunk class does is give identity to each drunk object or subclass.

Now, we will create a drunk with our expected way of movement: that is take one step each time in the north, south, east, or west direction. We will call this drunk class, UsualDrunk. Here is the definition of the class.

    
class UsualDrunk(Drunk):

    def take_step(self):
        step_choices = [(0,1), (0, -1), (1,0), (-1,0)]
        return random.choice(step_choices)

The UsualDrunk class inherits from the Drunk class and the only method it defines is the random step it can take. From the take_step method you can see that it can only move one step to the east, west, north or south, and this in a randomized fashion.

So, now that we have our classes let us try to answer the question: where will the drunk be after taking a number of random steps? Say 10 steps, or 100, or 1000? Normally, we would expect that when the number of steps increases, the distance from the origin should increase. But this might not be the case because you know how drunks walk: haphazardly. Some drunks can even retrace their steps back to where they started and go nowhere!

So, for our simulation, we will write code that makes use of these classes and run the code on the drunk taking a number of steps with different trials for each step. We are using different trials in order to balance out the randomized walk and get a mean of distances.

Here is the code:

When you run it there is one fact that stands out: The mean distance from the origin increases as the number of steps increases. That is the hypothesis we started with.

Some pertinent pieces of the new driver code are the following:

    
def walk(f, d, num_steps):
    '''Assumes: f a field, d a drunk in f, 
    and num_steps an int >= 0.
    Moves d num_steps times; returns the distance between
    the final location and the location at the start 
    of the walk.'''
    start = f.get_loc(d)
    for _ in range(num_steps):
        f.move_drunk(d)
    return start.dist_from(f.get_loc(d))

The walk function moves the drunk a given number of steps in a single trial and returns the distance between his starting location and his final location.

    
def sim_walks(num_steps, num_trials, d_class):
    '''Assumes num_steps an int >= 0, num_trials an int > 0,
    d_class a subclass of Drunk. 
    Simulates num_trials walks of num_steps steps each. 
    Returns a list of the final distance for each trial'''
    homer = d_class()
    origin = Location(0,0)
    distance = []
    for _ in range(num_trials):
        f = Field()
        f.add_drunk(homer, origin)
        distance.append(round(walk(f, homer, num_steps), 1))
    return distance

The sim_walks function (simulated walks) differs from the walk function in one respect: it runs all the trials for a specific number of steps. Say for a 10-step walk we did 100 trials so as to get the mean. So sim_walks returns a list of the distances for the trials. This is so that we can take the mean distance for each number of steps since we are randomizing the walk.

And finally, the drunk_test function.

    
def drunk_test(walk_lengths, num_trials, d_class):
    '''Assumes walk_lengths a sequence of ints >= 0
    num_trials an int > 0, d_class a subclass of Drunk
    for each number of steps in walk_lengths, runs sim_walks
    with num_trials walks and prints results '''
    for num_steps in walk_lengths:
        distances = sim_walks(num_steps, num_trials, d_class)
        print(d_class.__name__, 'random walk of ', num_steps, 'steps')
        print('Mean:', round(sum(distances)/len(distances), 4))
        print('Max:', max(distances), 'Min:', min(distances))

This serves as the test of our code. It prints out the mean for each number of steps after doing the various trials and then the max and min for those trials in a specific step.

You could download the above code here, random_walk.py.

But a picture is worth a thousand words. Let us use a plotted graph to illustrate the variation in the number of steps to the distance from the origin.

drunkards walk python

 

You can see from the graph above that when the drunk takes ten steps in each of the 100 trials, he ends up closer to the origin than when he takes 100 or 1000 steps. The drunk seems more determined to walk farther away if he is given the opportunity to take more steps. Drunks really mean to get home it seems! A graph of number of steps against mean distance shows that this is truly the case: the more steps he is allowed to take, the farther he ends up from where he started. The graph below shows that information.

drunkards walk python


Both axes in the graph use logarithmic scales, which clearly shows the straight-line relationship between number of steps and mean distance from the starting point. To see how the code for the plotted graphs was written you can download it here, random_walk_mpl.py.
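A straight line on logarithmic axes means the mean distance follows a power of the number of steps. For an unbiased walk like this one, the distance is expected to grow roughly like the square root of the number of steps. Here is a compact, self-contained sketch to check that. It skips the classes above and tracks only x and y, which is my simplification, and the seed and trial count are arbitrary:

```python
import math
import random

# The same four unit moves that UsualDrunk chooses from.
STEP_CHOICES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def walk_distance(num_steps):
    '''Distance from the origin after num_steps random unit steps.'''
    x = y = 0
    for _ in range(num_steps):
        dx, dy = random.choice(STEP_CHOICES)
        x, y = x + dx, y + dy
    return math.hypot(x, y)

def mean_distance(num_steps, num_trials):
    '''Mean final distance over num_trials independent walks.'''
    total = sum(walk_distance(num_steps) for _ in range(num_trials))
    return total / num_trials

random.seed(1)   # fixed seed so repeated runs give the same numbers
for num_steps in (10, 100, 1000):
    mean = mean_distance(num_steps, 100)
    # If distance grows like sqrt(steps), the last column stays roughly constant.
    print(num_steps, round(mean, 2), round(mean / math.sqrt(num_steps), 2))
```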

Now, our simulation has dwelt on a drunk walking the way we expect: for each step one unit towards the east, west, north, or south.

What if we could make the drunkard’s walk somewhat biased by skewing it a little? That would involve creating different drunks with different steps and comparing them to our usual drunk.

A biased random walk simulation.

Let’s imagine a drunkard who hates the cold and moves twice as fast in a southward direction. We could make him a subclass of Drunk class and change his way of movement in the class, calling him ColdDrunk.

This could be his class definition:

    
class ColdDrunk(Drunk):
    def take_step(self):
        step_choices = [(0.0, 1.0), (0.0, -2.0), 
                       (1.0, 0.0), (-1.0, 0.0)]
        return random.choice(step_choices)

You can see that whenever he moves southwards, along the y-axis, he takes a step of two units.

Now let’s also add another hypothetical drunk that moves only in the east-west direction. He really moves with the sun or is phototropic. We could define his class, EWDrunk, in the following way:

    
class EWDrunk(Drunk):
    def take_step(self):
        step_choices = [(1.0, 0.0), (-1.0, 0.0)]
        return random.choice(step_choices)

So, we have all our drunks ready. Now let’s write code that will run them and compare their mean distances for various number of steps.

If you run the code above you will get a plotted graph that shows number of steps against mean distance from the origin for the three drunks. You will get a graph that looks like the following:

drunkards walk python


You will notice that for both the UsualDrunk, who we highlighted earlier, and the phototropic drunk, EWDrunk, the mean distance grows slowly as the number of steps increases compared to the south-loving (or north-hating) drunk, ColdDrunk. That means the ColdDrunk is moving away from the origin faster than the other drunks. This is not surprising: whenever he moves south he takes a double step, so his steps do not cancel out on average and he drifts steadily southwards, while the other two have no preferred direction.
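A quick numeric check of this comparison, without the plot. This is a compact sketch that stores each drunk’s step choices in a dictionary instead of using the classes above. That layout, the seed, and the trial count are my assumptions:

```python
import math
import random

# Step choices copied from the three take_step methods.
STEP_CHOICES = {
    'UsualDrunk': [(0, 1), (0, -1), (1, 0), (-1, 0)],
    'ColdDrunk': [(0, 1), (0, -2), (1, 0), (-1, 0)],
    'EWDrunk': [(1, 0), (-1, 0)],
}

def mean_distance(step_choices, num_steps, num_trials):
    '''Mean distance from the origin over num_trials walks.'''
    total = 0.0
    for _trial in range(num_trials):
        x = y = 0
        for _step in range(num_steps):
            dx, dy = random.choice(step_choices)
            x, y = x + dx, y + dy
        total += math.hypot(x, y)
    return total / num_trials

random.seed(2)   # fixed seed so repeated runs give the same numbers
results = {}
for kind, choices in STEP_CHOICES.items():
    results[kind] = [round(mean_distance(choices, n, 100), 1)
                     for n in (10, 100, 1000)]
    print(kind, results[kind])
```

On average ColdDrunk moves a quarter of a unit south per step (one choice in four is a two-unit step south, one in four is a one-unit step north), so over 1000 steps his distance should be far larger than that of the other two.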

We could extrapolate on this conjecture and build a scatter plot of the location of each drunk’s movement for each step but I think the point has already been made: simulating a random walk could give us insights into a model and could confirm or deny a hypothesis.

If you would like a copy of the code for the three drunks, you can download it here, random_walk_biased.py.

That’s it folks. I hope you enjoyed this post. I really enjoyed coding it. It was fun.

This helps us to see the insight that plotting a class or set of classes can give to a programmer.

Happy pythoning.
