
How To Split A String In Python: Python Split() Method

Very often, we have a long string that we want to split into pieces; as programmers, we run into this situation all the time. Python has a method built into the string data type, the python string split method, that splits strings conveniently. In this post, I will describe how to use the python string split method and the different ways you can split a string. I will also describe how you can create a dictionary from a split string.


 

What is the python string split method?

The python string split method is built into the string class and has the syntax str.split(sep=None, maxsplit=-1). It takes a string, represented here by str, and splits it wherever the separator given in the sep argument occurs. If no separator is specified, the string is split on whitespace: runs of consecutive whitespace count as a single separator, and leading and trailing whitespace is ignored. The method returns a list of the resulting pieces. The maxsplit argument specifies the maximum number of splits to perform.

We will cover examples for all the scenarios above shortly.

Here is how it works in practice without any argument specified.
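Here is a minimal example you can try. The sample text is my own, since the original example’s string is not reproduced here; I named the variable string to match the discussion that follows.

```python
# Hypothetical sample text with irregular spacing and a tab
string = "Python is   a great\tlanguage"

# No separator: split on runs of whitespace, ignoring the extra spaces
words = string.split()
print(words)  # ['Python', 'is', 'a', 'great', 'language']
```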

In the example above, I did not specify any argument, so split() used whitespace as the delimiter and split the string, string, at every run of whitespace.

Did you know that you can split a string and then join the pieces back together to rebuild it? Here is an example where I joined them using the dash character.
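A short sketch of splitting and rejoining, using a sample string of my own: str.join() glues the list items back together with whatever string you call it on.

```python
string = "Python is a great language"
words = string.split()

# Join the pieces with a dash
dashed = "-".join(words)
print(dashed)  # Python-is-a-great-language

# Joining with a single space rebuilds this string, since it had single spaces
print(" ".join(words) == string)  # True
```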

Now, let’s discuss each of the arguments to the python string split method.

The separator argument.

As I said above, if no separator is given, the string is split on whitespace. If a separator is provided, the string is split wherever that separator occurs. Unlike the whitespace default, an explicit separator is matched exactly, so two separators in a row produce an empty string between them.

In the code below, the comma character is the separator.
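A small example with a comma as the separator; the fruit names are sample data of my own. The second call shows that consecutive commas are not merged the way whitespace is.

```python
fruits = "apple,banana,cherry"
print(fruits.split(","))  # ['apple', 'banana', 'cherry']

# An explicit separator is matched exactly: ",," yields an empty item
print("apple,,cherry".split(","))  # ['apple', '', 'cherry']
```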

Notice that split() breaks the string at each comma and returns the pieces as items of a new list.

In the example below the ‘#’ character is the separator used.
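The same idea with ‘#’ as the separator, using hypothetical tag names:

```python
tags = "python#regex#sorting"
tag_list = tags.split("#")
print(tag_list)  # ['python', 'regex', 'sorting']
```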

If we have a string containing an email address, we could use the ‘@’ character as the separator.
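A sketch with a made-up address. Unpacking into two names assumes the string contains exactly one ‘@’; otherwise python raises a ValueError at the assignment.

```python
email = "jane.doe@example.com"
user, domain = email.split("@")
print(user)    # jane.doe
print(domain)  # example.com
```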

This gives you an idea of how the string split method works with a separator. Note what happens when you split an empty string: with an explicit separator, such as ‘,’, the result is a list containing one empty string, [''], but with no separator the result is an empty list, [].
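A quick check of the empty-string behavior, with and without a separator:

```python
with_sep = "".split(",")
without_sep = "".split()
print(with_sep)     # ['']
print(without_sep)  # []
```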

The maxsplit argument.

The maxsplit argument specifies the maximum number of splits to perform on the string.

The default is -1, which means there is no limit to the number of splits. When maxsplit is given, the resulting list has at most maxsplit plus one items, i.e. if maxsplit is 2, the list has at most 3 items. It has fewer if the string contains fewer separators.

Now let’s use examples to show how the maxsplit argument works.
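Here is a sketch with maxsplit set to 2, on a sample sentence of my own. maxsplit can be passed as a keyword argument, as shown.

```python
sentence = "Python makes string handling easy"
parts = sentence.split(maxsplit=2)
print(parts)  # ['Python', 'makes', 'string handling easy']
```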

The code above specifies a maxsplit of 2, i.e. split the string at whitespace at most twice. Notice that the resulting list has 3 items. The method splits only at the first two runs of whitespace and keeps the rest of the string, remaining whitespace included, as the last item.

If maxsplit is not specified, the method uses the default of -1, which means split as many times as possible. Now let’s use the same example without specifying it.
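The same sample sentence without maxsplit; passing maxsplit=-1 explicitly gives the same result.

```python
sentence = "Python makes string handling easy"
all_parts = sentence.split()
print(all_parts)  # ['Python', 'makes', 'string', 'handling', 'easy']
print(sentence.split(maxsplit=-1) == all_parts)  # True
```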

You will notice that the string is now split the maximum number of times, i.e. at every run of whitespace.

Now that you know how the arguments work, go experiment with them and play with python.

There is one more trick I want to show you: how to build a dictionary using the python string split method.

How to make a dictionary using the python string split method.

Very often, we want to use the items in a string to build a dictionary. This is simply done by splitting the string into key-value pairs and passing those pairs to dict(). Consider the example below:
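A minimal sketch, assuming a hypothetical input format of key=value pairs separated by semicolons (the original example’s data is not shown here):

```python
# Hypothetical input: "key=value" pairs separated by semicolons
data = "name=John;age=25;city=Lagos"

# Step 1: split on ";" to get the pairs; step 2: split each pair on "="
pairs = [item.split("=") for item in data.split(";")]
person = dict(pairs)  # dict() accepts an iterable of two-item sequences
print(person)  # {'name': 'John', 'age': '25', 'city': 'Lagos'}
```

Note that every value stays a string (‘25’, not 25). If a value might itself contain ‘=’, item.split("=", 1) limits each pair to a single split.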

If you read the code, you will notice that I first split by the semicolon character, ;, which separates the key-value pairs. Then I split each key-value pair in the resulting list by the equal sign, =, and pass the resulting pairs to dict() to get my dictionary. Just beautiful.

Now, you have the tricks and treats on how to use the python string split method. Go use it with pleasure.

Happy pythoning.

Python Regex For Mobile Phone Number Validity Check

Because there was a huge response from readers to the email validity check and Roman numerals validity check algorithms I wrote using python regex, I decided to write another commonplace validity check: mobile phone numbers. Checking mobile phone numbers for validity promises to be easier than the other validity checks.


 

To understand the code I will be using, I recommend you read the following earlier blog posts that discuss python regex syntax and methods: “How to find a match when you are dating floats”, an introduction to python regex couched in story form, and “The Big Advantage of Understanding Python Regex Methods”, which discusses three methods you will always use when looking for matches in python regex. You will use at least one of the three.

Now that we have the fundamentals out of the way, let’s start coding.

Now, the rule I will be using for valid mobile numbers is this: a valid mobile number is a ten-digit number starting with a 7, 8, or 9. Simply that. I know you can do it on your own after reading the two earlier posts above. I know you can.

But would you like to see my implementation? Here it is below with explanations. The embedded python interpreter would ask you to input a mobile number that would be used for validation.

The really interesting part that I should explain is the pattern. The other parts of the code should be clear to you, but if they are not, check the links above. For the pattern (see line 3), I first stated that the number starts with a 7, 8, or 9 using a python regex character set, ^[789]. After that first digit, there should be exactly 9 more digits, nothing more and nothing less. This notation nails it: \d{9}$, with the $ signifying that the last digit is the end of the string. That’s that.
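For readers who can’t run the embedded interpreter, here is a sketch of the same rule as a function. It is not necessarily identical to my original script: it uses re.fullmatch() with [789]\d{9}, which anchors both ends like ^[789]\d{9}$ but, unlike a bare $, also rejects a trailing newline. The function name is my own, and the sample numbers replace input() so the code runs without typing.

```python
import re

# First digit 7, 8 or 9, then exactly nine more digits
MOBILE_PATTERN = re.compile(r"[789]\d{9}")


def is_valid_mobile(number):
    """Return True if number is exactly ten digits starting with 7, 8 or 9."""
    return MOBILE_PATTERN.fullmatch(number) is not None


# Made-up sample numbers instead of reading from input()
for candidate in ["9876543210", "6876543210", "987654321", "98765432101"]:
    print(candidate, is_valid_mobile(candidate))
```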

Note: mobile numbers can follow other rules, like a + or a 0 before the number. For the sake of simplicity in this post, and to help you get a hold of the fundamentals, I restricted my pattern to the single rule that it is a ten-digit number starting with 7, 8, or 9. If you want to validate additional rules, experiment with python regex and send me your code. I would be happy to take a look at it. Remember, programming is all about being creative in problem solving.

Happy pythoning.

Python Sort and Sorted Functions Explained

Sorting is very important in programming because it can improve the efficiency of your code. For example, if you have to search for an item in a collection, sorting the items beforehand lets the search algorithm do much less work (a binary search on a sorted list halves the remaining candidates at each step), thereby increasing efficiency. That is why understanding how to sort is very important. Python provides two tools for sorting: the list sort method and the built-in sorted function. I will describe both in this post, discuss their similarities and differences, and use examples to show you how to use them effectively.


 

So, let’s start with the first method, the python sort method.

What is the python sort method?

This method is defined only for the list data type. It sorts the items in place, comparing them with the < operator; for strings, that comparison is lexicographic. The items in the list must be comparable with each other; if they are not, the sort fails and raises an exception, and the list may be left in a partially modified state. The syntax of the python sort method is: list.sort(*, key=None, reverse=False). The keyword argument key is a function that computes a comparison key from each item; the default, None, compares the items directly. The reverse argument can be set to True or False to state whether the sorting should be done in descending or ascending order. The default is False, i.e. ascending order. The method returns None, a reminder that the list is sorted in place.

See this post for a refresher on how sorting using lexicographic order is done.

Let’s look at examples of sorting without specifying the key and reverse arguments. We will deal extensively with those two after the sorted function is explained.
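First, a sketch of what happens with a mixed list; the items are sample values of my own. The exact error message depends on which pair python compares first, so I only print it.

```python
items = ["apple", 3, "banana", 1]

try:
    items.sort()
    sort_failed = False
except TypeError as err:
    # str and int cannot be compared with '<'
    sort_failed = True
    print("TypeError:", err)
```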

The list above, items, has both strings and int values. Since strings and ints cannot be compared with ‘<’, the code raises a TypeError. Please keep this in mind when sorting.
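Now a list of numbers only (sample values), to show that sort() works in place and returns None:

```python
items = [5, 2, 9, 1, 7]
result = items.sort()  # sorts items itself; returns None
print(items)   # [1, 2, 5, 7, 9]
print(result)  # None
```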

In the code above, the list items contains only numbers, and calling the python list sort method correctly sorts them in ascending order, the default. Notice that items is sorted in place and that sort() returns None.

Now, let’s illustrate the second sorting function, the python sorted function.

What is the python sorted function?

The sorted function is a built-in function that can sort any iterable in python. Its syntax is: sorted(iterable, *, key=None, reverse=False). Unlike the python sort method, which acts only on lists, the sorted function accepts lists, dictionaries, strings, tuples, sets, and so on; it accepts any iterable. The key and reverse arguments work the same as for the sort method and are explained below. The python sorted function always returns a new sorted list and leaves the original iterable unchanged.

Now for some examples using the same list, items, I used for the sort method.
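A sketch using the same numeric sample list, plus a few other iterables. Whatever you pass in, sorted() hands back a new list.

```python
items = [5, 2, 9, 1, 7]
new_items = sorted(items)
print(new_items)  # [1, 2, 5, 7, 9]
print(items)      # [5, 2, 9, 1, 7] (the original is unchanged)

# Other iterables also come back as lists
print(sorted("python"))         # ['h', 'n', 'o', 'p', 't', 'y']
print(sorted((3, 1, 2)))        # [1, 2, 3]
print(sorted({"b", "c", "a"}))  # ['a', 'b', 'c']
```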

Now let’s explore the differences and similarities between the two functions.

First, the difference between python sort and sorted functions.

The fundamental difference between them is that sort modifies a list in place, while sorted returns a new sorted list. So, if you already have a list and don’t need to keep its original order, just use the python sort method, which avoids making a copy, and you are good to go. But if you want to keep the original, or to sort an object that is not a list, then you have the python sorted function at your convenience.

The similarities between python sort and sorted functions

The similarities between both functions lie in their keyword arguments: key and reverse. These arguments work the same way in both functions. They give both functions their power, so I will take time to explain each of them in turn.

The key argument in sort and sorted.

The key argument, when present, specifies how the items are compared. It should be a function that takes a single argument, an item, and returns the key that the python sort or sorted function compares in place of the item itself.

Most times, when people have lists of lists or lists of tuples, they want to sort based on one of the indices in the inner items. This is where the key function really comes in. Let’s take a list of tuples of names and ages, for example, and sort it based on the ages. This will demonstrate how the key argument can be used. Note that I used a list and the sorted function for the examples that come next. You could equally call sort on the list, or pass another iterable to sorted; the key works the same way.
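Here is a sketch with a made-up list of (name, age) tuples. The lambda returns index 1, the age, so that is what sorted compares.

```python
people = [("Mary", 30), ("John", 25), ("Zoe", 25), ("Adam", 30), ("Bola", 20)]

by_age = sorted(people, key=lambda x: x[1])  # compare by age only
print(by_age)
# [('Bola', 20), ('John', 25), ('Zoe', 25), ('Mary', 30), ('Adam', 30)]
```

Because python’s sort is stable, Mary stays ahead of Adam: people with the same age keep their original relative order.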

Notice that the youngest person now comes first, followed by the second youngest, and so on. We specified the key using a lambda function that returns index 1 of each item in the list, and index 1 holds the age. So, we’re sorting the python list based on one of its indices. Notice, though, that the list is not completely sorted. It is sorted by age, but the names for the same ages are not in alphabetical order: sorting in python is stable, so items with equal keys simply keep their original relative order. We will come to that later.

See the following post if you want a refresher on lambda expressions as used in the code above.

Now a list is a built-in data type. Can we do the same sorting on custom objects we created? Yes, we can. Let’s take an example.
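A minimal Person class to try this with. The get_name and get_age accessor names follow the lambda quoted later in this post; the __repr__ and the sample people are my own additions for readable output.

```python
class Person(object):
    """A person with a name and an age."""

    def __init__(self, name, age):
        self.name = name
        self.age = age

    def get_name(self):
        return self.name

    def get_age(self):
        return self.age

    def __repr__(self):
        return "Person(%r, %d)" % (self.name, self.age)


people = [Person("Mary", 30), Person("John", 25),
          Person("Zoe", 25), Person("Adam", 30)]

# Sort the objects by one of their attributes, the age
by_age = sorted(people, key=lambda x: x.get_age())
print(by_age)
# [Person('John', 25), Person('Zoe', 25), Person('Mary', 30), Person('Adam', 30)]
```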

In the code above, we created a Person class, and every instance of Person has a name and an age. Then, in the driver code from line 16, we created a list of Person instances and sorted the list using a lambda expression with the age as the key. Study this code carefully and see that we can sort based on specific attributes of objects just as we did for native data types. It shows you the powerful capabilities of python as a language: we can sort objects of any type, as long as the key function returns values that can be compared with each other.

But the sorting is not yet complete. The names are not in alphabetical order, which is the same problem we had with the first sorted list. So let’s fix it.

Please compare the output of the code below with that of the code above.

You will notice in the code above that the output is now fully ordered. Initially, we sorted correctly by age, but when two Persons had the same age, their names were not sorted. So, I added a little tweak to the lambda function so as to sort first by age and then by name. I modified the lambda function to: key=lambda x : str(x.get_age())+x.get_name(). The age is converted to a string so that it can be concatenated with the name, and sorted compares the combined strings character by character: the age digits first, then the name. One caveat: because the ages are compared as text, this only works while all the ages have the same number of digits, since "9" sorts after "25". A tuple key, key=lambda x: (x.get_age(), x.get_name()), compares the ages as numbers and then the names, and avoids that problem.
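To see the string-key caveat concretely, here is a sketch with made-up people, one of them aged 9. It compares the concatenated string key with a tuple key; the Person class is a minimal version with the accessor names used in this post.

```python
class Person(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def get_name(self):
        return self.name

    def get_age(self):
        return self.age


people = [Person("Mary", 30), Person("John", 9), Person("Zoe", 25),
          Person("Adam", 30), Person("Bola", 25)]

# String key: "9John" sorts after "30Mary" because '9' > '3'
string_key = sorted(people, key=lambda x: str(x.get_age()) + x.get_name())
print([p.get_name() for p in string_key])  # ['Bola', 'Zoe', 'Adam', 'Mary', 'John']

# Tuple key: ages compared as numbers first, then names alphabetically
tuple_key = sorted(people, key=lambda x: (x.get_age(), x.get_name()))
print([p.get_name() for p in tuple_key])  # ['John', 'Bola', 'Zoe', 'Adam', 'Mary']
```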

It’s now elegant and fully ordered, isn’t it? It’s fun. That’s python programming.

So far, we have been dealing with sequences whose items we can index. What about a dictionary, which maps keys to values and keeps its items in insertion order (python 3.7 and later) rather than sorted order? How can we sort a dictionary by key or sort a dictionary by value in python?

First, note that dictionaries have no sort method, so to sort a dictionary you use the python sorted function, which returns a list. By default, it sorts a dictionary by its keys. For example, when we call the sorted function on the key-value pairs of the fruits dictionary below, it sorts them by the alphabetical order of the fruit names.
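A sketch with a hypothetical fruits dictionary (the quantities are made up):

```python
fruits = {"mango": 3, "apple": 5, "cherry": 1, "banana": 4}

# items() yields (key, value) tuples; tuples compare by key first
by_key = sorted(fruits.items())
print(by_key)
# [('apple', 5), ('banana', 4), ('cherry', 1), ('mango', 3)]
```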

This sorts the dictionary by its keys. Notice that when I called sorted, the iterable I used was fruits.items() instead of fruits. This is because I wanted the output to show both the keys and the values: fruits.items() gives (key, value) tuples, and tuples are compared by their first element, the key, first. If I had used fruits alone, it would have given me a list of only the keys.
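The same hypothetical dictionary passed to sorted directly. Iterating over a dictionary yields only its keys, so that is all you get back.

```python
fruits = {"mango": 3, "apple": 5, "cherry": 1, "banana": 4}

keys_only = sorted(fruits)
print(keys_only)  # ['apple', 'banana', 'cherry', 'mango']
```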

Compare this code with the code before it and see for yourself how the output using fruits as the iterable differs from the output using fruits.items().

So, what if I want to sort the dictionary by value in python? That is where the key argument comes in. The view given by fruits.items() yields (key, value) tuples, so I write a lambda function that picks out only the value, which is index 1 in each tuple. Study the code below.
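A sketch using the same hypothetical fruits dictionary. The lambda returns index 1 of each (key, value) tuple, the value. If you want a dictionary back rather than a list, dict() rebuilds one, and it keeps the sorted order because dictionaries preserve insertion order.

```python
fruits = {"mango": 3, "apple": 5, "cherry": 1, "banana": 4}

by_value = sorted(fruits.items(), key=lambda x: x[1])
print(by_value)
# [('cherry', 1), ('mango', 3), ('banana', 4), ('apple', 5)]

print(dict(by_value))  # {'cherry': 1, 'mango': 3, 'banana': 4, 'apple': 5}
```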

The only change in the code is the key argument: a lambda expression that returns the value index of each tuple.

So, you now know how to sort a dictionary by key and how to sort a dictionary by value in python.

Now for the second keyword argument, the reverse argument.

The reverse argument for sort and sorted functions

The reverse argument takes Boolean values. When the value is True, you are asking the sort or sorted function to arrange the items in descending order. When it is False, the default, you are asking it to arrange them in ascending order. It’s as simple as that.

As always, let’s use examples. We’ll use our initial list of names and ages and sort it by age in ascending order and then in descending order.

First, ascending order, the default, and later in descending order.
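A sketch with the same made-up (name, age) list used earlier; only the reverse value differs between the two calls. Python keeps the sort stable even with reverse=True, so people of the same age stay in their original order in both results.

```python
people = [("Mary", 30), ("John", 25), ("Zoe", 25), ("Adam", 30), ("Bola", 20)]

ascending = sorted(people, key=lambda x: x[1], reverse=False)
descending = sorted(people, key=lambda x: x[1], reverse=True)

print(ascending)
# [('Bola', 20), ('John', 25), ('Zoe', 25), ('Mary', 30), ('Adam', 30)]
print(descending)
# [('Mary', 30), ('Adam', 30), ('John', 25), ('Zoe', 25), ('Bola', 20)]
```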

Notice that to change from ascending to descending order I only changed the reverse argument value from False to True. That’s it.

So, I believe you have all you need to do sorting in python. Experiment to your heart’s delight.

Happy pythoning.

Classes For Graphs and Directed Graphs In Python: Graph Theory

In computer science and mathematics, graphs are ubiquitous. They are just everywhere. We use graphs to solve many problems that involve relationships. Since 1735, when the Swiss mathematician Leonhard Euler used what we now know as graph theory to solve the Seven Bridges of Königsberg problem, graphs have become a brand name of sorts. That is why I decided to write a post on graphs and explore them in subsequent posts.


 

What are graphs?

In simple terms, graphs are structures used to represent the relationship between objects (called vertices or nodes) where two objects (or nodes) have an edge connecting them if they are related. Diagrammatically, they are depicted with a set of dots or circles for the objects or nodes, and related objects are joined by lines called edges.

The graph below has 6 nodes or vertices, and 7 edges.


 

The edges of a graph may be directed or undirected.

I will be writing code for both directed and undirected graphs. What attracted me to writing code for graphs is that they are used in every area of life. From science to business, graphs are used to model solutions to problems.

So, let’s start by writing classes for graphs, and then we will use them.

First, we’ll create a class for a Node and an Edge.

A node is just an object in a graph. One attribute every object has is a name, so we’ll give our node a name attribute to start with. Here is the code for the Node class.

    
class Node(object):

    def __init__(self, name):
        ''' assumes name a string '''
        self.name = name

    def get_name(self):
        return self.name

    def __str__(self):
        return self.name

Every instance of a Node, as you can see from the code, has a name and each instance has a method, get_name, which you can use to retrieve the name.

An edge is a relation connector between objects. If two objects are connected to each other by a relationship, they will have an edge between them. Edges can be directed or non-directed. Let’s model the Edge class to start with.

    
class Edge(object):

    def __init__(self, src, dest):
        '''assume src and dest are nodes '''
        self.src = src
        self.dest = dest

    def get_source(self):
        return self.src

    def get_destination(self):
        return self.dest

    def __str__(self):
        return self.src.get_name() + '-->' + \
            self.dest.get_name()

From the code, you can see that each instance of an Edge has a source node, self.src, and a destination node, self.dest. When an edge is created, the source and destination nodes have to be passed as arguments to the constructor, __init__. I also added a special method, __str__(), that represents an Edge as a string of its source and destination nodes. This makes for easy printing.

Now that we have the Node and Edge classes, let us go on to model the directed graphs and undirected graphs.

Directed graphs in Python code

A directed graph is a graph in which edges have orientations. Each edge goes one way only, from its source node to its destination node; if two nodes are related in both directions, that takes two separate edges. The edges are drawn as arrows.

A simple class for a directed graph might be written in the following way:

    
class Digraph(object):
    # nodes is a list of the nodes in the graph
    # edges is a dict mapping each node to 
    # a list of its children 
    def __init__(self):
        self.nodes = []
        self.edges = {}

    def add_node(self, node):
        if node in self.nodes:
            raise ValueError('Duplicate Node')
        else:
            self.nodes.append(node)
            self.edges[node] = []

    def add_edge(self, edge):
        src = edge.get_source()
        dest = edge.get_destination()
        if not (src in self.nodes and dest in self.nodes):
            raise ValueError('Node not in graph')
        self.edges[src].append(dest)

    def children_of(self, node):
        return self.edges[node]

    def has_node(self, node):
        return node in self.nodes

    def __str__(self):
        result = ''
        for src in self.nodes:
            for dest in self.edges[src]:
                result = result + src.get_name() + \
                    '-->' + dest.get_name() + '\n'
        return result[:-1] # remove last newline

The class Digraph represents a directed graph. If you look at the constructor, all the nodes in the graph are stored in a list, while the edges are stored as a mapping from each node to its child nodes. Since this mapping goes only one way, from a source to its destinations, a dictionary is a natural data structure for it.

To have a graph, we need nodes. To add a node, we use the add_node method, which takes a node as its sole argument. It first checks whether the node already exists in the list of nodes and, if it does, raises a ValueError. If not, it appends the node to the list of nodes and maps that node to an empty list that will later be populated as edges are added.

To add an edge to the graph, we use the add_edge method. It first retrieves the source and destination nodes of the edge and checks that both are already in the graph’s list of nodes. If either is missing, it raises a ValueError. Otherwise, it appends the destination node, dest, to the list of children of the source node, src, which records the directed edge.

The class also has two complementary methods: children_of, which returns the list of nodes that a given node points to, and has_node, which returns a Boolean value indicating whether a node is in the graph.

That sums up our directed graph class, Digraph. Now, let’s show that the code works by creating an instance of a directed graph, or Digraph. The Digraph instance I will be creating is based on the directed graph below, with 5 nodes and 6 edges.

python directed graph

 

The only new code is the driver code that creates a Digraph instance. For the sake of brevity, I recommend that you read just the driver code, from line 67 down to line 106. It is really exciting code, and I hope you appreciate it.
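If you can’t run the embedded script, here is a self-contained sketch. It repeats the Node, Edge, and Digraph classes from this post (comments trimmed) and adds a hypothetical driver with 5 nodes and 6 edges of my own choosing; the edges are not necessarily the ones in the figure.

```python
class Node(object):
    def __init__(self, name):
        self.name = name

    def get_name(self):
        return self.name

    def __str__(self):
        return self.name


class Edge(object):
    def __init__(self, src, dest):
        self.src = src
        self.dest = dest

    def get_source(self):
        return self.src

    def get_destination(self):
        return self.dest

    def __str__(self):
        return self.src.get_name() + '-->' + self.dest.get_name()


class Digraph(object):
    def __init__(self):
        self.nodes = []  # list of Node objects
        self.edges = {}  # maps each node to a list of its children

    def add_node(self, node):
        if node in self.nodes:
            raise ValueError('Duplicate Node')
        self.nodes.append(node)
        self.edges[node] = []

    def add_edge(self, edge):
        src = edge.get_source()
        dest = edge.get_destination()
        if not (src in self.nodes and dest in self.nodes):
            raise ValueError('Node not in graph')
        self.edges[src].append(dest)

    def children_of(self, node):
        return self.edges[node]

    def has_node(self, node):
        return node in self.nodes

    def __str__(self):
        result = ''
        for src in self.nodes:
            for dest in self.edges[src]:
                result = result + src.get_name() + '-->' + dest.get_name() + '\n'
        return result[:-1]  # remove last newline


# Hypothetical driver: 5 nodes and 6 directed edges
nodes = {name: Node(name) for name in ["A", "B", "C", "D", "E"]}
g = Digraph()
for node in nodes.values():
    g.add_node(node)
for src, dest in [("A", "B"), ("A", "C"), ("B", "D"),
                  ("C", "D"), ("D", "E"), ("E", "A")]:
    g.add_edge(Edge(nodes[src], nodes[dest]))

print(g)  # one "source-->destination" line per edge
print([child.get_name() for child in g.children_of(nodes["A"])])  # ['B', 'C']
```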

Now, the question I asked myself is: why create a class for a graph instead of just writing the code directly? Because I will be reusing the code in the future. We will be coming back to this Digraph class to solve problems involving graphs, so you may want to bookmark it. You can download the file here, directed_graph.py.

Undirected graphs or simple graphs in Python code.

An undirected graph, or graph for short, is a graph whose edges have no orientation: an edge between two nodes connects them both ways. That is what distinguishes it from a directed graph, whose edges have orientations.

While writing the code for undirected graphs, I ran into a dilemma about inheritance: which graph should inherit from which? Should a directed graph inherit from an undirected graph, or vice versa? I decided it was best for a graph to inherit from a directed graph. Instances of graphs can substitute for instances of digraphs, with one added behavior: every relationship also goes the other way. But instances of digraphs cannot stand in for instances of graphs, because a digraph’s edges go only one way. Therefore, I made Digraph the superclass and Graph the subclass.

Now, this is the code for the class graph.

    
class Graph(Digraph):

    def add_edge(self, edge):
        Digraph.add_edge(self, edge)
        rev = Edge(edge.get_destination(), edge.get_source())
        Digraph.add_edge(self, rev)                                            

Notice that the class Graph inherits from Digraph, so it shares all of Digraph’s attributes and methods and overrides only the add_edge method. In Graph’s add_edge method, every edge is added in both directions: after adding the original edge, it adds a reverse edge from the destination back to the source, so each node in an edge is both a source and a destination.

So, for a little implementation that creates an instance of a Graph, I will be modeling the Graph pictured below:

python graph example

 

Run the code and notice the differences between this instance of a Graph and the instances of a Digraph. The driver code that creates the Graph instance starts at line 73. Alternatively, you can download the script, graph.py, and run it on your own machine, or bookmark it.
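Here is a self-contained sketch of that difference. It repeats trimmed copies of the Node, Edge, Digraph, and Graph classes from this post and adds a hypothetical driver of my own: the same three nodes and two edges go into a Digraph and a Graph, and the Graph ends up with each edge in both directions.

```python
class Node(object):
    def __init__(self, name):
        self.name = name

    def get_name(self):
        return self.name


class Edge(object):
    def __init__(self, src, dest):
        self.src = src
        self.dest = dest

    def get_source(self):
        return self.src

    def get_destination(self):
        return self.dest


class Digraph(object):
    def __init__(self):
        self.nodes = []  # list of Node objects
        self.edges = {}  # maps each node to a list of its children

    def add_node(self, node):
        if node in self.nodes:
            raise ValueError('Duplicate Node')
        self.nodes.append(node)
        self.edges[node] = []

    def add_edge(self, edge):
        src = edge.get_source()
        dest = edge.get_destination()
        if not (src in self.nodes and dest in self.nodes):
            raise ValueError('Node not in graph')
        self.edges[src].append(dest)

    def children_of(self, node):
        return self.edges[node]

    def __str__(self):
        result = ''
        for src in self.nodes:
            for dest in self.edges[src]:
                result = result + src.get_name() + '-->' + dest.get_name() + '\n'
        return result[:-1]  # remove last newline


class Graph(Digraph):
    def add_edge(self, edge):
        # Add the edge, then its reverse, so the relationship goes both ways
        Digraph.add_edge(self, edge)
        rev = Edge(edge.get_destination(), edge.get_source())
        Digraph.add_edge(self, rev)


# Hypothetical driver: the same nodes and edges in both kinds of graph
a, b, c = Node("A"), Node("B"), Node("C")
directed, undirected = Digraph(), Graph()
for graph in (directed, undirected):
    for node in (a, b, c):
        graph.add_node(node)
    graph.add_edge(Edge(a, b))
    graph.add_edge(Edge(b, c))

print(directed)    # A-->B and B-->C only
print(undirected)  # A-->B, B-->A, B-->C, C-->B
```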

I hope to see you in the future when we begin solving problems with graphs like the traveling salesman problem. To receive updates when I post new articles, just subscribe to my blog.

Happy pythoning.
