Search

Python Graph Traversal Algorithms: DFS and BFS algorithms

Graph traversal (or graph search) is an important concept in computer science and programming. It refers to the process of visiting each node or vertex in a graph and checking or updating them. The algorithms that denote how this visitation are done are used to classify the type of traversal. I will be considering a python depth first search algorithm and a python breadth first search traversal of a graph in this post by applying them to finding the shortest path in a graph.

graph traversal algorithms

 

I believe that you are familiar with the graph classes I created in an earlier post. Because I will be using the objects of those classes as the model for the traversals we are going to be doing. If you are not familiar with the classes, I will encourage you to check the digraph and graph classes here. It only takes a minute or two.

Graph traversal has many applications in programming. It could be used to find all the nodes or vertices in a connected graph, find the shorted path between two nodes, test a graph for bipartiteness, or even analyse relationship and networks in a graph.

In this tutorial, we will be looking for the shortest path between two nodes using our algorithms.

The Python Depth First Search (DFS) traversal algorithm for shortest path in a graph

This depth first search algorithm requires that each node is called on exactly once for each connection on the graph. The algorithm is a recursive algorithm that ends when we find the end node in the path.

The search works by starting from a start node, and then it looks for all its adjacent nodes (i.e nodes connected by an edge), and it iterates through each of these nodes, also looking for their adjacent nodes recursively until it gets to a node without an adjacent node in the directed graph, or it gets to the end node. I modeled it as a directed graph because searching is directed in order to avoid cycles.

The graph below illustrates how graph search works in depth first search traversal. 

depth first search example
By Mre - Own work, CC BY-SA 3.0

Now, to implement the algorithm, we will be using the graph below. This is a directed graph with 5 nodes and 6 edges.

directed graph example


From the graph above, we will be taking a path from node A to node E.

You can run the code below to see how it works. The explanation of the relevant parts comes after. You can download the code for use on your machine at the last paragraph of this post.

If you read the code, you can see that the graph is modeled after the Digraph object, digraph, which explanation for the Digraph class is contained in the link I posted some paragraphs before this. The new parts of code that needs to be explained are the print_path function, the DFS function, and the driver code.

The code for the print_path function is found from lines 67 – 74.

    
def print_path(path):
    '''assumes path is a list of nodes'''
    result = ''
    for i in range(len(path)):
        result = result + str(path[i])
        if i != len(path) - 1:
            result = result + '-->'
    return result 

The code prints out each path or set of connected nodes while going through all the nodes in the graph searching for the end node. It uses a loop that iterates through the list of nodes in the path and concatenates each node with an arrow to represent relationship. It stores the name of each node in a variable, result, and prints out result at the end of the function.

The most interesting aspect of the code is the DFS function which does the actual python depth first search traversal. Here is the code from lines 76 to 90.

    
def DFS(graph, start, end, path, shortest, to_print = False):
    '''Assumes graph is a Digraph; start and end are nodes; 
    path and shortest are lists of nodes
    Returns a shortest path from start to end in graph '''
    path = path + [start]   #creation of visited nodes list
    if to_print:
        print('Current DFS Path:', print_path(path))
    if start == end:  # base case and end of the recurisve loop
        return path
    for node in graph.children_of(start):
        if node not in path:  #avoid cycles 
            if shortest == None or len(path) < len(shortest):
                new_path = DFS(graph, node, end, path, shortest, to_print)
                if new_path != None:
                    shortest = new_path

As you can see from the code above, the function, DFS, takes 6 arguments: the graph to search, the start and end nodes, the path of visited nodes, a shortest list of nodes that is the shortest path to the end node from the start node, and a keyword argument, to_print, that states whether you want a print out of the nodes as they are visited. The path argument refers to the visited nodes in the graph. We need to keep a count of visited nodes so that we don’t go through the same node more than once because that would mean going in a cycle. Then the shortest argument is the shortest path found so far starting from the start node. When we first called the DFS function, the shortest path was given the value None to indicate that we have not found a shortest path yet.

We indicated the base case for the recursive loop in an if statement that checks to see when the function is called recursively whether the start and end loops are the same. This makes sure the loop ends when we have arrived at the end loop and we can create the shortest path. This is indicated in lines 83 and 84. Then from lines 85 – 90, using a for loop when the traversal gets to each node we get the children of that node in the loop. First, we check to see whether that child has been visited before and if not we check to see if it will lead to the end node and if it does if the length from the start to that node is shorter than the shortest so far. If it is longer, we stop that search and go to the next child in the loop but if shorter, we recursively call the function again and where it returns a path to the end node, that becomes the shortest so far.

The function is guaranteed to return the shortest path if it exists.

Now, let’s go on to the second traversal algorithm, the python breadth first search (BFS) algorithm.

The Python Breadth First Search (BFS) traversal for shortest path in a graph.

Unlike in the DFS algorithm where we looked for the shortest path by first analyzing the child nodes, in the python BFS algorithm, we first explore the sibling nodes while looking for the shortest path. Sibling nodes are nodes with the same parent. We use a queue to understand which nodes have been visited and which has not been visited. The BFS algorithm guarantees that whenever it gets to the end node, that path discovered is the shortest path and other nodes do not need to be analyzed after that (unlike in DFS where you analyze every other node for length of path). Many programmers prefer to use the BFS algorithm for finding shortest path in a graph.

Now this is the code for the python BFS algorithm. Run it to see how it would work before I provide the explanation for new code. You can download the code for the python breadth first search (BFS) algorithm at the last paragraph of this post.

Other codes and functions are familiar except for the BFS function, so I will explain it in depth. The lines reference are from the embedded code above.

    
def BFS(graph, start, end, to_print = False):
    '''Assumes graph is a digraph; 
    start and end are nodes. 
    Returns a shortest path from start to end
    in graph '''
    init_path = [start]
    path_queue = [init_path] #store all the nodes currently being explored
    while len(path_queue) != 0:
        # get and remove oldest element in path_queue
        tmp_path = path_queue.pop(0)
        print('Current BFS path:', print_path(tmp_path))
        last_node = tmp_path[-1]
        if last_node == end:
            return tmp_path
        for next_node in graph.children_of(last_node):
            if next_node not in tmp_path:
                new_path = tmp_path + [next_node]
                path_queue.append(new_path)
        
    return None                     

The BFS function takes 4 arguments: the graph to be searched, the start and end nodes, and a keyword argument about whether to print out the visited nodes. On line 82 we create a queue of all the sibling nodes for each node that is currently being visited. This queue has the start node as its first node. On lines 88 and 89, we check if the last node in the current path being explored is the end node, if it is we have ended our search, but if it is not, then we move on to the next lines in the code.

On lines 90 – 93 we explore the child paths of the last node and add all the sibling nodes together to the path queue which will be explored in each iteration until we get to the end node.

Note that the BFS function returns the path from the start to end node if they are connected or if not, it returns None.

The DFS and BFS functions return the shortest path where they exist even while working differently. But the BFS is guaranteed to be faster in its implementation and does not go through needless iterations (unlike DFS) when the end node has been discovered because any path it finds to the end node promises to be the shortest path.

That’s it pythonistas. I hope you enjoyed my code for today. I had a swell time writing the code.

Drop a comment if you enjoyed it and say what you think needs to be changed. I would surely love to read your comments. Also, subscribe to my blog so you can receive regular updates when I post new articles. You can download the DFS code, depthfirstsearch.py and BFS codes, breadthfirstsearch.py, here.

Happy pythoning.

No comments:

Post a Comment

Your comments here!

Matched content