Regular expressions, or regex, are fun to use in Python. They give you a fast way to search a string for a given pattern. Whenever I have to search through a string, the first thing I do is look for a regex solution. If you know regular expressions, many string operations become easy.
My earlier post, How To Find A Match When You Are Dating Floats, explains the basic syntax and use of regex. In this post, I will highlight two regex functions that can confuse anyone who does not understand how they work. I will also add a third method that serves as an extension to the two.
So, what are the two methods? They are re.search() and re.match(). Both look for a pattern in a string, but they differ in where in the string they look.
How Python re.match() works
The syntax of Python re.match() is re.match(pattern, string, flags=0). It takes a pattern as the first argument and a string as the second argument, and checks for the pattern in the string. You can also pass optional flags, for example re.IGNORECASE to ignore letter case or re.MULTILINE to change how ^ and $ treat line breaks.
Now, the subtlety of re.match() is that it returns a match object only if the pattern matches at the beginning of the string. If the pattern appears anywhere else, it returns None. This is very important to remember because many unsuspecting Pythonistas have assumed their pattern was wrong when their match returned None.
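To make the flags argument concrete, here is a minimal sketch using re.IGNORECASE. The sample string and variable names are illustrative assumptions, not part of any particular program.

```python
import re

greeting = "Hello, world"

# Without a flag the match is case-sensitive, so lowercase "hello" fails.
case_sensitive = re.match(r"hello", greeting)

# re.IGNORECASE makes the same pattern match regardless of letter case.
case_insensitive = re.match(r"hello", greeting, flags=re.IGNORECASE)

print(case_sensitive)            # None
print(case_insensitive.group())  # Hello
```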
Let me illustrate this with a little code.
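A minimal sketch of this behaviour could look like the following. The sample string, the two patterns, and the variable names are assumptions chosen for illustration.

```python
import re

text = "Python makes regular expressions fun"
pattern = r"Python"  # this pattern sits at the very start of the string
start_result = re.match(pattern, text)
print(start_result)  # prints a match object

# Now change the pattern to one that appears
# later in the string, not at the start.
pattern = r"regular"
middle_result = re.match(pattern, text)
print(middle_result)  # prints None
```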
In the code above, I changed the patterns. The first pattern, line 4, starts at the beginning of the string, and printing the result shows a match object. But the second pattern, line 10, does not start at the beginning of the string. If you run the code, you will notice that it prints None for this case.
So, always remember: re.match() looks for the pattern only at the beginning of the string.
How Python re.search() works
Now, the second method for searching for patterns is re.search(). Its syntax is the same as that of re.match(), re.search(pattern, string, flags=0), but it behaves differently because it searches for the pattern anywhere in the string. Even if the string is multiline, it still returns a match if the pattern exists in the string. But it does not return a match for every location where the pattern can be found. Rather, it returns only the first match for the pattern.
If you run the code above, you can see that it gets a match both at the beginning of the string and in the middle of the string. It finds a match anywhere in the string but returns a match object that corresponds only to the first match.
So, remember the difference between these two useful methods, and don’t make the mistake of fighting your terminal, trying to understand why a pattern you thought was well formed is not giving you a match object.
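Here is a small sketch of re.search() at work. The sample string is an assumption made up for illustration, and it deliberately contains the word "regular" twice so the first-match behaviour is visible.

```python
import re

text = "Python makes regular expressions fun, and regular practice helps"

# A pattern at the beginning of the string is found.
at_start = re.search(r"Python", text)
print(at_start.span())   # (0, 6)

# A pattern in the middle is found too, but only its first occurrence.
in_middle = re.search(r"regular", text)
print(in_middle.span())  # (13, 20)
```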
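To see the two methods side by side, here is a hedged sketch on a two-line string. The string and variable names are assumptions for illustration. It also shows that re.MULTILINE does not make re.match() look at the start of each line.

```python
import re

text = "first line\nsecond line"

# re.match() only looks at the start of the whole string,
# even when re.MULTILINE is set.
plain_match = re.match(r"second", text)
multiline_match = re.match(r"^second", text, re.MULTILINE)
print(plain_match)      # None
print(multiline_match)  # None

# re.search() scans the whole string, so it finds the pattern on line two.
found = re.search(r"^second", text, re.MULTILINE)
print(found.group())    # second
```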
The bonus Python regex method
This is a bonus method because it is the one I use most often. It is quite different from the earlier two. Remember that the earlier two return only a single match object, or None where there is no match. The bonus method is Python re.findall(). It scans the string from left to right and returns all non-overlapping matches as a list of strings. Not a match object, but a list of strings. (If the pattern contains capture groups, the list holds the groups instead of the whole matches.) That comes in quite useful very often. I just love this method. Here is a little code to illustrate this.
Notice that I am using the same code and changing only the method.
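A minimal sketch, assuming a made-up sample string in which the pattern occurs twice:

```python
import re

text = "Python makes regular expressions fun, and regular practice helps"

# re.search() stops at the first occurrence and returns a match object.
first_only = re.search(r"regular", text)
print(first_only.group())  # regular

# re.findall() returns every non-overlapping match as a list of strings.
all_matches = re.findall(r"regular", text)
print(all_matches)  # ['regular', 'regular']

# With no match at all, it returns an empty list rather than None.
no_matches = re.findall(r"Java", text)
print(no_matches)  # []
```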
So you can see how powerful re.findall() is. It gives you the ability to see all the matches in a list, something that re.match() and re.search() do not make possible.
I limited this post to just the rudimentary functionality of all three methods. You can experiment with them now that you know how they work. Write your own code to try them out with different patterns.