Python regex, also called Python regular expressions, are patterns written in Python to match specific sequences of characters in a string. Regular expressions are widely used in the UNIX world and are provided by many programming languages; Python is not left out. One advantage of using Python regex is that a single pattern can validate many kinds of input, which is what we will be doing in this post. It also keeps your code cleaner because it usually involves fewer lines of code, and it saves you the stress of writing numerous if else statements.
If you want a guide to regular expressions in Python and the functions that come with them, I encourage you to read this post, which describes the basic syntax, and this other post on the method we will be using, the Python re match method.
In today’s post, we are going to show how to use Python regex to validate Roman numerals based on their rules.
Roman Numerals and Their Rules
Roman numerals are a numeral system that originated in ancient Rome. They remained the usual way of writing numbers in Europe well into the Late Middle Ages. The system uses letters of the Latin alphabet to represent numbers, and these letters are combined according to set rules. In the modern usage of Roman numerals, seven letters are used to designate numbers:
Symbol | Value |
---|---|
I | 1 |
V | 5 |
X | 10 |
L | 50 |
C | 100 |
D | 500 |
M | 1000 |
Some of the rules for writing valid Roman numerals which we will be using for validation are:
- The Roman numerals I, X and C can be repeated up to 3 times in succession, but V, L and D are never repeated.
- A digit of lower value can be placed before or after a digit of higher value. The only lower-value digits that can be placed before a higher one are I, X and C.
- When a digit of lower value is placed after (to the right of) a digit of higher value, add the values. Digits of equal value placed together are also added.
- When a digit of lower value is placed before (to the left of) a digit of higher value, subtract the lower value from the higher one. Note that V is never written to the left of X.
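To make the add and subtract rules concrete, here is a minimal sketch that converts a Roman numeral to an integer. The `roman_to_int` function and the `VALUES` dictionary are names made up for this illustration; the function assumes its input is already a valid numeral and does no validation of its own.

```python
# Symbol values taken from the table above.
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(numeral):
    """Convert a Roman numeral (assumed valid) to an integer."""
    total = 0
    for i, symbol in enumerate(numeral):
        value = VALUES[symbol]
        # A lower value directly before a higher one is subtracted;
        # in every other position the value is added.
        if i + 1 < len(numeral) and value < VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total

print(roman_to_int("VIII"))     # 5 + 1 + 1 + 1 = 8
print(roman_to_int("IX"))       # 10 - 1 = 9
print(roman_to_int("MCMXCIX"))  # 1000 + 900 + 90 + 9 = 1999
```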
So, now that we have the rules we need to form the Python regular expression, let’s do the Roman numeral validation, which is the juicy part.
Validating any Roman numeral
When you run the code below, enter a string when you are prompted. You will get a result indicating whether the string is a valid Roman numeral or not. If it is invalid, you will get a message that says: “Invalid Roman Numeral” but if it is valid, you will get a message that says: “Your roman numeral was valid. Welcome.”
Now, let’s run it and have fun. After you have tried running it, I will give a brief explanation of the lines of code. Note that this code takes only 8 lines. If I had used Python if else statements instead, it would have taken many more lines and would not be as clean.
Now that you have taken some time running the above code and seeing how it works, let me explain some of the parts. I think I don’t need to explain the Python re match method because you have read about it from the link I gave above. So, I will just explain the pattern.
The key to the matching above is the Python regex pattern, which is:
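If you are reading this without the runnable snippet, the following is a minimal sketch of what such a validator could look like. It is a reconstruction from the description in this post, not necessarily the original code; the `check_roman` helper is a hypothetical name, and the pattern is the one explained further down.

```python
import re

# Pattern taken from this post; it is explained part by part further down.
regex_pattern = r"^(?=[MDCLXVI])M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$"

def check_roman(text):
    """Return the validation message for a candidate Roman numeral."""
    if re.match(regex_pattern, text):
        return "Your roman numeral was valid. Welcome."
    return "Invalid Roman Numeral"

print(check_roman("MCMXCIX"))  # Your roman numeral was valid. Welcome.
print(check_roman("IIII"))     # Invalid Roman Numeral
```

To try it interactively, you could pass the result of `input("Enter a Roman numeral: ")` to `check_roman` and print what it returns.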
regex_pattern = r"^(?=[MDCLXVI])M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$"
The ^ symbol that starts the pattern says that we should start matching from the beginning of the string, while the corresponding $ symbol at the end says that the match must finish at the end of the string. So, we presume that each string passed to the code contains a single Roman numeral and nothing else; otherwise it will be reported as invalid. After the ^ symbol comes a lookahead assertion, (?=[MDCLXVI]). Read up this blog post on Python lookahead assertions if you want a refresher.
What the lookahead assertion does is look ahead from the beginning of the string and require that the next character is an M, D, C, L, X, V or I. Yes, the only symbols allowed to start the string are the seven symbols of the Roman numerals and nothing else. Because every other part of the pattern is optional, this also stops an empty string from matching. Note that a lookahead assertion does not consume the characters it checks. So, at this point, we have not matched any characters yet.
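A quick way to see what the lookahead contributes is to compare the pattern with and without it. The variable names below are made up for this sketch.

```python
import re

with_lookahead = r"^(?=[MDCLXVI])M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$"
without_lookahead = r"^M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$"

# Every part after ^ can match nothing, so without the lookahead
# an empty string is accepted.
print(bool(re.match(without_lookahead, "")))  # True
print(bool(re.match(with_lookahead, "")))     # False

# The lookahead only checks the next character; it consumes nothing,
# so the match still ends at position 0.
print(re.match(r"(?=[MDCLXVI])", "XIV").end())  # 0
```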
The next part matches the thousands place. I denote this with the pattern: M*. It states that for the thousands place we match M zero or more times. If the number is below a thousand, M appears zero times; otherwise it appears one or more times, so we get a match. Unfortunately, standard Roman numerals stop at 3999 (MMMCMXCIX), because from 4000 we need a very special thousands notation which the pattern cannot cover. But you can try 1999 (MCMXCIX) and see that it matches. Note that M* also accepts four or more Ms, such as MMMM, so because of this limitation in the thousands place we could replace M* above with M{0,3} to state that we cannot go beyond 3999.
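The difference between M* and M{0,3} can be checked with a short sketch. The names `rest`, `unbounded` and `capped` are made up for this illustration.

```python
import re

rest = r"(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$"
unbounded = r"^(?=[MDCLXVI])M*" + rest   # the pattern as used in this post
capped = r"^(?=[MDCLXVI])M{0,3}" + rest  # variant limited to 3999

for numeral in ["MCMXCIX", "MMMCMXCIX", "MMMM"]:
    print(numeral, bool(re.match(unbounded, numeral)), bool(re.match(capped, numeral)))
# MCMXCIX True True
# MMMCMXCIX True True
# MMMM True False
```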
The next part matches the hundreds place, from 100 to 900. I denote the hundreds place with the pattern (C[MD]|D?C{0,3}). What this pattern says is that for a hundreds place match, either a C (100) sits to the left of an M (1000) or D (500), giving 900 or 400, or an optional D (500) is followed by no more than three consecutive Cs.
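To check the hundreds group on its own, here is a small sketch that tests it against the nine hundreds values and a few malformed strings. It uses `re.fullmatch` so that the whole string must match the group; the `hundreds`, `valid` and `invalid` names are made up for this example.

```python
import re

hundreds = r"(C[MD]|D?C{0,3})"

valid = ["C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"]  # 100 to 900
invalid = ["CCCC", "DD", "DM", "CDC"]

for text in valid:
    print(text, bool(re.fullmatch(hundreds, text)))  # True for each
for text in invalid:
    print(text, bool(re.fullmatch(hundreds, text)))  # False for each
```

The group also matches an empty string, which is how numbers with no hundreds digit, such as XIV, get through this part of the pattern.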
Next is the tens place, which runs from 10 to 90. The pattern for it is: (X[CL]|L?X{0,3}). This states that the tens place is either an X (10) before a C (100) or L (50), giving 90 or 40, or an optional L (50) followed by no more than 3 consecutive Xs.
Next is the units place, which is between 1 and 9. Remember there is no 0 in Roman numerals. The pattern for it is: (I[XV]|V?I{0,3}). It states that the units place is either an I (1) to the left of an X (10) or V (5), giving 9 or 4, or an optional V (5) followed by no more than 3 consecutive Is.
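Since the hundreds, tens and units parts are each wrapped in parentheses, they are capturing groups, and you can inspect how a numeral is split across them. A minimal sketch, with sample inputs chosen for this illustration:

```python
import re

regex_pattern = r"^(?=[MDCLXVI])M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$"

for numeral in ["MCMXCIX", "XLII", "MMXXIV"]:
    match = re.match(regex_pattern, numeral)
    # groups() returns the hundreds, tens and units parts in that order.
    print(numeral, match.groups())
# MCMXCIX ('CM', 'XC', 'IX')
# XLII ('', 'XL', 'II')
# MMXXIV ('', 'XX', 'IV')
```

The thousands part (M*) is not in parentheses, so it does not show up in the groups.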
Well, that is it. Enjoy validating your Roman numerals with this simple tool.
I hope you leave a comment about your results.
Happy pythoning.