While going through a Reddit group post on python recently, I saw a question that was similar to the element uniqueness problem and the way the answer was formulated struck me as odd. I thought the implementation was not optimal or efficient, so I volunteered an answer that I knew was better.
The question was: how do I check to see if my python list contains elements or items that are unique or not duplicates?
And the response code that was given was this:
def unique_bad(S):
''' S a list or sequence
returns True if all the elements are unique
'''
for i in range(len(S)):
for j in range(i+1, len(S)):
if S[i] == S[j]:
return False
return True
There is nothing wrong with the code because it really does the job, but it does the job badly. How do we know that? Let’s take the worst case scenario for the code. The code has two loops, the outer loop with the i variable and the inner loop with the j variable. Depending on the length of the sequence, the outer loop takes at least n-1 iterations in the worst case. The inner loop takes n(n+1)/2 iterations in the worst case owing from the fact that it does 1+2+3…+n-1+n iterations after the outer loop. So, in the asymptotic sense, the worst case running time for this code is O(n2). This is not really too bad. A quadratic running time for a program is fair enough. But I called it bad because there is a better way to implement this if we know the nature of the problem.
You can see that for each outer iteration, i, the code looks for a duplicate in the inner iteration, j, and this is very repetitive. In the fortunate case where two duplicates lie side by side, the code would exit early but if not, it would take a longer time. So, taking from this idea, why don’t we make it that if there were duplicates they should lie side by side. This could help us to write code that takes a shorter worst case running time.
In fact, we can do so by first sorting the list. When we sort the list, if there were duplicates they would lie side by side and then when we iterate through it looking for duplicates, we would catch duplicates very early.
That is what this better code seeks to do.
def unique_better(S):
''' S a list or sequence
returns True if all the elements are unique
'''
new = sorted(S)
for j in range(1, len(new)):
if new[j-1] == new[j]:
return False
return True
You can see that I sorted the python list or sequence first and then iterated through the list looking for duplicates. Now the loop runs in linear time rather than quadratic time. If you look through python’s documentation you would discover that the worst case running time of sorted function is O(n long n) while the loop is O(n) time so the code runs in O(n log n) time which is far better and more effective than the earlier code which runs in quadratic time.
So a little understanding of the nature of the problem can lead one to writing better code.
I just wanted to point this out because the question on Reddit intrigued me. It was an example of some of the problems that sorting can easily solve if only the nature of the problem was understood.
Happy pythoning.