Knowing how to optimize code can help you in a pinch. You can practice with coding challenges, and HackerRank is a good source of them. (This is not a sponsored post.)
Some of the problems come with large test cases of their own that will force you to optimize your solutions.
Here’s one: the Piling Up! problem.
Let me paraphrase:
Given a row of cubes of different sizes, determine whether it's possible to stack them all by pulling cubes from the front or the back of the row. You can't pull cubes from the middle, and you can't place a cube on the stack if it is bigger than the one below it.
I’ve illustrated a few examples.
[4, 3, 2, 1, 3, 4]
We can stack these cubes, because the cubes on the ends always get smaller:
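```
[4, 3, 2, 1, 3, 4]   pull the left 4   → stack: 4
   [3, 2, 1, 3, 4]   pull the right 4  → stack: 4 4
      [3, 2, 1, 3]   pull the left 3   → stack: 4 4 3
         [2, 1, 3]   pull the right 3  → stack: 4 4 3 3
            [2, 1]   pull the left 2   → stack: 4 4 3 3 2
               [1]   pull the 1        → stack: 4 4 3 3 2 1
```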
What about [1, 3, 2]?
We can't stack these: the three has to end up on the bottom, but it's trapped in the middle behind the one and the two.
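```
[1, 3, 2]   pull the 1 → stack: 1     both ends (3 and 2) are bigger than 1: stuck
[1, 3, 2]   pull the 2 → stack: 2     the 3 won't fit, so pull the 1 → stack: 2 1
                                      now the 3 is bigger than the 1: stuck
```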
So how do we solve this problem?
The Most Naïve Approach
Convince yourself that you are working in clay, not marble, on paper not eternal bronze: Let that first sentence be as stupid as it wishes.
-Jacques Barzun (on writer’s block)
Every time we pick a cube to stack, we can choose from the left or the right. We can draw this out like a tree.
You can always walk a finite tree with recursive backtracking. We can just pick cubes from the left and backtrack when we hit a dead end.
Here’s what it looks like:
```python
import sys

def stackable_recursive(cubes, base=None):
    if not len(cubes):
        return True  # every cube has been stacked
    if base is None:
        base = sys.maxsize  # the stack is empty, so anything fits
    # Try the left end first, and backtrack if it leads to a dead end.
    left = cubes.pop(0)
    if left <= base and stackable_recursive(cubes, left):
        return True
    cubes.insert(0, left)  # dead end: put it back
    # Then try the right end.
    right = cubes.pop()
    if right <= base and stackable_recursive(cubes, right):
        return True
    cubes.append(right)  # dead end: put it back
    return False
```
Boom! Easy. Except it's s. l. o. w. So slow that it can't finish some of HackerRank's test cases.
We're just trying every possible choice blindly. For a list of n cubes, that could become O(n²) operations. At least we're only using O(n) memory.
To get the worst possible performance out of this, we need a list of cubes that starts with half of the cubes being the same size and then counts up from one. For example: [3, 3, 3, 3, 1, 2, 3, 4]. This forces the algorithm to traverse the list and backtrack over and over.
Let’s try this with a list of 10,000 cubes. Here’s some code to generate the test case:
```python
print(1)  # one test case
n = 10_000
print(n)
cubes = []
for _ in range(n // 2):       # first half: cubes of the same size
    cubes.append(3)
for i in range(1, n // 2 + 1):  # second half: counting up from one
    cubes.append(i)
print(' '.join(str(x) for x in cubes))
```
Here’s how I use it:
$ python create_test.py | python -m cProfile stackable_blocks.py
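For context, stackable_blocks.py wraps the function in a small driver. Mine looks roughly like this; the parsing is an assumption based on the format the generator prints:

```python
# stackable_blocks.py -- a sketch of the driver around stackable_recursive
t = int(input())  # number of test cases
for _ in range(t):
    n = int(input())  # number of cubes
    _cubes = [int(x) for x in input().split()]
    if stackable_recursive(_cubes):
        print("Yes")
    else:
        print("No")
```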
Here’s how our algorithm performs:
RecursionError: maximum recursion depth exceeded while calling a Python object
Bummer. Well, let's bump up the recursion limit.
```python
sys.setrecursionlimit(max(n + 10, sys.getrecursionlimit()))

if stackable_recursive(_cubes):
    print("Yes")
else:
    print("No")
```
Okay, how does that do?
😨 Almost two minutes.
There’s got to be a better way!
Pruning the Branches
Half of the troubles of this life can be traced to saying yes too quickly and not saying no soon enough.
Josh Billings (probably a pseudonym, but nobody knows for sure)
I broke a rule here. I wrote this script without ironing out the algorithm. You can save so much time by working through your algorithms before you write code.
What if we could eliminate some choices along the way? We could save ourselves from stepping through every possibility.
We always want the bigger cubes on the bottom. That means we can always pull the biggest cube from either side of the list.
```python
import sys

def stackable_recursive_pruned(cubes, base=None):
    if not len(cubes):
        return True
    if base is None:
        base = sys.maxsize
    left = cubes[0]
    if left > base:  # we're doomed if any cubes are bigger than the base
        return False
    right = cubes[-1]
    if right > base:
        return False
    # Always take the bigger end; it has to go under the smaller one anyway.
    if left > right:
        left = cubes.pop(0)
        return stackable_recursive_pruned(cubes, left)
    else:
        right = cubes.pop()
        return stackable_recursive_pruned(cubes, right)
```
Better! Now we’re down to O(n) operations.
Incredible! A simple algorithm change brought our time down by four orders of magnitude!
At 100,000 cubes, we get an error:
Segmentation fault
This happens when the recursion limit is set higher than the underlying C stack can handle: each function call still consumes native stack space, and when that runs out, the interpreter crashes outright instead of raising a RecursionError.
Let’s check the documentation:
sys.setrecursionlimit(limit)
Set the maximum depth of the Python interpreter stack to limit. This limit prevents infinite recursion from causing an overflow of the C stack and crashing Python.
The highest possible limit is platform-dependent. A user may need to set the limit higher when they have a program that requires deep recursion and a platform that supports a higher limit. This should be done with care, because a too-high limit can lead to a crash.
https://docs.python.org/3.8/library/sys.html#sys.setrecursionlimit
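To make "done with care" concrete, here's a minimal sketch of how that crash happens. The exact safe limit is platform-dependent; this is written against Python 3.8, where every Python frame also consumes C stack:

```python
import sys

sys.setrecursionlimit(1_000_000)  # far beyond what the C stack can absorb

def dive(n):
    return dive(n + 1)

dive(0)  # on many platforms: Segmentation fault, not RecursionError
```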
So, recursion breaks down here. Let’s turn to an iterative approach.
Converting from Recursion to Iteration
The proverbial German phenomenon of the verb-at-the-end about which droll tales of absentminded professors who would begin a sentence, ramble on for an entire lecture, and then finish up by rattling off a string of verbs by which their audience, for whom the stack had long since lost its coherence, would be totally nonplussed, are told, is an excellent example of linguistic recursion.
“Gödel, Escher, Bach: an Eternal Golden Braid”. Book by Douglas Hofstadter, 1979.
Note how much information you need to keep in mind while reading that quote. Recursion is like that.
We want to process massive lists of cubes here. For that, we need to use iteration.
```python
import sys

def stackable_iterative(cubes):
    stack = [sys.maxsize]
    while len(cubes) > 0:
        if cubes[0] > stack[-1]:   # left > base
            return False
        if cubes[-1] > stack[-1]:  # right > base
            return False
        if cubes[0] > cubes[-1]:   # left > right
            stack.append(cubes.pop(0))
        else:
            stack.append(cubes.pop(-1))
    return True
```
Okay, we can run the script again. But wait! Isn't this algorithm O(n)? We're processing 10x more cubes, but the run time shot up 1000x.
Why are we spending so much time popping cubes off of the list?
The Right Tool for the Job
Don’t bring a knife to a gun fight.
When you see an underperforming data structure, dig deeper. Documentation is your friend. You never know what's lurking under the hood. Here's what the Python docs say about lists:
5.1.2. Using Lists as Queues
It is also possible to use a list as a queue, where the first element added is the first element retrieved (“first-in, first-out”); however, lists are not efficient for this purpose. While appends and pops from the end of list are fast, doing inserts or pops from the beginning of a list is slow (because all of the other elements have to be shifted by one).
To implement a queue, use collections.deque which was designed to have fast appends and pops from both ends.
https://docs.python.org/3/tutorial/datastructures.html#using-lists-as-queues
That explains why cubes.pop(0) is so slow: Python lists are terrible at popping items from the beginning, because every element after it has to be shifted over by one.
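You can see the gap with a rough micro-benchmark (a sketch; the exact numbers will vary by machine):

```python
from timeit import timeit

n = 100_000
# Empty a list from the front vs. a deque from the front, once each.
list_time = timeit("while xs: xs.pop(0)",
                   setup=f"xs = list(range({n}))", number=1)
deque_time = timeit("while xs: xs.popleft()",
                    setup=f"from collections import deque; xs = deque(range({n}))",
                    number=1)
print(f"list.pop(0) x {n}:     {list_time:.3f}s")
print(f"deque.popleft() x {n}: {deque_time:.3f}s")
```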
Deques are designed to be good at handling modifications from either side. So, let’s use a deque instead:
```python
import sys
from collections import deque

def stackable_iterative_with_deque(cubes):
    stack = [sys.maxsize]
    cubes = deque(cubes)
    while len(cubes) > 0:
        if cubes[0] > stack[-1]:
            return False
        if cubes[-1] > stack[-1]:
            return False
        if cubes[0] > cubes[-1]:
            stack.append(cubes.popleft())  # O(1), unlike list.pop(0)
        else:
            stack.append(cubes.pop())
    return True
```
Much better. Another 100x improvement.
Just changing the data structure can massively speed up your code.
Let’s try for 10 million. There’s still one last technique to go over.
Much of our time is spent reading the test input, and I’m not interested in optimizing that. So, let’s modify our script so we’re only testing the algorithm.
```python
import cProfile

with cProfile.Profile() as pr:
    is_stackable = stackable(_cubes)
pr.print_stats(sort=True)
```
Now, we’re only measuring the algorithm. Again, how does it do with 10 million cubes?
We’re spending lots of time modifying data structures and checking their length. Let’s fix that.
YAGNI: Ya Ain’t Gonna Need It!
Eternal Tao doesn't do anything, yet it leaves nothing undone. If you abide by it, everything in existence will transform itself.
Tao Te Ching, Chapter 37 (translated by Brian Browne Walker, 1996)
I lied to you about data structures. Sometimes, the best option is to get rid of them.
You hear it all the time. Throwing the right data structure at a problem will give you the best results. It can feel like reaching for a raft in the ocean. Only, you don’t realize that you are in shallow water. You can stand up!
Sure, deques handle modifications better than lists, but why modify anything at all?
We don’t need to keep the stack. We just need the top of it.
We don't need to modify the list of cubes, either. We can keep two indexes to remember where we are and scan the list in place.
```python
import sys

def stackable_indexed(cubes):
    previous = sys.maxsize
    left_index = 0
    right_index = len(cubes) - 1
    while left_index != right_index:
        if cubes[left_index] > previous:
            return False
        if cubes[right_index] > previous:
            return False
        if cubes[left_index] > cubes[right_index]:
            previous = cubes[left_index]
            left_index += 1
        else:
            previous = cubes[right_index]
            right_index -= 1
    return True
```
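As a quick sanity check, this version still agrees with the examples from the top:

```python
print(stackable_indexed([4, 3, 2, 1, 3, 4]))  # True
print(stackable_indexed([1, 3, 2]))           # False
```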
And there you have it! We got another 4x improvement by removing the stack and avoiding modifications to the list.
At this point, most of the time is spent on processing the input.
Some key lessons for optimizing Python code:
- Think through your algorithm before you start coding. Write it down if you have to. Evaluate the Big O of both time and memory. Pick some data structures to work with.
- See if there is anything you can do in your algorithm to skip steps or eliminate possibilities early. This is a great point to ask yourself if you can use a math formula instead of iteration.
- Prefer an iterative approach over a recursive one if you're working in a language that calls for it. In functional programming, recursion is your friend, but this is Python. 🙂
- Check your data structures. Would a different one perform better? Do you really need something fancy? Are you over-engineering?
- What operations can you remove? Do you have to modify that object? Do you really need to hold all of that data in memory? Can you discard values as you go?
- Big O will help you, but keep in mind that a script performing 1,000,000·n operations and one performing 2·n operations are both O(n). Don't rely on it for everything.
- The most important lesson of all: measure your code's performance. Test it multiple times and get a distribution if you need to. Try different inputs. Think about your worst-case scenarios. Don't just guess why it's slow. (See the sketch below.)
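Here's a minimal sketch of that last lesson using timeit.repeat, which gives you a distribution of timings instead of a single, possibly unlucky number (it assumes stackable_indexed from above is defined in the same file):

```python
from timeit import repeat

# Worst-case-ish input: half same-sized cubes, then counting up.
cubes = [3] * 5_000 + list(range(1, 5_001))

# Five independent measurements of ten runs each;
# stackable_indexed doesn't mutate the list, so it's safe to reuse.
timings = repeat(lambda: stackable_indexed(cubes), repeat=5, number=10)
print(f"best: {min(timings):.4f}s   worst: {max(timings):.4f}s")
```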
Share your own strategies for optimizing Python code below!