Collatz Conjecture

Recently I had an interesting problem to solve involving the Collatz Conjecture, and it started a conversation among some of my friends about set theory and math's ability to reduce the complexity of coding problems. The Collatz Conjecture is, simply put, an unsolved problem in mathematics. From Wikipedia: start with any positive integer n. Then each term is obtained from the previous term as follows: if the previous term is even, the next term is one half of the previous term. If the previous term is odd, the next term is 3 times the previous term plus 1. The conjecture is that no matter what value of n you start with, the sequence will always reach 1. Its namesake is Lothar Collatz, the mathematician who introduced the idea in 1937. Its history, and the history of mathematicians attempting to prove or disprove it, is fascinating and can be read in more depth on the problem's wiki page. However, the focus of this piece is the problem I had an opportunity to solve.

So first let's make the conjecture just a bit easier to understand. If it were a piecewise function, it would be as follows:
f(n) = n/2 if n is even,
f(n) = 3n+1 if n is odd

From a coding perspective this is pretty simple to implement. You could make a simple function that acts as follows.

function collatz(n) {
  if (n == 1) { return n; }
  else if (n % 2 == 0) { return collatz(n / 2); }
  else { return collatz(3 * n + 1); }
}

From here, any number we pass into collatz will, in theory, eventually return 1 to us. But that isn't much of a problem, is it? So let's talk about the actual problem.

Given the range between 1 and 1,000,000, calculate the number of operations it takes using the Collatz Conjecture to reduce each number to 1. Then find which number between 1 and 1,000,000 took the most operations to reach 1.

Now we have an interesting coding problem. As coders doing these sorts of problems, we are always thinking about processing speed, memory, and how well a solution could scale, and this problem really gets at the heart of those core issues.

Like all problems, it is useful to brute force it first. Taking the same function as written above, let's expand it and write a loop to call it.

function collatz(n, operations) {
  if (n == 1) { return operations; }
  else if (n % 2 == 0) { return collatz(n / 2, operations + 1); }
  else { return collatz(3 * n + 1, operations + 1); }
}

function callCollatz() {
  let maxOperations = 0;
  for (let i = 1000000; i >= 1; i--) {
    let temp = collatz(i, 0);
    if (temp > maxOperations) { maxOperations = temp; }
  }
  return maxOperations;
}

callCollatz();

Now, this is a first approach: counting down from 1,000,000, calculating the operation count for each number, and returning the longest. Time-wise this is O(a * b), where a is the number of for-loop iterations and b is the number of steps the recursive collatz function takes. Space-wise is a different story, though: we only really consume extra space in the recursive calls, where each call chain takes stack space proportional to the recursion depth, i.e. O(b).

Left as is, we have a quick and dirty way to get the number of operations, but this is where simple diverges. We can make this faster in a few ways.

First, why did I initially choose to iterate down from 1,000,000? That's because we could attempt a divide and conquer strategy similar to merge sort, with multiple recursive loops breaking the range in half recursively and then calculating the solution.
This approach really doesn't optimize much, though.

Instead, the function should be changed to iterate up from 1 to 1,000,000. But why? Well, we know that each starting number for the Collatz conjecture has a number of operations it takes to get to 1 (here counting the terms in its sequence): 1 takes 1, 2 takes 2, 4 takes 3, and 3 takes 8. Let's look at it in sets.
1 -> [1] = 1
2 -> [2, 1] = 2
3 -> [3, 10, 5, 16, 8, 4, 2, 1] = 8
4 -> [4, 2, 1] = 3
5 -> [5, 16, 8, 4, 2, 1] = 6
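The sets above can be reproduced with a small helper (a sketch; the function name is my own):

```javascript
// Build the full Collatz sequence starting at n, including n and the final 1.
function collatzSequence(n) {
  const seq = [n];
  while (n !== 1) {
    n = n % 2 === 0 ? n / 2 : 3 * n + 1;
    seq.push(n);
  }
  return seq;
}

// collatzSequence(3) → [3, 10, 5, 16, 8, 4, 2, 1]
```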

You may notice that when we are calculating these results there is some overlap in the sequences. For instance, 4 takes 3 operations, and 5 takes 6.

5 goes to 16, then 8, then 4, 2, 1. We don't need to iterate through the 4, 2, 1 part, however, since we have already attempted that starting point and know its total number of operations. This means that if we store the number of operations each number takes to reach 1 in a dictionary-like data structure, we can check the dictionary for whether we have crossed a number before, and if we have, just add its stored count to our current count and return it.
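A sketch of that memoized approach (the names are my own), using the same counting convention as the sets above:

```javascript
// Cache mapping a number to the length of its Collatz sequence.
// Seeded with the base case: 1's sequence is just [1].
const cache = new Map([[1, 1]]);

// Number of terms in the Collatz sequence starting at n,
// reusing any previously computed results.
function collatzLength(n) {
  if (cache.has(n)) { return cache.get(n); }
  const next = n % 2 === 0 ? n / 2 : 3 * n + 1;
  const length = 1 + collatzLength(next);
  cache.set(n, length);
  return length;
}

// Find the starting number below limit with the longest sequence.
function longestBelow(limit) {
  let best = 1, bestLength = 1;
  for (let i = 2; i < limit; i++) {
    const length = collatzLength(i);
    if (length > bestLength) { best = i; bestLength = length; }
  }
  return { start: best, length: bestLength };
}
```

Because we iterate up from 1, by the time a sequence drops below its starting number the rest of its length is almost always already in the cache. For the full range, longestBelow(1000000) finds the well-known answer of 837,799.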

This can reduce the processing time significantly, and it has the advantage of getting faster with more iterations over the function.

So that handles the number of operations and ultimately the processing time, but what else can we do to reduce the load?

Well, with a large chunk of the main functionality already optimized, we can look at the actual math occurring in the problem. Instead of approaching it with ordinary multiplication and division (division historically being a bit processor-intensive), we can approach it with bit shifting and manipulation. If the number is odd, we can shift it left by one bit (doubling it) and then add the original number plus 1, which gives 2n + n + 1 = 3n + 1. If the lowest bit is 0, i.e. the number has a trailing zero, we can just shift that bit off altogether, which is the equivalent of dividing by 2.
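A rough sketch of that bit-twiddling step (names are my own), using BigInt so the shifts behave like true integer operations even past 32 bits (JavaScript's bitwise operators on plain numbers truncate to 32 bits, and Collatz sequences can climb well above that):

```javascript
// One Collatz step using only bit operations:
// - (n & 1n) tests the lowest bit (odd/even)
// - n >> 1n drops a trailing zero bit, i.e. divides by 2
// - (n << 1n) + n + 1n computes 2n + n + 1 = 3n + 1 without multiplying
function collatzStep(n) {
  return (n & 1n) === 0n ? n >> 1n : (n << 1n) + n + 1n;
}

// Count the terms in the sequence starting at n, matching the sets above.
function sequenceLength(n) {
  let length = 1;
  while (n !== 1n) {
    n = collatzStep(n);
    length++;
  }
  return length;
}
```

Whether this beats plain `n / 2` and `3 * n + 1` depends on the engine, so it's worth benchmarking rather than assuming; the idea is simply to replace division and multiplication with shifts and adds.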

Using a bit-shifting approach seems minor, but it really does add up to better performance if implemented correctly, and this problem is ideal for that kind of optimization.

So as you can see, the Collatz conjecture occupies an interesting niche: a simple CS problem that is open to real depth and complexity in its implementation. For those reading this, how would you approach this problem?