Thursday, November 3, 2011

Combinations, Permutations, and Python Generators

The task of generating all permutations (or combinations) from a collection is an instructive exercise in recursion, and as such it is frequently the subject of introductory computer science courses. Upholding this hallowed tradition, here I use permutations and combinations as artifices in introducing the powerful python generators.1 But first let's gain a passing acquaintance with the recursive solution.

def permutations(string):
     if not string:
             return ['']
     ret = []
     for i, d in enumerate(string):
             perms = permutations(string[:i] + string[i+1:])
             for perm in perms:
                     ret.append(d + perm)
     return ret

As promised, the function is indeed recursive. For simplicity, I've opted to make the code only work on strings.2 The base case is simple enough: given an empty string, the only permutation is the empty string. Now for the recursive case we iterate over each character of the string and permute the complementary substring. Each recursive call to permutations() itself returns a list of permutations, so we must then go through these sub-permutations to construct the return list.

>>> permutations('vegetable')
['vegetable', 'vegetabel', 'vegetalbe', 'vegetaleb', 'vegetaebl', 'vegetaelb', 'vegetbale', 'vegetbael', 'vegetblae', 'vegetblea', ... ]

This recursive implementation demands that the function compute all permutations before returning the full list. As you may imagine, this is inconvenient. For instance, running the function above on the string 'vegetable' takes two and a half seconds on my computer and returns a rather unwieldy list of 362880 elements. While the running time is something hard to escape, for many tasks - such as those which don't require all permutations simultaneously - python generators will considerably ease the super exponential space requirements of the above algorithm.

So what are python generators? They look very similar to functions, but return an iterator object which supports the next() and __iter__() methods. However, don't be fooled by the superficial similarities. Internally, generators and functions are very different. Generators yield values rather than return them, which means their execution state is saved and halted, including entire stack frames. The ability to freeze recursive calls allows for powerful and expressive code.

def permutations(string):
    if not string:
        yield ''
    for i, d in enumerate(string):
        for perm in permutations(string[:i] + string[i+1:]):
            yield d + perm
>>> permutations('animal')
<generator object permutations at 0x11020d730>
>>> for perm in permutations('animal'):
... print perm
... etc ...

Notice how the generator version is both more understandable and more efficient! Never is the entire list of permutations held in memory. The property of avoiding value generation until needed is called laziness and it is desirable. Many python constructs are lazy generators, including the enumerate() function which we have been using.

Generating all combinations is similar. In the combination case, instead of recurring on the entire complementary substring we require that additional letters be chosen from the right substring only. This imposes an implicit order on the choices, and is precisely what is needed to generate all combinations. Furthermore, we don't have an empty base case, instead any substring has for a combination the empty string. This illustrates an important distinction between yield and return. Return breaks out of its function, whereas yield falls through to the subsequent statements. Generators are not functions! Observe the operation of the next snippet.

def combinations(string):
     yield ''
     for i, d in enumerate(string):
             for comb in combinations(string[i+1:]):
                     yield d + comb

In the case of combinations, the very first value yielded is the empty string. When the iterator object is called once again, execution continues into the for loop, beginning the recursion in earnest.

>>> [comb for comb in combinations('mineral')]
['', 'm', 'mi', 'min', 'mine', 'miner', ... ]

I hope these simple examples are an adequate introduction to python generators. The generator construct can be used in many places where lazy iteration is the natural thing to do - such as traversing a tree, or generating disjoint values. As exemplified above, generators are both more powerful and more expressive than their function counterparts. In short, in matters vegetable, animal, and mineral these are the very model of a modern major generator.

1 If you actually need to compute permutations and combinations for a real purpose, I recommend you use the corresponding functions in the itertools module.
2 The implementation above uses string addition rather inefficiently, creating a great deal many of useless intermediate strings. As a didactic example, this is fine, but where performance is necessary consider using the idiomatic string.join() method.

Sunday, September 11, 2011

Easy memoization with decorators in Python

It is often the case that a function's performance can be drastically improved if it remembers previous computations. This kind of optimization is broadly called "memoization" and it is widely applicable to many different algorithms - especially dynamic programming ones1.

However, it is also often the case that reaching into a function and altering its logic can be time consuming and frustrating - especially a poorly commented function written many hundreds of years ago by a now defunct software-and-beer conglomerate. Despite such difficulties, we can resort to Python's powerful decorator construct to retrofit previously forgetful functions with memoization.

There are various ways to build a memoizing decorator. Here I opt to use a sweet and classy version2.

class cached:
   def __init__(self, f):
      self.f = f
      self.cache = {}
   def __call__(self, *args):
      if args not in self.cache:
         self.cache[args] = self.f(*args)
      return self.cache[args]

We can now apply this decorator to any function and instantly imbue it with memoization. For instance, computing the nth Fibonacci number recursively is only a modestly bad idea:

def fib(n):
   return 1 if n < 2 else fib(n-1) + fib(n-2)
print fib(200)
The 200th Fibonacci number is computed instantly, whereas the naive approach would take well past the heat death of the universe.

The program above works roughly like this: The function defined under @cached is used to initialize an instance of cached, which then replaces it (observe the __call__ method implementation). The end result is that all occurrences of recursive calls are transparently captured by the memoizing decorator.

Unfortunately this approach has some shortcomings. Though it can cope with a variable number of arguments, cached cannot handle keyword arguments; and more disconcerting, the reliance on a dictionary cache precludes non-hashable input! Furthermore, if unfettered, the cache can grow in size indefinitely, presenting a serious problem. Still, the ease of implementation and the ease of use of this caching decorator has kept me content.

1 Implementing dynamic programming algorithms using this caching decorator is very easy and convenient, but for many algorithms this approach will have quadratic memory demands where linear space would suffice.
2 The method presented here makes significant use of syntactic sugar. There are yet other ways to write a caching decorator in Python, but this variety is the subject of another post.