Thursday, November 3, 2011

Combinations, Permutations, and Python Generators

The task of generating all permutations (or combinations) from a collection is an instructive exercise in recursion, and as such it is frequently the subject of introductory computer science courses. Upholding this hallowed tradition, here I use permutations and combinations as artifices in introducing the powerful python generators.1 But first let's gain a passing acquaintance with the recursive solution.

def permutations(string):
     if not string:
             return ['']
     ret = []
     for i, d in enumerate(string):
             perms = permutations(string[:i] + string[i+1:])
             for perm in perms:
                     ret.append(d + perm)
     return ret

As promised, the function is indeed recursive. For simplicity, I've opted to make the code only work on strings.2 The base case is simple enough: given an empty string, the only permutation is the empty string. Now for the recursive case we iterate over each character of the string and permute the complementary substring. Each recursive call to permutations() itself returns a list of permutations, so we must then go through these sub-permutations to construct the return list.

>>> permutations('vegetable')
['vegetable', 'vegetabel', 'vegetalbe', 'vegetaleb', 'vegetaebl', 'vegetaelb', 'vegetbale', 'vegetbael', 'vegetblae', 'vegetblea', ... ]

This recursive implementation demands that the function compute all permutations before returning the full list. As you may imagine, this is inconvenient. For instance, running the function above on the string 'vegetable' takes two and a half seconds on my computer and returns a rather unwieldy list of 362880 elements. While the running time is something hard to escape, for many tasks - such as those which don't require all permutations simultaneously - python generators will considerably ease the super exponential space requirements of the above algorithm.

So what are python generators? They look very similar to functions, but return an iterator object which supports the next() and __iter__() methods. However, don't be fooled by the superficial similarities. Internally, generators and functions are very different. Generators yield values rather than return them, which means their execution state is saved and halted, including entire stack frames. The ability to freeze recursive calls allows for powerful and expressive code.

def permutations(string):
    if not string:
        yield ''
    for i, d in enumerate(string):
        for perm in permutations(string[:i] + string[i+1:]):
            yield d + perm
>>> permutations('animal')
<generator object permutations at 0x11020d730>
>>> for perm in permutations('animal'):
... print perm
...
animal
animla
aniaml
... etc ...

Notice how the generator version is both more understandable and more efficient! Never is the entire list of permutations held in memory. The property of avoiding value generation until needed is called laziness and it is desirable. Many python constructs are lazy generators, including the enumerate() function which we have been using.

Generating all combinations is similar. In the combination case, instead of recurring on the entire complementary substring we require that additional letters be chosen from the right substring only. This imposes an implicit order on the choices, and is precisely what is needed to generate all combinations. Furthermore, we don't have an empty base case, instead any substring has for a combination the empty string. This illustrates an important distinction between yield and return. Return breaks out of its function, whereas yield falls through to the subsequent statements. Generators are not functions! Observe the operation of the next snippet.

def combinations(string):
     yield ''
     for i, d in enumerate(string):
             for comb in combinations(string[i+1:]):
                     yield d + comb

In the case of combinations, the very first value yielded is the empty string. When the iterator object is called once again, execution continues into the for loop, beginning the recursion in earnest.

>>> [comb for comb in combinations('mineral')]
['', 'm', 'mi', 'min', 'mine', 'miner', ... ]

I hope these simple examples are an adequate introduction to python generators. The generator construct can be used in many places where lazy iteration is the natural thing to do - such as traversing a tree, or generating disjoint values. As exemplified above, generators are both more powerful and more expressive than their function counterparts. In short, in matters vegetable, animal, and mineral these are the very model of a modern major generator.



1 If you actually need to compute permutations and combinations for a real purpose, I recommend you use the corresponding functions in the itertools module.
2 The implementation above uses string addition rather inefficiently, creating a great deal many of useless intermediate strings. As a didactic example, this is fine, but where performance is necessary consider using the idiomatic string.join() method.

2 comments:

  1. You've inspired me! I've posted on my blog as an excercise for me a complete copy of this in C#. Hope you don't mind.
    Yield Statement in C#

    ReplyDelete
  2. Great article!
    Just one little comment: I think in the calculation of the combination, the element itself is missing, i.e. 'ret.append(d)' resp. 'yield d'.

    ReplyDelete