Introduction to itertools

David Radcliffe
April 14, 2016

What is an iterator?

An iterator is an object that supports two methods:

  1. __next__ returns the next value from the iterator.
  2. __iter__ returns the iterator itself.

An iterator behaves like a list of values, with some important differences.

  1. The values are generated on demand.
  2. The values can only be accessed sequentially.
  3. The values can only be accessed once.

Example

it = iter('PYTHON')

for n, char in enumerate(it):
    print n, char

for char in enumerate(it): # Does nothing.
    print n, char
0 P
1 Y
2 T
3 H
4 O
5 N

Generators

A generator function is a function that has yield statements instead of return statements.

A generator function returns a generator object, which is a kind of iterator.

def fibonacci(n): 
    a, b = 0, 1
    for i in xrange(n):
        yield b
        a, b = b, a+b

print list(fibonacci(12))
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]

Generator expressions

A generator expression looks like a list comprehension, but it is enclosed by parentheses instead of square brackets.

import re
from collections import Counter

with open('shake-it-off-lyrics.txt') as f:
    wordlist = Counter(word.lower() 
        for line in f 
        for word in re.split('\W+', line)
        if word != '')

Itertools

Itertools is a module for building iterators. It is part of the Python Standard Library.

The count function does what it says. It counts forever from a given starting value. The default starting value is 0.

Since it creates an infinite loop, be sure to have a break or return statement inside the loop.

from itertools import count
for n in count(1):
    print n
    if n == 10:
        break

islice

islice(iterable, stop)

islice(iterable, start, stop[, step])

Make an iterator that selects elements from an iterable.

from itertools import islice
from string import letters
for letter in islice(letters, 5):
     print letter
a
b
c
d
e

ifilter

ifilter(predicate, iterable)

Make an iterator that filters elements from iterable returning only those for which the predicate is true. If predicate is None, return the items that are true.

Like filter, but it returns an iterator instead of a list.

from itertools import ifilter
even = lambda x: x % 2 == 0
print list(ifilter(even, range(10)))
[0, 2, 4, 6, 8]

imap

imap(function, *iterables)

Make an iterator by applying a function to the elements of one or more iterables.

Like map, but returns an iterator instead of a list.

from itertools import imap
multiply = lambda x,y: x*y
print list(imap(multiply, range(10), 
                range(10)))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Product

The product function makes an iterator from the cartesian product of two or more iterables. It is equivalent to a nested for-loop.

from itertools import product, islice
faces = ('Ace', 'King', 'Queen', 'Jack')
suits = ('Spades', 'Hearts', 'Diamonds', 
        'Clubs')
print list(islice(product(faces, suits), 6))
[('Ace', 'Spades'), ('Ace', 'Hearts'), ('Ace', 'Diamonds'), ('Ace', 'Clubs'), ('King', 'Spades'), ('King', 'Hearts')]

Cartesian power

To compute the product of an iterable with itself, use the optional repeat argument.

from itertools import product, islice
from string import letters
passwords = (''.join(seq)
    for seq in product(letters, repeat=6))
print list(islice(passwords, 12))
['aaaaaa', 'aaaaab', 'aaaaac', 'aaaaad', 'aaaaae', 'aaaaaf', 'aaaaag', 'aaaaah', 'aaaaai', 'aaaaaj', 'aaaaak', 'aaaaal']

Combinations

combinations(iterable, r)

Return r length subsequences of elements from an iterable.

# Generate all Powerball combinations.

from itertools import product, combinations
white = range(1, 70)
red = range(1, 27)
powerball_combos = product(
    combinations(white, 5), red)

print sum(1 for combo in powerball_combos)
292201338

Groupby

groupby(iterable[, key])

Make an iterator that returns consecutive keys and groups from the iterable. The key is a function computing a key value for each element. The key defaults to an identity function.

from itertools import groupby
def longest_run(sequence):
    return max(sum(1 for _ in group) 
        for key, group in groupby(sequence))

coin_tosses = 'HTHHTTTHHHHHTTTHTTT'
print longest_run(coin_tosses)
5