Common Higher-order Functions

1. map/reduce

Python has built-in map() and reduce() functions.

If you have read Google’s famous paper, “MapReduce: Simplified Data Processing on Large Clusters,” you will have a general understanding of the map/reduce concept.

Let’s first look at map. The map() function takes two arguments: a function and an Iterable. map applies the passed function to each element of the sequence in turn and returns the results as a new Iterator.

For example, suppose we have a function f(x) = x^2 and we want to apply this function to the list [1, 2, 3, 4, 5, 6, 7, 8, 9]. We can do this using map() as follows:

f(x) = x * x

            f(x) = x * x

                  │
                  │
  ┌───┬───┬───┬───┼───┬───┬───┬───┐
  │   │   │   │   │   │   │   │   │
  ▼   ▼   ▼   ▼   ▼   ▼   ▼   ▼   ▼

[ 1   2   3   4   5   6   7   8   9 ]

  │   │   │   │   │   │   │   │   │
  │   │   │   │   │   │   │   │   │
  ▼   ▼   ▼   ▼   ▼   ▼   ▼   ▼   ▼

[ 1   4   9  16  25  36  49  64  81 ]

Now, let’s implement this in Python code:

>>> def f(x):
...     return x * x
...
>>> r = map(f, [1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> list(r)
[1, 4, 9, 16, 25, 36, 49, 64, 81]

The first argument passed to map() is f, the function object itself. Since the result r is an Iterator, and an Iterator is a lazy sequence, we use the list() function to compute the entire sequence and return it as a list.

You might be thinking that without the map() function, you could achieve the same result with a loop:

L = []
for n in [1, 2, 3, 4, 5, 6, 7, 8, 9]:
    L.append(f(n))
print(L)

Indeed, you can. However, can you immediately understand from the loop code that it “applies f(x) to each element of the list and generates a new list with the results”?

Therefore, as a higher-order function, map() actually abstracts the operation rule. Consequently, we can compute not only simple functions like f(x) = x^2 but also arbitrarily complex ones. For instance, converting all numbers in this list to strings:

>>> list(map(str, [1, 2, 3, 4, 5, 6, 7, 8, 9]))
['1', '2', '3', '4', '5', '6', '7', '8', '9']

It only takes one line of code.

Now, let’s look at the usage of reduce. reduce applies a function cumulatively to the items of a sequence [x1, x2, x3, ...]. This function must take two arguments. reduce continues the calculation by taking the result and the next element of the sequence. The effect is:

reduce(f, [x1, x2, x3, x4]) = f(f(f(x1, x2), x3), x4)

For example, summing a sequence can be achieved using reduce:

>>> from functools import reduce
>>> def add(x, y):
...     return x + y
...
>>> reduce(add, [1, 3, 5, 7, 9])
25

Of course, the summation operation can be done directly using Python’s built-in sum() function, so there’s no need to use reduce.

However, if you want to transform the sequence [1, 3, 5, 7, 9] into the integer 13579, reduce becomes useful:

>>> from functools import reduce
>>> def fn(x, y):
...     return x * 10 + y
...
>>> reduce(fn, [1, 3, 5, 7, 9])
13579

This example itself isn’t particularly useful. However, considering that a string (str) is also a sequence, with a slight modification to the above example and by combining it with map(), we can write a function to convert a str to an int:

>>> from functools import reduce
>>> def fn(x, y):
...     return x * 10 + y
...
>>> def char2num(s):
...     digits = {'0': 0, '1': 1, '2': 2, '3': 3, '4': 4, '5': 5, '6': 6, '7': 7, '8': 8, '9': 9}
...     return digits[s]
...
>>> reduce(fn, map(char2num, '13579'))
13579

We can organize this into a str2int function:

from functools import reduce

DIGITS = {'0': 0, '1': 1, '2': 2, '3': 3, '4': 4, '5': 5, '6': 6, '7': 7, '8': 8, '9': 9}

def str2int(s):
    def fn(x, y):
        return x * 10 + y
    def char2num(s):
        return DIGITS[s]
    return reduce(fn, map(char2num, s))

It can be further simplified using a lambda function:

from functools import reduce

DIGITS = {'0': 0, '1': 1, '2': 2, '3': 3, '4': 4, '5': 5, '6': 6, '7': 7, '8': 8, '9': 9}

def char2num(s):
    return DIGITS[s]

def str2int(s):
    return reduce(lambda x, y: x * 10 + y, map(char2num, s))

In other words, assuming Python did not provide the int() function, you could easily write your own function to convert a string to an integer in just a few lines of code!

The usage of lambda functions will be introduced later.

1.1 Summary

map is used to apply a function to a sequence, resulting in another sequence.
reduce is used to apply a function cumulatively to the result of the previous computation and the next element of the sequence, resulting in a single final value.

2. filter

Python’s built-in filter() function is used to filter sequences.

Similar to map(), filter() also takes a function and a sequence. However, unlike map(), filter() applies the passed function to each element in turn and then decides whether to keep or discard the element based on whether the return value is True or False.

For example, to remove even numbers from a list and keep only the odd ones, you can write:

def is_odd(n):
    return n % 2 == 1

list(filter(is_odd, [1, 2, 4, 5, 6, 9, 10, 15]))
# Result: [1, 5, 9, 15]

To remove empty strings from a sequence, you can write:

def not_empty(s):
    return s and s.strip()

list(filter(not_empty, ['A', '', 'B', None, 'C', '  ']))
# Result: ['A', 'B', 'C']

It can be seen that the key to using the higher-order function filter() lies in correctly implementing a “filtering” function.

Note that the filter() function returns an Iterator, which is a lazy sequence. Therefore, to force filter() to complete the computation and obtain all results, you need to use the list() function to return a list.

2.1 Finding Prime Numbers with `filter`

One method for finding prime numbers is the Sieve of Eratosthenes, and its algorithm is very simple to understand:

First, list all natural numbers starting from 2 to form a sequence:plaintext2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...
Take the first number in the sequence, 2, which is definitely a prime number. Then, use 2 to sieve out all multiples of 2 from the sequence:plaintext3, 5, 7, 9, 11, 13, 15, 17, 19, ...
Take the first number from the new sequence, 3, which is definitely a prime number. Then, use 3 to sieve out all multiples of 3 from the sequence:plaintext5, 7, 11, 13, 17, 19, 23, ...
Take the first number from the new sequence, 5, then use 5 to sieve out all multiples of 5 from the sequence:plaintext7, 11, 13, 17, 19, 23, ...
Continue sieving in this way, and you can obtain all prime numbers.

To implement this algorithm in Python, we can first construct an odd sequence starting from 3:

def _odd_iter():
    n = 1
    while True:
        n = n + 2
        yield n

Note that this is a generator and represents an infinite sequence.

Then, define a filtering function:

def _not_divisible(n):
    return lambda x: x % n > 0

Finally, define a generator that continuously returns the next prime number:

def primes():
    yield 2
    it = _odd_iter() # Initial sequence
    while True:
        n = next(it) # Return the first number of the sequence
        yield n
        it = filter(_not_divisible(n), it) # Construct a new sequence

This generator first returns the first prime number, 2, and then uses filter() to continuously generate new filtered sequences.

Since primes() is also an infinite sequence, you need to set a condition to exit the loop when calling it:

# Print prime numbers less than 100:
for n in primes():
    if n &lt; 100:
        print(n)
    else:
        break

Note that an Iterator is a lazily evaluated sequence. Therefore, we can use Python to represent sequences like “all natural numbers” or “all prime numbers” with very concise code.

2.2 Summary

The role of filter() is to select elements from a sequence that meet certain criteria. Because filter() uses lazy evaluation, the actual filtering and return of the next element only occur when the result of filter() is accessed.

3. sorted

Sorting is also an algorithm frequently used in programs. Whether using bubble sort or quick sort, the core of sorting is comparing the sizes of two elements. If they are numbers, we can compare them directly, but what if they are strings or two dictionaries? Directly comparing their mathematical sizes is meaningless; therefore, the comparison process must be abstracted through a function.

Python’s built-in sorted() function can sort a list:

>>> sorted([36, 5, -12, 9, -21])
[-21, -12, 5, 9, 36]

Furthermore, the sorted() function is also a higher-order function. It can accept a key function to implement custom sorting. For example, sorting by absolute value:

>>> sorted([36, 5, -12, 9, -21], key=abs)
[5, 9, -12, -21, 36]

The function specified by key is applied to each element of the list, and sorting is performed based on the values returned by this key function. Compare the original list with the list processed by key=abs:

Original list: [36, 5, -12, 9, -21]
Keys (after abs): [36, 5,  12, 9,  21]

The sorted() function then sorts based on these keys and returns the corresponding elements from the original list:

Sorted keys:   [5, 9,  12,  21, 36]
                |  |    |    |   |
Sorted result: [5, 9, -12, -21, 36]

Let’s look at another example of sorting strings:

>>> sorted(['bob', 'about', 'Zoo', 'Credit'])
['Credit', 'Zoo', 'about', 'bob']

By default, string sorting is based on ASCII values. Since ‘Z’ (ASCII 90) is less than ‘a’ (ASCII 97), uppercase ‘Z’ comes before lowercase ‘a’ in the result.

Now, let’s consider sorting while ignoring case, following alphabetical order. To implement this, we don’t need to significantly modify existing code; we just need a key function that maps strings to a representation suitable for case-insensitive comparison. Comparing two strings case-insensitively effectively means converting both to uppercase (or both to lowercase) first and then comparing them.

Thus, we can pass a key function to sorted() to achieve case-insensitive sorting:

>>> sorted(['bob', 'about', 'Zoo', 'Credit'], key=str.lower)
['about', 'bob', 'Credit', 'Zoo']

To perform reverse sorting, there’s no need to change the key function; we can pass a third parameter, reverse=True:

>>> sorted(['bob', 'about', 'Zoo', 'Credit'], key=str.lower, reverse=True)
['Zoo', 'Credit', 'bob', 'about']

From the above examples, it’s clear that the abstraction capability of higher-order functions is very powerful, and the core code can remain extremely concise.

3.1 Summary

sorted() is also a higher-order function. The key to sorting with sorted() lies in implementing a mapping function (the key function).

Exercise

Suppose we represent student names and grades using a list of tuples:

L = [('Bob', 75), ('Adam', 92), ('Bart', 66), ('Lisa', 88)]

Please use sorted() to sort the above list by name:

L = [('Bob', 75), ('Adam', 92), ('Bart', 66), ('Lisa', 88)]

def by_name(t):
    pass

L2 = sorted(L, key=by_name)
print(L2)

Then sort by score from highest to lowest:

L = [('Bob', 75), ('Adam', 92), ('Bart', 66), ('Lisa', 88)]

def by_score(t):
    pass

L2 = sorted(L, key=by_score)
print(L2)

Reference Source Code

do_sorted.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from operator import itemgetter

L = ["bob", "about", "Zoo", "Credit"]

print(sorted(L))
print(sorted(L, key=str.lower))

students = [("Bob", 75), ("Adam", 92), ("Bart", 66), ("Lisa", 88)]

print(sorted(students, key=itemgetter(0)))
print(sorted(students, key=lambda t: t[1]))
print(sorted(students, key=itemgetter(1), reverse=True))

Python for beginner

Curriculum