collections

collections is a module in Python's standard library that provides many useful container classes.

namedtuple

We know that a tuple can represent an immutable collection. For example, the 2D coordinates of a point can be expressed as:

>>> p = (1, 2)

However, looking at (1, 2), it’s hard to tell that this tuple represents a coordinate.

Defining a full class would be overkill—and this is where namedtuple comes in handy:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(1, 2)
>>> p.x
1
>>> p.y
2

namedtuple is a function that creates a custom tuple subclass with a fixed number of fields, and lets you access each element by attribute name as well as by index.

This makes namedtuple a convenient way to define a data type that retains the immutability of a tuple while enabling attribute-based access—extremely easy to use.

We can verify that p is an instance of both Point and tuple, because Point is a subclass of tuple:

>>> isinstance(p, Point)
True
>>> isinstance(p, tuple)
True

Similarly, we can use namedtuple to define a circle with coordinates and a radius:

# namedtuple('Name', [attribute list]):
Circle = namedtuple('Circle', ['x', 'y', 'r'])
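
The following is a minimal sketch of how a Circle instance might be used. The sample values and the variable names c and bigger are illustrative only; _replace() and _asdict() are standard namedtuple methods.

```python
from collections import namedtuple

Circle = namedtuple('Circle', ['x', 'y', 'r'])
c = Circle(0, 0, 5)

# Fields are readable by name, by index, or by unpacking.
x, y, r = c
print(c.r, c[2], r)          # 5 5 5

# Instances are immutable: assigning to a field raises AttributeError.
try:
    c.r = 10
except AttributeError:
    print('cannot modify a namedtuple field')

# _replace() returns a new instance with some fields changed.
bigger = c._replace(r=10)
print(bigger)                # Circle(x=0, y=0, r=10)

# _asdict() converts the instance to a mapping keyed by field name.
print(dict(c._asdict()))     # {'x': 0, 'y': 0, 'r': 5}
```

Because the original instance is never modified, c still equals (0, 0, 5) after the call to _replace().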

deque

When using a list to store data, accessing elements by index is fast, but inserting or deleting elements at the head is slow. A list is stored as a contiguous array, so every insertion or deletion at the front shifts all the elements after it, and the cost grows with the size of the list.

deque is a double-ended queue optimized for fast insertions and deletions at both ends, which makes it well suited to queues and stacks:

>>> from collections import deque
>>> q = deque(['a', 'b', 'c'])
>>> q.append('x')
>>> q.appendleft('y')
>>> q
deque(['y', 'a', 'b', 'c', 'x'])

In addition to list's append() and pop(), deque supports appendleft() and popleft(), which add and remove elements at the head very efficiently.
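
As a rough illustration of the queue and stack usage mentioned above, the sketch below removes items from both ends. The maxlen example is an addition not covered in the text; it is a standard deque constructor argument, and the variable names are illustrative.

```python
from collections import deque

q = deque(['a', 'b', 'c'])

# Queue (FIFO): append on the right, remove from the left.
q.append('x')
first = q.popleft()
print(first, q)        # a deque(['b', 'c', 'x'])

# Stack (LIFO): append and pop on the same end.
last = q.pop()
print(last, q)         # x deque(['b', 'c'])

# With maxlen set, appending to a full deque discards items from the other end.
recent = deque(maxlen=3)
for n in range(5):
    recent.append(n)
print(recent)          # deque([2, 3, 4], maxlen=3)
```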

defaultdict

When using a dict, accessing a non-existent key raises a KeyError. To return a default value for missing keys, use defaultdict:

>>> from collections import defaultdict
>>> dd = defaultdict(lambda: 'N/A')
>>> dd['key1'] = 'abc'
>>> dd['key1'] # key1 exists
'abc'
>>> dd['key2'] # key2 does not exist, return default value
'N/A'

Note: The default value is returned by calling a function, which is passed when creating the defaultdict object.

Except for supplying a default value for missing keys, defaultdict behaves like a regular dict. Note that looking up a missing key with dd[key] also stores the default value under that key.
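
A common use of the default factory is grouping values. The sketch below assumes a small, made-up word list and uses list as the factory; it also shows that only indexing triggers the factory, while get() and the in operator do not.

```python
from collections import defaultdict

# Group words by their first letter; list() supplies an empty list for new keys.
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
groups = defaultdict(list)
for word in words:
    groups[word[0]].append(word)
print(dict(groups))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Indexing a missing key stores the default; get() and `in` do not.
dd = defaultdict(lambda: 'N/A')
print(dd.get('key2'))   # None
print('key2' in dd)     # False
print(dd['key2'])       # N/A
print('key2' in dd)     # True
```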

OrderedDict

In Python 3.6 and earlier, the keys of a standard dict were unordered: the iteration order was not guaranteed. Since Python 3.7, dict preserves insertion order as part of the language.

OrderedDict guarantees insertion order on every Python version and adds order-specific operations such as popitem(last=False):

>>> from collections import OrderedDict
>>> d = dict([('a', 1), ('b', 2), ('c', 3)])
>>> d # on Python 3.7+, dict keys keep insertion order
{'a': 1, 'b': 2, 'c': 3}
>>> od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> od # OrderedDict keys are ordered
OrderedDict([('a', 1), ('b', 2), ('c', 3)])

Important: OrderedDict keeps keys in insertion order; it does not sort them by key:

>>> od = OrderedDict()
>>> od['z'] = 1
>>> od['y'] = 2
>>> od['x'] = 3
>>> list(od.keys()) # Return keys in insertion order
['z', 'y', 'x']
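
Beyond remembering insertion order, OrderedDict offers a few order-aware operations. The sketch below uses move_to_end() and popitem(), both standard OrderedDict methods; the variable names are illustrative.

```python
from collections import OrderedDict

od = OrderedDict([('z', 1), ('y', 2), ('x', 3)])

# move_to_end() repositions an existing key without changing its value.
od.move_to_end('z')
print(list(od))                 # ['y', 'x', 'z']
od.move_to_end('x', last=False)
print(list(od))                 # ['x', 'y', 'z']

# popitem(last=False) removes the oldest entry; the default removes the newest.
oldest = od.popitem(last=False)
print(oldest)                   # ('x', 3)

# Two OrderedDicts compare equal only if their order also matches.
a = OrderedDict([('a', 1), ('b', 2)])
b = OrderedDict([('b', 2), ('a', 1)])
print(a == b)                   # False
print(dict(a) == dict(b))       # True
```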

OrderedDict can implement a FIFO (First-In-First-Out) dict that deletes the oldest key when capacity is exceeded:

from collections import OrderedDict

class LastUpdatedOrderedDict(OrderedDict):

    def __init__(self, capacity):
        super(LastUpdatedOrderedDict, self).__init__()
        self._capacity = capacity

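    def __setitem__(self, key, value):
        # 1 if the key is already present (an update), 0 if it is new.
        contains_key = 1 if key in self else 0
        # Adding a new key to a full dict: evict the oldest entry first.
        if len(self) - contains_key >= self._capacity:
            last = self.popitem(last=False)
            print('remove:', last)
        if contains_key:
            # Delete and re-insert so the updated key becomes the newest.
            del self[key]
            print('set:', (key, value))
        else:
            print('add:', (key, value))
        OrderedDict.__setitem__(self, key, value)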
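
To see the eviction behaviour in action, here is a self-contained sketch. FifoDict is a hypothetical, print-free variant written for this example; it is intended to follow the same logic as the class above for a capacity of at least 1, not to replace it.

```python
from collections import OrderedDict

class FifoDict(OrderedDict):
    """Hypothetical, print-free variant of the capacity-limited dict."""

    def __init__(self, capacity):
        super().__init__()
        self._capacity = capacity

    def __setitem__(self, key, value):
        if key in self:
            # Updating an existing key moves it to the newest position.
            del self[key]
        elif len(self) >= self._capacity:
            # Adding a new key to a full dict evicts the oldest entry.
            self.popitem(last=False)
        super().__setitem__(key, value)

d = FifoDict(2)
d['a'] = 1
d['b'] = 2
d['a'] = 10     # update: 'a' becomes the newest key, nothing is evicted
d['c'] = 3      # full: evicts 'b', which is now the oldest key
print(list(d.items()))   # [('a', 10), ('c', 3)]
```

Note that updating a key counts as a fresh insertion here, so the key that gets evicted is the one that was set least recently, not necessarily the one that was first created.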

ChainMap

ChainMap chains a set of dicts together into a single logical mapping. It behaves like a dict (it is a mutable mapping, although not a subclass of dict), but when looking up a key it searches the underlying dicts in order and returns the first match.

When is ChainMap most useful? For example: applications often accept parameters from command-line arguments, environment variables, or default values. ChainMap implements priority-based parameter lookup: first check command-line arguments, then environment variables, and finally use defaults if neither exists.

The code below demonstrates how to look up the user and color parameters:

from collections import ChainMap
import os, argparse

# Construct default parameters:
defaults = {
    'color': 'red',
    'user': 'guest'
}

# Construct command-line arguments:
parser = argparse.ArgumentParser()
parser.add_argument('-u', '--user')
parser.add_argument('-c', '--color')
namespace = parser.parse_args()
command_line_args = { k: v for k, v in vars(namespace).items() if v }

# Combine into a ChainMap:
combined = ChainMap(command_line_args, os.environ, defaults)

# Print parameters:
print('color=%s' % combined['color'])
print('user=%s' % combined['user'])

Without any arguments, default values are printed:

$ python3 use_chainmap.py
color=red
user=guest

With command-line arguments, they take priority:

$ python3 use_chainmap.py -u bob
color=red
user=bob

With both command-line arguments and environment variables, command-line arguments have higher priority:

$ user=admin color=green python3 use_chainmap.py -u bob
color=green
user=bob
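
The same lookup order can be tried without a command line. In the sketch below, the plain dicts cli and env are stand-ins (assumptions for illustration) for the parsed arguments and os.environ; new_child() is a standard ChainMap method.

```python
from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}
env = {'user': 'admin'}          # stand-in for os.environ
cli = {'color': 'green'}         # stand-in for parsed command-line arguments

combined = ChainMap(cli, env, defaults)
print(combined['color'])   # green  (found in cli)
print(combined['user'])    # admin  (not in cli, found in env)

# The underlying dicts are referenced, not copied, so later changes show through.
env['color'] = 'blue'
cli.pop('color')
print(combined['color'])   # blue

# Writes and deletions only ever touch the first mapping.
combined['user'] = 'bob'
print(cli)                 # {'user': 'bob'}
print(env['user'])         # admin

# new_child() adds a higher-priority layer without modifying the existing ones.
override = combined.new_child({'user': 'root'})
print(override['user'], combined['user'])   # root bob
```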

Counter

Counter is a simple counter—for example, counting character occurrences:

>>> from collections import Counter
>>> c = Counter()
>>> for ch in 'programming':
...     c[ch] = c[ch] + 1
...
>>> c
Counter({'r': 2, 'g': 2, 'm': 2, 'p': 1, 'o': 1, 'a': 1, 'i': 1, 'n': 1})
>>> c.update('hello') # Can also update in one step
>>> c
Counter({'r': 2, 'o': 2, 'g': 2, 'm': 2, 'l': 2, 'p': 1, 'a': 1, 'i': 1, 'n': 1, 'h': 1, 'e': 1})

Counter is actually a subclass of dict—the results above show the count of each character.
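
On top of the dict interface, Counter adds a few counting-specific operations. The sketch below is illustrative and uses only standard Counter features: most_common(), the zero default for missing keys, and arithmetic between counters.

```python
from collections import Counter

c = Counter('programming')

# most_common(n) lists the n highest counts, ties in first-seen order.
print(c.most_common(3))     # [('r', 2), ('g', 2), ('m', 2)]

# A missing key counts as 0 instead of raising KeyError, and is not stored.
print(c['z'])               # 0
print('z' in c)             # False

# Counters support addition and subtraction; subtraction drops counts <= 0.
total = Counter('aab') + Counter('abc')
print(total)                # Counter({'a': 3, 'b': 2, 'c': 1})
diff = Counter('aab') - Counter('abc')
print(diff)                 # Counter({'a': 1})
```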

Summary

The collections module provides useful collection classes that can be selected based on specific needs.

Reference Source Code

use_collections.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)
print("Point:", p.x, p.y)

from collections import deque

q = deque(["a", "b", "c"])
q.append("x")
q.appendleft("y")
print(q)

from collections import defaultdict

dd = defaultdict(lambda: "N/A")
dd["key1"] = "abc"
print("dd['key1'] =", dd["key1"])
print("dd['key2'] =", dd["key2"])

from collections import Counter

c = Counter("programming")
print(c)
