collections is a built-in collection module in Python that provides many useful collection classes.
We know that a tuple can represent an immutable collection. For example, the 2D coordinates of a point can be expressed as:
>>> p = (1, 2)
However, looking at (1, 2), it’s hard to tell that this tuple represents a coordinate.
Defining a full class just for this would be overkill, and this is where namedtuple comes in handy:
>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(1, 2)
>>> p.x
1
>>> p.y
2
namedtuple is a factory function that creates a custom tuple subclass with a fixed number of elements, whose elements can be accessed by attribute name as well as by index.
This makes namedtuple a convenient way to define a simple data type: it keeps the immutability of a tuple while allowing attribute-based access.
We can verify that p is an instance of Point, and that Point is a subclass of tuple:
>>> isinstance(p, Point)
True
>>> isinstance(p, tuple)
True
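A namedtuple instance is still a tuple, so it can be indexed and unpacked, and it cannot be modified in place. The short sketch below illustrates this together with two helper methods that namedtuple classes provide, _replace() and _asdict(). The exact AttributeError message varies between Python versions, so it is not shown, and the _asdict() output assumes Python 3.8+, where it returns a plain dict.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)

# A namedtuple is still a tuple: indexing and unpacking work as usual.
x, y = p
print(p[0], x, y)        # 1 1 2

# Fields are read-only; assigning to one raises AttributeError.
try:
    p.x = 10
except AttributeError as exc:
    print('cannot assign:', exc)

# _replace() returns a new Point with some fields changed; p is untouched.
p2 = p._replace(x=10)
print(p, p2)             # Point(x=1, y=2) Point(x=10, y=2)

# _asdict() converts the fields to a dict (a plain dict on Python 3.8+).
print(p2._asdict())      # {'x': 10, 'y': 2}
```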
Similarly, we can use namedtuple to define a circle with coordinates and a radius:
# namedtuple('Name', [attribute list]):
Circle = namedtuple('Circle', ['x', 'y', 'r'])
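Once defined, Circle works just like Point. The sketch below constructs a circle and uses its fields in two small helper functions; area() and contains() are hypothetical names chosen for illustration, not part of the collections module.

```python
import math
from collections import namedtuple

# namedtuple('Name', [attribute list]):
Circle = namedtuple('Circle', ['x', 'y', 'r'])

c = Circle(0, 0, 2)
print(c)                    # Circle(x=0, y=0, r=2)
print(Circle._fields)       # ('x', 'y', 'r')

def area(circle):
    """Area of a circle, computed from its radius field."""
    return math.pi * circle.r ** 2

def contains(circle, px, py):
    """True if the point (px, py) lies inside or on the circle."""
    return (px - circle.x) ** 2 + (py - circle.y) ** 2 <= circle.r ** 2

print(area(c))
print(contains(c, 1, 1))    # True
print(contains(c, 2, 2))    # False
```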
When data is stored in a list, accessing an element by index is fast, but inserting or deleting elements near the front is slow: a list stores its elements in a contiguous array, so every element after the insertion or deletion point has to be shifted, and this cost grows with the size of the list.
deque is a double-ended queue that supports fast appends and pops at both ends, which makes it ideal for implementing queues and stacks:
>>> from collections import deque
>>> q = deque(['a', 'b', 'c'])
>>> q.append('x')
>>> q.appendleft('y')
>>> q
deque(['y', 'a', 'b', 'c', 'x'])
In addition to list's append() and pop(), deque supports appendleft() and popleft(), so elements can be added to or removed from the head very efficiently.
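To make the queue/stack claim concrete, here is a minimal sketch using deque as a FIFO queue and as a LIFO stack. It also shows two deque features not mentioned above, the maxlen argument and rotate(); the job names are placeholders.

```python
from collections import deque

# Queue (FIFO): add at the right, remove from the left.
queue = deque()
for job in ['job1', 'job2', 'job3']:
    queue.append(job)
first = queue.popleft()      # 'job1', in O(1) time, unlike list.pop(0)

# Stack (LIFO): add and remove at the same end.
stack = deque()
stack.append(1)
stack.append(2)
top = stack.pop()            # 2

# A bounded deque discards items from the opposite end when full,
# which is handy for keeping only the last N items.
recent = deque(maxlen=3)
for n in range(5):
    recent.append(n)
print(recent)                # deque([2, 3, 4], maxlen=3)

# rotate(k) shifts all elements k steps to the right.
d = deque([1, 2, 3, 4])
d.rotate(1)
print(d)                     # deque([4, 1, 2, 3])
```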
When using a dict, accessing a non-existent key raises a KeyError. To return a default value for missing keys instead, use defaultdict:
>>> from collections import defaultdict
>>> dd = defaultdict(lambda: 'N/A')
>>> dd['key1'] = 'abc'
>>> dd['key1'] # key1 exists
'abc'
>>> dd['key2'] # key2 does not exist, return default value
'N/A'
Note: the default value is produced by calling a function (the default factory), which is passed in when the defaultdict object is created.
Apart from returning (and storing) a default value for missing keys, defaultdict behaves exactly like a regular dict.
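The default factory can be any zero-argument callable, including built-in types such as list and int. The sketch below (with made-up sample data) uses them for grouping and counting, and also shows a subtlety: a lookup with [] stores the default value in the dict, while get() does not call the factory at all.

```python
from collections import defaultdict

# default_factory=list: group words by their first letter.
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
groups = defaultdict(list)
for w in words:
    groups[w[0]].append(w)   # a missing key starts as an empty list

# default_factory=int: int() returns 0, so this counts occurrences.
counts = defaultdict(int)
for ch in 'banana':
    counts[ch] += 1

print(dict(groups))   # {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
print(dict(counts))   # {'b': 1, 'a': 3, 'n': 2}

# Looking up a missing key with [] also stores the default value...
dd = defaultdict(lambda: 'N/A')
dd['key2']
print('key2' in dd)   # True
# ...whereas get() does not call the factory and inserts nothing.
print(dd.get('key3')) # None
print('key3' in dd)   # False
```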
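In Python versions before 3.7, keys in a standard dict are unordered: we cannot rely on the order of keys when iterating over a dict. (Since Python 3.7, dict preserves insertion order as part of the language specification, but OrderedDict still makes the ordering explicit and adds order-aware methods.)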
To preserve key order, use OrderedDict:
>>> from collections import OrderedDict
>>> d = dict([('a', 1), ('b', 2), ('c', 3)])
>>> d # before Python 3.7, dict key order is arbitrary
{'a': 1, 'c': 3, 'b': 2}
>>> od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> od # OrderedDict keys are ordered
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
Important: OrderedDict orders keys by insertion order, not by sorting the keys themselves:
>>> od = OrderedDict()
>>> od['z'] = 1
>>> od['y'] = 2
>>> od['x'] = 3
>>> list(od.keys()) # Return keys in insertion order
['z', 'y', 'x']
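Beyond preserving order, OrderedDict offers order-aware operations. The sketch below shows move_to_end(), popitem(last=False), and the fact that comparing two OrderedDict objects takes order into account, unlike comparing plain dicts.

```python
from collections import OrderedDict

od = OrderedDict.fromkeys('abc')   # keys 'a', 'b', 'c'; all values None

# move_to_end() repositions an existing key (last=False moves it to the front).
od.move_to_end('a')
print(list(od))                    # ['b', 'c', 'a']
od.move_to_end('a', last=False)
print(list(od))                    # ['a', 'b', 'c']

# popitem(last=False) removes the oldest entry (FIFO order);
# the default popitem() removes the newest entry (LIFO order).
print(od.popitem(last=False))      # ('a', None)
print(od.popitem())                # ('c', None)

# Equality between two OrderedDicts is order-sensitive; between plain dicts it is not.
x = OrderedDict([('a', 1), ('b', 2)])
y = OrderedDict([('b', 2), ('a', 1)])
print(x == y)                      # False
print(dict(x) == dict(y))          # True
```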
OrderedDict can be used to implement a FIFO (First-In-First-Out) dict that evicts the least recently added or updated key once its capacity is reached:
from collections import OrderedDict

class LastUpdatedOrderedDict(OrderedDict):
    """A dict with a fixed capacity that evicts its oldest entry first."""

    def __init__(self, capacity):
        super().__init__()
        self._capacity = capacity

    def __setitem__(self, key, value):
        containsKey = 1 if key in self else 0
        # Full (not counting the key being updated): evict the oldest entry.
        if len(self) - containsKey >= self._capacity:
            last = self.popitem(last=False)
            print('remove:', last)
        if containsKey:
            # Delete and re-insert so that an updated key moves to the end.
            del self[key]
            print('set:', (key, value))
        else:
            print('add:', (key, value))
        OrderedDict.__setitem__(self, key, value)
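The class above evicts the least recently added or updated key; reading a key does not change its position. If reads should also count as use, a common variant is an LRU (least recently used) dict. The sketch below is that variant, not the original class: the name LRUDict is hypothetical, and it is modelled on the LRU example in the OrderedDict documentation, using move_to_end() on every access and deleting next(iter(self)) to evict the oldest key.

```python
from collections import OrderedDict

class LRUDict(OrderedDict):
    """Hypothetical LRU variant: evicts the least recently *used* key."""

    def __init__(self, capacity):
        self._capacity = capacity
        super().__init__()

    def __getitem__(self, key):
        value = super().__getitem__(key)  # raises KeyError if missing
        self.move_to_end(key)             # a read marks the key as most recent
        return value

    def __setitem__(self, key, value):
        if key in self:
            self.move_to_end(key)         # an update also marks it as recent
        super().__setitem__(key, value)
        if len(self) > self._capacity:
            oldest = next(iter(self))     # the front entry is the least recent
            del self[oldest]

cache = LRUDict(2)
cache['a'] = 1
cache['b'] = 2
cache['a']            # reading 'a' makes 'b' the least recently used key
cache['c'] = 3        # over capacity: evicts 'b'
print(list(cache))    # ['a', 'c']
```

Without the read of 'a', the same sequence would evict 'a' instead, which is what LastUpdatedOrderedDict does.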
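ChainMap chains together a set of dicts to form a single logical dict. It is not a dict subclass, but it behaves like one (it implements the mapping interface): when looking up a key, it searches the underlying dicts in order and returns the first match.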
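A small sketch of the lookup order, using two illustrative dicts: ChainMap returns the value from the first mapping that contains the key, writes and deletions only affect the first mapping, and new_child() pushes a new highest-priority mapping in front.

```python
from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}
overrides = {'user': 'admin'}

combined = ChainMap(overrides, defaults)
print(combined['user'])     # admin (found in the first mapping)
print(combined['color'])    # red (falls through to defaults)

# Writes only touch the first mapping; defaults stays unchanged.
combined['color'] = 'blue'
print(overrides)            # {'user': 'admin', 'color': 'blue'}
print(defaults['color'])    # red

# new_child() returns a new ChainMap with an extra mapping in front.
local = combined.new_child({'user': 'bob'})
print(local['user'])        # bob
print(combined['user'])     # admin
```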
When is ChainMap most useful? For example, applications often accept parameters from command-line arguments, environment variables, or default values. ChainMap makes priority-based parameter lookup easy: check the command-line arguments first, then the environment variables, and fall back to the defaults if neither provides a value.
The code below demonstrates how to look up the user and color parameters:
from collections import ChainMap
import os, argparse
# Construct default parameters:
defaults = {
    'color': 'red',
    'user': 'guest'
}
# Construct command-line arguments:
parser = argparse.ArgumentParser()
parser.add_argument('-u', '--user')
parser.add_argument('-c', '--color')
namespace = parser.parse_args()
command_line_args = { k: v for k, v in vars(namespace).items() if v }
# Combine into a ChainMap:
combined = ChainMap(command_line_args, os.environ, defaults)
# Print parameters:
print('color=%s' % combined['color'])
print('user=%s' % combined['user'])
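Without any arguments, the default values are printed:
$ python3 use_chainmap.py
color=red
user=guest

Command-line arguments take priority over the defaults:
$ python3 use_chainmap.py -u bob
color=red
user=bob

When both command-line arguments and environment variables are supplied, the command-line arguments have the highest priority:
$ user=admin color=green python3 use_chainmap.py -u bob
color=green
user=bob

Counter is a simple counter, for example for counting character occurrences: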
>>> from collections import Counter
>>> c = Counter()
>>> for ch in 'programming':
...     c[ch] = c[ch] + 1
...
>>> c
Counter({'r': 2, 'g': 2, 'm': 2, 'p': 1, 'o': 1, 'a': 1, 'i': 1, 'n': 1})
>>> c.update('hello') # Can also update in one step
>>> c
Counter({'r': 2, 'o': 2, 'g': 2, 'm': 2, 'l': 2, 'p': 1, 'a': 1, 'i': 1, 'n': 1, 'h': 1, 'e': 1})
Counter is actually a subclass of dict; the results above show the count of each character.
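Counter also provides counting-specific methods. The sketch below, run on an arbitrary sample sentence, shows most_common(), zero counts for missing keys, counter arithmetic (where subtraction keeps only positive counts), and elements().

```python
from collections import Counter

text = 'the quick brown fox jumps over the lazy dog the end'
words = Counter(text.split())

# most_common(n) returns the n highest counts, largest first.
print(words.most_common(1))   # [('the', 3)]

# Missing keys count as 0 instead of raising KeyError.
print(words['cat'])           # 0

# Counters support arithmetic; subtraction drops non-positive counts.
a = Counter('aab')
b = Counter('abc')
print(a + b)                  # Counter({'a': 3, 'b': 2, 'c': 1})
print(a - b)                  # Counter({'a': 1})

# elements() repeats each key as many times as its count.
print(sorted(a.elements()))   # ['a', 'a', 'b']
```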
The collections module provides many useful collection classes; choose among them according to your specific needs.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from collections import namedtuple
Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)
print("Point:", p.x, p.y)
from collections import deque
q = deque(["a", "b", "c"])
q.append("x")
q.appendleft("y")
print(q)
from collections import defaultdict
dd = defaultdict(lambda: "N/A")
dd["key1"] = "abc"
print("dd['key1'] =", dd["key1"])
print("dd['key2'] =", dd["key2"])
from collections import Counter
c = Counter("programming")
print(c)