Using dict and set

`dict`

Python has a built-in dictionary: dict support. A dict, short for dictionary, is also known as a map in other languages. It uses key-value pairs for storage and offers extremely fast lookups.

For example, suppose we want to look up a student’s score by their name. Using a list, we would need two lists:

names = ['Michael', 'Bob', 'Tracy']
scores = [95, 75, 85]

To find the score for a given name, we first need to find its position in names and then retrieve the corresponding score from scores. The longer the list, the longer this takes.

Using a dict, we only need a single “name”-“score” lookup table. We can directly find the score by name, and the lookup speed remains fast regardless of the table size. Here’s how to write a dict in Python:

>>> d = {'Michael': 95, 'Bob': 75, 'Tracy': 85}
>>> d['Michael']
95

Why is dict lookup so fast? Because its underlying implementation is analogous to looking up a word in a physical dictionary. Imagine a dictionary with 10,000 Chinese characters. One way to find a character is to flip through the pages from the beginning until you find it. This is how element lookup works in a list—the larger the list, the slower the search.

A second, much faster method is to first check the dictionary’s index table (e.g., the radical index) to find the page number for the character, then flip directly to that page. This method is extremely fast for any character and does not slow down as the dictionary grows larger.

A dict uses this second method. Given a key like 'Michael', the dict internally calculates the “page number” where the corresponding value (95) is stored (i.e., its memory address) and retrieves it directly, resulting in very fast access.

As you might guess, for this key-value storage to work, when storing a value, its position must be calculated based on the key. This allows the value to be retrieved directly using the key later.

To add data to a dict, besides specifying it during initialization, you can assign a value to a key:

>>> d['Adam'] = 67
>>> d['Adam']
67

Since a key can only map to one value, assigning a new value to an existing key will overwrite the old one:

>>> d['Jack'] = 90
>>> d['Jack']
90
>>> d['Jack'] = 88
>>> d['Jack']
88

If a key does not exist in the dict, accessing it will raise a KeyError:

>>> d['Thomas']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Thomas'

To avoid errors when a key does not exist, there are two methods:

Use the in operator to check if the key exists:

>>> 'Thomas' in d
False

Use the get() method provided by dict. If the key does not exist, it can return None or a default value that you specify:

>>> d.get('Thomas')
>>> d.get('Thomas', -1)
-1

Note: In Python’s interactive environment, None is not displayed as an output.

To delete a key, use the pop(key) method. The corresponding value will also be removed from the dict:

>>> d.pop('Bob')
75
>>> d
{'Michael': 95, 'Tracy': 85}

Important: The order in which items are stored internally in a dict is not related to the order in which keys were added.

Compared to a list, a dict has the following characteristics:

Extremely fast for lookups and insertions; speed is not affected by the number of keys.
Requires a lot of memory, leading to higher memory overhead.

In contrast, a list has the opposite characteristics:

Lookup and insertion times increase as the number of elements grows.
Uses less memory with minimal overhead.

Therefore, a dict is a space-time trade-off, trading memory efficiency for faster access.

Dictionaries are useful in many scenarios requiring high-speed lookups and are ubiquitous in Python code. Using them correctly is important. The first rule to remember is that the keys of a dict must be immutable objects.

This is because the dict uses the key to calculate the storage position of the value (via a hash algorithm). If the same key could yield different hash values, the internal structure of the dict would become corrupted.

To ensure the correctness of the hash calculation, the key object must not change. In Python, strings, integers, and similar types are immutable and can therefore be safely used as keys. A list, however, is mutable and cannot be used as a key:

>>> key = [1, 2, 3]
>>> d[key] = 'a list'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

`set`

A set is similar to a dict in that it is a collection of keys, but it does not store values. Since keys cannot be duplicated, a set contains no duplicate elements.

To create a set, list its elements within curly braces {}:

>>> s = {1, 2, 3}
>>> s
{1, 2, 3}

Alternatively, you can provide a list as input to the set() constructor:

>>> s = set([1, 2, 3])
>>> s
{1, 2, 3}

Note: The input [1, 2, 3] is a list, while the displayed output {1, 2, 3} merely indicates that the set contains these three elements. The order of display does not imply that the set is ordered.

Duplicate elements are automatically filtered out in a set:

>>> s = {1, 1, 2, 2, 3, 3}
>>> s
{1, 2, 3}

Elements can be added to a set using the add(key) method. Adding an existing element again has no effect:

>>> s.add(4)
>>> s
{1, 2, 3, 4}
>>> s.add(4)
>>> s
{1, 2, 3, 4}

Elements can be removed using the remove(key) method:

>>> s.remove(4)
>>> s
{1, 2, 3}

A set can be viewed as an unordered collection of unique elements in the mathematical sense. Therefore, two sets can perform mathematical operations like intersection and union:

>>> s1 = {1, 2, 3}
>>> s2 = {2, 3, 4}
>>> s1 & s2  # Intersection
{2, 3}
>>> s1 | s2  # Union
{1, 2, 3, 4}

The only difference between a set and a dict is that a set does not store corresponding values. However, set shares the same underlying principles as dict. Consequently, mutable objects cannot be added to a set either. This is because it’s impossible to reliably check the equality of mutable objects, making it impossible to guarantee the uniqueness of elements within the set. Try adding a list to a set and observe the error.

Revisiting Immutable Objects

As discussed earlier, a str is an immutable object, while a list is a mutable object.

For a mutable object like a list, operations on it will change its internal contents:

>>> a = ['c', 'b', 'a']
>>> a.sort()
>>> a
['a', 'b', 'c']

What happens when we perform an operation on an immutable object like a str?

>>> a = 'abc'
>>> a.replace('a', 'A')
'Abc'
>>> a
'abc'

Although the string has a replace() method that seemingly produces 'Abc', the variable a still holds the original value 'abc'. How should this be understood?

Let’s rephrase the code:

>>> a = 'abc'
>>> b = a.replace('a', 'A')
>>> b
'Abc'
>>> a
'abc'

It’s crucial to remember that a is a variable, while 'abc' is the actual string object. Sometimes we say “the content of object a is 'abc'“, but more precisely, a is a variable that points to (or references) an object whose content is 'abc':

┌───┐     ┌───────┐
│ a │────▶│ 'abc' │
└───┘     └───────┘

When we call a.replace('a', 'A'), we are actually invoking the replace method on the string object 'abc'. Despite its name, this method does not modify the original string 'abc'. Instead, it creates and returns a new string object 'Abc'. If we assign this new object to the variable b, the relationship becomes clear: a still points to the original string 'abc', while b points to the new string 'Abc':

┌───┐     ┌───────┐
│ a │────▶│ 'abc' │
└───┘     └───────┘
┌───┐     ┌───────┐
│ b │────▶│ 'Abc' │
└───┘     └───────┘

Thus, for immutable objects, calling any of their methods will not alter the object’s own content. Instead, these methods return new objects. This guarantees that immutable objects themselves remain unchanged forever.

Summary

The dict data structure, which uses key-value pairs, is very useful in Python. Choosing immutable objects for keys is important, with strings being the most commonly used type of key.

Although a tuple is an immutable object, try adding (1, 2, 3) and (1, [2, 3]) to a dict or set and explain the results.

Python for beginner

Curriculum

Using dict and set

dict

set

Revisiting Immutable Objects

Summary

Leave a Reply Cancel reply

Modal title

`dict`

`set`