During the execution of a program, all variables reside in memory. For example, define a dictionary:
d = dict(name='Bob', age=20, score=88)
You can modify variables at any time—for instance, changing the name to ‘Bill’. However, once the program terminates, the operating system reclaims all memory occupied by the variables. If the modified ‘Bill’ is not stored on disk, the variable will be reinitialized to ‘Bob’ the next time the program runs.
The process of converting variables from memory into a storable or transferable format is called serialization (referred to as pickling in Python; other names in different languages include serialization, marshalling, flattening, etc.—all meaning the same thing).
After serialization, the serialized content can be written to disk or transmitted to other machines over a network.
Conversely, reading variable content from a serialized object back into memory is called deserialization (i.e., unpickling).
Python provides the pickle module to implement serialization.
First, let’s try serializing an object and writing it to a file:
>>> import pickle
>>> d = dict(name='Bob', age=20, score=88)
>>> pickle.dumps(d)
b'\x80\x03}q\x00(X\x03\x00\x00\x00ageq\x01K\x14X\x05\x00\x00\x00scoreq\x02KXX\x04\x00\x00\x00nameq\x03X\x03\x00\x00\x00Bobq\x04u.'
The pickle.dumps() function serializes an object into a bytes object, which can then be written to a file. Alternatively, the pickle.dump() function serializes an object and writes it directly to a file-like object:
>>> f = open('dump.txt', 'wb')
>>> pickle.dump(d, f)
>>> f.close()
If you check the written dump.txt file, it contains a jumble of characters—these are the internal object details saved by Python.
To read the object from disk back into memory, either read the content into a bytes object and deserialize it with pickle.loads(), or deserialize directly from a file-like object with pickle.load(). Let’s open another Python command line to deserialize the object we just saved:
>>> import pickle
>>> f = open('dump.txt', 'rb')
>>> d = pickle.load(f)
>>> f.close()
>>> d
{'age': 20, 'score': 88, 'name': 'Bob'}
The variable content is back!
Of course, this variable is a completely independent object from the original one—they only have the same content.
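The independence of the restored object can be checked directly. The following is a minimal sketch using only the standard library: it round-trips the same dictionary through pickle.dumps()/pickle.loads() and through a file, then compares the results with `==` and `is`. The file name `dump.pkl` and the temporary directory are illustrative choices, not the `dump.txt` used above.

```python
import os
import pickle
import tempfile

d = dict(name='Bob', age=20, score=88)

# Round trip in memory: object -> bytes -> object.
copy_from_bytes = pickle.loads(pickle.dumps(d))

# Round trip through a file. The with statement closes each file
# even if an exception is raised while it is open.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'dump.pkl')  # illustrative file name
    with open(path, 'wb') as f:
        pickle.dump(d, f)
    with open(path, 'rb') as f:
        copy_from_file = pickle.load(f)

# Same content, different objects.
print(copy_from_bytes == d, copy_from_bytes is d)  # True False
print(copy_from_file == d, copy_from_file is d)    # True False

# Changing a copy leaves the original untouched.
copy_from_file['name'] = 'Bill'
print(d['name'])  # Bob
```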
Like all language-specific serialization mechanisms, pickle has limitations: it only works with Python, and data pickled by one Python version may not load in another. In addition, unpickling can execute arbitrary code, so data from an untrusted source should never be unpickled. Therefore, pickle should only be used to save non-critical data where failed deserialization is acceptable.
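One source of version incompatibility is the pickle protocol: newer Python releases add newer protocols that older releases cannot read. As a hedged illustration, the sketch below pins an older protocol through the `protocol` argument of pickle.dumps(). The choice of protocol 2 is only an example, not a recommendation from the original text.

```python
import pickle

d = dict(name='Bob', age=20, score=88)

# Protocol numbers known to the running interpreter.
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

# Pin an older protocol so that older interpreters can read the data.
# Protocol 2 is chosen here purely for illustration.
data = pickle.dumps(d, protocol=2)

# The protocol number is recorded at the start of the stream,
# so loads() needs no extra argument.
restored = pickle.loads(data)
print(restored == d)  # True
```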
If we need to transfer objects between different programming languages, we must serialize them into a standard format such as XML. A better choice is usually JSON: it is plain text that virtually every language can read, it is easily stored on disk or transmitted over a network, it is typically more compact and quicker to parse than XML, and it can be read directly in web pages, which makes it extremely convenient.
JSON’s syntax is derived from JavaScript object literals, so its values correspond to standard JavaScript types. The mapping between JSON types and Python’s built-in data types is as follows:
| JSON Type | Python Type |
|---|---|
| {} | dict |
| [] | list |
| "string" | str |
| 1234.56 | int/float |
| true/false | True/False |
| null | None |
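The mapping in the table can be confirmed with a short experiment. This is a minimal sketch; the JSON text and its key names are made up for illustration.

```python
import json

# One value of every JSON type (the key names are illustrative).
text = ('{"obj": {"k": 1}, "arr": [1, 2], "s": "string", '
        '"i": 1234, "f": 1234.56, "t": true, "n": null}')

decoded = json.loads(text)
for key, value in decoded.items():
    print(key, type(value).__name__)
# obj dict
# arr list
# s str
# i int
# f float
# t bool
# n NoneType

# The mapping is not perfectly symmetric: a tuple is written as a
# JSON array and therefore comes back as a list.
print(json.loads(json.dumps((1, 2))))  # [1, 2]
```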
Python’s built-in json module provides comprehensive conversion between Python objects and JSON format. Let’s first see how to convert a Python object to JSON:
>>> import json
>>> d = dict(name='Bob', age=20, score=88)
>>> json.dumps(d)
'{"age": 20, "score": 88, "name": "Bob"}'
The dumps() function returns a str containing standard JSON. Similarly, the dump() function writes JSON directly to a file-like object.
To deserialize JSON back into a Python object, use loads(), which deserializes a JSON string, or the corresponding load(), which reads a string from a file-like object and deserializes it:
>>> json_str = '{"age": 20, "score": 88, "name": "Bob"}'
>>> json.loads(json_str)
{'age': 20, 'score': 88, 'name': 'Bob'}
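The file-based variants work the same way. The sketch below uses an in-memory `io.StringIO` stream as a stand-in for a real file, which is an assumption made here so that the example leaves nothing on disk. A file opened in text mode would be used in the same way.

```python
import io
import json

d = dict(name='Bob', age=20, score=88)

buffer = io.StringIO()        # in-memory text stream standing in for a file
json.dump(d, buffer)          # write the JSON text into the stream
print(buffer.getvalue())      # the same text that json.dumps(d) returns

buffer.seek(0)                # rewind before reading
restored = json.load(buffer)  # read the text back and deserialize it
print(restored == d)  # True
```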
Since the JSON standard mandates UTF-8 encoding, we can always correctly convert between Python’s str and JSON strings.
Python’s dict objects can be directly serialized to JSON {}. However, in many cases, we prefer to use classes to represent objects—e.g., defining a Student class and then serializing it:
import json

class Student(object):
    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score

s = Student('Bob', 20, 88)
print(json.dumps(s))
Running this code results in an unforgiving TypeError:
Traceback (most recent call last):
...
TypeError: <__main__.Student object at 0x10603cc50> is not JSON serializable
The error occurs because a Student object is not JSON-serializable by default.
It would be unreasonable if even class instances couldn’t be serialized to JSON!
Don’t worry—let’s examine the parameter list of the dumps() method. Besides the mandatory obj parameter, dumps() provides numerous optional parameters:
https://docs.python.org/3/library/json.html#json.dumps
These optional parameters allow us to customize JSON serialization. The earlier code failed to serialize the Student instance because, by default, dumps() doesn’t know how to convert a Student instance into a JSON {} object.
The optional default parameter takes a function that dumps() calls for any object it cannot serialize on its own; the function should return a JSON-serializable equivalent. We just need to write a conversion function for Student and pass it in:
def student2dict(std):
    return {
        'name': std.name,
        'age': std.age,
        'score': std.score
    }
Now the Student instance is first converted to a dict by student2dict(), then successfully serialized to JSON:
>>> print(json.dumps(s, default=student2dict))
{"age": 20, "name": "Bob", "score": 88}
However, this won’t work for a Teacher class instance next time. We can take a shortcut to convert any class instance to a dict:
print(json.dumps(s, default=lambda obj: obj.__dict__))
Most class instances have a __dict__ attribute—a dict that stores instance variables (a few exceptions exist, such as classes with __slots__ defined).
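The `__slots__` exception can be handled with a slightly longer fallback. The following is only a sketch: `to_jsonable` and the `Point` class are hypothetical names introduced for illustration, and the helper assumes that `__slots__` is a tuple of attribute names defined on the class itself and that every slot has been assigned.

```python
import json

class Student(object):
    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score

class Point(object):
    __slots__ = ('x', 'y')  # instances of this class have no __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y

def to_jsonable(obj):
    """Hypothetical helper to pass as json.dumps(default=...)."""
    if hasattr(obj, '__dict__'):
        return obj.__dict__
    if hasattr(obj, '__slots__'):
        # Assumes __slots__ is a tuple of names and all slots are set.
        return {name: getattr(obj, name) for name in obj.__slots__}
    raise TypeError('%s is not JSON serializable' % type(obj).__name__)

print(json.dumps(Student('Bob', 20, 88), default=to_jsonable))
print(json.dumps(Point(1, 2), default=to_jsonable))
```

Raising TypeError for anything else keeps the behavior dumps() has when no default function is supplied.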
Similarly, to deserialize JSON into a Student instance, the loads() function first converts each JSON object to a dict, then the function passed as the object_hook parameter converts that dict to a Student instance:
def dict2student(d):
    return Student(d['name'], d['age'], d['score'])
The output is as follows:
>>> json_str = '{"age": 20, "score": 88, "name": "Bob"}'
>>> print(json.loads(json_str, object_hook=dict2student))
<__main__.Student object at 0x10cd3c190>
The printed result is the deserialized Student instance.
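One detail worth knowing is that object_hook is called for every JSON object in the input, including nested ones. The sketch below is a hedged variation on dict2student() that converts only dicts whose keys match a Student. The key check, the `__repr__` method, and the sample JSON text are additions made here for illustration.

```python
import json

class Student(object):
    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score

    def __repr__(self):
        # Added for readable output; not part of the original class.
        return 'Student(%r, %r, %r)' % (self.name, self.age, self.score)

def dict2student(d):
    # Called once per decoded JSON object, innermost objects first.
    # Convert only the dicts that look like a Student.
    if set(d) == {'name', 'age', 'score'}:
        return Student(d['name'], d['age'], d['score'])
    return d

json_str = '{"class": "A", "students": [{"age": 20, "score": 88, "name": "Bob"}]}'
result = json.loads(json_str, object_hook=dict2student)
print(result)
# {'class': 'A', 'students': [Student('Bob', 20, 88)]}
```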
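When the data contains non-ASCII text such as Chinese characters, the ensure_ascii parameter of json.dumps() controls the output. With the default ensure_ascii=True, every non-ASCII character is written as a \uXXXX escape; with ensure_ascii=False, the characters are written as they are. Observe the effect on the result:
import json
obj = dict(name='小明', age=20)  # a non-ASCII name, so that the parameter has a visible effect
s = json.dumps(obj, ensure_ascii=True)
print(s)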
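To make the difference visible, the two settings can be compared side by side. This is a minimal sketch that assumes a value containing Chinese characters (the name '小明' is an illustrative choice).

```python
import json

obj = dict(name='小明', age=20)  # illustrative non-ASCII value

escaped = json.dumps(obj, ensure_ascii=True)   # the default
literal = json.dumps(obj, ensure_ascii=False)

print(escaped)  # {"name": "\u5c0f\u660e", "age": 20}
print(literal)  # {"name": "小明", "age": 20}

# Both forms decode to the same Python object.
print(json.loads(escaped) == json.loads(literal) == obj)  # True
```

The escaped form is pure ASCII and survives any transport. The literal form is easier to read, but it must be written with an encoding, such as UTF-8, that can represent the characters.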
Python’s language-specific serialization module is pickle, but for more universal, web-standard serialization, use the json module.
The dumps() and loads() functions in the json module are excellent examples of well-designed interfaces. Ordinary use requires only the one mandatory argument. When the default serialization or deserialization behavior does not meet requirements, additional parameters can be passed to customize the rules, which gives both simplicity of use and full extensibility.
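As a small illustration of this design, the same call works with one argument or with extra keyword arguments. The sketch below uses the `sort_keys`, `indent`, and `separators` parameters of json.dumps(), which are not discussed above and are chosen here only as examples.

```python
import json

d = dict(name='Bob', age=20, score=88)

# Simplest use: only the mandatory argument.
plain = json.dumps(d)

# Human-readable output: sorted keys and two-space indentation.
pretty = json.dumps(d, sort_keys=True, indent=2)

# Compact output: no spaces after the separators.
compact = json.dumps(d, separators=(',', ':'))

print(plain)
print(pretty)
print(compact)  # {"name":"Bob","age":20,"score":88}
```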
use_pickle.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import pickle
d = dict(name="Bob", age=20, score=88)
data = pickle.dumps(d)
print(data)
reborn = pickle.loads(data)
print(reborn)
use_json.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import json
d = dict(name="Bob", age=20, score=88)
data = json.dumps(d)
print("JSON Data is a str:", data)
reborn = json.loads(data)
print(reborn)
class Student(object):
    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score

    def __str__(self):
        return "Student object (%s, %s, %s)" % (self.name, self.age, self.score)
s = Student("Bob", 20, 88)
std_data = json.dumps(s, default=lambda obj: obj.__dict__)
print("Dump Student:", std_data)
rebuild = json.loads(std_data, object_hook=lambda d: Student(d["name"], d["age"], d["score"]))
print(rebuild)