To implement multiprocessing in a Python program, we first need to understand relevant operating system knowledge.
Unix/Linux operating systems provide a fork() system call, which is quite special. A normal function call is invoked once and returns once, but fork() is called once and returns twice—this is because the operating system automatically copies the current process (called the parent process) to create a child process, then returns separately within both the parent and child processes.
In the child process, fork() always returns 0, while in the parent it returns the child's PID. This design exists because a parent process can fork() many children and therefore needs each child's PID to keep track of them, whereas a child only ever has one parent and can obtain its PID by calling getppid().
Python’s os module encapsulates common system calls, including fork(), allowing us to easily create child processes in Python programs:
import os
print('Process (%s) start...' % os.getpid())
# Only works on Unix/Linux/macOS:
pid = os.fork()
if pid == 0:
    print('I am child process (%s) and my parent is %s.' % (os.getpid(), os.getppid()))
else:
    print('I (%s) just created a child process (%s).' % (os.getpid(), pid))
The output is as follows:
Process (876) start...
I (876) just created a child process (877).
I am child process (877) and my parent is 876.
Since Windows does not have the fork() call, the above code cannot run on Windows. However, macOS is based on the BSD (a type of Unix) kernel, so running it on macOS works without issues—we recommend learning Python on macOS!
With the fork() call, when a process receives a new task, it can copy itself to create a child process to handle the new task. A common example is the Apache server: the parent process listens on a port, and whenever a new HTTP request arrives, it fork()s a child process to handle that request.
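To make the pattern concrete, here is a minimal sketch of a fork-per-connection echo server in the spirit of the Apache example (Unix/Linux/macOS only; the loopback address, port 9527, and the echo behavior are invented for illustration):

import os
import signal
import socket

# Tell the kernel to reap exited children automatically, so they don't become zombies.
signal.signal(signal.SIGCHLD, signal.SIG_IGN)

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(('127.0.0.1', 9527))
server.listen(5)
while True:
    conn, addr = server.accept()       # only the parent accepts connections
    pid = os.fork()
    if pid == 0:
        # Child: handle this single connection, then exit.
        server.close()                 # the child never accepts new connections
        conn.sendall(conn.recv(1024))  # echo the request back
        conn.close()
        os._exit(0)
    else:
        # Parent: drop its copy of the connection and keep listening.
        conn.close()

Accepting in the parent and handling each request in a freshly forked child means every connection gets its own process, so a crash in one child cannot take down the listener.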
If you plan to write a multi-process service program, Unix/Linux is undoubtedly the right choice. Since Windows lacks the fork() call, is it impossible to write multi-process Python programs on Windows?
As Python is cross-platform, it should naturally provide cross-platform multiprocessing support. The multiprocessing module is the cross-platform implementation of multiprocessing in Python.
The multiprocessing module provides a Process class to represent a process object. The following example demonstrates starting a child process and waiting for it to finish:
from multiprocessing import Process
import os
# Code to be executed by the child process
def run_proc(name):
    print('Run child process %s (%s)...' % (name, os.getpid()))

if __name__ == '__main__':
    print('Parent process %s.' % os.getpid())
    p = Process(target=run_proc, args=('test',))
    print('Child process will start.')
    p.start()
    p.join()
    print('Child process end.')
Execution result:
Parent process 928.
Child process will start.
Run child process test (929)...
Child process end.
To create a child process, just create a Process instance, passing the target function and its arguments, then start it with the start() method. This makes creating a process even simpler than calling fork() directly.
The join() method waits for the child process to complete before continuing execution, and is typically used for inter-process synchronization.
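As a small, self-contained illustration, you can keep several Process objects in a list, start them all, and then join() each one, so the parent does not continue until every child has finished:

from multiprocessing import Process
import os

def run_proc(name):
    print('Run child process %s (%s)...' % (name, os.getpid()))

if __name__ == '__main__':
    # Start three children, then block until all of them have finished.
    procs = [Process(target=run_proc, args=(i,)) for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print('All child processes end.')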
If you need to start a large number of child processes, you can create them in batches using a process pool:
from multiprocessing import Pool
import os, time, random
def long_time_task(name):
    print('Run task %s (%s)...' % (name, os.getpid()))
    start = time.time()
    time.sleep(random.random() * 3)
    end = time.time()
    print('Task %s runs %0.2f seconds.' % (name, (end - start)))

if __name__ == '__main__':
    print('Parent process %s.' % os.getpid())
    p = Pool(4)
    for i in range(5):
        p.apply_async(long_time_task, args=(i,))
    print('Waiting for all subprocesses done...')
    p.close()
    p.join()
    print('All subprocesses done.')
Execution result:
Parent process 669.
Waiting for all subprocesses done...
Run task 0 (671)...
Run task 1 (672)...
Run task 2 (673)...
Run task 3 (674)...
Task 2 runs 0.14 seconds.
Run task 4 (673)...
Task 1 runs 0.27 seconds.
Task 3 runs 0.86 seconds.
Task 0 runs 1.41 seconds.
Task 4 runs 1.91 seconds.
All subprocesses done.
Code Explanation:
Calling the join() method on a Pool object waits for all child processes to finish. You must call close() before join(); once close() has been called, no new tasks can be submitted to the pool.
Note the output: tasks 0, 1, 2, and 3 execute immediately, while task 4 waits until one of the previous tasks completes. This is because the default size of the Pool on my computer is 4, meaning a maximum of 4 processes can run concurrently. This is a deliberate limitation of Pool, not the operating system. If you change it to:
p = Pool(5)
you can run 5 processes concurrently.
Since the default size of Pool is the number of CPU cores, if you have an 8-core CPU, you need to submit at least 9 child processes to see the waiting effect described above.
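To see the default on your own machine, check os.cpu_count(): when Pool() is created without an argument, it uses that many worker processes. A quick sketch:

import os
from multiprocessing import Pool

if __name__ == '__main__':
    print('CPU cores: %s' % os.cpu_count())  # Pool() defaults to this many workers
    with Pool() as p:                        # equivalent to Pool(os.cpu_count())
        print(p.map(abs, [-1, -2, -3]))      # prints [1, 2, 3]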
In many cases, a child process is not part of the current program but an external process. After creating a child process, we often need to control its input and output.
The subprocess module allows us to start a child process conveniently and control its input and output.
The following example demonstrates how to run the command nslookup www.python.org in Python code, with the same effect as running it directly in the command line:
import subprocess
print('$ nslookup www.python.org')
r = subprocess.call(['nslookup', 'www.python.org'])
print('Exit code:', r)
Execution result:
$ nslookup www.python.org
Server: 192.168.19.4
Address: 192.168.19.4#53
Non-authoritative answer:
www.python.org canonical name = python.map.fastly.net.
Name: python.map.fastly.net
Address: 199.27.79.223
Exit code: 0
If the child process requires input, you can use the communicate() method to provide it:
import subprocess
print('$ nslookup')
p = subprocess.Popen(['nslookup'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, err = p.communicate(b'set q=mx\npython.org\nexit\n')
print(output.decode('utf-8'))
print('Exit code:', p.returncode)
The above code is equivalent to executing nslookup in the command line and manually entering:
set q=mx
python.org
exit
Execution result:
$ nslookup
Server: 192.168.19.4
Address: 192.168.19.4#53
Non-authoritative answer:
python.org mail exchanger = 50 mail.python.org.
Authoritative answers can be found from:
mail.python.org internet address = 82.94.164.166
mail.python.org has AAAA address 2001:888:2000:d::a6
Exit code: 0
Processes inevitably need to communicate with each other, and operating systems provide many mechanisms for inter-process communication (IPC). Python's multiprocessing module wraps these low-level mechanisms and offers several ways to exchange data, such as Queue and Pipe (a short Pipe sketch follows the Queue example below).
Taking Queue as an example: create two child processes in the parent process—one writes data to the Queue, and the other reads data from it:
from multiprocessing import Process, Queue
import os, time, random
# Code executed by the data-writing process:
def write(q):
    print('Process to write: %s' % os.getpid())
    for value in ['A', 'B', 'C']:
        print('Put %s to queue...' % value)
        q.put(value)
        time.sleep(random.random())

# Code executed by the data-reading process:
def read(q):
    print('Process to read: %s' % os.getpid())
    while True:
        value = q.get(True)
        print('Get %s from queue.' % value)

if __name__ == '__main__':
    # Parent process creates the Queue and passes it to both child processes:
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    # Start the writing process:
    pw.start()
    # Start the reading process:
    pr.start()
    # Wait for pw to finish:
    pw.join()
    # The pr process loops forever and never exits on its own, so terminate it forcefully:
    pr.terminate()
Execution result:
Process to write: 50563
Put A to queue...
Process to read: 50564
Get A from queue.
Put B to queue...
Get B from queue.
Put C to queue...
Get C from queue.
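The Pipe mentioned earlier serves a similar purpose but connects exactly two endpoints. A minimal sketch that sends one message from a child process back to the parent:

from multiprocessing import Process, Pipe
import os

def send(conn):
    print('Process to send: %s' % os.getpid())
    conn.send('hello from child')
    conn.close()

if __name__ == '__main__':
    # Pipe() returns two connected endpoints; hand one to the child.
    parent_conn, child_conn = Pipe()
    p = Process(target=send, args=(child_conn,))
    p.start()
    print('Got: %s' % parent_conn.recv())  # blocks until the child sends
    p.join()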
On Unix/Linux, the multiprocessing module encapsulates the fork() call, so we don't need to worry about its implementation details. Since Windows has no fork(), multiprocessing emulates its effect by spawning a fresh interpreter for the child and serializing the objects the child needs (the target function and its arguments, via pickle) to hand them across. Therefore, if multiprocessing works on Unix/Linux but fails on Windows, the first thing to check is whether pickle serialization is the problem.
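A quick way to check is to try pickling the target function and its arguments yourself. A module-level function pickles fine, while a lambda does not, which is a common reason the same code runs on Linux but fails on Windows. A small sketch (catching the generic Exception, since the exact error type varies across Python versions):

import pickle

def works(name):
    return 'hello %s' % name

pickle.dumps(works)  # OK: module-level functions pickle by name
try:
    pickle.dumps(lambda name: name)  # the kind of target that breaks on Windows
except Exception as e:
    print('Cannot pickle:', e)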