
Process Relations in the Python Multiprocessing Module

In 2007, when I was taking the undergraduate operating system course, the textbook [1] mentioned that a child process could possibly be terminated on its parent's exit. Years later, I came to the world of Linux and found that this behaviour is OS-dependent: in Linux, the child process keeps running even after its parent is gone, and instead of being left parentless it is adopted by process 1. Let's use a small program [2] to validate the behaviour.

#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, const char **argv)
{
    pid_t pid = fork();
    if (pid > 0) {
        /* parent: exits after about 10 seconds */
        printf("I am the parent of pid=%d\n", pid);
        sleep(10);
    } else if (!pid) {
        /* child: keeps running for about 25 seconds */
        printf("I am the child\n");
        for (int i = 0; i < 5; i++) {
            printf("loop %d\n", i);
            sleep(5);
        }
    } else if (pid == -1) {
        perror("fork");
    }
    return 0;
}

In Linux, let’s start 2 shell sessions. In the first one, type

gcc --std=c99 fork.c
./a.out

In the other one, type ps -ef | grep a.out to check the status of the two processes. About 10 seconds later, you should find that the parent is gone but the child is still alive, and its parent has become process 1.

Based on what we have observed, it can be concluded that

  • the child will continue to run even though its parent is dead, and it will be adopted by the init process whose pid is 1.
  • the parent will not wait for the child's termination if only the fork primitive is used (the sketch below shows how an explicit wait changes this).
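To make the second point concrete, here is a minimal Python sketch (using os.fork, which we will come back to later, and os.waitpid; the sleep duration is arbitrary): the parent blocks on the child only because it explicitly calls waitpid.

import os
import time

pid = os.fork()
if pid == 0:
    # Child: simulate some work, then exit.
    time.sleep(3)
    os._exit(0)
else:
    # Parent: without this call it would simply exit and the child would be
    # re-parented to process 1; with it, the parent blocks until the child ends.
    os.waitpid(pid, 0)
    print("child %d has terminated" % pid)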

Now let's think about another case. If you press ctrl + c in the shell immediately after running a.out, you will find that both the parent and the child are terminated. This seems to contradict what we have observed. Actually, when ctrl + c is typed, SIGINT is sent to the whole foreground process group instead of to a single process [3]. When a child is forked by its parent, the two share the same process group id, so both the parent and the child receive the SIGINT signal and exit. You can change the child's process group id to see what happens, as the sketch below shows.
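Here is a minimal Python sketch of that experiment (assuming a Unix-like system): the child moves itself into a new process group with os.setpgid, so a ctrl + c delivered to the foreground process group kills the parent but leaves the child running.

import os
import time

pid = os.fork()
if pid == 0:
    # Child: become the leader of a new process group, so SIGINT sent to the
    # terminal's foreground process group no longer reaches this process.
    os.setpgid(0, 0)
    for i in range(5):
        print("child loop %d" % i)
        time.sleep(5)
else:
    print("I am the parent of pid=%d" % pid)
    time.sleep(10)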

Multiple Processes in Python

There are multiple ways in the standard library to create multiple processes in Python: one is the fork primitive provided by the os module, and another is the multiprocessing module.

fork primitive

The os.fork function is a thin wrapper around the C fork function, so its behaviour is similar and I won't go into much detail. Here is a sample program; you can see that it behaves like the C version.

import os
import time

def child():
    print("process id of child is: %d" % os.getpid())
    time.sleep(10)

p = os.fork()

if p == 0:
    # fork() returns 0 in the child
    child()
else:
    # fork() returns the child's pid in the parent, which exits first
    print("process id of parent is: %d" % os.getpid())
    time.sleep(5)

multiprocessing module

multiprocessing is a high-level module built upon primitives such as fork, and thus it is much more versatile. Besides forking child processes, it also provides features like daemonic processes and shared variables. In this post, I will focus only on the relation between processes. Let's start with a simple piece of code.

from multiprocessing import Process
import time
import os

def child():
    print("child process pid: %d" % os.getpid())
    time.sleep(10)

p = Process(target=child, args=())
p.start()

print("parent process pid: %d" % os.getpid())

You can see that the parent always waits for its children to terminate, even if neither join nor wait is called. Besides, if the child process is set to be daemonic, the parent will kill the child before it exits instead of waiting; this behaviour is documented in the multiprocessing documentation.
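To see the daemonic case in action, here is a minimal sketch based on the previous example; the two-second sleep is arbitrary and only gives the child time to start before the parent exits.

from multiprocessing import Process
import time
import os

def child():
    print("child process pid: %d" % os.getpid())
    time.sleep(10)
    print("child finished")   # never printed: the child is killed earlier

p = Process(target=child)
p.daemon = True               # mark the child as daemonic
p.start()

print("parent process pid: %d" % os.getpid())
time.sleep(2)
# The parent exits here; because the child is daemonic, it is terminated
# instead of being waited for.

So how does the module achieve that? Let's navigate to the source code of the Process class and check the implementation.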

  • When the start method of the child is invoked, it creates a Popen object, which forks a new process and runs the child by calling back into the child's _bootstrap method.
  • Meanwhile, the parent keeps track of all its children.
  • The _bootstrap method imports the util module, in which an _exit_function is registered as a cleanup handler. That function goes through all the children, terminates the daemonic ones and waits for those that are not (see the sketch after this list).
  • daemonic here is quite different from a daemon process in Linux system programming. In the Python multiprocessing module, it is just a flag indicating how the child process should be handled when the parent exits.
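The following is a minimal sketch of that cleanup idea, not the actual multiprocessing source: an exit handler registered with atexit walks over the remaining children (here via the public multiprocessing.active_children call), terminates the daemonic ones and joins the rest.

import atexit
import time
from multiprocessing import Process, active_children

def _cleanup_children():
    # Rough analogue of util._exit_function: kill the daemonic children,
    # wait for the non-daemonic ones.
    for child in active_children():
        if child.daemon:
            child.terminate()
        else:
            child.join()

def worker():
    time.sleep(5)

atexit.register(_cleanup_children)
Process(target=worker).start()
# When the interpreter is about to exit, the handler above runs and joins
# the non-daemonic child before the parent finally terminates.

The real _exit_function in multiprocessing/util.py does additional bookkeeping, but the child handling follows this pattern.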

Conclusion

Due to the GIL, multithreading is of limited use in high-concurrency scenarios, and even gevent's thread pool implementation suffers from this pain. Thus multiprocessing is encouraged in (C)Python. This post cannot cover all aspects of multiprocessing in Python, but I hope it helps you understand concurrency in Python.

References

  1. Operating Systems: Internals and Design Principles, 5th edition, by William Stallings

  2. Linux System Programming, 2nd edition, by Robert Love

  3. The Linux Programming Interface, by Michael Kerrisk
