Related Processes in Linux
In 2007, when I was taking the undergraduate operating system course, the textbook [1] mentioned that a child process could possibly be terminated when its parent exits. Years later, I came to the world of Linux and found that this design is OS-dependent: in Linux, the child process keeps running even though its parent is gone. It becomes an orphan and is adopted by process 1. Let's use a small program [2] to validate the behaviour.
In Linux, let's start 2 shell sessions. In the first one, compile the program and run the resulting a.out.
In the other one, type ps -ef | grep a.out to check the status of the 2 processes. About 10s later, you should find that the parent is gone and the child is still alive, but the parent of the child has become process 1.
Based on what we have observed, it can be concluded that:
- A child will continue to run even though its parent is dead, and it will be adopted by the init process whose pid is 1.
- A parent will not wait for its child's termination if only the fork primitive is used.
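The same experiment can be sketched in Python so that it checks itself. This is only an illustration under assumptions, not the original program: the function name observe_reparenting and the pipe used for reporting are mine. A short-lived parent forks a child and exits without waiting, and the child reports who its parent is once the kernel has re-parented it. On a classic setup the adopter is pid 1, but on some systems it can be another process (for example a subreaper such as a per-user service manager), so the sketch only checks that the adopter differs from the dead parent.

```python
import os
import time


def observe_reparenting(parent_lifetime=0.2):
    """Fork a short-lived parent and a child that outlives it.

    Returns (parent_pid, adopter_pid), where adopter_pid is what the
    child sees from os.getppid() after its original parent has exited.
    """
    read_fd, write_fd = os.pipe()
    parent_pid = os.fork()
    if parent_pid == 0:
        # Short-lived "parent": fork a child, then exit without waiting.
        try:
            os.close(read_fd)
            my_pid = os.getpid()
            if os.fork() == 0:
                # Child: wait until the original parent is gone.
                try:
                    while os.getppid() == my_pid:
                        time.sleep(0.05)
                    os.write(write_fd, str(os.getppid()).encode())
                finally:
                    os._exit(0)
            time.sleep(parent_lifetime)
        finally:
            os._exit(0)

    # Observer: reap the short-lived parent, then read the child's report.
    os.close(write_fd)
    os.waitpid(parent_pid, 0)
    with os.fdopen(read_fd) as pipe:
        adopter_pid = int(pipe.read())
    return parent_pid, adopter_pid


if __name__ == "__main__":
    parent_pid, adopter_pid = observe_reparenting()
    print(f"original parent: {parent_pid}, adopted by: {adopter_pid}")
```

The observer process exists only so that the result can be collected; the part that matters is that the child keeps running after its parent has called os._exit.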
Now let's think about another case. If you press ctrl + c in the shell immediately after running a.out, you will find that both parent and child are terminated. This seems to contradict what we have just concluded. Actually, when ctrl + c is typed, SIGINT is sent to the whole foreground process group instead of a single process [3]. When a child is forked by the parent, it inherits the parent's process group id. So both the parent and the child receive the SIGINT signal and exit. You can change the child's process group id to see what happens.
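One way to try this is sketched below in Python, assuming os.setpgid(0, 0) is used to move the child into a new process group of its own. The function name fork_into_new_group and the pipe-based reporting are illustrative choices. Once the child is in its own group it is no longer part of the foreground process group, so a ctrl + c typed in the terminal should reach only the parent.

```python
import os


def fork_into_new_group():
    """Fork a child that moves itself into its own process group.

    Returns (parent_pgid, child_pid, child_pgid).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            # Create a new group whose id equals the child's own pid.
            os.setpgid(0, 0)
            os.write(write_fd, str(os.getpgrp()).encode())
        finally:
            os._exit(0)

    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        child_pgid = int(pipe.read())
    os.waitpid(pid, 0)
    return os.getpgrp(), pid, child_pgid


if __name__ == "__main__":
    parent_pgid, child_pid, child_pgid = fork_into_new_group()
    print(f"parent group: {parent_pgid}, child pid: {child_pid}, "
          f"child group: {child_pgid}")
```

Without the os.setpgid call, the child would report the same group id as the parent.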
Multiple Processes in Python
There are multiple ways in the standard library to run multiple processes in Python. One is the fork primitive provided by the os module and another is the multiprocessing module.
fork primitive
The os.fork function is a wrapper of the fork function in C, so its behaviour is similar and I won't go into much detail. Here is a sample program. You will find that it behaves like the C version.
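A minimal sketch of os.fork, written for illustration rather than taken from the original listing (the name fork_once and the exit code 7 are arbitrary). As in C, the call returns 0 in the child and the child's pid in the parent. The parent blocks here only because it calls os.waitpid explicitly; without that call it would carry on and could exit before the child.

```python
import os


def fork_once():
    """Fork a child that exits with code 7; return (child_pid, exit_code)."""
    pid = os.fork()
    if pid == 0:
        # Child branch: os.fork() returned 0.
        os._exit(7)

    # Parent branch: os.fork() returned the child's pid.
    # Waiting is explicit; fork alone never makes the parent wait.
    _, status = os.waitpid(pid, 0)
    return pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    child_pid, exit_code = fork_once()
    print(f"child {child_pid} exited with code {exit_code}")
```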
multiprocessing module
multiprocessing is a high-level module built upon primitives such as fork, thus it is much more versatile. Besides forking child processes, it also provides features like daemonic processes and shared variables. In this post, I will focus only on the process relation. Let's start with a simple piece of code.
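A sketch of such a program might look like the following. It is not the original listing: the names child_task and start_without_join are hypothetical, and the "fork" start method is selected explicitly as an assumption to match the Linux discussion above. The parent starts a non-daemonic child and never calls join.

```python
import multiprocessing
import time

# Assumption: use the "fork" start method explicitly (available on Linux).
ctx = multiprocessing.get_context("fork")


def child_task(seconds):
    """Stand-in for real work done by the child."""
    time.sleep(seconds)


def start_without_join(seconds=1.0):
    """Start a non-daemonic child and return it without joining."""
    proc = ctx.Process(target=child_task, args=(seconds,))
    proc.start()
    return proc


if __name__ == "__main__":
    proc = start_without_join()
    print("parent reached its last line; child alive:", proc.is_alive())
    # No join() here, yet the interpreter does not exit until the child ends.
```

Run as a script, the message is printed immediately, but the shell prompt should only come back about a second later, once the child has finished.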
You will find that the parent always waits for its children to terminate even if neither join nor wait is called. Besides, if the child process is set to be daemonic, instead of waiting, the parent will kill the child before it exits. This behaviour is described in the module's documentation. So how does the module achieve that? Let's navigate to the source code of class Process and check the implementation.
- When the start method of the child is invoked, it will create a Popen object which will fork a new process and execute the child process by calling back the child's _bootstrap method.
- Meanwhile, the parent will maintain a list to save all its children.
- The _bootstrap method imports the util module in which a _exit_function function is registered as a cleanup handler. The function will check all its children and terminate all the daemonic children but wait for those that are not.

Note that daemonic is quite different from the daemon process in Linux system programming. In Python's multiprocessing module, it is just a flag indicating how the child processes should be handled when the parent exits.
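The cleanup step described above can be approximated in a few lines. This is a simplified sketch, not the library's actual code: cleanup_children and demo are hypothetical names, and the "fork" start method is again an assumption. It uses the public multiprocessing.active_children function to find live children, terminates the daemonic ones, and joins the rest.

```python
import multiprocessing
import signal
import time

# Assumption: use the "fork" start method explicitly (available on Linux).
ctx = multiprocessing.get_context("fork")


def cleanup_children():
    """Terminate daemonic children, then wait for every child to finish."""
    children = multiprocessing.active_children()
    for child in children:
        if child.daemon:
            child.terminate()  # sends SIGTERM on Unix
    for child in children:
        child.join()


def demo():
    """Start one daemonic and one regular child, then run the cleanup."""
    daemonic = ctx.Process(target=time.sleep, args=(30,), daemon=True)
    regular = ctx.Process(target=time.sleep, args=(0.3,))
    daemonic.start()
    regular.start()
    cleanup_children()
    return daemonic.exitcode, regular.exitcode


if __name__ == "__main__":
    daemonic_code, regular_code = demo()
    print("daemonic child killed by SIGTERM:",
          daemonic_code == -signal.SIGTERM)
    print("regular child exit code:", regular_code)
```

The daemonic child never gets to finish its 30 second sleep, while the regular child is allowed to run to completion.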
Conclusion
Due to the GIL, multithreading in CPython offers little benefit for CPU-bound work in high-concurrency scenarios. Even in gevent's implementation of a thread pool, the issue is a big pain. Thus multiprocessing is encouraged in (C)Python. This post cannot cover all the aspects of multiprocessing in Python, but I hope it helps you understand concurrency in Python.