The Solution--pthread_atfork(3THR)

Use pthread_atfork() to prevent deadlocks whenever you use the fork-one model.

#include <pthread.h>

int pthread_atfork(void (*prepare) (void), void (*parent) (void),
    void (*child) (void) );

The pthread_atfork() function declares fork() handlers that are called before and after fork() in the context of the thread that called fork().

The prepare handler is called before fork() starts.
The parent handler is called after fork() returns in the parent.
The child handler is called after fork() returns in the child.

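Any of these handlers can be set to NULL. The order in which successive calls to pthread_atfork() are made is significant: parent and child handlers are called in the order in which they were registered, and prepare handlers are called in the reverse order.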

For example, a prepare handler could acquire all the mutexes needed, and then the parent and child handlers could release them. This ensures that all the relevant locks are held by the thread that calls the fork function before the process is forked, preventing the deadlock in the child.
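The prepare, parent, and child pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the Solaris libraries: the names state_lock, install_fork_handlers(), and fork_and_use_lock() are hypothetical, and a real application would acquire every lock it needs in the prepare handler, in a consistent order.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock protecting some shared state in the application. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the forking thread before fork(): take the lock. */
static void prepare(void) { pthread_mutex_lock(&state_lock); }

/* Runs in the parent after fork(): release what prepare() took. */
static void parent(void) { pthread_mutex_unlock(&state_lock); }

/* Runs in the child after fork(): release the child's copy of the lock. */
static void child(void) { pthread_mutex_unlock(&state_lock); }

/* Register the handlers once, early in program start-up.
 * Returns 0 on success or an error number such as ENOMEM. */
int install_fork_handlers(void)
{
    return pthread_atfork(prepare, parent, child);
}

/* Fork, let the child take and release the lock, and report the result.
 * Returns 1 if the child could use the lock, 0 if not, -1 on error. */
int fork_and_use_lock(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: the lock is free here because child() released it. */
        int rc = pthread_mutex_lock(&state_lock);
        if (rc == 0)
            pthread_mutex_unlock(&state_lock);
        _exit(rc == 0 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A program would call install_fork_handlers() once and check that it returns 0 before any thread calls fork().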

Using the fork-all model avoids the deadlock problem described in "The Fork-One Safety Problem and Solution".

Return Values

pthread_atfork() returns zero when it completes successfully. Any other return value indicates that an error occurred. If the following condition is detected, pthread_atfork(3THR) fails and returns the corresponding value.

ENOMEM

Insufficient table space exists to record the fork handler addresses.

The Fork-All Model

The Solaris fork(2) function duplicates the address space and all the threads (and LWPs) in the child. This is useful, for example, when the child process never calls exec(2) but does use its copy of the parent address space. The fork-all functionality is not available in POSIX threads.

Note that when one thread in a process calls Solaris fork(2), threads that are blocked in an interruptible system call return EINTR.

Also, be careful not to create locks that are held by both the parent and child processes. This can happen when locks are allocated in shareable memory (that is, memory mapped by mmap() with the MAP_SHARED flag). Note that this is not a problem if the fork-one model is used.

Choosing the Right Fork

You determine whether fork() has a "fork-all" or a "fork-one" semantic in your application by linking with the appropriate library. Linking with -lthread gives you the "fork-all" semantic for fork(), and linking with -lpthread gives the "fork-one" semantic for fork() (see Figure 7-1 for an explanation of compiling options).

Cautions for Any Fork

Be careful when using global state after a call to any fork() function.

For example, when one thread reads a file serially and another thread in the process successfully calls one of the forks, each process then contains a thread that is reading the file. Because the seek pointer for a file descriptor is shared after a fork(), the thread in the parent gets some data while the thread in the child gets the other. This introduces gaps in the sequential read accesses.
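The shared seek pointer can be observed with a short experiment. The sketch below is an illustration only: the function name shared_offset_demo() and the temporary file path are hypothetical. It writes 16 bytes to a scratch file, forks, lets the child read 4 bytes, and then reports the file offset that the parent sees on the same descriptor.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's file offset after the child has read 4 bytes,
 * or -1 on error. */
long shared_offset_demo(void)
{
    char path[] = "/tmp/forkdemoXXXXXX";   /* hypothetical scratch file */
    char buf[4];
    int status = 0;
    off_t pos;
    pid_t pid;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                          /* file goes away on close */

    if (write(fd, "0123456789abcdef", 16) != 16 ||
        lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: this read advances the offset shared with the parent. */
        _exit(read(fd, buf, sizeof buf) == (ssize_t)sizeof buf ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) < 0) {
        close(fd);
        return -1;
    }

    /* Parent: the offset reflects the child's read, not only our own. */
    pos = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return (long)pos;
}
```

If the parent had been reading the file sequentially, its next read would skip the bytes the child consumed.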

Process Creation--exec(2) and exit(2) Issues

Both the exec(2) and exit(2) system calls work as they do in single-threaded processes except that they destroy all the threads in the address space. Both calls block until all the execution resources (and so all active threads) are destroyed.

When exec() rebuilds the process, it creates a single lightweight process (LWP). The process startup code builds the initial thread. As usual, if the initial thread returns, it calls exit() and the process is destroyed.

When all the threads in a process exit, the process exits. A call to any exec() function from a process with more than one thread terminates all threads, and loads and executes the new executable image. No destructor functions are called.

Timers, Alarms, and Profiling

The "End of Life" announcements for per-LWP timers (see timer_create(3RT)) and per-thread alarms (see alarm(2) or setitimer(2)) were made in the Solaris 2.5 release. Both features are now replaced with the per-process variants described in this section.

Originally, each LWP had a unique realtime interval timer and alarm that a thread bound to the LWP could use. The timer or alarm delivered one signal to the thread when the timer or alarm expired.

Each LWP also had a virtual time or profile interval timer that a thread bound to the LWP could use. When the interval timer expired, either SIGVTALRM or SIGPROF, as appropriate, was sent to the LWP that owned the interval timer.

Per-LWP POSIX Timers

In the Solaris 2.3 and 2.4 releases, the timer_create(3RT) function returned a timer object with a timer ID meaningful only within the calling LWP and with expiration signals delivered to that LWP. Because of this, the only threads that could use the POSIX timer facility were bound threads.

Even with this restricted use, POSIX timers in the Solaris 2.3 and 2.4 releases for multithreaded applications were unreliable about masking the resulting signals and delivering the associated value from the sigevent structure.

Beginning with the Solaris 2.5 release, an application that is compiled defining the macro _POSIX_PER_PROCESS_TIMERS, or with a value greater than 199506L for the symbol _POSIX_C_SOURCE, can create per-process timers.

Effective with the Solaris 9 Operating Environment, all timers are per-process except for the virtual time and profile interval timers (see setitimer(2) for ITIMER_VIRTUAL and ITIMER_PROF), which remain per-LWP.

The timer IDs of per-process timers are usable from any LWP, and the expiration signals are generated for the process rather than directed to a specific LWP.

The per-process timers are deleted only by timer_delete(3RT) or when the process terminates.

Per-Thread Alarms

In the Solaris Operating Environment 2.3 and 2.4 releases, a call to alarm(2) or setitimer(2) was meaningful only within the calling LWP. Such timers were deleted automatically when the creating LWP terminated. Because of this, the only threads that could use alarm() or setitimer() were bound threads.

Even with this restricted use, alarm() and setitimer() timers in Solaris Operating Environment 2.3 and 2.4 multithreaded applications were unreliable about masking the signals from the bound thread that issued these calls. When such masking was not required, then these two system calls worked reliably from bound threads.

With the Solaris Operating Environment 2.5 release, an application linked with -lpthread (POSIX threads) got per-process delivery of SIGALRM when calling alarm(). The SIGALRM generated by alarm() is generated for the process rather than directed to a specific LWP. Also, the alarm is reset when the process terminates.

Applications compiled with a release before the Solaris Operating Environment 2.5 release, or not linked with -lpthread, will continue to see a per-LWP delivery of signals generated by alarm() and setitimer().

Effective with the Solaris 9 Operating Environment, calls to alarm() or to setitimer(ITIMER_REAL) will cause the resulting SIGALRM signal to be sent to the process.

Profiling

In Solaris releases prior to 2.6, calling profil() in a multithreaded program would impact only the calling LWP; the profile state was not inherited at LWP creation time. To profile a multithreaded program with a global profile buffer, each thread needed to issue a call to profil() at thread start-up time, and each thread had to be a bound thread. This was cumbersome and did not easily support dynamically turning profiling on and off.

In Solaris 2.6 and later releases, the profil() system call for multithreaded processes has global impact--that is, a call to profil() impacts all LWPs/threads in a process. This may cause applications that depend on the previous per-LWP semantic to break, but it is expected to improve multithreaded programs that wish to turn profiling on and off dynamically at runtime.

Nonlocal Goto--setjmp(3C) and longjmp(3C)

The scope of setjmp() and longjmp() is limited to one thread, which is fine most of the time. However, this does mean that a thread that handles a signal can longjmp() only when setjmp() is performed in the same thread.

Resource Limits

Resource limits are set on the entire process and are determined by adding the resource use of all threads in the process. When a soft resource limit is exceeded, the offending thread is sent the appropriate signal. The sum of the resources used in the process is available through getrusage(3C).
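Reading the process-wide total might look like the following sketch. The function name process_user_cpu_usec() is hypothetical; RUSAGE_SELF requests the usage of the calling process, which covers all of its threads.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Returns the user CPU time consumed by the whole process,
 * in microseconds, or -1 on error. */
long long process_user_cpu_usec(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return (long long)ru.ru_utime.tv_sec * 1000000LL
         + (long long)ru.ru_utime.tv_usec;
}
```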

LWPs and Scheduling Classes

The Solaris kernel has three classes of scheduling. The highest-priority scheduling class is Realtime (RT). The middle-priority scheduling class is system. The system class cannot be applied to a user process. The lowest-priority scheduling class is timeshare (TS), which is also the default class.

Scheduling class is maintained for each LWP. When a process is created, the initial LWP inherits the scheduling class and priority of the creating LWP in the parent process. As more LWPs are created to run unbound threads, they also inherit this scheduling class and priority.

Threads have the scheduling class and priority of their underlying LWPs. Each LWP in a process can have a unique scheduling class and priority that is visible to the kernel. If a thread is bound, it will always be associated with the same LWP.

Thread priorities regulate contention for synchronization objects. By default LWPs are in the timesharing class. For compute-bound multithreading, thread priorities are not very useful. For multithreaded applications that do a lot of synchronization using the MT libraries, thread priorities become more meaningful.

The scheduling class is set by priocntl(2). How you specify the first two arguments determines whether just the calling LWP or all the LWPs of one or more processes are affected. The third argument of priocntl() is the command, which can be one of the following.

PC_GETCID. Get the class ID and class attributes for a specific class.

PC_GETCLINFO. Get the class name and class attributes for a specific class.

PC_GETPARMS. Get the class identifier and the class-specific scheduling parameters of a process, an LWP within a process, or a group of processes.

PC_SETPARMS. Set the class identifier and the class-specific scheduling parameters of a process, an LWP within a process, or a group of processes.

Note that priocntl() affects the scheduling of the LWP associated with the calling thread. For unbound threads, the calling thread is not guaranteed to be associated with the affected LWP after the call to priocntl() returns.

