SAN FRANCISCO (06/26/2000) - This month's column is about user threads -- threads created with the thr_create(3T) (Solaris threads) or pthread_create(3T) (POSIX threads) interfaces. The two-level model implemented in Sun Microsystems Inc.'s Solaris refers to user threads at one level and kernel threads at the layer below. In this column, references to threads apply to both Solaris and POSIX threads; I will point out areas specific to a particular implementation.
User threads aren't visible to the kernel dispatcher. They are scheduled by the threads library, which implements a user-thread priority scheme distinct from the scheduling classes and global priorities discussed previously (more on this in a bit).
Simply put, a thread is an independent unit of execution in a process. Threads within the same process share much of the process's state. Most important, all user threads within a process share the process's address space. This is advantageous when multiple threads of execution within the same process are context-switched on and off processors. Because a context switch to another thread within the same process doesn't require an address-space mapping change, context switches can happen with less kernel code overhead. They also provide a relatively fast method of sharing data among threads in the same process (aka interthread communication) because data in every mapped memory page is visible to all the threads. Threads also share the process's open files, environment, credentials, and signal handlers.
Each user thread has several unique components:
- Hardware context (register information, program counter, stack pointer)
- Stack
- Priority
- Thread-specific data

Probably the most misunderstood (or confusing) aspect of the Solaris user-threads implementation is the notion of priorities and scheduling. The two-level model in Solaris requires two levels of scheduling for unbound user threads, which are managed by the threads library and aren't visible to the kernel. Within a multithreaded process, user threads have a priority scheme and a set of queues maintained by the threads library. The dispatch queues maintained by the kernel are for LWPs (lightweight processes), the resources a user thread needs in order to ultimately be scheduled on a processor.
To execute, a user thread must be linked to an available LWP. Thread scheduling can thus be viewed as a two-step process: first, the threads library links the thread to an LWP; second, the LWP must be scheduled on a processor, which is done in the kernel by the kernel dispatcher.
An LWP is not created every time a user thread is created. Nor is a user thread permanently bound to an LWP the first time the user thread's scheduling process links a thread to that LWP. Because user threads are scheduled on LWPs, a given LWP may execute more than one user thread during the lifetime of the process (not at the same time, of course).
The programmer has the option of creating an LWP when the user thread is created. There are two ways to accomplish this with Solaris threads. The THR_NEW_LWP flag can be set in the thr_create(3T) call, which will instruct the kernel to create a new LWP and add it to the pool of LWPs available for the process. This flag doesn't create a bound thread. The second option is to set the THR_BOUND flag in thr_create(3T), which instructs the kernel not only to create an LWP, but also to permanently bind the user thread to the LWP. The LWP will execute only the user thread bound to it.
If you're using POSIX threads, the contention scope attribute must be set to PTHREAD_SCOPE_SYSTEM to create a bound POSIX thread. Bound threads have the advantage of not requiring a scheduling phase through the threads library; they require only the kernel dispatcher for scheduling. The tradeoffs are system resources and overhead. The kernel incurs more overhead when an LWP is created along with the user thread. Each LWP requires, among other things, a kernel stack. Kernel-stack pages for LWPs are mapped to a specific address-space segment in the kernel called the segkp segment, a pageable segment of kernel memory.
The size of the LWP kernel stack and the maximum number of LWPs the kernel can support system-wide vary, depending on the version of Solaris you're running and on the hardware. On UltraSPARC (sun4u) systems, the LWP stack size is 16 KB on the 32-bit Solaris 7 and Solaris 8 systems (and older 32-bit-only kernels), and 24 KB in 64-bit kernels. On non-UltraSPARC systems, 12 KB is the per-LWP kernel-stack size.
The kernel segment from which kernel stacks are allocated (segkp) is limited to 512 MB on UltraSPARC systems for releases up to and including Solaris 7. Thus, a 32-bit UltraSPARC system can map a maximum of (512 MB / 16 KB) 32,768 LWPs. A 64-bit Solaris 7 system can map a maximum of (512 MB / 24 KB) 21,845 LWPs. Note that the kernel segkp segment's maximum size depends on the amount of physical memory installed in the system. A system requires at least 256 MB of RAM to get to the maximum of a 512-MB segkp segment.
In Solaris 8, the segkp kernel segment for a 64-bit kernel was increased to 2 GB (for systems with at least 1 GB of physical memory). This provides for 87,381 LWPs systemwide. (This increase was backported to Solaris 7 in patch 106541-04, and it is the default in the May 1999 Solaris 7 update.)

The limit on the number of user threads per process is also determined by address-space boundaries and stacks. Each user thread is given a 1-MB stack by default. (The programmer can specify a different stack size in the thr_create(3T) call if desired.) A 32-bit process has a maximum address-space size of 4 GB, with a segment of that address space available for thread stacks. With the default stack size, we can get to about 2,000 user threads in a process before we run out of address space. (I actually got to about 3,300 in one test.) For a 64-bit process the address-space size is considerably larger (16 TB, if I remember correctly), and the number of user threads that can be created in a 64-bit address space is in the tens of thousands.
That's a wrap for now. Next month we take it down to the next layer: how user threads are prioritized, how the queues are organized, and what's changed in Solaris 8.
About the author
Jim Mauro is an area technology manager for Sun Microsystems in the Northeast.
He focuses on server systems, clusters, and high availability. Mauro has 18 years of industry experience working in educational services (he developed and delivered courses on Unix internals and administration) and software consulting.