Module sunrise_kernel::cpu_locals

CPU local storage

We want some statics to be cpu-local (e.g. CURRENT_THREAD). We could implement this fully in software, by having an area of memory that is replicated for every cpu core, where statics are indexes in this memory area, and provide getters and setters to access and modify the cpu-local statics.

However, this is not ideal: it is poorly optimized, and pretty tedious to use.
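To make the trade-off concrete, here is a minimal sketch of what such a fully-software scheme could look like. Every name in it (MAX_CORES, Slot, SoftwareCpuLocals) is hypothetical and invented for illustration; none of it exists in the kernel.

```rust
/// Hypothetical number of cores, for illustration only.
const MAX_CORES: usize = 4;

/// Hypothetical "statics": each one is just an index into a per-core area.
#[derive(Clone, Copy)]
enum Slot {
    CurrentThread = 0,
    TickCount = 1,
}

/// Number of slots in each per-core area.
const SLOT_COUNT: usize = 2;

/// One replicated memory area per cpu core.
struct SoftwareCpuLocals {
    areas: [[usize; SLOT_COUNT]; MAX_CORES],
}

impl SoftwareCpuLocals {
    /// Creates the areas with every slot zeroed.
    const fn new() -> Self {
        Self { areas: [[0; SLOT_COUNT]; MAX_CORES] }
    }

    /// Getter: the caller must know which core it is running on.
    fn get(&self, core: usize, slot: Slot) -> usize {
        self.areas[core][slot as usize]
    }

    /// Setter: the same two-level lookup happens on every access.
    fn set(&mut self, core: usize, slot: Slot, value: usize) {
        self.areas[core][slot as usize] = value;
    }
}
```

Every access has to name the core and the slot explicitly, and every new cpu-local needs a new index, which is the tedium the paragraph above refers to.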

Instead we use the very common concept of Thread Local Storage (TLS), and apply it to cpu cores instead of threads, and let the compiler do all the hard work for us.

Usage

In the kernel you declare a cpu-local using the #[thread_local] attribute:

#[thread_local]
static MY_CPU_LOCAL: core::cell::Cell<u8> = core::cell::Cell::new(42);

and access it as if it were a regular static, except that each cpu core has its own view of the static.

The compiler is responsible for generating code that will access the right address, provided we configured TLS correctly.

Early boot

Note that you can't access a cpu-local static before init_cpu_locals is called, because the cpu-local areas aren't initialized yet. Doing so will likely result in a cpu exception being raised, or in UB.

This means you can't ever access cpu-locals in early boot. If your code might be called during early boot, we advise you to use ARE_CPU_LOCALS_INITIALIZED_YET to know if you're allowed to access your cpu-local static, and if not return an error of some kind.
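The guard recommended above could be sketched as follows. This is only an illustration under stated assumptions: CPU_LOCALS_READY is a stand-in for ARE_CPU_LOCALS_INITIALIZED_YET (whose real type is not shown on this page, so the AtomicBool is an assumption), and CpuLocalError, with_cpu_locals and mark_cpu_locals_ready are hypothetical helpers.

```rust
use core::sync::atomic::{AtomicBool, Ordering};

/// Stand-in for ARE_CPU_LOCALS_INITIALIZED_YET. Modelling it as an
/// AtomicBool is an assumption made for this sketch.
static CPU_LOCALS_READY: AtomicBool = AtomicBool::new(false);

/// Hypothetical error returned when called too early in boot.
#[derive(Debug, PartialEq)]
enum CpuLocalError {
    NotInitializedYet,
}

/// Runs `access` only once cpu-locals are usable, otherwise returns an error
/// instead of touching a cpu-local static.
fn with_cpu_locals<T>(access: impl FnOnce() -> T) -> Result<T, CpuLocalError> {
    if CPU_LOCALS_READY.load(Ordering::Acquire) {
        Ok(access())
    } else {
        Err(CpuLocalError::NotInitializedYet)
    }
}

/// Stand-in for what would happen at the end of init_cpu_locals.
fn mark_cpu_locals_ready() {
    CPU_LOCALS_READY.store(true, Ordering::Release);
}
```

Code that may run during early boot would wrap its cpu-local access in the closure and propagate the error to its caller.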

Inner workings

We implement TLS according to the conventions laid out in Ulrich Drepper's paper on TLS, which LLVM and most compilers follow.

Since we're running on i386, we're following variant II.

Each cpu core's gs segment points to a thread-local memory area where cpu-local statics live. Cpu-local statics are simply accessed through an offset from gs. Those regions can be found in CPU_LOCAL_REGIONS.

The linker is in charge of creating an ELF segment of type PT_TLS that holds an initialization image for cpu-local regions. This image is meant to be copied for every cpu core we have.
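As a rough illustration of the variant II layout, the sketch below builds one region from an initialization image in an ordinary byte buffer. It is a simulation, not the kernel's code: TlsImage, build_region and align_up are hypothetical names, and the real region must also be suitably aligned in memory and have the first word of its TCB point to itself, neither of which is modelled here.

```rust
/// Rounds `x` up to the next multiple of `align` (assumed to be a power of two).
fn align_up(x: usize, align: usize) -> usize {
    (x + align - 1) & !(align - 1)
}

/// Hypothetical view of the PT_TLS segment.
struct TlsImage<'a> {
    /// Initialized bytes of the image (the segment's file contents).
    data: &'a [u8],
    /// Size of the image in memory; bytes past `data` are zero-filled.
    mem_size: usize,
    /// Required alignment of the TLS block.
    align: usize,
}

/// Builds a simulated cpu-local region for one core.
/// Returns the buffer and the offset of the thread pointer inside it.
fn build_region(image: &TlsImage, tcb_size: usize) -> (Vec<u8>, usize) {
    assert!(image.data.len() <= image.mem_size);
    // Variant II: the TLS block sits just below the thread pointer,
    // and the TCB starts at the thread pointer.
    let block_size = align_up(image.mem_size, image.align);
    let mut region = vec![0u8; block_size + tcb_size];
    // Copy the initialized part; the rest of the block stays zeroed.
    region[..image.data.len()].copy_from_slice(image.data);
    (region, block_size)
}
```

In this model a static located at offset o in the image lives block_size - o bytes below the thread pointer, which is why its offset from gs is negative. Calling build_region once per core gives each core an independent copy of the image.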

Segmentation

Each core gets its own GDT. In each of these there is a KTls segment which points to this core's cpu-local area, and which is meant to be loaded into gs.

Because userspace might want to use Thread Local Storage too, and also needs gs to point to its thread-local area (see set_thread_area), we swap the segment gs points to every time we enter and leave the kernel in trap_gate_asm, from UTls_Elf to KTls and back.

TLS on x86 is really weird. It uses variant II, where offsets must be subtracted from gs, even though segmentation only supports adding offsets. The only way to make this work is to set the gs segment's limit to 0xffffffff, effectively spanning the whole address space. When the cpu adds a "negative" offset (e.g. 0xfffffffc for -4), it treats it as a huge unsigned positive offset which, when added to gs's base, "wraps around" the address space and effectively ends up 4 bytes behind gs's base.

Illustration: [image: cpu backflip]
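The wrap-around can be checked with plain 32-bit arithmetic. The helpers below are a simplified model written for this page (their names are hypothetical), not code from the kernel: they mimic how a gs-relative address is formed and how the segment limit is checked for a flat, expand-up segment.

```rust
/// Linear address the cpu computes for a `gs:offset` access on i386:
/// base plus offset, truncated to 32 bits.
fn gs_linear_address(gs_base: u32, offset: u32) -> u32 {
    gs_base.wrapping_add(offset)
}

/// Encodes "this many bytes below gs's base" as an unsigned 32-bit offset.
fn encode_negative_offset(bytes_below_base: u32) -> u32 {
    0u32.wrapping_sub(bytes_below_base)
}

/// Simplified limit check: the last byte accessed must not exceed the limit.
/// `access_size` is assumed to be at least 1.
fn within_limit(offset: u32, access_size: u32, limit: u32) -> bool {
    match offset.checked_add(access_size - 1) {
        Some(last_byte) => last_byte <= limit,
        None => false,
    }
}
```

With a base of 0x00100000, an offset of 0xfffffffc resolves to 0x000ffffc, 4 bytes below the base. The same offset only passes the limit check when the limit is 0xffffffff, which is why the KTls segment has to span the whole address space.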

dtv and __tls_get_addr

We're the kernel, and we don't do dynamic loading (no loadable kernel modules). Because of this, we know our TLS model will be static (either Initial Exec or Local Exec). Those models always access thread-locals directly via gs, and always short-circuit the dtv.

So we don't even bother allocating a dtv array at all. Neither do we define a __tls_get_addr function.

Structs

CpuLocalRegion

Represents an allocated cpu local region.

ThreadControlBlock

Elf TLS TCB

Statics

ARE_CPU_LOCALS_INITIALIZED_YET

Use this if your code might run in an early boot stage to know if you're allowed to access a cpu-local variable. Accessing one when this is false is UB.

CPU_LOCAL_REGIONS

Array of cpu-local regions, copied from the initialization image in the kernel's ELF.

Functions

get_cpu_locals_ptr_for_core

Address that should be put in KTls segment's base. The limit should be 0xffffffff.

init_cpu_locals

Initializes cpu locals during early boot stage.

tls_align_up

The round function, as defined in section 3.0: