DYLD — Do You Like Death? (XI)
The lifecycle of a Dynamic Loader from its creation to its termination.
This is the eleventh and last article in the series about debugging Dyld-1122 and analyzing its source code. We will learn how Dyld loads dependent dylibs, binds them all together, returns the address of main(), calls it, and finally terminates.
Please note that this analysis may contain some errors as I am still learning and working on it alone. No one has checked it for mistakes. Please let me know in the comments or contact me through my social media if you find anything.
Let’s go!
WORKING MAP
As last time, we begin our journey by decompiling Dyld using Hopper:
hopper -e '/usr/lib/dyld'
We are in dyld`start, analysing the Memory Manager. In the fourth article, I introduced pseudo-code, which you can see below:

Based on this, we finished creating the allocator, later used as a memory pool for setting the Global state of the process, which consists of two types of state: fixed (ProcessConfig) and dynamic (RuntimeState).
The state object of the RuntimeState class, created in the last episode, is like an API for querying process-related data (threads or loaded Mach-Os).

In the last episode, we analyzed the ExternallyViewableState, which holds information about the loaded images. When initialized, it only stores info about Dyld and the executable image. Now we are going to run the prepare function, which will load the rest of the dependent images (dylibs):

Dyld GitHub repository:
- Start: prepare in dyld-1122.1, dyldMain.cpp#1252
- End: exit in dyld-1122.1, dyldMain.cpp#1272
LLDB breakpoints:
# Start - dyld`start+1828
settings set target.env-vars DYLD_IN_CACHE=0
br set -n start -s dyld -R 1828
# Just before call main - dyld`start+2356
br set -n start -s dyld -R 2356
# Just before call exit - dyld`start+2432
br set -n start -s dyld -R 2432
This is the last article for now. However, I may write about the things I omitted in this series in the future.
START — prepare
Before executing the prepare function, we set some values in registers:

The x1 register stores the pointer to the beginning of the Dyld image, which is our second argument (MachOAnalyzer). The first argument, APIs, is stored in the x0 register. We can double-check this by inspecting the instructions:

When we step into the prepare function, we can see that it contains twice as much code (4080 instructions 😰) as we saw in dyld::start:

Just to recap, here is the line from the code repository we are running:
The source code repository contains the corresponding code between lines 482–944. Based on the comment, this is the final stretch of our Dyld review:

We can omit the code between lines 484–516 as it is only compiled in the context of EnclaveKit initialization. We should start the analysis at line 517:
We can also double-check our assumption in the debugger: the first call we run into is kdebug_trace_dyld_enabled:

Here, we will also not go into the details of the kdebug system. I will come back to it in another series about XNU debugging. It is usually off, so we jump to line 524, where we execute the simulator check:
In lldb, after returning from kdebug_trace_dyld_enabled with the value 0 in the x0 register, there is a CBZ instruction, and we jump to +132:

Then another jump is performed to +204, to isSimulatorPlatform:

This function checks if we are running our executable in the context of any of the simulator platforms shown below:

We are not running the process on a simulator platform, so we jump over most of the code to line 563. Along the way, we also check whether the program we are executing is built for the simulator (line 538) and whether logging of environment variables is enabled (line 554):
In the simulator context, this code ensures the program is correctly configured to run on a simulator platform with the appropriate DYLD_ROOT_PATH.
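To make the simulator check concrete, here is a minimal C++ sketch, not dyld's code. It assumes the platform numbers used by LC_BUILD_VERSION in <mach-o/loader.h>, redeclared locally so the snippet compiles without an Apple SDK. It also assumes that only the four *Simulator platforms count. The exact list dyld-1122 checks is the one in its source.

```cpp
#include <cassert>
#include <cstdint>

// Platform numbers as used by LC_BUILD_VERSION (see <mach-o/loader.h>),
// redeclared so the sketch is self-contained.
enum class Platform : uint32_t {
    macOS = 1,
    iOS = 2,
    tvOS = 3,
    watchOS = 4,
    bridgeOS = 5,
    macCatalyst = 6,
    iOSSimulator = 7,
    tvOSSimulator = 8,
    watchOSSimulator = 9,
    driverKit = 10,
    visionOS = 11,
    visionOSSimulator = 12,
};

// Simplified stand-in for the simulator check: true only for the
// simulator platforms, false for macOS and every device platform.
bool isSimulatorPlatform(Platform p) {
    switch (p) {
        case Platform::iOSSimulator:
        case Platform::tvOSSimulator:
        case Platform::watchOSSimulator:
        case Platform::visionOSSimulator:
            return true;
        default:
            return false;
    }
}
```

For our macOS binary, isSimulatorPlatform(Platform::macOS) is false, which is why the simulator-only code is skipped.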
state.initializeClosureMode()
This call prepares RuntimeState to handle PrebuiltLoaders from the dyld cache. The function logic is well explained in PrebuiltLoaderSet_Policy.md.
PrebuiltLoaders are optimized representations of dynamic libraries that Dyld uses to speed up application launch times. If the application is run for the first time, there is also the JustInTimeLoader.

The above document about the PrebuiltLoaderSet Policy is very informative and further explains how this worked in the dyld3 and dyld2 versions:

With the current version, dyld4, there are only two options to rely on, JustInTimeLoaders or PrebuiltLoaders:

PrebuiltLoaderSet for dyld4 is like Dyld Closure for dyld3.
The dyld4 policy point summarizes when Dyld Closures are not used and explains how DYLD_USE_CLOSURES works in the current version of dyld4:

There is also one more constraint for PrebuiltLoaderSet:

Regarding DYLD_USE_CLOSURES, there is a comment in the code:

There is also a tool for creating Dyld Closures called dyld_closure_util. Its source code is in the repository, but it is not trivial to compile outside an internal Apple environment, and I gave up on it:

initializeClosureMode is called on the state object because RuntimeState holds the Loader objects that track each loaded Mach-O:

PrebuiltLoader and JustInTimeLoader are subclasses of Loader:

Its code can be found in the repository between lines 90–355.

We can also read about Loaders in another place in the documentation:

More about PrebuiltLoader:

Finally, about the JustInTimeLoader:

The code responsible for all of this is between lines 2670–2842:

It starts by initializing some variables and then validates the header of the PrebuiltLoaderSet from the Dyld in cache in line 2677:
The source code of the validHeader logic is shown below. In our case, it returns true:
hasValidMagic checks whether PrebuiltLoaderSet->magic is equal to kMagic:

We can find kMagic in the source code repository or by reading the decompiled code while debugging in lldb (0x9a66106073703464):
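The magic check boils down to a single comparison. The sketch below is hypothetical. The trimmed-down header struct is an assumption, and so is the 64-bit width of its magic field. Only the constant is the value observed in lldb.

```cpp
#include <cassert>
#include <cstdint>

// Value observed in the lldb disassembly.
constexpr uint64_t kMagic = 0x9a66106073703464ULL;

// Hypothetical, trimmed-down header with only the field the check reads;
// the real PrebuiltLoaderSet has many more fields, and the width of its
// magic field here is an assumption.
struct PrebuiltLoaderSetHeader {
    uint64_t magic;
};

// Sketch of hasValidMagic: reject a missing set, then compare the magic.
bool hasValidMagic(const PrebuiltLoaderSetHeader* set) {
    return set != nullptr && set->magic == kMagic;
}
```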

After checking that the magic is valid, we execute dontUsePrebuiltForApp:

This function determines whether prebuilt loaders should be disabled based on Dyld Environment Variables and executable load commands:

After this check, we fall into another else if, where we search the cache for a PrebuiltLoader for the program using findLaunchLoaderSet:
If cachePBLS was not found and the main executable path starts with /System/, it attempts to find a PrebuiltLoaderSet using the cd-hash:
As we are not running the program from the /System/ directory, we do not execute lines 2707–2716 and move on to line 2717:
hasLaunchLoaderSetWithCDHash is a simple wrapper that calls findLaunchLoaderSetWithCDHash and checks whether it returns a non-null pointer:

findLaunchLoaderSetWithCDHash builds a path from the provided cdHashString. It checks that the string is neither null nor too long (to prevent a buffer overflow), then looks up the prebuilt loader set for that path using findLaunchLoaderSet:

# Example path after executing DyldSharedCache::hasLaunchLoaderSetWithCDHash
/cdhash/3302ae16a5eda1cf7daab75ce63b94274674ec8b
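The path-building step and its two guards could look like the following C++ sketch. makeCDHashPath and its output-buffer parameters are hypothetical names. dyld's real helper and its size limit may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch: build "/cdhash/<hex>" into a caller-provided buffer,
// refusing a null input and refusing anything that would not fit.
bool makeCDHashPath(const char* cdHashString, char* out, size_t outSize) {
    static const char prefix[] = "/cdhash/";
    if (cdHashString == nullptr || out == nullptr)
        return false;
    size_t needed = std::strlen(prefix) + std::strlen(cdHashString) + 1;  // + NUL
    if (needed > outSize)
        return false;  // too long: bail out instead of overflowing
    std::strcpy(out, prefix);
    std::strcat(out, cdHashString);
    return true;
}
```

With a 64-byte buffer, the cd-hash above produces exactly the example path, while a too-small buffer makes the function bail out before writing anything.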
If a PrebuiltLoaderSet was found, isOsProgram is set to true and we execute allowOsProgramsToSaveUpdatedClosures. Otherwise, we are dealing with a 3rd-party app and execute allowNonOsProgramsToSaveUpdatedClosures:
allowOsProgramsToSaveUpdatedClosures blocks local closure files from overriding closures in the dyld cache:

allowNonOsProgramsToSaveUpdatedClosures blocks 3rd-party apps from saving closures depending on several conditions:

- Saving is disallowed for iPad apps running on Apple Silicon macOS when the executable does not have a CDHash (unsigned).
- Saving is allowed on iOS, tvOS, and watchOS platforms.
In our case, a closure will not be saved, as we are running a 3rd-party app on macOS.
Then, there is a code block related to the DYLD_USE_CLOSURES logic:

After that, there is code related to loading closures from disk, but on macOS it is only for system applications. I will not analyze it here.
To summarize, initializeClosureMode ensures that dyld uses prebuilt closures when they are available and valid, which speeds up application startup. Otherwise, it falls back to just-in-time loading, which builds such closures for later program launches. For 3rd-party apps on macOS, this code ensures the closure will not be saved to disk.
Just-in-time
We are returning from initializeClosureMode. The following lines, 564–568, process a set of prebuilt loaders if they were initialized, retrieve the main loader (at index 0), and pre-allocate memory for all images.
There is no mainSet for us. This code will not run for 3rd-party apps on macOS.

The condition that follows will be executed: without a mainSet there is no mainLoader, so mainLoader == nullptr is true:

The reserve function here comes from the Linker Standard Library. Its argument specifies the number of elements, not the number of bytes, so this call prepares space for 512 elements in state.loaded.

The function lsl::bit_ceil(newCapacity) finds the smallest power of two that is greater than or equal to newCapacity.
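Below is a small C++ sketch of this capacity math. It assumes lsl::bit_ceil behaves like C++20 std::bit_ceil and that reserve rounds the requested element count up with it before allocating. bitCeil and reserveBytes are illustrative names, not dyld functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Smallest power of two >= n (0 maps to 1), the same contract as
// C++20 std::bit_ceil. Assumes n <= 2^63 so the shift cannot overflow.
uint64_t bitCeil(uint64_t n) {
    uint64_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

// Bytes needed by a container of pointers once the element count has
// been rounded up to a power of two (a simplified model of reserve()).
size_t reserveBytes(size_t elements) {
    return static_cast<size_t>(bitCeil(elements)) * sizeof(void*);
}
```

On a 64-bit target, reserveBytes(512) is 512 * 8 = 4096 bytes, which matches the allocation seen in the debugger. Asking for 513 elements would round up to 1024 slots.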

state.loaded is a container of pointers to Loader objects, and each pointer is 8 bytes wide. So this allocates 512 * 8 == 4096 bytes using reserveExact:

After this allocation, we have Diagnostics buildDiag (line 573):

It looks like this zeroes out the memory we just allocated at x0+0x270:


After all these preparations, we create the JIT Loader. The function computes the slice offset, checks whether the binary file exists, creates a loader instance based on the provided parameters, and returns a pointer to it:
A slice here is a single-architecture Mach-O from a Fat binary mapped into memory. Its offset is obtained with the Loader::getOnDiskBinarySliceOffset function.
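To illustrate what a slice offset is, here is a C++ sketch that finds the slice for one architecture in a Fat header. It follows the public fat_header/fat_arch layout: big-endian, with FAT_MAGIC 0xcafebabe. It is not Loader::getOnDiskBinarySliceOffset itself. It ignores 64-bit fat headers and only does bounds checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Public <mach-o/fat.h> / <mach/machine.h> values, redeclared here.
constexpr uint32_t kFatMagic = 0xcafebabe;       // FAT_MAGIC (32-bit fat header)
constexpr uint32_t kCpuTypeARM64 = 0x0100000c;   // CPU_TYPE_ARM64
constexpr uint32_t kCpuTypeX86_64 = 0x01000007;  // CPU_TYPE_X86_64

// Fat headers are always stored big-endian.
static uint32_t readBE32(const uint8_t* p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

// Returns the file offset of the slice for cpuType. A thin (non-fat) file
// is treated as a single slice at offset 0. FAT_MAGIC_64 is not handled.
std::optional<uint64_t> sliceOffset(const uint8_t* file, size_t len, uint32_t cpuType) {
    if (len < 8 || readBE32(file) != kFatMagic)
        return 0;
    uint32_t nfat = readBE32(file + 4);      // fat_header.nfat_arch
    for (uint32_t i = 0; i < nfat; ++i) {
        size_t rec = 8 + size_t(i) * 20;     // fat_arch: 5 x uint32_t
        if (rec + 20 > len)
            break;                           // truncated header
        if (readBE32(file + rec) == cpuType)
            return readBE32(file + rec + 8);   // fat_arch.offset
    }
    return std::nullopt;                     // no slice for this CPU type
}
```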
The core functionality here lies inside JustInTimeLoader::make, which is too long to paste here. These are the key points of what the function does:
- Calculates the size needed for the JustInTimeLoader object
- Allocates memory using state.persistentAllocator.malloc
- Creates a new JustInTimeLoader object using placement new
- Adds the created loader to the runtime state
- Returns the pointer to the created JustInTimeLoader object
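The allocate-then-construct pattern from the list above can be sketched in C++ as follows. FakeLoader, FakeRuntimeState and the size formula for the trailing data are hypothetical. std::malloc stands in for state.persistentAllocator.malloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical stand-ins for a loader with variable-length trailing data
// and for the runtime state that keeps track of all loaders.
struct FakeLoader {
    const char* path;
    size_t segmentCount;  // sizes the data that follows the object
};

struct FakeRuntimeState {
    std::vector<FakeLoader*> loaded;
};

// 1. compute the size, 2. allocate raw memory, 3. construct in place,
// 4. register with the state, 5. return the pointer.
FakeLoader* makeLoader(FakeRuntimeState& state, const char* path, size_t segments) {
    size_t size = sizeof(FakeLoader) + segments * sizeof(uint64_t);
    void* storage = std::malloc(size);  // dyld: state.persistentAllocator.malloc
    if (storage == nullptr)
        return nullptr;
    FakeLoader* ldr = new (storage) FakeLoader{path, segments};  // placement new
    state.loaded.push_back(ldr);
    return ldr;
}
```

Placement new separates obtaining memory from constructing the object. This lets the object live in a custom allocator's memory and be sized for its trailing data.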
After initializing the JIT Loader, we set it in the RuntimeState and notify the debugger about it:

setMainLoader primarily updates the mainExecutableLoader field in the RuntimeState object with the provided loader pointer:
Additionally, it performs logging related to the main executable, such as logging loaded libraries and segment mappings, if logging is enabled:
So overall, we initialized the JIT Loader here and set it in RuntimeState. It will later be used for loading dependent libraries and applying fixups.
Image loading
The STACK_ALLOC_OVERFLOW_SAFE_ARRAY macro starts the image (dylib) loading. It allocates a stack array with an initial capacity of 16 to hold pointers to Loader objects. This array will track all images.
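The idea behind an overflow-safe stack array can be modeled like this. The class below is a hypothetical stand-in for the macro. It assumes the array keeps a fixed number of inline slots and spills to the heap once they run out. dyld's actual implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: the first Inline elements live in a fixed buffer
// (on the stack when the object is a local); the rest spill to the heap.
template <typename T, size_t Inline>
class OverflowSafeArray {
public:
    void push_back(const T& v) {
        if (count_ < Inline)
            inline_[count_] = v;
        else
            overflow_.push_back(v);
        ++count_;
    }
    T& operator[](size_t i) { return i < Inline ? inline_[i] : overflow_[i - Inline]; }
    size_t size() const { return count_; }
    bool spilled() const { return !overflow_.empty(); }

private:
    T inline_[Inline]{};
    std::vector<T> overflow_;
    size_t count_ = 0;
};
```

With Inline set to 16, a typical app's top-level loaders never touch the heap, and a larger number still works instead of overflowing the buffer.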

In line 591, we add the mainLoader to the topLevelLoaders array. From line 592 to 630, we first load the inserted libraries:
Then, we set some properties and start recursively loading everything needed by the main executable and the inserted dylibs (640–680):
The core functionality here lies within the loadDependents function.
We can also observe how notifyDebuggerLoad works in lldb by inspecting image list before and after the function is executed:


There is also notifyDtrace. Dylibs can have DOF sections that contain info about static user probes for dtrace. This function finds and registers any such sections:
DOF stands for DTrace Object Format.

Finally, we have code that identifies non-cached dylib loaders and registers them in the state's permanent list using addPermanentRanges:

- Using a stack-allocated array (STACK_ALLOC_ARRAY) is efficient because it avoids heap allocation and deallocation.
- By identifying loaders that are not part of the dyld cache and adding them to permanent ranges, the system ensures they are retained in memory.
Overall, we loaded all images necessary to run the app in this step.
Fixups
Before the fixups, there is code that sets up a weakDefMap for the runtime state. This mechanism manages and resolves weak symbols in dynamically loaded libraries (dylibs) before any actual binding occurs:

Before handling fixups, buildInterposingTables
sets up tables for interposing functions in non-cached dylibs:
Interposing allows a program to override existing functions in shared libraries with custom implementations. This can be blocked by AMFI.
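Conceptually, an interposing table maps a replacee to its replacement. In the sketch below, the tuple layout mirrors the (replacement, replacee) pairs that the DYLD_INTERPOSE macro from dyld-interposing.h emits into the __DATA,__interpose section. applyInterposing is a hypothetical lookup, not dyld's buildInterposingTables.

```cpp
#include <cassert>
#include <cstddef>

// One interposing record: "whenever something binds to replacee,
// give it replacement instead".
struct InterposeTuple {
    const void* replacement;
    const void* replacee;
};

// Hypothetical bind-time lookup: return the replacement if the resolved
// target is interposed, otherwise the original target.
const void* applyInterposing(const InterposeTuple* table, size_t count, const void* target) {
    for (size_t i = 0; i < count; ++i) {
        if (table[i].replacee == target)
            return table[i].replacement;
    }
    return target;
}
```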

After that, applying fixups begins. The code first starts a ScopedTimer to measure how long the fixups take. It then acquires a DyldCacheDataConstLazyScopedWriter for dyld cache data patching:

Then, we handle strong overrides of weak definitions with a function handleStrongWeakDefOverrides
that identifies dylibs with weak definitions, searches for strong overrides in those dylibs, and applies fixups:
A strong symbol is just a symbol defined without the weak attribute, for example with no attributes at all or with default visibility:
int strong_symbol = 42;
int strong_symbol __attribute__((visibility("default"))) = 42;
While a weak symbol can be defined like this:
int weak_symbol __attribute__((weak)) = 42;
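The following C++ sketch shows one way strong and weak definitions could be coalesced across images. The rule it encodes is an assumption for illustration, not a transcription of handleStrongWeakDefOverrides: walk images in load order, let the first strong definition win, and fall back to the first weak one. Definition and coalesce are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One exported definition of a symbol in one image (hypothetical model).
struct Definition {
    std::string image;
    std::string symbol;
    bool weak;
};

// Returns symbol -> image that provides the winning definition.
std::map<std::string, std::string> coalesce(const std::vector<Definition>& defsInLoadOrder) {
    std::map<std::string, std::string> winner;
    std::map<std::string, bool> winnerIsStrong;
    for (const Definition& d : defsInLoadOrder) {
        auto it = winner.find(d.symbol);
        if (it == winner.end()) {
            winner[d.symbol] = d.image;          // first definition seen
            winnerIsStrong[d.symbol] = !d.weak;
        } else if (!d.weak && !winnerIsStrong[d.symbol]) {
            it->second = d.image;                // strong overrides weak
            winnerIsStrong[d.symbol] = true;
        }
    }
    return winner;
}
```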
After handling strong overrides of weak symbols, we iterate over each loader and apply its fixups using applyFixups, where the core logic lives. If any error occurs during fixups, execution is halted and the fixup error is reported.
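Here is a very simplified picture of one kind of fixup. A rebase adds the ASLR slide to a pointer that was written for the preferred load address. The C++ sketch below assumes plain 64-bit pointers in a flat array. Real Mach-O fixups (chained fixups, binds to other images, authenticated pointers) are encoded quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified rebase: every slot holds an address computed for
// preferredBase; shifting it by the slide makes it valid at actualBase.
void applyRebases(uint64_t* slots, size_t count, uint64_t preferredBase, uint64_t actualBase) {
    uint64_t slide = actualBase - preferredBase;
    for (size_t i = 0; i < count; ++i)
        slots[i] += slide;
}
```

For example, with a preferred base of 0x100000000 and an actual base of 0x104a28000, the slide is 0x4a28000. A slot holding 0x100004000 then becomes 0x104a2c000.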
There is also the applyCachePatches function for handling patches in the dyld cache (only if a dylib overrides something there):

There is also something called singleton patching in the Dyld Shared Cache, performed by doSingletonPatching:

From the code, it looks like it only applies to the Obj-C code. Here is the structure:

Finally, we call applyInterposingToDyldCache if interposing is used:

However, it does not count toward the fixup timing, so we can conclude that singleton patching is the last step of the fixup process:

After all these fixups, our executable's dependent libraries are loaded and their symbols are resolved and relocated, so it is ready to go.
Libdyld.dylib
Lines 734–761 do not concern us, as they apply to PrebuiltLoaders and we are using JustInTimeLoader:
The same goes for lines 763–796, which apply to kdebug, and kdebug is off. When it is on, this code notifies kdebug on each image load:
So the first thing we actually do is check whether the libdyld.dylib loader exists. It was set in JustInTimeLoader::applyFixups.

After that, we wire up libdyld.dylib to dyld. The code first gets the load address of libdyld by calling loadAddress on the libdyldLoader (801).
Then it looks for the __dyld4 section within the __DATA segment of libdyld.dylib (803). If the section is not in __DATA, it searches the __AUTH segment (806).
If it cannot be found, loading is halted:
Then, we establish a connection between libdyld.dylib and the runtime state of the program by providing access to the global APIs through libdyld4Section:

We also let external code and components access information about all loaded images in the process. storeProcessInfoPointer provides a pointer to the allImageInfos field of libdyld4Section for this:
Next, we initialize the program variables (vars) in the runtime state (state) based on information retrieved from libdyld.dylib:
There is one thing I do not understand: while debugging, I could not find the C code in the repository that corresponds to the instructions below.
__chkstk_darwin
After setting state.vars, we can see a blraa x16, x17 instruction:

This jumps to the code below, which branches to __chkstk_darwin:

Going further, we branch to __chkstk_darwin_probe:

The code below shows the disassembled __chkstk_darwin_probe. While debugging, it executes the instructions at +0, +4, and +8, and then jumps to +32:

- +0: Compares the value in register x9 (stack size?) with the immediate 0x1 shifted left by 12 bits, i.e. 0x1000 (4096). This check likely verifies whether the stack size is at least 4096 bytes.
- +4: Moves the stack pointer (sp) value into register x10.
- +8: Branches if lower (b.lo) to +32 when the comparison at +0 indicates that the stack size is less than 4096 bytes.
- +32: Subtracts the value in x9 from the value in x10.
- +36: Loads a byte from memory at the new address pointed to by x10.
So it is like probing: the load at +36 checks read access to see whether we can access the stack at [x10], which holds this value:

This value holds state.vars = &libdyld4Section->defaultVars, so it seems like it checks whether the variables are readable?
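There is another plausible reading of the probe. Suppose __chkstk_darwin behaves like the stack-probe helpers compilers insert before large stack allocations. That would also explain why there is no matching C code in dyldMain.cpp. In that case, x9 holds the number of bytes about to be reserved below sp, and the helper touches the stack page by page so that a guard page is hit in order. The C++ model below only simulates which addresses would be touched. The path for sizes below one page follows the three instructions above. The loop for larger sizes is a generic sketch, not Apple's exact code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint64_t kPageSize = 0x1000;  // the "#0x1, lsl #12" operand

// Assumption: size is the stack allocation the caller is about to make.
// Returns the addresses that would be read, in order.
std::vector<uint64_t> probeAddresses(uint64_t sp, uint64_t size) {
    std::vector<uint64_t> touched;
    if (size < kPageSize) {             // b.lo path
        touched.push_back(sp - size);   // sub x10, x10, x9 ; ldrb [x10]
        return touched;
    }
    uint64_t addr = sp;
    uint64_t remaining = size;
    while (remaining >= kPageSize) {    // walk down one page at a time
        addr -= kPageSize;
        remaining -= kPageSize;
        touched.push_back(addr);
    }
    if (remaining != 0)
        touched.push_back(addr - remaining);  // final partial page
    return touched;
}
```

Under this reading, the single-byte load does not check the value stored there. It only makes sure the page below the stack pointer is mapped before the frame is allocated.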
partitionDelayLoads
Moreover, after executing __chkstk_darwin, we run into the function below. Unlike __chkstk_darwin, I could find its definition in the Dyld source code repository. However, I could not find where it is called in dyldMain.cpp:

The partitionDelayLoads code can be seen in DyldRuntimeState.cpp between lines 525–566. Its main purpose is to find the Loaders marked as delay-init that can now be initialized.
- If a loader in delayLoaded is no longer delayed, it is moved to loaded.
- If a loader in loaded is now delayed, it is moved to delayLoaded.
- The undelayedLoaders vector is populated with loaders that were initially marked for delay but are no longer delayed.
This function ensures that the dylibs are initialized in the correct order.
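The three rules above can be sketched in C++ like this. DelayLoader and its delayed flag are hypothetical. The real function works on dyld's Loader objects and the corresponding RuntimeState vectors.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical loader with a delay-init flag that can change at runtime.
struct DelayLoader {
    std::string name;
    bool delayed;
};

// Re-sort loaders between loaded and delayLoaded according to their current
// flag, and report the ones that just stopped being delayed.
void partitionDelayLoads(std::vector<DelayLoader*>& loaded,
                         std::vector<DelayLoader*>& delayLoaded,
                         std::vector<DelayLoader*>& undelayedLoaders) {
    std::vector<DelayLoader*> stillDelayed;
    for (DelayLoader* l : delayLoaded) {
        if (l->delayed) {
            stillDelayed.push_back(l);
        } else {
            loaded.push_back(l);            // rule 1: no longer delayed
            undelayedLoaders.push_back(l);  // rule 3: report it
        }
    }
    std::vector<DelayLoader*> stillLoaded;
    for (DelayLoader* l : loaded) {
        if (l->delayed)
            stillDelayed.push_back(l);      // rule 2: became delayed
        else
            stillLoaded.push_back(l);
    }
    loaded.swap(stillLoaded);
    delayLoaded.swap(stillDelayed);
}
```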
DYLD_JUST_BUILD_CLOSURE
Before moving forward, there is a block of code, shown below, that is not executed under normal circumstances for 3rd-party apps on macOS.
It handles the creation and serialization of prebuilt loader sets.

After that, there is a check for the DYLD_JUST_BUILD_CLOSURE variable, which is used for prewarming. If it is set, execution is halted here:

I must return to this piece of code someday, as its serialization and closure-saving mechanisms are very interesting.
I skipped over some executed code here, but it changed nothing for us:
Prepare main
The last thing we do in the prepare function is to prepare the program's entry point. The logic here decides whether to use LC_MAIN or LC_UNIXTHREAD:
getEntry can be seen below. It iterates over the Load Commands to check whether LC_MAIN or LC_UNIXTHREAD is used and returns the offset:
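Here is a minimal C++ sketch of the load-command walk. It assumes a well-formed, host-endian 64-bit Mach-O already in memory. The structs redeclare the public mach_header_64 and entry_point_command layouts. findEntry is a hypothetical name, and unlike the real getEntry it does not decode LC_UNIXTHREAD's thread state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr uint32_t kLC_UNIXTHREAD = 0x5;
constexpr uint32_t kLC_MAIN = 0x80000028;  // 0x28 | LC_REQ_DYLD

struct MachHeader64 { uint32_t magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved; };
struct LoadCommand { uint32_t cmd, cmdsize; };
struct EntryPointCommand { uint32_t cmd, cmdsize; uint64_t entryoff, stacksize; };

// Walks the load commands that follow the header. Returns true and sets
// entryOffset for LC_MAIN; sets usesThread if LC_UNIXTHREAD is seen first.
bool findEntry(const uint8_t* image, uint64_t& entryOffset, bool& usesThread) {
    MachHeader64 hdr;
    std::memcpy(&hdr, image, sizeof(hdr));
    const uint8_t* p = image + sizeof(MachHeader64);
    usesThread = false;
    for (uint32_t i = 0; i < hdr.ncmds; ++i) {
        LoadCommand lc;
        std::memcpy(&lc, p, sizeof(lc));
        if (lc.cmd == kLC_MAIN) {
            EntryPointCommand ep;
            std::memcpy(&ep, p, sizeof(ep));
            entryOffset = ep.entryoff;  // offset from the start of __TEXT
            return true;
        }
        if (lc.cmd == kLC_UNIXTHREAD)
            usesThread = true;          // entry lives in the thread state
        p += lc.cmdsize;
    }
    return false;
}
```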

Line 928 converts the base address of the main executable to an integer, adds an offset to it, and then converts the result back to a pointer.
Let’s say mainExecutable points to the base address 0x100000000 and entryOffset is 0x2000. The operation would be:
- Cast 0x100000000 to uintptr_t.
- Add 0x2000, resulting in 0x100002000.
- Cast the result back to void*.
So, result would be a pointer to 0x100002000.
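Here is the same arithmetic in C++. Standard C++ does not allow arithmetic on void*, hence the round trip through uintptr_t. entryAddress is an illustrative name, not a dyld function.

```cpp
#include <cassert>
#include <cstdint>

// Base address + entry offset, going through an integer because
// pointer arithmetic on void* is not allowed.
void* entryAddress(const void* mainExecutable, uint64_t entryOffset) {
    return reinterpret_cast<void*>(reinterpret_cast<uintptr_t>(mainExecutable) + entryOffset);
}
```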
Prepare epilog
The last thing we do here is clean up to avoid resource leaks:

After returning from prepare, we also free the EphemeralAllocator:

The code ensures that any allocated resources are properly cleaned up before main() is called.
The program is born!
Finally, we finished the prepare function, which was also the last step in the work() code block executed in the Memory Manager:

The next step is to call appMain(), which returns the exit code once our executable's main() finishes, and to store that code in the result variable.
However, before doing that, remember that we were executing the work() block, so we must call memoryManager->restorePreviousState first:
restorePreviousState restores the previous write-protection state while ensuring data integrity using pointer authentication (PAC):
There is also thread protection using thread protection restrictions (TPRO):

Finally, at dyld`start+2356, we call blraaz x20, which is our main():

When we step in, we can observe our program's decompiled code:

The +84 ret instruction will return to dyld`start with the exit code in x0.
The program is dead!
We are still in the context of Dyld when our executable finishes its execution. The exit code is passed from x0 to the x20 register:

At the end, isSimulatorPlatform checks whether we are running in a simulator platform context:

If that is the case, _exit is called. Otherwise (usually, and in our case), we call exit from libSystemHelpers:

This in turn calls libsystem_c.dylib`exit, which calls __exit from libsystem_kernel.dylib, and that’s it. We are done with Dyld ^^.
Final words
This is the last article about Dyld for now. I plan to revisit some places that were only briefly mentioned in this series, but I do not plan to write about them.
After reviewing all the code mentioned here, I can honestly say that I have gained much knowledge about Dyld and macOS overall.
I hope anyone who follows the series gets that too. However, I would be lying if I said that I understand all the mechanics of Dyld at the atomic level.
All links to the articles with tags can be found in the Snake&Apple repo.