I think it’s safe to say that almost all programmers on Windows take shared libraries (DLLs) for granted. They’re this background thing that always “just works” (even if you do recall the ‘DLL hell’ days). They’re the reason .NET’s side-by-side assemblies work. They’re the main force in localization these days. All of the OS APIs you use are exposed from them. They’re ubiquitous. And very few people understand how they work under the hood.
What I am about to describe is a process gleaned from years of experience: reading the PE32 file format documentation, working on compilers, and reading a few insightful articles posted over the years. It’s not gospel. However, it applies to loading DLLs whether you’ve linked against their stub library (so they get loaded automatically when the application launches), or whether you’re lazy-loading libraries via calls to LoadLibrary.
Eventually, all paths wind up in the same place, and that place is not the kernel: the loader runs in user mode inside ntdll.dll. There, a private function, LdrpLoadDll (reached through the exported LdrLoadDll), is responsible for doing the grunt work of loading up a library, whether the request comes from the loader during process startup, or from a call to LoadLibrary.
At the highest level, this call performs a handful of actions. It checks to see if the library is already loaded, and if it is, we’re done. If the library isn’t loaded, it maps the library into memory, locates some key pieces of information within the library, does some internal bookkeeping, checks to see if further libraries need to be loaded, and then we’re done.
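To make that flow concrete, here is a toy model in Python. It is only a sketch of the steps just listed, not how the real loader is written: the `IMPORTS` table, the fake base addresses, and the `load_library` helper are all hypothetical stand-ins.

```python
# Toy model of the high-level flow. Every name and address here is made up.
loaded = {}  # module name -> fake base address (stands in for the loader's bookkeeping)

# Hypothetical dependency table standing in for each library's import list.
IMPORTS = {
    "app.dll": ["helper.dll", "kernel32.dll"],
    "helper.dll": ["kernel32.dll"],
    "kernel32.dll": [],
}

def load_library(name, log):
    """Return the (fake) base address of `name`, loading it if necessary."""
    if name in loaded:                          # already loaded: we're done
        return loaded[name]
    base = 0x10000000 + 0x10000 * len(loaded)   # pretend to map the file into memory
    loaded[name] = base                         # internal bookkeeping
    log.append(name)
    for dep in IMPORTS[name]:                   # load any further libraries it needs
        load_library(dep, log)
    return base

log = []
load_library("app.dll", log)
print(log)  # ['app.dll', 'helper.dll', 'kernel32.dll']
```

Note that kernel32.dll appears only once in the log even though two libraries import it; the second request hits the "already loaded" check.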
Sounds simple, right?
Before anything else happens, the loader acquires a global lock to ensure that there’s no monkey business happening behind its back. Whenever the loader finishes its process (either successfully or unsuccessfully), it will release the loader lock.
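The acquire-and-always-release pattern looks roughly like this. This is a sketch only: `loader_lock` and `guarded_load` are hypothetical names, and an ordinary Python lock stands in for the real loader lock.

```python
import threading

loader_lock = threading.Lock()   # hypothetical stand-in for the real loader lock

def guarded_load(work):
    """Hold the lock for the whole load; release it on success or failure."""
    with loader_lock:            # released even if work() raises
        return work()

def failing_load():
    raise OSError("library not found")

result = guarded_load(lambda: "mapped")
try:
    guarded_load(failing_load)
except OSError:
    pass                         # the load failed, but the lock was still released

print(result, loader_lock.locked())  # mapped False
```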
When the loader checks to see if the library is already loaded, there are two locations (generally speaking) that it checks. One is the “known DLLs” list, which is a list of system DLLs that are considered important enough, and common enough, to warrant always being available to the OS. This list is kept as a set of section objects in the object manager namespace (which you can view with tools like WinObj). The loader first looks to see whether the library is in this list, and if it is, the section can be used directly instead of performing any further work. This list is a performance optimization as well as a security mechanism. If the DLL is in the known DLLs list, then you cannot pull off a trojan attack by inserting your own DLL in the search path. Of course, there are other security mechanisms which prevent this as well these days, but defense in depth is never a bad way to go!
If the library is not in the known DLLs list, then the loader checks the process’s PEB next. Every process has a “Process Environment Block” (or PEB) which keeps process-wide information around for the OS to make use of. The PEB has three linked lists of module information. All three lists contain the same information, just in different orders. One list is the in-memory order, one is the initialization order, and one is the load order. One of these lists is traversed to see if the library in question has already been loaded for the process.
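As a rough picture of "same entries, three orders", consider the sketch below. The real lists are doubly linked lists hanging off the PEB's loader data (InLoadOrderModuleList, InMemoryOrderModuleList, and InInitializationOrderModuleList); here plain Python lists stand in for them, and `ModuleEntry`, the module names, and the addresses are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModuleEntry:           # hypothetical, heavily simplified module record
    name: str
    base: int                # address the module was mapped at (made up)

a = ModuleEntry("a.dll", 0x20000000)   # loaded first, imports b.dll
b = ModuleEntry("b.dll", 0x10000000)   # loaded second, but initialized first

load_order = [a, b]                                       # order the loads happened
memory_order = sorted(load_order, key=lambda m: m.base)   # order by mapped address
init_order = [b, a]                                       # dependencies initialize first

def find_loaded(module_list, name):
    """Walk one list looking for a module by name (case-insensitive)."""
    for entry in module_list:
        if entry.name.lower() == name.lower():
            return entry
    return None

print([m.name for m in memory_order])   # ['b.dll', 'a.dll']
print(find_loaded(load_order, "B.DLL"))
```

Whichever list is walked, the lookup finds the same entry; only the traversal order differs.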
If the library is not a known DLL, and is not in the PEB’s list of loaded libraries, then the real work starts. First, the library’s file is located on disk by using the DLL search path heuristics. If the file can be located, it is turned into a kernel section object, and that section is mapped into memory. However, the location in memory where the DLL is mapped is determined by first processing the PE32 header for the library. If the library supports address space layout randomization (ASLR), then the loader picks a “random” memory location for the library and maps it there. If the library does not support ASLR, but has a preferred base address, the loader will attempt to map the file there. If the loader cannot map the file there, it will either pick another location in memory for the file or fail to load it (depending on the settings in the PE32 header).
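The header fields involved can be read with a few lines of Python. This is a sketch under stated assumptions: the offsets follow the published PE format, the "settings" are taken to be the DYNAMIC_BASE bit in DllCharacteristics and the RELOCS_STRIPPED bit in the file characteristics (my reading, not something the loader documents as its exact policy), and `make_fake_pe32` is a hypothetical helper that fabricates just enough header bytes to exercise the parser.

```python
import struct

DYNAMIC_BASE = 0x0040      # IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (ASLR opt-in)
RELOCS_STRIPPED = 0x0001   # IMAGE_FILE_RELOCS_STRIPPED (cannot be moved)

def read_placement_info(image: bytes):
    """Return (preferred_base, supports_aslr, can_relocate) from the PE headers."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ image")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)     # e_lfanew
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4                                      # COFF file header
    (characteristics,) = struct.unpack_from("<H", image, coff + 18)
    opt = coff + 20                                           # optional header
    (magic,) = struct.unpack_from("<H", image, opt)
    if magic == 0x10B:                                        # PE32: 4-byte ImageBase
        (base,) = struct.unpack_from("<I", image, opt + 28)
    elif magic == 0x20B:                                      # PE32+: 8-byte ImageBase
        (base,) = struct.unpack_from("<Q", image, opt + 24)
    else:
        raise ValueError("unknown optional header magic")
    (dll_chars,) = struct.unpack_from("<H", image, opt + 70)
    supports_aslr = bool(dll_chars & DYNAMIC_BASE)
    can_relocate = not (characteristics & RELOCS_STRIPPED)
    return base, supports_aslr, can_relocate

def make_fake_pe32(base, dll_chars, characteristics=0x2000):
    """Fabricate a minimal PE32 header (test data only, not a loadable DLL)."""
    image = bytearray(0x80 + 4 + 20 + 96)
    image[:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x80)                 # e_lfanew -> 0x80
    image[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<H", image, 0x84 + 18, characteristics)
    struct.pack_into("<H", image, 0x98, 0x10B)                # PE32 magic
    struct.pack_into("<I", image, 0x98 + 28, base)            # ImageBase
    struct.pack_into("<H", image, 0x98 + 70, dll_chars)       # DllCharacteristics
    return bytes(image)

aslr_dll = make_fake_pe32(0x10000000, DYNAMIC_BASE)
fixed_dll = make_fake_pe32(0x10000000, 0, characteristics=0x2000 | RELOCS_STRIPPED)
print(read_placement_info(aslr_dll))    # (268435456, True, True)
print(read_placement_info(fixed_dll))   # (268435456, False, False)
```

The second image is the awkward case from the paragraph above: no ASLR and no relocation information, so if its preferred address is taken there is nowhere else to put it.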
After successfully mapping the file into memory, the loader fires off debugging events to alert debuggers that a library has arrived. You’ll see this in the Output window of Visual Studio: it will display the path to the loaded library, and the address it was mapped into memory. DllMain is not called with DLL_PROCESS_ATTACH at this point, though. The entry point can only run once the library’s imports have been resolved, so that call comes at the end of the process described below. Note that the loader lock is still held when DllMain is called!
After that, the loader needs to create a module description entry to be placed into the PEB’s set of loaded module lists. So space is allocated for one of these structures, the information about the module’s path and memory location is written into it, and all three lists are updated as appropriate.
Next, the loader needs to walk over the list of imports for the library. You see, the library you just loaded could rely on other libraries itself! This starts the whole process over again, recursively, until all of the libraries have been loaded. But there are two different cases the loader has to worry about when resolving an import. One is the usual case where there’s a function name and a memory location at which to find it. In that case, the loader needs to “fix up” the address in the import address table for that function, since the library has been mapped into memory at some location that the compiler could not have known about when the library was created. The other case is a “forwarded” function: the export that the import resolves to is not code at all, but a string naming another library and another function. This is the neat way in which you can have a single exported function that simply “forwards” the functionality off to another library. For instance, HeapAlloc is a function in Kernel32.dll that is forwarded off to RtlAllocateHeap in NTDLL.dll. If the loader encounters a forwarded function, it needs to parse out the library name and ensure that it gets loaded too.
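Both cases can be sketched in a few lines. Everything below is illustrative: the `EXPORTS` tables and addresses are made up, `parse_forwarder` and `resolve` are hypothetical helpers, and splitting the forwarder string at its last dot is an assumption about the "MODULE.Function" format rather than a statement of what the real loader does.

```python
# Hypothetical export tables: symbol -> address (int) or forwarder string (str).
EXPORTS = {
    "kernel32.dll": {"HeapAlloc": "NTDLL.RtlAllocateHeap", "Sleep": 0x76001000},
    "ntdll.dll": {"RtlAllocateHeap": 0x77002000},
}

def parse_forwarder(text):
    """Split 'NTDLL.RtlAllocateHeap' into ('ntdll.dll', 'RtlAllocateHeap')."""
    module, _, symbol = text.rpartition(".")
    return module.lower() + ".dll", symbol

def resolve(dll, symbol):
    """Look up an export, chasing forwarders into other libraries."""
    target = EXPORTS[dll.lower()][symbol]       # KeyError models a missing import
    while isinstance(target, str):              # forwarded: follow it
        dll, symbol = parse_forwarder(target)
        target = EXPORTS[dll][symbol]
    return target

# A toy import address table: (library, function) -> address to be fixed up.
iat = {("kernel32.dll", "HeapAlloc"): None, ("kernel32.dll", "Sleep"): None}
for dll, symbol in list(iat):
    iat[(dll, symbol)] = resolve(dll, symbol)

print(hex(iat[("kernel32.dll", "HeapAlloc")]))  # 0x77002000
print(hex(iat[("kernel32.dll", "Sleep")]))      # 0x76001000
```

The HeapAlloc slot ends up pointing into the toy ntdll.dll even though the import named kernel32.dll, which is exactly why a forwarder can drag in another library.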
So let’s review this tangled maze. A library can import functions from another library. These show up in the import list in the usual fashion. A library can also forward functions to another library, and these need to be fixed up when loading the library as well. This explains why loading one library can suddenly pull in several others — they’re necessary!
After all of the imports have been resolved, the module’s “load count” is updated. Libraries are reference-counted entities on Windows, and so the count is incremented when the library has been “loaded”, and decremented when the library is “unloaded.” I put those in quotes because a loaded library will not re-load; its load count is simply incremented. Conversely, freeing a library doesn’t actually unload it until the load count drops to zero.
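The counting behavior is easy to model. The `ToyLoader` class below is a hypothetical sketch of the bookkeeping only; its `load` and `free` methods merely mimic the roles of LoadLibrary and FreeLibrary.

```python
class ToyLoader:
    """Hypothetical model of per-module load counts."""

    def __init__(self):
        self.load_counts = {}            # module name -> current load count

    def load(self, name):
        """Bump the count; return True only if the module had to be mapped."""
        first_load = name not in self.load_counts
        self.load_counts[name] = self.load_counts.get(name, 0) + 1
        return first_load

    def free(self, name):
        """Drop the count; return True only if the module was really unloaded."""
        self.load_counts[name] -= 1
        if self.load_counts[name] == 0:
            del self.load_counts[name]   # count hit zero: actually unload
            return True
        return False

loader = ToyLoader()
print(loader.load("helper.dll"))   # True  (mapped for the first time)
print(loader.load("helper.dll"))   # False (count goes from 1 to 2)
print(loader.free("helper.dll"))   # False (count goes from 2 to 1)
print(loader.free("helper.dll"))   # True  (count reached zero, unloaded)
```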
As you can see, loading a library is not a trivial task. There are also a lot of minutiae I am glossing over, such as handling thread local storage allocations, setting up SEH exception information, etc. But you should get the general idea with a bit more detail than you had previously.
There are some key failure points in this process which you should be aware of.
1) Each library in the load chain must be located on disk, in the expected search path. If one of the libraries cannot be located, then all of the libraries are unloaded and the call to LoadLibrary fails, or the executable fails to launch.
2) If the loader cannot find a memory location to map any of the libraries into, then the entire load fails. This can happen because a library has flagged itself as only being loadable at a specific memory location, or because the loader cannot find a large enough contiguous block of memory within the application’s address space to map the library.
3) If any of the libraries in the load chain require TLS slots to be allocated as part of their loading, and there are no more TLS slots available (the guaranteed minimum, TLS_MINIMUM_AVAILABLE, is 64, IIRC), then the entire load fails.
4) If any of the imported functions cannot be located in the library chain, then the entire load fails.
5) If you do something in your DllMain that triggers loading another library via a call to LoadLibrary while the loader lock is held, you risk a deadlock.
6) If you return FALSE from your DllMain when it is called with DLL_PROCESS_ATTACH, then the entire load fails.
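The all-or-nothing behavior in the first failure point can be sketched as follows. This models only the rollback idea; `load_chain`, the `imports` table, and the `on_disk` set are hypothetical, and a missing file stands in for any of the failures listed above.

```python
def load_chain(root, imports, on_disk, loaded):
    """Load `root` and its dependencies into `loaded`; undo everything on failure."""
    newly_loaded = []                       # what this attempt added, in order

    def visit(name):
        if name in loaded:                  # already present: nothing to do
            return
        if name not in on_disk:             # cannot be located in the search path
            raise FileNotFoundError(name)
        loaded.add(name)
        newly_loaded.append(name)
        for dep in imports.get(name, []):
            visit(dep)

    try:
        visit(root)
    except FileNotFoundError:
        for name in reversed(newly_loaded): # unload everything this attempt loaded
            loaded.discard(name)
        return False
    return True

imports = {"app.dll": ["helper.dll"], "helper.dll": ["missing.dll"]}
loaded = {"kernel32.dll"}

ok = load_chain("app.dll", imports, {"app.dll", "helper.dll"}, loaded)
print(ok, sorted(loaded))                   # False ['kernel32.dll']
```

Libraries that were loaded before the failed attempt (kernel32.dll here) are left alone; only the ones this attempt brought in are rolled back.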
You’ll notice that one obvious problem isn’t mentioned as a failure point. What happens when there is a cycle in the dependency chain? The loader is smart enough to handle this: because the load list is updated before the import list is processed, when the secondary library is loading, the initial library already looks like it has been loaded. So everything goes better than expected.
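A two-library cycle shows why the ordering matters. In this toy sketch (hypothetical `IMPORTS` table and `load` helper), a.dll imports b.dll and b.dll imports a.dll; because each library is put on the load list before its imports are walked, the recursion stops instead of looping forever.

```python
# Hypothetical cyclic dependency: a.dll <-> b.dll
IMPORTS = {"a.dll": ["b.dll"], "b.dll": ["a.dll"]}

def load(name, load_list, trace):
    if name in load_list:                 # looks loaded, so stop here
        trace.append(f"{name}: already in load list")
        return
    load_list.append(name)                # registered BEFORE imports are walked
    trace.append(f"{name}: mapped")
    for dep in IMPORTS[name]:
        load(dep, load_list, trace)

load_list, trace = [], []
load("a.dll", load_list, trace)
for line in trace:
    print(line)
# a.dll: mapped
# b.dll: mapped
# a.dll: already in load list
```

If the `load_list.append` line were moved below the loop, the same input would recurse until Python gave up, which is the failure the real ordering avoids.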
The long and short of it is that DLLs are not magic. But the way they are loaded for a process is complex enough that they could certainly seem like magic some days! But, while DLLs can be an optimization of sorts for an application (by allowing you to share code), they come with a cost in terms of loading them.