Remote Thread Injection on Windows

I happened to have a legitimate case where I needed to inject a thread into another process, and in the process of solving this problem I realized there’s very little accurate information on the topic of remote thread injection available on the web. Most of the information out there on sites like Code Project and various “hacker” websites is riddled with antiquated information, or incorrect assumptions. This post is going to correct a lot of that information.

The basics of thread injection are relatively simple. When you create a remote thread using the aptly named CreateRemoteThread API, the two key parameters are the starting address of the thread callback function, and the pointer to be passed into the callback. Both of these pointers must exist within the address space of the target process. So the goal is to allocate a block of memory in the target process, copy over the function you wish to have executed as the thread procedure, and away you go! But the devil is in the details.

The usual approach is to write your thread procedure in the source executable, and then use WriteProcessMemory to copy its machine code into the target process. This is where you run into your first struggle: any pointer the function uses refers to the source process’s address space, not the target’s. For instance, string literals cannot be used because the string literal’s memory does not exist within the target process. Eventually, you come to the conclusion that the easiest thing to do is to copy over a very simple function which does nothing but load a library of yours into the target process and call into it, so the real work can live in normally compiled code. So your injected function ends up looking like this:

static DWORD WINAPI RemoteThreadProc( LPVOID param )
{
	struct s {
		wchar_t module[ 1024 ];
		char function[ 256 ];
	};
	
	HMODULE hLib = ::LoadLibraryW( ((struct s *)param)->module );
	if (hLib) {
		FARPROC fp = ::GetProcAddress( hLib, ((struct s *)param)->function );
		if (fp) {
			fp();
		}
	}
	return 0;
}

You allocate space for the parameter block, and you pass in the library and function to be loaded. At this point, everyone else on the web says you’re done. However, they’re wrong (these days).
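
As a rough illustration of the parameter block, the sketch below fills the same two fields on the source side, before they would be copied across. The names InjectParams and FillParams are made up for this example, and it deliberately stops short of touching another process.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <type_traits>

// Hypothetical name: mirrors the struct used by RemoteThreadProc above.
struct InjectParams {
	wchar_t module[ 1024 ];
	char function[ 256 ];
};

// The block is copied byte-for-byte, so it must not contain anything that
// needs a constructor or owns memory elsewhere.
static_assert( std::is_trivially_copyable<InjectParams>::value, "must be raw-copyable" );

// Hypothetical helper: returns false instead of truncating an over-long name.
static bool FillParams( InjectParams &out, const wchar_t *module, const char *function )
{
	assert( module && function );
	const std::size_t moduleLen = std::wcslen( module );
	const std::size_t functionLen = std::strlen( function );
	if (moduleLen >= sizeof( out.module ) / sizeof( out.module[ 0 ] )) return false;
	if (functionLen >= sizeof( out.function )) return false;

	std::memset( &out, 0, sizeof( out ) );	// zero fill also terminates both strings
	std::wmemcpy( out.module, module, moduleLen );
	std::memcpy( out.function, function, functionLen );
	return true;
}
```

Embedding the strings as fixed-size arrays, rather than as pointers, is what makes the block safe to copy: a pointer to a string would still point into the source process.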

This code snippet relies on two pointers that are not immediately obvious: LoadLibraryW and GetProcAddress. These are functions, and functions are just memory addresses. In the olden days, you could get away with this code because every executable had ntdll.dll and kernel32.dll loaded first, so those functions lived at the same address in every application on the system. However, in more modern times, applications are typically compiled with address space layout randomization (ASLR) enabled. When this feature is enabled for an executable, the OS loader randomizes the memory layout of the process, so the load addresses of shared libraries and of the executable itself can differ from what the source process sees. In practice, Windows often maps a system DLL such as kernel32.dll at the same base address in every process of the same bitness until the next reboot, which is why the naive version frequently appears to work, but that is an implementation detail rather than a guarantee, and it does not hold between 32-bit and 64-bit processes. Those function pointers can’t be assumed to be meaningful in the target application.

Now things have gotten a bit more difficult, because in order to inject a function that can do meaningful work, you have to defeat ASLR. This is not an impossible task, but it requires a lot of diligence, and intimate knowledge of how the executive layer of the OS works. Considering how this can be used for evil, I have no intention of posting the source code. However, since there can be legitimate uses for thread injection, I will cover the basic ideas.

Every process has a process environment block (PEB) associated with it, which contains information the OS needs to run the executable. One of those pieces of information is the list of modules that have been loaded into the process. You can walk that list, looking for the kernel32.dll module, which is where LoadLibraryW and GetProcAddress live. Once you’ve found the target module in the loader list, you know the base address it was loaded at. At that base address are the PE headers for the shared library, and those headers lead to the export directory. This table lists all the functions exported from the library, along with their addresses (stored relative to the base address). So you can walk the export list, looking for the target functions, and find their addresses. Then you simply need to modify the parameter block passed into the injected function to contain the new function pointers. The injected function should look something like this now:

static DWORD WINAPI RemoteThreadProc( LPVOID param )
{
	struct s {
		HMODULE (WINAPI *fpLoadLibraryW)( LPCWSTR );
		FARPROC (WINAPI *fpGetProcAddress)( HMODULE, LPCSTR );
		wchar_t module[ 1024 ];
		char function[ 256 ];
	};
	
	struct s *p = (struct s *)param;
	HMODULE hLib = p->fpLoadLibraryW( p->module );
	if (hLib) {
		FARPROC fp = p->fpGetProcAddress( hLib, p->function );
		if (fp) {
			fp();
		}
	}
	return 0;
}

Now that we’ve solved the first major issue that everyone else glosses over (at least in terms of tasks to perform, if not in actual code), we can start to tackle some of the other issues no one mentions.

Converting a function’s address to a data pointer is not a portable operation in C or C++. Simply calling memcpy( buffer, SomeFunction, someSize ) isn’t guaranteed to work: the second parameter to memcpy is a const void *, and casting a function pointer to a data pointer is something the C standard leaves undefined and the C++ standard makes conditionally-supported at best. So the first thing that you need to know is that you’re relying on something which can change at any point in time, depending on your compiler. What I found was that converting my function pointer to a data pointer actually yielded different memory locations! The data pointer would be aligned to the nearest page boundary, but the function pointer wouldn’t be located at the start of the page. So while you could copy over the entire page of data into the target process, you didn’t know the exact place to jump to! It wasn’t until I declared the function to be static that I could get the addresses to line up properly. However, there’s no guarantee it will work. To be perfectly safe, the best thing to do is to compile your target function, copy the resulting machine code out, and keep it in a constant unsigned char buffer as data. Don’t rely on the conversion from function pointer to data pointer to work!
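
A minimal sketch of the “keep it as data” suggestion, under some assumptions: kThreadProcCode and StageCode are hypothetical names, and the three bytes are a stand-in (they encode xor eax, eax followed by ret on x86-64), not the output of compiling the function above. The point is only that an unsigned char array is ordinary data, so copying it involves no function-pointer conversion at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Placeholder machine code, assumed to have been extracted from a compiled
// object file ahead of time. Here: xor eax, eax ; ret (x86-64, returns 0).
static const unsigned char kThreadProcCode[] = { 0x31, 0xC0, 0xC3 };

// Copies the code bytes into a staging buffer. Because the source is a data
// array, memcpy is well defined and the byte count is known exactly.
static std::vector<unsigned char> StageCode()
{
	std::vector<unsigned char> staged( sizeof( kThreadProcCode ) );
	std::memcpy( staged.data(), kThreadProcCode, sizeof( kThreadProcCode ) );
	assert( staged.size() == sizeof( kThreadProcCode ) );	// sanity check on the copy length
	return staged;
}
```

With the code held as an array, sizeof gives its exact length and offset zero is the entry point, which are the two things the function-pointer approach could not tell you reliably.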

Another problem with the thread procedure is the stuff you can’t see: the compiler may inject extra code on your behalf. This is especially true for debug builds, but is also true for release builds. For instance, one common thing that Microsoft’s compilers will do is inject stack checking code, so that a buffer overrun which stomps the stack is detected before arbitrary code gets executed. However, the helpers the compiler calls for these checks are functions at fixed addresses just like any other, so it’s likely to bite you here too: those addresses mean nothing in the target process. So when compiling the code for the injected function, you should make liberal use of #pragmas and compiler switches to disable automatically generated code.
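
Here is a sketch of what that might look like, assuming Microsoft’s compiler. The pragmas and the __declspec are documented MSVC features, but the macro NO_STACK_COOKIE, the Counter struct, and ThreadBody are hypothetical names for this example, and on other compilers the guarded lines simply compile away. Whether this removes every compiler-inserted call depends on your build settings, so treat it as a starting point.

```cpp
#if defined( _MSC_VER )
	// MSVC only: suppress the /GS security cookie for one function.
	#define NO_STACK_COOKIE __declspec( safebuffers )
	#pragma runtime_checks( "", off )	// disable /RTC run-time error checks
	#pragma check_stack( off )		// disable stack probes
#else
	#define NO_STACK_COOKIE
#endif

// Hypothetical parameter block, for illustration only.
struct Counter {
	unsigned long calls;
};

// Touches nothing but its own parameter: no globals, no string literals,
// no library calls.
static NO_STACK_COOKIE unsigned long ThreadBody( void *param )
{
	Counter *c = static_cast<Counter *>( param );
	c->calls += 1;
	return 0;
}

#if defined( _MSC_VER )
	#pragma check_stack()			// back to the command-line setting
	#pragma runtime_checks( "", restore )
#endif
```

Even with these in place, it is worth reading the disassembly of the compiled function to confirm that no calls to helper routines remain.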

If you’re reading this post, hopefully you’re trying to do something academic and have learned useful information about remote thread injection. But if you’re thinking of using thread injection for production code, please rethink that idea. It’s a dangerous operation under the best of circumstances, but when you take into account the fact that you have to defeat ASLR, your production code is going to look very similar to malware.

And if you’re reading this post because you’re a script kiddie who wants to write a “virus”, you should probably be aware that remote thread injection doesn’t actually gain you anything of value. You can only inject a thread into an application running with the same privileges as the source app. So anything you do in the target can already be done in the source. Too bad for you.


2 Responses to Remote Thread Injection on Windows

  1. scomurr says:

    Thanks Aaron! Good info. :-)

  2. James says:

    Thanks for the article. I especially like the last paragraph as I work in computer security =)
