Archive for January, 2008

Dangerous Detours, Wrap-Up

This concludes the little series about the limitations of Detours:

Granted, the probability of experiencing any of the problems described in these posts is rather low. Whether these problems should be considered bugs of Detours or rather an inherent problem of the concept is not easy to judge. On the one hand, Detours indeed acts a little naively, and the unhooking problem in particular could have been easily avoided. On the other hand, addressing the problems depicted in the previous posts would require a significantly more precise analysis of the binary code, which is expensive and comes with its own risks.

Concluding from these facts, my evaluation is that Detours is a decent technology for hooking explicitly chosen functions — functions you may know the disassembly of and whose “detourability” can be tested. In contrast to this, I consider Detours an inappropriate technology for hooking functions determined at runtime, i.e. functions you may not know and whose “detourability” cannot be tested. Using Detours for implementing tracing functionality, as suggested by one of the samples, should thus be considered not a particularly good idea.

Dangerous Detours, Part 4: Undetouring

Having discussed what can go wrong when detouring a function, we will now take a closer look at undetouring. Again, there is a problem, in my opinion an even more severe one than those discussed previously, that has not been addressed by the Detours library.

Undetouring is a multi-step process and requires the user to follow a certain protocol. The basic idea is as follows: The caller creates a transaction, registers all threads that might have been affected by the detour and specifies which functions to unhook. As soon as a thread is registered, it is suspended until transaction commit. When the user decides to commit the transaction by calling DetourTransactionCommit, Detours unhooks the functions and checks if any of the registered (now suspended) threads was about to execute code of one of the trampolines that are to be freed. If such a thread is found, its context is updated so that the instruction pointer points to the original (now restored) function code again. The trampolines are then freed and all registered threads are resumed.
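
The commit-time check described above can be pictured with a small, portable model. This is only a sketch and not Detours' actual code: ThreadModel, TrampolineModel and FixUpInstructionPointer are hypothetical names, and the sketch assumes that a byte offset inside the copied instructions maps to the same offset in the restored function, which a real implementation cannot take for granted. Note that only the instruction pointer is examined; nothing on the thread's stack is looked at.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one registered (suspended) thread.
struct ThreadModel {
    std::uintptr_t ip;  // current instruction pointer
};

// Hypothetical model of a trampoline that is about to be freed.
struct TrampolineModel {
    std::uintptr_t start;        // address of the trampoline
    std::uintptr_t copiedBytes;  // n: bytes copied from the original function
    std::uintptr_t original;     // address of the original function
};

// If the thread is about to execute trampoline code, move it to the
// corresponding place in the restored original function.
// Returns true if the instruction pointer was changed.
inline bool FixUpInstructionPointer(ThreadModel& thread,
                                    const TrampolineModel& trampoline) {
    // The trampoline consists of the n copied bytes plus a 5-byte jump back.
    const std::uintptr_t size = trampoline.copiedBytes + 5;
    if (thread.ip < trampoline.start || thread.ip >= trampoline.start + size) {
        return false;
    }

    std::uintptr_t offset = thread.ip - trampoline.start;
    if (offset > trampoline.copiedBytes) {
        // Somewhere in the jump back: continue where that jump would have gone.
        offset = trampoline.copiedBytes;
    }
    thread.ip = trampoline.original + offset;
    return true;
}
```

With a trampoline at 0x3F0060 holding 5 copied bytes of a function at 0x401031 (the numbers from Part 2), a thread suspended at 0x3F0060 would be moved to 0x401031, while a thread suspended anywhere else is left alone.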

Chances are good that the process now runs flawlessly again. There is, however, a situation Detours neglects to take into account. Let us consider the following code, which illustrates the problem.

static PDETOUR_TRAMPOLINE Trampoline;
static HANDLE WaitHandle;

__declspec(naked)
static void GenericHook2()
{
  __asm pushad;
  wprintf( L"GenericHook\n" );
  __asm popad;
  __asm jmp [Trampoline];
}

__declspec(noinline)
static void WaitForSomething()
{
  WaitForSingleObject( WaitHandle, INFINITE );

  wprintf( L"Wait satisfied\n" );
}

__declspec(noinline)
__declspec(naked)
static void SomeFunction()
{
  _asm pushad;  // any instruction < 5 bytes
  WaitForSomething();
  _asm popad;
  _asm ret;
}

DWORD CALLBACK ThreadProc( PVOID )
{
  SomeFunction();
  wprintf( L"SomeFunction returned\n" );
  return 0;
}

int UnloadDetour()
{
  //
  // Detour
  //
  if ( ERROR_SUCCESS != DetourTransactionBegin() )
  {
    wprintf( L"DetourTransactionBegin failed" );
    return 1;
  }
  
  if ( ERROR_SUCCESS != DetourUpdateThread( GetCurrentThread() ) )
  {
    wprintf( L"DetourUpdateThread failed" );
    return 1;
  }
  
  PVOID Func = ( PVOID ) SomeFunction;
  PVOID DetourPtr;
  PVOID TargetPtr;
  if ( ERROR_SUCCESS != DetourAttachEx( 
      &Func, 
      GenericHook2,
      &Trampoline,
      &TargetPtr, 
      &DetourPtr ) )
  {
    wprintf( L"DetourAttachEx failed" );
    return 1;
  }

  if ( ERROR_SUCCESS != DetourTransactionCommit() )
  {
    wprintf( L"DetourTransactionCommit failed" );
    return 1;
  }

  //
  // Kick off thread
  //
  WaitHandle = CreateEvent( NULL, FALSE, FALSE, NULL );

  HANDLE Thread = CreateThread( NULL, 0, ThreadProc, NULL, 0, NULL ); 
  if ( ! Thread )
  {
    wprintf( L"CreateThread failed" );
    return 1;
  }

  wprintf( L"Trampoline at %p\n", Trampoline );
  Sleep( 1000 );

  //
  // Remove detour
  //
  if ( ERROR_SUCCESS != DetourTransactionBegin() )
  {
    wprintf( L"DetourTransactionBegin failed" );
    return 1;
  }
  
  if ( ERROR_SUCCESS != DetourUpdateThread( Thread ) )
  {
    wprintf( L"DetourUpdateThread failed" );
    return 1;
  }
  
  if ( ERROR_SUCCESS != DetourUpdateThread( GetCurrentThread() ) )
  {
    wprintf( L"DetourUpdateThread failed" );
    return 1;
  }
  
  if ( ERROR_SUCCESS != DetourDetach( 
      &Func, 
      GenericHook2 ) )
  {
    wprintf( L"DetourDetach failed" );
    return 1;
  }

  if ( ERROR_SUCCESS != DetourTransactionCommit() )
  {
    wprintf( L"DetourTransactionCommit failed" );
    return 1;
  }
  
  //
  // unwait thread
  // 
  SetEvent( WaitHandle );
  WaitForSingleObject( Thread, INFINITE );
  CloseHandle( WaitHandle );
  CloseHandle( Thread );

  return 0;
}

To summarize, the code does the following:

  • Detour SomeFunction.
  • Spawn a new thread that calls SomeFunction. SomeFunction in turn calls WaitForSomething and thus waits on an event that is not signalled.
  • The main thread decides to undetour SomeFunction…
  • …and signals the event the other thread is waiting for.
  • The second thread continues execution.

Running the code immediately results in an access violation.

So let us see what has happened. Similar to the previous examples, the root cause of the crash lies in the binary layout of SomeFunction. The key point is that the call instruction is not the first instruction but is also no more than 4 bytes off the beginning of the function. With sufficient bad luck, I am convinced that the optimizer can indeed come up with code that satisfies exactly these two requirements. In order to simulate this behaviour and save some time, I cheated a bit: I made SomeFunction naked to suppress the prolog and inserted a bogus instruction, _asm pushad (any instruction shorter than 5 bytes will do the trick). The code now looks like this:

SomeFunction:
004117D0 90               nop              
004117D1 E8 CA 00 00 00   call WaitForSomething (4118A0h) 
004117D6 61               popad            
004117D7 C3               ret              

Now the function is hooked, and if you have been following so far, it should be clear that Detours has no other choice than to move the call into the trampoline. Indeed, after detouring, the code looks like this:

SomeFunction:
004117D0 E9 7B 03 00 00   jmp GenericHook2 (411B50h) 
004117D5 CC               int 3   ; Padding inserted by Detours
004117D6 61               popad            
004117D7 C3               ret              

And the trampoline like this:

SomeFunction:
003F0061 E8 3A 18 02 00   call WaitForSomething (4118A0h) 
003F0066 E9 6B 17 02 00   jmp  SomeFunction+6 (4117D6h) 
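
Note that copying the call into the trampoline is not a plain byte copy: the operand of E8 is relative to the address of the following instruction, so it has to be recomputed for the new location. The sketch below reproduces that arithmetic with the addresses from the listings above. Rel32Target and RelocateRel32 are hypothetical helpers, not Detours functions, and 32-bit addresses are assumed.

```cpp
#include <cassert>
#include <cstdint>

// Length of a near call/jmp: opcode E8/E9 followed by a 32-bit displacement.
constexpr std::uint32_t kRel32InstructionLength = 5;

// Absolute target of a rel32 call/jmp located at `address`.
// Unsigned arithmetic wraps modulo 2^32, which matches the CPU's behaviour
// for negative displacements.
constexpr std::uint32_t Rel32Target(std::uint32_t address, std::uint32_t rel32) {
    return address + kRel32InstructionLength + rel32;
}

// Displacement to store when the same instruction is moved to `newAddress`
// and must still reach the same absolute target.
constexpr std::uint32_t RelocateRel32(std::uint32_t oldAddress,
                                      std::uint32_t rel32,
                                      std::uint32_t newAddress) {
    return Rel32Target(oldAddress, rel32) -
           (newAddress + kRel32InstructionLength);
}
```

For the call at 004117D1 with displacement CAh, moving it to 003F0061 yields the displacement 0002183Ah, which is exactly the operand (3A 18 02 00) seen in the trampoline listing.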

So far, so good — but now we undetour this function while — and this is the second important point — SomeFunction is on the call stack of the second thread (003f0066() is the trampoline):

ntdll.dll!_ZwWaitForSingleObject@12()  + 0x15 bytes	
kernel32.dll!_WaitForSingleObjectEx@12()  + 0x8f bytes	
kernel32.dll!_WaitForSingleObject@8()  + 0x12 bytes	
Detours.exe!WaitForSomething()  
003f0066()	
Detours.exe!ThreadProc(void * __formal=0x00000000) 
kernel32.dll!@BaseThreadInitThunk@12()  + 0xe bytes	
ntdll.dll!__RtlUserThreadStart@8()  + 0x23 bytes	

As described above, Detours suspends the second thread, unhooks the function, frees the trampoline and finally resumes our second thread again. After the event has been signalled, the second thread’s stack is finally unwound. However, when the trampoline’s stack frame is reached, the trampoline has long been freed and, as it happens, the memory has been zeroed out. Still, execution is happily resumed (no DEP used) and the zeros are executed, which decode as add byte ptr [eax], al. Unsurprisingly, this quickly ends in an access violation.

How should Detours have behaved instead? Obviously, merely checking that none of the threads is about to execute code of the trampoline is not enough. What should have been done to avoid the crash is to walk each registered thread’s call stack and check that none of the calls has been made from within a trampoline that is to be freed. Of course, this approach is pretty much infeasible, especially in the absence of proper symbols. So the best approach would probably have been to avoid detouring SomeFunction altogether: Detours should have noticed that moving the call into the trampoline is dangerous and should thus have failed the attempt to detour SomeFunction.
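
To make the missing check concrete, here is a minimal sketch of it on a model of the call stack. It assumes the hard part, reliably walking the stack, has already been done and that the return addresses are available as a plain list; TrampolineRange and ReturnsIntoTrampoline are hypothetical names and nothing here is part of Detours.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Half-open address range [begin, end) occupied by a trampoline.
struct TrampolineRange {
    std::uintptr_t begin;
    std::uintptr_t end;
};

// Returns true if any return address on the (already walked) call stack
// points into the trampoline, i.e. a call made from within the trampoline
// is still in progress and the trampoline must not be freed yet.
inline bool ReturnsIntoTrampoline(
        const TrampolineRange& trampoline,
        const std::vector<std::uintptr_t>& returnAddresses) {
    for (std::uintptr_t address : returnAddresses) {
        if (address >= trampoline.begin && address < trampoline.end) {
            return true;
        }
    }
    return false;
}
```

In the example above, the trampoline listing spans 003F0061 up to (but excluding) 003F006B, and the blocked thread has the return address 003F0066 on its stack, so the check would report that freeing the trampoline is unsafe.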

Dangerous Detours, Part 3: Messing execution flow

Last time I showed that as a result of its implementation strategy, Detours is unable to properly instrument certain kinds of functions. Unlike last time, where this merely led to unexpected debugging output, we will see how this can easily lead to a crash.

Consider this code (error checking omitted):

#include <stdio.h>
#include <tchar.h>
#include <windows.h>
#include <detours.h>

static PDETOUR_TRAMPOLINE Trampoline;

__declspec(noinline)
static void DoSomething()
{
  wprintf( L"DoSomething\n" );
  Sleep( 100 );
}

//
// This is the function we want to instrument.
//
__declspec(noinline)
static void OriginalFunction( PCWSTR Arg1, LONG Arg2 )
{
  _asm xor eax, eax;
  for (;;)
  {
    DoSomething();
  }  
}

//
// The generic hook we will install using Detours.
//
__declspec(naked)
static void GenericHook()
{
  //
  // Do something -- not important what...
  //
  __asm pushad;
  wprintf( L"GenericHook\n" );
  __asm popad;
  
  //
  // Done with instrumentation code, continue with
  // the real function. The trampoline (created 
  // by Detours) contains the replaced instructions
  // and a jump to the body of the real function.
  //
  __asm jmp [Trampoline];
}

int wmain()
{
  // ... Install hook -- same as last time ...
  
  //
  // Call the (now instrumented) function.
  //
  OriginalFunction( L"Hello", 42 );

  return 0;
}

The only difference to the code last time is the

_asm xor eax, eax;

inserted at the beginning of OriginalFunction. Note that any other instruction occupying less than 5 bytes should work just as well.

Try running this code and you will notice that the process immediately crashes with an Illegal Instruction exception. If you replace

_asm xor eax, eax;

by

_asm nop;

you get a Privileged instruction exception, which is no better. In any case, there is something going completely wrong here, so let us have a look at the assembly code again.

Unlike last time, where OriginalFunction consisted of a single basic block, we now have two blocks:

[Figure: flow chart of OriginalFunction showing its two basic blocks]

The important thing to notice is that the first block is shorter than 5 bytes. This forces Detours to copy this block as well as the beginning of the following block to the trampoline in order to make room for the jump to the hook. After the hooking, the code looks like this:

OriginalFunction2:
00401031  jmp         GenericHook (40101Dh) 
00401036  int         3    		; Padding bytes inserted 
00401037  int         3    		; by Detours
00401038  jmp         OriginalFunction2+2 (401033h) 

Like last time, Detours fails to notice the short jump pointing back into the first 5 bytes. However, this time the consequences are worse — whereas last time the jump pointed to a jump instruction (the jump to the hook), this time the jump points to the third byte of the jump instruction! So after following the jump, the processor begins decoding at the wrong position in code and suddenly sees this:

OriginalFunction2:
00401031  db          e9h  
00401032  db          e7h  
00401033  db          ffh  ; <== EIP after following the jump
00401034  db          ffh  
00401035  dec         esp  
00401037  int         3    
00401038  jmp         OriginalFunction2+2 (401033h) 

Or, when using nop rather than xor:

OriginalFunction2:
00401031  db          e9h  
00401032  out         0FFh,eax ; <== EIP after following the jump
00401034  db          ffh  
00401035  dec         esp  
00401037  jmp         OriginalFunction2+1 (401032h) 

…which explains the exceptions.
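
The byte values in these listings can be reproduced with a little arithmetic. The following sketch encodes the jump to the hook and computes where the loop's short jump lands. EncodeNearJump, ShortJumpTarget and LandsInsideInstruction are hypothetical helper names, and the displacement of -7 is derived from the addresses in the listings under the assumption that the loop jump is the 2-byte short form (EB rel8).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encode `jmp rel32` (opcode E9) located at `from` and targeting `to`.
inline std::array<std::uint8_t, 5> EncodeNearJump(std::uint32_t from,
                                                  std::uint32_t to) {
    const std::uint32_t rel = to - (from + 5);  // relative to next instruction
    return {{0xE9,
             static_cast<std::uint8_t>(rel & 0xFF),
             static_cast<std::uint8_t>((rel >> 8) & 0xFF),
             static_cast<std::uint8_t>((rel >> 16) & 0xFF),
             static_cast<std::uint8_t>((rel >> 24) & 0xFF)}};
}

// Target of a short jump `jmp rel8` (opcode EB, 2 bytes) located at `from`.
inline std::uint32_t ShortJumpTarget(std::uint32_t from, std::int8_t rel8) {
    return from + 2 + static_cast<std::uint32_t>(static_cast<std::int32_t>(rel8));
}

// True if `target` lies strictly inside the 5-byte jump written at
// `patchStart`, i.e. decoding would start in the middle of an instruction.
inline bool LandsInsideInstruction(std::uint32_t patchStart,
                                   std::uint32_t target) {
    return target > patchStart && target < patchStart + 5;
}
```

Encoding the jump from 00401031 to GenericHook at 0040101D gives E9 E7 FF FF FF. In the xor variant, the short jump at 00401038 targets 00401033, the third byte of that sequence (FF); in the nop variant, the jump at 00401037 targets 00401032, the second byte (E7, which decodes as out).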

This concludes the discussion of detouring functions with unusual prologs. However, there is another interesting catch regarding function un-hooking, which I will describe next time.

Dangerous Detours, Part 2: Unexpected Behaviour

Last time I described how to ‘replace’ a function by another using Detours. The important point was that first, we did not care about the original function any more once it has been hooked and second, we used a hook specially crafted for one particular to-be-hooked function.

This time we will make use of the trampoline in order to enable the hook function to call the original function. Furthermore, we do not want to create a separate hook function for every function we wish to hook but rather implement some kind of a generic hook — a hook function that is capable of hooking any possible function.

Example: Generic hook

Let us say that we wish to instrument some application we do not have the source code for in order to implement tracing. What we want to achieve is basically the same as what _penter does if we had the source code: on entry of each instrumented function, we want to emit some debug output. The exact output does not matter for this example, so we will just output some sample string to the console.

Here is the code (error checking omitted):

#include <stdio.h>
#include <tchar.h>
#include <windows.h>
#include <detours.h>

static PDETOUR_TRAMPOLINE Trampoline;

__declspec(noinline)
static void DoSomething()
{
  wprintf( L"DoSomething\n" );
  Sleep( 100 );
}

//
// This is the function we want to instrument.
//
__declspec(noinline)
static void OriginalFunction( PCWSTR Arg1, LONG Arg2 )
{
  for (;;)
  {
    DoSomething();
  }  
}

//
// The generic hook we will install using Detours.
//
__declspec(naked)
static void GenericHook()
{
  //
  // Do something -- not important what...
  //
  __asm pushad;
  wprintf( L"GenericHook\n" );
  __asm popad;
  
  //
  // Done with instrumentation code, continue with
  // the real function. The trampoline (created 
  // by Detours) contains the replaced instructions
  // and a jump to the body of the real function.
  //
  __asm jmp [Trampoline];
}

int wmain()
{
  //
  // Install hook.
  //
  DetourTransactionBegin();
  DetourUpdateThread( GetCurrentThread() );
  
  PVOID Func = ( PVOID ) OriginalFunction;
  PVOID DetourPtr;
  PVOID TargetPtr;
  DetourAttachEx( 
      &Func, 
      GenericHook,
      &Trampoline,
      &TargetPtr, 
      &DetourPtr );

  DetourTransactionCommit();

  //
  // Call the (now instrumented) function.
  //
  OriginalFunction( L"Hello", 42 );

  return 0;
}

OriginalFunction calls DoSomething in an infinite loop and DoSomething sleeps a little — not very interesting code, but it will do the trick. GenericHook contains the instrumentation code and a jump to the trampoline in order to continue execution at the original function. It should be clear that in order to keep the stack balanced, we have to make GenericHook a naked function.

We then instruct Detours to hook OriginalFunction with GenericHook — but unlike last time, we use DetourAttachEx instead of DetourAttach and save the Trampoline pointer in a global variable so that we can use it from within GenericHook. In order to keep the compiler from inlining functions, I decorated them with __declspec(noinline). Also note that I compiled with /O1 (optimize for size) and without /RTCs.

Concluding from the code, the following output should be expected:

GenericHook
DoSomething
DoSomething
DoSomething
...

We call OriginalFunction once (which produces the output ‘GenericHook’) and then call DoSomething in a loop. Unfortunately, the output instead looks like this:

GenericHook
DoSomething
GenericHook
DoSomething
GenericHook
DoSomething
GenericHook
DoSomething
GenericHook
DoSomething
...

So what has happened? This is the assembler code of OriginalFunction before hooking:

OriginalFunction:
00401031 call    DoSomething (401008h)                   <----+
00401036 jmp     OriginalFunction (401031h)    short jump ----+

Note that the compiler optimized away the epilog and prolog and that the function thus consists of a single basic block only.

After the hooking, the code looks like this:

; The trampoline
003F0060 call    DoSomething (401008h)          <-----------+
003F0065 jmp     OriginalFunction+5 (401036h)   --------------+
                                                            | |
...                                                         | |
                                                            | |
GenericHook:                                                | |
0040101D pushad                                         <-+ | |
0040101E push    offset string L"GenericHook\n" (4042ACh) | | |
00401023 call    dword ptr [__imp__wprintf (4030D8h)]     | | |
00401029 pop     ecx                                      | | |
0040102A popad                                            | | |
0040102B jmp     dword ptr [__fmode+4 (405390h)]   -------|-+ |
                                                          |   |
...                                                       |   |
                                                          |   |
OriginalFunction:                                         |   |
00401031 jmp     GenericHook (40101Dh)       <-+     -----+   |  
00401036 jmp     OriginalFunction (401031h)  --+     <--------+

As expected, the first 5 bytes of OriginalFunction (the call to DoSomething) have been overwritten by a jump to GenericHook. At the end of the hook, execution jumps to the trampoline, which contains the initially overwritten call to DoSomething, followed by a jump to the remainder of OriginalFunction.

So where is the flaw? First note that whereas a function normally starts with a 3-byte prolog, the compiler noticed that OriginalFunction does not make use of the stack and thus omitted both prolog and epilog. As a consequence, our for-loop starts right at the top of the function. Finally, as the body of the for-loop starts with a call (i.e. 5 bytes, just what Detours is looking for), Detours split the loop body: the call moved into the trampoline, the short jump stayed where it was. What that means is that the short jump suddenly points to the GenericHook-jump rather than to the DoSomething-call.

So as it turns out, the compiler has created code for OriginalFunction that Detours is unable to digest properly. Indeed, by blindly replacing the first couple of instructions in the code, Detours has implicitly assumed that the first basic blocks (to which these instructions belong) are never re-entered by a jump. As almost all functions begin with the ordinary push ebp/mov ebp, esp/sub esp, xxx prolog, this assumption holds for the vast majority of functions. OriginalFunction, however, consisting of only a single basic block, violated this assumption, and as a result the hook is re-entered on every loop iteration, which leads to the unexpected output.
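
The violated assumption can be stated as a simple check. The sketch below is a hypothetical illustration, not something Detours does: it assumes the whole function has already been disassembled and its branches collected in a list, which is exactly the kind of deeper analysis that is expensive in practice. BranchModel and HasBranchIntoCopiedBytes are made-up names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, already-decoded view of one branch inside the function.
struct BranchModel {
    std::uint32_t address;  // where the branch instruction is located
    std::uint32_t target;   // where it jumps to
};

// Returns true if any branch of the function targets the first
// `copiedBytes` bytes, i.e. the bytes that get overwritten by the jump
// to the hook. Such a function cannot be detoured safely by simply
// moving its first instructions into a trampoline.
inline bool HasBranchIntoCopiedBytes(std::uint32_t functionStart,
                                     std::uint32_t copiedBytes,
                                     const std::vector<BranchModel>& branches) {
    for (const BranchModel& branch : branches) {
        if (branch.target >= functionStart &&
            branch.target < functionStart + copiedBytes) {
            return true;
        }
    }
    return false;
}
```

For OriginalFunction (start 00401031, 5 bytes copied), the short jump at 00401036 targets 00401031 and would be flagged, whereas a function whose branches all target addresses behind the copied bytes would pass.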

So can we blindly trust Detours to do the right thing? Obviously not. However, to be fair, calling the hook function too many times is annoying and a waste, but it does not harm our program. Unfortunately, as we will see next time, it can get worse…

Dangerous Detours, Part 1: Introduction

Detours is a library that allows you to hook arbitrary functions by rewriting machine code. While a description of the exact implementation approach can be found in the corresponding paper as well as in numerous other sources, the basic idea is as follows:

  • In the to-be-hooked function, disassemble the first instructions until you have read at least 5 bytes. As instructions are variable length on x86, we may end up having to read more than 5 bytes to reach the next instruction boundary. Let the number of bytes read be n.
  • Allocate n+5 bytes of memory which will make up the trampoline.
  • Copy the n bytes from the to-be-hooked function to the trampoline, followed by a near jump to the to-be-hooked function + offset n (i.e. the first instruction after the instructions we copied)
  • Now overwrite the first 5 bytes of the to-be-hooked function with a near jump to the hook-function
  • The hook function may either return, so that the original function is never executed or instead jump to the trampoline
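
The first two steps boil down to a small calculation, sketched here on a list of instruction lengths as a disassembler would report them. BytesToCopy and TrampolineSize are hypothetical helper names and not part of the Detours API; returning 0 for a function that is too short is a choice made for this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Size of a near jump (opcode E9 plus 32-bit displacement) on x86.
constexpr std::size_t kNearJumpLength = 5;

// Walk the instruction lengths from the start of the function and return n,
// the number of bytes up to the first instruction boundary at or beyond
// 5 bytes. Returns 0 if the instructions add up to fewer than 5 bytes.
inline std::size_t BytesToCopy(const std::vector<std::size_t>& instructionLengths) {
    std::size_t n = 0;
    for (std::size_t length : instructionLengths) {
        if (n >= kNearJumpLength) {
            break;
        }
        n += length;
    }
    return n >= kNearJumpLength ? n : 0;
}

// The trampoline holds the n copied bytes plus a near jump back.
inline std::size_t TrampolineSize(std::size_t n) {
    return n + kNearJumpLength;
}
```

For the common prolog push ebp / mov ebp, esp / sub esp, imm8 (1, 2 and 3 bytes), n is 6 and the trampoline takes 11 bytes; for a function that begins with a 5-byte call, n is exactly 5.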

If we want to replace the to-be-hooked function, the execution flow is as follows:

  • Entering to-be-hooked function
  • Jump to hook function
  • Hook function does its work
  • Return to caller

The body of the to-be-hooked function is never executed.


        Caller function
       /          ^
      /           |
     v            |
  (via jmp        |
in to-be-hooked   |
  function)       |
     |            |
     v            | return
    Hook  --------+
  function
     

If instead the hook function, after having done its work, wants the original function to execute, the execution flow is as follows:

  • Entering to-be-hooked function
  • Jump to hook function
  • Hook function does its work
  • Jump to trampoline
  • Execute n bytes in trampoline
  • Jump to original function + offset n
  • Original function runs and finally returns

        Caller function
       /                ^
      /                  \
     v                    \
  (via jmp                 \
in to-be-hooked       To-be-hooked 
  function)             function
     |                      ^
     v                      |
    Hook  -------------> Trampoline
  function
     

Example: Replacing a function

As an example, we ‘replace’ OriginalFunction by AlternateFunction. The source code is as follows (error checking omitted):

#include <stdio.h>
#include <tchar.h>
#include <windows.h>
#include <detours.h>

__declspec(noinline)
static void OriginalFunction( PCWSTR Arg1, LONG Arg2 )
{
  wprintf( L"OriginalFunction(%s, %d)\n", Arg1, Arg2 );
}

__declspec(noinline)
static void AlternateFunction( PCWSTR Arg1, LONG Arg2 )
{
  wprintf( L"AlternateFunction(%s, %d)\n", Arg1, Arg2 );
}

int wmain()
{
  //
  // Install hook.
  //
  DetourTransactionBegin();
  DetourUpdateThread( GetCurrentThread() );
  
  PVOID Func = ( PVOID ) OriginalFunction;
  DetourAttach( 
      &Func, 
      AlternateFunction );

  DetourTransactionCommit();
  
  //
  // Call (hooked) function.
  //
  OriginalFunction( L"Hello", 42 );

  return 0;
}

The code is straightforward — we instruct Detours to hook OriginalFunction and call AlternateFunction instead. We do not make use of the trampoline. The output is:

AlternateFunction(Hello, 42)

Up to this point, everything works as advertised. And indeed there is little that can go wrong if we just want to ‘replace’ a function. If, however, we want the hook function to eventually call the original function by making use of the trampoline, it gets more interesting, as we will see in Part 2.

Using Import Address Table hooking for testing

For a procedure that is free of side effects, it is a relatively easy task to create a unit test that achieves sufficient code coverage by testing all (or at least all interesting) combinations of input data and verifying the computed results.

If, however, the procedure is not free of side effects, the state (global variables, external data, etc.) modified by the procedure has to be taken into account. A solid testcase has to test both the effects of the state on the correctness of the procedure and the correctness of the procedure’s modifications on the state. As a consequence, the testcase has to initialize the state to a ‘known state’ before running each test and validate the state after each test.

If the state can be queried and easily modified by the testcase’s initialization code to construct the individual testing scenarios, writing such a testcase may be laborious but does not pose a real problem. Things are a little different if the state is outside of the programmer’s control or at least hard to query and alter. This case is especially common when writing code that interfaces with the operating system; a common example may be a library that uses the filesystem or the registry to query and modify data. While both filesystem and registry are accessible to the testing code, several problems arise:

  1. The state is unknown on test case startup. Any affected files/keys have to be initialized to a defined state first.
  2. Parts of the data may be inaccessible for security reasons. As an example, the procedure under test may need to read settings from HKLM. The affected keys under HKLM thus belong to the state that affects the execution of the procedure and our test case needs to run the procedure using different initial states, i.e. using different values for the keys in HKLM. But, as the testcase should be able to run as normal user (maybe to ensure LUA compliance), modifying these keys is forbidden.
  3. The state may be altered concurrently by other programs running on the same system.

Such a test case may easily contain more code initializing and validating state (key and file creation/deletion etc) than actual test code. Needless to say, creating such test cases is a pain.

For some testcases, touching real files and keys may not only be troublesome but is actually of minor interest. For example, the objective of a testcase might be to check error handling code. In such situations, we are not really interested in modifying/querying the actual state of the system but rather in having the procedure under test see a specific state (e.g. error condition) or make specific changes to the state. The idea is thus to intercept the calls into the OS libraries by appropriate mocks or stubs: Instead of acting on the real state, we give the procedure under test the illusion of executing in a specific state and intercept its modifications. That way, initialization of state becomes trivial, the problem of concurrent access is mitigated and error conditions can be easily simulated.

The question of course is how to create such mocks or stubs given that the code under test will usually be statically linked to the appropriate OS libraries. One easy and rather nice solution is to use IAT hooking.

Import Address Table hooking

The basic idea of IAT hooking is to take a module’s Import Address Table (where the loader puts the function pointers of imported functions) and patch specific entries. Much has been written about IAT hooking, so I will skip the details. In contrast to other hooking techniques, IAT hooking has at least 2 interesting properties.

  • Hooking only requires exchanging a single pointer, which can be done using an interlocked instruction. In comparison to other hooking/patching techniques, IAT hooking can thus be considered low-risk.
  • Modifications affect a specific module only. If both module A and B import CreateFileW and module A’s IAT is patched to use FooCreateFileW instead, module B remains unaffected.

Especially the second property makes IAT hooking interesting for use by test code, as the modifications can be scoped to affect the module under test only (assuming that test code and code under test are located in different modules).
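
Both properties can be illustrated without touching a real PE image. In the following sketch each module is modelled as an object owning a single import slot; ModuleModel, PatchSlot, RealRead and FakeRead are made-up names, and std::atomic stands in for the interlocked pointer exchange mentioned above.

```cpp
#include <atomic>
#include <cassert>

// Signature of the imported function in this model.
using ReadProc = int (*)(int);

inline int RealRead(int value) { return value; }  // the "real" import
inline int FakeRead(int) { return -1; }           // the replacement

// Hypothetical stand-in for a module with one IAT slot. Every call the
// module makes to the import goes through the slot.
struct ModuleModel {
    std::atomic<ReadProc> importSlot{&RealRead};

    int CallImport(int value) { return importSlot.load()(value); }
};

// Swap the slot's pointer in one atomic step and return the old pointer,
// so that the hook can be removed again later.
inline ReadProc PatchSlot(ModuleModel& module, ReadProc replacement) {
    return module.importSlot.exchange(replacement);
}
```

Patching module A's slot changes what A's calls reach, while module B keeps calling the real function through its own, untouched slot. Passing the returned pointer to PatchSlot again restores the original state.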

The following listing shows a simple function that shows how to access a loaded PE image in order to hook a single entry in a module’s IAT. Note that the function is capable of hooking named imports only and does not provide any thread-safety (see my additional remark at the end of the post).

#define PtrFromRva( base, rva ) ( ( ( PBYTE ) ( base ) ) + ( rva ) )

/*++
  Routine Description:
    Replace the function pointer in a module's IAT.

  Parameters:
    Module              - Module to use IAT from.
    ImportedModuleName  - Name of imported DLL from which 
                          function is imported.
    ImportedProcName    - Name of imported function.
    AlternateProc       - Function to be written to IAT.
    OldProc             - Original function.

  Return Value:
    S_OK on success.
    (any HRESULT) on failure.
--*/
HRESULT PatchIat(
  __in HMODULE Module,
  __in PSTR ImportedModuleName,
  __in PSTR ImportedProcName,
  __in PVOID AlternateProc,
  __out_opt PVOID *OldProc
  )
{
  PIMAGE_DOS_HEADER DosHeader = ( PIMAGE_DOS_HEADER ) Module;
  PIMAGE_NT_HEADERS NtHeader; 
  PIMAGE_IMPORT_DESCRIPTOR ImportDescriptor;
  UINT Index;

  _ASSERTE( Module );
  _ASSERTE( ImportedModuleName );
  _ASSERTE( ImportedProcName );
  _ASSERTE( AlternateProc );

  NtHeader = ( PIMAGE_NT_HEADERS ) 
    PtrFromRva( DosHeader, DosHeader->e_lfanew );
  if( IMAGE_NT_SIGNATURE != NtHeader->Signature )
  {
    return HRESULT_FROM_WIN32( ERROR_BAD_EXE_FORMAT );
  }

  ImportDescriptor = ( PIMAGE_IMPORT_DESCRIPTOR ) 
    PtrFromRva( DosHeader, 
      NtHeader->OptionalHeader.DataDirectory
        [ IMAGE_DIRECTORY_ENTRY_IMPORT ].VirtualAddress );

  //
  // Iterate over import descriptors/DLLs.
  //
  for ( Index = 0; 
        ImportDescriptor[ Index ].Characteristics != 0; 
        Index++ )
  {
    PSTR dllName = ( PSTR ) 
      PtrFromRva( DosHeader, ImportDescriptor[ Index ].Name );

    if ( 0 == _strcmpi( dllName, ImportedModuleName ) )
    {
      //
      // This is the DLL we are after.
      //
      PIMAGE_THUNK_DATA Thunk;
      PIMAGE_THUNK_DATA OrigThunk;

      if ( ! ImportDescriptor[ Index ].FirstThunk ||
         ! ImportDescriptor[ Index ].OriginalFirstThunk )
      {
        return E_INVALIDARG;
      }

      Thunk = ( PIMAGE_THUNK_DATA )
        PtrFromRva( DosHeader, 
          ImportDescriptor[ Index ].FirstThunk );
      OrigThunk = ( PIMAGE_THUNK_DATA )
        PtrFromRva( DosHeader, 
          ImportDescriptor[ Index ].OriginalFirstThunk );

      for ( ; OrigThunk->u1.Function != NULL; 
              OrigThunk++, Thunk++ )
      {
        if ( OrigThunk->u1.Ordinal & IMAGE_ORDINAL_FLAG )
        {
          //
          // Ordinal import - we can handle named imports
          // only, so skip it.
          //
          continue;
        }

        PIMAGE_IMPORT_BY_NAME import = ( PIMAGE_IMPORT_BY_NAME )
          PtrFromRva( DosHeader, OrigThunk->u1.AddressOfData );

        if ( 0 == strcmp( ImportedProcName, 
                              ( char* ) import->Name ) )
        {
          //
          // Proc found, patch it.
          //
          DWORD junk;
          MEMORY_BASIC_INFORMATION thunkMemInfo;

          //
          // Make page writable.
          //
          if ( 0 == VirtualQuery(
            Thunk,
            &thunkMemInfo,
            sizeof( MEMORY_BASIC_INFORMATION ) ) )
          {
            return HRESULT_FROM_WIN32( GetLastError() );
          }
          if ( ! VirtualProtect(
            thunkMemInfo.BaseAddress,
            thunkMemInfo.RegionSize,
            PAGE_EXECUTE_READWRITE,
            &thunkMemInfo.Protect ) )
          {
            return HRESULT_FROM_WIN32( GetLastError() );
          }

          //
          // Replace function pointers (non-atomically).
          //
          if ( OldProc )
          {
            *OldProc = ( PVOID ) ( DWORD_PTR ) 
                Thunk->u1.Function;
          }
#ifdef _WIN64
          Thunk->u1.Function = ( ULONGLONG ) ( DWORD_PTR ) 
              AlternateProc;
#else
          Thunk->u1.Function = ( DWORD ) ( DWORD_PTR ) 
              AlternateProc;
#endif
          //
          // Restore page protection.
          //
          if ( ! VirtualProtect(
            thunkMemInfo.BaseAddress,
            thunkMemInfo.RegionSize,
            thunkMemInfo.Protect,
            &junk ) )
          {
            return HRESULT_FROM_WIN32( GetLastError() );
          }

          return S_OK;
        }
      }
      
      //
      // Import not found.
      //
      return HRESULT_FROM_WIN32( ERROR_PROC_NOT_FOUND );    
    }
  }

  //
  // DLL not found.
  //
  return HRESULT_FROM_WIN32( ERROR_MOD_NOT_FOUND );
}

Using IAT hooks to create stubs or mocks

Having IAT hooking at hand, it is now straightforward to implement and install stubs or mocks. Given a procedure that, for example, uses some of the Reg* functions to query and modify the registry, all we have to do is implement stubs having the same signature as the corresponding Reg* functions and install them using the technique described above. The testcase will then, though being statically linked to advapi32, call our stubs instead of the real registry routines. Within the stub, we are free to delegate to the real registry routines as required, provided that the stubs are either located in a different module (so that the IAT hooks do not apply) or that these calls are made using the ‘original’ function pointers.

Various scenarios come to mind where such hooks can help testing; two of them are discussed below.

Scenario 1: Testing Error Checking

When interfacing the Win32 API, error handling code like the following is ubiquitous:

  ...
  res = RegQueryValueEx(
    key,
    Name,
    0,
    &dataType,
    Buffer,
    &dataRead );
  if ( ERROR_ACCESS_DENIED == res )
  {
    ...
  }
  else if ( ERROR_SUCCESS != res )
  {
    ...
  }
  else
  {
    ...
  }

In order to achieve full code coverage for this code block, we have to implement at least three test cases with RegQueryValueEx returning ERROR_ACCESS_DENIED, ERROR_SUCCESS and some other error code, respectively. Under normal conditions, this would require the initialization and teardown code of each of these three testcases to modify the registry appropriately. Using IAT hooks, we can leave the registry untouched and instead use three different alternate implementations of RegQueryValueEx, each returning the appropriate error code and updating any out-parameters.
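
To make the three-testcase idea concrete, here is a small portable sketch. It deliberately avoids the Win32 headers: the function pointer g_queryValue stands in for the IAT entry of RegQueryValueEx, the constants mirror the numeric values of the Win32 error codes, and all names (QueryValueProc, StubSuccess, StubAccessDenied, StubNotFound, ReadSetting) are made up for illustration. Note that only the success stub has to fill in the out-parameters.

```cpp
#include <cstring>

// Stand-ins mirroring the numeric values of the Win32 error codes.
static const long kSuccess      = 0;   // ERROR_SUCCESS
static const long kFileNotFound = 2;   // ERROR_FILE_NOT_FOUND
static const long kAccessDenied = 5;   // ERROR_ACCESS_DENIED
static const long kMoreData     = 234; // ERROR_MORE_DATA

// Hypothetical, simplified signature of the hooked API.
typedef long ( *QueryValueProc )(
  const char* name, char* buffer, unsigned long* size );

// Stub 1: succeeds and updates the out-parameters like the real API would.
static long StubSuccess( const char*, char* buffer, unsigned long* size )
{
  static const char data[] = "42";
  if ( *size < sizeof( data ) )
  {
    return kMoreData;
  }
  std::memcpy( buffer, data, sizeof( data ) );
  *size = sizeof( data );
  return kSuccess;
}

// Stub 2: simulates missing permissions.
static long StubAccessDenied( const char*, char*, unsigned long* )
{
  return kAccessDenied;
}

// Stub 3: simulates "some other error".
static long StubNotFound( const char*, char*, unsigned long* )
{
  return kFileNotFound;
}

// Stand-in for the IAT entry.
static QueryValueProc g_queryValue = StubSuccess;

enum Outcome { ValueRead, Denied, Failed };

// Code under test, with the same three branches as the snippet above.
static Outcome ReadSetting( const char* name )
{
  char buffer[ 16 ];
  unsigned long size = sizeof( buffer );
  long res = g_queryValue( name, buffer, &size );
  if ( kAccessDenied == res )
  {
    return Denied;
  }
  else if ( kSuccess != res )
  {
    return Failed;
  }
  else
  {
    return ValueRead;
  }
}
```

Each testcase points g_queryValue at one of the stubs and checks that ReadSetting takes the corresponding branch.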

When using C++ rather than C, we can even save a considerable amount of typing by getting creative with templates.

As an example, consider a procedure that, depending on the parameter values passed, writes to either HKCU or HKLM. As writing to HKLM is permitted to admins only, it is vital to test that the procedure fails gracefully when access to certain keys is forbidden. In order to simulate the following conditions, we may choose to implement a templatized function.

  • Simulate normal user — HKCU is allowed for write access, HKLM not
  • Simulate admin — both HKCU and HKLM allowed for write access
  • Simulate weird ACL settings — deny access to some or all keys in both HKCU and HKLM

At the beginning of each testcase, FailRegCreateKeyEx with appropriate arguments is then installed as a hook for RegCreateKeyEx.


template< BOOL FailOnHkcu, BOOL FailOnHklm >
static LONG WINAPI FailRegCreateKeyEx (
    __in HKEY hKey,
    __in LPCWSTR lpSubKey,
    __reserved DWORD Reserved,
    __in_opt LPWSTR lpClass,
    __in DWORD dwOptions,
    __in REGSAM samDesired,
    __in_opt LPSECURITY_ATTRIBUTES lpSecurityAttributes,
    __out PHKEY phkResult,
    __out_opt LPDWORD lpdwDisposition
    )
{
  //
  // N.B. Use locals to avoid constant expression-warnings.
  //
  BOOL failOnHklm = FailOnHklm;
  BOOL failOnHkcu = FailOnHkcu;
  if ( ( hKey == HKEY_LOCAL_MACHINE && failOnHklm ) ||
       ( hKey == HKEY_CURRENT_USER && failOnHkcu ) )
  {
    return ERROR_ACCESS_DENIED;
  }
  else
  {
    //
    // Assume that we are in a different module, so using
    // RegCreateKeyEx will not re-enter the hook.
    //
    return RegCreateKeyEx(
      hKey,
      lpSubKey,
      Reserved,
      lpClass,
      dwOptions,
      samDesired,
      lpSecurityAttributes,
      phkResult,
      lpdwDisposition );
  }
}
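
How the three simulated conditions map to template arguments, and how a hook can be limited to a single testcase, may be easier to see in a portable sketch. Everything below is hypothetical: Hive and CreateKeyProc replace the Win32 types, g_createKey stands in for the IAT entry of RegCreateKeyEx, and ScopedHook is merely one conceivable way to pair installation with removal.

```cpp
// Hypothetical stand-ins for HKEY_CURRENT_USER / HKEY_LOCAL_MACHINE.
enum Hive { Hkcu, Hklm };

typedef long ( *CreateKeyProc )( Hive hive );

// The "real" routine always succeeds in this sketch.
static long RealCreateKey( Hive ) { return 0; }

// Stand-in for the IAT entry.
static CreateKeyProc g_createKey = RealCreateKey;

// Simplified counterpart of FailRegCreateKeyEx.
template< bool FailOnHkcu, bool FailOnHklm >
static long FailCreateKey( Hive hive )
{
  if ( ( hive == Hklm && FailOnHklm ) ||
       ( hive == Hkcu && FailOnHkcu ) )
  {
    return 5; // same value as ERROR_ACCESS_DENIED
  }
  return RealCreateKey( hive );
}

// Installs a hook on construction and restores the previous
// pointer on destruction, i.e. at the end of the testcase.
class ScopedHook
{
public:
  explicit ScopedHook( CreateKeyProc alternate )
    : m_old( g_createKey )
  {
    g_createKey = alternate;
  }
  ~ScopedHook() { g_createKey = m_old; }

  ScopedHook( const ScopedHook& ) = delete;
  ScopedHook& operator=( const ScopedHook& ) = delete;

private:
  CreateKeyProc m_old;
};

// Code under test: writes to HKLM or HKCU depending on the parameter.
static bool SaveSetting( bool machineWide )
{
  return g_createKey( machineWide ? Hklm : Hkcu ) == 0;
}

// Normal user: HKCU writable, HKLM not.
static bool TestNormalUser()
{
  ScopedHook hook( FailCreateKey< false, true > );
  return SaveSetting( false ) && ! SaveSetting( true );
}

// Admin: both hives writable.
static bool TestAdmin()
{
  ScopedHook hook( FailCreateKey< false, false > );
  return SaveSetting( false ) && SaveSetting( true );
}

// Weird ACL settings: access denied everywhere.
static bool TestWeirdAcl()
{
  ScopedHook hook( FailCreateKey< true, true > );
  return ! SaveSetting( false ) && ! SaveSetting( true );
}
```

Because the guard restores the slot in its destructor, a testcase that returns early still leaves the hook uninstalled for the next one.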

Scenario 2: Redirecting access

Rather than only saving us from typing complicated initialization and teardown code, IAT hooking also comes in handy for addressing problem 2. Again consider the scenario where we are to write a test case for an API that reads and writes to both HKLM and HKCU. HKLM may contain machine-wide settings which only administrators can modify using the code under test. HKCU may contain optional per-user settings, which (if present) override machine-wide settings.

In order to test writing machine-wide settings, the testcase process obviously needs administrative privileges, which are normally unavailable (assuming that you always run as limited user). Running the testcase as a different (administrative) user is uncomfortable, makes debugging harder, and is also risky: ultimately, a bug in the tested code could harm your system.

A simple solution to this problem might be to explicitly grant access to the specific keys in HKLM so that they become writable. If this is not feasible or not flexible enough, IAT hooking can help again. In order to circumvent the problem of accessing keys in HKLM, we redirect all accesses to HKLM to some temporary key in HKCU.

The following code snippet illustrates the basic idea (the code is not suitable for all usage scenarios of RegCreateKeyEx, but this is usually not required either). When asked to create a key in HKLM, the hook instead creates a key in the location referred to by the global variable RedirectPathHklm:

static LONG WINAPI VirtRegCreateKeyEx (
  __in HKEY hKey,
  __in LPCWSTR lpSubKey,
  __reserved DWORD Reserved,
  __in_opt LPWSTR lpClass,
  __in DWORD dwOptions,
  __in REGSAM samDesired,
  __in_opt LPSECURITY_ATTRIBUTES lpSecurityAttributes,
  __out PHKEY phkResult,
  __out_opt LPDWORD lpdwDisposition
  )
{
  //
  // Create virtualized key if necessary
  //
  HKEY virtKey = NULL;
  BOOL closeKey = FALSE;
  LONG res = 0;

  if ( hKey == HKEY_LOCAL_MACHINE )
  {
    res = RegCreateKeyEx(
      HKEY_CURRENT_USER,
      RedirectPathHklm,
      0,
      NULL,
      0,
      KEY_ALL_ACCESS,
      NULL,
      &virtKey,
      NULL );
    if ( ERROR_SUCCESS != res )
      return res;

    closeKey = TRUE;
  }
  else
  {
    virtKey = hKey;
  }

  res = RegCreateKeyEx(
    virtKey,
    lpSubKey,
    Reserved,
    lpClass,
    dwOptions,
    samDesired,
    lpSecurityAttributes,
    phkResult,
    lpdwDisposition );

  if ( closeKey )
    RegCloseKey( virtKey );

  return res;
}

Multithreading

Up to this point, the discussion has ignored the issue of multithreading. As the Import Address Table is a (module-) global resource, concurrent access by different threads requires appropriate synchronization. If the IAT is only adjusted once during initialization of our test suite and does not need to be touched again, this might not be a problem. More likely, however, is the requirement to modify (and reset) the IAT for each individual test case, which, of course, will lead to all sorts of nasty race conditions when multiple testcases are executed in parallel on different threads.

Fortunately, this is not an inherent limitation of the approach. To create an implementation that is safe for use in a multithreaded environment, it is well conceivable to implement something like thread-local IAT hooks, i.e. hooks that only apply to a specific thread. That may be a topic for another day.
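
Purely as a thought experiment, and not as a worked-out design, one way to approximate thread-local hooks is to patch the IAT only once with a dispatcher that consults a per-thread override. The portable sketch below assumes exactly that: g_createKey stands in for the IAT entry, t_override is a C++11 thread_local variable, and all names are hypothetical.

```cpp
#include <thread>

// Hypothetical, simplified signature of the hooked API.
typedef long ( *CreateKeyProc )( int hive );

// The "real" routine always succeeds in this sketch.
static long RealCreateKey( int ) { return 0; }

// Per-thread override; null means "no hook on this thread".
static thread_local CreateKeyProc t_override = nullptr;

// Dispatcher: installed in the slot once and never changed afterwards,
// so the shared slot itself is not modified while tests are running.
static long DispatchCreateKey( int hive )
{
  CreateKeyProc target = t_override ? t_override : RealCreateKey;
  return target( hive );
}

// Stand-in for the IAT entry, already pointing at the dispatcher.
static CreateKeyProc g_createKey = DispatchCreateKey;

// Hook used by one of the threads only.
static long DenyCreateKey( int ) { return 5; } // ERROR_ACCESS_DENIED

// Installs the hook for the calling thread, calls through the slot,
// and removes the hook again.
static void HookedWorker( long* result )
{
  t_override = DenyCreateKey;
  *result = g_createKey( 0 );
  t_override = nullptr;
}

// Calls through the same slot without any hook.
static void PlainWorker( long* result )
{
  *result = g_createKey( 0 );
}

// Runs both workers in parallel; each thread only sees its own hook state.
static bool RunThreadLocalDemo()
{
  long hooked = -1;
  long plain = -1;
  std::thread a( HookedWorker, &hooked );
  std::thread b( PlainWorker, &plain );
  a.join();
  b.join();
  return hooked == 5 && plain == 0;
}
```

The price of this scheme is an extra indirection on every call and the need to route each hooked import through its own dispatcher.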

Conclusion

Though a bit tempting, using IAT hooks in each and every testcase would of course thwart the whole idea of a testcase: if the real OS APIs are never called but always intercepted by hooks, the test suite can quickly become pretty much worthless. However, as I have shown in this post, IAT hooking, when used judiciously, can indeed make writing specific testcases substantially easier.


About me

Johannes Passing, M.Sc., living in Berlin, Germany.

Besides his consulting work, Johannes mainly focusses on Win32, COM, and NT kernel mode development, along with Java and .Net. He also is the author of cfix, a C/C++ unit testing framework for Win32 and NT kernel mode, Visual Assert, a Visual Studio Unit Testing-AddIn, and NTrace, a dynamic function boundary tracing toolkit for Windows NT/x86 kernel/user mode code.

Contact Johannes: jpassing (at) acm org

Johannes' GPG fingerprint is BBB1 1769 B82D CD07 D90A 57E8 9FE1 D441 F7A0 1BB1.
