Posts Tagged 'Detours'

Dangerous Detours, Wrap-Up

This concludes the little series about the limitations of Detours.

Granted, the probability of running into any of the problems described in these posts is rather low. Whether they should be considered bugs in Detours or an inherent limitation of the concept is not easy to judge. On the one hand, Detours does act a little naively, and the unhooking problem in particular could easily have been avoided. On the other hand, addressing the problems depicted in the previous posts would require a significantly more precise analysis of the binary code, which is expensive and comes with its own risks.

Concluding from these facts, my evaluation is that Detours is a decent technology for hooking explicitly chosen functions — functions you may know the disassembly of and whose “detourability” can be tested. In contrast to this, I consider Detours an inappropriate technology for hooking functions determined at runtime, i.e. functions you may not know and whose “detourability” cannot be tested. Using Detours for implementing tracing functionality, as suggested by one of the samples, should thus be considered not a particularly good idea.

Dangerous Detours, Part 4: Undetouring

Having discussed what can go wrong when detouring a function, we will now take a closer look at undetouring. Again, there is a problem, in my opinion an even more severe one than those discussed previously, that the Detours library does not address.

Undetouring is a multi-step process and requires the user to follow a certain protocol. The basic idea is as follows: The caller creates a transaction, registers all threads that might have been affected by the detour and specifies which functions to unhook. As soon as a thread is registered, it is suspended until transaction commit. When the user decides to commit the transaction by calling DetourTransactionCommit, Detours unhooks the functions and checks if any of the registered (now suspended) threads was about to execute code of one of the trampolines that are to be freed. If such a thread is found, its context is updated so that the instruction pointer points to the original (now restored) function code again. The trampolines are then freed and all registered threads are resumed.

Chances are good that the process now runs flawlessly again. There is, however, a situation Detours neglects to take into account. Let us consider the following code, which illustrates the problem.

static PDETOUR_TRAMPOLINE Trampoline;
static HANDLE WaitHandle;

__declspec(naked)
static void GenericHook2()
{
  __asm pushad;
  wprintf( L"GenericHook\n" );
  __asm popad;
  __asm jmp [Trampoline];
}

__declspec(noinline)
static void WaitForSomething()
{
  WaitForSingleObject( WaitHandle, INFINITE );

  wprintf( L"Wait satisfied\n" );
}

__declspec(noinline)
__declspec(naked)
static void SomeFunction()
{
  _asm pushad;  // any instruction < 5 bytes
  WaitForSomething();
  _asm popad;
  _asm ret;
}

DWORD CALLBACK ThreadProc( PVOID )
{
  SomeFunction();
  wprintf( L"SomeFunction returned\n" );
  return 0;
}

int UnloadDetour()
{
  //
  // Detour
  //
  if ( ERROR_SUCCESS != DetourTransactionBegin() )
  {
    wprintf( L"DetourTransactionBegin failed" );
    return 1;
  }
  
  if ( ERROR_SUCCESS != DetourUpdateThread( GetCurrentThread() ) )
  {
    wprintf( L"DetourUpdateThread failed" );
    return 1;
  }
  
  PVOID Func = ( PVOID ) SomeFunction;
  PVOID DetourPtr;
  PVOID TargetPtr;
  if ( ERROR_SUCCESS != DetourAttachEx( 
      &Func, 
      GenericHook2,
      &Trampoline,
      &TargetPtr, 
      &DetourPtr ) )
  {
    wprintf( L"DetourAttachEx failed" );
    return 1;
  }

  if ( ERROR_SUCCESS != DetourTransactionCommit() )
  {
    wprintf( L"DetourTransactionCommit failed" );
    return 1;
  }

  //
  // Kick off thread
  //
  WaitHandle = CreateEvent( NULL, FALSE, FALSE, NULL );

  HANDLE Thread = CreateThread( NULL, 0, ThreadProc, NULL, 0, NULL ); 
  if ( ! Thread )
  {
    wprintf( L"CreateThread failed" );
    return 1;
  }

  wprintf( L"Trampoline at %p\n", Trampoline );
  Sleep( 1000 );

  //
  // Remove detour
  //
  if ( ERROR_SUCCESS != DetourTransactionBegin() )
  {
    wprintf( L"DetourTransactionBegin failed" );
    return 1;
  }
  
  if ( ERROR_SUCCESS != DetourUpdateThread( Thread ) )
  {
    wprintf( L"DetourUpdateThread failed" );
    return 1;
  }
  
  if ( ERROR_SUCCESS != DetourUpdateThread( GetCurrentThread() ) )
  {
    wprintf( L"DetourUpdateThread failed" );
    return 1;
  }
  
  if ( ERROR_SUCCESS != DetourDetach( 
      &Func, 
      GenericHook2 ) )
  {
    wprintf( L"DetourDetach failed" );
    return 1;
  }

  if ( ERROR_SUCCESS != DetourTransactionCommit() )
  {
    wprintf( L"DetourTransactionCommit failed" );
    return 1;
  }
  
  //
  // unwait thread
  // 
  SetEvent( WaitHandle );
  WaitForSingleObject( Thread, INFINITE );
  CloseHandle( WaitHandle );
  CloseHandle( Thread );

  return 0;
}

To summarize, the code does the following:

  • Detour SomeFunction.
  • Spawn a new thread that calls SomeFunction. SomeFunction in turn calls WaitForSomething and thus waits on an event that is not signalled.
  • The main thread decides to undetour SomeFunction…
  • …and signals the event the other thread is waiting for.
  • The second thread continues execution.

Running the code immediately results in an access violation.

So let us see what has happened. As in the previous examples, the root cause of the crash lies in the binary layout of SomeFunction. The key point is that the call instruction is not the first instruction, yet starts no more than 4 bytes from the beginning of the function. With sufficient bad luck, I am convinced that the optimizer can indeed come up with code that satisfies exactly these two requirements. To simulate this behaviour and save some time, I cheated a bit: I made SomeFunction naked to suppress the prolog and inserted a bogus instruction, _asm pushad (any instruction shorter than 5 bytes will do the trick). The code now looks like this:

SomeFunction:
004117D0 90               nop              
004117D1 E8 CA 00 00 00   call WaitForSomething (4118A0h) 
004117D6 61               popad            
004117D7 C3               ret              

Now the function is hooked, and if you have been following so far, it should be clear that Detours has no choice but to move the call into the trampoline. Indeed, after detouring, the code looks like this:

SomeFunction:
004117D0 E9 7B 03 00 00   jmp GenericHook2 (411B50h) 
004117D5 CC               int 3   ; Padding inserted by Detours
004117D6 61               popad            
004117D7 C3               ret              

And the trampoline like this:

SomeFunction:
003F0061 E8 3A 18 02 00   call WaitForSomething (4118A0h) 
003F0066 E9 6B 17 02 00   jmp  SomeFunction+6 (4117D6h) 
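
Note that the displacement bytes of the call differ between the original function (CA 00 00 00) and the trampoline (3A 18 02 00), although both reach WaitForSomething. The following is a minimal sketch of the arithmetic involved when a rel32 instruction is moved, assuming 32-bit addresses; the helper names rel32_target and relocate_rel32 are mine and are not part of the Detours API.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute target of a 5-byte `call rel32` / `jmp rel32` located at `addr`.
   The displacement is relative to the end of the instruction. */
static uint32_t rel32_target(uint32_t addr, int32_t disp)
{
    return addr + 5 + (uint32_t)disp;
}

/* Displacement the same instruction needs after being moved to `new_addr`
   so that it still reaches its original target. */
static int32_t relocate_rel32(uint32_t old_addr, int32_t old_disp,
                              uint32_t new_addr)
{
    uint32_t target = rel32_target(old_addr, old_disp);
    int32_t new_disp = (int32_t)(target - (new_addr + 5));

    /* Post-condition: the moved instruction reaches the same target. */
    assert(rel32_target(new_addr, new_disp) == target);
    return new_disp;
}
```

Fed with the addresses from the listings, relocate_rel32( 0x004117D1, 0xCA, 0x003F0061 ) yields 0x2183A, which matches the bytes 3A 18 02 00 shown in the trampoline.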

So far, so good — but now we undetour this function while — and this is the second important point — SomeFunction is on the call stack of the second thread (003f0066() is the trampoline):

ntdll.dll!_ZwWaitForSingleObject@12()  + 0x15 bytes	
kernel32.dll!_WaitForSingleObjectEx@12()  + 0x8f bytes	
kernel32.dll!_WaitForSingleObject@8()  + 0x12 bytes	
Detours.exe!WaitForSomething()  
003f0066()	
Detours.exe!ThreadProc(void * __formal=0x00000000) 
kernel32.dll!@BaseThreadInitThunk@12()  + 0xe bytes	
ntdll.dll!__RtlUserThreadStart@8()  + 0x23 bytes	

As described above, Detours suspends the second thread, unhooks the function, frees the trampoline and finally resumes the second thread. After the event has been signalled, the second thread’s stack is unwound. However, by the time WaitForSomething returns into the trampoline, the trampoline has long been freed and, as it happens, the memory has been zeroed out. Still, execution is happily resumed (no DEP used) and the zeros are executed, which decode as add byte ptr [eax], al. Unsurprisingly, this quickly ends in an access violation.

How should Detours have behaved instead? Obviously, merely checking that none of the threads is about to execute trampoline code is not enough. To avoid the crash, Detours would have had to walk each registered thread’s call stack and check that none of the calls has been made from within a trampoline that is to be freed. Of course, this approach is pretty much infeasible, especially in the absence of proper symbols. So the best approach would probably have been to avoid detouring SomeFunction altogether: Detours should have noticed that moving the call into the trampoline is dangerous and should thus have failed the attempt to detour SomeFunction.
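
To make the difference between the two checks concrete, here is a minimal sketch that models a thread as an instruction pointer plus a list of return addresses that have already been recovered. It deliberately sidesteps the hard part (obtaining those return addresses reliably), and it assumes the trampoline occupies n+5 bytes as described in Part 1. The type code_range and both function names are hypothetical and not part of Detours.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A block of code, e.g. a trampoline that is about to be freed. */
typedef struct {
    uint32_t base;
    uint32_t size;
} code_range;

static bool in_range(code_range r, uint32_t addr)
{
    return addr >= r.base && addr - r.base < r.size;
}

/* The check the text describes as insufficient: only the suspended
   thread's current instruction pointer is inspected. */
static bool ip_in_trampoline(code_range trampoline, uint32_t ip)
{
    return in_range(trampoline, ip);
}

/* The stricter check: does any return address on the thread's call
   stack point back into the trampoline? */
static bool returns_into_trampoline(code_range trampoline,
                                    const uint32_t *return_addrs,
                                    size_t count)
{
    assert(return_addrs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (in_range(trampoline, return_addrs[i]))
            return true;
    }
    return false;
}
```

For the scenario above, the waiting thread's instruction pointer lies somewhere in ntdll.dll, so the first check passes, whereas the return address 003f0066 on its stack makes the second check report the trampoline as still in use.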

Dangerous Detours, Part 3: Messing execution flow

Last time I showed that as a result of its implementation strategy, Detours is unable to properly instrument certain kinds of functions. Unlike last time, where this merely led to unexpected debugging output, we will see how this can easily lead to a crash.

Consider this code (error checking omitted):

#include <stdio.h>
#include <tchar.h>
#include <windows.h>
#include <detours.h>

static PDETOUR_TRAMPOLINE Trampoline;

__declspec(noinline)
static void DoSomething()
{
  wprintf( L"DoSomething\n" );
  Sleep( 100 );
}

//
// This is the function we want to instrument.
//
__declspec(noinline)
static void OriginalFunction( PCWSTR Arg1, LONG Arg2 )
{
  _asm xor eax, eax;
  for (;;)
  {
    DoSomething();
  }  
}

//
// The generic hook we will install using Detours.
//
__declspec(naked)
static void GenericHook()
{
  //
  // Do something -- not important what...
  //
  __asm pushad;
  wprintf( L"GenericHook\n" );
  __asm popad;
  
  //
  // Done with instrumentation code, continue with
  // the real function. The trampoline (created 
  // by Detours) contains the replaced instructions
  // and a jump to the body of the real function.
  //
  __asm jmp [Trampoline];
}

int wmain()
{
  // ... Install hook -- same as last time ...
  
  //
  // Call the (now instrumented) function.
  //
  OriginalFunction( L"Hello", 42 );

  return 0;
}

The only difference to the code last time is the

_asm xor eax, eax;

inserted at the beginning of OriginalFunction. Note that any other instruction occupying less than 5 bytes should work just as well.

Try running this code and you will notice that the process immediately crashes with an Illegal Instruction exception. If you replace

_asm xor eax, eax;

by

_asm nop;

you get a Privileged instruction exception, which is no better. In any case, there is something going completely wrong here, so let us have a look at the assembly code again.

Unlike last time, where OriginalFunction consisted of a single basic block, we now have two blocks:

[Figure: flow chart of the two basic blocks of OriginalFunction]

The important thing to notice is that the first block is shorter than 5 bytes. This forces Detours to copy this block as well as the beginning of the following block to the trampoline in order to make place for the jump to the hook. After the hooking, the code looks like this:

OriginalFunction2:
00401031  jmp         GenericHook (40101Dh) 
00401036  int         3    		; Padding bytes inserted 
00401037  int         3    		; by Detours
00401038  jmp         OriginalFunction2+2 (401033h) 

Like last time, Detours fails to notice the short jump pointing back into the first 5 bytes. However, this time the consequences are worse — whereas last time the jump pointed to a jump instruction (the jump to the hook), this time the jump points to the third byte of the jump instruction! So after following the jump, the processor begins decoding at the wrong position in code and suddenly sees this:

OriginalFunction2:
00401031  db          e9h  
00401032  db          e7h  
00401033  db          ffh  ; <== EIP after following the jump
00401034  db          ffh  
00401035  dec         esp  
00401037  int         3    
00401038  jmp         OriginalFunction2+2 (401033h) 

Or, when using nop rather than xor:

OriginalFunction2:
00401031  db          e9h  
00401032  out         0FFh,eax ; <== EIP after following the jump
00401034  db          ffh  
00401035  dec         esp  
00401037  jmp         OriginalFunction2+1 (401032h) 

…which explains the exceptions.
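
The stray bytes can be reproduced by encoding the patched jump by hand. This is a minimal sketch assuming 32-bit addresses and the standard E9 rel32 encoding; the helper names are mine and not part of Detours.

```c
#include <assert.h>
#include <stdint.h>

/* Encode `jmp rel32` (opcode E9) located at `from` and targeting `to`. */
static void encode_jmp_rel32(uint8_t out[5], uint32_t from, uint32_t to)
{
    uint32_t disp = to - (from + 5);   /* relative to the next instruction */

    out[0] = 0xE9;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(disp >> (8 * i));   /* little-endian */
}

/* First byte the processor decodes if a stale branch targets
   function start + offset, i.e. the middle of the patched jump. */
static uint8_t first_byte_at(const uint8_t patch[5], uint32_t offset)
{
    assert(offset < 5);
    return patch[offset];
}
```

With from = 0x00401031 and to = 0x0040101D (GenericHook), the patch is E9 E7 FF FF FF. Entering at offset 2 (after the 2-byte xor) starts decoding at FF FF, which is not a valid instruction. Entering at offset 1 (after the 1-byte nop) starts at E7 FF, i.e. out 0FFh,eax, which is privileged in user mode.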

This concludes the discussion of detouring functions with unusual prologs. However, there is another interesting catch regarding function un-hooking, which I will describe next time.

Dangerous Detours, Part 2: Unexpected Behaviour

Last time I described how to ‘replace’ a function by another using Detours. The important point was that first, we did not care about the original function any more once it has been hooked and second, we used a hook specially crafted for one particular to-be-hooked function.

This time we will make use of the trampoline in order to enable the hook function to call the original function. Furthermore, we do not want to create a separate hook function for every function we wish to hook but rather implement some kind of a generic hook — a hook function that is capable of hooking any possible function.

Example: Generic hook

Let us say that we wish to instrument some application we do not have the source code for in order to implement tracing. What we want to achieve is basically the same as __penter does if we had the source code — on entry of each instrumented function, we want to spill out some debug output. The exact output does not matter for this example, so we will just output some sample string to the console.

Here is the code (error checking omitted):

#include <stdio.h>
#include <tchar.h>
#include <windows.h>
#include <detours.h>

static PDETOUR_TRAMPOLINE Trampoline;

__declspec(noinline)
static void DoSomething()
{
  wprintf( L"DoSomething\n" );
  Sleep( 100 );
}

//
// This is the function we want to instrument.
//
__declspec(noinline)
static void OriginalFunction( PCWSTR Arg1, LONG Arg2 )
{
  for (;;)
  {
    DoSomething();
  }  
}

//
// The generic hook we will install using Detours.
//
__declspec(naked)
static void GenericHook()
{
  //
  // Do something -- not important what...
  //
  __asm pushad;
  wprintf( L"GenericHook\n" );
  __asm popad;
  
  //
  // Done with instrumentation code, continue with
  // the real function. The trampoline (created 
  // by Detours) contains the replaced instructions
  // and a jump to the body of the real function.
  //
  __asm jmp [Trampoline];
}

int wmain()
{
  //
  // Install hook.
  //
  DetourTransactionBegin();
  DetourUpdateThread( GetCurrentThread() );
  
  PVOID Func = ( PVOID ) OriginalFunction;
  PVOID DetourPtr;
  PVOID TargetPtr;
  DetourAttachEx( 
      &Func, 
      GenericHook,
      &Trampoline,
      &TargetPtr, 
      &DetourPtr );

  DetourTransactionCommit();

  //
  // Call the (now instrumented) function.
  //
  OriginalFunction( L"Hello", 42 );

  return 0;
}

OriginalFunction calls DoSomething in an infinite loop and DoSomething sleeps a little — not very interesting code, but it will do the trick. GenericHook contains the instrumentation code and a jump to the trampoline in order to continue execution at the original function. It should be clear that in order to keep the stack balanced, we have to make GenericHook a naked function.

We then instruct Detours to hook OriginalFunction with GenericHook — but unlike last time, we use DetourAttachEx instead of DetourAttach and save the Trampoline pointer in a global variable so that we can use it from within GenericHook. In order to keep the compiler from inlining functions, I decorated them with __declspec(noinline). Also note that I compiled with /O1 (optimize for size) and without /RTCs.

Concluding from the code, the following output should be expected:

GenericHook
DoSomething
DoSomething
DoSomething
...

We call OriginalFunction once (which produces the output ‘GenericHook’) and then call DoSomething in a loop. Unfortunately, the output instead looks like this:

GenericHook
DoSomething
GenericHook
DoSomething
GenericHook
DoSomething
GenericHook
DoSomething
GenericHook
DoSomething
...

So what has happened? This is the assembler code of OriginalFunction before hooking:

OriginalFunction:
00401031 call    DoSomething (401008h)                   <----+
00401036 jmp     OriginalFunction (401031h)    short jump ----+

Note that the compiler optimized away the epilog and prolog and that the function thus consists of a single basic block only.

After the hooking, the code looks like this:

; The trampoline
003F0060 call    DoSomething (401008h)          <-----------+
003F0065 jmp     OriginalFunction+5 (401036h)   --------------+
                                                            | |
...                                                         | |
                                                            | |
GenericHook:                                                | |
0040101D pushad                                         <-+ | |
0040101E push    offset string L"GenericHook\n" (4042ACh) | | |
00401023 call    dword ptr [__imp__wprintf (4030D8h)]     | | |
00401029 pop     ecx                                      | | |
0040102A popad                                            | | |
0040102B jmp     dword ptr [__fmode+4 (405390h)]   -------|-+ |
                                                          |   |
...                                                       |   |
                                                          |   |
OriginalFunction:                                         |   |
00401031 jmp     GenericHook (40101Dh)       <-+     -----+   |  
00401036 jmp     OriginalFunction (401031h)  --+     <--------+

As expected, the first 5 bytes of OriginalFunction (the call to DoSomething) have been overwritten by a jump to GenericHook. At the end of the hook, execution jumps to the trampoline, which contains the initially overwritten call to DoSomething, followed by a jump to the remainder of OriginalFunction.

So where is the flaw? First note that whereas a function normally starts with a 3-byte prolog, the compiler noticed that OriginalFunction does not make use of the stack and thus omitted both prolog and epilog. As a consequence, our for-loop starts right at the top of the function. Finally, as the body of the for-loop starts with a call (i.e. 5 bytes, just what Detours is looking for), Detours split the loop body: the call moved into the trampoline, while the short jump stayed where it was. As a result, the short jump suddenly points to the GenericHook jump rather than to the DoSomething call.

So as it turns out, the compiler has created code for OriginalFunction that Detours is unable to digest properly. Indeed, by blindly replacing the first couple of instructions in the code, Detours has implicitly assumed that the first basic blocks (to which these instructions belong) are never re-entered by a jump. As almost all functions begin with the ordinary push ebp/mov ebp, esp/sub esp, xxx prolog, this assumption holds for the vast majority of functions. OriginalFunction, however, consists of only a single basic block and violates this assumption. As a result, the hook is re-entered on every loop iteration, which leads to the unexpected output.
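
The assumption can be phrased as a check on branch targets. The following is a minimal sketch of such a check, assuming a disassembler has already produced the targets of the branches inside the function; Detours performs no such analysis, and the names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    TARGET_OUTSIDE,        /* branch does not touch the relocated bytes  */
    TARGET_REENTERS_HOOK,  /* branch lands on the patched jump itself    */
    TARGET_MID_PATCH       /* branch lands inside the relocated bytes    */
} branch_target_kind;

/* `func` is the start of the hooked function, `n` the number of bytes
   moved to the trampoline (at least 5), `target` a branch target found
   in the remainder of the function. */
static branch_target_kind classify_branch_target(uint32_t func, uint32_t n,
                                                 uint32_t target)
{
    assert(n >= 5);
    if (target == func)
        return TARGET_REENTERS_HOOK;
    if (target > func && target < func + n)
        return TARGET_MID_PATCH;
    return TARGET_OUTSIDE;
}
```

For the example above (func = 0x00401031, n = 5), the short jump targets 0x00401031 and is classified as TARGET_REENTERS_HOOK, which is exactly the repeated "GenericHook" output. A hooking library that ran a check like this could refuse to hook any function for which a target is not TARGET_OUTSIDE.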

So can we blindly trust Detours to do the right thing? Obviously not. To be fair, however, calling the hook function too many times is annoying and wasteful, but it does not harm our program. Unfortunately, as we will see next time, it can get worse…

Dangerous Detours, Part 1: Introduction

Detours is a library that allows you to hook arbitrary functions by rewriting machine code. While a description of the exact implementation approach can be found in the corresponding paper as well as in numerous other sources, the basic idea is as follows:

  • In the to-be-hooked function, disassemble the first instructions until you have read at least 5 bytes. As instructions are variable length on x86, we may end up having to read more than 5 bytes to reach the next instruction boundary. Let the number of bytes read be n.
  • Allocate n+5 bytes of memory which will make up the trampoline.
  • Copy the n bytes from the to-be-hooked function to the trampoline, followed by a near jump to the to-be-hooked function + offset n (i.e. the first instruction after the instructions we copied)
  • Now overwrite the first 5 bytes of the to-be-hooked function with a near jump to the hook-function
  • The hook function may either return, so that the original function is never executed or instead jump to the trampoline
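
The size calculations in the steps above can be sketched as follows. This is a simplified model, not the Detours implementation: it assumes the instruction lengths have already been determined by a disassembler and that addresses are 32 bits wide, and all names are mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { JMP_REL32_SIZE = 5 };  /* opcode E9 + 32-bit displacement */

/* Given the lengths of the leading instructions, return n: the number of
   bytes up to the first instruction boundary at or beyond 5 bytes.
   Returns 0 if the listed instructions cover fewer than 5 bytes. */
static size_t bytes_to_copy(const uint8_t *insn_len, size_t count)
{
    size_t n = 0;

    for (size_t i = 0; i < count && n < JMP_REL32_SIZE; i++) {
        assert(insn_len[i] > 0);
        n += insn_len[i];
    }
    return n >= JMP_REL32_SIZE ? n : 0;
}

/* The trampoline holds the n copied bytes plus the jump back. */
static size_t trampoline_size(size_t n)
{
    return n + JMP_REL32_SIZE;
}

/* Displacement of a near jump located at `from` that targets `to`;
   it is relative to the end of the 5-byte jump instruction. */
static int32_t jmp_rel32(uint32_t from, uint32_t to)
{
    return (int32_t)(to - (from + JMP_REL32_SIZE));
}
```

For an ordinary prolog made up of instructions of 1, 2 and 3 bytes, bytes_to_copy returns 6, so the trampoline needs 11 bytes and the jump back targets the to-be-hooked function + 6.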

If we want to replace the to-be-hooked function, the execution flow is as follows:

  • Entering to-be-hooked function
  • Jump to hook function
  • Hook function does its work
  • Return to caller

The body of the to-be-hooked function is never executed.


        Caller function
       /          ^
      /           |
     v            |
  (via jmp        |
in to-be-hooked   |
  function)       |
     |            |
     v            | return
    Hook  --------+
  function
     

If instead the hook function, after having done its work, wants the original function to execute, the execution flow is as follows:

  • Entering to-be-hooked function
  • Jump to hook function
  • Hook function does its work
  • Jump to trampoline
  • Execute n bytes in trampoline
  • Jump to original function + offset n
  • Original function runs and finally returns

        Caller function
       /                ^
      /                  \
     v                    \
  (via jmp                 \
in to-be-hooked       To-be-hooked 
  function)             function
     |                      ^
     v                      |
    Hook  -------------> Trampoline
  function
     

Example: Replacing a function

As an example, we ‘replace’ OriginalFunction by AlternateFunction. The source code is as follows (error checking omitted):

#include <stdio.h>
#include <tchar.h>
#include <windows.h>
#include <detours.h>

__declspec(noinline)
static void OriginalFunction( PCWSTR Arg1, LONG Arg2 )
{
  wprintf( L"OriginalFunction(%s, %d)\n", Arg1, Arg2 );
}

__declspec(noinline)
static void AlternateFunction( PCWSTR Arg1, LONG Arg2 )
{
  wprintf( L"AlternateFunction(%s, %d)\n", Arg1, Arg2 );
}

int wmain()
{
  //
  // Install hook.
  //
  DetourTransactionBegin();
  DetourUpdateThread( GetCurrentThread() );
  
  PVOID Func = ( PVOID ) OriginalFunction;
  DetourAttach( 
      &Func, 
      AlternateFunction );

  DetourTransactionCommit();
  
  //
  // Call (hooked) function.
  //
  OriginalFunction( L"Hello", 42 );

  return 0;
}

The code is straightforward — we instruct Detours to hook OriginalFunction and call AlternateFunction instead. We do not make use of the trampoline. The output is:

AlternateFunction(Hello, 42)

Up to this point, everything works as advertised. And indeed there is little that can go wrong if we just want to ‘replace’ a function. If, however, we want the hook function to eventually call the original function by making use of the trampoline, it gets more interesting, as we will see in Part 2.


About me

Johannes Passing, M.Sc., living in Berlin, Germany.

Besides his consulting work, Johannes mainly focusses on Win32, COM, and NT kernel mode development, along with Java and .Net. He also is the author of cfix, a C/C++ unit testing framework for Win32 and NT kernel mode, Visual Assert, a Visual Studio Unit Testing-AddIn, and NTrace, a dynamic function boundary tracing toolkit for Windows NT/x86 kernel/user mode code.

Contact Johannes: jpassing (at) acm org

Johannes' GPG fingerprint is BBB1 1769 B82D CD07 D90A 57E8 9FE1 D441 F7A0 1BB1.
