Windows Dangerous Detours, Part 2: Unexpected Behaviour

Last time I described how to ‘replace’ a function with another using Detours. The important points were that, first, we no longer cared about the original function once it had been hooked and, second, we used a hook crafted specifically for one particular function.

This time we will make use of the trampoline in order to enable the hook function to call the original function. Furthermore, we do not want to create a separate hook function for every function we wish to hook but rather implement some kind of a generic hook – a hook function that is capable of hooking any possible function.

Example: Generic hook

Let us say that we wish to instrument some application we do not have the source code for in order to implement tracing. What we want to achieve is basically what _penter (enabled by the /Gh compiler switch) would give us if we had the source code: on entry of each instrumented function, we want to emit some debug output. The exact output does not matter for this example, so we will just print a sample string to the console.

Here is the code (error checking omitted):

    #include <stdio.h>
    #include <tchar.h>
    #include <windows.h>
    #include <detours.h>
    
    static PDETOUR_TRAMPOLINE Trampoline;
    
    __declspec(noinline)
    static void DoSomething()
    {
      wprintf( L"DoSomething\n" );
      Sleep( 100 );
    }
    
    //
    // This is the function we want to instrument.
    //
    __declspec(noinline)
    static void OriginalFunction( PCWSTR Arg1, LONG Arg2 )
    {
      for (;;)
      {
        DoSomething();
      }  
    }
    
    //
    // The generic hook we will install using Detours.
    //
    __declspec(naked)
    static void GenericHook()
    {
      //
      // Do something -- not important what...
      //
      __asm pushad;
      wprintf( L"GenericHook\n" );
      __asm popad;
      
      //
      // Done with instrumentation code, continue with
      // the real function. The trampoline (created 
      // by Detours) contains the replaced instructions
      // and a jump to the body of the real function.
      //
      __asm jmp [Trampoline];
    }
    
    int wmain()
    {
      //
      // Install hook.
      //
      DetourTransactionBegin();
      DetourUpdateThread( GetCurrentThread() );
      
      PVOID Func = ( PVOID ) OriginalFunction;
      PVOID DetourPtr;
      PVOID TargetPtr;
      DetourAttachEx( 
          &Func, 
          GenericHook,
          &Trampoline,
          &TargetPtr, 
          &DetourPtr );
    
      DetourTransactionCommit();
    
      //
      // Call the (now instrumented) function.
      //
      OriginalFunction( L"Hello", 42 );
    
      return 0;
    }
    

OriginalFunction calls DoSomething in an infinite loop and DoSomething sleeps a little – not very interesting code, but it will do the trick. GenericHook contains the instrumentation code and a jump to the trampoline in order to continue execution at the original function. Because GenericHook is entered by a jump and left by a jump, with the caller's return address and arguments still on the stack, any compiler-generated prolog or epilog would unbalance the stack; we therefore have to make GenericHook a naked function.

We then instruct Detours to hook OriginalFunction with GenericHook – but unlike last time, we use DetourAttachEx instead of DetourAttach and save the Trampoline pointer in a global variable so that we can use it from within GenericHook. In order to keep the compiler from inlining functions, I decorated them with __declspec(noinline). Also note that I compiled with /O1 (optimize for size) and without /RTCs.

Judging from the code, one would expect the following output:

    GenericHook
    DoSomething
    DoSomething
    DoSomething
    ...
    

We call OriginalFunction once (which produces the output ‘GenericHook’) and then call DoSomething in a loop. Unfortunately, the output instead looks like this:

    GenericHook
    DoSomething
    GenericHook
    DoSomething
    GenericHook
    DoSomething
    GenericHook
    DoSomething
    GenericHook
    DoSomething
    ...
    

So what has happened? This is the assembler code of OriginalFunction before hooking:

    OriginalFunction:
    00401031 call    DoSomething (401008h)                   <----+
    00401036 jmp     OriginalFunction (401031h)    short jump ----+
    

Note that the compiler optimized away the epilog and prolog and that the function thus consists of a single basic block only.

After the hooking, the code looks like this:

    ; The trampoline
    003F0060 call    DoSomething (401008h)          <-----------+
    003F0065 jmp     OriginalFunction+5 (401036h)   --------------+
                                                                | |
    ...                                                         | |
                                                                | |
    GenericHook:                                                | |
    0040101D pushad                                         <-+ | |
    0040101E push    offset string L"GenericHook\n" (4042ACh) | | |
    00401023 call    dword ptr [__imp__wprintf (4030D8h)]     | | |
    00401029 pop     ecx                                      | | |
    0040102A popad                                            | | |
    0040102B jmp     dword ptr [__fmode+4 (405390h)]   -------|-+ |
                                                              |   |
    ...                                                       |   |
                                                              |   |
    OriginalFunction:                                         |   |
    00401031 jmp     GenericHook (40101Dh)       <-+     -----+   |  
    00401036 jmp     OriginalFunction (401031h)  --+     <--------+
    

As expected, the first 5 bytes of OriginalFunction (the call to DoSomething) have been overwritten by a jump to GenericHook. At the end of the hook, execution jumps to the trampoline, which contains the initially overwritten call to DoSomething, followed by a jump to the remainder of OriginalFunction.
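To make the patching arithmetic concrete, here is a minimal sketch in portable C of how the displacement of a 5-byte relative jump or call is computed, using the addresses from the listing above. The helper names rel32_for and encode_jmp_rel32 are made up for this illustration; they are not part of the Detours API, and the sketch makes no claim about how Detours implements this internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Detours function): the displacement stored
   in a 5-byte x86 "jmp rel32" (0xE9) or "call rel32" (0xE8) located at
   address `from`. It is relative to the end of the instruction. */
static int32_t rel32_for(uint32_t from, uint32_t to)
{
    return (int32_t)(to - (from + 5u));
}

/* Hypothetical helper: write "jmp rel32" into out[0..4]. The
   displacement is stored in little-endian byte order. */
static void encode_jmp_rel32(uint32_t from, uint32_t to, uint8_t out[5])
{
    uint32_t rel;

    assert(out != NULL);
    rel = (uint32_t)rel32_for(from, to);

    out[0] = 0xE9;                       /* opcode: jmp rel32 */
    out[1] = (uint8_t)(rel & 0xFF);
    out[2] = (uint8_t)((rel >> 8) & 0xFF);
    out[3] = (uint8_t)((rel >> 16) & 0xFF);
    out[4] = (uint8_t)((rel >> 24) & 0xFF);
}
```

With the addresses from the listing, encode_jmp_rel32( 0x401031, 0x40101D, buf ) yields the bytes E9 E7 FF FF FF, the jump to GenericHook. The same arithmetic shows why the call cannot simply be copied byte by byte into the trampoline: rel32_for( 0x401031, 0x401008 ) is -0x2E at the original location, but rel32_for( 0x3F0060, 0x401008 ) is 0x10FA3 once the call sits in the trampoline, so the displacement has to be adjusted when the instruction is moved.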

So where is the flaw? First, note that whereas a function normally starts with a 3-byte prolog, the compiler noticed that OriginalFunction does not make use of the stack and thus omitted both prolog and epilog. As a consequence, our for-loop starts right at the top of the function. Second, as the body of the for-loop starts with a call (i.e. 5 bytes, just what Detours is looking for), Detours split the loop body – the call moved into the trampoline, while the short jump stayed where it was. The short jump still targets 00401031, which means that it now points to the jump to GenericHook rather than to the call to DoSomething.
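The resulting control flow can be replayed with a toy model. The following sketch in portable C treats the four code locations from the listing as states of a small state machine and counts how often the hook is entered. It models only the jumps described above, not real machine code, and the function name count_hook_calls is invented for this illustration.

```c
#include <assert.h>

/* Code locations taken from the disassembly listing. */
enum {
    ENTRY      = 0x401031,  /* first instruction of OriginalFunction */
    LOOP_JMP   = 0x401036,  /* short jump back to ENTRY              */
    HOOK       = 0x40101D,  /* GenericHook                           */
    TRAMPOLINE = 0x3F0060   /* relocated call + jump to LOOP_JMP     */
};

/* Toy model: follow the jumps until DoSomething has been called
   `iterations` times and return how often the hook was entered. */
static int count_hook_calls(int hooked, int iterations)
{
    int hook_calls = 0;
    int do_something_calls = 0;
    int pc = ENTRY;

    assert(iterations >= 0);

    while (do_something_calls < iterations) {
        switch (pc) {
        case ENTRY:
            if (hooked) {
                pc = HOOK;               /* patched: jmp GenericHook */
            } else {
                do_something_calls++;    /* original: call DoSomething */
                pc = LOOP_JMP;
            }
            break;
        case HOOK:
            hook_calls++;                /* instrumentation runs */
            pc = TRAMPOLINE;
            break;
        case TRAMPOLINE:
            do_something_calls++;        /* relocated call DoSomething */
            pc = LOOP_JMP;
            break;
        default:                         /* LOOP_JMP: jmp OriginalFunction */
            pc = ENTRY;
            break;
        }
    }
    return hook_calls;
}
```

In this model, count_hook_calls( 1, 5 ) returns 5 rather than the 1 we were hoping for: every trip around the loop passes through ENTRY, and ENTRY now leads to the hook.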

So as it turns out, the compiler has created code for OriginalFunction that Detours is unable to digest properly. Indeed, by blindly replacing the first couple of instructions of the function, Detours implicitly assumes that the basic block these instructions belong to is never re-entered by a jump. As almost all functions begin with the ordinary push ebp/mov ebp, esp/sub esp, xxx prolog, this assumption holds for the vast majority of functions. OriginalFunction, however, consists of only a single basic block and thus violates this assumption. As a result, the hook is re-entered on every loop iteration, which leads to the unexpected output.
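If you want to guard against this case yourself, the violated assumption can at least be expressed as a check. The sketch below, again in portable C, tests whether one short jump (opcode 0xEB) lands inside the bytes that are about to be overwritten. It is deliberately simplistic and rests on several assumptions: the caller already knows where the jump instruction starts (which in practice requires a disassembler), conditional and near jumps are ignored, and the function name short_jmp_hits_patch is hypothetical. The sample bytes in the usage note are reconstructed from the addresses in the listing, not copied from a memory dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check (not part of Detours): does the 2-byte short jump
   "EB rel8" at offset `jmp_off` within `code` land inside the first
   `patch_len` bytes of the function, i.e. in the region a hook would
   overwrite? Returns 1 if so, 0 otherwise. */
static int short_jmp_hits_patch(const uint8_t *code, size_t len,
                                size_t jmp_off, size_t patch_len)
{
    long target;

    assert(code != NULL);

    /* Not a complete short jump at this offset: nothing to report. */
    if (jmp_off + 2 > len || code[jmp_off] != 0xEB) {
        return 0;
    }

    /* rel8 is signed and relative to the end of the 2-byte jump. */
    target = (long)jmp_off + 2 + (int8_t)code[jmp_off + 1];

    return target >= 0 && (size_t)target < patch_len;
}
```

For OriginalFunction, the code bytes would be E8 D2 FF FF FF (the call) followed by EB F9 (the short jump). Checking the jump at offset 5 against a 5-byte patch region reports a hit, because the jump targets offset 0. A function with a regular prolog and a loop further down would not trigger the check.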

So can we blindly trust Detours to do the right thing? Obviously not. To be fair, however, calling the hook function too many times is annoying and wasteful, but it does not harm our program. Unfortunately, as we will see next time, it can get worse…

Any opinions expressed on this blog are Johannes' own. Refer to the respective vendor’s product documentation for authoritative information.