Dangerous Detours, Part 4: Undetouring

Having discussed what can go wrong when detouring a function, we will now take a closer look at undetouring. Again, there is a problem that the Detours library does not address, and in my opinion it is even more severe than the ones discussed previously.

Undetouring is a multi-step process and requires the user to follow a certain protocol. The basic idea is as follows: The caller creates a transaction, registers all threads that might have been affected by the detour and specifies which functions to unhook. As soon as a thread is registered, it is suspended until transaction commit. When the user decides to commit the transaction by calling DetourTransactionCommit, Detours unhooks the functions and checks if any of the registered (now suspended) threads was about to execute code of one of the trampolines that are to be freed. If such a thread is found, its context is updated so that the instruction pointer points to the original (now restored) function code again. The trampolines are then freed and all registered threads are resumed.

Chances are good that the process now runs flawlessly again. There is, however, a situation that Detours neglects to take into account. Let us consider the following code, which illustrates the problem.

static PDETOUR_TRAMPOLINE Trampoline;
static HANDLE WaitHandle;

__declspec(naked)
static void GenericHook2()
{
  __asm pushad;
  wprintf( L"GenericHook\n" );
  __asm popad;
  __asm jmp [Trampoline];
}

__declspec(noinline)
static void WaitForSomething()
{
  WaitForSingleObject( WaitHandle, INFINITE );

  wprintf( L"Wait satisfied\n" );
}

__declspec(noinline)
__declspec(naked)
static void SomeFunction()
{
  _asm pushad;  // any instruction < 5 bytes
  WaitForSomething();
  _asm popad;
  _asm ret;
}

DWORD CALLBACK ThreadProc( PVOID )
{
  SomeFunction();
  wprintf( L"SomeFunction returned\n" );
  return 0;
}

int UnloadDetour()
{
  //
  // Detour
  //
  if ( ERROR_SUCCESS != DetourTransactionBegin() )
  {
    wprintf( L"DetourTransactionBegin failed" );
    return 1;
  }
  
  if ( ERROR_SUCCESS != DetourUpdateThread( GetCurrentThread() ) )
  {
    wprintf( L"DetourUpdateThread failed" );
    return 1;
  }
  
  PVOID Func = ( PVOID ) SomeFunction;
  PVOID DetourPtr;
  PVOID TargetPtr;
  if ( ERROR_SUCCESS != DetourAttachEx( 
      &Func, 
      GenericHook2,
      &Trampoline,
      &TargetPtr, 
      &DetourPtr ) )
  {
    wprintf( L"DetourAttachEx failed" );
    return 1;
  }

  if ( ERROR_SUCCESS != DetourTransactionCommit() )
  {
    wprintf( L"DetourTransactionCommit failed" );
    return 1;
  }

  //
  // Kick off thread
  //
  WaitHandle = CreateEvent( NULL, FALSE, FALSE, NULL );

  HANDLE Thread = CreateThread( NULL, 0, ThreadProc, NULL, 0, NULL ); 
  if ( ! Thread )
  {
    wprintf( L"CreateThread failed" );
    return 1;
  }

  wprintf( L"Trampoline at %p\n", Trampoline );
  Sleep( 1000 );

  //
  // Remove detour
  //
  if ( ERROR_SUCCESS != DetourTransactionBegin() )
  {
    wprintf( L"DetourTransactionBegin failed" );
    return 1;
  }
  
  if ( ERROR_SUCCESS != DetourUpdateThread( Thread ) )
  {
    wprintf( L"DetourUpdateThread failed" );
    return 1;
  }
  
  if ( ERROR_SUCCESS != DetourUpdateThread( GetCurrentThread() ) )
  {
    wprintf( L"DetourUpdateThread failed" );
    return 1;
  }
  
  if ( ERROR_SUCCESS != DetourDetach( 
      &Func, 
      GenericHook2 ) )
  {
    wprintf( L"DetourDetach failed" );
    return 1;
  }

  if ( ERROR_SUCCESS != DetourTransactionCommit() )
  {
    wprintf( L"DetourTransactionCommit failed" );
    return 1;
  }
  
  //
  // unwait thread
  // 
  SetEvent( WaitHandle );
  WaitForSingleObject( Thread, INFINITE );
  CloseHandle( WaitHandle );
  CloseHandle( Thread );

  return 0;
}

To summarize, the code does the following:

  • Detour SomeFunction.
  • Spawn a new thread that calls SomeFunction. SomeFunction in turn calls WaitForSomething and thus waits on an event that is not signalled.
  • The main thread decides to undetour SomeFunction…
  • …and signals the event the other thread is waiting for.
  • The second thread continues execution.

Running the code immediately results in an access violation.

So let us see what has happened. As in the previous examples, the root cause of the crash lies in the binary layout of SomeFunction. The key point is that the call instruction is not the first instruction of the function, yet it starts no more than 4 bytes from the beginning of the function. With sufficient bad luck, I am convinced that the optimizer can indeed come up with code that satisfies exactly these two requirements. To simulate this behaviour and save some time, I cheated a bit: I made SomeFunction naked to suppress the prolog and inserted a bogus instruction, _asm pushad (any instruction of at most 4 bytes will do the trick). The code now looks like this:

SomeFunction:
004117D0 90               nop              
004117D1 E8 CA 00 00 00   call WaitForSomething (4118A0h) 
004117D6 61               popad            
004117D7 C3               ret              

Now the function is hooked, and if you have been following so far, it should be clear that Detours has no choice but to move the call into the trampoline. Indeed, after detouring, the code looks like this:

SomeFunction:
004117D0 E9 7B 03 00 00   jmp GenericHook2 (411B50h) 
004117D5 CC               int 3   ; Padding inserted by Detours
004117D6 61               popad            
004117D7 C3               ret              

And the trampoline like this:

SomeFunction:
003F0061 E8 3A 18 02 00   call WaitForSomething (4118A0h) 
003F0066 E9 6B 17 02 00   jmp  SomeFunction+6 (4117D6h) 

So far, so good — but now we undetour this function while — and this is the second important point — SomeFunction is on the call stack of the second thread (003f0066() is the trampoline):

ntdll.dll!_ZwWaitForSingleObject@12()  + 0x15 bytes	
kernel32.dll!_WaitForSingleObjectEx@12()  + 0x8f bytes	
kernel32.dll!_WaitForSingleObject@8()  + 0x12 bytes	
Detours.exe!WaitForSomething()  
003f0066()	
Detours.exe!ThreadProc(void * __formal=0x00000000) 
kernel32.dll!@BaseThreadInitThunk@12()  + 0xe bytes	
ntdll.dll!__RtlUserThreadStart@8()  + 0x23 bytes	

As described above, Detours suspends the second thread, unhooks the function, frees the trampoline and finally resumes our second thread again. After the event has been signalled, the second thread’s stack is finally unwound. However, when the trampoline’s stack frame is reached, the trampoline has long been freed and, as it happens, the memory has been zeroed out. Still, execution is happily resumed (no DEP used) and the zeros are executed, which decode as add byte ptr [eax], al. Unsurprisingly, this quickly ends in an access violation.

How should Detours have behaved instead? Obviously, merely checking that none of the threads is about to execute code of the trampoline is not enough. What should have been done to avoid the crash is to walk each registered thread’s call stack and check that none of the calls has been made from within a trampoline that is to be freed. Of course, this approach is pretty much infeasible, especially in the absence of proper symbols. So the best approach would probably have been to avoid detouring SomeFunction altogether: Detours should have noticed that moving the call into the trampoline is dangerous and should thus have failed the attempt to detour SomeFunction.

About me

Johannes Passing, M.Sc., living in Berlin, Germany.

Besides his consulting work, Johannes mainly focusses on Win32, COM, and NT kernel mode development, along with Java and .Net. He also is the author of cfix, a C/C++ unit testing framework for Win32 and NT kernel mode, Visual Assert, a Visual Studio Unit Testing-AddIn, and NTrace, a dynamic function boundary tracing toolkit for Windows NT/x86 kernel/user mode code.

Contact Johannes: jpassing (at) acm org

Johannes' GPG fingerprint is BBB1 1769 B82D CD07 D90A 57E8 9FE1 D441 F7A0 1BB1.
