Trace and Watch Data — How does it work

One of the builtin WinDBG commands is wt (Trace and Watch Data), which can be used to trace the execution flow of a function. Given source code like the following:

void foo()

void bar()

int main()
  // Some random code...
  int a = 1, b = 2;
  // Call a child function.
  // More useless code...
  if ( a == b) a = b;
  // Call another child function.
  return 0;

wt will produce the following output:

0:000> wt
Tracing test!main to return address 00401291
    6     0 [  0] test!main
    1     0 [  1]   test!ILT+5(_foo)
    4     0 [  1]   test!foo
   13     5 [  0] test!main
    1     0 [  1]   test!ILT+0(_bar)
    4     0 [  1]   test!bar
   17    10 [  0] test!main

27 instructions were executed in 26 events 
                                  (0 from other threads)

Function Name         Invocations MinInst MaxInst AvgInst
test!ILT+0(_bar)                1       1       1       1
test!ILT+5(_foo)                1       1       1       1
test!bar                        1       4       4       4
test!foo                        1       4       4       4
test!main                       1      17      17      17

0 system calls were executed

Although helpful, tracing a larger function calling a multitude of other functions slows down the debuggee significantly. An interesting question is thus how wt is implemented. Three possible implementation strategies come to mind:

  1. Use single-stepping. After each instruction executed, a debug trap is raised and the debugger is delivered a single-step debugging event. Though all non-branching instructions are probably irrelevant to wt, by intercepting each call and ret instruction, the debugger is able to trace function entry and exit.
  2. Explicitly set breakpoints. The debugger disassembles the function to be traced and places an ordinary breakpoint on each call instruction as well on as the return address of the function. Whenever one of the call-breakpoints fires, the debugger instruments the target function in the same way (i.e. place breakpoints on each call instruction as well as the return address) and continues execution (without single-stepping). By intercepting all function calls and returns, the debugger is able to deduce the call tree. This approach would be similar to UMSS.
  3. Use Last Branch Recording. This is a rather new additon to the IA-32 instruction set that allows setting breakpoints on taken branches, interrupts, and exceptions, and to single-step from one branch to the next.

In order to find out, we have to debug the debugger to observe how it debugs the target. We thus start WinDBG, choose our test application as target and let it break on main. We then start another WinDBG instance and attach it to the first WinDBG instance. In order to find out which debugging events are consumed by the first instance, we use the second debugger to trace function calls made by the first debugger.

All usermode debuggers eventually end up calling ntdll!NtWaitForDebugEvent in a loop — so to find out which debugging events are consumed, all we need to do is trace all calls to this function. While being an undocumented native function, there is an excellent summary on the inner workings of user mode debugging which also covers ntdll!NtWaitForDebugEvent. Given this information, all we need to do to check whether strategy #1 or strategy #2 has been implemented (I assume #3 may safely be neglected) is to put together a little breakpoint command like the following (line breaks added for clarity):

bp ntdll!NtWaitForDebugEvent "
   r @$t1=poi(esp+10); 
   g @$ra; 
   .if (poi(@$t1)==8) {.echo \"SingleStep\n\" } 
   .else {.printf \"Excp %p\\n\", poi(@$t1+c)};
   g "

When entering ntdll!NtWaitForDebugEvent, we store the address of the fourth parameter (which receives a PDBGUI_WAIT_STATE_CHANGE structure) in $t1 and step out of the function. Then we reach into the structure whose address is stored in $t1 and check if the first field marks the event of being of type DbgSingleStepStateChange (0x8) and output an appropriate message. If we receive about 30 single-step events, strategy #1 has probably been chosen. For #2 we would expect to receive 5 breakpoint events.

Back to the first debugger, we now opt to trace the main function by running wt. This yields the output shown above. Switching to the second debugger again, we now see the following output:


[...about 20 more...]


Quite obviously, wt implements strategy #1 — it does single stepping. Although this does not really come as a surprise, it is still unfortunate as it is most likely the slowest approach of tracing calls. And as anybody who has ever used wt can probably confirm, wt is really slow.

As an interesting side note, as of Linux kernel 2.6.25, ptrace on x86 has been enhanced to facilitate Last Branch Recording on CPUs that support it.


1 Response to “Trace and Watch Data — How does it work”

  1. 1 windbg noob December 31, 2011 at 8:57 am

    Hey thanks for the informative post :) I just stumbled onto your blog while searching for something single-stepping related: how to modify the event filters such that single-stepping is ignored/disabled. If you have a clue I’d like to hear!

Comments are currently closed.


About me

Johannes Passing, M.Sc., living in Berlin, Germany.

Besides his consulting work, Johannes mainly focusses on Win32, COM, and NT kernel mode development, along with Java and .Net. He also is the author of cfix, a C/C++ unit testing framework for Win32 and NT kernel mode, Visual Assert, a Visual Studio Unit Testing-AddIn, and NTrace, a dynamic function boundary tracing toolkit for Windows NT/x86 kernel/user mode code.

Contact Johannes: jpassing (at) acm org

Johannes' GPG fingerprint is BBB1 1769 B82D CD07 D90A 57E8 9FE1 D441 F7A0 1BB1.

LinkedIn Profile
Xing Profile
Github Profile

%d bloggers like this: