Archive for the 'Debugging' Category



Explicit Symbol Loading with dbghelp

When working with symbols, the default case is that you either analyze the current process, a concurrently running process or maybe even the kernel. dbghelp provides special support for these use cases and getting the right symbols to load is usually easy — having access to the process being analyzed, dbghelp can obtain the necessary module information by itself and will come up with the matching symbols.

Things are not quite as easy when analyzing symbols for a process (or kernel) that is not running any more or executes on a different machine. In this case, manual intervention is necessary — without your help, dbghelp will not be able to decide which PDB to load.

An example for a case where this applies is writing a binary tracefile. To keep the trace file small and to make writing entries as efficient as possible, it makes sense to write raw instruction pointer and function pointer addresses to the file. This way, resolving symbols and finding the names of the affected routines is delayed until the file is actually read.

However, when the trace file is read, the traced process (or kernel) may not be alive any more, so dbghelp’s automatisms cannot be used. Moreover, the version of the OS and the individual modules involved may not be the same on the machine where the trace is obtained and the machine where the trace file is read. It is thus crucial that the trace file contains all neccessary information required for the reading application to find and load the matching symbols.

Capturing the debug information

That sounds easy enough. But once again, the documentation on this topic is flaky enough to make this a non-trivial task. This thread on the OSR WinDBG list revealed some important insights, yet the summary is not entirely correct. So I will try to explain it here. Note that the discussion equally applies to user and kernel mode.

Two basic pieces of information need to be captured and stored in the trace file: The load address and the debug data of each module involved. The load address is required in order to later be able to convert from virtual addresses (VAs) to relative virtual addresses (RVAs). The debug data is neccessary for finding the PDB exactly matching the respective module.

The debug data is stored in the PE image and can be obtained via the debug directory. The following diagram illustrates the basic layout of the affected parts of a PE image:

Via the _IMAGE_DATA_DIRECTORY structure, which is at index IMAGE_DIRECTORY_ENTRY_DEBUG of the DataDirectory, we can obtain the RVA of the debug directory — as always, RVAs are relative to the load address. The debug directory is a sequence of IMAGE_DEBUG_DIRECTORY structures. Although the CodeView entry should be the one of primary interest, expect more than one entry in this sequence. The entry count can be calculated by dividing _IMAGE_DATA_DIRECTORY::Size by sizeof( IMAGE_DEBUG_DIRECTORY ).

The IMAGE_DEBUG_DIRECTORY structures do not contain all necessary information by themselves — rather, they refer to a blob of debug data located elsewhere in the image. IMAGE_DEBUG_DIRECTORY::AddressOfRawData contains the RVA (again, relative to the load address) to this blob. Note that the blobs are not necessarily laid out in the same order as the corresponding IMAGE_DEBUG_DIRECTORY structures are. (The role of the PointerToRawData member in this context remains unclear.)

Summing up, we need to write to the trace file:

  • The module load address
  • all IMAGE_DEBUG_DIRECTORY entries
  • all debug data blobs referred to

In order to maintain the relation between the IMAGE_DEBUG_DIRECTORY entries and their corresponding blobs, the AddressOfRawData and PointerToRawData members have to be updated before being written to the file. In order to make loading symbols a bit easier, it makes sense to save the entire sequence of IMAGE_DEBUG_DIRECTORY structs (with updated RVAs) contigously to the file, immediately followed by the debug data blobs.

Loading symbols

Having all neccessary information in the trace file, we can now load the symbols. The key is to use SymLoadModuleEx rather than SymLoadModule64 and passing it a MODLOAD_DATA struct:

  • MODLOAD_DATA::data receives a pointer to the sequence of IMAGE_DEBUG_DIRECTORY structs. For each struct, the AddressOfRawData member should be set to 0 and PointerToRawData should receive the RVA of the corresponding debug data blob relative to the beginning of the respective IMAGE_DEBUG_DIRECTORY struct.

    As indicated before, it therefore makes sense to write the IMAGE_DEBUG_DIRECTORY structs and the blobs consecutievly to the file and — at the time of writing — adjust the AddressOfRawData and PointerToRawData members. This way, you can just assign the entire memory chunk to MODLOAD_DATA::data.

  • MODLOAD_DATA::size receives the size of the entire memory chunk, including both the sequence of IMAGE_DEBUG_DIRECTORY structs and all blobs.

There you go, dbghelp should now be able to identify the proper PDB and load the symbols.

Advertisements

Fun with low level SEH

Most code that uses Structured Exception Handling does this with the help of the compiler, e.g. by using __try/__except/__finally. Still, it is possible to do everything by hand, i.e. to provide your own exception handlers and set up the exception registration records manually. However, as this entire topic is not documented very well, doing so opens room for all kind of surprises…

Although more than 10 years old, the best article on this topic still seems to be Matt Pirtrek’s A Crash Course on the Depths of Win32™ Structured Exception Handling, which I assume you have read. However, note that this article as well as this post refer to i386 only, albeit both to user and kernel mode.

Exception Registration Record Validation

On the i386, SEH uses a linked list of exception registration records. The first record is pointed to by the first member of the TIB. In user mode, the TIB is part of the TEB, in kernel mode it is part of the KPCR — in any case, it is at fs:[0]. Each record, besides containing a pointer to the next lower record, stores a pointer to an exception handler routine.

Installing an exception registration record is thus straightforward and merely requires adjusting the TIB pointer and having the new record point to next lower record. So I set up my custom exception registration record, registered it properly, verified that all pointers are correct and tried using it. However, I was unpleasently surprised that exeption handling totally failed as soon as my exception registration got involved. !exchain reported an “Invalid exception stack”, although checking the pointers manually again seemed to show that the chain of exception registration records was fine and my record seemed ok.

Digging a little deeper I found the reason for that — and in fact I cannot remember ever having heard or read about this requirement before: Windows requires all EXCEPTION_REGISTRATION_RECORDs to be located on the stack. Both RtlDispatchException and RtlUnwind check the location of each EXCEPTION_REGISTRATION_RECORD against the stack limits and abort exception handling as soon as a record is found to be not stack-located. Aborting exception handling in this case means that RaiseException/ExRaiseStatus will just return and execution will be resumed at the caller site as if nothing happened.

This requirement is fair enough, actually, but in my case it totally wrecked my design. I did not have the 8 spare bytes to store this record and thus therefore put the record on some dedicated place on the heap. Urgh. Anyway…

As an interesting side note, Windows Server 2003 performs this stack check against both limits — minimum and maximum address of the stack. Vista, however, only checks against the maximum address (i.e. bottom of the stack) and does not care whether the minimum address (i.e. top of stack) has been exceeded.

Moreover, there is another restriction on exception records that only applies to user mode: The handler routine pointed to by the exception record is verified to not point into the stack. This is obviously another security measure to avoid SEH records to point into some overflown buffers.

SafeSEH

It is worth pointing out that all these checks are unrelated to SafeSEH and are performed regardless of whether your module is SafeSEH compatible or not. Not before these checks have all passed, the exception handler has to undergo the SafeSEH validation: The image base is calculated, the table listing the trusted SEH handlers is looked up and it is checked whether the handler routine pointed to by the current exception record is located in this table.

SafeSEH Handler Registration

Using SafeSEH is a good thing and I link all my modules with /SafeSEH. So when you use low level SEH, i.e. without using the __try/__except compiler support, the obvious question is how to get your SEH handler to be recognized as a trusted handler and be included in the SafeSEH table. After all, the compiler will not be able to recognize that the routine you have just written will in fact be used as an exception handler. The C compiler does not seem to offer support for that — luckily however, ml does by providing the .SAFESEH directive.

If you like writing your exception handler in assmbler, this is all you need. If, however, you prefer C, this is somewhat unsatisfying. The documentation of .SAFESEH states that it can be used with an extrn proc, but that does not seem to work. My solution was thus to write the actual routine in C and write a little thunk in assembler, which I was then able to register using the .SAFESEH directive:

.586               
.model flat, stdcall
option casemap :none

extrn RealExceptionHandlerWrittenInC@16

...

ExceptionHandlerThunk proto
.SAFESEH ExceptionHandlerThunk

...

.code

ExceptionHandlerThunk proc
	jmp RealExceptionHandlerWrittenInC@16
ExceptionHandlerThunk endp

Stupid things you should not do

Finally, there is another little quirk that bit me: Do not use EXCEPTION_CONTINUE_SEARCH where ExceptionContinueSearch would have been appropriate. The EXCEPTION_* constants are for use by exception filters as used for __except statements, whereas the Exception* values have to be used for low level exception handlers. Should be obvious, right? :)

Having chosen the wrong group of constants, I returned EXCEPTION_CONTINUE_SEARCH from my exception handler to indicate that the handler is unable to handle certain exceptions. However, as it turns out, EXCEPTION_CONTINUE_SEARCH has the value 0 and is thus interpreted as ExceptionContinueExecution. Now, returning ExceptionContinueExecution when being requested to handle an exception raised by ExRaiseStatus is obviously a bad idea and in this case led to a STATUS_NONCONTINUABLE_EXCEPTION. After a few of those had stacked up (in kernel mode), VirtualPC crashed with an unrecoverable CPU error. Nice :)

The case of the mysterious JVM x64 crashes

To date, I did all my Java development uding 32 bit JVMs. After all, as long as you do not have extreme memory requirements, the 64 bit JVM should not buy you much. Today I installed the Java 6 Update 6 x64 JDK on my Vista x64 machine and tried to run some of my JUnit tests on this VM.

With little success:

#
# An unexpected error has been detected by Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00000000772b219b, pid=4188, tid=2272
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode windows-amd64)
# Problematic frame:
# C  [ntdll.dll+0x5219b]
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

The project uses a native library via JNI, so of course I immediately suspected this to be the problem. So I placed a Java breakpoint on the respective System.loadLibrary call with the intent of attaching WinDBG as soon as this breakpoint is hit. In WinDBG, I could then break on the exception and see what the problem is.

But to my surprise, the Java breakpoint was not hit — the VM crashed immediately and I received the same output about an unexpected error having occured. That seemed strange to me — maybe it was not the fault of the JNI library after all? So I created a simple Hello World application and ran it — that worked. Then I created this innocent JUnit test:

public class JTest
{
  @org.junit.Test
  public void testname() throws Exception
  {    
  }
}

This one failed again, yielding the same error message as above. Well, at least that gave me evidence that not my JNI library but the JVM was the culprit of the crash — but still, the situation seemed weird. Running the test again under WinDBG, I could see that the AV occured during a heap free operation (As a side node, it is annoying that Sun does not supply symbols for its binaries):

00000000`0404ca88 00000000`7727e7e2 ntdll!RtlCaptureContext+0x8c
00000000`0404ca98 00000000`7727e72b ntdll!RtlpWalkFrameChain+0x52
00000000`0404d018 00000000`773352f2 ntdll!RtlCaptureStackBackTrace+0x4b
00000000`0404d048 00000000`772e1d35 ntdll!RtlpStackTraceDatabaseLogPrefix+0x42
00000000`0404d178 00000000`7715d9fa ntdll! ?? ::FNODOBFM::`string'+0xa93f
00000000`0404d1f8 000007fe`fef0175c kernel32!HeapFree+0xa
*** WARNING: Unable to verify checksum for C:\Program Files\Java\jdk1.6.0_06\jre\bin\server\jvm.dll
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\Program Files\Java\jdk1.6.0_06\jre\bin\server\jvm.dll - 
00000000`0404d228 00000000`08101c09 msvcrt!free+0x1c
00000000`0404d258 00000000`081026cc jvm!JVM_EnqueueOperation+0x8c139
00000000`0404d288 00000000`040b4937 jvm!JVM_EnqueueOperation+0x8cbfc
00000000`0404d318 00000000`0404d338 0x40b4937

Well, could be a heap corruption — but it is interesting that crash did not occur during block coalescence or similar operations but during stack trace capturing. As a matter of fact, I always run my machine with user mode stack trace database creation enabled for debugging purposes. So I disabled the stack trace database in gflags, rebooted the machine and — voilà, the crash disappeared!

Wow. I think this is worth being filed as a bug.

Trace and Watch Data — How does it work

One of the builtin WinDBG commands is wt (Trace and Watch Data), which can be used to trace the execution flow of a function. Given source code like the following:

void foo()
{
}

void bar()
{
}

int main()
{
  // Some random code...
  int a = 1, b = 2;
  
  // Call a child function.
  foo();
  
  // More useless code...
  a+=b;
  if ( a == b) a = b;
  
  // Call another child function.
  bar();  
  
  return 0;
}

wt will produce the following output:

0:000> wt
Tracing test!main to return address 00401291
    6     0 [  0] test!main
    1     0 [  1]   test!ILT+5(_foo)
    4     0 [  1]   test!foo
   13     5 [  0] test!main
    1     0 [  1]   test!ILT+0(_bar)
    4     0 [  1]   test!bar
   17    10 [  0] test!main

27 instructions were executed in 26 events 
                                  (0 from other threads)

Function Name         Invocations MinInst MaxInst AvgInst
test!ILT+0(_bar)                1       1       1       1
test!ILT+5(_foo)                1       1       1       1
test!bar                        1       4       4       4
test!foo                        1       4       4       4
test!main                       1      17      17      17

0 system calls were executed

Although helpful, tracing a larger function calling a multitude of other functions slows down the debuggee significantly. An interesting question is thus how wt is implemented. Three possible implementation strategies come to mind:

  1. Use single-stepping. After each instruction executed, a debug trap is raised and the debugger is delivered a single-step debugging event. Though all non-branching instructions are probably irrelevant to wt, by intercepting each call and ret instruction, the debugger is able to trace function entry and exit.
  2. Explicitly set breakpoints. The debugger disassembles the function to be traced and places an ordinary breakpoint on each call instruction as well on as the return address of the function. Whenever one of the call-breakpoints fires, the debugger instruments the target function in the same way (i.e. place breakpoints on each call instruction as well as the return address) and continues execution (without single-stepping). By intercepting all function calls and returns, the debugger is able to deduce the call tree. This approach would be similar to UMSS.
  3. Use Last Branch Recording. This is a rather new additon to the IA-32 instruction set that allows setting breakpoints on taken branches, interrupts, and exceptions, and to single-step from one branch to the next.

In order to find out, we have to debug the debugger to observe how it debugs the target. We thus start WinDBG, choose our test application as target and let it break on main. We then start another WinDBG instance and attach it to the first WinDBG instance. In order to find out which debugging events are consumed by the first instance, we use the second debugger to trace function calls made by the first debugger.

All usermode debuggers eventually end up calling ntdll!NtWaitForDebugEvent in a loop — so to find out which debugging events are consumed, all we need to do is trace all calls to this function. While being an undocumented native function, there is an excellent summary on the inner workings of user mode debugging which also covers ntdll!NtWaitForDebugEvent. Given this information, all we need to do to check whether strategy #1 or strategy #2 has been implemented (I assume #3 may safely be neglected) is to put together a little breakpoint command like the following (line breaks added for clarity):

bp ntdll!NtWaitForDebugEvent "
   r @$t1=poi(esp+10); 
   g @$ra; 
   .if (poi(@$t1)==8) {.echo \"SingleStep\n\" } 
   .else {.printf \"Excp %p\\n\", poi(@$t1+c)};
   g "

When entering ntdll!NtWaitForDebugEvent, we store the address of the fourth parameter (which receives a PDBGUI_WAIT_STATE_CHANGE structure) in $t1 and step out of the function. Then we reach into the structure whose address is stored in $t1 and check if the first field marks the event of being of type DbgSingleStepStateChange (0x8) and output an appropriate message. If we receive about 30 single-step events, strategy #1 has probably been chosen. For #2 we would expect to receive 5 breakpoint events.

Back to the first debugger, we now opt to trace the main function by running wt. This yields the output shown above. Switching to the second debugger again, we now see the following output:

SingleStep
SingleStep
SingleStep

[...about 20 more...]

SingleStep
SingleStep
SingleStep
SingleStep
SingleStep

Quite obviously, wt implements strategy #1 — it does single stepping. Although this does not really come as a surprise, it is still unfortunate as it is most likely the slowest approach of tracing calls. And as anybody who has ever used wt can probably confirm, wt is really slow.

As an interesting side note, as of Linux kernel 2.6.25, ptrace on x86 has been enhanced to facilitate Last Branch Recording on CPUs that support it.

Walking the stack of the current thread

StackWalk64 provides a convenient and platform-independent way to walk the stack of a thread. Although useful and powerful indeed, it is definitively one of the less trivial to use functions. The additional fact that the MSDN documentation for StackWalk64 is rather light does not make things easier. There is, however, a decent article on CodeProject covering the usage of StackWalk64.

Although you can walk the stack of any thread using StackWalk64, this post only deals with walking the current thread’s stack. In order to walk the stack of the current thread, you first have to obtain the CONTEXT of the current thread. The naive way to obtain this context would be to call GetThreadContext( GetCurrentThread() ) — however, as the documentation for GetThreadContext clearly states, the result of this function is undefined if used on the current thread.

Obtaining a CONTEXT

64 Bit windows provides the function RtlCaptureContext for this purpose, so obtaining the context is trivial. 32 Bit Windows, however, does not implement this function, yet it is not hard to come up with a surregate.

One way to obtain the context is to set up a SEH frame, deliberately raise an exception and obtain the context in the exception filter using GetExceptionInformation. Although that works, it is probably the slowest way to do it. An easier and quicker solution is to use inline assembly (we only have to support 32 bit as we have RtlCaptureContext for x64/IA64, so using assembly is ok). Using appropriate instructions, we can manually populate the x86 CONTEXT structure. However, as eip cannot be accessed using a mov instruction, we have to use a little trick to obtain the value of the instruction pointer — use an (otherwise useless) label:

CONTEXT Context;
ZeroMemory( &Context, sizeof( CONTEXT ) );
Context.ContextFlags = CONTEXT_CONTROL;

__asm
{
Label:
  mov [Context.Ebp], ebp;
  mov [Context.Esp], esp;
  mov eax, [Label];
  mov [Context.Eip], eax;
}

Preparing to call StackWalk64

Having obtained the context, we have to initialize the STACKFRAME64 structure for the first call. This is trivial, but platform-specific.

In order to enable StackWalk64 to use symbol information, dbghelp has to be initialized by calling SymInitialize, passing the current process handle as parameter. Unless you plan to manually call SymLoadModule64 for all affected modules, it is crucial to pass TRUE for the fInvadeProcess argument. If you pass FALSE or forget to call SymInitialize altogether (After all, the documentation of StackWalk64 does not state that calling SymInitialize is required), you will notice that StackWalk64 works on x86 builds, but fails (with last error set to 0x1E7) on x64. Also, as dbghelp is not threadsafe, it is usually a good idea to protect any calls to dbghelp by a critical section

Calling StackWalk64

After these provisions have been made, calling StackWalk64 is a snap. The only unfortnate thing is that you cannot really distinguish between StackWalk64 returning FALSE because of a failure or because of the bottom of the stack having been reached as StackWalk64 does not reliably set the last Win32 error.

The following code puts it all together. Note that only a the minimal amount of stack trace information is gathered, i.e. only the return addresses are saved in the array.

typedef struct _STACKTRACE
{
  //
  // Number of frames in Frames array.
  //
  UINT FrameCount;

  //
  // PC-Addresses of frames. Index 0 contains the topmost frame.
  //
  ULONGLONG Frames[ ANYSIZE_ARRAY ];
} STACKTRACE, *PSTACKTRACE;

/*++
  Routine Description:
    Capture the stack trace based on the given CONTEXT.

  Parameters:
    Context    Thread context. If NULL, the current context is
               used.
    Event      Event structure to populate.
    MaxFrames  Max number of frames the structure can hold.
--*/
#ifdef _M_IX86
  //
  // Disable global optimization and ignore /GS waning caused by 
  // inline assembly.
  //
  #pragma optimize( "g", off )
  #pragma warning( push )
  #pragma warning( disable : 4748 )
#endif
HRESULT CaptureStackTrace(
  __in_opt CONST PCONTEXT InitialContext,
  __in PSTACKTRACE StackTrace,
  __in UINT MaxFrames 
  )
{
  DWORD MachineType;
  CONTEXT Context;
  HRESULT Hr;
  STACKFRAME64 StackFrame;

  ASSERT( StackTrace );
  ASSERT( MaxFrames > 0 );

  if ( InitialContext == NULL )
  {
    //
    // Use current context.
    //
    // N.B. GetThreadContext cannot be used on the current thread.
    // Capture own context - on i386, there is no API for that.
    //
#ifdef _M_IX86
    ZeroMemory( &Context, sizeof( CONTEXT ) );

    Context.ContextFlags = CONTEXT_CONTROL;
    
    //
    // Those three registers are enough.
    //
    __asm
    {
    Label:
      mov [Context.Ebp], ebp;
      mov [Context.Esp], esp;
      mov eax, [Label];
      mov [Context.Eip], eax;
    }
#else
    RtlCaptureContext( &Context );
#endif  
  }
  else
  {
    CopyMemory( &Context, InitialContext, sizeof( CONTEXT ) ); 
  }

  //
  // Set up stack frame.
  //
  ZeroMemory( &StackFrame, sizeof( STACKFRAME64 ) );
#ifdef _M_IX86
  MachineType                 = IMAGE_FILE_MACHINE_I386;
  StackFrame.AddrPC.Offset    = Context.Eip;
  StackFrame.AddrPC.Mode      = AddrModeFlat;
  StackFrame.AddrFrame.Offset = Context.Ebp;
  StackFrame.AddrFrame.Mode   = AddrModeFlat;
  StackFrame.AddrStack.Offset = Context.Esp;
  StackFrame.AddrStack.Mode   = AddrModeFlat;
#elif _M_X64
  MachineType                 = IMAGE_FILE_MACHINE_AMD64;
  StackFrame.AddrPC.Offset    = Context.Rip;
  StackFrame.AddrPC.Mode      = AddrModeFlat;
  StackFrame.AddrFrame.Offset = Context.Rsp;
  StackFrame.AddrFrame.Mode   = AddrModeFlat;
  StackFrame.AddrStack.Offset = Context.Rsp;
  StackFrame.AddrStack.Mode   = AddrModeFlat;
#elif _M_IA64
  MachineType                 = IMAGE_FILE_MACHINE_IA64;
  StackFrame.AddrPC.Offset    = Context.StIIP;
  StackFrame.AddrPC.Mode      = AddrModeFlat;
  StackFrame.AddrFrame.Offset = Context.IntSp;
  StackFrame.AddrFrame.Mode   = AddrModeFlat;
  StackFrame.AddrBStore.Offset= Context.RsBSP;
  StackFrame.AddrBStore.Mode  = AddrModeFlat;
  StackFrame.AddrStack.Offset = Context.IntSp;
  StackFrame.AddrStack.Mode   = AddrModeFlat;
#else
  #error "Unsupported platform"
#endif

  StackTrace->FrameCount = 0;

  //
  // Dbghelp is is singlethreaded, so acquire a lock.
  //
  // Note that the code assumes that 
  // SymInitialize( GetCurrentProcess(), NULL, TRUE ) has 
  // already been called.
  //
  EnterCriticalSection( &DbgHelpLock );

  while ( StackTrace->FrameCount < MaxFrames )
  {
    if ( ! StackWalk64(
      MachineType,
      GetCurrentProcess(),
      GetCurrentThread(),
      &StackFrame,
      MachineType == IMAGE_FILE_MACHINE_I386 
        ? NULL
        : &Context,
      NULL,
      SymFunctionTableAccess64,
      SymGetModuleBase64,
      NULL ) )
    {
      //
      // Maybe it failed, maybe we have finished walking the stack.
      //
      break;
    }

    if ( StackFrame.AddrPC.Offset != 0 )
    {
      //
      // Valid frame.
      //
      StackTrace->Frames[ StackTrace->FrameCount++ ] = 
        StackFrame.AddrPC.Offset;
    }
    else
    {
      //
      // Base reached.
      //
      break;
    }
  }
  
  LeaveCriticalSection( &DbgHelpLock );

  return S_OK;
}
#ifdef _M_IX86
  #pragma warning( pop )
  #pragma optimize( "g", on )
#endif

When HEAP_NO_SERIALIZE heap operations are not quite ‘not serialized’

While the RTL Heap performs all heap operation in an interlocked manner by default, it can be requested not to serialize operations by passing the HEAP_NO_SERIALIZE flag to HeapCreate or HeapAlloc. In this case, the caller is in charge to provide proper synchronization.

As it turns out, things change when Application Verifier heap checks are enabled for this process. To check heap operations, Application Verifier intercepts certain operations. Presumably to protect its own internal data structures, at some points, Avrf enters a critical section, as shown in this stack trace:

0:009> k
ChildEBP RetAddr  
0840f540 77d9cfad ntdll!ZwWaitForSingleObject+0x15
0840f5a4 77d9cf78 ntdll!RtlpWaitOnCriticalSection+0x154
0840f5cc 6db25f92 ntdll!RtlEnterCriticalSection+0x152
0840f5f0 6db29d99 vfbasics!AVrfpFreeMemLockChecks+0x42 
0840f60c 6db2bae8 vfbasics!AVrfpFreeMemNotify+0x39 
0840f668 77731d27 vfbasics!AVrfpRtlFreeHeap+0x148 
0840f67c 6d242757 kernel32!HeapFree+0x14
...

Entering a critical section is no big thing — nontheless, it effectvely means that heap operations, whether they are conducted with HEAP_NO_SERIALIZE flag set or not, must be considered to be at least partially serialized by Application Verifier.

Although this somewhat contradicts the idea of the HEAP_NO_SERIALIZE flag, this should not be a problem in most cases — especially when HEAP_NO_SERIALIZE has been passed for performance reasons only. However, if HEAP_NO_SERIALIZE has been used for correctness reasons, this might indeed bite you. In my specific case, I used HEAP_NO_SERIALIZE and did the synchronization myself in order to break a deadlock situation. And while the application ran fine (and deadlock-free) without Avrf enabled, it began deadlocking randomly as soon as I enabled Avrf heap checks — for the very reason just described. This is especially annoying as I am now left with two less-than-optimal choices: Leave Avrf enabled for my test suite and experience random deadlocks or forgo Avrf’s helpful heap checks in order not to deadlock. Grrr, I need more choices!

Glitch in documentation of SYMBOL_INFO

The documentation of the SYMBOL_INFO structure used by dbghelp contains a glitch. As the structure is of variable length, the documentation for SizeOfStruct states:

Size of the structure, in bytes. This member must be set to sizeof(SYMBOL_INFO). Note that the total size of the data is the SizeOfStruct + MaxNameLen – 1. The reason to subtract one is that the first character in the name is accounted for in the size of the structure.

While this is true for SYMBOL_INFO (the ANSI version), it is wrong for SYMBOL_INFOW (the Unicode version). As MaxNameLen contains the character count rather than the byte count of the Name field, it should read:

Size of the structure, in bytes. This member must be set to sizeof(SYMBOL_INFO). Note that the total size of the data is SizeOfStruct + ( MaxNameLen – 1 ) * sizeof( TCHAR ). The reason to subtract one is that the first character in the name is accounted for in the size of the structure.

I have reported this bug, let’s see how soon it gets fixed…

Update: The documentation has meanwhile been fixed.


Categories




About me

Johannes Passing lives in Berlin, Germany and works as a Solutions Architect at Google Cloud.

While mostly focusing on Cloud-related stuff these days, Johannes still enjoys the occasional dose of Win32, COM, and NT kernel mode development.

He also is the author of cfix, a C/C++ unit testing framework for Win32 and NT kernel mode, Visual Assert, a Visual Studio Unit Testing-AddIn, and NTrace, a dynamic function boundary tracing toolkit for Windows NT/x86 kernel/user mode code.

Contact Johannes: jpassing (at) hotmail com

LinkedIn Profile
Xing Profile
Github Profile
Advertisements