Archive for the 'Kernel' Category

Windows Hotpatching: A Walkthrough

As discussed in the last post, Windows 2003 SP1 introduced a technology known as Hotpatching. An integral part of this technology is the hotpatching mechanism proper, which refers to the process of applying an update on the fly by using runtime code modification techniques.

Although Hotpatching has caught a bit of attention, surprisingly little information has been published about its inner workings. As the technology is patented, however, there is quite a bit of information that can be obtained by reading the patent description. Moreover, there is this (admittedly very terse) discussion about the actual implementation of hotpatching.

Armed with this information, it is possible to get into more detail by looking at what is actually happening under the hood when a hotfix is applied: I did so and chose KB911897 as an example, which fixes some flaw in mrxsmb.sys and rdbss.sys. I have also gone through the hassle of translating key parts of the respective assembly code back to C.

Preparing the machine

First, we need a proper machine image which can be used for the experiment. Unfortunately, KB911897 is an SP1 package, so we have to use an old Win 2003 Server SP1 system to apply this update. Once we have the machine running, we can attach the kernel debugger and see what is happening when the hotfix is installed.

Observing the update

When launched with /hotpatch:enable, after some initialization work, the updater calls NtSetSystemInformation (which delegates to ExApplyCodePatch) to apply the hotpatch. Hotpatching includes a coldpatch, which I do not care about here, and the actual hotpatch. The first two calls to NtSetSystemInformation (and thus to ExApplyCodePatch) are coldpatching-related and I will thus ignore them here. The third call, however, is made to apply the actual hotpatch, so let’s observe this one further.

Since a kernel-mode patch is required, ExApplyCodePatch then calls MmHotPatchRoutine, which is where the fun starts. Expressed in C, MmHotPatchRoutine roughly looks like this (reverse engineered from assembly, might be slightly incorrect):

NTSTATUS MmHotPatchRoutine(
  __in PSYSTEM_HOTPATCH_CODE_INFORMATION RemoteInfo
  )
{
  UNICODE_STRING ImageFileName;
  DWORD Flags = RemoteInfo->Flags;
  PVOID ImageBaseAddress;
  PVOID ImageHandle;
  NTSTATUS Status, LoadStatus;
  PKTHREAD CurrentThread;

  ImageFileName.Length = RemoteInfo->KernelInfo.NameLength;
  ImageFileName.MaximumLength = RemoteInfo->KernelInfo.NameLength;
  ImageFileName.Buffer = ( PWSTR ) ( ( PBYTE ) RemoteInfo + RemoteInfo->KernelInfo.NameOffset );

  CurrentThread = KeGetCurrentThread();
  KeEnterCriticalRegion( CurrentThread );

  KeWaitForSingleObject(
    MmSystemLoadLock,
    WrVirtualMemory,
    0,
    0,
    0 );

  LoadStatus = MmLoadSystemImage(
    &ImageFileName,
    0,
    0,
    0,
    &ImageHandle,
    &ImageBaseAddress );
  if ( NT_SUCCESS( LoadStatus ) || LoadStatus == STATUS_IMAGE_ALREADY_LOADED )
  {

    Status = MiPerformHotPatch(
      ImageHandle,
      ImageBaseAddress,
      Flags );
    
    if ( NT_SUCCESS( Status ) || LoadStatus == STATUS_IMAGE_ALREADY_LOADED )
    {
      NOTHING;
    }
    else
    {
      MmUnloadSystemImage( ImageHandle );
    }
    
    LoadStatus = Status;
  }


  KeReleaseMutant(
    MmSystemLoadLock,
    1,  // increment
    FALSE,
    FALSE );

  KeLeaveCriticalRegion( CurrentThread );

  return LoadStatus;
}

As you can see in the code, MmHotPatchRoutine will try to load the hotpatch image. We can verify this in the debugger:

kd> bp nt!MmLoadSystemImage

kd> g
Breakpoint 3 hit
nt!MmLoadSystemImage:
808ec4b5 6878010000      push    178h

kd> k
ChildEBP RetAddr  
f6acbb28 80990c9e nt!MmLoadSystemImage
f6acbb68 809b2d67 nt!MmHotPatchRoutine+0x59
f6acbba8 808caeff nt!ExApplyCodePatch+0x191
f6acbd50 8082337b nt!NtSetSystemInformation+0xa1e
f6acbd50 7c82ed54 nt!KiFastCallEntry+0xf8
0006bc50 7c821f24 ntdll!KiFastSystemCallRet
0006bd44 7c8304c9 ntdll!ZwSetSystemInformation+0xc
[...]

kd> dt _UNICODE_STRING poi(@esp+4)
ntdll!_UNICODE_STRING
 "\??\c:\windows\system32\drivers\hpf3.tmp"
   +0x000 Length           : 0x50
   +0x002 MaximumLength    : 0x50
   +0x004 Buffer           : 0x81623fa8  "\??\c:\windows\system32\drivers\hpf3.tmp"
   
kd> gu

kd> lm
start    end        module name
[...]           
f6ba4000 f6bad000   hpf3       (deferred)  
[...]
f95cb000 f9641000   mrxsmb     (deferred)  
f9641000 f9671000   rdbss      (deferred)      
[...]

Having loaded the hotpatch image, MmHotPatchRoutine proceeds by calling MiPerformHotPatch, which looks roughly like this:

NTSTATUS
MiPerformHotPatch(
  IN PLDR_DATA_TABLE_ENTRY ImageHandle,
  IN PVOID ImageBaseAddress,
  IN DWORD Flags
  )
{
  PHOTPATCH_HEADER SectionData ;
  PRTL_PATCH_HEADER Header;    
  NTSTATUS Status;
  PVOID LockVariable;
  PVOID LockedBuffer;
  BOOLEAN f;
  PLDR_DATA_TABLE_ENTRY LdrEntry;

  SectionData = RtlGetHotpatchHeader( ImageBaseAddress );
  if ( ! SectionData  )
  {
    return STATUS_INVALID_PARAMETER;
  }
  
  //
  // Try to get header from MiHotPatchList
  //
  Header = RtlFindRtlPatchHeader(
    MiHotPatchList,
    ImageHandle );

  if ( ! Header )
  {
    PLIST_ENTRY Entry;

    if ( Flags & FLG_HOTPATCH_ACTIVE )
    {
      return STATUS_NOT_SUPPORTED;
    }

    Status = RtlCreateHotPatch(
      &Header,
      SectionData,
      ImageHandle,
      Flags
      );
    if ( ! NT_SUCCESS( Status ) )
    {
      return Status;
    }

    ExAcquireResourceExclusiveLite(
      PsLoadedModuleResource,
      TRUE
      );

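    //
    // Walk the loaded module list to find the image to be patched.
    //
    Entry = PsLoadedModuleList.Flink;
    while ( Entry != &PsLoadedModuleList )
    {
      LdrEntry = CONTAINING_RECORD( Entry,
                                    KLDR_DATA_TABLE_ENTRY,
                                    InLoadOrderLinks );
      //
      // N.B. The lower bound of this range check was lost in the
      // original listing; MiSessionImageStart is an assumption.
      //
      if ( LdrEntry->DllBase < MiSessionImageStart ||
           LdrEntry->DllBase >= MiSessionImageEnd )
      {
        if ( RtlpIsSameImage( Header, LdrEntry ) )
        {
          break;
        }
      }

      Entry = Entry->Flink;
    }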

    ExReleaseResourceLite( PsLoadedModuleResource );

    if ( ! Header->TargetDllBase )
    {
      Status = STATUS_DLL_NOT_FOUND ;
    }

    Status = ExLockUserBuffer(
      ImageHandle->DllBase,
      ImageHandle->SizeOfImage,
      KernelMode,
      IoWriteAccess,
      &LockedBuffer,
      &LockVariable
      );
    if ( ! NT_SUCCESS( Status ) )
    {
      FreeHotPatchData( Header );
      return Status;
    }


    Status = RtlInitializeHotPatch(
      ( PRTL_PATCH_HEADER ) Header,
      ( PBYTE ) LockedBuffer - ImageHandle->DllBase
      );

    ExUnlockUserBuffer( LockVariable );

    if ( ! NT_SUCCESS( Status ) )
    {
      FreeHotPatchData( ImageHandle );
      return Status;
    }

    f = 1;
  }
  else
  {
    if ( ( Flags ^ ImageHandle->CodeInfo->Flags ) & FLG_HOTPATCH_ACTIVE )
    {
      return STATUS_NOT_SUPPORTED;
    }

    if ( ! ( ImageHandle->CodeInfo->Flags & FLG_HOTPATCH_ACTIVE ) )
    {
      Status = RtlReadHookInformation( Header );
      if ( ! NT_SUCCESS( Status ) )
      {
        return Status;
      }
    }

    f = 0;
  }
  
  Status = MmLockAndCopyMemory(
    ImageHandle->CodeInfo,
    KernelMode
    );
  if ( NT_SUCCESS( Status ) )
  {
    if ( ! f  )
    {
      return Status;
    }

    LdrEntry->EntryPointActivationContext = Header;  // ???
    InsertTailList( MiHotPatchList, LdrEntry->PatchList );
  }
  else
  {
    if ( f ) 
    {
      RtlFreeHotPatchData( Header );
    }
  }

  return Status;
}

So MiPerformHotPatch inspects the hotpatch information stored in the hotpatch image. This data includes information about which code regions need to be updated. After the necessary information has been gathered, it applies the code changes.

Two basic problems have to be overcome now: On the one hand, all code sections of drivers are mapped read/execute only. Overwriting the instructions in place thus does not work. On the other hand, the system has to properly synchronize the patching process, i.e. it has to make sure no CPU is currently executing the code that is about to be patched.

To overcome the memory protection problems, Windows employs a trick I previously only knew from malware: It creates a memory descriptor list (MDL) for the affected code region, maps the MDL, and updates the code through this mapped region. The memory protection is thus circumvented. As it turns out, there is even a handy, undocumented helper routine for this purpose: ExLockUserBuffer, which is used by MiPerformHotPatch.

To proceed, MiPerformHotPatch calls MmLockAndCopyMemory to do the actual patching. So how does Windows synchronize the update process? Again, it uses a technique I assumed was a malware trick: It schedules CPU-specific DPCs on all CPUs but the current one and keeps those DPCs busy while the current thread is updating the code. Again, Windows provides a neat routine for that: KeGenericCallDpc. In addition to this, Windows raises the IRQL to clock level in order to mask all interrupts.

Here is the pseudo-code for MmLockAndCopyMemory and its helper, MiDoCopyMemory:

NTSTATUS
MmLockAndCopyMemory (
    IN PSYSTEM_HOTPATCH_CODE_INFORMATION PatchInfo,
    IN KPROCESSOR_MODE ProbeMode
    )
{
  PVOID Buffer;
  NTSTATUS Status;
  UINT Index;

  if ( 0 == PatchInfo->CodeInfo.DescriptorsCount )
  {
    return STATUS_SUCCESS;
  }

  Buffer = ExAllocatePoolWithQuotaTag( 
    9,
    PatchInfo->CodeInfo.DescriptorsCount * 2,
    'PtoH' );
  if ( ! Buffer )
  {
    return STATUS_INSUFFICIENT_RESOURCES;
  }
  RtlZeroMemory( Buffer, PatchInfo->CodeInfo.DescriptorsCount * 2 );

  if ( 0 == PatchInfo->CodeInfo.DescriptorsCount )
  {
    Status = STATUS_INVALID_PARAMETER;
    goto Cleanup;
  }

  for ( Index = 0; Index < PatchInfo->CodeInfo.DescriptorsCount; Index++ )
  {
    if ( PatchInfo->CodeInfo.CodeDescriptors[ Index ].CodeOffset > PatchInfo->InfoSize ||
       PatchInfo->CodeInfo.CodeDescriptors[ Index ].CodeSize > PatchInfo->InfoSize ||
       PatchInfo->CodeInfo.CodeDescriptors[ Index ].CodeOffset +
       PatchInfo->CodeInfo.CodeDescriptors[ Index ].CodeSize > PatchInfo->InfoSize || 
       /* other checks... */ )
    {
      Status = STATUS_INVALID_PARAMETER;
      goto Cleanup;
    }

    Status = ExLockUserBuffer(
      TargetAddress,
      PatchInfo->CodeInfo.CodeDescriptors[ Index ].CodeSize,
      ProbeMode,
      IoWriteAccess,
      &PatchInfo->CodeInfo.CodeDescriptors[ Index ].MappedAddress,
      Buffer[ Index ]
      );
    if ( ! NT_SUCCESS( Status ) )
    {
      goto Cleanup;
    }
  }

  PatchInfo->Flags |= FLG_HOTPATCH_ACTIVE;

  KeGenericCallDpc(
    MiDoCopyMemory,
    PatchInfo );

  if ( PatchInfo->Flags & FLG_HOTPATCH_VERIFICATION_ERROR )
  {
    PatchInfo->Flags &= ~FLG_HOTPATCH_ACTIVE;
    PatchInfo->Flags &= ~FLG_HOTPATCH_VERIFICATION_ERROR;
    Status = STATUS_DATA_ERROR;
  }

Cleanup:
  if ( PatchInfo->CodeInfo.DescriptorsCount > 0 )
  {
    for ( Index = 0; Index < PatchInfo->CodeInfo.DescriptorsCount; Index++ )
    {
      ExUnlockUserBuffer( Buffer[ Index ] );
    }
  }

  ExFreePoolWithTag( Buffer, 0 );
  return Status;
}

VOID MiDoCopyMemory(
  IN PKDPC Dpc,
  IN PSYSTEM_HOTPATCH_CODE_INFORMATION PatchInfo,
  IN ULONG NumberCpus,
  IN DEFERRED_REVERSE_BARRIER ReverseBarrier
  )
{
  KIRQL OldIrql;
  UNREFERENCED_PARAMETER( Dpc );
  NTSTATUS Status;
  ULONG Index;

  OldIrql = KfRaiseIrql( CLOCK1_LEVEL );

  //
  // Decrement reverse barrier count.
  //
  Status = KeSignalCallDpcSynchronize( ReverseBarrier );
  if ( ! NT_SUCCESS( Status ) )
  {
    goto Cleanup;
  }

  PatchInfo->Flags &= ~FLG_HOTPATCH_VERIFICATION_ERROR;
    
  for ( Index = 0; Index < PatchInfo->CodeInfo.DescriptorsCount; Index++ )
  {
    if ( PatchInfo->Flags & FLG_HOTPATCH_ACTIVE )
    {
      if ( PatchInfo->CodeInfo.CodeDescriptors[ Index ].ValidationSize != 
        RtlCompareMemory(
          PatchInfo->CodeInfo.CodeDescriptors[ Index ].MappedAddress,
          ( PBYTE ) PatchInfo + PatchInfo->CodeInfo.CodeDescriptors[ Index ].ValidationOffset,
          PatchInfo->CodeInfo.CodeDescriptors[ Index ].ValidationSize ) )
      {

        if ( PatchInfo->CodeInfo.CodeDescriptors[ Index ].CodeSize != 
          RtlCompareMemory(
            PatchInfo->CodeInfo.CodeDescriptors[ Index ].MappedAddress,
            ( PBYTE ) PatchInfo + PatchInfo->CodeInfo.CodeDescriptors[ Index ].OrigCodeOffset,
            PatchInfo->CodeInfo.CodeDescriptors[ Index ].CodeSize ) )
        {
          PatchInfo->Flags |= FLG_HOTPATCH_VERIFICATION_ERROR;
          break;
        }
      }
    }
    else
    {
      if ( PatchInfo->CodeInfo.CodeDescriptors[ Index ].CodeSize !=
        RtlCompareMemory(
          PatchInfo->CodeInfo.CodeDescriptors[ Index ].MappedAddress,
          ( PBYTE ) PatchInfo + PatchInfo->CodeInfo.CodeDescriptors[ Index ].CodeOffset,
          PatchInfo->CodeInfo.CodeDescriptors[ Index ].CodeSize ) )
      {
        PatchInfo->Flags |= FLG_HOTPATCH_VERIFICATION_ERROR;
        break;
      }
    }
  }

  //loc_479533
  if ( PatchInfo->Flags & FLG_HOTPATCH_VERIFICATION_ERROR ||
     PatchInfo->CodeInfo.DescriptorsCount <= 0 )
  {
    goto Cleanup;
  }

  for ( Index = 0; Index < PatchInfo->CodeInfo.DescriptorsCount; Index++ )
  {
    PVOID Source;
    if ( PatchInfo->Flags & FLG_HOTPATCH_ACTIVE )
    {
      Source = ( PBYTE ) PatchInfo + PatchInfo->CodeInfo.CodeDescriptors[ Index ].CodeOffset;
    }
    else
    {
      Source = ( PBYTE ) PatchInfo + PatchInfo->CodeInfo.CodeDescriptors[ Index ].OrigCodeOffset;
    }

    RtlCopyMemory(
      PatchInfo->CodeInfo.CodeDescriptors[ Index ].MappedAddress,
      Source,
      PatchInfo->CodeInfo.CodeDescriptors[ Index ].CodeSize
      );
  }


Cleanup:
   KeSignalCallDpcSynchronize( ReverseBarrier );
   KfLowerIrql( OldIrql );
   KeSignalCallDpcDone( NumberCpus );
}
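
The two-pass structure of MiDoCopyMemory (verify every descriptor first, write only if all checks passed) can be tried out in user mode. The following is a minimal sketch under simplifying assumptions: PatchDescriptor and ApplyPatches are hypothetical names, plain buffers stand in for the MDL-mapped code pages, a single expected-bytes check replaces the validation/original-code distinction, and the IRQL and DPC handling is left out entirely.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one code descriptor. */
typedef struct PatchDescriptor {
    unsigned char *Target;          /* bytes to be patched */
    const unsigned char *Expected;  /* bytes that must currently be there */
    const unsigned char *NewCode;   /* bytes to write */
    size_t Size;                    /* length of all three regions */
} PatchDescriptor;

/*
 * Returns 1 if all patches were applied, 0 if verification failed.
 * On failure, no target has been modified.
 */
static int ApplyPatches(PatchDescriptor *Patches, size_t Count)
{
    size_t i;

    /* Pass 1: verify every target before touching anything. */
    for (i = 0; i < Count; i++) {
        if (memcmp(Patches[i].Target, Patches[i].Expected,
                   Patches[i].Size) != 0) {
            return 0;
        }
    }

    /* Pass 2: all checks passed, so write the new code. */
    for (i = 0; i < Count; i++) {
        memcpy(Patches[i].Target, Patches[i].NewCode, Patches[i].Size);
    }

    return 1;
}
```

Separating the two passes is what makes the operation all-or-nothing: a mismatch in the last descriptor is detected before the first descriptor has been written.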

To see the code in action, we set a breakpoint on nt!MiDoCopyMemory:

kd> k
ChildEBP RetAddr  
f6acbac0 8087622f nt!MiDoCopyMemory
f6acbae8 80990a10 nt!KeGenericCallDpc+0x3d
f6acbb0c 80990bea nt!MmLockAndCopyMemory+0xf1
f6acbb34 80990cba nt!MiPerformHotPatch+0x143
f6acbb68 809b2d67 nt!MmHotPatchRoutine+0x75
f6acbba8 808caeff nt!ExApplyCodePatch+0x191
f6acbd50 8082337b nt!NtSetSystemInformation+0xa1e

Before letting MiDoCopyMemory do its work, let’s see what it is about to do. No modifications have yet been done to mrxsmb:

kd> !chkimg mrxsmb
0 errors : mrxsmb 

kd> !chkimg rdbss
0 errors : rdbss

The second argument is a structure holding the information gathered previously. Peeking into it reveals:

kd> dd /c 1 poi(esp+8) l 4
81583008  00000001
8158300c  00000149
81583010  00000008   <-- # of code patches
81583014  f9648b1f   <-- hmm...

As it turns out, address 81583014 refers to a variable-length array of size 8. Poking around with dd, the following listing suggests that each element of the array is 28 bytes in size:

kd> dd /c 7 81583014
81583014  f9648b1f fa2afb1f 000000ec 00000005 000000f1 000000f6 00000005
81583030  f9648b24 fa2b2b24 000000fb 00000002 000000fd 000000ff 00000002
8158304c  f96585ef fa2b15ef 00000101 00000005 00000106 0000010b 00000005
81583068  f96585f4 fa2b45f4 00000110 00000002 00000112 00000114 00000002
81583084  f9658569 fa2b3569 00000116 00000005 0000011b 00000120 00000005
815830a0  f965856e fa2b656e 00000125 00000002 00000127 00000129 00000002
815830bc  f9653378 fa2b5378 0000012b 00000005 00000130 00000135 00000005
815830d8  f965337d fa2b837d 0000013a 00000005 0000013f 00000144 00000005

Given that rdbss was loaded to address range f9641000-f9671000, it is obvious that the first two columns refer to code addresses. The third, fifth and sixth columns look like offsets, the fourth and seventh like the lengths of the code changes. First, let’s see where the first column points to:

kd> u f9648b1f
rdbss!RxInitiateOrContinueThrottling+0x6b:
f9648b1f 90              nop
f9648b20 90              nop
f9648b21 90              nop
f9648b22 90              nop
f9648b23 90              nop
rdbss!RxpCancelRoutine:
f9648b24 8bff            mov     edi,edi
f9648b26 55              push    ebp
f9648b27 8bec            mov     ebp,esp

Now that looks promising, especially since the fourth column holds the value 5. Let’s look at the second row:

kd> u f9648b24
rdbss!RxpCancelRoutine:
f9648b24 8bff            mov     edi,edi

No doubt, the first and second row define the two patches necessary to redirect RxpCancelRoutine. But what to replace this code with? As it turns out, the offsets in column three are relative to the structure and point to the code that is to be written:

kd> u poi(esp+8)+000000ec
815830f4 e9dcc455fd      jmp     7eadf5d5          mov     edi,edi

kd> u poi(esp+8)+000000fb
81583103 ebf9            jmp     815830fe

That makes perfect sense: the five nops are to be overwritten by a near jump, and the mov edi, edi will be replaced by a short jump.
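
The 28-byte rows of the dd listing can be written down as a C structure. This is a sketch only: the field names are borrowed from the pseudo-code above, and their assignment to columns 2, 5 and 6 is my assumption (the listing itself only tells us that columns 1 and 3 are the target address and the offset of the new code). HotpatchDescriptor, ReadLe32 and DecodeDescriptor are hypothetical helper names.

```c
#include <stdint.h>

/* One row of the dd listing (32-bit x86, hence 32-bit fields). */
typedef struct HotpatchDescriptor {
    uint32_t TargetAddress;     /* column 1: code address in rdbss     */
    uint32_t MappedAddress;     /* column 2: assumed MDL mapping       */
    uint32_t CodeOffset;        /* column 3: offset of the new code    */
    uint32_t CodeSize;          /* column 4: length of the code change */
    uint32_t OrigCodeOffset;    /* column 5: assumed original code     */
    uint32_t ValidationOffset;  /* column 6: assumed validation data   */
    uint32_t ValidationSize;    /* column 7: length of validation data */
} HotpatchDescriptor;

/* Read a 32-bit little-endian value from raw memory. */
static uint32_t ReadLe32(const unsigned char *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Decode one 28-byte row as it appears in the memory dump. */
static HotpatchDescriptor DecodeDescriptor(const unsigned char *Raw)
{
    HotpatchDescriptor d;
    d.TargetAddress    = ReadLe32(Raw + 0);
    d.MappedAddress    = ReadLe32(Raw + 4);
    d.CodeOffset       = ReadLe32(Raw + 8);
    d.CodeSize         = ReadLe32(Raw + 12);
    d.OrigCodeOffset   = ReadLe32(Raw + 16);
    d.ValidationOffset = ReadLe32(Raw + 20);
    d.ValidationSize   = ReadLe32(Raw + 24);
    return d;
}
```

Decoding the first row this way shows that the three offsets are packed back to back: 0xec + 5 = 0xf1 and 0xf1 + 5 = 0xf6, and 0xf6 + 5 = 0xfb is where the second row's data starts. The first two columns also share the same page offset (0xb1f), which is what one would expect if the second address is a mapping of the first.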

So let’s run MiDoCopyMemory and have a look at the results. Back in MmLockAndCopyMemory, the code referred to by the first two rows looks like this:

kd> u f9648b1f
rdbss!RxInitiateOrContinueThrottling+0x6b:
f9648b1f e9dcc455fd      jmp     hpf3!RxpCancelRoutine (f6ba5000)

kd> u f9648b24
rdbss!RxpCancelRoutine:
f9648b24 ebf9            jmp     rdbss!RxInitiateOrContinueThrottling+0x6b (f9648b1f)
f9648b26 55              push    ebp
f9648b27 8bec            mov     ebp,esp

Voilà, RxpCancelRoutine has been patched and calls are redirected to hpf3!RxpCancelRoutine, the new routine located in the auxiliary ‘hpf3’ driver. All that remains to be done is cleanup (unlocking the memory etc).
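
The patch bytes seen in the listings can be re-derived by hand, which is a nice sanity check. The sketch below encodes the two x86 jump forms involved (E9 with a 32-bit displacement, EB with an 8-bit displacement); EncodeNearJmp and EncodeShortJmp are hypothetical helper names, not anything exported by Windows.

```c
#include <stdint.h>

/* Encode "jmp rel32" (opcode E9) placed at Source and landing on Target. */
static void EncodeNearJmp(uint32_t Source, uint32_t Target,
                          unsigned char Out[5])
{
    /* The displacement is relative to the end of the 5-byte instruction. */
    uint32_t Disp = Target - (Source + 5);

    Out[0] = 0xE9;
    Out[1] = (unsigned char)(Disp & 0xFF);
    Out[2] = (unsigned char)((Disp >> 8) & 0xFF);
    Out[3] = (unsigned char)((Disp >> 16) & 0xFF);
    Out[4] = (unsigned char)((Disp >> 24) & 0xFF);
}

/*
 * Encode "jmp rel8" (opcode EB) placed at Source and landing on Target.
 * Returns 0 if Target is out of range for a short jump, 1 otherwise.
 */
static int EncodeShortJmp(uint32_t Source, uint32_t Target,
                          unsigned char Out[2])
{
    /* The displacement is relative to the end of the 2-byte instruction. */
    int32_t Disp = (int32_t)(Target - (Source + 2));

    if (Disp < -128 || Disp > 127) {
        return 0;
    }

    Out[0] = 0xEB;
    Out[1] = (unsigned char)((uint32_t)Disp & 0xFF);
    return 1;
}
```

Calling EncodeNearJmp( 0xf9648b1f, 0xf6ba5000, ... ) yields e9 dc c4 55 fd, and EncodeShortJmp( 0xf9648b24, 0xf9648b1f, ... ) yields eb f9, matching the bytes in the debugger output. Because the displacement is relative, the same five bytes disassemble to a different target (7eadf5d5) while they are still sitting in the staging buffer at 815830f4.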

That’s it — that’s how Windows applies patches on the fly using hotpatching. Too bad that the technology is so rarely used in practice.

Windows Hotpatching

Several years ago, with Windows Server 2003 SP1, Microsoft introduced a technology and infrastructure called Hotpatching. The basic intent of this infrastructure is to provide a means to apply hotfixes on the fly, i.e. without having to reboot the system, even if the hotfix contains changes to critical system components such as the kernel itself, important drivers, or user mode libraries such as shell32.dll.

Trying to apply hotfixes on the fly introduces a variety of problems, the most important being:

  • Patching code that is currently in use
  • Atomically replacing files on disk that are currently in use and therefore locked
  • Making sure that all changes take effect for both processes currently running and processes which are yet to be started (i.e. before the next reboot)
  • Allowing further hotfixes to be applied on a system that has not been rebooted since the last hotfix was applied in an on-the-fly fashion

The Windows Hotpatching infrastructure is capable of handling all these cases — it is, however, not applicable to all kinds of code fixes. Generally speaking, it can only be used for fixes that merely comprise smallish code changes but do not affect layout or semantics of data structures. A fix for a buffer overflow caused by an off-by-one error, however, is a perfect example for a fix that could certainly be applied using the Hotpatching infrastructure.

That all sounds good and nice, but reality is that we still reboot our machines for just about every update Microsoft provides us, right?

Right. The reasons for this are threefold. First, as indicated, some hotfixes can be expected to make changes that cannot be safely applied using the Hotpatching system. Secondly, Hotpatching is used on an opt-in basis, so you will not benefit from it automatically: When a hotpatch-enabled hotfix is applied through Windows Update or by launching the corresponding exe file, hotpatching is not used and a reboot will be required. The user has to explicitly specify the /hotpatch:enable switch in order to have the hotfix applied on the fly.

Thirdly, in the months after the release of SP1, a certain fraction of the hotfixes issued by Microsoft were indeed hotpatch-enabled and could be applied without a reboot. Interestingly, however, I am not aware of a single hotfix issued since Server 2003 SP2 that supported hotpatching!

Whether Microsoft has lost faith in their hotpatching facility, whether the effort to test such hotfixes turned out to be too high, or whether there were other reasons speaking against issuing hotpatch-enabled hotfixes, I do not know.

Notwithstanding this observation, Hotpatching is an interesting technology that deserves to be looked at in more detail. Although I will not cover the entire infrastructure, I will spend at least one more blog post on the mechanisms implemented in Windows that allow code modifications to be performed on the fly. That is, I will focus on the hotpatching part of the infrastructure and will ignore coldpatching and other, smaller aspects of the infrastructure.

I’ll be at WCRE 2009 presenting NTrace

Next week, the 16th Working Conference on Reverse Engineering (WCRE) will be held in Lille, France. I will be there presenting NTrace: Function Boundary Tracing for Windows on IA-32.

NTrace is a dynamic function boundary tracing toolkit for IA-32/x86 that can be used to trace both kernel and user mode Windows components — examples for components that can be traced include the kernel itself (ntoskrnl), drivers like NTFS as well as user mode components such as kernel32, shell32 or even explorer.exe.

NTrace implements a novel approach to instrumenting IA-32 machine code and integrating with the Structured Exception Handling facility of Windows. Using this approach, NTrace is not only capable of tracing nearly the entire Windows kernel and system libraries, it is also faster than Solaris DTrace FBT on IA-32!

Details on how exactly NTrace works will be published in the paper, which will be made available soon. I will also publish more details on NTrace both here and on a dedicated NTrace website.

The work, by the way, is basically the result of my Master’s thesis I wrote back in 2008.

Uniquely Identifying a Module’s Build

It is common practice to embed a version resource (VS_VERSIONINFO) into PE images such as DLL and EXE files. While this resource mainly serves informational purposes, the version information is occasionally used to perform certain checks, such as verifying the module’s suitability for a particular purpose.

Under certain circumstances, however, this versioning information may be too imprecise: Versions are not necessarily incremented after each build, so it is possible that two copies of a module carry the same versioning information, yet differ significantly in their implementation. In such situations, identifying the actual build of the module might become necessary.

The most common, but by no means the only situation in which this applies in practice concerns debugging — to identify the PDB file exactly matching a given module, the debugger must be able to recognize the specific build of a module. It thus does not come as a surprise that all images for which debugging information has been generated contain a dedicated identifier for this purpose: The CodeView signature GUID.

Summarizing what Oleg Starodumov has covered in more detail, cl, when directed to generate a PDB file, implicitly creates this GUID and, along with the path to the PDB file, embeds this data into the PE image. For current versions, the relevant structure used to encode this information is CV_INFO_PDB70, which seems to have been documented once, but not any more:

typedef struct _CV_INFO_PDB70
{
  ULONG CvSignature;
  GUID Signature;
  ULONG Age;
  UCHAR PdbFileName[ ANYSIZE_ARRAY ];
} CV_INFO_PDB70, *PCV_INFO_PDB70;

In order to be able to locate the structure within the PE image, a directory entry of type IMAGE_DEBUG_TYPE_CODEVIEW is written to the image’s debug directory. The following code listing demonstrates how to obtain the signature GUID of an image:

#define PtrFromRva( base, rva ) ( ( ( PUCHAR ) base ) + rva )

static PIMAGE_DATA_DIRECTORY GetDebugDataDirectory(
  __in ULONG_PTR LoadAddress
  )
{
  PIMAGE_DOS_HEADER DosHeader = 
    ( PIMAGE_DOS_HEADER ) ( PVOID ) LoadAddress;
  PIMAGE_NT_HEADERS NtHeader = ( PIMAGE_NT_HEADERS ) 
    PtrFromRva( DosHeader, DosHeader->e_lfanew );
  ASSERT ( IMAGE_NT_SIGNATURE == NtHeader->Signature );

  return &NtHeader->OptionalHeader.DataDirectory
      [ IMAGE_DIRECTORY_ENTRY_DEBUG ];
}

NTSTATUS GetDebugGuid(
  __in ULONG_PTR ModuleBaseAddress,
  __out GUID *Guid
  )
{
  PIMAGE_DATA_DIRECTORY DebugDataDirectory;
  PIMAGE_DEBUG_DIRECTORY DebugHeaders;
  ULONG Index;
  ULONG NumberOfDebugDirs;

  DebugDataDirectory  = GetDebugDataDirectory( ModuleBaseAddress );
  DebugHeaders    = ( PIMAGE_DEBUG_DIRECTORY ) PtrFromRva( 
    ModuleBaseAddress, 
    DebugDataDirectory->VirtualAddress );

  ASSERT( ( DebugDataDirectory->Size % sizeof( IMAGE_DEBUG_DIRECTORY ) ) == 0 );
  NumberOfDebugDirs = DebugDataDirectory->Size / sizeof( IMAGE_DEBUG_DIRECTORY );

  //
  // Lookup CodeView record.
  //
  for ( Index = 0; Index < NumberOfDebugDirs; Index++ )
  {
    PCV_INFO_PDB70 CvInfo;
    if ( DebugHeaders[ Index ].Type != IMAGE_DEBUG_TYPE_CODEVIEW )
    {
      continue;
    }

    CvInfo = ( PCV_INFO_PDB70 ) PtrFromRva( 
      ModuleBaseAddress, 
      DebugHeaders[ Index ].AddressOfRawData );

    if ( CvInfo->CvSignature != 'SDSR' )
    {
      //
      // Weird, old PDB format maybe.
      //
      return STATUS_xxx_UNRECOGNIZED_CV_HEADER;
    }

    *Guid = CvInfo->Signature;
    return STATUS_SUCCESS;  
  }

  return STATUS_xxx_CV_GUID_LOOKUP_FAILED;
}
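
Once the CodeView record has been located, decoding it does not require any Windows headers. The following portable sketch parses a raw record from a byte buffer, relying only on the CV_INFO_PDB70 layout shown above (4-byte signature, 16-byte GUID, 4-byte age, NUL-terminated path). ParseCvRecord and CV_PDB70_HEADER_SIZE are hypothetical names, and the GUID is returned as raw bytes rather than as a GUID structure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size of the fixed part of CV_INFO_PDB70: signature, GUID, age. */
#define CV_PDB70_HEADER_SIZE 24

/*
 * Parse a raw CodeView record. Returns 1 and fills Guid, Age and
 * PdbPath on success, 0 if this is not a well-formed RSDS record.
 * PdbPath points into Record; the path is not copied.
 */
static int ParseCvRecord(
    const unsigned char *Record,
    size_t Size,
    unsigned char Guid[16],
    uint32_t *Age,
    const char **PdbPath)
{
    const unsigned char *Path;

    /* Need the fixed header plus at least the terminating NUL. */
    if (Size < CV_PDB70_HEADER_SIZE + 1) {
        return 0;
    }

    /* The constant 'SDSR' in the listing is "RSDS" in memory order. */
    if (memcmp(Record, "RSDS", 4) != 0) {
        return 0;
    }

    /* The path must be NUL-terminated within the record. */
    Path = Record + CV_PDB70_HEADER_SIZE;
    if (memchr(Path, '\0', Size - CV_PDB70_HEADER_SIZE) == NULL) {
        return 0;
    }

    memcpy(Guid, Record + 4, 16);
    *Age = (uint32_t)Record[20] |
           ((uint32_t)Record[21] << 8) |
           ((uint32_t)Record[22] << 16) |
           ((uint32_t)Record[23] << 24);
    *PdbPath = (const char *)Path;
    return 1;
}
```

Unlike the listing above, this variant takes the record size into account, so a truncated or malformed record is rejected rather than read past its end.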

cfix 1.2 Installer Fixed for AMD64

The cfix 1.2 package as released last week contained a rather stupid bug that the new build, 1.2.0.3244, now fixes: the amd64 binaries cfix64.exe and cfixkr64.sys were wrongly installed as cfix32.exe and cfixkr32.sys, respectively. Not only did this stand in contrast to what the documentation stated, it also resulted in cfix being unable to load the cfixkr driver on AMD64 platforms.

The new MSI package is now available for download on Sourceforge.

cfix 1.2 introduces improved C++ support

cfix 1.2, which has been released today, introduces a number of new features, the most prominent being improved support for C++ and additional execution options.

New C++ API

To date, cfix has primarily focussed on C as the programming language to write unit tests in. Although C++ has always been supported, cfix has not made use of the additional capabilities C++ provides. With version 1.2, cfix makes C++ a first class citizen and introduces an additional API that leverages the benefits of C++ and allows writing test cases in a more convenient manner.

Being implemented on top of the existing C API, the C++ API is not a replacement, but rather an addition to the existing API set.

As the following example suggests, fixtures can now be written as classes, with test cases being implemented as methods:

#include <cfixcc.h>

class ExampleTest : public cfixcc::TestFixture
{
public:
  void TestOne() 
  {}
  
  void TestTwo() 
  {}
};

CFIXCC_BEGIN_CLASS( ExampleTest )
  CFIXCC_METHOD( TestOne )
  CFIXCC_METHOD( TestTwo )
CFIXCC_END_CLASS()

To learn more about the definition of fixtures, have a look at the respective TestFixture chapter in the cfix documentation.

Regarding the implementation of test cases, cfix adds a new set of type-safe, template-driven assertions that, for instance, allow convenient equality checks:

void TestOne() 
{
  const wchar_t* testString = L"test";
  
  //
  // Use typesafe assertions...
  //
  CFIXCC_ASSERT_EQUALS( 1, 1 );
  CFIXCC_ASSERT_EQUALS( L"test", testString );
  CFIXCC_ASSERT_EQUALS( wcslen( testString ), ( size_t ) 4 );
  
  //
  // ...log messages...
  //
  CFIX_LOG( L"Test string is %s", testString );
  
  //
  // ...or use the existing "C" assertions.
  //
  CFIX_ASSERT( wcslen( testString ) == 4 );
  CFIX_ASSERT_MESSAGE( testString[ 0 ] == 't', 
    L"Test string should start with a 't'" );
}

Again, have a look at the updated API reference for an overview of the new API additions.

Customizing Test Runs

Another important new feature is the addition of the new switches -fsf (Shortcut Fixture), -fsr (Shortcut Run), and -fss (Shortcut Run On Failing Setup). Using these switches allows you to specify how a test run should resume when a test case fails.

When a test case fails, the default behavior of cfix is to report the failure and resume at the next test case. If -fsf is specified, however, the remaining test cases of the same fixture are skipped and execution resumes at the next fixture. With -fsr, cfix can be requested to abort the entire run as soon as a single test case fails.
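
The difference between the three behaviors is easiest to see in a toy model. The sketch below is not cfix code and does not use the cfix API; it merely counts how many test cases would be executed under each policy as described above. RunMode and CountExecuted are hypothetical names, and -fss is not modeled.

```c
#include <stddef.h>

typedef enum RunMode {
    RunDefault,          /* resume at the next test case         */
    RunShortcutFixture,  /* models -fsf: skip rest of fixture    */
    RunShortcutRun       /* models -fsr: abort the entire run    */
} RunMode;

/*
 * Passed[f * TestsPerFixture + t] is nonzero if test case t of
 * fixture f passes. Returns the number of test cases executed.
 */
static size_t CountExecuted(
    const int *Passed,
    size_t Fixtures,
    size_t TestsPerFixture,
    RunMode Mode)
{
    size_t Executed = 0;
    size_t f, t;

    for (f = 0; f < Fixtures; f++) {
        for (t = 0; t < TestsPerFixture; t++) {
            Executed++;

            if (Passed[f * TestsPerFixture + t]) {
                continue;
            }

            /* The test case failed; decide where to resume. */
            if (Mode == RunShortcutRun) {
                return Executed;
            }
            if (Mode == RunShortcutFixture) {
                break;
            }
        }
    }

    return Executed;
}
```

For two fixtures of three test cases each, with the second test case of the first fixture failing, the model executes six test cases by default, five with the -fsf policy and two with the -fsr policy.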

What else is new in 1.2?

Download

As always, cfix 1.2 is source and binary compatible to previous versions. The new MSI package and source code can now be downloaded on Sourceforge.

cfix is open source and licensed under the GNU Lesser General Public License.

How GUI Thread Conversion on Svr03 Breaks the SEH Chain

The Windows kernel maintains two types of threads: Non-GUI threads and GUI threads. Non-GUI threads use the default stack size of 12 KB (on i386, which this discussion applies to) and the default System Service Descriptor table (SSDT), KeServiceDescriptorTable. GUI threads, in contrast, are expected to have much larger stack requirements and thus use an extended stack size of 60 KB (Note: these are the numbers for Svr03 and may vary among releases). More importantly, however, GUI threads use a different SSDT: KeServiceDescriptorTableShadow. Unlike KeServiceDescriptorTable, which only supports the basic set of system calls, this SSDT also includes all the User and GDI system services.

All threads start off as Non-GUI threads. Once the application makes a call to a system service that does not fall within the default range, however, the NT kernel will suspect this thread to be about to do GUI stuff — and will convert the thread into a GUI thread.

Converting a thread to a GUI thread naturally has to entail two things — swapping the SSDT, and enlarging the stack. While swapping the SSDT is not really interesting, enlarging the stack size poses a challenge — you cannot really enlarge a stack as the nearby pages that would need to be acquired may not be available.

As a consequence, enlarging the stack works by swapping the stack. The old, small stack is exchanged against a newly allocated, larger stack. Now swapping a stack is not really a common thing to do and is pretty easy to get wrong. And well, as it turns out, the Svr03 kernel did in fact get it wrong.

But let’s start at the beginning.

When the number of the requested system service is found to be beyond the range supported by the default SSDT, KiConvertToGuiThread is called to perform the thread conversion. KiConvertToGuiThread itself is pretty dumb and lets PsConvertToGuiThread do the actual work.

The following pseudo code illustrates what PsConvertToGuiThread does:

NTSTATUS PsConvertToGuiThread()
{
  //
  // Create the new stack.
  //
  LargeStack = MmCreateKernelStack( ... )
  
  if ( LargeStack == NULL )
  {
    __try
    {
      //
      // Allocation failed -- set last error value.
      //
      NtCurrentTeb()->LastErrorValue = ERROR_NOT_ENOUGH_MEMORY;
    }
    __except( ... )
    {
    }

    return STATUS_NO_MEMORY;
  }

  //
  // N.B. We are still on the old stack.
  //

  //
  // This will copy the old stack's contents to the new stack and
  // migrate the context of the current thread to the new stack.
  //
  SmallStack = KeSwitchKernelStack( LargeStack, ... );

  //
  // Now we are on the new stack.
  //
  MmDeleteKernelStack( SmallStack, ... );
  ...
  //
  // Notify Win32k.
  //
  
  ( PspW32ProcessCallout )( ... )
  ...
  ( PspW32ThreadCallout ) ( ... )
  
  ...
}

This code looks innocent enough, but in fact, it is lying. To see why, you have to recall how Structured Exception Handling is implemented on i386 and how the C compiler makes use of it (I think I have spent way too much time with SEH over the past months…): The __try/__except-block at the top of the routine will cause the compiler to emit the typical SEH prolog at the beginning of the function. The purpose of this prolog is to set up an EXCEPTION_REGISTRATION_RECORD and to put this record onto the current thread’s SEH chain, which in turn is rooted in the PCR. In the same way, the compiler will put an appropriate epilog at the end of the routine.

So while the code above suggests that the SEH stuff is scoped to the very beginning of the function, it will not be until the end of the function has been reached that the EXCEPTION_REGISTRATION_RECORD is torn down and removed from the SEH chain.
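
To make the scoping issue concrete, here is a small, portable C simulation of what the compiler-generated prolog and epilog effectively do. All names (SimRegistration, SimChainHead, SimChainDepth, SimRoutineWithTry) are made up for illustration, the chain root is an ordinary global rather than a PCR field, and the list is terminated with NULL for simplicity. It is a sketch of the idea, not the code the compiler actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for EXCEPTION_REGISTRATION_RECORD. */
typedef struct SimRegistration
{
  struct SimRegistration *Next;
} SimRegistration;

/* Hypothetical stand-in for the SEH chain root kept in the PCR. */
static SimRegistration *SimChainHead = NULL;

/* Count the records currently linked into the simulated chain. */
static int SimChainDepth( void )
{
  int Depth = 0;
  const SimRegistration *Record;

  for ( Record = SimChainHead; Record != NULL; Record = Record->Next )
  {
    Depth++;
  }

  return Depth;
}

/*
 * Models a routine containing a single __try/__except block.
 * Stores the chain depth observed after the guarded block in
 * *DepthAfterTry and returns the depth observed after the epilog.
 */
static int SimRoutineWithTry( int *DepthAfterTry )
{
  SimRegistration Record;

  assert( DepthAfterTry != NULL );

  /* "Prolog": link the record, which lives in this stack frame. */
  Record.Next = SimChainHead;
  SimChainHead = &Record;

  /* ... the body of the __try block would run here ... */

  /* Well past the guarded block, the record is still linked. */
  *DepthAfterTry = SimChainDepth();

  /* "Epilog": unlink the record just before returning. */
  SimChainHead = Record.Next;
  return SimChainDepth();
}
```

Starting from an empty chain, SimRoutineWithTry observes a depth of 1 after the guarded block and returns 0 once the epilog has run. In other words, the record stays registered for the lifetime of the whole frame, not just for the duration of the __try block.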

And at this point, it should become clear why this becomes a problem in the context of stack swapping. At the point where KeSwitchKernelStack is called, the EXCEPTION_REGISTRATION_RECORD will still be listed in the SEH chain, although it does not serve any particular purpose any more. So KeSwitchKernelStack is called, which will, as indicated before, copy the contents of the old stack to the new stack — which, of course, includes the EXCEPTION_REGISTRATION_RECORD.

But…

neither KeSwitchKernelStack nor PsConvertToGuiThread updates the SEH pointer in the PCR! After the swapping has been conducted and MmDeleteKernelStack has returned, the root of the SEH chain will point to freed memory: the memory where the EXCEPTION_REGISTRATION_RECORD once lived.
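
For illustration, the kind of fix-up that is missing might look roughly like the following. This is a portable C simulation, not kernel code: SimRecord, SimInRange, SimRebase and SimSwitchStack are hypothetical names, the stacks are plain buffers, and the chain ends with NULL instead of the real end-of-chain marker. The idea is simply that after copying the stack, every pointer into the old stack (the chain root and the Next links of the copied records) has to be relocated by the same offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for EXCEPTION_REGISTRATION_RECORD. */
typedef struct SimRecord
{
  struct SimRecord *Next;
  void *Handler;
} SimRecord;

/* Nonzero if Ptr lies within [Base, Base + Size). */
static int SimInRange( const void *Ptr, const void *Base, size_t Size )
{
  uintptr_t P = ( uintptr_t ) Ptr;
  uintptr_t B = ( uintptr_t ) Base;
  return P >= B && P - B < Size;
}

/* Relocate Ptr if it points into the old stack, else leave it alone. */
static void *SimRebase( void *Ptr, void *OldBase, void *NewBase, size_t Size )
{
  if ( ! SimInRange( Ptr, OldBase, Size ) )
  {
    return Ptr;
  }

  return ( unsigned char* ) NewBase +
    ( ( unsigned char* ) Ptr - ( unsigned char* ) OldBase );
}

/*
 * Copy the old stack to the new one and fix up the simulated SEH
 * chain. Returns the new chain root, i.e. the value that would have
 * to be written back to the (simulated) PCR.
 */
static SimRecord *SimSwitchStack(
  void *OldBase, void *NewBase, size_t Size, SimRecord *ChainHead )
{
  SimRecord *Head;
  SimRecord *Record;

  assert( OldBase != NULL && NewBase != NULL );

  memcpy( NewBase, OldBase, Size );

  Head = SimRebase( ChainHead, OldBase, NewBase, Size );

  /* Walk the copied records and relocate their links as well. */
  Record = Head;
  while ( Record != NULL && SimInRange( Record, NewBase, Size ) )
  {
    Record->Next = SimRebase( Record->Next, OldBase, NewBase, Size );
    Record = Record->Next;
  }

  return Head;
}
```

Without the two SimRebase steps, the copied records would exist on the new stack, but the root would keep pointing into the old buffer, which is exactly the dangling pointer described above.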

Now two things are worth noting. First, PsConvertToGuiThread can be expected to occupy the bottommost stack frame of the kernel stack. A situation where the dangling pointer could harm a caller of PsConvertToGuiThread is thus not possible.

Secondly, PsConvertToGuiThread makes callouts to Win32k by invoking the callbacks pointed to by PspW32ProcessCallout and PspW32ThreadCallout. And in fact, it is only PsConvertToGuiThread’s luck that these routines are so well behaved that they do not cause the system to bugcheck because of the dangling pointer. If one of these routines (or a routine called by them) did anything with the SEH chain beyond adding another record to the chain and removing it later, odds are that it would dereference a stray pointer and bugcheck the system…

It is worth noting that the implementation of PsConvertToGuiThread has changed in Windows Vista, so that the above discussion does not apply to this and later releases.

Debugging a Debugger Deadlock

While I still use Visual Studio 2005 Team System for most of my development, I want to make sure that cfix works properly with Visual Studio 2008 as well.

To test that, I recently started a Windows 2003 Server VM, installed VCExpress 2008 and cfix, and attempted to run an example project in the VC debugger. As long as no assertions fired, everything seemed fine. I then altered the example’s source code so that one of the assertions would fail, ran it in the debugger, and waited. Nothing happened.

When an assertion fires, cfix 1.1 attempts to capture the stack trace to make debugging easier. For this to work, the respective debugging symbols must be loaded and, if not yet available, be downloaded from the symbol server. It is thus not uncommon that it takes a couple of seconds before the failure message appears and the debugger breaks in when an assertion fails. But after about a minute, still nothing had happened. A deadlock seemed to have occurred.

To make the situation worse, I then noticed that while cfix was hanging, Internet Explorer would not properly start any more. After creating its main window, it would hang as well.

So I rebooted the machine and hooked up a kernel debugger. Luckily, the scenario was easy to reproduce and I could take a closer look…

Debugging the deadlock

First, let’s find the cfix32 process:

kd> !process 0 0 cfix32.exe
PROCESS 8254e490  SessionId: 0  Cid: 06b0    Peb: 7ffde000  ParentCid: 0658
    DirBase: 06e90000  ObjectTable: e170ee58  HandleCount:  69.
    Image: cfix32.exe

Now that we have the VA, we can switch to the process and load its user mode symbols:

kd> .process /r /p 8254e490  
Implicit process is now 8254e490
.cache forcedecodeuser done
Loading User Symbols
...............................

Looking at this process’ threads, it is easy to see that one stack looks suspicious:

kd> !process 8254e490  
PROCESS 8254e490  SessionId: 0  Cid: 06b0    
	Peb: 7ffde000  ParentCid: 0658
    DirBase: 06e90000  ObjectTable: e170ee58  HandleCount:  69.
    Image: cfix32.exe
    VadRoot 82455628 Vads 76 Clone 0 Private 491. 
    Modified 8. Locked 0.
    DeviceMap e148f3c0

    [...]

THREAD 826276f8  Cid 06b0.06b4  Teb: 7ffdd000 Win32Thread: 
  e1734a38 WAIT: (Unknown) KernelMode Non-Alertable
	f4422b44  SynchronizationEvent
Not impersonating
DeviceMap                 e148f3c0
Owning Process            8254e490       Image:         cfix32.exe
Wait Start TickCount      13474          Ticks: 9138 (0:00:01:31.511)
Context Switch Count      219                 LargeStack
UserTime                  00:00:00.010
KernelTime                00:00:00.010
Win32 Start Address cfix32!wmainCRTStartup (0x010048cf)
Start Address kernel32!BaseProcessStartThunk (0x77e617f8)
Stack Init f4423000 Current f4422aa0 Base f4423000 Limit f441f000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0

ChildEBP RetAddr  
f4422ab8 808202b6 nt!KiSwapContext+0x25
f4422ad0 8081fb6e nt!KiSwapThread+0x83 
f4422b14 809bae63 nt!KeWaitForSingleObject+0x2e0 
f4422bf4 809bc06d nt!DbgkpQueueMessage+0x178 
f4422c18 8096ba9a nt!DbgkpSendApiMessage+0x45 
f4422cc8 80909942 nt!DbgkMapViewOfSection+0xcf 
f4422d34 8082350b nt!NtMapViewOfSection+0x269 
f4422d34 7c8285ec nt!KiFastCallEntry+0xf8 
00069b94 7c82728b ntdll!KiFastSystemCallRet 
00069b98 7c831e05 ntdll!NtMapViewOfSection+0xc
00069bdc 7c831fd6 ntdll!LdrpMapViewOfDllSection+0x64 
00069ccc 7c833027 ntdll!LdrpMapDll+0x390 
[...]

The thread is blocked: it is waiting on an event. Events do not have an owner, so we have to do a little more to find out what it is waiting for. Unfortunately, !thread crops the trace, so let us first get the full one:

kd> .thread 826276f8  
kd> kn100
 # ChildEBP RetAddr  
00 f4422ab8 808202b6 nt!KiSwapContext+0x25
01 f4422ad0 8081fb6e nt!KiSwapThread+0x83
02 f4422b14 809bae63 nt!KeWaitForSingleObject+0x2e0
03 f4422bf4 809bc06d nt!DbgkpQueueMessage+0x178
04 f4422c18 8096ba9a nt!DbgkpSendApiMessage+0x45
05 f4422cc8 80909942 nt!DbgkMapViewOfSection+0xcf
06 f4422d34 8082350b nt!NtMapViewOfSection+0x269
07 f4422d34 7c8285ec nt!KiFastCallEntry+0xf8
08 00069b94 7c82728b ntdll!KiFastSystemCallRet
09 00069b98 7c831e05 ntdll!NtMapViewOfSection+0xc
0a 00069bdc 7c831fd6 ntdll!LdrpMapViewOfDllSection+0x64
0b 00069ccc 7c833027 ntdll!LdrpMapDll+0x390
0c 00069f30 7c8330f5 ntdll!LdrpLoadImportModule+0x17c
0d 00069f70 7c8330a4 ntdll!LdrpHandleOneNewFormatImportDescriptor+0x4d
0e 00069f8c 7c833248 ntdll!LdrpHandleNewFormatImportDescriptors+0x1d
0f 0006a014 7c833049 ntdll!LdrpWalkImportDescriptor+0x195
10 0006a264 7c8330f5 ntdll!LdrpLoadImportModule+0x1cb
11 0006a2a4 7c8330a4 ntdll!LdrpHandleOneNewFormatImportDescriptor+0x4d
12 0006a2c0 7c833248 ntdll!LdrpHandleNewFormatImportDescriptors+0x1d
13 0006a348 7c83427d ntdll!LdrpWalkImportDescriptor+0x195
14 0006a5e0 7c834065 ntdll!LdrpLoadDll+0x241
15 0006a85c 77e41bf3 ntdll!LdrLoadDll+0x198
16 0006a8c4 77e41dbd kernel32!LoadLibraryExW+0x1b2
17 0006a8d8 77e41df3 kernel32!LoadLibraryExA+0x1f
18 0006a8f8 46a7870c kernel32!LoadLibraryA+0xb5
19 0006a954 46a93b3e WININET!__delayLoadHelper2+0xfc
1a 0006a994 46a93950 WININET!_tailMerge_RASAPI32_dll+0xd
1b 0006a9a8 46a93a4e WININET!DoConnectoidsExist+0x2b
1c 0006a9d4 46a93abc WININET!GetRasConnections+0x34
1d 0006a9f0 46a8c559 WININET!IsDialUpConnection+0xa9
1e 0006aa0c 46a97a44 WININET!FixProxySettingsForCurrentConnection+0x31
1f 0006b5e4 46aa3774 WININET!InternetQueryOptionA+0xa47
20 0006b748 01d12dc6 WININET!InternetQueryOptionW+0x1fa
21 0006b98c 01d12583 symsrv!StoreWinInet::dumpproxyinfo+0x46
22 0006be04 01d1290a symsrv!StoreWinInet::connect+0x273
23 0006c040 01d05ae7 symsrv!StoreWinInet::find+0x3a
24 0006c134 01d06c47 symsrv!cascade+0x87
25 0006c684 01d06a57 symsrv!SymbolServerByIndexW+0x127
26 0006c8b4 0302e30e symsrv!SymbolServerW+0x77
27 0006ccf4 03018eed dbghelp!symsrvGetFile+0x12e
28 0006d9dc 03019f57 dbghelp!diaLocatePdb+0x33d
29 0006dc58 03041ade dbghelp!diaGetPdb+0x207
2a 0006de7c 0303ff15 dbghelp!GetDebugData+0x2be
2b 0006e324 03040516 dbghelp!modload+0x305
2c 0006e7a4 0304068e dbghelp!LoadModule+0x3f6
2d 0006e9e8 03044eaf dbghelp!GetModule+0x4e
2e 0006ea30 03044bda dbghelp!NTGetProcessModules+0x16f
2f 0006eae8 03032e80 dbghelp!GetProcessModules+0x4a
30 0006ed70 60f03f7a dbghelp!SymInitializeW+0x320
31 0006f17c 60f032b0 cfix!CfixpCaptureStackTrace+0x117 
32 0006f598 100419b5 cfix!CfixPeReportFailedAssertion+0xc5 
WARNING: Stack unwind information not available. 
Following frames may be wrong.
33 0006f7c0 10040f40 VsSample!__CfixFixturePeSimpleAdderTest+0x5ab1
34 0006f8b4 10040db6 VsSample!__CfixFixturePeSimpleAdderTest+0x503c
35 0006fa68 10040a8a VsSample!__CfixFixturePeSimpleAdderTest+0x4eb2
36 0006fb48 60f02b64 VsSample!__CfixFixturePeSimpleAdderTest+0x4b86
37 0006fb84 60f02be6 cfix!CfixsRunTestRoutine+0x33 
38 0006fb94 60f038e9 cfix!CfixsRunTestCaseMethod+0x27 
39 0006fbac 60f03a06 cfix!CfixsRunTestCase+0x25 
3a 0006fbcc 60f03ce5 cfix!CfixsRunTsexecActionMethod+0xfb 
3b 0006fbf0 0100e135 cfix!CfixsRunSequenceAction+0x122 
3c 0006fc2c 0100d5c2 cfix32!CfixrunpRunFixtures+0x90 
3d 0006fc40 0100d85c cfix32!CfixrunsMainWorker+0x3f 
3e 0006fe7c 010046b8 cfix32!CfixrunMain+0x1b9 
3f 0006fee0 0100485e cfix32!wmain+0x80 
40 0006ffc0 77e6f23b cfix32!_wmainCRTStartup+0x12b 
41 0006fff0 00000000 kernel32!BaseProcessStart+0x23

Whoa, what a trace! cfix!CfixpCaptureStackTrace tries to assemble a stack trace, for which it has to initialize dbghelp.dll first. dbghelp!SymInitializeW seeks the help of symsrv.dll, which in turn tries to connect to the Microsoft symbol server. Before it can do so, it evidently attempts to get its proxy settings straight, which in turn leads to a DLL (rasapi32.dll, in case you wonder) being loaded. While mapping the DLL, the loader ends up in the debugging subsystem (nt!Dbgk*). It may be assumed that the debugger is being notified about the DLL having been loaded.

Turning our attention to Internet Explorer, we look at iexplore.exe’s threads:

kd> !process 0 0 iexplore.exe
PROCESS 824ec3b0  SessionId: 0  Cid: 07f0    Peb: 7ffdb000  ParentCid: 054c
    DirBase: 0307c000  ObjectTable: e15186b8  HandleCount: 225.
    Image: iexplore.exe
    
kd> .process /r /p 824ec3b0  
Implicit process is now 824ec3b0
.cache forcedecodeuser done
Loading User Symbols
...............................................

Now, iexplore has lots of threads, but skimming over them, one looked interesting:

kd> !process 824ec3b0  
PROCESS 824ec3b0  SessionId: 0  Cid: 07f0    
	Peb: 7ffdb000  ParentCid: 054c
    DirBase: 0307c000  ObjectTable: e15186b8  HandleCount: 225.
    Image: iexplore.exe
    VadRoot 824991c8 Vads 168 Clone 0 Private 643. 
    Modified 44. Locked 0.
    DeviceMap e148f3c0

    [...]
    
THREAD 82431980  Cid 07f0.00b8  Teb: 7ffd5000 Win32Thread: 00000000 
  WAIT: (Unknown) UserMode Non-Alertable
	82631eb0  Mutant - owning thread 826276f8
Not impersonating
DeviceMap                 e148f3c0
Owning Process            824ec3b0       Image:         iexplore.exe
Wait Start TickCount      21924          Ticks: 688 (0:00:00:06.889)
Context Switch Count      1             
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address ntdll!RtlpWorkerThread (0x7c839efb)
Start Address kernel32!BaseThreadStartThunk (0x77e617ec)
Stack Init f40a3000 Current f40a2c78 Base f40a3000 Limit f40a0000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0

ChildEBP RetAddr  
f40a2c90 808202b6 nt!KiSwapContext+0x25 
f40a2ca8 8081fb6e nt!KiSwapThread+0x83 
f40a2cec 8090e64e nt!KeWaitForSingleObject+0x2e0 
f40a2d50 8082350b nt!NtWaitForSingleObject+0x9a 
f40a2d50 7c8285ec nt!KiFastCallEntry+0xf8 
01e5fdd0 7c827d0b ntdll!KiFastSystemCallRet 
01e5fdd4 77e61d1e ntdll!NtWaitForSingleObject+0xc 
01e5fe44 77e61c8d kernel32!WaitForSingleObjectEx+0xac 
01e5fe58 46a8c54d kernel32!WaitForSingleObject+0x12 
01e5fe74 46a7eeca WININET!FixProxySettingsForCurrentConnection+0x25 
01e5fe8c 46a7ee3f WININET!CFsm_HttpSendRequest::RunSM+0x61 
01e5fea4 46a7efa3 WININET!CFsm::Run+0x39 
01e5fed4 77da5938 WININET!CFsm::RunWorkItem+0x79 
01e5feec 7c83a827 SHLWAPI!ExecuteWorkItem+0x1d 
01e5ff44 7c83aa0b ntdll!RtlpWorkerCallout+0x71 
01e5ff64 7c83aa82 ntdll!RtlpExecuteWorkerRequest+0x4f 
01e5ff78 7c839f60 ntdll!RtlpApcCallout+0x11 
01e5ffb8 77e64829 ntdll!RtlpWorkerThread+0x61 
01e5ffec 00000000 kernel32!BaseThreadStart+0x34 

Now we are getting somewhere. We have seen FixProxySettingsForCurrentConnection in cfix’s trace already — but in this case, it is waiting on something. Let’s see…


kd> !object 82631eb0  
Object: 82631eb0  Type: (827a5550) Mutant
    ObjectHeader: 82631e98 (old version)
    HandleCount: 3  PointerCount: 6
    Directory Object: e1496420  Name: WininetProxyRegistryMutex

And 826276f8, that’s the cfix32 thread we have already assessed. Obviously, iexplore waits for cfix to release the WininetProxyRegistryMutex, and cfix waits on someone else.

Turning over to VC, we can find a thread whose stack also contains a call to FixProxySettingsForCurrentConnection and which is, again, blocking on WininetProxyRegistryMutex:

kd> k100
ChildEBP RetAddr  
f4492c90 808202b6 nt!KiSwapContext+0x25
f4492ca8 8081fb6e nt!KiSwapThread+0x83
f4492cec 8090e64e nt!KeWaitForSingleObject+0x2e0
f4492d50 8082350b nt!NtWaitForSingleObject+0x9a
f4492d50 7c8285ec nt!KiFastCallEntry+0xf8
065b95c8 7c827d0b ntdll!KiFastSystemCallRet
065b95cc 77e61d1e ntdll!NtWaitForSingleObject+0xc
065b963c 77e61c8d kernel32!WaitForSingleObjectEx+0xac
065b9650 46a8c54d kernel32!WaitForSingleObject+0x12
065b966c 46a7eeca WININET!FixProxySettingsForCurrentConnection+0x25
065b9684 46a7ee3f WININET!CFsm_HttpSendRequest::RunSM+0x61
065b969c 46a7fefa WININET!CFsm::Run+0x39
065b96b4 46ab0a67 WININET!DoFsm+0x25
065b96dc 46aa1092 WININET!HttpWrapSendRequest+0x148
065b9714 06b231da WININET!HttpSendRequestW+0x5e
065b973c 06b22ea8 SYMSRV!StoreWinInet::request+0x2a
065b9770 06b226cc SYMSRV!StoreWinInet::fileinfo+0x18
065b9780 06b22741 SYMSRV!StoreWinInet::get+0x7c
065b9fc4 06b229a3 SYMSRV!StoreWinInet::open+0x41
065ba204 06b15ae7 SYMSRV!StoreWinInet::find+0xd3
065ba2f8 06b16c47 SYMSRV!cascade+0x87
065ba848 06b16a57 SYMSRV!SymbolServerByIndexW+0x127
065baa78 51412896 SYMSRV!SymbolServerW+0x77
065bb8cc 51413383 mspdb80!LOCATOR::SYMSRV::SymbolServer+0x190
065bbf10 514136f8 mspdb80!LOCATOR::FLocatePdbSymsrv+0x75
065bbf38 514139ce mspdb80!LOCATOR::FLocatePdbPathHelper+0x179
065bc96c 51413cbe mspdb80!LOCATOR::FLocatePdbPath+0x105
065bccb4 51414371 mspdb80!LOCATOR::FLocatePdb+0x1ad
065bd9a8 458cc1e8 mspdb80!PDBCommon::OpenValidate5+0xab
065bd9ec 45959d4c msenc90!enc::EncImageEdit::
                  `scalar deleting destructor'+0x4d
065bda34 45958e62 NatDbgDE!OLPDBOpen+0x93
065be6c0 45958f0a NatDbgDE!OLStart+0x107
065be6fc 45958fae NatDbgDE!LoadOmfForReal+0x23
065be714 45959019 NatDbgDE!LoadSymbols+0x43
065be72c 459590d9 NatDbgDE!OLLoadOmf+0x55
065be75c 45959154 NatDbgDE!SHLoadDll+0xd5
065be7ac 45959247 NatDbgDE!CSymbolHandlerX::SHLoadDll+0x5a
065be844 4595937c NatDbgDE!CModule::Load+0x1a1
065be8ac 4594d002 NatDbgDE!CNativeProcess::NotifyModLoad+0xc8
065be9ec 4594cf6d NatDbgDE!EngineCallback+0xb3
065bea18 45958d3f NatDbgDE!EMCallBackDB+0x4c
065bf050 4594d0dc NatDbgDE!LoadFixups+0x218
065bf0ac 4594d289 NatDbgDE!DebugPacket+0x213
065bfdb4 4594cf39 NatDbgDE!EMFunc+0x40f
065bfddc 4594d73d NatDbgDE!TLCallBack+0x1e
065bfdf4 4594d711 NatDbgDE!TLClientLib::Local_TLFunc+0xc8
065bfe3c 4594d85c NatDbgDE!DMSendPacket+0x121
065bfee8 45959b1d NatDbgDE!NotifyEM+0x3ae
065bff0c 4594d663 NatDbgDE!ProcessLoadDLLEvent+0x47
065bff44 4594d686 NatDbgDE!ProcessDebugEvent+0x30d
065bffb8 77e64829 NatDbgDE!DmPollLoop+0x3c
065bffec 00000000 kernel32!BaseThreadStart+0x34

But, looking closely, it becomes obvious that this thread must be the one handling debug events, and in fact, the call to ProcessLoadDLLEvent strongly indicates that it is currently handling a DLL load event. And now we have closed the loop: this thread must be handling the DLL load event for rasapi32.dll, the DLL which cfix was about to load. To do this, VC attempts to acquire the WininetProxyRegistryMutex, which is owned by the original cfix thread. Deadlock.
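
The situation can be summarized as a wait-for graph: each blocked thread points at the thread it is ultimately waiting on, and a cycle in that graph means deadlock. The following C sketch models this with made-up names (SIM_NO_WAIT, SimIsOnWaitCycle) and assumes, as a simplification, that each thread waits on at most one other thread. The edge from the cfix thread to the debugger thread rests on the assumption made above, namely that the event cfix waits on is only signalled once the debugger has processed the DLL load notification.

```c
#include <assert.h>
#include <stddef.h>

/* Marks a thread that is not blocked on any other thread. */
#define SIM_NO_WAIT ( -1 )

/*
 * WaitsFor[ i ] is the index of the thread that thread i is blocked
 * on, or SIM_NO_WAIT. Returns nonzero if following these edges from
 * Start leads back to Start, i.e. if Start is part of a wait cycle.
 */
static int SimIsOnWaitCycle(
  const int *WaitsFor, size_t Count, size_t Start )
{
  size_t Steps = 0;
  int Current;

  assert( WaitsFor != NULL );
  assert( Start < Count );

  Current = WaitsFor[ Start ];

  /* A cycle through Start must close within Count steps. */
  while ( Current != SIM_NO_WAIT && Steps < Count )
  {
    assert( ( size_t ) Current < Count );

    if ( ( size_t ) Current == Start )
    {
      return 1;
    }

    Current = WaitsFor[ Current ];
    Steps++;
  }

  return 0;
}
```

With cfix as thread 0, the VC debug event thread as thread 1 and the iexplore worker as thread 2, the table { 1, 0, 0 } puts threads 0 and 1 on a cycle, while thread 2 is not part of the cycle but merely stuck behind it.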

What is interesting about this situation is that none of the parties (cfix, iexplore, or VCExpress), and also none of the modules involved, is clearly the culprit or behaving incorrectly. Rather, it is a combination of special circumstances that brings about the deadlock as discussed.

It is also notable that I am not using any particular proxy settings on this machine and automatic proxy configuration has been turned off.

So far, I have not experienced the same problem with VS 2003 and VS 2005 — I thus assume that only VS 2008 is affected by this.

Although I am pretty sure that cfix is not really at fault here, I have to adapt it to avoid this deadlock in the future. Until an updated version is available, you can use this workaround.

Thread 0:0 is special

Thread IDs uniquely identify a thread — this certainly holds for user mode threads and should also hold for kernel mode threads. But there is one kind of thread where the ID does not uniquely identify a KTHREAD — the Idle thread.

On a uniprocessor system, there is only one Idle thread and this idle thread will have the thread ID 0 (in process 0). On a multiprocessor system, however, Windows creates one Idle thread per CPU. That makes sense — however, what may be surprising at first is that although all Idle threads have their own KTHREAD structure, all share the same thread ID 0 (CID 0:0). That is, each multiprocessor system will have multiple threads with ID 0, which in turn means that CID 0:0 does not uniquely identify a single thread.

This is easily verified using the kernel debugger on a multiprocessor system (a quad core in this case):

0: kd> !running

System Processors f (affinity mask)
  Idle Processors f
All processors idle.

0: kd> !pcr 0
KPCR for Processor 0 at ffdff000:
[...]

	      CurrentThread: 8089d8c0
	         NextThread: 00000000
	         IdleThread: 8089d8c0


0: kd> !thread 8089d8c0
THREAD 8089d8c0  Cid 0000.0000  Teb: 00000000 Win32Thread: 00000000 RUNNING on processor 0
[...]

0: kd> !pcr 1
KPCR for Processor 1 at f7727000:
[...]

	      CurrentThread: f772a090
	         NextThread: 00000000
	         IdleThread: f772a090
	         
0: kd> !thread f772a090
THREAD f772a090  Cid 0000.0000  Teb: 00000000 Win32Thread: 00000000 RUNNING on processor 1
[...]

...and so on.

Now, the idle threads are usually not of major interest. Still, they can become relevant, for example when doing certain kinds of analysis such as tracing interrupts. If such an analysis groups events by the CID of the thread they were captured on (rather than by the VA of the KTHREAD structure), the results for CID 0:0 will be wrong.
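
The difference between the two grouping keys is easy to show in a few lines of C. The sketch below is purely illustrative: SimEvent, SimSameCid, SimSameKthread and SimCountGroups are hypothetical names, and the event layout is an assumption rather than the format of any real tracing tool. It counts how many distinct threads a set of events appears to stem from, depending on which key is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trace event; the layout is an assumption. */
typedef struct SimEvent
{
  uint32_t ProcessId;
  uint32_t ThreadId;
  uint64_t Kthread;     /* VA of the KTHREAD structure */
  uint64_t Timestamp;
} SimEvent;

typedef int ( *SimSameKeyRoutine )( const SimEvent*, const SimEvent* );

/* Key 1: the CID, which is ambiguous for the idle threads. */
static int SimSameCid( const SimEvent *A, const SimEvent *B )
{
  return A->ProcessId == B->ProcessId && A->ThreadId == B->ThreadId;
}

/* Key 2: the KTHREAD address, which is unique per thread. */
static int SimSameKthread( const SimEvent *A, const SimEvent *B )
{
  return A->Kthread == B->Kthread;
}

/* Count the number of distinct groups under the given key. */
static size_t SimCountGroups(
  const SimEvent *Events, size_t Count, SimSameKeyRoutine SameKey )
{
  size_t Groups = 0;
  size_t i, j;

  assert( Events != NULL || Count == 0 );
  assert( SameKey != NULL );

  for ( i = 0; i < Count; i++ )
  {
    int Seen = 0;

    /* Has an earlier event with the same key been counted already? */
    for ( j = 0; j < i && ! Seen; j++ )
    {
      Seen = SameKey( &Events[ i ], &Events[ j ] );
    }

    if ( ! Seen )
    {
      Groups++;
    }
  }

  return Groups;
}
```

Fed with events captured on the two idle threads shown above (KTHREADs 8089d8c0 and f772a090, both with CID 0:0), grouping by CID yields a single group, whereas grouping by KTHREAD address correctly yields two.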

Not a big thing. Still, it took me a while to figure out that this was indeed the reason for some of my traces containing timestamps in strange order: the traces had been created on a quad-core machine, and I made the mistake of correlating the events by their CID during later analysis. As a result, the traces of the four idle threads were all mixed up…

cfix 1.1 introduces NT kernel mode unit tests

cfix 1.1 introduces a number of new features. The most important among these is the additional ability to write kernel mode unit tests, i.e. unit tests that are run in kernel mode. Needless to say, cfix 1.1 still supports user mode unit tests.

All contemporary unit testing frameworks focus on unit testing in user mode. Certainly, the vast majority of testing code can be assumed to be targeting user mode, so this does not come as a surprise. Tools for driver testing, of which there are quite a few, focus on integration testing: they usually test whether the driver works in its entirety.

While these tools are very useful indeed, they do not support true unit testing, i.e. the ability to test individual routines or subsystems of a driver. To perform such tests, it would be necessary to write a separate test driver or resort to other techniques such as this one.

cfix 1.1 fills this gap and offers the ability to write kernel mode tests. That way, individual parts of what may eventually become a driver can be tested thoroughly in isolation, without necessitating much boilerplate code.

Example

Writing a kernel mode unit test is as easy as writing a user mode unit test: the API is the same for user and kernel mode tests. Even the tools, cfix32 and cfix64, are the same for both modes. The only true difference is that kernel mode tests require slightly different build settings.

The following listing shows an example for a kernel mode unit test — but the same code could just as well be compiled into a user mode unit test.

#include <cfix.h>

static void FixtureSetup()
{
  CFIX_ASSERT( 0 != 1 );
}

static void FixtureTeardown()
{
  CFIX_LOG( L"Tearing down..." );
}

/*++
  Test routine -- do the actual testing.
--*/
static void Test1()
{
  ULONG a = 1;
  ULONG b = 1;
  CFIX_ASSERT_EQUALS_ULONG( a, b );
  CFIX_ASSERT( a + b == 2 );
  
  // You are free to use all WDM APIs here!
  
  CFIX_LOG( L"a=%d, b=%d", a, b );
}


/*++
  Define a test fixture. 
--*/
CFIX_BEGIN_FIXTURE( MyFixture )
  CFIX_FIXTURE_ENTRY( Test1 )

  CFIX_FIXTURE_SETUP( FixtureSetup )
  CFIX_FIXTURE_TEARDOWN( FixtureTeardown )
CFIX_END_FIXTURE()

Once built, the test can be run from the command line:

C:\cfix\bin\i386>cfix32 -nologo -kern ktest.sys
Module: ktest (ktest.sys)
  Fixture: MyFixture
    Test1

For a more detailed discussion and more example code, please refer to the tutorial.

Architecture

For user mode code, the cfix architecture roughly looks like this:

The tests are compiled into a DLL. Using the testrunner application cfix32 or cfix64, one or more fixtures defined in the DLL can be run and the results are reported to the console or to a log file.

For kernel mode code, the architecture looks a little different. The tests are compiled into a driver rather than into a DLL. The driver is very lightweight and, besides the tests, contains only very little cfix-provided code (basically, just a DriverEntry implementation).

When cfix32 or cfix64 is requested to run a kernel mode test, it will load the Reflector, a driver that contains the kernel mode fraction of the testing framework. By relaying control operations and output through the Reflector, the kernel mode unit tests can be run.

All these additional steps are performed without additional user intervention: the drivers are installed, loaded and stopped automatically. From a user perspective, running a kernel mode test feels just like running a user mode test.

More…

cfix 1.1 introduces additional new features. I will discuss some of them over the next weeks. In any case, whether you have not used cfix yet or are a cfix 1.0 user, you should go straight to the download page now.


About me

Johannes Passing, M.Sc., living in Berlin, Germany.

Besides his consulting work, Johannes mainly focusses on Win32, COM, and NT kernel mode development, along with Java and .Net. He also is the author of cfix, a C/C++ unit testing framework for Win32 and NT kernel mode, Visual Assert, a Visual Studio Unit Testing-AddIn, and NTrace, a dynamic function boundary tracing toolkit for Windows NT/x86 kernel/user mode code.

Contact Johannes: jpassing (at) acm org

Johannes' GPG fingerprint is BBB1 1769 B82D CD07 D90A 57E8 9FE1 D441 F7A0 1BB1.
