Debugging a Debugger Deadlock

While I still use VisualStudio 2005 Team System for most of my development, I want to make sure that cfix works properly with VisualStudio 2008 as well.

To test that, I recently started a Windows 2003 Server VM, installed VCExpress 2008 and cfix and attempted to run an example project in the VC debugger. As long as no assertions fired, everything seemed fine. I then altered the example’s source code so that one of the assertion would fail, ran it in the debugger – and waited. Nothing happened.

When an assertion fires, cfix 1.1 attempts to capture the stack trace to make debugging easier. For this to work, the respective debugging symbols must be loaded, and – if not yet available – be downloaded from the symbol server. It is thus not uncommon that it takes a couple of seconds before the failure message appears and the debugger breaks in when an assertion fails. But after about a minute, still nothing happened – a deadlock seems to have occured.

To make the situation worse, I then noticed that while cfix was hanging, Internet Explorer would not properly start any more. After creating its main window, it would hang as well.

So I rebooted the machine and hooked up a kernel debugger. Luckily, the scenario was easy to reproduce and I could take a closer look…

Debugging the deadlock

First, let’s find the cfix32 process:

    kd!process 0 0 cfix32.exe
    PROCESS 8254e490  SessionId: 0  Cid: 06b0    Peb: 7ffde000  ParentCid: 0658
        DirBase: 06e90000  ObjectTable: e170ee58  HandleCount:  69.
        Image: cfix32.exe

Now that we have the VA, we can switch to the process and load its user mode symbols:

    kd.process /r /p 8254e490  
    Implicit process is now 8254e490
    .cache forcedecodeuser done
    Loading User Symbols

Looking at this process’ threads, it is easy to see that one stack looks suspicious:

    kd!process 8254e490  
    PROCESS 8254e490  SessionId: 0  Cid: 06b0    
    	Peb: 7ffde000  ParentCid: 0658
        DirBase: 06e90000  ObjectTable: e170ee58  HandleCount:  69.
        Image: cfix32.exe
        VadRoot 82455628 Vads 76 Clone 0 Private 491. 
        Modified 8. Locked 0.
        DeviceMap e148f3c0
    THREAD 826276f8  Cid 06b0.06b4  Teb: 7ffdd000 Win32Thread: 
      e1734a38 WAIT: (Unknown) KernelMode Non-Alertable
    	f4422b44  SynchronizationEvent
    Not impersonating
    DeviceMap                 e148f3c0
    Owning Process            8254e490       Image:         cfix32.exe
    Wait Start TickCount      13474          Ticks: 9138 (0:00:01:31.511)
    Context Switch Count      219                 LargeStack
    UserTime                  00:00:00.010
    KernelTime                00:00:00.010
    Win32 Start Address cfix32!wmainCRTStartup (0x010048cf)
    Start Address kernel32!BaseProcessStartThunk (0x77e617f8)
    Stack Init f4423000 Current f4422aa0 Base f4423000 Limit f441f000 Call 0
    Priority 10 BasePriority 8 PriorityDecrement 0
    ChildEBP RetAddr  
    f4422ab8 808202b6 nt!KiSwapContext+0x25
    f4422ad0 8081fb6e nt!KiSwapThread+0x83 
    f4422b14 809bae63 nt!KeWaitForSingleObject+0x2e0 
    f4422bf4 809bc06d nt!DbgkpQueueMessage+0x178 
    f4422c18 8096ba9a nt!DbgkpSendApiMessage+0x45 
    f4422cc8 80909942 nt!DbgkMapViewOfSection+0xcf 
    f4422d34 8082350b nt!NtMapViewOfSection+0x269 
    f4422d34 7c8285ec nt!KiFastCallEntry+0xf8 
    00069b94 7c82728b ntdll!KiFastSystemCallRet 
    00069b98 7c831e05 ntdll!NtMapViewOfSection+0xc
    00069bdc 7c831fd6 ntdll!LdrpMapViewOfDllSection+0x64 
    00069ccc 7c833027 ntdll!LdrpMapDll+0x390 

The thread is blocked – it is waiting on en event. Events do not have an owner so we have to do a little more to find out, what it is waiting for. Unfortunately, !thread crops the trace, so let us first get the full one:

    kd.thread 826276f8  
     # ChildEBP RetAddr  
    00 f4422ab8 808202b6 nt!KiSwapContext+0x25
    01 f4422ad0 8081fb6e nt!KiSwapThread+0x83
    02 f4422b14 809bae63 nt!KeWaitForSingleObject+0x2e0
    03 f4422bf4 809bc06d nt!DbgkpQueueMessage+0x178
    04 f4422c18 8096ba9a nt!DbgkpSendApiMessage+0x45
    05 f4422cc8 80909942 nt!DbgkMapViewOfSection+0xcf
    06 f4422d34 8082350b nt!NtMapViewOfSection+0x269
    07 f4422d34 7c8285ec nt!KiFastCallEntry+0xf8
    08 00069b94 7c82728b ntdll!KiFastSystemCallRet
    09 00069b98 7c831e05 ntdll!NtMapViewOfSection+0xc
    0a 00069bdc 7c831fd6 ntdll!LdrpMapViewOfDllSection+0x64
    0b 00069ccc 7c833027 ntdll!LdrpMapDll+0x390
    0c 00069f30 7c8330f5 ntdll!LdrpLoadImportModule+0x17c
    0d 00069f70 7c8330a4 ntdll!LdrpHandleOneNewFormatImportDescriptor+0x4d
    0e 00069f8c 7c833248 ntdll!LdrpHandleNewFormatImportDescriptors+0x1d
    0f 0006a014 7c833049 ntdll!LdrpWalkImportDescriptor+0x195
    10 0006a264 7c8330f5 ntdll!LdrpLoadImportModule+0x1cb
    11 0006a2a4 7c8330a4 ntdll!LdrpHandleOneNewFormatImportDescriptor+0x4d
    12 0006a2c0 7c833248 ntdll!LdrpHandleNewFormatImportDescriptors+0x1d
    13 0006a348 7c83427d ntdll!LdrpWalkImportDescriptor+0x195
    14 0006a5e0 7c834065 ntdll!LdrpLoadDll+0x241
    15 0006a85c 77e41bf3 ntdll!LdrLoadDll+0x198
    16 0006a8c4 77e41dbd kernel32!LoadLibraryExW+0x1b2
    17 0006a8d8 77e41df3 kernel32!LoadLibraryExA+0x1f
    18 0006a8f8 46a7870c kernel32!LoadLibraryA+0xb5
    19 0006a954 46a93b3e WININET!__delayLoadHelper2+0xfc
    1a 0006a994 46a93950 WININET!_tailMerge_RASAPI32_dll+0xd
    1b 0006a9a8 46a93a4e WININET!DoConnectoidsExist+0x2b
    1c 0006a9d4 46a93abc WININET!GetRasConnections+0x34
    1d 0006a9f0 46a8c559 WININET!IsDialUpConnection+0xa9
    1e 0006aa0c 46a97a44 WININET!FixProxySettingsForCurrentConnection+0x31
    1f 0006b5e4 46aa3774 WININET!InternetQueryOptionA+0xa47
    20 0006b748 01d12dc6 WININET!InternetQueryOptionW+0x1fa
    21 0006b98c 01d12583 symsrv!StoreWinInet::dumpproxyinfo+0x46
    22 0006be04 01d1290a symsrv!StoreWinInet::connect+0x273
    23 0006c040 01d05ae7 symsrv!StoreWinInet::find+0x3a
    24 0006c134 01d06c47 symsrv!cascade+0x87
    25 0006c684 01d06a57 symsrv!SymbolServerByIndexW+0x127
    26 0006c8b4 0302e30e symsrv!SymbolServerW+0x77
    27 0006ccf4 03018eed dbghelp!symsrvGetFile+0x12e
    28 0006d9dc 03019f57 dbghelp!diaLocatePdb+0x33d
    29 0006dc58 03041ade dbghelp!diaGetPdb+0x207
    2a 0006de7c 0303ff15 dbghelp!GetDebugData+0x2be
    2b 0006e324 03040516 dbghelp!modload+0x305
    2c 0006e7a4 0304068e dbghelp!LoadModule+0x3f6
    2d 0006e9e8 03044eaf dbghelp!GetModule+0x4e
    2e 0006ea30 03044bda dbghelp!NTGetProcessModules+0x16f
    2f 0006eae8 03032e80 dbghelp!GetProcessModules+0x4a
    30 0006ed70 60f03f7a dbghelp!SymInitializeW+0x320
    31 0006f17c 60f032b0 cfix!CfixpCaptureStackTrace+0x117 
    32 0006f598 100419b5 cfix!CfixPeReportFailedAssertion+0xc5 
    WARNING: Stack unwind information not available. 
    Following frames may be wrong.
    33 0006f7c0 10040f40 VsSample!__CfixFixturePeSimpleAdderTest+0x5ab1
    34 0006f8b4 10040db6 VsSample!__CfixFixturePeSimpleAdderTest+0x503c
    35 0006fa68 10040a8a VsSample!__CfixFixturePeSimpleAdderTest+0x4eb2
    36 0006fb48 60f02b64 VsSample!__CfixFixturePeSimpleAdderTest+0x4b86
    37 0006fb84 60f02be6 cfix!CfixsRunTestRoutine+0x33 
    38 0006fb94 60f038e9 cfix!CfixsRunTestCaseMethod+0x27 
    39 0006fbac 60f03a06 cfix!CfixsRunTestCase+0x25 
    3a 0006fbcc 60f03ce5 cfix!CfixsRunTsexecActionMethod+0xfb 
    3b 0006fbf0 0100e135 cfix!CfixsRunSequenceAction+0x122 
    3c 0006fc2c 0100d5c2 cfix32!CfixrunpRunFixtures+0x90 
    3d 0006fc40 0100d85c cfix32!CfixrunsMainWorker+0x3f 
    3e 0006fe7c 010046b8 cfix32!CfixrunMain+0x1b9 
    3f 0006fee0 0100485e cfix32!wmain+0x80 
    40 0006ffc0 77e6f23b cfix32!_wmainCRTStartup+0x12b 
    41 0006fff0 00000000 kernel32!BaseProcessStart+0x23

Whoa, what a trace! cfix!CfixpCaptureStackTrace tries to assemble a stack trace, for which it has to initialize dbghelp.dll first. dbghelp!SymInitializeW seeks help of symsrv.dll, which in turn tries to connect to the Microsoft symbol server. Before it can so, it obviously attempts to get its proxy settings straight, which in turn leads to some DLL (rasapi.dll, in case you wonder) being loaded. The loader then calls into the debugging subsystem (nt!Dbgk*). It may be assumed that the loader is notifying the debugger about the DLL having been loaded.

Turining our attention to Internet Explorer, we look at iexplore.exe’s threads:

    kd!process 0 0 iexplore.exe
    PROCESS 824ec3b0  SessionId: 0  Cid: 07f0    Peb: 7ffdb000  ParentCid: 054c
        DirBase: 0307c000  ObjectTable: e15186b8  HandleCount: 225.
        Image: iexplore.exe
    kd.process /r /p 824ec3b0  
    Implicit process is now 824ec3b0
    .cache forcedecodeuser done
    Loading User Symbols

Now, iexplore has lots of threads, but skimming over them, one looked interesting:

    kd!process 824ec3b0  
    PROCESS 824ec3b0  SessionId: 0  Cid: 07f0    
    	Peb: 7ffdb000  ParentCid: 054c
        DirBase: 0307c000  ObjectTable: e15186b8  HandleCount: 225.
        Image: iexplore.exe
        VadRoot 824991c8 Vads 168 Clone 0 Private 643. 
        Modified 44. Locked 0.
        DeviceMap e148f3c0
    THREAD 82431980  Cid 07f0.00b8  Teb: 7ffd5000 Win32Thread: 00000000 
      WAIT: (Unknown) UserMode Non-Alertable
    	82631eb0  Mutant - owning thread 826276f8
    Not impersonating
    DeviceMap                 e148f3c0
    Owning Process            824ec3b0       Image:         iexplore.exe
    Wait Start TickCount      21924          Ticks: 688 (0:00:00:06.889)
    Context Switch Count      1             
    UserTime                  00:00:00.000
    KernelTime                00:00:00.000
    Win32 Start Address ntdll!RtlpWorkerThread (0x7c839efb)
    Start Address kernel32!BaseThreadStartThunk (0x77e617ec)
    Stack Init f40a3000 Current f40a2c78 Base f40a3000 Limit f40a0000 Call 0
    Priority 8 BasePriority 8 PriorityDecrement 0
    ChildEBP RetAddr  
    f40a2c90 808202b6 nt!KiSwapContext+0x25 
    f40a2ca8 8081fb6e nt!KiSwapThread+0x83 
    f40a2cec 8090e64e nt!KeWaitForSingleObject+0x2e0 
    f40a2d50 8082350b nt!NtWaitForSingleObject+0x9a 
    f40a2d50 7c8285ec nt!KiFastCallEntry+0xf8 
    01e5fdd0 7c827d0b ntdll!KiFastSystemCallRet 
    01e5fdd4 77e61d1e ntdll!NtWaitForSingleObject+0xc 
    01e5fe44 77e61c8d kernel32!WaitForSingleObjectEx+0xac 
    01e5fe58 46a8c54d kernel32!WaitForSingleObject+0x12 
    01e5fe74 46a7eeca WININET!FixProxySettingsForCurrentConnection+0x25 
    01e5fe8c 46a7ee3f WININET!CFsm_HttpSendRequest::RunSM+0x61 
    01e5fea4 46a7efa3 WININET!CFsm::Run+0x39 
    01e5fed4 77da5938 WININET!CFsm::RunWorkItem+0x79 
    01e5feec 7c83a827 SHLWAPI!ExecuteWorkItem+0x1d 
    01e5ff44 7c83aa0b ntdll!RtlpWorkerCallout+0x71 
    01e5ff64 7c83aa82 ntdll!RtlpExecuteWorkerRequest+0x4f 
    01e5ff78 7c839f60 ntdll!RtlpApcCallout+0x11 
    01e5ffb8 77e64829 ntdll!RtlpWorkerThread+0x61 
    01e5ffec 00000000 kernel32!BaseThreadStart+0x34 

Now we are getting somewhere. We have seen FixProxySettingsForCurrentConnection in cfix’s trace already – but in this case, it is waiting on something. Let’s see…

    kd!object 82631eb0  
    Object: 82631eb0  Type: (827a5550) Mutant
        ObjectHeader: 82631e98 (old version)
        HandleCount: 3  PointerCount: 6
        Directory Object: e1496420  Name: WininetProxyRegistryMutex

And 826276f8, that’s the cfix32 thread we have already assessed. Obviously, iexplore waits for cfix to release the WininetProxyRegistryMutex, and cfix waits on someone else.

Turning over to VC, we can find a stack that also contains a call to FixProxySettingsForCurrentConnection on its stack. Again, blocking on WininetProxyRegistryMutex.

    ChildEBP RetAddr  
    f4492c90 808202b6 nt!KiSwapContext+0x25
    f4492ca8 8081fb6e nt!KiSwapThread+0x83
    f4492cec 8090e64e nt!KeWaitForSingleObject+0x2e0
    f4492d50 8082350b nt!NtWaitForSingleObject+0x9a
    f4492d50 7c8285ec nt!KiFastCallEntry+0xf8
    065b95c8 7c827d0b ntdll!KiFastSystemCallRet
    065b95cc 77e61d1e ntdll!NtWaitForSingleObject+0xc
    065b963c 77e61c8d kernel32!WaitForSingleObjectEx+0xac
    065b9650 46a8c54d kernel32!WaitForSingleObject+0x12
    065b966c 46a7eeca WININET!FixProxySettingsForCurrentConnection+0x25
    065b9684 46a7ee3f WININET!CFsm_HttpSendRequest::RunSM+0x61
    065b969c 46a7fefa WININET!CFsm::Run+0x39
    065b96b4 46ab0a67 WININET!DoFsm+0x25
    065b96dc 46aa1092 WININET!HttpWrapSendRequest+0x148
    065b9714 06b231da WININET!HttpSendRequestW+0x5e
    065b973c 06b22ea8 SYMSRV!StoreWinInet::request+0x2a
    065b9770 06b226cc SYMSRV!StoreWinInet::fileinfo+0x18
    065b9780 06b22741 SYMSRV!StoreWinInet::get+0x7c
    065b9fc4 06b229a3 SYMSRV!StoreWinInet::open+0x41
    065ba204 06b15ae7 SYMSRV!StoreWinInet::find+0xd3
    065ba2f8 06b16c47 SYMSRV!cascade+0x87
    065ba848 06b16a57 SYMSRV!SymbolServerByIndexW+0x127
    065baa78 51412896 SYMSRV!SymbolServerW+0x77
    065bb8cc 51413383 mspdb80!LOCATOR::SYMSRV::SymbolServer+0x190
    065bbf10 514136f8 mspdb80!LOCATOR::FLocatePdbSymsrv+0x75
    065bbf38 514139ce mspdb80!LOCATOR::FLocatePdbPathHelper+0x179
    065bc96c 51413cbe mspdb80!LOCATOR::FLocatePdbPath+0x105
    065bccb4 51414371 mspdb80!LOCATOR::FLocatePdb+0x1ad
    065bd9a8 458cc1e8 mspdb80!PDBCommon::OpenValidate5+0xab
    065bd9ec 45959d4c msenc90!enc::EncImageEdit::
                      `scalar deleting destructor'+0x4d
    065bda34 45958e62 NatDbgDE!OLPDBOpen+0x93
    065be6c0 45958f0a NatDbgDE!OLStart+0x107
    065be6fc 45958fae NatDbgDE!LoadOmfForReal+0x23
    065be714 45959019 NatDbgDE!LoadSymbols+0x43
    065be72c 459590d9 NatDbgDE!OLLoadOmf+0x55
    065be75c 45959154 NatDbgDE!SHLoadDll+0xd5
    065be7ac 45959247 NatDbgDE!CSymbolHandlerX::SHLoadDll+0x5a
    065be844 4595937c NatDbgDE!CModule::Load+0x1a1
    065be8ac 4594d002 NatDbgDE!CNativeProcess::NotifyModLoad+0xc8
    065be9ec 4594cf6d NatDbgDE!EngineCallback+0xb3
    065bea18 45958d3f NatDbgDE!EMCallBackDB+0x4c
    065bf050 4594d0dc NatDbgDE!LoadFixups+0x218
    065bf0ac 4594d289 NatDbgDE!DebugPacket+0x213
    065bfdb4 4594cf39 NatDbgDE!EMFunc+0x40f
    065bfddc 4594d73d NatDbgDE!TLCallBack+0x1e
    065bfdf4 4594d711 NatDbgDE!TLClientLib::Local_TLFunc+0xc8
    065bfe3c 4594d85c NatDbgDE!DMSendPacket+0x121
    065bfee8 45959b1d NatDbgDE!NotifyEM+0x3ae
    065bff0c 4594d663 NatDbgDE!ProcessLoadDLLEvent+0x47
    065bff44 4594d686 NatDbgDE!ProcessDebugEvent+0x30d
    065bffb8 77e64829 NatDbgDE!DmPollLoop+0x3c
    065bffec 00000000 kernel32!BaseThreadStart+0x34

But – looking closely, it becomes obvious that this thread must be the one handling debug events, and in fact, the call to ProcessLoadDLLEvent is a strong indication for that this thread is currently handling a DLL load event. And now we have closed the loop – this thread must be handling the DLL load event for rasapi.dll, the DLL which cfix was about to load. And to do this, VC attempts to acquire the WininetProxyRegistryMutex, which is owned by the original cfix thread. Deadlock.

What is interesting about this situation is that neither party – cfix, iexplore or VCExpress, and also none of the modules clearly is the culprit and behaving wrong. It is more like a combination of special circumstances that bring up the deadlock as discussed.

It is also notable that I am not using any particular proxy settings on this machine and automatic proxy configuration has been turned off.

So far, I have not experienced the same problem with VS 2003 and VS 2005 – I thus assume that only VS 2008 is affected by this.

Although I am pretty sure that cfix is not really at fault here, I have to adapt it to avoid this deadlock in the future. Until an updated version is available, you can use this workaround.

« Back to home