Posts Tagged 'Windows'

What a weirdo: How the /analyze switch changes its behavior depending on its environment

In Visual Studio 2005 Team System (VSTS), the “ultimate” SKU of Visual Studio 2005, Microsoft introduced the /analyze compiler switch. When the /analyze switch is used, the cl compiler not only does its regular checks, but performs a much more thorough static code analysis.

While /analyze is very useful indeed, it was only available in the top SKU — the Standard and Professional versions of Visual Studio lacked support for this compiler switch (this has since changed; Professional now supports the feature as well). As some smart people quickly figured out, though, the compilers shipped as part of the Windows SDK supported /analyze, too.

So given that some compilers support /analyze while others do not, you might well expect that there are two slightly different types of binaries: one that the SDK and VSTS use, and one that ships with the other Visual Studio SKUs.

At least this was what I expected. As it turns out though, this is not quite the case.

Where’s /analyze?

For the past two years, I have been developing using Visual Studio 2005 Team System along with Windows SDK 6.0 and WDK 6000 on a Vista x64 machine. Using this setup, I was able to use the /analyze switch in both “regular” Visual Studio projects and WDK (build.exe-driven) projects. That led me to the conclusion that the WDK 6000 compilers, like the SDK compilers, were in fact /analyze-enabled binaries as well.

Switching to a Windows 7 machine with VSTS 2005 and 2008, SDK 7.0, and WDK 6000 did not change this — /analyze kept working fine in all environments.

Then I set up a build server, installed WDK 6000 and Windows SDK 7.0, and attempted to perform a build — to my surprise, though, I got plenty of complaints about the /analyze switch not being supported.

I verified that the right compilers (WDK 6000) were used and compared cl versions between the build machine and my development machine — both were 14.00.50727.220, so everything seemed right. Running cl.exe /? on both machines, however, I noticed that despite the versions being the same, the following Code Analysis section was missing from the output on the build machine:

                         -CODE ANALYSIS-

/analyze[:WX-] enable code analysis
    WX- - code analysis warnings should not be treated as errors even if /WX is invoked

So obviously, Code Analysis support is enabled or disabled depending on external factors — not the binary itself, but the environment somehow determines whether the /analyze switch is supported or not.

Observing cl.exe /? with Process Monitor on my development machine resulted in the following output:

[Screenshot: Process Monitor tracing the search for c1xxast]

This trace leaves little room for interpretation: The code analysis features must (mainly) be implemented in c1xxast.dll. c1xxast.dll, however, is not shipped with the WDK itself, nor is it shipped with the non-VSTS SKUs of Visual Studio. So by default, the WDK’s cl will fail to locate the DLL and will revert to “/analyze-disabled mode”.

If, however, you have VSTS or the Windows SDK installed on your machine and your %PATH% happens to include the right directories, cl’s search for c1xxast.dll will succeed and — tada — /analyze suddenly works. On my development machine, this obviously was the case, whilst on the build machine, it was not.

Compiler version mish-mash

I added the Windows SDK’s bin directory to the build machine’s %PATH% and reran the build. As expected, /analyze now worked fine — what I did not quite expect, though, was that I was now getting dozens of compilation warnings like:

warning C6309: Argument '1' is null: this does not adhere to function 
specification of 'CfixCreateThread'

The reason for this was simple: The WDK cl.exe (remember, version 14.00.50727.220), thanks to a proper %PATH%, now used c1xxast.dll from SDK 7 to perform code analysis — despite the fact that c1xxast.dll actually “belonged” to cl version 15.00.30729.01. So the c1xxast.dll was one generation ahead of the WDK I was using.

The really, really cool thing about cl being able to work with a newer c1xxast.dll is that you can continue using WDK 6000 or 6001 (with W2K support!) and still benefit from the latest-and-greatest static code analysis features.

The reason for getting several warnings on the build machine while not getting similar warnings on my development machine was simply that on my development machine, the VS 2005 directory preceded the SDK directory in my %PATH%. Once I switched the order, I got the same warnings on both machines.

The ugly thing about this, however, is that a tiny change in the order of directories in %PATH% can suddenly make a huge difference w.r.t. code analysis. This is not quite what you’d normally expect.

(The additional compiler warnings, by the way, were a result of the improved analysis checks in cl 15: cl 14 routinely failed to verify the usage of __in vs. __in_opt parameters; cl 15 has become much more precise here and found several mis-attributed function signatures.)
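To illustrate the kind of contract /analyze verifies, here is a hedged sketch with made-up function names (not the actual cfix signatures): __in promises a non-NULL argument, whereas __in_opt permits NULL, and cl 15 flags callers that break that promise.

#include <windows.h>

//
// Hypothetical, SAL-annotated helpers for illustration only (not the actual
// cfix signatures): __in promises a non-NULL argument, __in_opt permits NULL.
//
static void UseEvent(__in HANDLE Event)
{
    SetEvent(Event);
}

static void UseOptionalEvent(__in_opt HANDLE Event)
{
    if (Event != NULL)
    {
        SetEvent(Event);
    }
}

int main(void)
{
    UseEvent(NULL);           // flagged by cl 15 with /analyze (a C6309-style warning)
    UseOptionalEvent(NULL);   // fine: NULL is explicitly permitted
    return 0;
}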


Launched ntrace.org

Having given my presentation on NTrace today at the WCRE in Lille, France, I have also opened ntrace.org to the public. NTrace, in case you have missed my previous posts, is a dynamic function boundary tracing system for Windows/x86 that I initially developed as part of my Master’s thesis and that is capable of performing DTrace-like tracing of both user and kernel mode components.

On the NTrace page, you will now find the paper itself, as published in the WCRE proceedings (mind the copyright notice, please), along with two screencasts: one showing how NTrace can be used to trace kernel mode components such as NTFS, and one demonstrating NTrace for user mode tracing.

If you have questions about NTrace or are interested in more details, please feel free to write me an email — my address is jpassing at acm org.

I’ll be at WCRE 2009 presenting NTrace

Next week, the 16th Working Conference on Reverse Engineering (WCRE) will be held in Lille, France. I will be there presenting NTrace: Function Boundary Tracing for Windows on IA-32.

NTrace is a dynamic function boundary tracing toolkit for IA-32/x86 that can be used to trace both kernel and user mode Windows components — examples of components that can be traced include the kernel itself (ntoskrnl), drivers like NTFS, as well as user mode components such as kernel32, shell32, or even explorer.exe.

NTrace implements a novel approach to instrumenting IA-32 machine code and integrating with the Structured Exception Handling facility of Windows. Using this approach, NTrace is not only capable of tracing nearly the entire Windows kernel and system libraries, it is also faster than Solaris DTrace FBT on IA-32!

Details on how exactly NTrace works will be published in the paper, which will be made available soon. I will also publish more details on NTrace both here and on a dedicated NTrace website.

The work, by the way, is basically the result of my Master’s thesis I wrote back in 2008.

More Context Menu Handlers for Everyday Use

Although Windows Explorer may not be the brightest spot of Windows, it is still, for most users, among the most frequently used programs. Customizing it to speed up certain tasks is thus a natural desire.

A while ago, I wrote about how to extend the context menu with commands that allow MSI packages to be installed and uninstalled with log files being created. The registry entries I used were:

Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\Msi.Package\shell\LoggedInstall]
@="&Logged Install"

[HKEY_CLASSES_ROOT\Msi.Package\shell\LoggedInstall\command]
@="msiexec.exe /l* \"%1-install.log\" /i \"%1\" %*"

[HKEY_CLASSES_ROOT\Msi.Package\shell\LoggedUninstall]
@="L&ogged Uninstall"

[HKEY_CLASSES_ROOT\Msi.Package\shell\LoggedUninstall\command]
@="msiexec.exe /l* \"%1-uninstall.log\" /x \"%1\" %*"

[HKEY_CLASSES_ROOT\Msi.Package\shell\runas\command]
@="msiexec.exe /l* \"%1.log\" /i \"%1\" %*"

While I do not use Windows Installer every day, I am a heavy user of cmd.exe command prompts. Another set of custom verbs I use on my machines therefore deals with opening command line windows. Getting Windows Explorer to open a “normal” command prompt from the context menu is not hard and has been demonstrated in various places. The idea becomes truly powerful, though, when the commands are specialized to open specific kinds of command windows:

  • A plain command prompt
  • An elevated command prompt (using elevate.exe)
  • A WDK command prompt (WLH-chk)
  • A WDK command prompt (WLHA64-chk)
  • A Visual Studio 2005 command prompt
  • etc …

To distinguish the different types of consoles, I like to use different colors — the Visual Studio command prompt is white/green, the elevated prompt is green/blue, and so on. The following script puts it all together (mind the static paths):

Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\Directory\shell\Open DDK Console here (WLH-chk)]
@="Open DD&K Console here (WLH-chk)"

[HKEY_CLASSES_ROOT\Directory\shell\Open DDK Console here (WLH-chk)\command]
@="C:\\Windows\\System32\\cmd.exe /k C:\\WinDDK\\6000\\bin\\setenv.bat C:\\WinDDK\\6000\\ chk WLH && cd /D %1 && color 1f"

[HKEY_CLASSES_ROOT\Directory\shell\Open DDK Console here (WLHA64-chk)]
@="Open DD&K Console here (WLHA64-chk)"

[HKEY_CLASSES_ROOT\Directory\shell\Open DDK Console here (WLHA64-chk)\command]
@="C:\\Windows\\System32\\cmd.exe /k C:\\WinDDK\\6000\\bin\\setenv.bat C:\\WinDDK\\6000\\ chk AMD64 && cd /D %1 && color 1f"

[HKEY_CLASSES_ROOT\Directory\shell\Open Default Console here]
@="Open Default Conso&le here"

[HKEY_CLASSES_ROOT\Directory\shell\Open Default Console here\command]
@="cmd.exe /K \"title %1 && cd /D %1\""

[HKEY_CLASSES_ROOT\Directory\shell\Open Elevated Console here]
@="Open Ele&vated Console here"

[HKEY_CLASSES_ROOT\Directory\shell\Open Elevated Console here\command]
@="d:\\bin\\elevate.exe /K \"title %1 && color 1a && cd /D %1\""

[HKEY_CLASSES_ROOT\Directory\shell\Open VS.Net 2005 Console here]
@="Open VS.Net 200&5 Console here"

[HKEY_CLASSES_ROOT\Directory\shell\Open VS.Net 2005 Console here\command]
@="cmd.exe /K \"cd /D %1 && \"C:\\Program Files (x86)\\Microsoft Visual Studio 8\\VC\\vcvarsall.bat\" && color 2f\""

Finally, if you perform backups to the cloud from time to time and do not want to upload unencrypted data, or for other reasons occasionally encrypt specific files, it may also be practical to have two GPG commands in your context menu — one to (symmetrically) encrypt, and one to decrypt:

Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\.gpg]
@="GpgFile"

[HKEY_CLASSES_ROOT\GpgFile]

[HKEY_CLASSES_ROOT\GpgFile\shell]

[HKEY_CLASSES_ROOT\GpgFile\shell\Decrypt]
@="Decrypt"

[HKEY_CLASSES_ROOT\GpgFile\shell\Decrypt\command]
@="\"c:\\Program Files (x86)\\GNU\\GnuPG\\gpg.exe\" -d -i -o %1.plain %1"

[HKEY_CLASSES_ROOT\*\shell\GpgSymmetricEncrypt]
@="Encrypt with GPG (symmetric)"

[HKEY_CLASSES_ROOT\*\shell\GpgSymmetricEncrypt\command]
@="\"c:\\Program Files (x86)\\GNU\\GnuPG\\gpg.exe\" -c %1"

Thread 0:0 is special

Thread IDs uniquely identify a thread — this certainly holds for user mode threads and should also hold for kernel mode threads. But there is one kind of thread where the ID does not uniquely identify a KTHREAD — the Idle thread.

On a uniprocessor system, there is only one Idle thread, and this idle thread has the thread ID 0 (in process 0). On a multiprocessor system, however, Windows creates one Idle thread per CPU. That makes sense — what may be surprising at first, though, is that although each Idle thread has its own KTHREAD structure, they all share the same thread ID 0 (CID 0:0). That is, every multiprocessor system has multiple threads with ID 0, which in turn means that CID 0:0 does not uniquely identify a single thread.

This is easily verified using the kernel debugger on a multiprocessor system (a quad-core machine in this case):

0: kd> !running

System Processors f (affinity mask)
  Idle Processors f
All processors idle.

0: kd> !pcr 0
KPCR for Processor 0 at ffdff000:
[...]

	      CurrentThread: 8089d8c0
	         NextThread: 00000000
	         IdleThread: 8089d8c0


0: kd> !thread 8089d8c0
THREAD 8089d8c0  Cid 0000.0000  Teb: 00000000 Win32Thread: 00000000 RUNNING on processor 0
[...]

0: kd> !pcr 1
KPCR for Processor 1 at f7727000:
[...]

	      CurrentThread: f772a090
	         NextThread: 00000000
	         IdleThread: f772a090
	         
0: kd> !thread f772a090
THREAD f772a090  Cid 0000.0000  Teb: 00000000 Win32Thread: 00000000 RUNNING on processor 1
[...]

...and so on.

Now, the idle threads are usually not of major interest. Still, these threads can become relevant — for example, when doing certain kinds of analysis such as tracing interrupts. If such an analysis groups events by the CID of the thread they were captured on (rather than by the VA of the KTHREAD structure), the results for CID 0:0 will be wrong.
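As a sketch of the consequence (assumed code, not taken from any actual trace tool), such an analysis is better off keying captured events by the KTHREAD pointer rather than by the CID:

#include <ntddk.h>

//
// Sketch only: key captured events by the KTHREAD pointer, since every
// idle thread reports CID 0:0 while its KTHREAD address is unique.
//
typedef struct _TRACE_EVENT
{
    PKTHREAD Thread;            // unique per thread, including idle threads
    HANDLE ProcessId;           // 0 for every idle thread
    HANDLE ThreadId;            // 0 for every idle thread
    LARGE_INTEGER Timestamp;
} TRACE_EVENT, *PTRACE_EVENT;

VOID CaptureEvent(
    __out PTRACE_EVENT Event
    )
{
    Event->Thread    = KeGetCurrentThread();
    Event->ProcessId = PsGetCurrentProcessId();
    Event->ThreadId  = PsGetCurrentThreadId();
    KeQuerySystemTime( &Event->Timestamp );
}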

Not a big thing. Still, it took me a while to figure out that this was indeed the reason for some of my traces containing timestamps in strange order: the traces had been created on a quad-core machine and I made the mistake of correlating the events by their CID during later analysis. As a result, the traces of the four idle threads were all mixed up…

Reaching beyond the top of the stack — illegal or just bad style?

The stack pointer, esp on i386, denotes the top of the stack. All memory below the top of the stack (i.e. at higher addresses, since the stack grows downwards) is occupied by parameters, variables and return addresses; memory beyond the top of the stack (i.e. at addresses lower than esp) must be assumed to contain garbage.

When programming in assembly, it is equally easy to use memory below and above the stack pointer. Reading from or writing to addresses beyond the top of the stack is unusual, and under normal circumstances there is little reason to do so. There are, however, situations — rare situations — where it may be tempting to temporarily use memory beyond the top of the stack.

That said, the question is whether it is really just a convention and good style not to reach beyond the top of the stack, or whether there are actual reasons why doing so could lead to problems.

When trying to answer this question, one first has to distinguish between user mode and kernel mode. In user mode Windows, I am unable to come up with a single reason why using memory beyond the top of the stack could lead to problems. So in this case, it is probably merely bad style.

However, things are different in kernel mode.

In one particular routine I recently wrote, I encountered a situation where temporarily violating the rule of not reaching beyond the top of the stack came in handy. The routine worked fine for quite a while. In certain situations, however, it suddenly started to fail due to memory corruption. Interestingly enough, the routine did not always fail, but it did fail rather frequently.

Having identified the specific routine as the culprit, I started single-stepping the code. Everything was fine until I reached the point where the memory above the stack pointer was used. The window spanned only a single instruction. Yet, as soon as I had stepped over the two instructions, the system crashed. I tried it multiple times, and it was perfectly reproducible when single-stepped.

So I took a look at the stack contents after every single step I took. To my surprise, as soon as I reached the critical window, the contents of the memory location just beyond the current stack pointer suddenly became messed up. Very weird.

After scratching my head for a while, it suddenly started to make sense: I was not the only one using the stack — in between the two instructions, an interrupt must have occurred and been dispatched. As my thread happened to be the one currently running, it was my stack that had been used for dispatching it. This also explains why the failure did not occur every time unless I was single-stepping the respective code.

When an interrupt occurs and no privilege-level change has to be performed, the CPU pushes the EFLAGS, CS and EIP registers onto the stack. That is, the stack of whatever kernel thread happens to be running on this CPU is reused, and the memory locations beyond the stack pointer are overwritten by these three values. So what I initially interpreted as garbage actually was the contents of EFLAGS, CS and EIP.
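To make the anti-pattern concrete, here is a contrived i386 sketch (MSVC inline assembly, my own illustration rather than the original routine) of a value being stashed beyond the top of the stack and read back one instruction later:

//
// Contrived illustration of the anti-pattern (not the original routine).
// In kernel mode, an interrupt hitting between the two memory accesses
// pushes EFLAGS/CS/EIP onto the current stack and clobbers [esp-4].
//
unsigned long StashBelowStackPointer(unsigned long Value)
{
    unsigned long Result;

    __asm
    {
        mov eax, Value
        mov [esp-4], eax        ; write beyond the top of the stack
        ;
        ; <-- an interrupt dispatched here overwrites [esp-4] with EFLAGS
        ;
        mov eax, [esp-4]        ; may now read back garbage
        mov Result, eax
    }

    return Result;
}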

On Windows NT, unlike some other operating systems (FreeBSD, IIRC), handling the interrupt, which involves running the interrupt service routine (ISR), occurs on the same stack as well. The following stack trace, taken elsewhere, shows an ISR being executed on the stack of the interrupted thread:

f6bdab4c f99bf153 i8042prt!I8xQueueCurrentMouseInput+0x67
f6bdab78 80884289 i8042prt!I8042MouseInterruptService+0xa58
f6bdab78 f6dd501a nt!KiInterruptDispatch+0x49
f6bdac44 f6dd435f driver!Quux+0x11a 
f6bdac58 f6dd61db driver!Foobar+0x6f 
...

Moral of the story: using memory beyond the current stack pointer is not only bad practice, it is actually illegal when done in kernel mode.

Fun with low level SEH

Most code that uses Structured Exception Handling does so with the help of the compiler, e.g. by using __try/__except/__finally. Still, it is possible to do everything by hand, i.e. to provide your own exception handlers and set up the exception registration records manually. However, as this entire topic is not documented very well, doing so leaves room for all kinds of surprises…

Although more than 10 years old, the best article on this topic still seems to be Matt Pietrek’s A Crash Course on the Depths of Win32™ Structured Exception Handling, which I assume you have read. Note, however, that this article as well as this post refer to i386 only, albeit both to user and kernel mode.

Exception Registration Record Validation

On the i386, SEH uses a linked list of exception registration records. The first record is pointed to by the first member of the TIB. In user mode, the TIB is part of the TEB, in kernel mode it is part of the KPCR — in any case, it is at fs:[0]. Each record, besides containing a pointer to the next lower record, stores a pointer to an exception handler routine.

Installing an exception registration record is thus straightforward and merely requires adjusting the TIB pointer and having the new record point to the next lower record. So I set up my custom exception registration record, registered it properly, verified that all pointers were correct, and tried using it. However, I was unpleasantly surprised to find that exception handling failed completely as soon as my exception registration record got involved. !exchain reported an “Invalid exception stack”, although checking the pointers manually again seemed to show that the chain of exception registration records was fine and my record seemed ok.

Digging a little deeper, I found the reason for that — and in fact I cannot remember ever having heard or read about this requirement before: Windows requires all EXCEPTION_REGISTRATION_RECORDs to be located on the stack. Both RtlDispatchException and RtlUnwind check the location of each EXCEPTION_REGISTRATION_RECORD against the stack limits and abort exception handling as soon as a record is found not to be stack-located. Aborting exception handling in this case means that RaiseException/ExRaiseStatus will just return and execution will resume at the call site as if nothing had happened.
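For reference, here is a minimal user mode sketch of a correctly installed, stack-located registration record, following the pattern from Pietrek’s article — MyExceptionHandler is a hypothetical routine using the raw _except_handler signature:

#include <windows.h>

//
// Hypothetical low level handler using the raw _except_handler signature.
//
EXCEPTION_DISPOSITION __cdecl MyExceptionHandler(
    struct _EXCEPTION_RECORD *ExceptionRecord,
    void *EstablisherFrame,
    struct _CONTEXT *ContextRecord,
    void *DispatcherContext
    );

void CallWithCustomHandler(void)
{
    DWORD Handler = ( DWORD ) MyExceptionHandler;

    __asm
    {
        push Handler                ; Handler member of the new record
        push dword ptr fs:[0]       ; Next member: previous head of the chain
        mov  fs:[0], esp            ; the record -- on the stack -- becomes the head
    }

    //
    // ... code running under the custom handler ...
    //

    __asm
    {
        mov  eax, [esp]             ; Next member of our record
        mov  fs:[0], eax            ; unlink our record
        add  esp, 8                 ; pop the record off the stack
    }
}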

This requirement is fair enough, actually, but in my case it totally wrecked my design. I did not have the 8 spare bytes of stack to store this record and had therefore put it in some dedicated place on the heap. Urgh. Anyway…

As an interesting side note, Windows Server 2003 performs this stack check against both limits — minimum and maximum address of the stack. Vista, however, only checks against the maximum address (i.e. bottom of the stack) and does not care whether the minimum address (i.e. top of stack) has been exceeded.

Moreover, there is another restriction on exception records that only applies to user mode: the handler routine pointed to by the exception record is verified not to point into the stack. This is obviously another security measure, intended to prevent SEH handler pointers from pointing into overflowed stack buffers.

SafeSEH

It is worth pointing out that all these checks are unrelated to SafeSEH and are performed regardless of whether your module is SafeSEH compatible or not. Only after all these checks have passed does the exception handler undergo SafeSEH validation: the image base is calculated, the table listing the trusted SEH handlers is looked up, and it is checked whether the handler routine pointed to by the current exception record is listed in this table.

SafeSEH Handler Registration

Using SafeSEH is a good thing, and I link all my modules with /SAFESEH. So when you use low level SEH, i.e. without the __try/__except compiler support, the obvious question is how to get your SEH handler recognized as a trusted handler and included in the SafeSEH table. After all, the compiler will not be able to recognize that the routine you have just written will in fact be used as an exception handler. The C compiler does not seem to offer support for that — luckily, however, ml does, by providing the .SAFESEH directive.

If you like writing your exception handler in assembler, this is all you need. If, however, you prefer C, this is somewhat unsatisfying. The documentation of .SAFESEH states that it can be used with an extrn proc, but that does not seem to work. My solution was thus to write the actual routine in C and a little thunk in assembler, which I was then able to register using the .SAFESEH directive:

.586               
.model flat, stdcall
option casemap :none

extrn RealExceptionHandlerWrittenInC@16:proc

...

ExceptionHandlerThunk proto
.SAFESEH ExceptionHandlerThunk

...

.code

ExceptionHandlerThunk proc
	jmp RealExceptionHandlerWrittenInC@16
ExceptionHandlerThunk endp

Stupid things you should not do

Finally, there is another little quirk that bit me: Do not use EXCEPTION_CONTINUE_SEARCH where ExceptionContinueSearch would have been appropriate. The EXCEPTION_* constants are for use by exception filters as used for __except statements, whereas the Exception* values have to be used for low level exception handlers. Should be obvious, right? :)
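For illustration, here is a minimal sketch (the handler name is made up) of a low level handler that correctly declines to handle an exception:

#include <windows.h>

//
// Sketch of a low level handler that declines to handle anything. Note the
// return type and value: EXCEPTION_DISPOSITION with ExceptionContinueSearch,
// not the filter constant EXCEPTION_CONTINUE_SEARCH, which is 0 and would be
// taken for ExceptionContinueExecution here.
//
EXCEPTION_DISPOSITION __cdecl PassThroughExceptionHandler(
    struct _EXCEPTION_RECORD *ExceptionRecord,
    void *EstablisherFrame,
    struct _CONTEXT *ContextRecord,
    void *DispatcherContext
    )
{
    UNREFERENCED_PARAMETER( ExceptionRecord );
    UNREFERENCED_PARAMETER( EstablisherFrame );
    UNREFERENCED_PARAMETER( ContextRecord );
    UNREFERENCED_PARAMETER( DispatcherContext );

    return ExceptionContinueSearch;
}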

Having chosen the wrong group of constants, I returned EXCEPTION_CONTINUE_SEARCH from my exception handler to indicate that the handler is unable to handle certain exceptions. However, as it turns out, EXCEPTION_CONTINUE_SEARCH has the value 0 and is thus interpreted as ExceptionContinueExecution. Now, returning ExceptionContinueExecution when being requested to handle an exception raised by ExRaiseStatus is obviously a bad idea and in this case led to a STATUS_NONCONTINUABLE_EXCEPTION. After a few of those had stacked up (in kernel mode), VirtualPC crashed with an unrecoverable CPU error. Nice :)


About me

Johannes Passing, M.Sc., living in Berlin, Germany.

Besides his consulting work, Johannes mainly focuses on Win32, COM, and NT kernel mode development, along with Java and .Net. He is also the author of cfix, a C/C++ unit testing framework for Win32 and NT kernel mode, Visual Assert, a unit testing add-in for Visual Studio, and NTrace, a dynamic function boundary tracing toolkit for Windows NT/x86 kernel/user mode code.

Contact Johannes: jpassing (at) acm org

Johannes' GPG fingerprint is BBB1 1769 B82D CD07 D90A 57E8 9FE1 D441 F7A0 1BB1.
