Archive for June, 2008

Reaching beyond the top of the stack — illegal or just bad style?

The stack pointer, esp on i386, denotes the top of the stack. All memory below the stack pointer (i.e. higher addresses) is occupied by parameters, variables and return addresses; memory above the stack pointer must be assumed to contain garbage.

When programming in assembly, it is equally easy to use memory below and above the stack pointer. Reading from or writing to addresses beyond the top of the stack is unusual and under normal circumstances, there is little reason to do so. There are, however, situations — rare situations — where it may tempting to temporarily use memory beyond the top of the stack.

That said, the question is whether it is really just a convention and good style not to grab beyond the stack of the stack or whether there are actually reasons why doing so could lead to problems.

When trying to answer this question, one first has to make a distinction between user mode and kernel mode. In user mode Windows, I am unable to come up with a single reason of why usage of memory beyond the top of the stack could lead to problems. So in this case, it is probably merely bad style.

However, things are different in kernel mode.

In one particular routine I recently wrote, I encountered a situation where temporarily violating the rule of not reaching beyond the top of the stack came in handy. The routine worked fine for quite a while. In certain situations, however, it suddenly started to fail due to memory corruption. Interestingly enough, the routine did not fail always, but still rather frequently.

Having identified the specific routine as being the cuplrit, I started single stepping the code. Everything was fine until I reached the point where the memory above the stack pointer was used. The window span only a single instruction. Yet, as soon as I had stepped over the two instructions, the system crashed. I tried it multiple times, and it was prefectly reproduceable when being single-stepped.

So I took a look at the stack contents after every single step I took. To my surprise, as soon as I reached the critical window, the contents of the memory location just beyond the current stack pointer suddenly became messed. Very weird.

After having been scratching my head for a while, that suddenly started to made sense: I was not the only one using the stack — in between the two instructions, an interrupt must have occured and been dispatched. As my thread happened to be the one currently running, it was my stack that has been used for dispatching it. This also explains why it did not happened always unless I was single-stepping the respective code.

When an interrupt occurs and no privilege-level change has to be performed, the CPU will push the EFLAGS, CS and EIP registers on the stack. That is, the stack of whatever kernel thread happens to be the one currently running on this CPU is reused and the memory locations beyond the stack pointer will be overwritten by these three values. So what I initially interpreted as garbage, actually were the contents of EFLAGS, CS and EIP.

On Windows NT, unlike some other operating systems (FreeBSD, IIRC), handling the interrupt, which involves runing the interrupt service routine (ISR) occurs on the same stack as well. The following stack trace, taken elsewhere, shows an ISR being executed on the stack of the interrupted thread:

f6bdab4c f99bf153 i8042prt!I8xQueueCurrentMouseInput+0x67
f6bdab78 80884289 i8042prt!I8042MouseInterruptService+0xa58
f6bdab78 f6dd501a nt!KiInterruptDispatch+0x49
f6bdac44 f6dd435f driver!Quux+0x11a 
f6bdac58 f6dd61db driver!Foobar+0x6f 

Morale of the story: Using memory beyond the current stack pointer is not only bad practice, it is actually illegal when done in kernel mode.


Explicit Symbol Loading with dbghelp

When working with symbols, the default case is that you either analyze the current process, a concurrently running process or maybe even the kernel. dbghelp provides special support for these use cases and getting the right symbols to load is usually easy — having access to the process being analyzed, dbghelp can obtain the necessary module information by itself and will come up with the matching symbols.

Things are not quite as easy when analyzing symbols for a process (or kernel) that is not running any more or executes on a different machine. In this case, manual intervention is necessary — without your help, dbghelp will not be able to decide which PDB to load.

An example for a case where this applies is writing a binary tracefile. To keep the trace file small and to make writing entries as efficient as possible, it makes sense to write raw instruction pointer and function pointer addresses to the file. This way, resolving symbols and finding the names of the affected routines is delayed until the file is actually read.

However, when the trace file is read, the traced process (or kernel) may not be alive any more, so dbghelp’s automatisms cannot be used. Moreover, the version of the OS and the individual modules involved may not be the same on the machine where the trace is obtained and the machine where the trace file is read. It is thus crucial that the trace file contains all neccessary information required for the reading application to find and load the matching symbols.

Capturing the debug information

That sounds easy enough. But once again, the documentation on this topic is flaky enough to make this a non-trivial task. This thread on the OSR WinDBG list revealed some important insights, yet the summary is not entirely correct. So I will try to explain it here. Note that the discussion equally applies to user and kernel mode.

Two basic pieces of information need to be captured and stored in the trace file: The load address and the debug data of each module involved. The load address is required in order to later be able to convert from virtual addresses (VAs) to relative virtual addresses (RVAs). The debug data is neccessary for finding the PDB exactly matching the respective module.

The debug data is stored in the PE image and can be obtained via the debug directory. The following diagram illustrates the basic layout of the affected parts of a PE image:

Via the _IMAGE_DATA_DIRECTORY structure, which is at index IMAGE_DIRECTORY_ENTRY_DEBUG of the DataDirectory, we can obtain the RVA of the debug directory — as always, RVAs are relative to the load address. The debug directory is a sequence of IMAGE_DEBUG_DIRECTORY structures. Although the CodeView entry should be the one of primary interest, expect more than one entry in this sequence. The entry count can be calculated by dividing _IMAGE_DATA_DIRECTORY::Size by sizeof( IMAGE_DEBUG_DIRECTORY ).

The IMAGE_DEBUG_DIRECTORY structures do not contain all necessary information by themselves — rather, they refer to a blob of debug data located elsewhere in the image. IMAGE_DEBUG_DIRECTORY::AddressOfRawData contains the RVA (again, relative to the load address) to this blob. Note that the blobs are not necessarily laid out in the same order as the corresponding IMAGE_DEBUG_DIRECTORY structures are. (The role of the PointerToRawData member in this context remains unclear.)

Summing up, we need to write to the trace file:

  • The module load address
  • all debug data blobs referred to

In order to maintain the relation between the IMAGE_DEBUG_DIRECTORY entries and their corresponding blobs, the AddressOfRawData and PointerToRawData members have to be updated before being written to the file. In order to make loading symbols a bit easier, it makes sense to save the entire sequence of IMAGE_DEBUG_DIRECTORY structs (with updated RVAs) contigously to the file, immediately followed by the debug data blobs.

Loading symbols

Having all neccessary information in the trace file, we can now load the symbols. The key is to use SymLoadModuleEx rather than SymLoadModule64 and passing it a MODLOAD_DATA struct:

  • MODLOAD_DATA::data receives a pointer to the sequence of IMAGE_DEBUG_DIRECTORY structs. For each struct, the AddressOfRawData member should be set to 0 and PointerToRawData should receive the RVA of the corresponding debug data blob relative to the beginning of the respective IMAGE_DEBUG_DIRECTORY struct.

    As indicated before, it therefore makes sense to write the IMAGE_DEBUG_DIRECTORY structs and the blobs consecutievly to the file and — at the time of writing — adjust the AddressOfRawData and PointerToRawData members. This way, you can just assign the entire memory chunk to MODLOAD_DATA::data.

  • MODLOAD_DATA::size receives the size of the entire memory chunk, including both the sequence of IMAGE_DEBUG_DIRECTORY structs and all blobs.

There you go, dbghelp should now be able to identify the proper PDB and load the symbols.

cfix 1.0.1 adds support for Windows 2000

Despite the fact that mainstream support for Windows 2000 has ended in 2005 and the system is well on its way to retirement, Windows 2000 is still in wide use today. As such, it remains being an important target platform for many software packages.

The fact that cfix has not provided support for Windows 2000 was thus unfortunate — after all, if Windows 2000 is among the target platforms of your software, you should be able to run your tests on this platform.

cfix 1.0.1 remedies this shortcoming and adds Windows 2000 i386 SP4 as an supported platform.

The updated packages are available for download on Sourceforge.


About me

Johannes Passing lives in Berlin, Germany and works as a Solutions Architect at Google Cloud.

While mostly focusing on Cloud-related stuff these days, Johannes still enjoys the occasional dose of Win32, COM, and NT kernel mode development.

He also is the author of cfix, a C/C++ unit testing framework for Win32 and NT kernel mode, Visual Assert, a Visual Studio Unit Testing-AddIn, and NTrace, a dynamic function boundary tracing toolkit for Windows NT/x86 kernel/user mode code.

Contact Johannes: jpassing (at) hotmail com

LinkedIn Profile
Xing Profile
Github Profile