Archive for the 'Kernel' Category



Reaching beyond the top of the stack — illegal or just bad style?

The stack pointer, esp on i386, denotes the top of the stack. All memory below the stack pointer (i.e. higher addresses) is occupied by parameters, variables and return addresses; memory above the stack pointer must be assumed to contain garbage.

When programming in assembly, it is equally easy to use memory below and above the stack pointer. Reading from or writing to addresses beyond the top of the stack is unusual and under normal circumstances, there is little reason to do so. There are, however, situations — rare situations — where it may tempting to temporarily use memory beyond the top of the stack.

That said, the question is whether it is really just a convention and good style not to grab beyond the stack of the stack or whether there are actually reasons why doing so could lead to problems.

When trying to answer this question, one first has to make a distinction between user mode and kernel mode. In user mode Windows, I am unable to come up with a single reason of why usage of memory beyond the top of the stack could lead to problems. So in this case, it is probably merely bad style.

However, things are different in kernel mode.

In one particular routine I recently wrote, I encountered a situation where temporarily violating the rule of not reaching beyond the top of the stack came in handy. The routine worked fine for quite a while. In certain situations, however, it suddenly started to fail due to memory corruption. Interestingly enough, the routine did not fail always, but still rather frequently.

Having identified the specific routine as being the cuplrit, I started single stepping the code. Everything was fine until I reached the point where the memory above the stack pointer was used. The window span only a single instruction. Yet, as soon as I had stepped over the two instructions, the system crashed. I tried it multiple times, and it was prefectly reproduceable when being single-stepped.

So I took a look at the stack contents after every single step I took. To my surprise, as soon as I reached the critical window, the contents of the memory location just beyond the current stack pointer suddenly became messed. Very weird.

After having been scratching my head for a while, that suddenly started to made sense: I was not the only one using the stack — in between the two instructions, an interrupt must have occured and been dispatched. As my thread happened to be the one currently running, it was my stack that has been used for dispatching it. This also explains why it did not happened always unless I was single-stepping the respective code.

When an interrupt occurs and no privilege-level change has to be performed, the CPU will push the EFLAGS, CS and EIP registers on the stack. That is, the stack of whatever kernel thread happens to be the one currently running on this CPU is reused and the memory locations beyond the stack pointer will be overwritten by these three values. So what I initially interpreted as garbage, actually were the contents of EFLAGS, CS and EIP.

On Windows NT, unlike some other operating systems (FreeBSD, IIRC), handling the interrupt, which involves runing the interrupt service routine (ISR) occurs on the same stack as well. The following stack trace, taken elsewhere, shows an ISR being executed on the stack of the interrupted thread:

f6bdab4c f99bf153 i8042prt!I8xQueueCurrentMouseInput+0x67
f6bdab78 80884289 i8042prt!I8042MouseInterruptService+0xa58
f6bdab78 f6dd501a nt!KiInterruptDispatch+0x49
f6bdac44 f6dd435f driver!Quux+0x11a 
f6bdac58 f6dd61db driver!Foobar+0x6f 
...

Morale of the story: Using memory beyond the current stack pointer is not only bad practice, it is actually illegal when done in kernel mode.

Explicit Symbol Loading with dbghelp

When working with symbols, the default case is that you either analyze the current process, a concurrently running process or maybe even the kernel. dbghelp provides special support for these use cases and getting the right symbols to load is usually easy — having access to the process being analyzed, dbghelp can obtain the necessary module information by itself and will come up with the matching symbols.

Things are not quite as easy when analyzing symbols for a process (or kernel) that is not running any more or executes on a different machine. In this case, manual intervention is necessary — without your help, dbghelp will not be able to decide which PDB to load.

An example for a case where this applies is writing a binary tracefile. To keep the trace file small and to make writing entries as efficient as possible, it makes sense to write raw instruction pointer and function pointer addresses to the file. This way, resolving symbols and finding the names of the affected routines is delayed until the file is actually read.

However, when the trace file is read, the traced process (or kernel) may not be alive any more, so dbghelp’s automatisms cannot be used. Moreover, the version of the OS and the individual modules involved may not be the same on the machine where the trace is obtained and the machine where the trace file is read. It is thus crucial that the trace file contains all neccessary information required for the reading application to find and load the matching symbols.

Capturing the debug information

That sounds easy enough. But once again, the documentation on this topic is flaky enough to make this a non-trivial task. This thread on the OSR WinDBG list revealed some important insights, yet the summary is not entirely correct. So I will try to explain it here. Note that the discussion equally applies to user and kernel mode.

Two basic pieces of information need to be captured and stored in the trace file: The load address and the debug data of each module involved. The load address is required in order to later be able to convert from virtual addresses (VAs) to relative virtual addresses (RVAs). The debug data is neccessary for finding the PDB exactly matching the respective module.

The debug data is stored in the PE image and can be obtained via the debug directory. The following diagram illustrates the basic layout of the affected parts of a PE image:

Via the _IMAGE_DATA_DIRECTORY structure, which is at index IMAGE_DIRECTORY_ENTRY_DEBUG of the DataDirectory, we can obtain the RVA of the debug directory — as always, RVAs are relative to the load address. The debug directory is a sequence of IMAGE_DEBUG_DIRECTORY structures. Although the CodeView entry should be the one of primary interest, expect more than one entry in this sequence. The entry count can be calculated by dividing _IMAGE_DATA_DIRECTORY::Size by sizeof( IMAGE_DEBUG_DIRECTORY ).

The IMAGE_DEBUG_DIRECTORY structures do not contain all necessary information by themselves — rather, they refer to a blob of debug data located elsewhere in the image. IMAGE_DEBUG_DIRECTORY::AddressOfRawData contains the RVA (again, relative to the load address) to this blob. Note that the blobs are not necessarily laid out in the same order as the corresponding IMAGE_DEBUG_DIRECTORY structures are. (The role of the PointerToRawData member in this context remains unclear.)

Summing up, we need to write to the trace file:

  • The module load address
  • all IMAGE_DEBUG_DIRECTORY entries
  • all debug data blobs referred to

In order to maintain the relation between the IMAGE_DEBUG_DIRECTORY entries and their corresponding blobs, the AddressOfRawData and PointerToRawData members have to be updated before being written to the file. In order to make loading symbols a bit easier, it makes sense to save the entire sequence of IMAGE_DEBUG_DIRECTORY structs (with updated RVAs) contigously to the file, immediately followed by the debug data blobs.

Loading symbols

Having all neccessary information in the trace file, we can now load the symbols. The key is to use SymLoadModuleEx rather than SymLoadModule64 and passing it a MODLOAD_DATA struct:

  • MODLOAD_DATA::data receives a pointer to the sequence of IMAGE_DEBUG_DIRECTORY structs. For each struct, the AddressOfRawData member should be set to 0 and PointerToRawData should receive the RVA of the corresponding debug data blob relative to the beginning of the respective IMAGE_DEBUG_DIRECTORY struct.

    As indicated before, it therefore makes sense to write the IMAGE_DEBUG_DIRECTORY structs and the blobs consecutievly to the file and — at the time of writing — adjust the AddressOfRawData and PointerToRawData members. This way, you can just assign the entire memory chunk to MODLOAD_DATA::data.

  • MODLOAD_DATA::size receives the size of the entire memory chunk, including both the sequence of IMAGE_DEBUG_DIRECTORY structs and all blobs.

There you go, dbghelp should now be able to identify the proper PDB and load the symbols.

Fun with low level SEH

Most code that uses Structured Exception Handling does this with the help of the compiler, e.g. by using __try/__except/__finally. Still, it is possible to do everything by hand, i.e. to provide your own exception handlers and set up the exception registration records manually. However, as this entire topic is not documented very well, doing so opens room for all kind of surprises…

Although more than 10 years old, the best article on this topic still seems to be Matt Pirtrek’s A Crash Course on the Depths of Win32™ Structured Exception Handling, which I assume you have read. However, note that this article as well as this post refer to i386 only, albeit both to user and kernel mode.

Exception Registration Record Validation

On the i386, SEH uses a linked list of exception registration records. The first record is pointed to by the first member of the TIB. In user mode, the TIB is part of the TEB, in kernel mode it is part of the KPCR — in any case, it is at fs:[0]. Each record, besides containing a pointer to the next lower record, stores a pointer to an exception handler routine.

Installing an exception registration record is thus straightforward and merely requires adjusting the TIB pointer and having the new record point to next lower record. So I set up my custom exception registration record, registered it properly, verified that all pointers are correct and tried using it. However, I was unpleasently surprised that exeption handling totally failed as soon as my exception registration got involved. !exchain reported an “Invalid exception stack”, although checking the pointers manually again seemed to show that the chain of exception registration records was fine and my record seemed ok.

Digging a little deeper I found the reason for that — and in fact I cannot remember ever having heard or read about this requirement before: Windows requires all EXCEPTION_REGISTRATION_RECORDs to be located on the stack. Both RtlDispatchException and RtlUnwind check the location of each EXCEPTION_REGISTRATION_RECORD against the stack limits and abort exception handling as soon as a record is found to be not stack-located. Aborting exception handling in this case means that RaiseException/ExRaiseStatus will just return and execution will be resumed at the caller site as if nothing happened.

This requirement is fair enough, actually, but in my case it totally wrecked my design. I did not have the 8 spare bytes to store this record and thus therefore put the record on some dedicated place on the heap. Urgh. Anyway…

As an interesting side note, Windows Server 2003 performs this stack check against both limits — minimum and maximum address of the stack. Vista, however, only checks against the maximum address (i.e. bottom of the stack) and does not care whether the minimum address (i.e. top of stack) has been exceeded.

Moreover, there is another restriction on exception records that only applies to user mode: The handler routine pointed to by the exception record is verified to not point into the stack. This is obviously another security measure to avoid SEH records to point into some overflown buffers.

SafeSEH

It is worth pointing out that all these checks are unrelated to SafeSEH and are performed regardless of whether your module is SafeSEH compatible or not. Not before these checks have all passed, the exception handler has to undergo the SafeSEH validation: The image base is calculated, the table listing the trusted SEH handlers is looked up and it is checked whether the handler routine pointed to by the current exception record is located in this table.

SafeSEH Handler Registration

Using SafeSEH is a good thing and I link all my modules with /SafeSEH. So when you use low level SEH, i.e. without using the __try/__except compiler support, the obvious question is how to get your SEH handler to be recognized as a trusted handler and be included in the SafeSEH table. After all, the compiler will not be able to recognize that the routine you have just written will in fact be used as an exception handler. The C compiler does not seem to offer support for that — luckily however, ml does by providing the .SAFESEH directive.

If you like writing your exception handler in assmbler, this is all you need. If, however, you prefer C, this is somewhat unsatisfying. The documentation of .SAFESEH states that it can be used with an extrn proc, but that does not seem to work. My solution was thus to write the actual routine in C and write a little thunk in assembler, which I was then able to register using the .SAFESEH directive:

.586               
.model flat, stdcall
option casemap :none

extrn RealExceptionHandlerWrittenInC@16

...

ExceptionHandlerThunk proto
.SAFESEH ExceptionHandlerThunk

...

.code

ExceptionHandlerThunk proc
	jmp RealExceptionHandlerWrittenInC@16
ExceptionHandlerThunk endp

Stupid things you should not do

Finally, there is another little quirk that bit me: Do not use EXCEPTION_CONTINUE_SEARCH where ExceptionContinueSearch would have been appropriate. The EXCEPTION_* constants are for use by exception filters as used for __except statements, whereas the Exception* values have to be used for low level exception handlers. Should be obvious, right? :)

Having chosen the wrong group of constants, I returned EXCEPTION_CONTINUE_SEARCH from my exception handler to indicate that the handler is unable to handle certain exceptions. However, as it turns out, EXCEPTION_CONTINUE_SEARCH has the value 0 and is thus interpreted as ExceptionContinueExecution. Now, returning ExceptionContinueExecution when being requested to handle an exception raised by ExRaiseStatus is obviously a bad idea and in this case led to a STATUS_NONCONTINUABLE_EXCEPTION. After a few of those had stacked up (in kernel mode), VirtualPC crashed with an unrecoverable CPU error. Nice :)

Ksplice — safe enough?

Last week, Ksplice, an automatic system for rebootless Linux kernel security updates gained some attention. The idea of using hotpatching techniques for applying sucurity fixes to the kernel in order to save reboots is not quite new. Not only does Windows support hotpatching as of Windows Server 2003 SP1, there also have have been attempts to introduce a hot updating infrastructure to the Linux kernel before. Anyway, the paper is an instresting read.

The basic idea followed by Kspliace is to analyze the differences between an old (flawed) and a new (fixed) kernel binary. Based on this analysis, Ksplice decides which routines have changed and now need to be updated. Updating routines is performed by replacing the old routine, i.e. execution is redirected from the old to the new routine.

Such redirection requires code to be patched. Patching code is a nontrivial undertaking and always raises the question of safety — after all, uncareful kernel code patching could easily crash the entire system. The paper describes how this problem is dealt with, yet one of the paragraphs caught my attention (page 7):

A safe time to update a function is when no thread’s instruction pointer falls within that function’s text in memory and when no thread’s kernel stack contains a return address within that function’s text in memory.

Before inserting the trampolines, Ksplice captures all of the machine’s processors and checks whether the above safety condition is met for all of the functions being replaced. [...]

So in order to ensure safety, Ksplice perfroms a full stack walk for all threads. While this is a sound approach in theory, it usually turns out to be rather problematic in practice. In fact, the only other updating/dynamic instrumentation approach I am currently aware of that also performs stack walks is Paradyn — all other approaches (deliberately) have choosen other ways to perform safe runtime code modifications.

The reason why stack walking is problematic should be obvious — creating perfect stack traces either requires proper debugging information for all modules involved to be available or requires proper stack frames for all routines so that the ebp-chain can be traversed. In practice, debugging information is often not available for all modules. While this is probably less a problem on Linux than on Windows, it is still a problem that cannot be easily dismissed. Finally, optimizations such as Frame Pointer Omission can thwart attempts to perform a stack walk by following the ebp-chain.

The paper is not specific on how exactly these stack walks are performed and how it tries to overcome these problems, so I took a look at the sources. The stack walk is performed by the routine check_stack, which is shown in the following listing (Excerpt from primary.c, lines 283–307):

/* Modified version of Linux's print_context_stack */
int
check_stack(struct thread_info *tinfo, long *stack)
{
  int conflict, status = 0;
  long addr;

  while (valid_stack_ptr(tinfo, stack)) {
    addr = *stack++;
    if (__kernel_text_address(addr)) {
      conflict = check_address_for_conflict(addr);
      if (conflict)
        status = -1;
      if (debug >= 2) {
        printk("%08lx ", addr);
        if (conflict)
          printk("[= 2)
    printk("\n");

  return status;
}

The parameter stack contains the the frame pointer of the topmost stack frame. Starting from this address, the routine treats every doubleword on the stack as a potential stack frame and sees whether it might represent a return address that points to one of the critical functions. While possibly seeing too many stack frames and generating false positives with this approach, it is in fact a more pessimistic and thus in this context safer approach than walking the stack by following the ebp-chain.

AuxKlibGetImageExportDirectory and forwarders

One of the newer additions to the DDK is the aux_klib library, which, among others, offers the routine AuxKlibGetImageExportDirectory. As its name suggests, AuxKlibGetImageExportDirectory offers a handy way to obtain a pointer to the export directory of a kernel module.

There is, however, one issue that — at least in my opinion — renders AuxKlibGetImageExportDirectory pretty much useless in most scenarios: Dealing with forwaders.

The primary motivation to call AuxKlibGetImageExportDirectory is to either enumerate the exports of a module or to find a specific export. In both cases, the code is likely to call at least one of the exported routines. To maintain binary compatibility, it would be risky for such code to rely on the fact that all exports that it aims to call are in fact ‘real’ exports and not forwarders. Rather, it is crucial to be prepared to find both types — exports and forwarders — in the export directory and handle each of them appropropriately.

So we need to tell an export from a forwarder. As it turns out, this is not quite as easy as checking some flag. Quoting the Microsoft Portable Executable and Common Object File Format Specification on the content of the export address table:

Each entry in the export address table is a field that uses one of two formats in the following table. If the address specified is not within the export section (as defined by the address and length that are indicated in the optional header), the field is an export RVA, which is an actual address in code or data. Otherwise, the field is a forwarder RVA, which names a symbol in another DLL.

And this exactly is the problem — only being provided the PIMAGE_EXPORT_DIRECTORY pointer, we do not know the start and end RVA of the export section. As a consequence, identifying forwarders is infeasible when using AuxKlibGetImageExportDirectory — which in turn makes it a pretty much useless function.

Workaround

Although AuxKlibGetImageExportDirectory is handy, the work it performs is rather trivial. Therefore, it is not hard to come up with code that, given the Load Address of a module, finds the export directory and properly checks for the existance of forwarders. The following code shows how:

PIMAGE_DATA_DIRECTORY ExportDataDir;
PIMAGE_EXPORT_DIRECTORY ExportDirectory;
PIMAGE_DOS_HEADER DosHeader = ( PIMAGE_DOS_HEADER ) LoadAddress;
PIMAGE_NT_HEADERS NtHeader; 

PULONG FunctionRvaArray;
PUSHORT OrdinalsArray;

ULONG Index;

//
// Peek into PE image to obtain exports.
//
NtHeader = ( PIMAGE_NT_HEADERS ) 
  PtrFromRva( DosHeader, DosHeader->e_lfanew );
if( IMAGE_NT_SIGNATURE != NtHeader->Signature )
{
  //
  // Unrecognized image format.
  //
  return ...;
}

ExportDataDir = &NtHeader->OptionalHeader.DataDirectory
    [ IMAGE_DIRECTORY_ENTRY_EXPORT ];

ExportDirectory = ( PIMAGE_EXPORT_DIRECTORY ) PtrFromRva( 
  LoadAddress, 
  ExportDataDir->VirtualAddress );
  

if ( ExportDirectory->AddressOfNames == 0 ||
   ExportDirectory->AddressOfFunctions == 0 ||
   ExportDirectory->AddressOfNameOrdinals == 0 )
{
  //
  // This module does not have any exports.
  //
  return ...;
}

FunctionRvaArray = ( PULONG ) PtrFromRva(
  LoadAddress,
  ExportDirectory->AddressOfFunctions );

OrdinalsArray = ( PUSHORT ) PtrFromRva(
  LoadAddress,
  ExportDirectory->AddressOfNameOrdinals );

for ( Index = 0; Index < 
      ExportDirectory->NumberOfNames; Index++ )
{
  //
  // Get corresponding export ordinal.
  //
  USHORT Ordinal = ( USHORT ) OrdinalsArray[ Index ] 
    + ( USHORT ) ExportDirectory->Base;

  //
  // Get corresponding function RVA.
  //
  ULONG FuncRva = 
    FunctionRvaArray[ Ordinal - ExportDirectory->Base ];

  if ( FuncRva >= ExportDataDir->VirtualAddress && 
     FuncRva < ExportDataDir->VirtualAddress 
       + ExportDataDir->Size )
  {
    //
    // It is a forwarder.
    //
  }
  else 
  {
    //
    // It is an export.
    //
  }
}

/kernel and 8.3 file names

Although the Windows file systems have supported filenames with more than 8 chanracters for years, it has still remained good practice (at least for native development) to name modules in 8.3 format. While modules not adhering to this practice normally work well, there is at least one situation where giving a module a long file name does make a real difference: the file name of the kernel.

The default kernel file name is ntoskrnl.exe. Using the /kernel boot parameter, this default can be overridden and an arbitrary file can be specified to be loaded as kernel — this is especially useful when you routinely switch between different kernels, such as the free and checked build.

If, however, you try to give your alternate kernel a non-8.3 name, you will be saluted with the following screen on next boot:

boot screen

Quite obviously, ntldr is not capable of loading a kernel from a non-8.3 file :)

« Previous Page


Categories

Try Visual Assert, the unit testing add-in for Visual Studio (R)


NTrace: Function Boundary Tracing for Windows on IA-32

About me

Johannes Passing, M.Sc., living in Berlin, Germany.

Besides his consulting work, Johannes mainly focusses on Win32, COM, and NT kernel mode development, along with Java and .Net. He also is the author of cfix, a C/C++ unit testing framework for Win32 and NT kernel mode, Visual Assert, a Visual Studio Unit Testing-AddIn, and NTrace, a dynamic function boundary tracing toolkit for Windows NT/x86 kernel/user mode code.

Contact Johannes: jpassing (at) acm org

Johannes' GPG fingerprint is BBB1 1769 B82D CD07 D90A 57E8 9FE1 D441 F7A0 1BB1.

LinkedIn LinkedIn Profile
Xing Xing Profile
Twitter Follow me on Twitter (new)

Follow

Get every new post delivered to your Inbox.