Archive for May, 2008

Fun with low level SEH

Most code that uses Structured Exception Handling does so with the help of the compiler, e.g. by using __try/__except/__finally. Still, it is possible to do everything by hand, i.e. to provide your own exception handlers and set up the exception registration records manually. However, as this entire topic is not documented very well, doing so leaves room for all kinds of surprises…

Although more than 10 years old, the best article on this topic still seems to be Matt Pietrek’s A Crash Course on the Depths of Win32™ Structured Exception Handling, which I assume you have read. Note, however, that both that article and this post refer to i386 only, though to both user and kernel mode.

Exception Registration Record Validation

On the i386, SEH uses a linked list of exception registration records. The first record is pointed to by the first member of the TIB. In user mode, the TIB is part of the TEB, in kernel mode it is part of the KPCR — in any case, it is at fs:[0]. Each record, besides containing a pointer to the next lower record, stores a pointer to an exception handler routine.
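To make the list structure concrete, here is a minimal portable sketch of walking the chain. It uses a simplified stand-in for the i386 EXCEPTION_REGISTRATION_RECORD (a next pointer followed by a handler pointer) instead of the real Windows headers, and it assumes the chain is terminated by the value 0xFFFFFFFF, as on Windows/i386. The names reg_record, END_OF_CHAIN and chain_length are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the i386 EXCEPTION_REGISTRATION_RECORD:
   a pointer to the next lower (older) record, then the handler pointer. */
typedef struct reg_record {
    struct reg_record *next;
    void *handler;
} reg_record;

/* Assumed end-of-chain marker (0xFFFFFFFF on Windows/i386). */
#define END_OF_CHAIN ((reg_record *)(uintptr_t)-1)

/* Counts the records reachable from the head, i.e. from what fs:[0] would point to. */
size_t chain_length(const reg_record *head)
{
    size_t n = 0;
    for (const reg_record *r = head; r != END_OF_CHAIN; r = r->next)
        n++;
    return n;
}
```

In real code, the head would be read from fs:[0] rather than passed in.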

Installing an exception registration record is thus straightforward and merely requires adjusting the TIB pointer and having the new record point to the next lower record. So I set up my custom exception registration record, registered it properly, verified that all pointers were correct and tried using it. However, I was unpleasantly surprised that exception handling failed completely as soon as my exception registration record got involved. !exchain reported an “Invalid exception stack”, although checking the pointers manually again seemed to show that the chain of exception registration records was fine and that my record was ok.
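Installing and removing a record really is just a couple of pointer assignments. The following portable sketch models them against a fake TIB structure instead of fs:[0]; fake_tib, push_record and pop_record are hypothetical names, and real code would read and write fs:[0] (e.g. via inline assembly) instead of a struct field. Note that nothing in this code constrains where rec itself lives.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct reg_record {
    struct reg_record *next;   /* next lower (older) record */
    void *handler;             /* handler routine */
} reg_record;

#define END_OF_CHAIN ((reg_record *)(uintptr_t)-1)

/* Stand-in for the TIB's first member (ExceptionList), located at fs:[0] on i386. */
typedef struct { reg_record *exception_list; } fake_tib;

/* Install: link the new record to the current head, then make it the new head. */
void push_record(fake_tib *tib, reg_record *rec, void *handler)
{
    rec->handler = handler;
    rec->next = tib->exception_list;
    tib->exception_list = rec;
}

/* Remove: restore the previous head (records are removed in LIFO order). */
void pop_record(fake_tib *tib)
{
    tib->exception_list = tib->exception_list->next;
}
```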

Digging a little deeper, I found the reason for the failure, and in fact I cannot remember ever having heard or read about this requirement before: Windows requires all EXCEPTION_REGISTRATION_RECORDs to be located on the stack. Both RtlDispatchException and RtlUnwind check the location of each EXCEPTION_REGISTRATION_RECORD against the stack limits and abort exception handling as soon as a record is found not to be located on the stack. Aborting exception handling in this case means that RaiseException/ExRaiseStatus simply returns and execution resumes at the call site as if nothing had happened.

This requirement is fair enough, actually, but in my case it totally wrecked my design. I did not have 8 spare bytes of stack to store this record and had therefore put the record in a dedicated place on the heap. Urgh. Anyway…
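The stack-location check can be modelled in portable C as below. This is a simplified sketch of the behaviour described above, not the actual RtlDispatchException code: the types, the helper handlers and dispatch are hypothetical, and the stack limits, which Windows takes from the TIB’s StackLimit and StackBase fields, are passed in explicitly.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, portable stand-ins for the Windows types (not the real headers). */
typedef enum { ExceptionContinueExecution = 0, ExceptionContinueSearch = 1 } disposition;

typedef struct reg_record reg_record;
typedef disposition (*handler_fn)(reg_record *establisher_frame);

struct reg_record {
    reg_record *next;
    handler_fn handler;
};

#define END_OF_CHAIN ((reg_record *)(uintptr_t)-1)

/* Two trivial handlers for experimenting with the loop below. */
disposition accept_all(reg_record *f)  { (void)f; return ExceptionContinueExecution; }
disposition decline_all(reg_record *f) { (void)f; return ExceptionContinueSearch; }

/* Returns 1 if a handler dealt with the exception, 0 if dispatching stopped.
   A record outside [stack_limit, stack_base) aborts the whole dispatch silently,
   which is why RaiseException appears to simply return. */
int dispatch(reg_record *head, uintptr_t stack_limit, uintptr_t stack_base)
{
    for (reg_record *r = head; r != END_OF_CHAIN; r = r->next) {
        uintptr_t lo = (uintptr_t)r;
        uintptr_t hi = lo + sizeof *r;
        if (lo < stack_limit || hi > stack_base)
            return 0;   /* record is not stack-located: give up */
        if (r->handler(r) == ExceptionContinueExecution)
            return 1;
    }
    return 0;           /* end of chain reached without a taker */
}
```

A record on the heap, like mine, hits the first return 0 in this model.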

As an interesting side note, Windows Server 2003 performs this stack check against both limits — minimum and maximum address of the stack. Vista, however, only checks against the maximum address (i.e. bottom of the stack) and does not care whether the minimum address (i.e. top of stack) has been exceeded.
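The difference between the two versions boils down to which bounds are compared. A sketch with hypothetical function names, where limit is the lowest stack address (StackLimit) and base the highest (StackBase), and addresses are passed as plain integers:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Server 2003 behaviour as described: the record must lie entirely within [limit, base). */
bool record_ok_2003(uintptr_t rec, size_t size, uintptr_t limit, uintptr_t base)
{
    return rec >= limit && rec + size <= base;
}

/* Vista behaviour as described: only the stack base (highest address) is enforced. */
bool record_ok_vista(uintptr_t rec, size_t size, uintptr_t limit, uintptr_t base)
{
    (void)limit;   /* deliberately not checked */
    return rec + size <= base;
}
```

A record below the stack limit is thus rejected by the first variant but accepted by the second.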

Moreover, there is another restriction on exception registration records that only applies to user mode: the handler routine pointed to by the record is verified not to point into the stack. This is obviously another security measure, preventing SEH records from pointing into overflowed buffers.
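The handler check is the mirror image of the record check. A minimal sketch, assuming the same explicit stack limits as before; handler_location_ok is a hypothetical name and the user_mode flag simply models the observation that kernel mode skips this test:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the handler address passes the user-mode "not on the stack" rule. */
bool handler_location_ok(uintptr_t handler, uintptr_t stack_limit,
                         uintptr_t stack_base, bool user_mode)
{
    if (!user_mode)
        return true;    /* per the observation above, kernel mode skips this check */
    return handler < stack_limit || handler >= stack_base;
}
```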

SafeSEH

It is worth pointing out that all these checks are unrelated to SafeSEH and are performed regardless of whether your module is SafeSEH-compatible or not. Only after all of these checks have passed does the exception handler undergo SafeSEH validation: the image base is calculated, the table listing the trusted SEH handlers is looked up, and it is checked whether the handler routine pointed to by the current exception registration record is listed in this table.
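The table lookup itself can be sketched as follows. The SafeSEH table stores handler addresses as RVAs relative to the image base; the sketch assumes the table is sorted in ascending order (which permits a binary search) and takes the image base as an input instead of computing it. is_registered_handler is a hypothetical name, not the ntdll routine.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the handler's RVA appears in the (assumed sorted) SafeSEH table. */
bool is_registered_handler(uintptr_t handler, uintptr_t image_base,
                           const uint32_t *table, size_t count)
{
    if (handler < image_base)
        return false;
    uintptr_t rva_wide = handler - image_base;
    if (rva_wide > UINT32_MAX)
        return false;                  /* cannot be an RVA of this image */
    uint32_t rva = (uint32_t)rva_wide;

    size_t lo = 0, hi = count;         /* binary search over [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid] == rva)
            return true;
        if (table[mid] < rva)
            lo = mid + 1;
        else
            hi = mid;
    }
    return false;
}
```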

SafeSEH Handler Registration

Using SafeSEH is a good thing and I link all my modules with /SafeSEH. So when you use low level SEH, i.e. without using the __try/__except compiler support, the obvious question is how to get your SEH handler to be recognized as a trusted handler and be included in the SafeSEH table. After all, the compiler will not be able to recognize that the routine you have just written will in fact be used as an exception handler. The C compiler does not seem to offer support for that — luckily however, ml does by providing the .SAFESEH directive.

If you like writing your exception handler in assembler, this is all you need. If, however, you prefer C, this is somewhat unsatisfying. The documentation of .SAFESEH states that it can be used with an extrn proc, but that does not seem to work. My solution was thus to write the actual routine in C and a little thunk in assembler, which I was then able to register using the .SAFESEH directive:

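.586
.model flat, stdcall
option casemap :none

; The real handler, written in C (stdcall, 4 parameters = 16 bytes).
extrn RealExceptionHandlerWrittenInC@16:proc

...

; Declare the thunk and register it in the SafeSEH table
; (the object must be assembled with ml /safeseh).
ExceptionHandlerThunk proto
.SAFESEH ExceptionHandlerThunk

...

.code

; Tail-jump to the C routine; the stack is left untouched,
; so the C routine receives the original handler arguments.
ExceptionHandlerThunk proc
	jmp RealExceptionHandlerWrittenInC@16
ExceptionHandlerThunk endp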

Stupid things you should not do

Finally, there is another little quirk that bit me: Do not use EXCEPTION_CONTINUE_SEARCH where ExceptionContinueSearch would have been appropriate. The EXCEPTION_* constants are for use by exception filters as used for __except statements, whereas the Exception* values have to be used for low level exception handlers. Should be obvious, right? :)
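For reference, these are the values of both groups as defined in the Windows SDK header excpt.h, reproduced here so the sketch compiles anywhere. The handler signature is deliberately reduced to the exception code; my_handler and MY_EXCEPTION_CODE are hypothetical.

```c
/* Filter results, for __except(filter) expressions (excpt.h). */
#define EXCEPTION_EXECUTE_HANDLER       1
#define EXCEPTION_CONTINUE_SEARCH       0
#define EXCEPTION_CONTINUE_EXECUTION  (-1)

/* Return values of a low level exception handler (excpt.h). */
typedef enum {
    ExceptionContinueExecution = 0,
    ExceptionContinueSearch    = 1,
    ExceptionNestedException   = 2,
    ExceptionCollidedUnwind    = 3
} EXCEPTION_DISPOSITION;

/* Hypothetical exception code this handler is responsible for. */
#define MY_EXCEPTION_CODE 0xE0000001u

/* Simplified low level handler: it must only return EXCEPTION_DISPOSITION values. */
EXCEPTION_DISPOSITION my_handler(unsigned long code)
{
    if (code != MY_EXCEPTION_CODE)
        return ExceptionContinueSearch;   /* NOT EXCEPTION_CONTINUE_SEARCH, which is 0 */
    /* ... repair state here, then resume ... */
    return ExceptionContinueExecution;
}
```

Note that EXCEPTION_CONTINUE_SEARCH and ExceptionContinueExecution share the value 0, which is exactly what makes the mix-up so dangerous.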

Having chosen the wrong group of constants, I returned EXCEPTION_CONTINUE_SEARCH from my exception handler to indicate that the handler is unable to handle certain exceptions. However, as it turns out, EXCEPTION_CONTINUE_SEARCH has the value 0 and is thus interpreted as ExceptionContinueExecution. Now, returning ExceptionContinueExecution when being requested to handle an exception raised by ExRaiseStatus is obviously a bad idea and in this case led to a STATUS_NONCONTINUABLE_EXCEPTION. After a few of those had stacked up (in kernel mode), VirtualPC crashed with an unrecoverable CPU error. Nice :)
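The following model makes this failure mode explicit. It assumes, as described above, that exceptions raised via ExRaiseStatus carry the EXCEPTION_NONCONTINUABLE flag; interpret and the outcome values are hypothetical, and nested-exception and collided-unwind dispositions are ignored.

```c
#include <stdint.h>

#define EXCEPTION_CONTINUE_SEARCH  0      /* filter constant, returned by mistake */
#define EXCEPTION_NONCONTINUABLE   0x1u   /* ExceptionFlags bit */

typedef enum { ExceptionContinueExecution = 0, ExceptionContinueSearch = 1 } disposition;

/* RAISE_NONCONTINUABLE: the dispatcher raises STATUS_NONCONTINUABLE_EXCEPTION (0xC0000025). */
typedef enum { RESUME, SEARCH_NEXT, RAISE_NONCONTINUABLE } outcome;

/* Simplified model of how a dispatcher reacts to a handler's return value. */
outcome interpret(int handler_result, uint32_t exception_flags)
{
    if (handler_result == ExceptionContinueExecution)
        return (exception_flags & EXCEPTION_NONCONTINUABLE) ? RAISE_NONCONTINUABLE : RESUME;
    return SEARCH_NEXT;   /* ExceptionContinueSearch */
}
```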


The case of the mysterious JVM x64 crashes

Until now, I had done all my Java development using 32-bit JVMs. After all, as long as you do not have extreme memory requirements, the 64-bit JVM should not buy you much. Today I installed the Java 6 Update 6 x64 JDK on my Vista x64 machine and tried to run some of my JUnit tests on this VM.

With little success:

#
# An unexpected error has been detected by Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00000000772b219b, pid=4188, tid=2272
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode windows-amd64)
# Problematic frame:
# C  [ntdll.dll+0x5219b]
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

The project uses a native library via JNI, so of course I immediately suspected this to be the problem. So I placed a Java breakpoint on the respective System.loadLibrary call with the intent of attaching WinDBG as soon as this breakpoint is hit. In WinDBG, I could then break on the exception and see what the problem is.

But to my surprise, the Java breakpoint was not hit: the VM crashed immediately and I received the same output about an unexpected error having occurred. That seemed strange to me. Maybe it was not the fault of the JNI library after all? So I created a simple Hello World application and ran it, which worked. Then I created this innocent JUnit test:

public class JTest
{
  @org.junit.Test
  public void testname() throws Exception
  {    
  }
}

This one failed again, yielding the same error message as above. Well, at least that gave me evidence that the JVM, not my JNI library, was the culprit, but the situation still seemed weird. Running the test again under WinDBG, I could see that the AV occurred during a heap free operation (as a side note, it is annoying that Sun does not supply symbols for its binaries):

00000000`0404ca88 00000000`7727e7e2 ntdll!RtlCaptureContext+0x8c
00000000`0404ca98 00000000`7727e72b ntdll!RtlpWalkFrameChain+0x52
00000000`0404d018 00000000`773352f2 ntdll!RtlCaptureStackBackTrace+0x4b
00000000`0404d048 00000000`772e1d35 ntdll!RtlpStackTraceDatabaseLogPrefix+0x42
00000000`0404d178 00000000`7715d9fa ntdll! ?? ::FNODOBFM::`string'+0xa93f
00000000`0404d1f8 000007fe`fef0175c kernel32!HeapFree+0xa
*** WARNING: Unable to verify checksum for C:\Program Files\Java\jdk1.6.0_06\jre\bin\server\jvm.dll
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\Program Files\Java\jdk1.6.0_06\jre\bin\server\jvm.dll - 
00000000`0404d228 00000000`08101c09 msvcrt!free+0x1c
00000000`0404d258 00000000`081026cc jvm!JVM_EnqueueOperation+0x8c139
00000000`0404d288 00000000`040b4937 jvm!JVM_EnqueueOperation+0x8cbfc
00000000`0404d318 00000000`0404d338 0x40b4937

Well, this could be a heap corruption, but it is interesting that the crash did not occur during block coalescing or similar operations but during stack trace capturing. As a matter of fact, I always run my machine with user mode stack trace database creation enabled for debugging purposes. So I disabled the stack trace database in gflags, rebooted the machine and, voilà, the crash disappeared!

Wow. I think this is worth being filed as a bug.

Trace and Watch Data — How does it work

One of the built-in WinDBG commands is wt (Trace and Watch Data), which can be used to trace the execution flow of a function. Given source code like the following:

void foo()
{
}

void bar()
{
}

int main()
{
  // Some random code...
  int a = 1, b = 2;
  
  // Call a child function.
  foo();
  
  // More useless code...
  a+=b;
  if ( a == b) a = b;
  
  // Call another child function.
  bar();  
  
  return 0;
}

wt will produce the following output:

0:000> wt
Tracing test!main to return address 00401291
    6     0 [  0] test!main
    1     0 [  1]   test!ILT+5(_foo)
    4     0 [  1]   test!foo
   13     5 [  0] test!main
    1     0 [  1]   test!ILT+0(_bar)
    4     0 [  1]   test!bar
   17    10 [  0] test!main

27 instructions were executed in 26 events 
                                  (0 from other threads)

Function Name         Invocations MinInst MaxInst AvgInst
test!ILT+0(_bar)                1       1       1       1
test!ILT+5(_foo)                1       1       1       1
test!bar                        1       4       4       4
test!foo                        1       4       4       4
test!main                       1      17      17      17

0 system calls were executed

Although helpful, tracing a larger function calling a multitude of other functions slows down the debuggee significantly. An interesting question is thus how wt is implemented. Three possible implementation strategies come to mind:

  1. Use single-stepping. After each instruction executed, a debug trap is raised and the debugger is delivered a single-step debugging event. Though all non-branching instructions are probably irrelevant to wt, by intercepting each call and ret instruction, the debugger is able to trace function entry and exit.
  2. Explicitly set breakpoints. The debugger disassembles the function to be traced and places an ordinary breakpoint on each call instruction as well as on the return address of the function. Whenever one of the call breakpoints fires, the debugger instruments the target function in the same way (i.e. places breakpoints on each call instruction as well as on its return address) and continues execution (without single-stepping). By intercepting all function calls and returns, the debugger is able to deduce the call tree. This approach would be similar to UMSS.
  3. Use Last Branch Recording. This is a rather new addition to the IA-32 architecture that allows setting breakpoints on taken branches, interrupts, and exceptions, and single-stepping from one branch to the next.
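To get a feeling for the cost difference between strategies #1 and #2, the sketch below counts the debug events each would generate for a given instruction trace. It is a back-of-the-envelope model under simplifying assumptions: every executed instruction is classified as call, ret or other (insn_kind is a made-up type), strategy #1 takes one trap-flag event per instruction, and strategy #2 takes one breakpoint event per executed call plus one per return.

```c
#include <stddef.h>

/* Hypothetical classification of each executed instruction. */
typedef enum { INSN_OTHER, INSN_CALL, INSN_RET } insn_kind;

/* Strategy #1: the trap flag causes one single-step event per executed instruction. */
size_t events_single_step(const insn_kind *trace, size_t n)
{
    (void)trace;
    return n;
}

/* Strategy #2: only breakpoints on calls and on return addresses fire. */
size_t events_breakpoints(const insn_kind *trace, size_t n)
{
    size_t events = 0;
    for (size_t i = 0; i < n; i++)
        if (trace[i] == INSN_CALL || trace[i] == INSN_RET)
            events++;
    return events;
}

/* Maximum call depth, comparable to wt's bracketed depth column. */
int max_call_depth(const insn_kind *trace, size_t n)
{
    int depth = 0, max = 0;
    for (size_t i = 0; i < n; i++) {
        if (trace[i] == INSN_CALL && ++depth > max)
            max = depth;
        else if (trace[i] == INSN_RET)
            depth--;
    }
    return max;
}
```

For the example program, two calls (foo, bar) and three returns (foo, bar, main) would yield five breakpoint events under strategy #2, versus one event per executed instruction under strategy #1.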

In order to find out, we have to debug the debugger to observe how it debugs the target. We thus start WinDBG, choose our test application as target and let it break on main. We then start another WinDBG instance and attach it to the first WinDBG instance. In order to find out which debugging events are consumed by the first instance, we use the second debugger to trace function calls made by the first debugger.

All user mode debuggers eventually end up calling ntdll!NtWaitForDebugEvent in a loop, so to find out which debugging events are consumed, all we need to do is trace all calls to this function. Although it is an undocumented native function, there is an excellent summary on the inner workings of user mode debugging which also covers ntdll!NtWaitForDebugEvent. Given this information, checking whether strategy #1 or strategy #2 has been implemented (I assume #3 may safely be neglected) only requires a little breakpoint command like the following (line breaks added for clarity):

bp ntdll!NtWaitForDebugEvent "
   r @$t1=poi(esp+10); 
   g @$ra; 
   .if (poi(@$t1)==8) {.echo \"SingleStep\n\" } 
   .else {.printf \"Excp %p\\n\", poi(@$t1+c)};
   g "

When entering ntdll!NtWaitForDebugEvent, we store the fourth parameter (a pointer to the DBGUI_WAIT_STATE_CHANGE structure that receives the event) in $t1 and step out of the function. Then we reach into the structure whose address is stored in $t1, check whether the first field marks the event as being of type DbgSingleStepStateChange (0x8), and output an appropriate message. If we receive about 30 single-step events, strategy #1 has probably been chosen. For #2 we would expect to receive 5 breakpoint events.
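The offsets used by the breakpoint command can be cross-checked with a simplified, portable reconstruction of the start of the structure on i386: a 4-byte state field followed by an 8-byte CLIENT_ID, after which the per-event StateInfo union begins (for exception events, with the exception code). This is not the real ntdll definition; the struct and function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* DBG_STATE values; the breakpoint command above tests for 8. */
typedef enum {
    DbgIdle,                       /* 0 */
    DbgReplyPending,               /* 1 */
    DbgCreateThreadStateChange,    /* 2 */
    DbgCreateProcessStateChange,   /* 3 */
    DbgExitThreadStateChange,      /* 4 */
    DbgExitProcessStateChange,     /* 5 */
    DbgExceptionStateChange,       /* 6 */
    DbgBreakpointStateChange,      /* 7 */
    DbgSingleStepStateChange,      /* 8 */
    DbgLoadDllStateChange,         /* 9 */
    DbgUnloadDllStateChange        /* 10 */
} DBG_STATE;

/* Simplified i386 view of the start of DBGUI_WAIT_STATE_CHANGE. */
typedef struct {
    uint32_t new_state;        /* DBG_STATE, what poi(@$t1) reads */
    uint32_t unique_process;   /* CLIENT_ID.UniqueProcess */
    uint32_t unique_thread;    /* CLIENT_ID.UniqueThread */
    uint32_t exception_code;   /* start of StateInfo: ExceptionCode for exception events */
} wait_state_head;

/* Matches the offset read by poi(@$t1+c) in the breakpoint command. */
_Static_assert(offsetof(wait_state_head, exception_code) == 0xc, "unexpected layout");

int is_single_step(const wait_state_head *w)
{
    return w->new_state == DbgSingleStepStateChange;
}
```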

Back in the first debugger, we now trace the main function by running wt. This yields the output shown above. Switching to the second debugger again, we now see the following output:

SingleStep
SingleStep
SingleStep

[...about 20 more...]

SingleStep
SingleStep
SingleStep
SingleStep
SingleStep

Quite obviously, wt implements strategy #1 — it does single stepping. Although this does not really come as a surprise, it is still unfortunate as it is most likely the slowest approach of tracing calls. And as anybody who has ever used wt can probably confirm, wt is really slow.

As an interesting side note, as of Linux kernel 2.6.25, ptrace on x86 has been enhanced to facilitate Last Branch Recording on CPUs that support it.

Ksplice — safe enough?

Last week, Ksplice, an automatic system for rebootless Linux kernel security updates, gained some attention. The idea of using hotpatching techniques to apply security fixes to the kernel in order to save reboots is not new. Not only has Windows supported hotpatching since Windows Server 2003 SP1, there have also been earlier attempts to introduce a hot-updating infrastructure into the Linux kernel. Anyway, the paper is an interesting read.

The basic idea behind Ksplice is to analyze the differences between an old (flawed) and a new (fixed) kernel binary. Based on this analysis, Ksplice decides which routines have changed and need to be updated. A routine is updated by replacing it, i.e. execution is redirected from the old to the new routine.
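Redirecting execution commonly means overwriting the start of the old routine with a jump to the new one. The exact trampoline format Ksplice uses is not covered here; the following is a generic i386 sketch of encoding a 5-byte relative jmp (opcode E9), with a hypothetical helper name. On x86-64, the same encoding only reaches targets within ±2 GB.

```c
#include <stdint.h>

/* Encodes "jmp rel32" at address src, transferring control to dst.
   The displacement is relative to the end of the 5-byte instruction. */
void encode_jmp_rel32(uint8_t out[5], uint32_t src, uint32_t dst)
{
    uint32_t disp = dst - (src + 5u);   /* wraps modulo 2^32, like i386 address arithmetic */
    out[0] = 0xE9;
    out[1] = (uint8_t)(disp);           /* little-endian displacement */
    out[2] = (uint8_t)(disp >> 8);
    out[3] = (uint8_t)(disp >> 16);
    out[4] = (uint8_t)(disp >> 24);
}
```

Writing these five bytes over live code is precisely the step that raises the safety question discussed next.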

Such redirection requires code to be patched. Patching code is a nontrivial undertaking and always raises the question of safety; after all, careless kernel code patching could easily crash the entire system. The paper describes how this problem is dealt with, yet one of its paragraphs caught my attention (page 7):

A safe time to update a function is when no thread’s instruction pointer falls within that function’s text in memory and when no thread’s kernel stack contains a return address within that function’s text in memory.

Before inserting the trampolines, Ksplice captures all of the machine’s processors and checks whether the above safety condition is met for all of the functions being replaced. […]
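Assuming the instruction pointers and return addresses of all threads have already been collected, the quoted safety condition itself is a plain range test. A sketch with hypothetical types and names; obtaining the return addresses is the hard part.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Text range [start, end) of a function that is about to be replaced. */
typedef struct { uintptr_t start, end; } text_range;

/* Hypothetical snapshot of one thread: its instruction pointer and the
   return addresses found on its kernel stack. */
typedef struct {
    uintptr_t ip;
    const uintptr_t *ret_addrs;
    size_t nret;
} thread_snapshot;

static bool in_any(uintptr_t a, const text_range *fns, size_t nfns)
{
    for (size_t i = 0; i < nfns; i++)
        if (a >= fns[i].start && a < fns[i].end)
            return true;
    return false;
}

/* The quoted condition: no IP and no return address may fall into a replaced function. */
bool safe_to_update(const thread_snapshot *threads, size_t nthreads,
                    const text_range *fns, size_t nfns)
{
    for (size_t t = 0; t < nthreads; t++) {
        if (in_any(threads[t].ip, fns, nfns))
            return false;
        for (size_t i = 0; i < threads[t].nret; i++)
            if (in_any(threads[t].ret_addrs[i], fns, nfns))
                return false;
    }
    return true;
}
```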

So in order to ensure safety, Ksplice performs a full stack walk for all threads. While this is a sound approach in theory, it usually turns out to be rather problematic in practice. In fact, the only other updating/dynamic instrumentation approach I am currently aware of that also performs stack walks is Paradyn; all other approaches have (deliberately) chosen other ways to perform safe runtime code modifications.

The reason why stack walking is problematic should be obvious: creating perfect stack traces requires either proper debugging information for all modules involved or proper stack frames for all routines so that the ebp chain can be traversed. In practice, debugging information is often not available for all modules. While this is probably less of a problem on Linux than on Windows, it is still a problem that cannot be easily dismissed. Finally, optimizations such as Frame Pointer Omission can thwart attempts to perform a stack walk by following the ebp chain.
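For comparison, following the ebp chain looks roughly like this. The sketch operates on a simulated stack (an array of 32-bit words starting at an assumed address) instead of real memory, assumes the standard i386 frame layout in which [ebp] holds the caller’s saved ebp and [ebp+4] the return address, and uses made-up names. A function compiled with frame pointer omission never links a frame into this chain, so the return address of the call into it is silently skipped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated stack memory: words[0] lives at address first_va. */
typedef struct {
    const uint32_t *words;
    size_t count;
    uint32_t first_va;
} sim_stack;

static bool read_word(const sim_stack *s, uint32_t va, uint32_t *out)
{
    if (va < s->first_va || (va - s->first_va) % 4u != 0)
        return false;
    size_t idx = (va - s->first_va) / 4u;
    if (idx >= s->count)
        return false;
    *out = s->words[idx];
    return true;
}

/* Collects return addresses by following saved ebp values. */
size_t walk_ebp_chain(const sim_stack *s, uint32_t ebp, uint32_t *ret_addrs, size_t max)
{
    size_t n = 0;
    uint32_t saved_ebp, ret;
    while (n < max && read_word(s, ebp + 4u, &ret) && read_word(s, ebp, &saved_ebp)) {
        ret_addrs[n++] = ret;
        if (saved_ebp <= ebp)   /* the chain must move toward the stack base */
            break;
        ebp = saved_ebp;
    }
    return n;
}
```

With a stack of { 0x100C, 0xC0100010, 0xC0200020, 0, 0xC0300030 } at address 0x1000, the walk reports 0xC0100010 and 0xC0300030 but never sees 0xC0200020, the return address left behind by a call into an FPO function.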

The paper is not specific on how exactly these stack walks are performed and how it tries to overcome these problems, so I took a look at the sources. The stack walk is performed by the routine check_stack, which is shown in the following listing (Excerpt from primary.c, lines 283–307):

/* Modified version of Linux's print_context_stack */
int
check_stack(struct thread_info *tinfo, long *stack)
{
  int conflict, status = 0;
  long addr;

  while (valid_stack_ptr(tinfo, stack)) {
    addr = *stack++;
    if (__kernel_text_address(addr)) {
      conflict = check_address_for_conflict(addr);
      if (conflict)
        status = -1;
      if (debug >= 2) {
        printk("%08lx ", addr);
        if (conflict)
          printk("[<-CONFLICT] ");
      }
    }
  }
  if (debug >= 2)
    printk("\n");

  return status;
}

The parameter stack contains the frame pointer of the topmost stack frame. Starting from this address, the routine treats every doubleword on the stack as a potential return address and checks whether it points into one of the critical functions. Although this approach may see too many stack frames and generate false positives, it is more pessimistic and thus, in this context, safer than walking the stack by following the ebp chain.
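The conservative idea can be boiled down to a stripped-down, portable model of check_stack: every word is tested, and no frame structure is assumed. This is a simplification of the listing above, not Ksplice code; the kernel text range and the ranges of the functions being replaced are passed in explicitly, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t start, end; } addr_range;   /* [start, end) */

static int in_range(const addr_range *r, uint32_t a)
{
    return a >= r->start && a < r->end;
}

/* Returns -1 if any stack word looks like a return address into a function
   being replaced, 0 otherwise. False positives are accepted by design. */
int conservative_check(const uint32_t *stack, size_t nwords, const addr_range *text,
                       const addr_range *patched, size_t npatched)
{
    int status = 0;
    for (size_t i = 0; i < nwords; i++) {
        uint32_t addr = stack[i];
        if (!in_range(text, addr))
            continue;                   /* not a kernel text address: ignore */
        for (size_t j = 0; j < npatched; j++)
            if (in_range(&patched[j], addr))
                status = -1;            /* possible return address into patched code */
    }
    return status;
}
```

Because every word is inspected, a return address hidden behind an FPO frame is still caught, at the price of occasionally flagging stale data that merely looks like a code address.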

Google Calendar, WTF?

Quite obviously, Google does not always get it right either. Every time I try to open my Google Calendar (using Opera), I am asked to log in. So I enter my credentials, am redirected a couple of times and end up on the login page again. Logging in again does not help; by then I am caught in an infinite loop. Thankfully, I can escape this loop by navigating to the original calendar URL again. Now Google recognizes that I have already logged in and shows me my calendar. Great.

But this one is even better. Having received an invitation, I was presented with a page that let me accept or decline the event.

Look at the screenshot — you can select all three options Yes, No and Maybe at the same time! Very convenient indeed. Having submitted the form, this answer seems to have been recognized as a Yes. Amazing.






About me

Johannes Passing, M.Sc., living in Berlin, Germany.

Besides his consulting work, Johannes mainly focusses on Win32, COM, and NT kernel mode development, along with Java and .Net. He also is the author of cfix, a C/C++ unit testing framework for Win32 and NT kernel mode, Visual Assert, a Visual Studio Unit Testing-AddIn, and NTrace, a dynamic function boundary tracing toolkit for Windows NT/x86 kernel/user mode code.

Contact Johannes: jpassing (at) acm org

Johannes' GPG fingerprint is BBB1 1769 B82D CD07 D90A 57E8 9FE1 D441 F7A0 1BB1.
