Linux Ksplice -- safe enough?

Last week, Ksplice, an automatic system for rebootless Linux kernel security updates gained some attention. The idea of using hotpatching techniques for applying sucurity fixes to the kernel in order to save reboots is not quite new. Not only does Windows support hotpatching as of Windows Server 2003 SP1, there also have have been attempts to introduce a hot updating infrastructure to the Linux kernel before. Anyway, the paper is an instresting read.

The basic idea followed by Kspliace is to analyze the differences between an old (flawed) and a new (fixed) kernel binary. Based on this analysis, Ksplice decides which routines have changed and now need to be updated. Updating routines is performed by replacing the old routine, i.e. execution is redirected from the old to the new routine.

Such redirection requires code to be patched. Patching code is a nontrivial undertaking and always raises the question of safety – after all, uncareful kernel code patching could easily crash the entire system. The paper describes how this problem is dealt with, yet one of the paragraphs caught my attention (page 7):




A safe time to update a function is when no thread’s instruction pointer falls within that function’s text in memory and when no thread’s kernel stack contains a return address within that function’s text in memory.







Before inserting the trampolines, Ksplice captures all of the machine’s processors and checks whether the above safety condition is met for all of the functions being replaced. [...]



So in order to ensure safety, Ksplice perfroms a full stack walk for all threads. While this is a sound approach in theory, it usually turns out to be rather problematic in practice. In fact, the only other updating/dynamic instrumentation approach I am currently aware of that also performs stack walks is Paradyn – all other approaches (deliberately) have choosen other ways to perform safe runtime code modifications.

The reason why stack walking is problematic should be obvious – creating perfect stack traces either requires proper debugging information for all modules involved to be available or requires proper stack frames for all routines so that the ebp-chain can be traversed. In practice, debugging information is often not available for all modules. While this is probably less a problem on Linux than on Windows, it is still a problem that cannot be easily dismissed. Finally, optimizations such as Frame Pointer Omission can thwart attempts to perform a stack walk by following the ebp-chain.

The paper is not specific on how exactly these stack walks are performed and how it tries to overcome these problems, so I took a look at the sources. The stack walk is performed by the routine check_stack, which is shown in the following listing (Excerpt from primary.c, lines 283–307):

    /* Modified version of Linux's print_context_stack */
    int
    check_stack(struct thread_info *tinfo, long *stack)
    {
      int conflict, status = 0;
      long addr;
    
      while (valid_stack_ptr(tinfo, stack)) {
        addr = *stack++;
        if (__kernel_text_address(addr)) {
          conflict = check_address_for_conflict(addr);
          if (conflict)
            status = -1;
          if (debug >= 2) {
            printk("%08lx ", addr);
            if (conflict)
              printk("[= 2)
        printk("\n");
    
      return status;
    }
    

The parameter stack contains the the frame pointer of the topmost stack frame. Starting from this address, the routine treats every doubleword on the stack as a potential stack frame and sees whether it might represent a return address that points to one of the critical functions. While possibly seeing too many stack frames and generating false positives with this approach, it is in fact a more pessimistic and thus in this context safer approach than walking the stack by following the ebp-chain.

Any opinions expressed on this blog are Johannes' own. Refer to the respective vendor’s product documentation for authoritative information.
« Back to home