Concurrent Execution A typical user mode process on a Windows system can be expected to have more than one thread. In addition to user threads, the Windows kernel employs a number of system threads. Given the presence of multiple threads, it is likely that whenever a code modification is performed, more than one thread is affected, i.e. more than one thread is sooner or later going to execute the modified code sequence.
Performing modifications on existing code is a technique commonly encountered among instrumentation solutions such as DTrace. Assuming a multiprocessor machine, altering code brings up the challenge of properly synchronizing such activity among processors. As stated before, IA-32/Intel64 allows code to be modified in the same manner as data. Whether modifying data is an atomic operation or not, depends on the size of the operand. If the total number of bytes to be modified is less than 8 and the target address adheres to certain alignment requirements, current IA-32 processors guarantee atomicity of the write operation.
Instrumentation of a routine may comprise multiple steps. As an example, a trampoline may need to be generated or updated, followed by a modification on the original routine, which may include updatating or replacing a branch instruction to point to the trampoline. In such cases, it is essential for maintaining consistency that the code changes take effect in a specific order. Otherwise, if the branch was written before the trampoline code has been stored, the branch would temporarily point to uninitialized memory.
Runtime code modification, of self modifying code as it is often referred to, has been used for decades – to implement JITters, writing highly optimized algorithms, or to do all kinds of interesting stuff. Using runtime code modification code has never been really easy – it requires a solid understanding of machine code and it is straightforward to screw up. What’s not so well known, however, is that writing such code has actually become harder over the last years, at least on the IA-32 platform: Comparing the 486 and current Core architectures, it becomes obvious that Intel, in order to allow more advanced CPU-interal optimizations, has actually lessened certain gauarantees made by the CPU, which in turn requires the programmer to pay more attection to certain details.
Our paper NTrace: Function Boundary Tracing for Windows on IA-32 from WCRE 2009 has now been published on computer.org: Abstract: For a long time, dynamic tracing has been an enabling technique for reverse engineering tools. Tracing can not only be used to record the control flow of a particular component such as a piece of malware itself, it is also a way to analyze the interactions of a component and their impact on the rest of the system.
Having given my presentation on NTrace today at the WCRE in Lille/France, I have also opened ntrace.org to the public. NTrace, in case you have missed my previous posts, is a dynamic function boundary tracing system for Windows/x86 I initially developed as part of my Master’s thesis that is capable of performing DTrace-like tracing of both user and kernel mode components. On the NTrace page, you will now find the paper itself as being published as part of the WCRE proceedings (mind the copyright notice, please) along with two screencasts: One showing how NTrace can be used to trace kernel mode components such as NTFS, and one demonstrating NTrace for user mode tracing.
Next week, the 16th Working Conference on Reverse Engineering (WCRE) will be held in Lille, France. I will be there presenting NTrace: Function Boundary Tracing for Windows on IA-32. NTrace is a dynamic function boundary tracing toolkit for IA-32/x86 that can be used to trace both kernel and user mode Windows components – examples for components that can be traced include the kernel itself (ntoskrnl), drivers like NTFS as well as user mode components such as kernel32, shell32 or even explorer.