Thirteen years ago, I wrote NTrace, a dynamic function boundary tracing toolkit for Windows NT inspired by DTrace. NTrace supported both user-mode and kernel-mode tracing and, like DTrace, was able to instrument machine code on the fly.
Under the hood, NTrace took advantage of the way Microsoft x86 compilers emit machine code to make the code compatible with Windows Hotpatching. Aware of the particular code layout emitted by compilers, NTrace used runtime code modification to weave in its instrumentation code, which then captured the function entry and exit events, as well as SEH exceptions.
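To make that layout concrete, here is a minimal sketch in C of the generic hotpatching scheme on x86. A hotpatchable function starts with the two-byte no-op `mov edi, edi` and is preceded by five bytes of padding. This is not NTrace's actual code: the names `is_hotpatchable` and `redirect` are made up for illustration, and the sketch operates on a plain byte buffer rather than on live, executable memory.

```c
#include <stddef.h>
#include <stdint.h>

#define PAD_LEN 5 /* padding bytes in front of a hotpatchable function */

/* Returns 1 if the function starting at code[entry] has the hotpatchable
 * layout: 5 bytes of padding followed by `mov edi, edi` (8B FF). */
static int is_hotpatchable(const uint8_t *code, size_t len, size_t entry)
{
    if (entry < PAD_LEN || entry + 2 > len)
        return 0;
    if (code[entry] != 0x8B || code[entry + 1] != 0xFF)
        return 0;
    /* Padding is assumed to be nop (90) or int3 (CC) bytes. */
    for (size_t i = entry - PAD_LEN; i < entry; i++)
        if (code[i] != 0x90 && code[i] != 0xCC)
            return 0;
    return 1;
}

/* Redirects the function at `entry` to `target` (both are offsets in the
 * same address space as the buffer). Returns 1 on success. */
static int redirect(uint8_t *code, size_t len, size_t entry, size_t target)
{
    if (!is_hotpatchable(code, len, entry))
        return 0;

    size_t pad = entry - PAD_LEN;

    /* Step 1: write `jmp rel32` (E9) into the padding. The displacement
     * is relative to the end of that 5-byte instruction, i.e. `entry`. */
    uint32_t rel = (uint32_t)(target - entry);
    code[pad] = 0xE9;
    for (int i = 0; i < 4; i++)
        code[pad + 1 + i] = (uint8_t)(rel >> (8 * i)); /* little-endian */

    /* Step 2: replace `mov edi, edi` with `jmp short $-5` (EB F9).
     * rel8 is relative to entry + 2, so -7 lands on the padding. */
    code[entry] = 0xEB;
    code[entry + 1] = 0xF9;
    return 1;
}
```

On a real system, the memory would first have to be made writable, and the two-byte write in step 2 would have to happen atomically so that no thread ever sees a half-written instruction. The sketch leaves both out.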
NTrace was robust and it was fast – in the benchmarks I did, it outperformed DTrace on x86 by a factor of almost three. And while it was nowhere near as feature-rich as DTrace, it was still fairly easy to use, as two screencasts from 2008 demonstrated.
The first screencast showed how NTrace instruments the kernel and drivers of a running Windows system to capture function entry and exit events:
The second screencast showed how NTrace attached to a running Windows process and instrumented it to capture function entry and exit events:
Developing NTrace was part of my master’s thesis about function boundary tracing in the Windows kernel. The aim of my thesis was twofold: first, it proposed a taxonomy and classification for dynamic tracing approaches. Second, it explained how to make use of the Windows Hotpatching infrastructure and structured exception handling system on x86 to trace program execution.
I finished my master’s degree in the fall of 2008. And as the economy was collapsing around me, I spent some more time at the university and published another paper on NTrace, titled NTrace: Function Boundary Tracing for Windows on IA-32. I presented the paper at the 16th Working Conference on Reverse Engineering in Lille, France, and it was later published in the conference’s proceedings. The abstract of the paper read:
For a long time, dynamic tracing has been an enabling technique for reverse engineering tools. Tracing can not only be used to record the control flow of a particular component such as a piece of malware itself, it is also a way to analyze the interactions of a component and their impact on the rest of the system. Unlike Unix-based systems, for which several dynamic tracing tools are available, Windows has been lacking appropriate tools. From a reverse engineering perspective, however, Windows may be considered the most relevant OS, particularly with respect to malware analysis. In this paper, we present NTrace, a dynamic tracing tool for the Windows kernel, drivers, system libraries, and applications that supports function boundary tracing. NTrace incorporates 2 novel approaches: (1) a way to integrate with Windows Structured Exception Handling and (2) a technique to instrument binary code on IA-32 architectures that is both safe and more efficient than DTrace.
Together with my coworkers at the university, I also managed to obtain a patent on the tracing approach.
After I left university, I did not spend any more time on NTrace. One reason was that developing NTrace consumed a lot of time – something I did not have anymore after starting to work full time. Another reason was that in 2008, we were at the cusp of the transition to 64-bit. 64-bit Windows changed the machine code layout in a way that made it more difficult to instrument – and more importantly, Microsoft started to crack down on runtime code modification and patching of kernel-mode components by introducing PatchGuard.
My initial hope was that Microsoft might eventually pick up the idea behind NTrace – but that obviously did not happen. Ten years later, Microsoft introduced DTrace on Windows – but like its Solaris cousin, it uses a simpler and slower instrumentation approach based on trap handlers.
NTrace was one of the most fun and most challenging pieces of software I ever wrote. It was fun because I had the time and opportunity to really dive into the guts of the Windows kernel, and along the way I learned a lot not only about the kernel, but also about how x86 processors work.
But writing NTrace was also challenging, and not only because it involved writing low-level C and assembly code. It was challenging because I wanted to be able to instrument not just some parts of the kernel, but all of it – including its lowest-level parts like the memory manager, interrupt service routines, and trap handlers. Instrumenting ISRs and trap handlers is a risky business, as any error in the instrumentation code is likely to lead first to a double fault, then to a triple fault, and then to a CPU reset – making the code all but impossible to debug. So I fondly remember spending days on end debugging triple faults that kept occurring once I instrumented a certain memory manager routine.