Debugging: The case of the mysterious JVM x64 crashes


To date, I have done all my Java development using 32-bit JVMs. After all, as long as you do not have extreme memory requirements, the 64-bit JVM should not buy you much. Today I installed the Java 6 Update 6 x64 JDK on my Vista x64 machine and tried to run some of my JUnit tests on this VM.

With little success:

    #
    # An unexpected error has been detected by Java Runtime Environment:
    #
    #  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00000000772b219b, pid=4188, tid=2272
    #
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode windows-amd64)
    # Problematic frame:
    # C  [ntdll.dll+0x5219b]
    #
    # If you would like to submit a bug report, please visit:
    #   http://java.sun.com/webapps/bugreport/crash.jsp
    #
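
The header already narrows things down a little: the `C` prefix marks a native frame, and subtracting the frame offset from the faulting `pc` gives the address at which ntdll.dll was loaded. The following is only a small sketch of that arithmetic; the class and method names are made up for illustration, and the two hex values are copied from the output above.

```java
// Hypothetical helper (not part of any JDK tool): decodes numbers from the crash header.
final class CrashHeader {
    /** Parses a hex string such as "0x00000000772b219b" (prefix optional) into a long. */
    static long parseHex(String s) {
        String digits = (s.startsWith("0x") || s.startsWith("0X")) ? s.substring(2) : s;
        return Long.parseLong(digits, 16);
    }

    /** Module load address = faulting pc minus the offset given in the problematic frame. */
    static long moduleBase(long pc, long offsetInModule) {
        return pc - offsetInModule;
    }

    public static void main(String[] args) {
        long pc = parseHex("0x00000000772b219b"); // "pc=" value from the header
        long offset = parseHex("0x5219b");        // from "C  [ntdll.dll+0x5219b]"
        // Prints "ntdll.dll base: 0x77260000"
        System.out.println("ntdll.dll base: 0x" + Long.toHexString(moduleBase(pc, offset)));
    }
}
```

So the faulting instruction lies inside ntdll.dll itself, which by itself says nothing about who passed it bad data.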
    

The project uses a native library via JNI, so of course I immediately suspected this to be the problem. So I placed a Java breakpoint on the respective System.loadLibrary call with the intent of attaching WinDBG as soon as this breakpoint is hit. In WinDBG, I could then break on the exception and see what the problem is.
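As an alternative to a Java breakpoint, the process can simply pause itself right before the library is loaded. The sketch below is one way to do that, not what I actually used: `AttachPause` and its methods are hypothetical names, `mynative` is a placeholder library name, and the `pid@hostname` format of the runtime name is a HotSpot convention rather than a guarantee.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper: pauses the JVM so a native debugger such as WinDBG can be attached.
final class AttachPause {
    /** Extracts the pid from a runtime name of the form "pid@hostname"; returns -1 if it cannot. */
    static long parsePid(String runtimeName) {
        int at = runtimeName.indexOf('@');
        if (at <= 0) {
            return -1;
        }
        try {
            return Long.parseLong(runtimeName.substring(0, at));
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Prints the pid (best effort) and blocks until Enter is pressed or stdin is closed. */
    static void waitForDebugger() throws IOException {
        String name = ManagementFactory.getRuntimeMXBean().getName();
        System.out.println("JVM pid (best effort): " + parsePid(name));
        System.out.println("Attach the native debugger, then press Enter...");
        System.in.read();
    }

    public static void main(String[] args) throws IOException {
        waitForDebugger();
        // The real code would now load the JNI library, for example:
        // System.loadLibrary("mynative"); // placeholder name
    }
}
```

This only helps if the VM actually survives until the pause is reached.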

But to my surprise, the Java breakpoint was not hit – the VM crashed immediately and I received the same output about an unexpected error having occurred. That seemed strange to me – maybe it was not the fault of the JNI library after all? So I created a simple Hello World application and ran it – that worked. Then I created this innocent JUnit test:

    public class JTest
    {
      @org.junit.Test
      public void testname() throws Exception
      {    
      }
    }
    

This one failed again, yielding the same error message as above. Well, at least that gave me evidence that the JVM, not my JNI library, was the culprit of the crash – but still, the situation seemed weird. Running the test again under WinDBG, I could see that the access violation (AV) occurred during a heap free operation (as a side note, it is annoying that Sun does not supply symbols for its binaries):

    00000000`0404ca88 00000000`7727e7e2 ntdll!RtlCaptureContext+0x8c
    00000000`0404ca98 00000000`7727e72b ntdll!RtlpWalkFrameChain+0x52
    00000000`0404d018 00000000`773352f2 ntdll!RtlCaptureStackBackTrace+0x4b
    00000000`0404d048 00000000`772e1d35 ntdll!RtlpStackTraceDatabaseLogPrefix+0x42
    00000000`0404d178 00000000`7715d9fa ntdll! ?? ::FNODOBFM::`string'+0xa93f
    00000000`0404d1f8 000007fe`fef0175c kernel32!HeapFree+0xa
    *** WARNING: Unable to verify checksum for C:\Program Files\Java\jdk1.6.0_06\jre\bin\server\jvm.dll
    *** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\Program Files\Java\jdk1.6.0_06\jre\bin\server\jvm.dll - 
    00000000`0404d228 00000000`08101c09 msvcrt!free+0x1c
    00000000`0404d258 00000000`081026cc jvm!JVM_EnqueueOperation+0x8c139
    00000000`0404d288 00000000`040b4937 jvm!JVM_EnqueueOperation+0x8cbfc
    00000000`0404d318 00000000`0404d338 0x40b4937
    

Well, it could be a heap corruption – but it is interesting that the crash did not occur during block coalescing or similar operations but during stack trace capturing. As a matter of fact, I always run my machine with user mode stack trace database creation enabled for debugging purposes. So I disabled the stack trace database in gflags, rebooted the machine and – voilà, the crash disappeared!
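For reference, gflags keeps its system-wide settings as a bit mask in the `GlobalFlag` registry value under `HKLM\SYSTEM\CurrentControlSet\Control\Session Manager`, and the user mode stack trace database (`ust` in gflags) is bit `0x1000`. The sketch below only illustrates that bit arithmetic. The class name is made up, and the sample value is an assumption, not the value from my machine.

```java
// Hypothetical helper: inspects an NT global flag value for the "ust" bit.
final class GlobalFlags {
    /** FLG_USER_STACK_TRACE_DB, abbreviated "ust" in gflags. */
    static final int FLG_USER_STACK_TRACE_DB = 0x1000;

    /** True if user mode stack trace database creation is switched on in the given value. */
    static boolean stackTraceDbEnabled(int globalFlag) {
        return (globalFlag & FLG_USER_STACK_TRACE_DB) != 0;
    }

    /** Returns the flag value with only the stack trace database bit cleared. */
    static int withoutStackTraceDb(int globalFlag) {
        return globalFlag & ~FLG_USER_STACK_TRACE_DB;
    }

    public static void main(String[] args) {
        int sample = 0x1000; // assumed sample value with only "ust" set
        System.out.println("ust enabled: " + stackTraceDbEnabled(sample));
        System.out.println("after clearing: 0x" + Integer.toHexString(withoutStackTraceDb(sample)));
    }
}
```

On the command line, `gflags /r -ust` should clear the same bit in the registry. As with the GUI, the change takes effect only after a reboot.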

Wow. I think this is worth being filed as a bug.

Any opinions expressed on this blog are Johannes' own. Refer to the respective vendor’s product documentation for authoritative information.