Memory leak hunting still doesn't end, or: Serialization also requires a bit of manually imposed amnesia

By Attila Szegedi, on Thursday, 17th February 2005

The search for memory leaks in a system I'm currently developing still didn't end after the events described in the two previous installments. Those articles are turning into a mini-series with this third part, and I'm starting to fear that this may grow into a soap opera. So, after even more searching, I grew suspicious of an internal Sun JDK class named sun.misc.SoftCache. It is used in several places, and unfortunately it is often used inappropriately (details in a few moments). I found at least one case of prior inappropriate usage in this Sun bug database entry.

Anyway, here's what happens when you serialize objects that are instances of classes loaded through a throwaway class loader. Well, actually quite a lot of things happen, but of particular interest to us at this moment is the fact that the java.io.ObjectStreamClass class will store certain per-class information in entries of its SoftCaches named localDescs and reflectors. In the entries, the keys are the classes, and they are held using strong references. The values are held as soft references. Now, you can clearly see the trouble with this: a class used as a key stays strongly reachable, and so does its class loader, for as long as the entry exists. They should have used stock java.util.WeakHashMap, but I would be willing to bet that SoftCache is a remnant of JDK 1.1 days when there were no soft references in the JDK proper, but just an internal Sun implementation.

Anyway, by using a strong key-soft value configuration of a map instead of the weak key-strong value configuration of the WeakHashMap, they really created a dangerous memory trap. Consider how entries are removed from the SoftCache: they are only removed when the soft reference to the value gets cleared and enqueued in a ReferenceQueue, and the SoftCache is mutated the next time. This unfortunately means that an entry for a no longer used class goes away only when two things happen: the garbage collector clears the soft reference to its value, and something then mutates the SoftCache.
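
To make the trap concrete, here is a minimal sketch of a strong key-soft value map that purges stale entries only on mutation, as described above. It is a hypothetical stand-in written for illustration, not the source of sun.misc.SoftCache; the class name, the method names and the simulateCollectorClearing hook (which fakes what the garbage collector would do under memory pressure) are all assumptions.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a strong key-soft value cache; NOT the real sun.misc.SoftCache.
class StrongKeySoftValueCache {
    // The soft reference guards only the value; the key is held strongly.
    static final class ValueRef extends SoftReference<Object> {
        final Object key;

        ValueRef(Object key, Object value, ReferenceQueue<Object> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<Object, ValueRef> entries = new HashMap<Object, ValueRef>();
    private final ReferenceQueue<Object> queue = new ReferenceQueue<Object>();

    // Entries whose value reference was enqueued are dropped here, and only here.
    private void expungeStaleEntries() {
        ValueRef ref;
        while ((ref = (ValueRef) queue.poll()) != null) {
            if (entries.get(ref.key) == ref) {
                entries.remove(ref.key);
            }
        }
    }

    // Mutation: the only operation in this sketch that purges stale entries.
    Object put(Object key, Object value) {
        expungeStaleEntries();
        ValueRef old = entries.put(key, new ValueRef(key, value, queue));
        return old == null ? null : old.get();
    }

    // Lookup: does not purge, so a cleared entry keeps pinning its key.
    Object get(Object key) {
        ValueRef ref = entries.get(key);
        return ref == null ? null : ref.get();
    }

    // True while the map still holds a strong reference to the key.
    boolean pins(Object key) {
        return entries.containsKey(key);
    }

    // Illustration hook: pretend the collector cleared and enqueued this key's value.
    void simulateCollectorClearing(Object key) {
        ValueRef ref = entries.get(key);
        if (ref != null) {
            ref.clear();
            ref.enqueue();
        }
    }
}
```

In this sketch, a key whose value has been cleared is still pinned until the next put() call, which is exactly the window in which a class (and its class loader) would linger.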

Well, I really have to have my stars aligned for this not to drive the system out of memory when classes are frequently reloaded and serialization is used. If I believed I had such luck, I'd probably go out and buy a lottery ticket instead. I'd sooner win it than have the luck with JVM memory management in the presence of SoftCaches. (And if I won it, then I'd retire and then... well, I guess I'd keep coding, because that's what I do. Reminds me of this Penny Arcade gem). Now, in the meantime, what do I do about it? Turns out, there are two things I had to do:

The ugly code goes like this (I swear this was the very first time in my career as a Java developer that I had to use setAccessible(true) on a Field):

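    import java.io.ObjectStreamClass;
    import java.lang.reflect.Field;
    import java.util.Map;

    public void destroy()
    {
        // ... release all known references to the classes and the class loader ...
        clearSoftCache(ObjectStreamClass.class, "localDescs");
        clearSoftCache(ObjectStreamClass.class, "reflectors");
        // ... other cleanup ...
    }
    
    private static void clearSoftCache(Class clazz, String fieldName)
    {
        try
        {
            // Reach into the private static field that holds the cache
            Field f = clazz.getDeclaredField(fieldName);
            f.setAccessible(true);
            Map cache = (Map)f.get(null); // static field; SoftCache is a java.util.Map
            synchronized(cache)
            {
                cache.clear();
            }
        }
        catch(Exception e)
        {
            // Field missing or not accessible on this JDK: report and carry on
            e.printStackTrace();
        }
    }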

It couldn't get much uglier than this, could it? The only reason I don't feel ashamed is because this is indeed a temporary workaround for a bug that has been acknowledged by Sun, so it'll probably go away sooner or later.

I must also admit that the code above was incomplete. There are two other cases where the serialization subsystem has better memory than is good for my health and remembers my Class instances long after I'd like the system to have forgotten them. But these are special, you won't bump into them too often. Not unless you load and use your own subclasses of ObjectInputStream and ObjectOutputStream through that throwaway class loader. Let's admit it, you don't do that too often. I haven't done it myself except for this one time. It opens up some really elegant design possibilities in certain circumstances, but that's another story (I also don't want to get too specific because I don't want to get too close to the thin line separating what I can disclose about my paid work and what I can't). Anyway, if you perform such exotic feats, it's best to add two more lines to the destroy method above, so that the full list of clearSoftCache invocations now reads:
        clearSoftCache(ObjectInputStream.class, "subclassAudits");
        clearSoftCache(ObjectOutputStream.class, "subclassAudits");
        clearSoftCache(ObjectStreamClass.class, "localDescs");
        clearSoftCache(ObjectStreamClass.class, "reflectors");
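
The fields named above are JDK internals, so they may not exist under those names, or be open to reflection at all, on other JDK versions. To show the technique in a form that can be run anywhere, here is a minimal sketch that applies the same reflective clearing to a stand-in class. LeakyRegistry, its descriptors field and the clearStaticMap helper are hypothetical names invented for this illustration; the helper returns false instead of throwing when the field cannot be cleared.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical library class that caches per-class data in a private static map.
class LeakyRegistry {
    private static final Map<Class<?>, String> descriptors = new HashMap<Class<?>, String>();

    static void remember(Class<?> c) {
        descriptors.put(c, c.getName());
    }

    static int size() {
        return descriptors.size();
    }
}

class StaticCacheClearer {
    // Clears a static Map field by name. Returns true if the map was cleared,
    // false if the field is missing, not accessible, or not a Map.
    static boolean clearStaticMap(Class<?> owner, String fieldName) {
        try {
            Field f = owner.getDeclaredField(fieldName);
            f.setAccessible(true);
            Object value = f.get(null); // null target: the field is static
            if (!(value instanceof Map)) {
                return false;
            }
            Map<?, ?> cache = (Map<?, ?>) value;
            synchronized (cache) {
                cache.clear();
            }
            return true;
        } catch (NoSuchFieldException e) {
            return false;
        } catch (IllegalAccessException e) {
            return false;
        } catch (RuntimeException e) {
            // For example a SecurityException when reflection is locked down
            return false;
        }
    }

    public static void main(String[] args) {
        LeakyRegistry.remember(String.class);
        System.out.println("entries before: " + LeakyRegistry.size());
        boolean cleared = clearStaticMap(LeakyRegistry.class, "descriptors");
        System.out.println("cleared: " + cleared + ", entries after: " + LeakyRegistry.size());
    }
}
```

Returning a boolean rather than throwing is a design choice for cleanup code: a destroy method can then carry on with the rest of its work even when one cache could not be reached.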

No happy ending for now

So, after all these measures for causing benign amnesia in serialization, plus the similar measure concerning commons-logging (I wrote about it in the first article), you'd think I no longer see loitering class loaders either in my nightmares or when I'm awake browsing the logs of the running system.

Wrong.

Unfortunately.

There are still class loaders that don't want to unload. The problem is, the thing is completely nondeterministic, or at least I fail to see a pattern in it.

The text below this paragraph is rather technical. Yes, more technical than what you read above :-). You have been warned.

*Deep breath* I must admit I'm stuck. I'm still seeing class loaders not being unloaded. The trouble is, I can't find any more sources of a leak. I tried every profiler I could lay my hands on an evaluation license for. I read through the JVMPI spec. I started coding my own JNI/JVMPI profiler DLL for the JVM in C. And you know I'm desperate when I go back from Java to coding in C. Then fortunately I abandoned it because I figured that the binary output of HPROF (the little profiler DLL that can, and is shipped with the JDK) should suit my needs. Then I downloaded HAT. It's a nice tool that'll slurp the heap dump output of HPROF and then set up an HTTP server with an HTML interface for querying said dump. Then I realized it ignores many strong references, like those from an object to its class and from a class to its class loader, signers, and protection domain. Since HAT is open source, I implemented all of them in it. I also realized it falls into the trap of considering static variables to be GC roots.

Let me take a little digression to send a message to all people out there who want to write a Java profiler: Static variables are not GC roots. For some reason almost all profilers on the planet assume this. Well, except for YourKit Profiler 4.0. They got it right. Static variables are strongly held by their classes, but they're not roots. Classes aren't GC roots either. Only classes loaded through the goddamn system class loader are GC roots. I hope the message was clear.
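
The difference is easy to see on a toy model. The sketch below is only an illustration under stated assumptions: the heap is reduced to a graph of named nodes, ToyHeap and all node names are hypothetical, and the roots follow the rule stated above (classes loaded through the system class loader are roots, static variables are not).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy heap model: nodes are named objects, edges are strong references.
class ToyHeap {
    private final Map<String, List<String>> refs = new HashMap<String, List<String>>();

    void addRef(String from, String to) {
        List<String> out = refs.get(from);
        if (out == null) {
            out = new ArrayList<String>();
            refs.put(from, out);
        }
        out.add(to);
    }

    // Plain graph traversal: everything strongly reachable from the given roots.
    Set<String> reachableFrom(Set<String> roots) {
        Set<String> seen = new HashSet<String>(roots);
        Deque<String> work = new ArrayDeque<String>(roots);
        while (!work.isEmpty()) {
            List<String> out = refs.get(work.pop());
            if (out == null) {
                continue;
            }
            for (String next : out) {
                if (seen.add(next)) {
                    work.push(next);
                }
            }
        }
        return seen;
    }

    // A class from the system loader and a class from a throwaway loader,
    // each with one static variable pointing at some object.
    static ToyHeap example() {
        ToyHeap heap = new ToyHeap();
        heap.addRef("AppClass", "AppClass.staticConfig");
        heap.addRef("throwawayLoader", "PluginClass");
        heap.addRef("PluginClass", "throwawayLoader");
        heap.addRef("PluginClass", "PluginClass.staticCache");
        heap.addRef("PluginClass.staticCache", "cachedObject");
        return heap;
    }

    public static void main(String[] args) {
        ToyHeap heap = example();
        // Correct roots: only the class loaded through the system class loader.
        Set<String> correct = new HashSet<String>(Arrays.asList("AppClass"));
        // Naive roots: every static variable is treated as a root.
        Set<String> naive = new HashSet<String>(
                Arrays.asList("AppClass.staticConfig", "PluginClass.staticCache"));
        System.out.println("cachedObject live (correct roots): "
                + heap.reachableFrom(correct).contains("cachedObject"));
        System.out.println("cachedObject live (naive roots): "
                + heap.reachableFrom(naive).contains("cachedObject"));
    }
}
```

With the naive roots, cachedObject appears to be held by a root even though nothing outside the throwaway loader's own cycle refers to it, which is the kind of false positive described above.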

So I fixed that too, so I get fewer false positives in my output. I now have the most comprehensive profiler I believe I can get. (I contributed the changes back to the HAT project, but its mailing list looks rather dead: my message with the patch contributions is the first and only one in the list archives so far, and the last CVS change was 5 months ago, when it was originally uploaded.) It turns out that the fidelity with which this updated HAT represents the JVM heap state is actually identical to what the YourKit folks achieved with version 4.0 of their profiler (disclaimer: I'm in no way affiliated with them, I have just tried their profiler out, and I must say it's the first one I encountered that does things the right way. Well, at least the way I'd expect it to). Nevertheless, even with full-fidelity reference information, it doesn't find anything.

I do actually have a fear. You see, all HPROF dumps I analyzed were generated by running on JDK 1.4.2. I also did a dump from running the system on JDK 1.5.0, but it was incredibly flaky. I mean, in the rare circumstances when HPROF managed to create a full dump without crashing the JVM, the dump was full of unresolvable references (most of them exactly to class loaders), but it also showed that one of my classes loaded through that throwaway class loader had a root reference of type JVMPI_GC_ROOT_JNI_GLOBAL, and there was another reference of type JVMPI_GC_ROOT_UNKNOWN to the class loader itself. These didn't show up in JDK 1.4.2 dumps, but I fear that might only be an indication of JVMPI not being fully implemented in 1.4.2. If so, these refs are real. But what now? How do I find a JNI global reference? And how do I deal with a GC root of "unknown" type? And how do I know if they're for real, since as I said the 1.5.0 HPROF output was very flaky, and they didn't show up in the 1.4.2 output?

Then I turned again to YourKit, and managed to create snapshots running on both 1.4.2 and 1.5.0. The 1.4.2 snapshots only discovered false roots: I was keeping a phantom reference to the class loaders to detect whether they fail to unload, and the only path to a GC root was through a phantom reference. So it does not count. When YourKit took a snapshot running on 1.5.0 (it uses JVMTI instead of JVMPI when profiling 1.5.0 JVMs), all the class loaders were marked as "unknown roots". I've even downloaded the JDK SCSL source code and sifted through it to find out what constitutes "unknown" roots in its JVMTI implementation, but to no avail.
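
The phantom reference bookkeeping mentioned above could look roughly like the following. This is a minimal sketch, not the code from the system in question; LoaderUnloadTracker and its method names are hypothetical. A phantom reference does not keep its referent alive, which is why a path to a root that leads only through one does not count as a leak.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that reports which tracked class loaders have not been collected yet.
class LoaderUnloadTracker {
    private final ReferenceQueue<ClassLoader> queue = new ReferenceQueue<ClassLoader>();

    // The map keeps the PhantomReference objects themselves alive (they must stay
    // reachable to be enqueued); it holds no strong reference to any loader.
    private final Map<Reference<? extends ClassLoader>, String> pending =
            new HashMap<Reference<? extends ClassLoader>, String>();

    // Start watching a loader under a human-readable label.
    synchronized void track(ClassLoader loader, String label) {
        pending.put(new PhantomReference<ClassLoader>(loader, queue), label);
    }

    // Drains the queue, then returns the labels of loaders that are still around.
    synchronized List<String> notYetUnloaded() {
        Reference<? extends ClassLoader> ref;
        while ((ref = queue.poll()) != null) {
            pending.remove(ref);
        }
        return new ArrayList<String>(pending.values());
    }
}
```

A label that keeps showing up in notYetUnloaded() long after its loader was discarded is a hint, not proof, since the collector gives no guarantee about when a phantom reference gets enqueued.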

It would be nice to end this article with a happy ending, but I can't. I wonder what's lurking behind this last enigma, and I feel I'll be amazed when I finally crack it. If I crack it. Will keep you updated.

Attila.


Monday, 6th June 2005

Update: It looks like we're plagued by this JRE bug: "ClassLoaders do not get released by GC, causing OutOfMemory in Perm Space". If you have Sun Bug Database votes to spare, any will be gratefully accepted for it.


Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.