I have been working on this idea for a couple of years now, trying to give it shape but never really finding the time for the details. The basic concept is simple:
- Start monitoring of objects
- Run a test
- Collect and import monitoring metrics
- Use math and estimation to detect memory leaks
The memory leak
When does an object become a memory leak suspect? Quite simple. I will use two acronyms here:
- SNI – Start Number of Instances
- ENI – End Number of Instances
Taking the shortest path, one would say: whenever this condition holds:
ENI - SNI > 0
That means: “if the number of instances at the end of the test is higher than at the beginning of the test, the object is considered a memory leak”. Well, not necessarily:
- some objects may be initialized only by the test itself when loading specific classes, so they were never there when starting the application server
- soft references: softly referenced objects may be collected, but it is up to the Garbage Collector to decide when to remove them
- session timeouts: some users close their sessions by logging out, while others (most of them) just close their browser. In that case the session timeout is responsible for removing the session and its attached objects, and the timeout varies by implementation. So it is not the Test End Timestamp that is decisive, but the Test End Timestamp + Timeout
Just saying “higher number of instances” is therefore not enough. We need to evaluate the delta between SNI and ENI, for example with the standard deviation. Even with only two measured values, if the difference is high enough we can assume that something is not going as planned and that we might have a memory leak.
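To make the two-value case concrete, here is a minimal Python sketch. The function names and the sample counts are illustrative only and not part of any tool. With just SNI and ENI, the sample standard deviation reduces to |ENI - SNI| / sqrt(2), so it carries the same information as the plain difference, except that the sign is lost.

```python
import math
import statistics


def delta(sni: int, eni: int) -> int:
    """Signed growth in the instance count over the test."""
    return eni - sni


def two_point_stdev(sni: int, eni: int) -> float:
    """Sample standard deviation (n - 1 in the denominator) of SNI and ENI."""
    return statistics.stdev([sni, eni])


# Hypothetical counts for a single class.
sni, eni = 118, 120
growth = delta(sni, eni)
spread = two_point_stdev(sni, eni)

# For two values the standard deviation is just a scaled absolute difference.
same = math.isclose(spread, abs(eni - sni) / math.sqrt(2))
```

Because the standard deviation is symmetric, a class whose instance count shrinks would be flagged as well, so a sign check on the difference is still needed if only growth should be reported.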
Monitoring the JVM Object Map
jmap is a great tool (part of the JDK) that we can use to inspect the memory map. Using jmap, one can at any point in time (with some overhead, of course, though not that big) retrieve a per-class list of all objects residing in the heap. The results look something like this:
Java Object Map – Objects and number of instances in the heap
1: 1803751 166592520 [C
2: 347130 103348904 [B
3: 565934 74272832 <constMethodKlass>
4: 565934 72452448 <methodKlass>
5: 242821 62836312 [I
6: 55222 60738280 <constantPoolKlass>
7: 55221 40555296 <instanceKlassKlass>
8: 484966 38797280 java.lang.reflect.Method
9: 886626 35851896 [Ljava.lang.Object;
10: 391682 32626104 [Ljava.util.HashMap$Entry;
11: 46464 32591136 <constantPoolCacheKlass>
12: 1351668 32440032 java.lang.String
13: 748338 23946816 java.util.HashMap$Entry
14: 426004 20448192 java.util.HashMap
15: 501680 20067200 java.util.LinkedHashMap$Entry
16: 820685 19696440 java.util.ArrayList
17: 233944 16843968 java.lang.reflect.Field
18: 360930 11549760 java.util.concurrent.ConcurrentHashMap$HashEntry
The first column is the row number of the class in the listing, which is sorted by size. It is only a rank and can change from one snapshot to the next, so the class name in the last column is the stable key.
The second column is the number of instances of the class in the heap.
The third column is the total size (in bytes) that those instances occupy in the heap.
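A listing like the one above is easy to turn into a lookup table. The following is a minimal Python sketch, not the parser used for the reports in this post: `ClassStats` and `parse_histogram` are hypothetical names, and the result is keyed by the class name because the same class always prints the same name.

```python
import re
from typing import Dict, NamedTuple


class ClassStats(NamedTuple):
    instances: int
    total_bytes: int


# Matches rows of the form "<rank>: <instances> <bytes> <class name>".
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histogram(text: str) -> Dict[str, ClassStats]:
    """Map each class name to its instance count and total size in bytes."""
    stats: Dict[str, ClassStats] = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # header, separator and total lines do not match
            instances, total_bytes, name = match.groups()
            stats[name] = ClassStats(int(instances), int(total_bytes))
    return stats


# Two rows taken from the listing above.
sample = """\
 1: 1803751 166592520 [C
 12: 1351668 32440032 java.lang.String
"""
histogram = parse_histogram(sample)
```

Lines that do not start with a rank followed by a colon are skipped, so the column header and the totals line that jmap prints do not need special handling.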
As I mentioned before, in order to perform memory analysis we need to gather at least two metrics:
- object occupancy before starting the test scenario – SNI
- object occupancy after ending the test scenario – ENI
Those of you reading this post probably already know the two types of garbage collection (Young and Full) and how garbage collection works, so I will jump straight into the details of the problem.
One may now ask: what if the objects are dead and waiting to be collected by the next garbage collection? Would the results still be reliable?
We need to make sure that both SNI and ENI are measured after a FULL GARBAGE COLLECTION. That way we’ll make sure there is no dead object waiting to be collected.
So, our scenario up to now runs like this:
- Full Garbage Collection -> Retrieve SNI for all live objects
- Run the test
- Wait for the session timeout
- Full Garbage Collection -> Retrieve ENI for all live objects
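One way to get the "Full Garbage Collection -> Retrieve" step in a single call is the `live` suboption of `jmap -histo`, which on HotSpot JDKs counts only reachable objects and in practice forces a full collection first. The sketch below assumes that `jmap` is on the PATH and that the process id of the application server is known. `histo_command` and `take_snapshot` are hypothetical helper names.

```python
import subprocess
from typing import List


def histo_command(pid: int, live: bool = True) -> List[str]:
    """Build the jmap call; live=True restricts the histogram to live objects."""
    option = "-histo:live" if live else "-histo"
    return ["jmap", option, str(pid)]


def take_snapshot(pid: int, live: bool = True) -> str:
    """Run jmap against the given JVM and return the raw histogram text."""
    result = subprocess.run(
        histo_command(pid, live),
        capture_output=True,
        text=True,
        check=True,  # raise if jmap cannot attach to the process
    )
    return result.stdout
```

With `live=False` the histogram also contains objects that are already dead but not yet collected, which is not what we want for SNI and ENI.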
On the other hand, we still want to see what happens to the objects WHILE the test is running: are they collected by the Young Garbage Collector at all? Do we see a steadily increasing line, or does it decrease as well? As I said, we are talking about suspects, so we need more evidence to decide whether we are dealing with a leak or not.
So let’s add a loop and retrieve the TNI (Temporary Number of Instances) every couple of seconds.
Presuming our performance test triggers at least a couple of Young Garbage Collections, we can retrieve the heap occupancy map at regular intervals. Adding timestamps to the results then lets us see the lifecycle of each object during the performance test. This would look something like:
ID Instances Bytes Name TimeStamp
262: 1035 281520 MyObject1,15-00-57
457: 1035 91080 MyObject2,15-00-57
613: 475 45600 MyObject3,15-00-57
642: 414 39744 MyObject4,15-00-57
689: 267 32040 MyObject5,15-00-57
862: 177 18408 MyObject6,15-00-57
1434: 118 4720 MyObject7,15-00-57
283: 788 214336 MyObject1,15-01-30
493: 788 69344 MyObject2,15-01-30
662: 369 35424 MyObject3,15-01-30
699: 308 29568 MyObject4,15-01-30
733: 214 25680 MyObject5,15-01-30
955: 135 14040 MyObject6,15-01-30
1405: 118 4720 MyObject7,15-01-30
285: 726 197472 MyObject1,15-02-03
495: 726 63888 MyObject2,15-02-03
657: 345 33120 MyObject3,15-02-03
696: 284 27264 MyObject4,15-02-03
726: 202 24240 MyObject5,15-02-03
973: 118 12272 MyObject6,15-02-03
1365: 118 4720 MyObject7,15-02-03
318: 411 111792 MyObject1,15-02-36
556: 411 36168 MyObject2,15-02-36
716: 217 20832 MyObject3,15-02-36
786: 138 16560 MyObject5,15-02-36
818: 156 14976 MyObject4,15-02-36
1290: 120 4800 MyObject7,15-02-36
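To get from these rows to one line per object in a graph, the samples have to be grouped by class name and ordered by timestamp. Here is a minimal Python sketch that assumes exactly the row layout shown above (`ID: Instances Bytes Name,TimeStamp`). The name `build_series` is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Matches "<id>: <instances> <bytes> <name>,<HH-MM-SS>".
TNI_ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+([^,\s]+),(\S+)")


def build_series(text: str) -> Dict[str, List[Tuple[str, int]]]:
    """Group the samples by class name as (timestamp, instances) pairs."""
    series: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for line in text.splitlines():
        match = TNI_ROW.match(line)
        if match:
            instances, _total_bytes, name, stamp = match.groups()
            series[name].append((stamp, int(instances)))
    for points in series.values():
        points.sort()  # zero-padded HH-MM-SS stamps sort chronologically
    return dict(series)


# A few rows copied from the listing above.
rows = """\
262: 1035 281520 MyObject1,15-00-57
283: 788 214336 MyObject1,15-01-30
285: 726 197472 MyObject1,15-02-03
318: 411 111792 MyObject1,15-02-36
1434: 118 4720 MyObject7,15-00-57
1290: 120 4800 MyObject7,15-02-36
"""
series = build_series(rows)
```

The number in front of the colon is ignored on purpose, since it differs between samples of the same object.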
AMDL Reports – Object Lifecycle Reports
We can now expect two types of graphs:
Memory Leak – Increasing trend and no garbage collection
Here we see an increasing trend with no decreases over time, meaning the object is not being collected at all.
No Memory Leak – Stable trend with garbage collection
Here we see a stable trend, where the objects are being collected
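The difference between the two graphs can also be checked mechanically. The sketch below is one simple heuristic, not the rule used for the reports in this post: a series counts as leak-like when it grew overall and never dropped between two consecutive samples. The function names and sample series are made up for illustration.

```python
from typing import Sequence


def count_drops(counts: Sequence[int]) -> int:
    """Number of consecutive samples where the instance count went down."""
    return sum(1 for prev, cur in zip(counts, counts[1:]) if cur < prev)


def looks_like_leak(counts: Sequence[int]) -> bool:
    """True when the series grew overall and was never collected in between."""
    return (
        len(counts) >= 2
        and counts[-1] > counts[0]
        and count_drops(counts) == 0
    )


# Hypothetical TNI series for two classes.
growing = [100, 150, 150, 220]         # only ever goes up
sawtooth = [100, 300, 120, 310, 110]   # repeatedly collected

leak_suspect = looks_like_leak(growing)
stable = not looks_like_leak(sawtooth)
```

A check like this only narrows down the list of suspects. A short test may simply not have triggered a collection yet, so the graph is still worth a look.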
Assuming that at the end of the test we import all monitoring data into the database and then generate reports containing the three items (SNI, TNI, ENI), the full list of steps to perform JAMDL is now:
1. Full Garbage Collection -> Retrieve SNI for all live objects
2. Start and run the test
3. Collect TNI while the test is running and the session timeout has not occurred
4. Wait for the session timeout and stop TNI collection
5. Full Garbage Collection -> Retrieve ENI for all live objects
6. Import the results into the database
7. Compute the deviation between SNI and ENI
8. Automatically generate a performance report containing all memory leak suspects that resulted from step 7
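The measuring part of this list (everything up to and including the ENI retrieval) can be sketched as a small driver. This is only an outline under assumptions: `snapshot` and `run_test` are hypothetical callables supplied by the caller, `snapshot(True)` is expected to return a histogram taken after a full garbage collection, and `snapshot(False)` a histogram taken without forcing one.

```python
import threading
import time
from typing import Any, Callable, List, Tuple


def run_amdl_session(
    snapshot: Callable[[bool], Any],
    run_test: Callable[[], None],
    session_timeout_s: float,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Any, List[Any], Any]:
    """Collect SNI, the TNI samples and ENI around one test run."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")

    sni = snapshot(True)              # full GC, then SNI

    worker = threading.Thread(target=run_test)
    worker.start()                    # start and run the test

    tni: List[Any] = []
    while worker.is_alive():          # sample while the test is running
        tni.append(snapshot(False))
        sleep(interval_s)

    waited = 0.0
    while waited < session_timeout_s:  # keep sampling until sessions expire
        tni.append(snapshot(False))
        sleep(interval_s)
        waited += interval_s
    worker.join()

    eni = snapshot(True)              # full GC, then ENI
    return sni, tni, eni
```

Passing `sleep` as a parameter keeps the driver easy to try out with a fake clock. Importing the results into the database and generating the report would follow on the returned values.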
This is what an AMDL session would look like in VisualVM:
AMDL Visual VM Session
AMDL Main Report with Memory Leak Suspects
Integrating the results into the main report could look like this (for presentation purposes I have set the deviation threshold very low, to 10):
AMDL Performance Report
We can now drill into the two memory leak suspects and see whether there is indeed a memory leak:
Object Lifecycle – Drill down report – No memory leak
And since the post is about memory leaks, this is what one would look like:
Memory Leak – Increasing trend and no garbage collection
Using a relational database, you can decide for yourself how to implement the deviation:
- ENI – SNI: compute the difference between ENI and SNI and set a threshold. For example, if there are 100 more instances at the end of the test, report the object as a suspect.
- STDEV(ENI,SNI): compute the standard deviation of the two values and set a threshold.
It is up to you to decide which implementation suits you best.
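As an illustration of the first variant, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout, the column names and the sample counts are assumptions made for this example, not the schema behind the reports shown above. A class without an SNI row (one that was first loaded by the test) is treated as having started at 0 instances.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshot (phase TEXT, class_name TEXT, instances INTEGER)"
)
conn.executemany(
    "INSERT INTO snapshot VALUES (?, ?, ?)",
    [
        ("SNI", "MyObject1", 100), ("ENI", "MyObject1", 350),
        ("SNI", "MyObject7", 118), ("ENI", "MyObject7", 120),
        ("ENI", "MyObject9", 40),  # first loaded by the test: no SNI row
    ],
)

# ENI - SNI per class, keeping only the rows above the threshold.
SUSPECTS_SQL = """
SELECT e.class_name,
       e.instances - COALESCE(s.instances, 0) AS delta
FROM snapshot e
LEFT JOIN snapshot s
       ON s.class_name = e.class_name AND s.phase = 'SNI'
WHERE e.phase = 'ENI'
  AND e.instances - COALESCE(s.instances, 0) > ?
ORDER BY delta DESC
"""


def find_suspects(connection: sqlite3.Connection, threshold: int):
    """Return (class_name, delta) rows whose growth exceeds the threshold."""
    return connection.execute(SUSPECTS_SQL, (threshold,)).fetchall()


suspects = find_suspects(conn, 100)
```

The `LEFT JOIN` is what keeps classes that only appear at the end of the test in the result. Whether such classes should be reported or filtered out is a decision for your own report.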
One last word: theoretically you can use this even in a production environment, as long as you do not retrieve the memory map too often and do not force full garbage collections. In that case, of course, the monitoring timespan must be long enough to allow objects to be collected by the Old Generation Garbage Collection. Nevertheless, it is a point worth considering: it could save you some time and actively report possible memory leak suspects.
Cheers, have fun and enjoy. I will gladly help with further information regarding any of the 8 steps above.
Alex