
JAMDL – Java Automatic Memory Leak Detector using JMap, Jasper and MySQL

I have been working on this idea for a couple of years now, trying to give it shape but never really finding the time for the details. The basic concept is simple:

  1. Start monitoring of objects
  2. Run a test
  3. Collect and import monitoring metrics
  4. Use math and estimation to detect memory leaks

The memory leak

When does an object become a memory leak suspect? Quite simple. I will use two acronyms here:

  1. SNI – Start Number of Instances
  2. ENI – End Number of Instances

Taking the shortest path, one would flag an object whenever the following holds:

ENI-SNI > 0

That means: “if the number of instances at the end of the test is higher than at the beginning, it is a memory leak”. Well, not necessarily:

  • some objects may be created only by the test itself, when it loads specific classes, so they did not exist yet when the application server started
  • soft references: softly reachable objects can be collected, but it is up to the Garbage Collector to decide when to clear them, so they may still be in the heap when the test ends
  • session timeouts: some users end their session by logging out, but others (most of them) simply close the browser. The session timeout is then responsible for removing the session and its attached objects, and the timeout varies between implementations, so the decisive moment is not the Test End Timestamp but the Test End Timestamp + Timeout

Just saying “higher number of instances” is therefore not enough: we need to evaluate the delta between SNI and ENI, for example with the standard deviation. Since we measure only two values, if the difference is large enough we can assume that something is not going as planned and that we might have a memory leak.
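With only two measurements, the standard deviation reduces to a scaled difference. This is worth keeping in mind when choosing a threshold. Writing the mean of the two values as \bar{x}:

```latex
\bar{x} = \frac{\mathrm{SNI} + \mathrm{ENI}}{2}, \qquad
\mathrm{SNI} - \bar{x} = -\frac{\mathrm{ENI} - \mathrm{SNI}}{2}, \qquad
\mathrm{ENI} - \bar{x} = \frac{\mathrm{ENI} - \mathrm{SNI}}{2}

\sigma_{\text{pop}} = \sqrt{\frac{(\mathrm{SNI} - \bar{x})^2 + (\mathrm{ENI} - \bar{x})^2}{2}} = \frac{|\mathrm{ENI} - \mathrm{SNI}|}{2},
\qquad
s_{\text{sample}} = \sqrt{\frac{(\mathrm{SNI} - \bar{x})^2 + (\mathrm{ENI} - \bar{x})^2}{1}} = \frac{|\mathrm{ENI} - \mathrm{SNI}|}{\sqrt{2}}
```

In both cases the deviation is proportional to ENI − SNI, and the sign is lost. A class whose count shrank scores the same as one that grew by the same amount, so a suspect check still needs ENI > SNI alongside the deviation threshold.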

Monitoring the JVM Object Map

Jmap is a great tool (part of the JDK) that we can use to inspect the memory map. With jmap -histo, one can at any point in time (with some overhead, of course, though not a big one) retrieve a histogram of all classes with objects in the heap, listing their number of instances and the bytes they occupy. The results look something like this:

Java Object Map – Objects and number of instances in the heap

 num    #instances      #bytes  class name
  1:       1803751   166592520  [C
  2:        347130   103348904  [B
  3:        565934    74272832  <constMethodKlass>
  4:        565934    72452448  <methodKlass>
  5:        242821    62836312  [I
  6:         55222    60738280  <constantPoolKlass>
  7:         55221    40555296  <instanceKlassKlass>
  8:        484966    38797280  java.lang.reflect.Method
  9:        886626    35851896  [Ljava.lang.Object;
 10:        391682    32626104  [Ljava.util.HashMap$Entry;
 11:         46464    32591136  <constantPoolCacheKlass>
 12:       1351668    32440032  java.lang.String
 13:        748338    23946816  java.util.HashMap$Entry
 14:        426004    20448192  java.util.HashMap
 15:        501680    20067200  java.util.LinkedHashMap$Entry
 16:        820685    19696440  java.util.ArrayList
 17:        233944    16843968  java.lang.reflect.Field
 18:        360930    11549760  java.util.concurrent.ConcurrentHashMap$HashEntry

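The first column is the rank of the class in the histogram. jmap sorts the rows by total bytes in descending order, so this number can change from one snapshot to the next (compare the IDs of MyObject1 in the timestamped table further down). Use the class name, not the rank, to identify an entry.

The second column is the number of instances of the class currently in the heap.

The third column is the total size, in bytes, of all those instances (their shallow size, not counting the objects they reference).

The fourth column is the class name in JVM notation: [C is char[], [B is byte[], [I is int[], [Ljava.lang.Object; is Object[]. The <...Klass> entries are HotSpot-internal class metadata.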
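Reading these rows programmatically is the first building block. The sketch below makes a few assumptions. It expects the classic layout shown above (rank, instances, bytes, class name); newer JDKs may append a module name in parentheses, which it ignores. It also turns JVM array descriptors such as [C into readable names. HistoEntry is an illustrative name, not part of any tool.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One data row of `jmap -histo` output (illustrative type, not a JDK API). */
record HistoEntry(int rank, long instances, long bytes, String className) {

    // "  12:       1351668       32440032  java.lang.String" ; anything after the name is ignored
    private static final Pattern ROW =
            Pattern.compile("^\\s*(\\d+):\\s+(\\d+)\\s+(\\d+)\\s+(\\S+).*$");

    // Element codes for primitive arrays, as used by Class.getName().
    private static final Map<Character, String> PRIMITIVES = Map.of(
            'Z', "boolean", 'B', "byte", 'C', "char", 'S', "short",
            'I', "int", 'J', "long", 'F', "float", 'D', "double");

    /** Parses one line; returns empty for the header, separator and "Total" lines. */
    static Optional<HistoEntry> parse(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new HistoEntry(
                Integer.parseInt(m.group(1)),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                m.group(4)));
    }

    /** "[C" -> "char[]", "[Ljava.lang.Object;" -> "java.lang.Object[]"; other names unchanged. */
    String readableName() {
        int dims = 0;
        while (dims < className.length() && className.charAt(dims) == '[') {
            dims++;
        }
        if (dims == 0) {
            return className; // ordinary class, or HotSpot-internal entry such as <methodKlass>
        }
        String element = className.substring(dims);
        String base;
        if (element.length() > 2 && element.startsWith("L") && element.endsWith(";")) {
            base = element.substring(1, element.length() - 1);
        } else if (element.length() == 1 && PRIMITIVES.containsKey(element.charAt(0))) {
            base = PRIMITIVES.get(element.charAt(0));
        } else {
            return className; // unrecognised descriptor: keep it as jmap printed it
        }
        return base + "[]".repeat(dims);
    }
}
```

Parsing by regular expression rather than fixed column offsets tolerates the varying padding jmap uses as the numbers grow.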

As I mentioned before, in order to perform memory analysis we need to gather at least two metrics:

  • object occupancy before starting the test scenario – SNI
  • object occupancy after ending the test scenario – ENI

Those of you reading this post should by now know the two types of garbage collection (Young and Full) and how garbage collection works, so I will jump straight into the details of the problem.

One may now ask: what if some objects are already dead and just waiting to be collected by the next garbage collection? Would the results still be reliable?

We need to make sure that both SNI and ENI are measured after a FULL GARBAGE COLLECTION. That way we can be sure that no dead objects are waiting to be collected.
So, our scenario so far runs like this:

  1. Full Garbage Collection -> Retrieve SNI for all live objects
  2. Run the test
  3. Wait for the session timeout
  4. Full Garbage Collection -> Retrieve ENI for all live objects
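Steps 1 and 4 can be driven from Java by launching jmap against the application server's JVM. This is a minimal sketch, assuming jmap is on the PATH and the target process id is known. JmapSnapshot is an illustrative name. With -histo:live, HotSpot performs a full garbage collection before counting, which matches the requirement above that SNI and ENI are taken right after a full GC.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

final class JmapSnapshot {
    private JmapSnapshot() {}

    /** Builds the jmap command; "-histo:live" counts only live objects, forcing a full GC first. */
    static List<String> command(long pid, boolean live) {
        return List.of("jmap", live ? "-histo:live" : "-histo", Long.toString(pid));
    }

    /** Runs jmap against the target JVM and returns its raw histogram output, one entry per line. */
    static List<String> capture(long pid, boolean live) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(pid, live))
                .redirectErrorStream(true) // merge stderr so jmap's error messages are kept
                .start();
        List<String> lines;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            lines = r.lines().collect(Collectors.toList());
        }
        int status = p.waitFor();
        if (status != 0) {
            throw new IOException("jmap exited with status " + status + ": " + String.join("\n", lines));
        }
        return lines;
    }
}
```

For example, calling JmapSnapshot.capture(pid, true) once before the test and once more after the session timeout yields the raw SNI and ENI histograms.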

On the other hand, we still want to see what happens to the objects WHILE the test is running: are they collected by the Young Garbage Collector at all? Do we see a steadily increasing line, or does it decrease as well? As I said, we are talking about suspects, so we need more evidence before deciding whether we are dealing with a leak.

So let's add a loop and retrieve the TNI (temporary number of instances) every couple of seconds.

Assuming our performance test triggers at least a couple of Young Garbage Collections, we can retrieve the heap occupancy map at regular intervals. Adding timestamps to the results then lets us follow the lifecycle of each object during the performance test. The result looks something like this:

   ID  Instances   Bytes  Name       Timestamp
 262:       1035  281520  MyObject1  15-00-57
 457:       1035   91080  MyObject2  15-00-57
 613:        475   45600  MyObject3  15-00-57
 642:        414   39744  MyObject4  15-00-57
 689:        267   32040  MyObject5  15-00-57
 862:        177   18408  MyObject6  15-00-57
1434:        118    4720  MyObject7  15-00-57
 283:        788  214336  MyObject1  15-01-30
 493:        788   69344  MyObject2  15-01-30
 662:        369   35424  MyObject3  15-01-30
 699:        308   29568  MyObject4  15-01-30
 733:        214   25680  MyObject5  15-01-30
 955:        135   14040  MyObject6  15-01-30
1405:        118    4720  MyObject7  15-01-30
 285:        726  197472  MyObject1  15-02-03
 495:        726   63888  MyObject2  15-02-03
 657:        345   33120  MyObject3  15-02-03
 696:        284   27264  MyObject4  15-02-03
 726:        202   24240  MyObject5  15-02-03
 973:        118   12272  MyObject6  15-02-03
1365:        118    4720  MyObject7  15-02-03
 318:        411  111792  MyObject1  15-02-36
 556:        411   36168  MyObject2  15-02-36
 716:        217   20832  MyObject3  15-02-36
 786:        138   16560  MyObject5  15-02-36
 818:        156   14976  MyObject4  15-02-36
1290:        120    4800  MyObject7  15-02-36
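The TNI loop described above can be sketched with a ScheduledExecutorService. This is a minimal sketch under a few assumptions. The histogram source is passed in as a Supplier; a plain, non-live jmap -histo is presumably the better choice here, so that sampling does not force full GCs while the test runs. Each sample is tagged with a timestamp in the same HH-mm-ss format as the table above. TniSampler and TniRow are illustrative names.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** One TNI row: class name, instance count and the time of the sample. */
record TniRow(String className, long instances, String timestamp) {}

final class TniSampler {
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("HH-mm-ss");

    private final Supplier<Map<String, Long>> histogram; // class name -> instance count
    private final List<TniRow> rows = Collections.synchronizedList(new ArrayList<>());
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    TniSampler(Supplier<Map<String, Long>> histogram) {
        this.histogram = histogram;
    }

    static String stamp(LocalTime t) {
        return STAMP.format(t);
    }

    /** Takes one sample and tags every class in it with the same timestamp. */
    void sampleOnce(LocalTime now) {
        String ts = stamp(now);
        histogram.get().forEach((cls, n) -> rows.add(new TniRow(cls, n, ts)));
    }

    /** Starts sampling immediately, then every periodSeconds. */
    void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(() -> sampleOnce(LocalTime.now()), 0, periodSeconds, TimeUnit.SECONDS);
    }

    /** Stops sampling and returns a copy of all rows collected so far. */
    List<TniRow> stop() {
        scheduler.shutdown();
        try {
            scheduler.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (rows) {
            return new ArrayList<>(rows);
        }
    }
}
```

The rows returned by stop() map one-to-one onto the lines of the table and can be written to a file or database for import.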

AMDL Reports – Object Lifecycle Reports

We can now expect two types of charts:


Memory Leak – Increasing trend and no garbage collection

Here we see an increasing trend without any decreases over time, meaning the object is not being collected at all.

Object Lifecycle – No Memory Leak – Stable trend, increasing and decreasing line

Here we see a stable trend: the line rises and falls, meaning the objects are being collected.
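Telling these two shapes apart can also be automated. The sketch below groups TNI samples, already in time order, into one series per class. It flags a series as leak-like when the count never drops between samples and the least-squares slope is positive. The slope is only one simple way to quantify "increasing trend"; it is an assumption of this sketch, not part of JAMDL as described. TrendCheck is an illustrative name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TrendCheck {
    private TrendCheck() {}

    /** Groups (className, instances) samples, given in time order, into one series per class. */
    static Map<String, List<Long>> seriesByClass(List<Map.Entry<String, Long>> samples) {
        Map<String, List<Long>> series = new LinkedHashMap<>();
        for (Map.Entry<String, Long> s : samples) {
            series.computeIfAbsent(s.getKey(), k -> new ArrayList<>()).add(s.getValue());
        }
        return series;
    }

    /** Least-squares slope of the series against its sample index (0, 1, 2, ...). */
    static double slope(List<Long> ys) {
        int n = ys.size();
        if (n < 2) {
            return 0.0;
        }
        double meanX = (n - 1) / 2.0;
        double meanY = ys.stream().mapToLong(Long::longValue).average().orElse(0);
        double num = 0;
        double den = 0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (ys.get(i) - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }

    /** True when the count never drops between samples and grows overall. */
    static boolean increasingWithoutCollection(List<Long> ys) {
        for (int i = 1; i < ys.size(); i++) {
            if (ys.get(i) < ys.get(i - 1)) {
                return false; // a drop means some instances were collected
            }
        }
        return slope(ys) > 0;
    }
}
```

With the MyObject1 values from the table (1035, 788, 726, 411), the series drops, so it is not flagged.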

Assuming that at the end of the test we import all monitoring data into the database and then generate reports containing the three measurements (SNI, TNI, ENI), the full list of steps to perform JAMDL is now:

  1. Full Garbage Collection -> Retrieve SNI for all live objects
  2. Start and run the test
  3. Perform TNI collection while the test is running and the session timeout has not occurred
  4. Wait for the session timeout and stop TNI collection
  5. Full Garbage Collection -> Retrieve ENI for all live objects
  6. Import the results into the database
  7. Compute the deviation between SNI and ENI
  8. Automatically generate a performance report containing all memory leak suspects identified in step 7
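Steps 1, 2, 4, 5 and 7 can be tied together in a few lines. This is a sketch under stated assumptions. The live histogram is any Supplier that returns class name to live instance count, for example one built on jmap -histo:live. The suspect rule is passed in as a predicate on (SNI, ENI). TNI collection, the database import and the report (steps 3, 6 and 8) are left out. JamdlRun is a hypothetical name.

```java
import java.time.Duration;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiPredicate;
import java.util.function.Supplier;

final class JamdlRun {
    private JamdlRun() {}

    /** Returns class name -> (ENI - SNI) for every class the predicate flags as a suspect. */
    static Map<String, Long> run(Supplier<Map<String, Long>> liveHistogram,
                                 Runnable test,
                                 Duration sessionTimeout,
                                 BiPredicate<Long, Long> isSuspect) {
        Map<String, Long> sni = liveHistogram.get();       // 1. full GC, then SNI
        test.run();                                        // 2. start and run the test
        try {
            Thread.sleep(sessionTimeout.toMillis());       // 4. wait for the session timeout
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the session timeout", e);
        }
        Map<String, Long> eni = liveHistogram.get();       // 5. full GC, then ENI

        Map<String, Long> suspects = new TreeMap<>();      // 7. compare ENI with SNI
        for (Map.Entry<String, Long> e : eni.entrySet()) {
            // A class first loaded during the test has no SNI entry; count it as 0.
            long start = sni.getOrDefault(e.getKey(), 0L);
            if (isSuspect.test(start, e.getValue())) {
                suspects.put(e.getKey(), e.getValue() - start);
            }
        }
        return suspects;
    }
}
```

Keeping the suspect rule as a parameter leaves the choice between a plain difference and a deviation threshold open.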

This is what an AMDL session looks like in VisualVM:

AMDL Main Report with Memory Leak Suspects

Integrating the results into the main report could look like this (for presentation purposes I set the deviation threshold very low, to 10).

We can now drill into the two memory leak suspects and check whether either of them really is a memory leak:


Object Lifecycle – Drill down report – No memory leak

And since this post is about memory leaks, here is an actual one:


Memory Leak – Increasing trend and no garbage collection

Since the data lives in a relational database, you can decide for yourself how to implement the deviation check:

  1. ENI – SNI: compute the difference between ENI and SNI and set a threshold. For example, if at the end of the test there are 100 more instances than at the start, report the class as a suspect.
  2. STDEV(ENI,SNI): compute the standard deviation of the two values and set a threshold.

It is up to you to decide on the implementation that suits you best.
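Both options fit into a couple of small functions. This is a sketch; LeakPolicies is an illustrative name. The standard deviation here is the population one over the two values. That is what MySQL's STDDEV_POP (and its STDDEV/STD synonyms) would return over two rows holding SNI and ENI.

```java
final class LeakPolicies {
    private LeakPolicies() {}

    /** Option 1: report the class when ENI exceeds SNI by more than threshold instances. */
    static boolean differenceExceeds(long sni, long eni, long threshold) {
        return eni - sni > threshold;
    }

    /** Population standard deviation of the two values {SNI, ENI}; equals |ENI - SNI| / 2. */
    static double stdev(long sni, long eni) {
        double mean = (sni + eni) / 2.0;
        double variance = ((sni - mean) * (sni - mean) + (eni - mean) * (eni - mean)) / 2.0;
        return Math.sqrt(variance);
    }

    /** Option 2: deviation above threshold; the ENI > SNI check restores the direction stdev drops. */
    static boolean deviationExceeds(long sni, long eni, double threshold) {
        return eni > sni && stdev(sni, eni) > threshold;
    }
}
```

Because the deviation of two values is half their distance, a non-negative deviation threshold t flags the same classes as a difference threshold of 2t. Without the ENI > SNI guard, the deviation alone would also flag classes whose count dropped.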


One last word: in theory you can use this even in a production environment, as long as you do not retrieve the memory map too often and do not force full garbage collections. In that case, of course, the monitoring timespan must be long enough for objects to be collected by the Old Generation Garbage Collection. Nevertheless, it is a point worth thinking about: it could save you time and proactively report possible memory leak suspects.

Cheers, have fun and enjoy. I will gladly help with further information on any of the 8 steps above.

Alex
