Monday, January 14, 2013

Memory Management


Lets start from where we left on the earlier post: JVMView
We know each JVM instance is born with a Method Area and a Heap Memory



Heap Memory:
The place where object resides, the Default maximum heap size is 64 Mb, this may be configured with the following VM options as well:

·         -Xmx<size> - to set the maximum Java heap size
·         -Xms<size> - to set the initial Java heap size

Method Area or Non-Heap Area:
This is place where binary code, runtime constant pool, static fields/classes resides. GC will not visit this memory. By default the maximum size of this area is 64 Mb, this may be configured as:

          -XX:MaxPermSize = to set max non heap size










Memory Management:
This is process of managing memory by allocating portion of memory during object creation and reclaiming memory when object is not reachable/referred.

In some programming, reclaiming memory is programmer's responsibility, which is quite complex; developer could spent most of time in debugging. In Java these overhead is taken care automatically by automatic memory management [Garbage Collector]

Garbage Collection Algorithm:
Garbage collection was invented by John McCarthy

Garbage Detection:
As a first step, GC has to detect/identify garbage [objects which are not referred directly/indirectly], for that GC first defines set of roots, from which it will try to reach out to other objects, objects which are not reachable are considered as garbage, because they are no longer required for program execution.

Tracing Approach [All modern JVM prefer this approach]:
Tracing start from root objects and traverse through the whole graph to find out live objects. Objects that are traversed during the trace are marked. After the trace is complete, unmarked objects are known to be unreachable and can be garbage collected. This basic tracing algorithm is called "mark and sweep." 

Yellow are the objects, which are visited by GC during tracing and rest of the objects [dark blue] are considered as non-referred objects and can be garbage collected













So algorithm is quite simple right? Wait!

We all know a real time application is going to have more no. of objects. Expecting GC to traverse through each and every object is going to be time consuming and could degrade the system. Assume a application has some cache objects - which means they are going to be live for long time [if not till the end of the program, at least for some long time] – so GC performance can be improved if it can avoid visiting these kind of long lived objects regularly.

The two important observations about objects:
  1. Most objects have very short lives [ex: local objects].
  2. Some objects have long lifetimes [ex: Cache objects], and it could refer to any short-lived object as well.
Having these two points as base assumption, GC group’s objects by age, and GC collects younger [short lived] object more often than older [long lived] ones.

Generational Collectors:
In this approach, the heap is divided into two sub-heaps; each serves one generation of objects and garbage will be collected from each heaps separately. 

When an object is created it allocates memory in the younger generation, if the object survives few younger generation garbage collections then the object will get promoted to old generation.

The younger generation is typically small and collected frequently. As most objects are short-lived, only a small percentage of young objects are likely to get promoted to older generation. On the other hand the older generation is typically larger and collected less frequently as it is expected to have long lived objects. 


In order for an object to be retained in younger generation for few cycles, the younger generation is further split in to three areas. 


The Eden: The place where most new objects are allocated, this place is empty after a garbage collection cycle

Two-survivor space: These hold objects, which has survived one garbage collection and have been given another chance to become unreachable before they promoted to old generation. Like shown in the above figure only one survivor space will have object while the other is most of the time empty.


Demonstration of a Minor [Young generation] GC cycle:

  • In general all objects will be initially created in Younger generation [Eden space]
  • When GC runs in young generation heap[both in eden and from], the object found for garbage are marked as X.  
  • Live objects found in the Eden are copied to unused survivor space.
  • Live objects in the survivor space [From], which needs to be given another chance, are copied to unused survivor space
  • Live objects in the survivor space[From], which are old enough, are promoted to old generation


  • Towards the end of the minor GC, the two survivor space swaps space [From <--> To]
  • Eden will be empty, one survivor space[From] will have objects and other [To] will be empty

The cycle keeps continuing... At the time of next run: the Eden will be filled by new objects, From will have objects which has survived previous run, and To will be empty as usual. The GC cycle repeats and non-referenced  objects are cleaned up.

With this I assume we have some good understanding how GC deals with short-lived objects.
  
Below are the commonly known garbage collectors:

1. The Serial GC:
For young generation, this GC work in the same way mentioned above
For old generation, it mange by “sliding compacting mark-sweep”

Sliding Compacting Mark-Sweep
After identifying the objects, which are going to live in old generation, it slides them towards the beginning of the heap so that free space will be available as contiguous chunk towards the end. Hence memory allocation for any new object will be easier.



Both Minor and Full GC take place in sequence in a Stop the World fashion; the application will be stopped when GC is in progress.

2. The Parallel GC:
This operates similar to Serial GC [Stop the World Fashion], however both the minor and full garbage collections take place in parallel.

3. The Mostly Concurrent GC:
Also known as Concurrent Mark-Sweep [CMS], manages the young generation in the same way [stop the world fashion], where as the old generation [large part of the heap] is managed concurrently. Unlike other two GC, this does not do the fragmentation, free space are not contiguous as a result allocation into old generation is expensive.

Application, which requires huge heap, tends to opt to this GC. Since Heap is huge the GC is going to take more time, and this does not stop the application while GC is in progress

4. The Garbage-First GC:
This is parallel, concurrent, and compacting low-pause GC.
This does not have physically separate spaces for young and old generation but it is a set of regions which allows this to resize young generation in a flexible way

Detailed description of these GC is beyond the scope of this article.

Notes:

Out Of Memory:When a request is made to create new object, and if heap does not have sufficient space after GC cycle then application crashes with OOM error.

The first thing comes to our mind is to increase the heap size, but before doing that we need to check whether the app really requires huge heap? As the problem might be because of Memory leak, if that is the case then the increased heap might not be sufficient after some period of time. Using Eclipse Memory Analyzer Tool [MAT] we can identify memory leak, there are many such tools available in market. 

Even if you decide to increase heap size, keep in the mind that the huge the heap the more the workload on GC as it has to traverse through the entire heap 

Avoid Object Pooling: Pooled objects are long-lived objects and it will get into Old generation

Sloppy sizing of Array: If Array list is initially sized too small, then its backing array might subsequently needs to be resized several times causing unnecessary allocation

Command Line Tools: -XX:+PrintGCDetails  is command line argument to print the GC information during runtime. This will print Heap [Eden, From, To], Non-Heap memory usage.



References:
http://docs.oracle.com/javase/specs/jvms/se7/html/index.html
http://www.artima.com/insidejvm/ed2/gcP.html
Java Performance by Charlie Hunt & Binu John
http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)
http://www.yourkit.com/docs/kb/sizes.jsp