Lets start from where we left on the earlier post: JVMView
We
know each JVM instance is born with a Method Area and a Heap MemoryHeap Memory:
The place where object resides, the Default maximum heap size is 64 Mb, this may be
configured with the following VM options as well:
· -Xmx<size>
- to set the maximum Java heap size
· -Xms<size>
- to set the initial Java heap size
Method Area or Non-Heap Area:
This is place where binary code, runtime constant pool,
static fields/classes resides. GC will not visit this memory. By default the maximum size of this area is 64 Mb, this
may be configured as:
-XX:MaxPermSize
= to set max non heap size
Memory Management:
This is process of managing memory by allocating portion of memory during object creation and reclaiming memory when object is not reachable/referred.
In some programming, reclaiming memory is programmer's responsibility, which is quite complex; developer could spent most of time in debugging. In Java these overhead is taken care automatically by automatic memory management [Garbage Collector]
Garbage Collection Algorithm:
Garbage Detection:
As a first step, GC has to detect/identify garbage [objects which are not referred directly/indirectly], for that GC first defines set of roots, from which it will try to reach out to other objects, objects which are not reachable are considered as garbage, because they are no longer required for program execution.
Tracing Approach [All modern JVM prefer this approach]:
Tracing start from root objects and traverse through
the whole graph to find out live objects. Objects that are traversed during the
trace are marked. After the trace is complete, unmarked objects are known to be
unreachable and can be garbage collected. This basic tracing algorithm is
called "mark and sweep."
Yellow are the objects, which are visited by GC during tracing and rest of the objects [dark blue] are considered as non-referred objects and can be garbage collected |
So algorithm is quite simple right? Wait!
We all know a real time application is going to have more
no. of objects. Expecting GC to traverse through each and every object is going
to be time consuming and could degrade the system. Assume a application has some cache objects - which means they
are going to be live for long time [if not till the end of the program, at
least for some long time] – so GC performance can be improved if it can avoid visiting
these kind of long lived objects regularly.
The two important observations about objects:
- Most
objects have very short lives [ex: local objects].
- Some objects have long lifetimes [ex: Cache objects], and it could refer to any short-lived object as well.
Having these two points as base assumption, GC group’s
objects by age, and GC collects younger [short lived] object more often than
older [long lived] ones.
Generational Collectors:
In this approach, the heap is divided into two sub-heaps;
each serves one generation of objects and garbage will be collected from each heaps separately.
When an object is created it allocates memory in the younger generation, if the object survives few younger generation garbage collections then the object will get promoted to old generation.
When an object is created it allocates memory in the younger generation, if the object survives few younger generation garbage collections then the object will get promoted to old generation.
The younger generation is typically small and collected
frequently. As most objects are short-lived, only a small percentage of young
objects are likely to get promoted to older generation. On the other hand the
older generation is typically larger and collected less frequently as it is
expected to have long lived objects.
In order for an object to be retained in younger
generation for few cycles, the younger generation is further split in to three areas.
The Eden: The place where most new objects are allocated,
this place is empty after a garbage collection cycle
Two-survivor space: These hold objects, which has survived
one garbage collection and have been given another chance to become unreachable
before they promoted to old generation. Like shown in the above figure only one
survivor space will have object while the other is most of the time empty.
Demonstration of a Minor [Young generation] GC cycle:
- In general all objects will be initially created in Younger generation [Eden space]
- When GC runs in young generation heap[both in eden and from], the object found for garbage are marked as X.
- Live
objects found in the Eden are copied to unused survivor space.
- Live
objects in the survivor space [From], which needs to be given another
chance, are copied to unused survivor space
- Live objects in the survivor space[From], which are old enough, are promoted to old generation
- Towards the end of the minor GC, the two survivor space swaps space [From <--> To]
- Eden will be empty, one survivor space[From] will have objects and other [To] will be empty
The cycle keeps continuing... At the time of next run: the Eden will be filled by new objects, From will have objects which has survived previous run, and To will be empty as usual. The GC cycle repeats and non-referenced objects are cleaned up.
With this I assume we have some good understanding how GC deals with short-lived objects.
With this I assume we have some good understanding how GC deals with short-lived objects.
Below are the commonly known garbage collectors:
1. The Serial GC:
For young generation, this GC work in the same way mentioned
above
For old generation, it mange by “sliding compacting
mark-sweep”
Sliding Compacting Mark-Sweep
After identifying the objects, which are going to live in
old generation, it slides them towards the beginning of the heap so that free
space will be available as contiguous chunk towards the end. Hence memory allocation for any new object will be easier.
Both Minor and Full GC take place in sequence in a Stop
the World fashion; the application will be stopped when GC is in progress.
2. The Parallel GC:
This operates similar to Serial GC [Stop the World
Fashion], however both the minor and full garbage collections take place in
parallel.
3. The Mostly Concurrent GC:
Also known as Concurrent Mark-Sweep [CMS], manages the young
generation in the same way [stop the world fashion], where as the old
generation [large part of the heap] is managed concurrently. Unlike other two GC, this does not do the fragmentation, free
space are not contiguous as a result allocation into old generation is
expensive.
Application, which requires huge heap, tends to opt to
this GC. Since Heap is huge the GC is going to take more time, and this does
not stop the application while GC is in progress
4. The Garbage-First GC:
This is parallel, concurrent, and compacting low-pause GC.
This does not have physically separate spaces for young
and old generation but it is a set of regions which allows this to resize young
generation in a flexible way
Detailed description of these GC is beyond the scope of
this article.
Notes:
Out Of Memory:When a request is made to create new object, and if heap
does not have sufficient space after GC cycle then application crashes with OOM
error.
The first thing comes to our mind is to increase the heap
size, but before doing that we need to check whether the app really requires
huge heap? As the problem might be because of Memory leak, if that is the case
then the increased heap might not be sufficient after some period of time. Using Eclipse Memory Analyzer Tool [MAT] we can identify memory leak, there are many such tools available in market.
Even if you decide to increase heap size, keep in the mind
that the huge the heap the more the workload on GC as it has to traverse
through the entire heap
Avoid Object Pooling: Pooled objects are long-lived
objects and it will get into Old generation
Sloppy sizing of Array: If Array list is initially sized
too small, then its backing array might subsequently needs to be resized several
times causing unnecessary allocation
Command Line Tools: -XX:+PrintGCDetails is command line argument to print the GC information during runtime. This will print Heap [Eden, From, To], Non-Heap memory usage.
References:
http://docs.oracle.com/javase/specs/jvms/se7/html/index.html
http://www.artima.com/insidejvm/ed2/gcP.html
Java Performance by Charlie Hunt & Binu John
http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)
http://www.yourkit.com/docs/kb/sizes.jsp