Java Notebook

Friday, May 8, 2015

Focus SOAP

SOAP, payload, RPC, WS, JAXB, SOAP 1.1, 1.2: what are all these? How are they used in web services? Do you have these questions in your mind as well?

I recently read a wonderful book titled "Java Web Services: Up and Running" by Martin Kalin, which explains the internal working of SOAP in detail. I thought of sharing what I learned in my own words.

What is SOAP?

SOAP - An XML

In any client/server architecture, data needs to be passed in some form. Here that form is XML, and a SOAP message is nothing but an XML document.

Components of SOAP Message
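A SOAP message is an Envelope element that holds an optional Header and a mandatory Body, and the Body carries the payload. The sketch below is only an illustration of that structure: it parses a hand-written SOAP 1.1 message with the JDK's DOM parser and picks the payload element out of the Body. The "add" operation, its arguments and the http://example.org/calc namespace are made-up values, not taken from the service in this post.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class SoapEnvelopeDemo {
    // Namespace of the SOAP 1.1 envelope.
    static final String SOAP11_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // Hypothetical message: operation name, arguments and payload namespace are assumptions.
    static final String MESSAGE =
        "<soapenv:Envelope xmlns:soapenv=\"" + SOAP11_NS + "\">"
      + "<soapenv:Header/>"
      + "<soapenv:Body>"
      + "<ns:add xmlns:ns=\"http://example.org/calc\"><arg0>50</arg0><arg1>25</arg1></ns:add>"
      + "</soapenv:Body>"
      + "</soapenv:Envelope>";

    // Returns the local name of the first element inside the SOAP Body (the payload).
    static String payloadName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element body = (Element) doc.getElementsByTagNameNS(SOAP11_NS, "Body").item(0);
            for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    return n.getLocalName();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse SOAP message", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Payload element: " + payloadName(MESSAGE));
    }
}
```

In practice JAX-WS does this parsing for us; the sketch is only meant to make the Envelope/Header/Body layering visible.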

SOAP - An Infrastructure [Transport Neutral]

SOAP can be transported over SMTP, MQ, etc. However, HTTP remains the most popular SOAP transport protocol.

SOAP - A Contract [WSDL]

SOAP provides a contract/agreement in the form of a WSDL [Web Services Description Language], where the service, operations, transport, location, etc. are described.

To implement a working SOAP model, you might assume that we have to be well versed in networking and XML [service calls, XML construction/parsing, etc.], but that is not the case, as there are utilities [available in almost all languages] which make things simpler.

Java Support for Web Services - [JAX-WS]:

Starting with Java SE 6 [and until its removal from the JDK in Java 11], Java comes bundled with JAX-WS, which provides utilities to generate the stub, request, response & dependent classes from the WSDL or the service. The predecessor of JAX-WS was called JAX-RPC, which was built around the RPC style.

Two styles of SOAP Web services:

RPC [Remote Procedure Call] - Typically used with simple types like String, Integer, etc.

Document - Can support complex types. The WSDL types section defines the schema of the complex types. By default the style is Document.

RPC Style

Server Side Implementation:

1. Service

Note: The style is explicitly declared as RPC

2. Publish the Service

In general, services are deployed to a Java EE server or a servlet container like Tomcat, but to keep things simple the above service will be published in standalone mode [using Endpoint].

3. WSDL

Upon executing the publisher [java EndpointPublisher], the service starts and the WSDL can be viewed at the URL http://localhost:9090/calc?wsdl

Takeaway sections from WSDL

types - Defines all complex types involved in the service. Here it will be empty, as this RPC service uses only simple types.

portType - Defines the service signature, equivalent to a Java interface, i.e., method [operation] name, arguments [input] & return [output].

binding - Can be considered the implementation of the above interface [portType]. This also declares the transport and the SOAP style. Here the transport is HTTP and the style is RPC.

service - Holds the URL where the service is available.

Client side Implementation:

1. Generate client classes from WSDL

The wsimport utility generates the required client classes from the WSDL. Earlier this utility was called WSDL-to-Java; it is now named wsimport.

wsimport -p rpcclient -keep http://localhost:9090/calc?wsdl

where:

-p - package [files will be generated under this package]
-keep - retain the source code of the generated client classes

This generates the required stubs under the rpcclient folder.

2. Client

Here CalcServiceService & CalcService are the classes generated by wsimport. The call service.getCalcServicePort() returns the port [i.e., the interface] on which we can invoke the required operation.

Upon executing the client [java CalcClient]:

Result: output 75.0

Here we deal only with Java objects; however, behind the scenes it is all SOAP XML. JAX-WS takes care of binding Java & XML using JAXB [Java Architecture for XML Binding].

Below is the SOAP request/response of the above service/client, captured using tcpmon. The SOAP web service can also be invoked directly with a SOAP request from any SOAP-capable tool like SoapUI, etc.
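To make the "it is all SOAP XML" point concrete, here is a rough sketch of calling the service without any generated stubs: it builds an RPC-style SOAP 1.1 request by hand and posts it over HTTP with the JDK's HttpURLConnection. The operation name (add), the argument names (arg0, arg1) and the http://example.org/calc namespace are assumptions for illustration; the real values have to be taken from the WSDL of the published service.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class RawSoapClient {
    // Builds a SOAP 1.1 request; namespace, operation and argument names are assumptions.
    static String buildAddRequest(String namespace, double a, double b) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
             + "<S:Envelope xmlns:S=\"http://schemas.xmlsoap.org/soap/envelope/\">"
             + "<S:Body>"
             + "<ns2:add xmlns:ns2=\"" + namespace + "\">"
             + "<arg0>" + a + "</arg0><arg1>" + b + "</arg1>"
             + "</ns2:add>"
             + "</S:Body>"
             + "</S:Envelope>";
    }

    // Posts the envelope to the endpoint and returns the raw response XML.
    static String post(String endpoint, String envelope) throws Exception {
        HttpURLConnection con = (HttpURLConnection) new URL(endpoint).openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        con.setRequestProperty("SOAPAction", "\"\"");
        try (OutputStream out = con.getOutputStream()) {
            out.write(envelope.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = con.getInputStream();
             Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) throws Exception {
        String request = buildAddRequest("http://example.org/calc", 50, 25);
        System.out.println(request);
        // Needs the service to be running, and the namespace to match its WSDL:
        // System.out.println(post("http://localhost:9090/calc", request));
    }
}
```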

Document Style

Server Side Implementation:

1. Service - Remove Style from Service [By default the style is document]

2. The wsgen utility generates the Java types required by the publisher to generate the WSDL [this step can be skipped in IDEs like Eclipse]

wsgen -cp . learning.websrvc.soap.service.doc.calc.CalcService

3. Publish the DOC service [same as RPC]

4. WSDL View

Upon executing the publisher [java EndpointPublisher], the service starts and the WSDL can be viewed at the URL http://localhost:9090/calc?wsdl

One of the key differences between the RPC & DOC WSDL is the types section.

Here it defines all the complex types involved. The complex types can be embedded directly in the WSDL, or the schema can be imported into the types section.

Client Side Implementation: [The same as RPC]

1. Generate client classes from wsdl

wsimport -p docclient -keep http://localhost:9090/calc?wsdl

2. Client

Document style is not limited to simple types; it can support complex types as well. The WSDL types section holds all the complex types involved in the service.

Here the Document client looks similar to the RPC style, i.e., a parameterized invocation. This type of client is called a Wrapped client, where the request/response is wrapped to look like a parameterized call. There is another type called UnWrapped/BARE client, where the request/response is exposed as is, as a single object. These two types differ only on the client side; neither the SOAP request/response nor the server side changes. Wrapped is the default and is commonly used.

UnWrapped/Bare Client

1. Explicitly set the wrapper style to false [enableWrapperStyle] in a file named "custombinding.xml", and pass this file to wsimport

wsimport -p rpcclient -keep http://localhost:9090/calc?wsdl -b custombinding.xml

2. Client

Here you can see that the request & response are exposed just as they appear in SOAP, as the single objects Add & AddResponse respectively.

Handlers:

Just like filters in servlets, we have handlers in SOAP, which can intercept the request and response on both the client & the service side. Handlers are classified into two types:

1. SOAP Handler - Has access to the SOAP Envelope [Header + Body]

2. Logical Handler - Has access only to the payload [Body]

A handler chain can be configured in the order in which the handlers need to be invoked on every in/out message. I have seen use cases where authentication and logging are done at the handler level. In case of any failure the handlers can throw an Exception [An exception class with getFaultMessage() ]

There is more to it, but I will wrap up here. I believe this will be a useful piece of info for you as well. Please don't forget to leave your feedback, suggestions and improvement comments.

Thank you!

Monday, January 14, 2013

Memory Management

Let's start from where we left off in the earlier post: JVMView

We know each JVM instance is born with a Method Area and a Heap Memory.

Heap Memory:

This is the place where objects reside. The default maximum heap size was 64 MB on older JVMs [newer HotSpot versions derive it from the available physical memory]; it can be configured with the following VM options:

· -Xmx<size> - to set the maximum Java heap size

· -Xms<size> - to set the initial Java heap size

Method Area or Non-Heap Area:

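This is the place where the binary code, the runtime constant pool and static fields/classes reside. The GC visits this memory far less often than the heap [the JVM specification even allows an implementation not to collect it at all]. By default the maximum size of this area is 64 MB; this may be configured as:

-XX:MaxPermSize=<size> - to set the maximum non-heap [permanent generation] size; Java 8 replaced the permanent generation with Metaspace, which is sized with -XX:MaxMetaspaceSize=<size>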
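To see what these settings amount to on a running JVM, the heap and non-heap figures can be read through the standard java.lang.management API. This is a small sketch; the class and method names (MemoryAreas, describe) are mine, and the numbers printed depend entirely on the JVM and the options it was started with.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

class MemoryAreas {
    static long toMb(long bytes) {
        return bytes / (1024 * 1024);
    }

    // Formats one memory area; getMax() returns -1 when no maximum is defined.
    static String describe(String label, MemoryUsage usage) {
        String max = usage.getMax() < 0 ? "undefined" : toMb(usage.getMax()) + " MB";
        return label + ": used=" + toMb(usage.getUsed()) + " MB, committed="
             + toMb(usage.getCommitted()) + " MB, max=" + max;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println(describe("Heap", memory.getHeapMemoryUsage()));
        System.out.println(describe("Non-heap", memory.getNonHeapMemoryUsage()));
        // Roughly the value configured with -Xmx
        System.out.println("Runtime.maxMemory = " + toMb(Runtime.getRuntime().maxMemory()) + " MB");
    }
}
```

Running it once with the defaults and once with, say, -Xmx256m shows the effect of the option on the heap maximum.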

Memory Management:

This is the process of managing memory by allocating a portion of memory during object creation and reclaiming that memory when the object is no longer reachable/referred.

In some programming languages, reclaiming memory is the programmer's responsibility, which is quite complex; a developer could spend most of the time debugging. In Java this overhead is taken care of by automatic memory management [the Garbage Collector].

Garbage Collection Algorithm:

Garbage collection was invented by John McCarthy

Garbage Detection:

As a first step, the GC has to detect garbage [objects which are not referred to directly or indirectly]. For that, the GC first defines a set of roots, from which it tries to reach other objects. Objects which are not reachable are considered garbage, because they are no longer required for program execution.

Tracing Approach [All modern JVM prefer this approach]:

Tracing starts from the root objects and traverses the whole object graph to find the live objects. Objects that are visited during the trace are marked. After the trace is complete, unmarked objects are known to be unreachable and can be garbage collected. This basic tracing algorithm is called "mark and sweep."

In the figure, the yellow objects are the ones visited by the GC during tracing; the rest of the objects [dark blue] are considered non-referred and can be garbage collected.
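The mark and sweep idea can be sketched in a few lines on a toy object graph. This is not how a JVM implements it; Obj, mark and sweep are hypothetical names for illustration. Objects D and E refer to each other but are not reachable from the root, so tracing never marks them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class MarkSweepDemo {
    // A toy heap object: a name, outgoing references and a mark bit.
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<Obj>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    // Mark phase: everything reachable from the roots gets marked.
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<Obj>(roots);
        while (!pending.isEmpty()) {
            Obj current = pending.pop();
            if (!current.marked) {
                current.marked = true;
                pending.addAll(current.refs);
            }
        }
    }

    // Sweep phase: unmarked objects are removed; marks are cleared for the next cycle.
    static List<String> sweep(List<Obj> heap) {
        List<String> collected = new ArrayList<String>();
        for (Iterator<Obj> it = heap.iterator(); it.hasNext();) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;
            } else {
                collected.add(o.name);
                it.remove();
            }
        }
        return collected;
    }

    // Builds a small graph and runs one collection with A as the only root.
    static List<String> runDemo() {
        Obj a = new Obj("A"), b = new Obj("B"), c = new Obj("C"), d = new Obj("D"), e = new Obj("E");
        a.refs.add(b);
        b.refs.add(c);
        d.refs.add(e);   // D and E form a cycle that no root can reach
        e.refs.add(d);
        List<Obj> heap = new ArrayList<Obj>(Arrays.asList(a, b, c, d, e));
        mark(Arrays.asList(a));
        return sweep(heap);
    }

    public static void main(String[] args) {
        System.out.println("Collected: " + runDemo());   // prints "Collected: [D, E]"
    }
}
```

Note that the D/E cycle is collected even though each object still has a reference pointing at it; reachability from the roots is what counts.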

So algorithm is quite simple right? Wait!

We all know a real-world application is going to have a large number of objects. Expecting the GC to traverse each and every object is time consuming and could degrade the system. Assume an application has some cache objects, which are going to be live for a long time [if not till the end of the program, at least for quite a while]; GC performance can be improved if it can avoid visiting these long-lived objects regularly.

The two important observations about objects:

Most objects have very short lives [ex: local objects].
Some objects have long lifetimes [ex: cache objects], and they may refer to short-lived objects as well.

With these two points as the base assumption, the GC groups objects by age and collects younger [short-lived] objects more often than older [long-lived] ones.

Generational Collectors:

In this approach, the heap is divided into two sub-heaps; each serves one generation of objects, and garbage is collected from each sub-heap separately.

When an object is created, its memory is allocated in the younger generation; if the object survives a few younger generation garbage collections, it gets promoted to the old generation.

The younger generation is typically small and collected frequently. As most objects are short-lived, only a small percentage of young objects are likely to get promoted to older generation. On the other hand the older generation is typically larger and collected less frequently as it is expected to have long lived objects.

For an object to be retained in the younger generation for a few cycles, the younger generation is further split into three areas.

The Eden: The place where most new objects are allocated; it is empty after a garbage collection cycle.

Two survivor spaces: These hold objects which have survived at least one garbage collection and have been given another chance to become unreachable before they are promoted to the old generation. As shown in the figure above, only one survivor space holds objects at a time, while the other is empty most of the time.

Demonstration of a Minor [Young generation] GC cycle:

In general, all objects are initially created in the younger generation [Eden space].
When the GC runs in the young generation heap [both in Eden and From], the objects found to be garbage are marked with an X.
Live objects found in Eden are copied to the unused survivor space [To].
Live objects in the survivor space [From] which need to be given another chance are copied to the unused survivor space [To].
Live objects in the survivor space [From] which are old enough are promoted to the old generation.

Towards the end of the minor GC, the two survivor spaces swap roles [From <--> To].
Eden will be empty, one survivor space [From] will hold objects and the other [To] will be empty.

The cycle keeps continuing... At the time of the next run, Eden will have been filled by new objects, From will hold the objects which survived the previous run, and To will be empty as usual. The GC cycle repeats and non-referenced objects are cleaned up.
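The minor GC cycle described above can be mimicked with four lists standing in for Eden, From, To and the old generation. This is a toy model under stated assumptions: a boolean live flag replaces real reachability analysis, and the tenuring threshold of 2 is an arbitrary value chosen for the demo, not a JVM default.

```java
import java.util.ArrayList;
import java.util.List;

class YoungGenDemo {
    // Assumption for this sketch: promote once an object has survived 2 minor GCs.
    static final int TENURING_THRESHOLD = 2;

    static class Obj {
        final String name;
        boolean live = true;   // stands in for "still reachable from a root"
        int age;               // number of minor GCs survived so far
        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<Obj>();
    List<Obj> from = new ArrayList<Obj>();
    List<Obj> to = new ArrayList<Obj>();
    final List<Obj> old = new ArrayList<Obj>();

    // New objects always start in Eden.
    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);
        return o;
    }

    void minorGc() {
        // Live objects in Eden are copied to the unused survivor space.
        for (Obj o : eden) {
            if (o.live) { o.age = 1; to.add(o); }
        }
        // Live objects in From either get another chance in To or are promoted.
        for (Obj o : from) {
            if (!o.live) { continue; }
            o.age++;
            if (o.age >= TENURING_THRESHOLD) { old.add(o); } else { to.add(o); }
        }
        // Everything left behind in Eden and From is garbage.
        eden.clear();
        from.clear();
        // The survivor spaces swap roles: To becomes the new From.
        List<Obj> previousFrom = from;
        from = to;
        to = previousFrom;
    }

    static List<String> names(List<Obj> space) {
        List<String> result = new ArrayList<String>();
        for (Obj o : space) { result.add(o.name); }
        return result;
    }

    String state() {
        return "eden=" + names(eden) + " from=" + names(from)
             + " to=" + names(to) + " old=" + names(old);
    }

    static String runDemo() {
        YoungGenDemo heap = new YoungGenDemo();
        heap.allocate("cache");                // long-lived object
        heap.allocate("temp1").live = false;   // becomes garbage before the first GC
        heap.minorGc();
        heap.allocate("temp2");
        heap.minorGc();
        return heap.state();
    }

    public static void main(String[] args) {
        System.out.println(runDemo());   // prints "eden=[] from=[temp2] to=[] old=[cache]"
    }
}
```

After two cycles the long-lived object has moved to the old generation, the short-lived one has disappeared, and Eden and To are empty again.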

With this, I assume we have a good understanding of how the GC deals with short-lived objects.

Below are the commonly known garbage collectors:

1. The Serial GC:

For the young generation, this GC works in the same way as described above.

For the old generation, it uses "sliding compacting mark-sweep".

Sliding Compacting Mark-Sweep

After identifying the objects which are going to stay alive in the old generation, it slides them towards the beginning of the heap, so that the free space is available as one contiguous chunk towards the end. Hence memory allocation for any new object is easier.

Both Minor and Full GC take place in sequence in a Stop the World fashion; the application will be stopped when GC is in progress.

2. The Parallel GC:

This operates like the Serial GC [stop-the-world fashion]; however, both the minor and the full garbage collections are performed by multiple GC threads in parallel.

3. The Mostly Concurrent GC:

Also known as Concurrent Mark-Sweep [CMS]. It manages the young generation in the same way [stop-the-world fashion], whereas the old generation [the large part of the heap] is collected mostly concurrently with the application. Unlike the other two GCs, it does not compact the old generation; the free space is not contiguous, and as a result allocation into the old generation is more expensive.

Applications which require a huge heap tend to opt for this GC. Since the heap is huge, a GC is going to take more time, and this collector does not stop the application for most of the time a GC is in progress.

4. The Garbage-First GC:

This is a parallel, concurrent and compacting low-pause GC.

It does not have physically separate spaces for the young and old generations; instead the heap is a set of regions, which allows it to resize the young generation in a flexible way.

A detailed description of these GCs is beyond the scope of this article.

Notes:

Out Of Memory: When a request is made to create a new object and the heap does not have sufficient space even after a GC cycle, the JVM throws java.lang.OutOfMemoryError [OOM], which typically crashes the application.

The first thing that comes to mind is to increase the heap size, but before doing that we need to check whether the app really requires a huge heap. The problem might be caused by a memory leak; in that case even the increased heap will not be sufficient after some period of time. Using the Eclipse Memory Analyzer Tool [MAT] we can identify memory leaks; there are many such tools available in the market.

Even if you decide to increase the heap size, keep in mind that the larger the heap, the more the workload on the GC, as it has to traverse the entire heap.

Avoid Object Pooling: Pooled objects are long-lived objects and will end up in the old generation.

Sloppy sizing of ArrayList: If an ArrayList is initially sized too small, its backing array might subsequently need to be resized several times, causing unnecessary allocation.

Command Line Tools: -XX:+PrintGCDetails is a command line argument to print GC information at runtime. It prints the heap [Eden, From, To] and non-heap memory usage.
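Besides the command line flag, basic GC figures can also be read from inside the application through the standard java.lang.management API. The sketch below lists each collector with its collection count and accumulated time; GcStats and summary are hypothetical names, and the collector names printed depend on the JVM and on which GC is selected.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcStats {
    // One line per collector known to the running JVM.
    static List<String> summary() {
        List<String> lines = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Create some short-lived garbage so that the counters have a chance to move.
        for (int i = 0; i < 1000000; i++) {
            byte[] garbage = new byte[1024];
        }
        for (String line : summary()) {
            System.out.println(line);
        }
    }
}
```

With a generational collector there are typically two entries, one for the young generation and one for the old generation.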

References:

http://docs.oracle.com/javase/specs/jvms/se7/html/index.html

http://www.artima.com/insidejvm/ed2/gcP.html

Java Performance by Charlie Hunt & Binu John

http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)

http://www.yourkit.com/docs/kb/sizes.jsp

Thursday, December 20, 2012

The Class Loader Subsystem

The Class Loader is one of the key components of Java. All modern-day Java containers [EJB, Servlet, ..] have custom class loaders. An in-depth understanding of class loaders is required for developing, debugging, packaging and deploying J2EE components, and for architecting them better.

As Wikipedia says, "The Java Classloader is a part of the Java Runtime Environment that dynamically loads Java classes into the Java Virtual Machine". It is as simple as that!

Now, even before we look into how a class loader works, it is important to understand what loading a class means.

A class has to go through three phases in order to get loaded into the JVM: Loading, Linking and Initialization.

The Java specifications mandate that the Initialization phase happens only when the class gets its first active usage, whereas the timing of the Loading & Linking phases can vary from JVM to JVM. In most JVMs loading may happen much earlier, whereas linking and initialization are delayed till the first use.

By delaying the Initialization phase until the class is referred/required [using new, static reference, etc.], memory usage is reduced and system response time improves.

Irrespective of when the loading, linking or initialization happens, any error that occurs during these phases is captured in the form of a LinkageError and thrown only at the first usage of that class.

The Class Loader is responsible for all three phases; now let's see how the Class Loader functions.

Almost every Class Loader subsystem has a minimum of three class loaders in it; in addition to that, we can write our own class loaders. Almost all Java programs/containers rely on the class loader delegation hierarchy to load their classes into the JVM.

Bootstrap or Primordial Class loader: This is responsible for loading only the core Java API (e.g. class files from rt.jar). This is the only class loader which comes with the JVM implementation [mostly written in C or C++]. It is the root of the class loader hierarchy.

Extension Class loader: Responsible for loading classes from the Java extension directory (i.e. classes or jars in jre/lib/ext of the java installation directory). Bootstrap is the parent of this class loader.

System class loader or Application class loader: Responsible for loading classes from the java class path [class directories, jars from class path]. Extension class loader is the parent of this class loader. This class loader is by default the parent of all the custom/user defined class loaders in a java application.

The Bootstrap class loader is the primordial class loader that comes with the JVM implementation; the rest of the class loaders are just like any other Java objects.

Consider the below example: we are trying to find the class loaders responsible for loading three classes, where MyConstant.class is placed in the ././ jre/lib/ext directory as well as in the local workspace, and DemoTest is the main class from the local workspace.

The output will be something like this

Where:

ExtensionClassLoader loads MyConstant class from ext directory

ApplicationClassLoader loads DemoTest class from local workspace

BootStrapClassLoader loads the String class, but its class loader is reported as null because the bootstrap loader is not a Java object

But we know that the MyConstant class is there in the ext directory as well as in the local workspace, so why didn't the AppClassLoader load the class?
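A minimal sketch of such a check is shown here. It covers only two of the three classes, since placing MyConstant in the ext directory cannot be reproduced in a single file; LoaderDemo and loaderOf are hypothetical names, and the loader class names printed differ between Java versions.

```java
class LoaderDemo {
    // Describes which class loader defined the given class.
    static String loaderOf(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        // The bootstrap loader is not a Java object, so it is represented by null.
        return loader == null ? "bootstrap (null)" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("String     -> " + loaderOf(String.class));
        System.out.println("LoaderDemo -> " + loaderOf(LoaderDemo.class));

        // Walk up the delegation hierarchy starting from this class's own loader.
        for (ClassLoader l = LoaderDemo.class.getClassLoader(); l != null; l = l.getParent()) {
            System.out.println("  parent chain: " + l);
        }
    }
}
```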

To understand this we need to be aware of the Class Loader Algorithm, which is based on Delegation Hierarchy Principle:

The current class loader checks whether the class is already loaded; if so, it returns it.
If the class is not already loaded, it delegates the work of loading the class to its parent. This delegation can go all the way up to the BootStrapClassLoader.
If the parent class loader cannot find the class, then the current class loader attempts to find it locally.
If the class is still not found, then java.lang.ClassNotFoundException is thrown.
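The four steps can be spelled out in a custom class loader. The JDK's own ClassLoader.loadClass already behaves this way, so the sketch below only re-implements it to make each step visible. TracingLoader, its trace list and the canLoad helper are hypothetical names; the loader has no local class source of its own, so step three always falls through to the default findClass, which throws ClassNotFoundException.

```java
import java.util.ArrayList;
import java.util.List;

class TracingLoader extends ClassLoader {
    // Records what happened for each request, for illustration only.
    final List<String> trace = new ArrayList<String>();

    TracingLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Step 1: has this loader already loaded the class?
            Class<?> type = findLoadedClass(name);
            if (type == null) {
                try {
                    // Step 2: delegate to the parent first.
                    type = getParent().loadClass(name);
                    trace.add("parent loaded " + name);
                } catch (ClassNotFoundException e) {
                    // Step 3: the parent failed, so try to find the class locally.
                    trace.add("parent could not find " + name);
                    // Step 4: the default findClass throws ClassNotFoundException.
                    type = findClass(name);
                }
            }
            if (resolve) {
                resolveClass(type);
            }
            return type;
        }
    }

    // Small helper so callers do not have to deal with the checked exception.
    static boolean canLoad(ClassLoader loader, String name) {
        try {
            loader.loadClass(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        TracingLoader loader = new TracingLoader(ClassLoader.getSystemClassLoader());
        System.out.println(canLoad(loader, "java.lang.String"));   // true
        System.out.println(canLoad(loader, "no.such.Type"));       // false
        System.out.println(loader.trace);
    }
}
```

A real custom loader would override findClass to read class bytes from its own location and call defineClass on them.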

In our case, when
the String class is referred - the AppClassLoader delegates the request to its parent ExtClassLoader, which in turn delegates the request to its parent BootStrapClassLoader. The Bootstrap class loader finds the class in the java.lang package and returns it.

the MyConstant class is referred - the ApplicationClassLoader delegates the request to its parent ExtClassLoader, which in turn delegates the request to BootStrapClassLoader. Since MyConstant is not part of the Java API jars, BootStrapClassLoader fails to find the class and the request comes back to ExtensionClassLoader; since the MyConstant class is available in the ext directory, the ExtensionClassLoader loads the class.

That is why MyConstant was not loaded from the local classpath; instead ExtClassLoader loaded it from the ext directory.

Every class is loaded only once by a given class loader. There are cases where the same class gets loaded by different class loaders, which is covered in the Class Loading Errors and Suggestions section.

With this in mind, assume we are packaging a war file with its own libraries, say a log4j jar [version x], whereas the servlet container [server] has its own library [log4j jar version y].

In this case, we want our application [war] to load its own library classes [log4j of version x] and not the container's. How is this achieved?

Java EE Delegation Hierarchy:

In general, almost every container has its own class loader to load its libraries.
Also, every WAR/EAR has its own class loader as well.

The Java Servlet specification recommends [but does not force] that the war class loader should try to load a class itself before it delegates to its parent, i.e., the container class loader. The container class loader works in the normal way: it delegates to the system, extension and bootstrap class loaders.

Most application containers follow this recommendation; in containers like GlassFish we can configure which delegation model is to be followed.

With this, I hope you have got a basic understanding of class loaders. Before we finish, I would also like to mention a few class loading errors which you have probably seen very often.

Class Loading Errors and Suggestions

ClassNotFoundException: This occurs when we try to load a class dynamically [Class.forName() or ClassLoader.loadClass(), where the class name is passed as a String] and the class is not found. This is a checked exception, and the caller can perform any corrective action.

NoClassDefFoundError: This error occurs when a class that is directly referred to in the code [new or ClassName] is not found. It comes under LinkageError, and it occurs in the first phase of the class loading life cycle.

ClassCastException: This exception occurs in two cases

When we try to cast an object between two incompatible types, which can be solved easily
When more than one class loader has loaded the same class and we cast between them [casting an object of the same type loaded by a different class loader]

Assume there is an EAR package, consisting of a WAR and its library files [say lib1.jar, lib2.jar]. The WAR package in turn has its own library [the same lib2.jar].

A Java class from the WAR package calls a method from lib1.jar, say DemoHelper.getDemo() [available in the EAR package], and that method returns an object from lib2.jar [say Demo], which is then cast to Demo. Something similar to the below code snippet

Now a ClassCastException will be thrown, because the Demo class loaded by the WAR class loader is different from the Demo class loaded by the EAR class loader.

Just to reiterate the Java EE container delegation hierarchy: the WAR/EAR class loader tries to load the required class before it delegates to its parent.

In our case, when Demo is referred in the EAR jar, the EAR class loader loads the Demo class from its library; when the Demo class is referred from the WAR class [casting], the WAR class loader tries to load the class from its own library and finds one, but it is different from the one returned by the other class loader, hence the error.

In case the WAR class loader could not find the Demo class [lib2.jar is not there in the WAR], the request would have been delegated to its parent class loader, i.e. the EAR class loader, which would have returned the already loaded Demo class, and the issue would have been avoided.

To Summarize:

Class loaders are a powerful mechanism for dynamically loading software components on the Java platform
Every loaded class is represented by an instance of java.lang.Class in the heap
Any class will be loaded only once by a class loader hierarchy
In general every class loader delegates the class load requests to its parent before trying to load the class on its own
Avoid using duplicated libraries; a good understanding of class loaders is required for packaging and deployment of J2EE components

There is much more to class loading and reloading, so I request the readers to continue exploring. Below are some of the articles on Class Loaders.

http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html

http://www.artima.com/insidejvm/ed2/lifetype.html

http://tutorials.jenkov.com/java-reflection/dynamic-class-loading-reloading.html#dynamicloading

http://geekexplains.blogspot.in/2008/07/loading-linking-initialization-of-types.html

http://zeroturnaround.com/labs/rebel-labs-tutorial-do-you-really-get-classloaders/

http://javapapers.com/core-java/java-class-loader/

http://java.sys-con.com/node/37659

http://developeriq.in/articles/2007/oct/03/java-class-loader-class-apart/

http://onjava.com/pub/a/onjava/2005/01/26/classloading.html