David Buccola
July 11

Configuring dfc.properties for an application archive

When you obtain a new application server archive, such as a war file or an ear file, and that archive contains a copy of DFC, you often have to configure its dfc.properties file before the archive can be used. One common technique is to insert a dfc.properties file of your choice into the archive. For an ear file, place your dfc.properties in the /APP-INF/classes directory of the archive. For a war file, place it in the /WEB-INF/classes directory. You can do this with the jar utility from the JDK or with a zip utility such as WinZip.
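If you would rather script the step than do it by hand, the same insertion can be sketched in Java with the JDK's built-in zip file system. This is only an illustration: the class name DfcPropertiesInstaller and its install method are made up for this example and are not part of DFC or of any application server.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Collections;

// Hypothetical helper, not part of DFC.
final class DfcPropertiesInstaller {

    /**
     * Copies a dfc.properties file into an existing archive.
     * classesDir is "/WEB-INF/classes" for a war file or "/APP-INF/classes" for an ear file.
     */
    static void install(Path archive, Path dfcProperties, String classesDir) throws IOException {
        // Open the archive as a file system so entries can be written like ordinary files.
        URI uri = URI.create("jar:" + archive.toUri());
        try (FileSystem zip = FileSystems.newFileSystem(uri, Collections.<String, Object>emptyMap())) {
            Path target = zip.getPath(classesDir, "dfc.properties");
            Files.createDirectories(target.getParent());
            // Overwrite any dfc.properties that shipped inside the archive.
            Files.copy(dfcProperties, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Arguments: <archive> <dfc.properties> <classes directory inside the archive>
        install(Paths.get(args[0]), Paths.get(args[1]), args[2]);
    }
}
```

The archive is modified in place, so work on a copy if you want to keep the original.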

For production environments that is often a good way to go. For development environments, however, constantly updating the archive each time you get a new version can be tedious. Another approach is to use a system property. If you set the “dfc.properties.file” system property then DFC will use that value to locate dfc.properties. For example:

java -Ddfc.properties.file=C:/Documentum/config/dfc.properties …

You can set this system property in a number of ways. If you are using an IDE to launch the application server you can generally set the system property in the “run” configuration screen of your IDE.

If you are starting your application server from a script you can modify the startup script to include the system property definition. In the case of Tomcat, you can also set the system property in an environment variable. For example:

set JAVA_OPTS=-Ddfc.properties.file=C:/Documentum/config/dfc.properties

Using the system property to select your dfc.properties externalizes the configuration from the application archive and allows you to switch easily between different versions of the archive without having to configure each one.
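A quick way to confirm that the property actually reached the JVM is to read it back and try to load the file it names. The sketch below does only that. It does not reproduce how DFC itself locates or parses dfc.properties, and the class name DfcPropertiesCheck is invented for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical diagnostic, not part of DFC.
final class DfcPropertiesCheck {
    static final String PROPERTY = "dfc.properties.file";

    /** Loads the file named by the system property, or returns null if the property is not set. */
    static Properties loadFromSystemProperty() throws IOException {
        String location = System.getProperty(PROPERTY);
        if (location == null) {
            return null;
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(Paths.get(location))) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = loadFromSystemProperty();
        System.out.println(props == null
                ? PROPERTY + " is not set"
                : PROPERTY + " points to a file with " + props.size() + " entries");
    }
}
```

Run it with the same -D option you pass to the application server; an IOException means the path in the property does not point to a readable file.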

July 08

Configure Jersey Chunked Encoding when Using Multipart

Make sure to configure the Jersey chunked encoding size when using MIME multipart content.

There are a couple of different modes of HTTP content transfer used by Jersey. If the length of the content is known up front then Jersey uses a fixed streaming mode for content transfer. An example of when the length is known is when the content comes from a file. If, however, the length is not known ahead of time then Jersey has two options. It can use a non-streaming mode or a chunked encoding mode. MIME multipart is a case where the length is not known up front and therefore transfer occurs in one of these two latter modes.
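The three modes map naturally onto the JDK's HttpURLConnection, which offers a fixed-length streaming mode, a chunked streaming mode, and, when neither is set, buffering of the whole request body in memory. The sketch below shows that decision in isolation. It assumes a client built on HttpURLConnection and is not Jersey's actual code; the class TransferModes and its methods are made up for illustration.

```java
import java.net.HttpURLConnection;

// Hypothetical illustration of the three transfer modes, not Jersey code.
final class TransferModes {
    enum Mode { FIXED_LENGTH, CHUNKED, BUFFERED }

    /** Picks a mode: a known length wins, then a configured chunk size, else buffering. */
    static Mode choose(long contentLength, int chunkSize) {
        if (contentLength >= 0) {
            return Mode.FIXED_LENGTH;
        }
        return chunkSize > 0 ? Mode.CHUNKED : Mode.BUFFERED;
    }

    /** Applies the chosen mode to a connection that has not been connected yet. */
    static Mode apply(HttpURLConnection conn, long contentLength, int chunkSize) {
        Mode mode = choose(contentLength, chunkSize);
        conn.setDoOutput(true);
        switch (mode) {
            case FIXED_LENGTH:
                conn.setFixedLengthStreamingMode(contentLength);
                break;
            case CHUNKED:
                conn.setChunkedStreamingMode(chunkSize);
                break;
            default:
                // BUFFERED: with no streaming mode set, the body is held in memory until sent.
                break;
        }
        return mode;
    }
}
```

Passing a negative contentLength stands for "length unknown", which is the multipart case described above.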

The problem is that the default chosen by Jersey when MIME multipart is involved is to use non-streaming mode. The non-streaming mode copies data temporarily into a ByteArrayInputStream and for larger content this process creates a lot of temporary short-lived objects which places stress on the garbage collector. It also consumes lots of memory while the entire message is being buffered.

Luckily this behavior is configurable under Jersey. The solution is to configure the chunked encoding size so that Jersey will use chunked encoding instead of non-streaming mode whenever MIME multipart content is encountered. Here is a brief example of how this is done on the client:

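import com.sun.jersey.api.client.config.ClientConfig;
import com.sun.jersey.api.client.config.DefaultClientConfig;

// Setting a chunk size (32 KB here) makes the client use chunked encoding.
ClientConfig config = new DefaultClientConfig();
config.getProperties().put(
    ClientConfig.PROPERTY_CHUNKED_ENCODING_SIZE, 32 * 1024);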

Avoid Jar Indexing

If your classpath includes indexed jars, the performance of service factory lookups can be severely impacted, so you should avoid them. During a quick scan of the built-in JRE jars I saw no sign of jar indexing, so indexing does not appear to be particularly important. If the JRE doesn’t find it useful then perhaps we shouldn’t either.

The problem relates to caching of information in the classloader infrastructure. The service factory lookup algorithm performs a global search through the classpath for a particular resource (javax.xml.parsers.SAXParserFactory for example). Observation of system behavior and a brief examination of the JRE code suggest that this lookup is cached for jars. This means that after the first lookup, subsequent lookups perform no further file system activity, which reduces their cost. In the case of indexed jars, however, I noticed that this caching breaks down. Information from dependent jars that are referenced through a jar index doesn’t seem to get cached correctly. This means that for every single service factory lookup the jars referenced through the index are physically scanned again for the resource. This can cause a lot of additional file system activity.

I noticed the problem when working with Jersey and JAXB. It turns out that certain usage patterns of Jersey and JAXB can result in a very high number of service factory lookups (for SAXParserFactory), and those lookups are extra slow if you happen to have an indexed jar in your classpath. In this case two issues compounded each other into an obvious performance problem, when each on its own might have gone unnoticed. Without the excessive factory lookups, or without the indexed jars, the problem is not nearly so noticeable. With both, however, you end up with lots of extra file system activity that definitely impacts performance.

It is interesting to note how I ended up in this situation. Before this I wasn’t even aware of jar indexing. I was using Maven to build my project and decided that I wanted to add some manifest information to my jar. I searched the internet and found an example of creating a jar manifest with Maven. I proceeded to cut and paste the sample into my own project. Unfortunately it just so happened that the sample also requested jar indexing. Oops, now my performance was worse. I fear that anyone else who finds this same sample could run into the same problem and not even know it.
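Since an index can sneak in unnoticed like this, it may be worth checking your classpath for it. An indexed jar carries a META-INF/INDEX.LIST entry, so a small scan is enough. The following is a minimal sketch; the class IndexedJarScanner is hypothetical and only inspects jar files named directly on the classpath string it is given.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper for spotting indexed jars.
final class IndexedJarScanner {
    static final String INDEX_ENTRY = "META-INF/INDEX.LIST";

    /** Returns true if the jar contains a jar index entry. */
    static boolean isIndexed(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            return jarFile.getEntry(INDEX_ENTRY) != null;
        }
    }

    /** Returns the classpath elements that are readable jar files containing an index. */
    static List<String> findIndexedJars(String classpath) {
        List<String> indexed = new ArrayList<>();
        for (String element : classpath.split(File.pathSeparator)) {
            File file = new File(element);
            if (!file.isFile() || !element.endsWith(".jar")) {
                continue; // directories and non-jar entries cannot carry a jar index
            }
            try {
                if (isIndexed(file)) {
                    indexed.add(element);
                }
            } catch (IOException e) {
                // Unreadable or corrupt jar: skip it rather than abort the scan.
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        for (String jar : findIndexedJars(System.getProperty("java.class.path"))) {
            System.out.println("indexed jar: " + jar);
        }
    }
}
```

Inside an application server the effective classpath is usually assembled by the server's own classloaders, so you may need to pass the jar locations explicitly instead of relying on java.class.path.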

July 07

Cache Jersey JAXB Marshaller and Unmarshaller Instances

If you create new JAXB Marshaller and Unmarshaller instances often then performance suffers. To avoid unnecessary creations you should cache and reuse your Marshaller and Unmarshaller instances.

The reason performance is affected is that creation of each new Marshaller or Unmarshaller instance results (indirectly) in the creation of a new SAXParserFactory deep in the bowels of JAXB. This creation of a new SAXParserFactory for each Marshaller or Unmarshaller can be very expensive. SAXParserFactory creation involves a service lookup that scans the classpath for “javax.xml.parsers.SAXParserFactory”. If your classpath includes directories or includes jars that are indexed then a lot of file system activity can occur and affect your performance.
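To get a feel for what the lookups cost in your own environment, you can time a batch of SAXParserFactory.newInstance() calls on the classpath your application really uses. The sketch below does only that. FactoryLookupCost is a made-up name, and the numbers depend heavily on the JDK version and on what is on the classpath, so treat the output as a rough indication only.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical micro-measurement of repeated factory lookups.
final class FactoryLookupCost {

    /** Creates the given number of factories and returns the elapsed time in nanoseconds. */
    static long timeNewInstances(int count) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            // Each call repeats the provider lookup described above.
            SAXParserFactory.newInstance();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int count = 1000;
        timeNewInstances(count); // warm-up pass so class loading is not measured
        long nanos = timeNewInstances(count);
        System.out.printf("%d factory lookups took %.1f ms%n", count, nanos / 1e6);
    }
}
```

Comparing a run with and without the suspect jars or directories on the classpath should show whether they matter.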

Ideally it would be nice if JAXB did not create so many SAXParserFactory instances. Factory creation can be expensive and it is not a good strategy to perform factory lookups so often. Why does JAXB create so many SAXParserFactory instances? Is it a bug in JAXB? Typically when such code is found in applications it is considered bad coding practice.

After thinking about it for a while I came to the conclusion that it probably isn’t a bug in JAXB. These are smart guys and I couldn’t imagine them making this kind of choice by accident. Though I don’t know the exact reasoning, I assume it relates to the fact that JAXB is a low-level library that can be used from many different contexts, and it is difficult for JAXB to make a choice that meets all use cases.

Unfortunately this behavior is deep in the bowels of JAXB in protected or private methods and does not appear easy to change externally. In the case of Unmarshaller, for example, the factory creation is found in the protected method AbstractUnmarshallerImpl.getXMLReader(). Your best bet to avoid the expense is to simply not create Marshaller or Unmarshaller instances more often than necessary.

To fix this problem in my Jersey-based REST service I created my own custom Jersey ContextResolver for Marshaller and Unmarshaller. The custom resolver maintains a thread-based cache of Marshaller and Unmarshaller instances and thereby avoids unnecessary creation of fresh instances and the corresponding expense of the excessive SAXParserFactory creations.
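The caching idea itself is small. Marshaller and Unmarshaller instances are not thread-safe, which is why the cache is kept per thread rather than shared. The sketch below is not my resolver; it is a generic per-thread cache with a hypothetical name, PerThreadCache, and it leaves out the JAXB and Jersey types so that it stands on its own. In a real resolver the key would typically be the bound class and the factory function would call createMarshaller() or createUnmarshaller() on the JAXBContext for that class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical generic cache: one instance per key per thread.
final class PerThreadCache<K, V> {
    private final Function<K, V> factory;

    // Each thread gets its own map, so cached values are never shared across threads.
    private final ThreadLocal<Map<K, V>> cache = ThreadLocal.withInitial(HashMap::new);

    PerThreadCache(Function<K, V> factory) {
        this.factory = factory;
    }

    /** Returns this thread's cached value for the key, creating it on first use. */
    V get(K key) {
        return cache.get().computeIfAbsent(key, factory);
    }
}
```

The trade-off is memory: every worker thread holds its own instances for as long as the thread lives, which is usually acceptable with a bounded thread pool.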

Configure JVM Memory with -Xms when Using Jersey and JAXB

When using JAXB and Jersey make sure you configure the -Xms property to tune your JVM initial memory allocation. If you forget to do this you’ll likely end up with very poor performance caused by garbage collector thrashing.

Prior to discovering this problem I was a bit naïve about garbage collection (actually I must admit that I still am). I knew that big high-performance production servers benefited from a lot of memory tuning, but I thought that smaller test and development environments would survive just fine mostly on JVM defaults. For these smaller environments I would just set -Xmx. My naïve assumption was that memory would start small and then grow as needed to the maximum configured.

What I failed to realize is that there are many different partitions of the memory space and that some of these areas are allocated at JVM startup and never grow. In other words, their size is only affected by the initial memory allocation and not the maximum memory allocation. It turns out that certain aspects of Jersey and JAXB can be very sensitive to these initial allocations. This means that if you use Jersey and JAXB you should configure a larger initial memory size even in simple test and development environments and certainly in larger production environments.
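One way to see these partitions and their initial sizes is to ask the JVM through the standard java.lang.management API. The sketch below prints the initial and maximum size of every memory pool the JVM reports. The class name MemoryPoolReport is made up, and the pool names and sizes you see depend on the JVM and garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical diagnostic that lists the JVM's memory pools.
final class MemoryPoolReport {

    /** Builds a text report of overall heap size and each pool's initial and maximum size. */
    static String describe() {
        StringBuilder out = new StringBuilder();
        Runtime rt = Runtime.getRuntime();
        out.append(String.format("heap now: %d MB, heap max: %d MB%n",
                rt.totalMemory() >> 20, rt.maxMemory() >> 20));
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // A value of -1 means the JVM reports that size as undefined.
            out.append(String.format("%-32s init=%d KB max=%d KB%n",
                    pool.getName(), usage.getInit() >> 10, usage.getMax() >> 10));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Running it once with only -Xmx and once with -Xms added shows which pools react to the initial allocation setting.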

The problem is related to short-lived temporary objects, and it doesn’t take much to encounter it. It doesn’t take a massive multithreaded program with lots of allocated objects. I created a single-threaded, one-page program that exhibits the problem. If you run this program under a 32-bit JVM then you will experience the symptom first hand. Try running the program with different initial memory allocations and see what you get.

I first noticed the problem when working with Jersey to do high-performance multipart content transfer. There are a couple of different modes of HTTP content transfer used by Jersey. If the length of the content is known up front then Jersey uses a fixed streaming mode of content transfer. If, however, the length is not known ahead of time then Jersey defaults to a non-streaming mode. This non-streaming mode copies data temporarily into a ByteArrayInputStream, and for larger content this process creates a lot of temporary short-lived objects. If the initial memory allocation is too low then the result is significant stress on the garbage collector. In my case of MIME multipart transfer the length is not known up front, so content transfer occurs in non-streaming mode and the problem shows itself.
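The allocation pattern behind this is easy to imitate. The sketch below is not the test program mentioned in this post and does not involve Jersey or JAXB; it is a hypothetical stand-in, BufferChurn, that repeatedly copies a payload into a growing in-memory buffer the way a buffering transfer would. The payload size and round count are arbitrary assumptions.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-in for a buffering (non-streaming) transfer.
final class BufferChurn {

    /** Copies totalBytes into a growing in-memory buffer and returns the buffered size. */
    static int bufferOnce(int totalBytes, int chunkSize) {
        byte[] chunk = new byte[chunkSize];
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        int written = 0;
        while (written < totalBytes) {
            int n = Math.min(chunkSize, totalBytes - written);
            buffer.write(chunk, 0, n); // the backing array is reallocated as it grows
            written += n;
        }
        return buffer.toByteArray().length; // one more full copy, as a sender would need
    }

    /** Runs several buffered transfers and returns the elapsed time in milliseconds. */
    static long timeMillis(int rounds, int totalBytes, int chunkSize) {
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            bufferOnce(totalBytes, chunkSize);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Assumed workload: 100 transfers of 4 MB each in 8 KB writes.
        System.out.println(timeMillis(100, 4 * 1024 * 1024, 8 * 1024) + " ms");
    }
}
```

Running it with different -Xms values gives a rough idea of how sensitive this kind of churn is to the initial allocation on your JVM.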

Just performing non-streaming HTTP transfer isn’t enough to make the problem obvious though. If the initial memory allocation is too small then performance is mediocre but not horrible. It isn’t until I bring JAXB into the picture that things get really bad. What I discovered is that the simple initialization of a JAXB context consumes some area of Java memory such that programs with lots of short-lived objects start performing really badly. You don’t even need to use JAXB ever again. Just simply create a JAXB context and then never use it and you will see things slow down. I am not exactly sure why this happens. Since the JAXB context is a long-lived object it is not clear to me how it so adversely affects the behavior of short-lived objects. The effect is dramatic though as you can see from the test program.

The moral is: if you use JAXB you had better make sure to provide the JVM with a larger initial memory allocation (-Xms), or else your program will slow down dramatically if it creates lots of short-lived objects.

 
