Thursday, October 18, 2012

"redhat_transparent_hugepage" can hurt Java performance

Most modern operating systems use the paged virtual memory model. In this model, each process has its own "virtual" address space, and a special page table is used to map each virtual memory address to the corresponding physical memory address.

Mapping each byte of virtual memory to a physical address would be very inefficient, because for each allocated byte we would need to keep a 4+ byte entry in the page table. In practice, memory is allocated in bigger chunks called pages. The most common page size is 4096 bytes.

There is a trade-off between page size and page table size. With small pages we waste memory on a huge page table. With big pages we waste memory on partially used pages, because it is rare for a memory allocation request to correspond to an exact number of pages, so the last allocated page is usually only partially used.
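To get a feel for this trade-off, here is a small back-of-the-envelope sketch. It assumes a machine with 36 GB of mapped memory, 8 bytes per page-table entry (an approximation for x86-64), and a 2 MB huge page size; real page tables are multi-level, so treat these as order-of-magnitude numbers only:

```java
// Rough sketch of page-table bookkeeping cost for 4 kB vs 2 MB pages.
// 8 bytes per entry is an x86-64 approximation; actual page tables are
// multi-level structures, so these are order-of-magnitude figures.
public class PageTableMath {
    public static void main(String[] args) {
        long memBytes  = 36L * 1024 * 1024 * 1024; // 36 GB of mapped memory
        long smallPage = 4096;                     // regular 4 kB page
        long hugePage  = 2L * 1024 * 1024;         // 2 MB huge page
        long entrySize = 8;                        // bytes per page-table entry

        long smallEntries = memBytes / smallPage;
        long hugeEntries  = memBytes / hugePage;

        System.out.println("4 kB pages: " + smallEntries + " entries, "
                + smallEntries * entrySize / (1024 * 1024) + " MB of page tables");
        System.out.println("2 MB pages: " + hugeEntries + " entries, "
                + hugeEntries * entrySize / 1024 + " kB of page tables");
    }
}
```

For 36 GB this works out to roughly 72 MB of page-table entries with 4 kB pages versus about 144 kB with 2 MB pages, which is why huge pages also make TLB caching far more effective.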

"Huge pages" is a memory allocation mode in which the page size is bigger than 4 kB. The actual page size depends on the OS and hardware platform; on most x86 systems it is either 2 MB or 4 MB. If a process operates on big blocks of memory and the computer has enough physical memory installed, "huge pages" mode can significantly improve performance (up to 3x, according to some synthetic tests).

On Linux, if a process wants to use "huge pages" support, it should request it through the libhugetlbfs API functions, i.e. the program should be written and compiled with "huge pages" support in mind.

In order to bring the "huge page" performance benefits to legacy software and software written without "libhugetlbfs" support, RedHat has implemented a custom Linux kernel extension called "redhat_transparent_hugepage".

As I will show below, transparent huge pages are not always a good idea, even if you have more than enough physical memory installed.

To perform automatic Continuous Integration builds of our Java software, we're using a quite powerful server machine with 2 x Intel(R) Xeon(R) CPU X5660 @ 2.80GHz (2 CPUs * 6 cores * 2 [HT] = 24 threads), 36 GB of memory and RHEL-based Scientific Linux with the 2.6.32-279.9.1.el6.x86_64 kernel.

Despite the reasonable number of processors and only 4 concurrent builds running in parallel, the machine was showing a very high average CPU load of 35% or more and was plagued by random freezes when almost everything got blocked for 3-5 seconds.

After thorough testing and analysis we noticed two interesting facts. First, the CPU spends most of its time in kernel mode. And second, for very short intervals of time the "hugememd" daemon eats all the CPU power.

Based on these symptoms, it was easy to assume that "hugememd" is strongly correlated with the Java performance degradation and the system freezes, which led us to an existing bug report on the CentOS bug tracker.

And indeed, disabling the "redhat_transparent_hugepage" extension eliminated the problem completely! Now the average CPU load rarely goes higher than 5%, and the server can keep up with 8 parallel Java builds.

It is still not clear to me whether the problem is the way the JVM manages memory, which leads to huge "memory defragmentation" costs, or whether there is a bug in the "redhat_transparent_hugepage" implementation, but the fact is that at the moment "redhat_transparent_hugepage" doesn't work well with Java.

This means that if you are experiencing surprisingly bad performance of your Java web server or database, with occasional freezes or slowdowns, don't rush to blame Java. It may be your server OS doing some nasty "optimizations" behind your back.

Wednesday, October 17, 2012

Apache Ant: customizing JUnit test behaviour with system properties

Despite the increasing popularity of Apache Maven, Gradle and other modern Java build automation tools, a considerable amount (if not the majority) of Java projects are still based on Apache Ant. And the vast majority of Ant-based builds use the Ant JUnit task to run the tests.

Sometimes it may be necessary to supply some optional configuration to the JUnit tests, and the developer may not want to store this configuration in the source code control system. To name a few use cases, it may be the credentials used to establish a connection to the testing database or (in the case of integration tests) even a user name and password necessary to verify communication with "external" system components in production.

By default the Ant JUnit task runs all tests in the same JVM instance, and all system properties (including the properties set on the command line as "-Dprop=value") are accessible from the test code. But running tests in the same JVM can cause undesirable side effects.

The easiest way to isolate tests from the Ant environment is to set the "fork" attribute of the JUnit task to "true". But then all the tests will run in a "clean room" environment, and custom properties supplied to the Ant build will not be accessible.

Fortunately, it is easy to instruct the JUnit task to pass all (or just some) system properties to the forked JVM by using "property sets".

For example, to pass all system properties to the JUnit test JVM instance, you may use a property reference to the builtin set "all" (only the relevant nested elements are shown):

<junit fork="true">
  <syspropertyset>
    <propertyref builtin="all"/>
  </syspropertyset>
  ...
</junit>

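On the test side, the code can then read such properties with plain System.getProperty, falling back to a safe default when nothing was passed on the Ant command line. The property names and defaults below are hypothetical, just to illustrate the pattern:

```java
// Hypothetical property names ("test.db.url", "test.db.user"); in a real
// build they would be supplied to Ant as -Dtest.db.url=... and forwarded
// to the forked test JVM through a property set.
public class DatabaseConfigExample {
    static String url() {
        return System.getProperty("test.db.url", "jdbc:h2:mem:testdb");
    }
    static String user() {
        return System.getProperty("test.db.user", "sa");
    }
    public static void main(String[] args) {
        System.out.println("Connecting to " + url() + " as " + user());
    }
}
```

With this approach the tests still run when no configuration is supplied, and the secrets themselves never need to be committed to version control.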
Thursday, October 11, 2012

EGit: "not authorized" error when pushing changes to GitHub over HTTPS

I just lost half an hour trying to push local changes to one of my GitHub repositories over HTTPS.
Despite the fact that I had done this many times before, have proper write permissions, and had specified the correct username and password when cloning the repository, I was constantly receiving a "not authorized" error!

It appeared that for some reason EGit ignores my credentials when performing a push over HTTPS; in particular, EGit doesn't actually use the credentials unless they are stored in the Eclipse "Secure Storage".

If you experience the same problem, try the following:

  • click the "Configure" button at the bottom of the "not authorized" error dialog
  • click the "Change" button next to the URI field
  • re-type your username and password in the "Authentication" box
  • (!) set the "[x] Store in Secure Store" check box

Thursday, October 4, 2012

Pre-configuring Eclipse JadClipse Java decompiler using Genuitec "Secure Delivery Center"

Genuitec "Secure Delivery Center" (SDC) allows you to provide customized, pre-configured, and centrally managed Eclipse distributions for your team(s), behind the firewall.
JadClipse is an Eclipse plug-in that automatically gives you a decompiled version of Java source code for any .class file you have.
This is invaluable when you don't have source code packages for your 3rd-party dependencies.
The current version of JadClipse (v3.4) doesn't include the Jad decompiler and requires an external Jad decompiler binary to be present somewhere on your system.
Let's assume you already have an SDC-managed Eclipse package. In my case JadClipse was configured on top of an Eclipse 3.7.2 SDK based package.

Mirror JadClipse in SDC as "JadClipse"
First of all, you will need to mirror JadClipse as a 3rd-party Eclipse extension. In the SDC Admin Console, go to Third Party Libraries -> Import New Library -> Import existing Eclipse update site -> Add Source Site.
Use "http://jadclipse.sf.net/update" as the URL and choose "JDT Decompiler Feature" from the list of available plug-ins.

Mirror Jad decompiler binaries in SDC as "Jad Decompiler"
Download the Jad decompiler binaries for all platforms you want to support (Windows, Linux, Mac). You don't need to extract the .zip archives.

In SDC Admin Console, go to Third Party Libraries -> Import New Library -> Package binary contents for delivery
For each downloaded jad*.zip archive, do "Add binary contents", select the .zip archive, and mark the platform (Windows, Linux or Mac) it's designed for.

Pre-installing new software into Eclipse package
Add "Jad Decompiler" and "JadClipse" to the Eclipse package software list, then build and install the package (you may want to use a "Test" build first).

Pre-configuring Jad binary location in JadClipse
Open the Eclipse installation directory and locate the Jad binary. Typically SDC installs it as ECLIPSE_HOME/binary/binary.contents-x.y.z/jad[.exe].
Copy & paste the full absolute path of the Jad binary into Eclipse - Window - Preferences - Java - Decompilers - Jad - Path to decompiler.
You may also want to check "[x] Use Eclipse code formatter" on the "Decompilers" page.

JadClipse stores all these settings as Eclipse preferences, so they are easily configurable with SDC.

The related property names are:

  • /instance/net.sf.jdtdecompiler.jad/net.sf.jdtdecompiler.jad.cmd
  • /instance/net.sf.jdtdecompiler.ui/net.sf.jdtdecompiler.use_eclipse_formatter 

You may find all property names and values by opening the Eclipse - Help - About Eclipse - Installation Details - Configuration dialog.

For each supported platform, create a new text file called "jadclipse-<platform>.epf" using your favorite text editor. Put "file_export_version=3.0" as a header and then the property values, one property per line.

Here is an example for the Windows platform. Please note that in your case the Jad binary location will be different!

file_export_version=3.0
/instance/net.sf.jdtdecompiler.ui/net.sf.jdtdecompiler.use_eclipse_formatter=true
/instance/net.sf.jdtdecompiler.jad/net.sf.jdtdecompiler.jad.cmd=C:/Software/eclipse/binary/binary.contents.9391-Ahb-8849.win_1.5.8/jad.exe

If you need to support more than one platform, you'll need to create an SDC "Environment Policy" and assign it to the Eclipse package. In the environment policy configuration there is a dedicated configuration page for each platform, so you can easily specify platform-specific preferences.

Test and promote package changes
Basically, that's it. After promoting the package changes, all developers in your team will automatically get a pre-configured JadClipse working out of the box.

Wednesday, October 3, 2012

Oracle: nested query with data set extension in FROM clause

Oracle (as well as MS SQL and some others) allows nested queries in the FROM clause of a SELECT statement. Combined with data set extension using the UNION keyword, this can greatly simplify complex table joins.

Let's imagine we have two simple tables:

PERSONS
id  name
1   John Smith
2   Mike Douglas

COMPUTERS
id  name       main_user_id
1   Computer1  1
2   Computer2  2
3   Computer3  NULL

Now we want to get a full list of computers with the corresponding main user names.

SELECT c.name as "computer", p.name as "user"
FROM COMPUTERS c, PERSONS p 
WHERE c.main_user_id = p.id

computer   user
> Computer1  John Smith
> Computer2  Mike Douglas


To get the full list of computers, we could use an outer join:

SELECT c.name as "computer", p.name as "user"
FROM COMPUTERS c, PERSONS p 
WHERE c.main_user_id = p.id (+)

computer   user
> Computer1  John Smith
> Computer2  Mike Douglas
> Computer3  NULL

Or, we can use a nested SELECT to extend the PERSONS table with an "unknown" user in the context of this query:
SELECT c.name as "computer", p.name as "user"
FROM
  COMPUTERS c, 
  (SELECT *
    FROM PERSONS
    UNION
      SELECT 0 AS "id", 'unknown' AS "name"
      FROM DUAL) p 
WHERE c.main_user_id = p.id

computer   user
> Computer1  John Smith
> Computer2  Mike Douglas
> Computer3  unknown


This is much more verbose than a simple outer join, but it gives us an additional level of control over the input data and actually simplifies the WHERE clause.