Thoughts on…

Java Middleware & Systems Management

Archive for March 2010

Performance Monitoring using Hibernate


Performance monitoring is, by its very nature, a slippery slope. Once you start looking for inefficiencies in code, it’s easy to get carried away wanting to optimize every last line for supreme efficiency. It’s thus important to weigh the value of an optimization against the time spent investigating the proper fix, the effort required to implement it, and its importance relative to other items on the roadmap.

Every environment, every team, and every piece of software is different – so I won’t dare try to formulate any hard and fast rules for what is an appropriate or inappropriate amount of optimization. Suffice it to say, before any decisions can be made as to whether an optimization should be implemented, the nature of the problem must be understood. Below I will show how to use functionality built into Hibernate to help understand where the performance issues in your application lie as well as the nature of each.

—–

The first thing to do is to turn on Hibernate Statistics.
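
Statistics collection is off by default. It can be switched on declaratively through a Hibernate configuration property, or programmatically at runtime; a minimal sketch of both approaches (Hibernate 3.x):

# in hibernate.properties (or the equivalent property in hibernate.cfg.xml)
hibernate.generate_statistics=true

// or at runtime, given a reference to the SessionFactory
sessionFactory.getStatistics().setStatisticsEnabled(true);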

Once enabled, this service will start recording metrics on a per session factory basis. However, we’re going to want to collect “windows” of information from the statistics object. These windows will help us answer questions such as how many entities were loaded during a particular session bean method call or how many database roundtrips were required to render an entire page of the UI.

So let’s implement a kind of “stop watch” that will take a snapshot of some of the more interesting metrics offered through the statistics API. We’ll record the values of the metrics at two different points in time, and then calculate the delta metrics for hand-off to a logger or some analyzer.

public class HibernateStatisticsStopWatch {

    private org.hibernate.stat.Statistics stats;

    // some interesting metrics
    long queryExecutions;
    long transactions;
    long entityLoads;
    long connects;
    long time;

    public HibernateStatisticsStopWatch() {
        this.stats = // get statistics service from the MBeanServer
    }

    public void start() {
        queryExecutions = -stats.getQueryExecutionCount();
        transactions = -stats.getTransactionCount();
        entityLoads = -stats.getEntityLoadCount();
        connects = -stats.getConnectCount();
        time = -System.currentTimeMillis();
    }

    public void stop() {
        queryExecutions += stats.getQueryExecutionCount();
        transactions += stats.getTransactionCount();
        entityLoads += stats.getEntityLoadCount();
        connects += stats.getConnectCount();
        time += System.currentTimeMillis();
    }

    // delta metrics, valid once stop() has been called
    public long getQueryExecutions() { return queryExecutions; }
    public long getTransactions() { return transactions; }
    public long getEntityLoads() { return entityLoads; }
    public long getConnects() { return connects; }
    public long getTime() { return time; }

    @Override
    public String toString() {
        return "Stats" 
            + "[ queries=" + queryExecutions 
            + ", xactions=" + transactions 
            + ", loads=" + entityLoads 
            + ", connects=" + connects 
            + ", time=" + time + " ]";
    }
}
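
Two notes on the implementation above. The negation trick in start() means that stop() can simply add the current counter values, leaving each field holding the delta between the two calls. Also, how the constructor obtains the Statistics service is deployment-specific; if the SessionFactory is directly reachable, the lookup can skip the MBeanServer entirely. A minimal sketch, where HibernateUtil is a hypothetical accessor for your application’s SessionFactory:

import org.hibernate.SessionFactory;
import org.hibernate.stat.Statistics;

public class StatisticsLocator {
    // HibernateUtil is a hypothetical accessor; substitute however your
    // application exposes its SessionFactory (JNDI, the MBeanServer, etc.)
    public static Statistics getStatistics() {
        SessionFactory sessionFactory = HibernateUtil.getSessionFactory();
        return sessionFactory.getStatistics();
    }
}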

Since the idea is to instrument all different parts of the codebase with these stopwatches, things could get rather hairy and difficult to manage over time. That’s why it’s prudent to create a central authority to manage them. The centralization will make it easy to turn off/on the logging across the system, or even enable/disable the stopwatches themselves.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class HibernatePerformanceMonitor {
    private static final Log log = LogFactory.getLog(HibernatePerformanceMonitor.class);

    private ConcurrentMap<Long, HibernateStatisticsStopWatch> watches;
    private AtomicLong idGenerator = new AtomicLong(0);
    private static HibernatePerformanceMonitor singleton = 
        new HibernatePerformanceMonitor();

    private HibernatePerformanceMonitor() {
        watches = new ConcurrentHashMap<Long, HibernateStatisticsStopWatch>();
    }

    public static HibernatePerformanceMonitor get() {
        return singleton;
    }

    public long start() {
        if (log.isDebugEnabled()) {
            HibernateStatisticsStopWatch watch = new HibernateStatisticsStopWatch();
            long id = idGenerator.incrementAndGet();
            watches.put(id, watch);
            watch.start();
            return id;
        }
        return 0;
    }

    public void stop(long id, String callingContext) {
        if (log.isDebugEnabled()) {
            HibernateStatisticsStopWatch watch = watches.remove(id);
            if (watch == null) {
                // could happen if debugging was enabled after start() was called
                return;
            }
            watch.stop();
            String caller = (callingContext == null ? "(unknown)" : callingContext);
            log.debug(watch.toString() + " for " + caller);
        }
    }
}

With these two items in place – the stopwatch class and the centralized, thread-safe manager for them – we’re now ready to instrument the application. If you want to monitor the performance of any particular block of code, all you have to do is:

long correlationId = HibernatePerformanceMonitor.get().start();
...
// some XYZ code block that appears not to perform well
...
HibernatePerformanceMonitor.get().stop(correlationId, "XYZ Code Block");

This, however, could get tedious if you had to instrument a few dozen places like this all over the code base. More problematic is the fact that you would have to be nearly clairvoyant to know where the issues might be, short of seeing significant slowness in your application during normal QA, seeing performance degradation during scale testing, or receiving customer-filed support tickets.

Therefore, it would be nice to get an overall profile of the application first, before digging down into specific parts and wrapping smaller code blocks to see how they perform in isolation. Since RHQ’s backend is written as a set of stateless session bean services, I can simply add an interceptor to get metrics for all method invocations across the entire system:

import javax.interceptor.AroundInvoke;
import javax.interceptor.InvocationContext;

public class PerformanceMonitorInterceptor {
    @AroundInvoke
    public Object monitorHibernatePerformance(InvocationContext context) 
    throws Exception {
        String callingContext = 
                context.getMethod().getDeclaringClass().getSimpleName() + "." + 
                context.getMethod().getName();
        long monitorId = HibernatePerformanceMonitor.get().start();
        try {
            return context.proceed();
        } finally {
            // stop the watch even if the invocation throws, so the monitor's
            // map does not accumulate orphaned stopwatches
            HibernatePerformanceMonitor.get().stop(monitorId, callingContext);
        }
    }
}

Make sure you add this interceptor declaration to your ejb-jar.xml file:

<ejb-jar>
   ...
   <assembly-descriptor>
      <interceptor-binding>
         <ejb-name>*</ejb-name>
         <interceptor-class>your.package.PerformanceMonitorInterceptor</interceptor-class>
      </interceptor-binding>
   </assembly-descriptor>
</ejb-jar>

Once this is all deployed, all you have to do is start clicking around your application and the performance information will start flowing to the server logs. Below is a snapshot of what is produced when visiting the inventory browser in RHQ:

01:04:03,165 DEBUG [HibernatePerformanceMonitor] 
    Stats[ queries=2, xactions=0, loads=0, connects=2, time=2 ]
    for ResourceGroupManagerBean.getResourceGroupCountByCategory 
01:04:03,166 DEBUG [HibernatePerformanceMonitor] 
    Stats[ queries=2, xactions=0, loads=0, connects=2, time=1 ]
    for GroupDefinitionManagerBean.getGroupDefinitionCount 
01:04:03,167 DEBUG [HibernatePerformanceMonitor] 
    Stats[ queries=13, xactions=0, loads=0, connects=13, time=12 ] 
    for ResourceBossBean.getInventorySummary 

Although these metrics will help you determine whether any individual SLSB method is a bottleneck, it is difficult to know how these methods are composed together to render larger functional chunks of your application. Since RHQ uses JSF/Facelets, we can capture the end-to-end cost of rendering a view with a ViewHandler:

import java.io.IOException;
import javax.faces.FacesException;
import javax.faces.application.ViewHandler;
import javax.faces.component.UIViewRoot;
import javax.faces.context.FacesContext;
import com.sun.facelets.FaceletViewHandler;

public class PerformanceMonitorViewHandler extends FaceletViewHandler {

    public PerformanceMonitorViewHandler(ViewHandler parent) {
        super(parent); // FaceletViewHandler has no default constructor
    }

    @Override
    public void renderView(FacesContext context, UIViewRoot viewToRender) 
    throws IOException, FacesException {
        String callingContext = "URL " + viewToRender.getViewId();
        long monitorId = HibernatePerformanceMonitor.get().start();
        super.renderView(context, viewToRender);
        HibernatePerformanceMonitor.get().stop(monitorId, callingContext);
    }
}

Make sure you add this view handler declaration to your faces-config.xml file:

<faces-config ...>
   <application>
      <view-handler>your.package.PerformanceMonitorViewHandler</view-handler>
   </application>
</faces-config>

Now, as you click around the application, the URL performance metrics appear at the end of each block of method-level metrics:

...
01:04:03,167 DEBUG [HibernatePerformanceMonitor] 
    Stats[ queries=13, xactions=0, loads=0, connects=13, time=12 ] 
    for ResourceBossBean.getInventorySummary 
...
2010-03-24 03:32:51,674 DEBUG [HibernatePerformanceMonitor] 
    Stats[ queries=19, xactions=27, loads=46, connects=64, time=622 ] 
    for URL /rhq/inventory/browseResources.xhtml 

OK, great. We have all of these performance metrics. Now what do they mean? How does one sift through the hundreds (if not thousands) of service calls to pinpoint application bottlenecks? Well…instead of paging through all of these raw metrics, it’s a lot easier to come up with a set of performance “equations”. Once you formulate these application-specific patterns for potential issues, you can trim down the log file by recording only the instances where certain thresholds were violated.

For instance, you might want to look for “slowness” in your application, which could be defined as some URL that takes longer than 3 seconds to load end-to-end, or an individual method that takes more than 500 milliseconds to return. This is easily obtained because the stopwatch records the amount of time between the calls to start() and stop(), and we’ve instrumented our codebase to wrap all SLSB method calls as well as all JSF views.

Another thing to look out for is the classic “N+1 selects” problem. Luckily, instances of this can be found by comparing the ratio of connects to query executions and entity loads. If you have many more connects than queries / loads (say 5 connects or more for each query / load) it’s likely the method in question suffers from the N+1 problem and is making many more trips to the database than should be necessary.
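
To make the pattern concrete, below is a minimal sketch of how N+1 selects typically arise and one common fix. The Order / items mapping is invented for illustration, assuming a lazy one-to-many collection and an open Hibernate Session:

// N+1 pattern: one query for the orders, plus one query per order for its items
List<Order> orders = session.createQuery("from Order").list();
for (Order order : orders) {
    order.getItems().size(); // each lazy initialization is another database round trip
}

// one possible fix: fetch the association in the same query
List<Order> ordersWithItems = session
        .createQuery("from Order o left join fetch o.items").list();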

Fortunately, the “N+1 selects” problem is not the only issue that can be detected using derivative metrics. It’s also possible to find excessive transactions by comparing the ratio of transactions to query executions / entity loads. If you find many more transactions than queries / loads, chances are you have a method already inside a transaction that is calling into other methods that start their own transactions. This may in fact be the intended semantics, but not always. Regardless, if the inner methods are called in tight loops, the number of transactions required to carry out a single business workflow can explode. Obtaining this derivative metric can therefore help determine whether your SLSB methods need to be refactored to carry out their business workflows in more appropriately sized transactional chunks. In general, though, it’s a good idea to be on the lookout for any single SLSB method that uses excessive transactions, regardless of the number of queries / loads.
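
As a concrete illustration, consider a hypothetical outer method that loops over its input and calls an inner method marked @REQUIRES_NEW through the EJB proxy; every iteration suspends the caller’s transaction and starts a fresh one. All names below are invented, and AuditManagerLocal stands in for the bean’s local business interface (not shown):

import java.util.List;

import javax.ejb.EJB;
import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;

@Stateless
public class AuditManagerBean implements AuditManagerLocal {
    @EJB
    private AuditManagerLocal self; // call through the EJB proxy so the annotation applies

    public void recordAll(List<String> messages) {
        for (String message : messages) {
            self.record(message); // one brand-new transaction per message
        }
    }

    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public void record(String message) {
        // persist a single audit entry in its own transaction
    }
}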

All of these rules can be captured in just a few lines of code. Instead of logging every single method / URL in the HibernatePerformanceMonitor, we can narrow our focus by logging only those invocations that violate our rules:

public class HibernatePerformanceMonitor {
    ...
    public void stop(long id, String logPrefix) {
        if (log.isDebugEnabled()) {
            HibernateStatisticsStopWatch watch = watches.remove(id);
            if (watch == null) {
                return; // could happen if debug logging was enabled after start() was skipped
            }
            watch.stop();
            if (watch.getTransactions() > 10) {
                debug(watch, "(too many xactions?)", logPrefix);
            }
            if (watch.getQueryExecutions() != 0) {
                if ((watch.getConnects() / (double) watch.getQueryExecutions()) >= 5.0) {
                    // might indicate need for LEFT JOIN FETCHes
                    debug(watch, "(N+1 issue?) ", logPrefix); 
                }
                if ((watch.getTransactions() / (double) watch.getQueryExecutions()) >= 5.0) {
                    // might indicate excessive @REQUIRES_NEW
                    debug(watch, "(xaction nesting?) ", logPrefix); 
                }
            }
            if (watch.getTime() > 3000) {
                // might indicate inefficient query or table contention
                debug(watch, "(slowness?) ", logPrefix); 
            }
        }
    }

    private void debug(HibernateStatisticsStopWatch watch, String cause, String logPrefix) {
        String callingContext = " for " + (logPrefix == null ? "(unknown)" : logPrefix);
        log.debug(watch.toString() + " " + cause + callingContext);
    }
}

It should be noted that these equations are only a starting point. First, the thresholds they use must be tweaked to match the performance targets for your application…but even then they will not find every bottleneck in the system. Also, just because some of your business methods or UI pages violate these thresholds does not necessarily make them candidates for improvement. A particular page might consistently take 20 seconds to render, which may be acceptable if it represents an executive report that aggregates lots of data across many subsystems. Another page might seem to use excessive transactions, which may likewise be fine if it is a system dashboard whose individual portlets are isolated from one another and request data independently.

So I leave it up to the reader to build on this solution and come up with increasingly sophisticated rules that operate off of application-specific thresholds and more accurately track potential issues within their environment.

Written by josephmarques

March 24, 2010 at 6:40 pm

Posted in hibernate


Git – Local Tracking Branches


Whenever I clone a new copy of the RHQ repository from git, I’m placed in a local tracking branch of master. However, I rarely (if ever) want to work directly in master. Instead, I want to work in one of the numerous feature branches we have for the project.

I could create local tracking branches for the lines of development I’m interested in following (‘git branch --track <branch> origin/<branch>’ for each), but that would be tedious considering there are nearly three dozen to choose from these days. Not only that, but I would need to repeat those commands each time I clone the repository (which I sometimes do so that I can work in two different branches simultaneously, as opposed to using ‘git stash’ and flipping back and forth between them), not to mention repeat them on every machine I work with (my home desktop, my work desktop, and my work laptop).

So I figured I would automate this by writing a quick shell script that would interrogate git, gather the lists of remote branches, and create local tracking branches automatically. I wanted to write the script such that it would set up local tracking branches for any newly created remote branches since the last time the script was run. Also, to keep things simple, I wanted the name of my local tracking branches to mirror the names of the remote branches. With those things in mind, I came up with:

#!/bin/bash

if [[ $# -ne 1 ]]
then
   echo "Usage: $0 <repository-directory>"
   exit 127
fi

cd "$1" || exit 1

# get all remotes except HEAD
remotes=`git branch -r | grep -v HEAD`
locals=`git branch -l`

locals_tracked=0;
total_remotes=0;
for remote_branch in $remotes 
do
   let "total_remotes++"

   # strip 'origin/' off of the remote_branch URL form
   local_branch=${remote_branch##origin/}
   
   already_had_local_branch=0
   for existing_local_branch in $locals 
   do
      if [[ "$existing_local_branch" == "$local_branch" ]] 
      then
         already_had_local_branch=1
         break
      fi
   done

   if [[ $already_had_local_branch == 0 ]] 
   then
      git branch --track $local_branch $remote_branch
      let "locals_tracked++"
   else
      echo "Already had local branch '$local_branch'"
   fi
done

echo "Found $total_remotes remote branches"
echo "Created $locals_tracked local tracking branches"

And here is how I use it. First, I clone the RHQ repository:

[joseph@marques-redhat] git clone ssh://git.fedorahosted.org/git/rhq/rhq.git rhq-repo

Next I run my script, passing as an argument the name of the directory I just created to store my local copy of the repository:

[joseph@marques-redhat] ./git-setup-locals.sh rhq-repo

The script will show you output along the lines of:

Branch agentPlugin set up to track remote branch agentPlugin from origin.
...
Already had local branch 'master'
...

And finally print some summary information at the bottom:

Found 34 remote branches
Created 33 local tracking branches

So what is this good for? Well, it enables me to quickly bounce around multiple lines of development with ease (using ‘git checkout <branch>’) and without having to worry about whether or not I have a local tracking branch set up for that feature branch yet.

—–

In the grand scheme of things, this single issue isn’t a huge win. However, if you’re not mindful of the little things, enough of them can add up over time and start to have a real effect on productivity. So I tend to automate these things sooner rather than later, so that I can forget about them and more easily focus on the code itself or the business problem at hand.

Written by josephmarques

March 17, 2010 at 3:46 pm

Posted in scm


Community Interaction in Open Source


I noticed this morning, through the built-in update mechanism, that Firefox 3.6 was available for consumption (this was on my Windows box). So I quickly stopped what I was doing, clicked on the button that was already staring me in the face, and about 15 seconds later (after restarting the browser) I was up and running with the latest and greatest.

I twisted my chair around to my F12 box, switched to my running Firefox instance, and opened the Help menu to force the update immediately…only to be reminded that the Fedora distribution has explicitly disabled the built-in update feature. Duh, of course — I’m supposed to be getting my updates from yum. OK, no biggie. I whipped open a terminal and typed ‘yum update firefox’, only to realize that my currently installed 3.5.8 was the latest and greatest available. After a few moments, my disappointment wore off and I realized this made perfect sense. If FF 3.6 was only just made available through the built-in mechanism, I gotta give the channels a few days (at the very least) to catch up.

Out of curiosity, though, I wanted to see if someone from the community could give me a ballpark estimate on when FF 3.6 might arrive. I hopped on freenode and joined #fedora. Here’s what happened:

me: anyone know when the firefox 3.6 update will be available through yum?
guy1: when it is built

I paused for a few moments, wondering if he was being gruff on purpose, or just trying to correct my phrasing to more closely match the vernacular I should have used. So I tried again…

me: ok, anyone know when the firefox 3.6 update will be built?
guy2: me: enough. No.

Well, I guess I knew how the first answer was supposed to be interpreted. Nonetheless, I felt inclined to finish on a somewhat positive note…

me: great, appreciate the pleasant community responses.  have a great day!

Looking back over that short interchange, I wonder whether it was really too hard just to say “sorry, we’re not sure when that will be prioritized into official channels.” Even something along the lines of “no one is working on that currently” would have been more descriptive and helpful. If they absolutely felt the need to save on keystrokes then “unsure” would have been sufficient.

I discussed what transpired with a friend of mine, and we agreed on how strange it is that open source people (not all of them, but certainly too many of them, IMO) feel they have a license to be jerks. The fact that I was tersely dismissed, however, was not the most frustrating part of this interaction. I wasn’t even all that disappointed to learn that Firefox 3.6 wasn’t available on F12 (yet). No – the most disheartening part of this interchange is that I…am…Red Hat.

Despite the fact that I’ll soon be entering my 5th year here working alongside Shadowman, my community experiences have largely been confined to the Red Hat-sponsored project I work on — RHQ. Despite the daily stresses that I or any of my teammates have had to deal with internally (in our quest to put out a great release each and every time), we always try to be pleasant, helpful, and encouraging to our community through the public forums / IRC / mailing lists. As a result, I blindly assumed that others under the same umbrella would be acting similarly.

If this kind of interchange had happened with some other company, I could have easily written it off as “<those guys> are jerks.” But it stung when I realized that this is how some individuals in a Red Hat-sponsored community were acting. Whether or not guy1 or guy2 are actually employed by Red Hat is irrelevant here. If someone on my community forums, or in my community IRC, or on my community mailing list gave a response like that to a community member…actions would be taken to correct that behavior. Open source is supposed to be about the free exchange of ideas…the theory that we can compete with the big corporations because our collaboration model is superior. But if people see simple questions like “is anyone working on <foo> yet?” being dismissed, then I can’t imagine they would feel all that comfortable about asking their own questions. And that’s just shameful.

Written by josephmarques

March 15, 2010 at 11:32 am

Posted in opensource
