Thoughts on…

Java Middleware & Systems Management


Performance Monitoring using Hibernate


Performance monitoring is, by its very nature, a slippery slope. Once you start looking for inefficiencies in code, it’s easy to get carried away wanting to optimize every last line for supreme efficiency. It’s thus important to balance the value of an optimization versus the time spent investigating the proper fix, the effort and difficulty required to implement it, and its importance relative to other things on the roadmap.

Every environment, every team, and every piece of software is different – so I won’t dare try to formulate any hard and fast rules for what is an appropriate or inappropriate amount of optimization. Suffice it to say, before any decisions can be made as to whether an optimization should be implemented, the nature of the problem must be understood. Below I will show how to use functionality built into Hibernate to help understand where the performance issues in your application lie as well as the nature of each.

-----

The first thing to do is to turn on Hibernate Statistics.
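In Hibernate, statistics collection is off by default. It can be enabled declaratively with the hibernate.generate_statistics property, for example in persistence.xml:

```xml
<property name="hibernate.generate_statistics" value="true"/>
```

Alternatively, it can be toggled at runtime via sessionFactory.getStatistics().setStatisticsEnabled(true), which is handy when you only want the collection overhead during a diagnostic session.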

Once enabled, this service will start recording metrics on a per-session-factory basis. However, we’re going to want to collect “windows” of information from the statistics object. These windows will help us answer questions such as how many entities were loaded during a particular session bean method call or how many database roundtrips were required to render an entire page of the UI.

So let’s implement a kind of “stop watch” that will take a snapshot of some of the more interesting metrics offered through the statistics API. We’ll record the values of the metrics at two different points in time, and then calculate the delta metrics for hand-off to a logger or some analyzer.

public class HibernateStatisticsStopWatch {

    private org.hibernate.stat.Statistics stats;

    // some interesting metrics
    long queryExecutions;
    long transactions;
    long entityLoads;
    long connects;
    long time;

    public HibernateStatisticsStopWatch() {
        // get the statistics service from the MBeanServer
        // (or, outside a JMX environment, via sessionFactory.getStatistics())
        this.stats = null; // replaced by the actual lookup
    }

    public void start() {
        queryExecutions = -stats.getQueryExecutionCount();
        transactions = -stats.getTransactionCount();
        entityLoads = -stats.getEntityLoadCount();
        connects = -stats.getConnectCount();
        time = -System.currentTimeMillis();
    }

    public void stop() {
        queryExecutions += stats.getQueryExecutionCount();
        transactions += stats.getTransactionCount();
        entityLoads += stats.getEntityLoadCount();
        connects += stats.getConnectCount();
        time += System.currentTimeMillis();
    }

    // getter methods for various delta metrics

    @Override
    public String toString() {
        return "Stats" 
            + "[ queries=" + queryExecutions 
            + ", xactions=" + transactions 
            + ", loads=" + entityLoads 
            + ", connects=" + connects 
            + ", time=" + time + " ]";
    }
}
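The sign trick in start() and stop() above deserves a brief note: start() seeds each field with the negative of the current counter reading, so stop() only has to add the later reading to be left with the delta. A tiny self-contained sketch of the same idiom, using a plain long in place of a Hibernate Statistics counter:

```java
public class DeltaDemo {
    public static void main(String[] args) {
        long queryCount = 0;          // stand-in for stats.getQueryExecutionCount()

        long delta = -queryCount;     // start(): seed with the negative reading
        queryCount += 7;              // ...seven queries execute in between...
        delta += queryCount;          // stop(): add the later reading

        System.out.println(delta);    // prints 7
    }
}
```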

Since the idea is to instrument all different parts of the codebase with these stopwatches, things could get rather hairy and difficult to manage over time. That’s why it’s prudent to create a central authority to manage them. The centralization will make it easy to turn off/on the logging across the system, or even enable/disable the stopwatches themselves.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class HibernatePerformanceMonitor {
    private static final Log log = LogFactory.getLog(HibernatePerformanceMonitor.class);

    private ConcurrentMap<Long, HibernateStatisticsStopWatch> watches;
    private AtomicLong idGenerator = new AtomicLong(0);
    private static HibernatePerformanceMonitor singleton = 
        new HibernatePerformanceMonitor();

    private HibernatePerformanceMonitor() {
        watches = new ConcurrentHashMap<Long, HibernateStatisticsStopWatch>();
    }

    public static HibernatePerformanceMonitor get() {
        return singleton;
    }

    public long start() {
        if (log.isDebugEnabled()) {
            HibernateStatisticsStopWatch watch = new HibernateStatisticsStopWatch();
            long id = idGenerator.incrementAndGet();
            watches.put(id, watch);
            watch.start();
            return id;
        }
        return 0;
    }

    public void stop(long id, String callingContext) {
        if (log.isDebugEnabled()) {
            HibernateStatisticsStopWatch watch = watches.remove(id);
            if (watch == null) {
                // could happen if debugging was enabled after start() was called
                return;
            }
            watch.stop();
            String caller = (callingContext == null ? "(unknown)" : callingContext);
            log.debug(watch.toString() + " for " + caller);
        }
    }
}

With these two items in place – the stopwatch class and the centralized, thread-safe manager for them – we’re now ready to instrument the application. If you want to monitor the performance of any particular block of code, all you have to do is:

long correlationId = HibernatePerformanceMonitor.get().start();
...
// some XYZ code block that appears not to perform well
...
HibernatePerformanceMonitor.get().stop(correlationId, "XYZ Code Block");

This, however, could get tedious if you had to instrument a few dozen places like this all over the code base. More problematic is the fact that you have to be nearly clairvoyant to know where issues might be without either seeing significant slowness in your application during normal QA, seeing performance degradation during scale testing, or having customer-filed support tickets.

Therefore, it would be nice to get an overall profile of the application first, before digging down into specific parts and wrapping smaller code blocks to see how they perform in isolation. Since RHQ’s backend is written as a set of stateless session bean services, I can simply add an interceptor to get metrics for all method invocations across the entire system:

import javax.interceptor.AroundInvoke;
import javax.interceptor.InvocationContext;

public class PerformanceMonitorInterceptor {
    @AroundInvoke
    public Object monitorHibernatePerformance(InvocationContext context) 
    throws Exception {
        String callingContext = 
                context.getMethod().getDeclaringClass().getSimpleName() + "." + 
                context.getMethod().getName();
        long monitorId = HibernatePerformanceMonitor.get().start();
        try {
            return context.proceed();
        } finally {
            // stop in a finally block so a failed invocation doesn't leak its watch
            HibernatePerformanceMonitor.get().stop(monitorId, callingContext);
        }
    }
}

Make sure you add this interceptor declaration to your ejb-jar.xml file:

<ejb-jar>
   ...
   <assembly-descriptor>
      <interceptor-binding>
         <ejb-name>*</ejb-name>
         <interceptor-class>your.package.PerformanceMonitorInterceptor</interceptor-class>
      </interceptor-binding>
   </assembly-descriptor>
</ejb-jar>
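The * wildcard above binds the interceptor to every bean in the module. If you would rather instrument a single bean while trying out the approach, the same element accepts a bean name instead (ResourceBossBean here is just an illustrative name):

```xml
<interceptor-binding>
   <ejb-name>ResourceBossBean</ejb-name>
   <interceptor-class>your.package.PerformanceMonitorInterceptor</interceptor-class>
</interceptor-binding>
```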

Once this is all deployed, all you have to do is start clicking around your application and the performance information will start flowing to the server logs. Below is a snapshot of what is produced when visiting the inventory browser in RHQ:

01:04:03,165 DEBUG [HibernatePerformanceMonitor] 
    Stats[ queries=2, xactions=0, loads=0, connects=2, time=2 ]
    for ResourceGroupManagerBean.getResourceGroupCountByCategory 
01:04:03,166 DEBUG [HibernatePerformanceMonitor] 
    Stats[ queries=2, xactions=0, loads=0, connects=2, time=1 ]
    for GroupDefinitionManagerBean.getGroupDefinitionCount 
01:04:03,167 DEBUG [HibernatePerformanceMonitor] 
    Stats[ queries=13, xactions=0, loads=0, connects=13, time=12 ] 
    for ResourceBossBean.getInventorySummary 

Although these metrics will help you determine whether any individual SLSB method is a bottleneck, it is difficult to know how these methods are composed together to render larger functional chunks of your application. Since RHQ uses JSF/Facelets, we can capture the end-to-end cost of rendering a view with a ViewHandler:

public class PerformanceMonitorViewHandler extends FaceletViewHandler {
    ...
    @Override
    public void renderView(FacesContext context, UIViewRoot viewToRender) 
    throws IOException, FacesException {
        String callingContext = "URL " + viewToRender.getViewId();
        long monitorId = HibernatePerformanceMonitor.get().start();
        try {
            super.renderView(context, viewToRender);
        } finally {
            // stop in a finally block so a failed render doesn't leak its watch
            HibernatePerformanceMonitor.get().stop(monitorId, callingContext);
        }
    }
}

Make sure you add this view handler declaration to your faces-config.xml file:

<faces-config ...>
   <application>
      <view-handler>your.package.PerformanceMonitorViewHandler</view-handler>
   </application>
</faces-config>

Now as you click around the application, the URL performance metrics appear at the end of each block of method-level metrics:

...
01:04:03,167 DEBUG [HibernatePerformanceMonitor] 
    Stats[ queries=13, xactions=0, loads=0, connects=13, time=12 ] 
    for ResourceBossBean.getInventorySummary 
...
2010-03-24 03:32:51,674 DEBUG [HibernatePerformanceMonitor] 
    Stats[ queries=19, xactions=27, loads=46, connects=64, time=622 ] 
    for URL /rhq/inventory/browseResources.xhtml 

OK, great. We have all of these performance metrics. Now what do they mean? How does one sift through the hundreds (if not thousands) of service calls to pinpoint application bottlenecks? Well…instead of paging through all of these raw metrics, it’s a lot easier to come up with a set of performance “equations”. Once you formulate these application-specific patterns to look for potential issues, you can trim down the log file by recording only the instances where certain thresholds were violated.

For instance, you might want to look for “slowness” in your application, which could be defined as some URL that takes longer than 3 seconds to load end-to-end, or an individual method that takes more than 500 milliseconds to return. This is easily obtained because the stopwatch records the amount of time between the calls to start() and stop(), and we’ve instrumented our codebase to wrap all SLSB method calls as well as all JSF views.

Another thing to look out for is the classic “N+1 selects” problem. Luckily, instances of this can be found by comparing the ratio of connects to query executions and entity loads. If you have many more connects than queries / loads (say 5 connects or more for each query / load) it’s likely the method in question suffers from the N+1 problem and is making many more trips to the database than should be necessary.
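When the ratio does point to an N+1 pattern, the usual remedy is a fetch join, which hydrates the collection in the same select instead of issuing one query per element. An illustrative HQL sketch against hypothetical Parent/Child entities (not RHQ’s actual model):

```sql
SELECT DISTINCT p
FROM Parent p
LEFT JOIN FETCH p.children
WHERE p.category = :someCategory
```

The DISTINCT matters here: the join multiplies the parent rows by the number of children, and DISTINCT collapses the duplicates at the entity level.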

Fortunately, the “N+1 selects” problem is not the only issue that can be determined using derivative metrics. It’s also possible to find excessive transactions by comparing the ratio of transactions to query executions / entity loads. If you find many more transactions than queries / loads, chances are you have a method already inside a transaction that is calling into other methods that start their own transactions. This may in fact be the intended semantics, but not always. Regardless, if the inner methods are called in tight loops, the number of transactions required to carry out a single business workflow can explode. So obtaining this derivative metric can help determine whether your SLSB methods need to be refactored to carry out the business workflows in more appropriately sized transactional chunks. In general, though, it’s a good idea to be on the lookout for any single SLSB method that uses excessive transactions, regardless of the number of queries / loads.

All of these rules can be captured in just a few lines of code. Instead of logging every single method / URL in the HibernatePerformanceMonitor, we can narrow our focus by logging only those invocations that violate our rules:

public class HibernatePerformanceMonitor {
    ...
    public void stop(long id, String logPrefix) {
        if (log.isDebugEnabled()) {
            HibernateStatisticsStopWatch watch = watches.remove(id);
            if (watch == null) {
                return; // could happen if debugging was turned on and the start() call was already skipped
            }
            watch.stop();
            if (watch.getTransactions() > 10) {
                debug(watch, "(too many xactions?)", logPrefix);
            }
            if (watch.getQueryExecutions() != 0) {
                if ((watch.getConnects() / (double) watch.getQueryExecutions()) >= 5.0) {
                    // might indicate need for LEFT JOIN FETCHes
                    debug(watch, "(N+1 issue?)", logPrefix);
                }
                if ((watch.getTransactions() / (double) watch.getQueryExecutions()) >= 5.0) {
                    // might indicate excessive @REQUIRES_NEW
                    debug(watch, "(xaction nesting?)", logPrefix);
                }
            }
            if (watch.getTime() > 3000) {
                // might indicate inefficient query or table contention
                debug(watch, "(slowness?)", logPrefix);
            }
        }
    }

    private void debug(HibernateStatisticsStopWatch watch, String cause, String logPrefix) {
        String callingContext = " for " + (logPrefix == null ? "(unknown)" : logPrefix);
        log.debug(watch.toString() + " " + cause + callingContext);
    }
}

It should be noted that these equations are only a starting point. First, the thresholds they use must be tuned to match the performance targets for your application…but even then they will not find every bottleneck in the system. Also, just because some of your business methods or UI pages violate these thresholds does not mean they are necessarily candidates for improvement. It may be the case that a particular page is known to consistently take 20 seconds to render, but that might be OK if it represents an executive report which aggregates lots of data across many subsystems. Another page might seem to use excessive transactions, but that might be OK if it is a system dashboard whose individual portlets are isolated from one another and request data independently.

So I leave it up to the reader to build on this solution and come up with increasingly sophisticated rules that operate off of application-specific thresholds and more accurately track potential issues within their environment.

Written by josephmarques

March 24, 2010 at 6:40 pm

Posted in hibernate


Hibernate Many-To-Many Revisited


The modeling problem is classic: you have two entities, say Users and Roles, which have a many-to-many relationship with one another. In other words, each user can be in multiple roles, and each role can have multiple users associated with it.

The schema is pretty standard and would look like:

CREATE TABLE app_user ( 
   id INTEGER,
   PRIMARY KEY ( id ) );

CREATE TABLE app_role (
   id INTEGER,
   PRIMARY KEY ( id ) );

CREATE TABLE app_user_role ( 
   user_id INTEGER,
   role_id INTEGER,
   PRIMARY KEY ( user_id, role_id ),
   FOREIGN KEY ( user_id ) REFERENCES app_user ( id ),
   FOREIGN KEY ( role_id ) REFERENCES app_role ( id ) );

But there are really two choices for how you want to expose this at the Hibernate / EJB3 layer. The first strategy employs the use of the @ManyToMany annotation:

@Entity 
@Table(name = "APP_USER")
public class User {
    @Id
    private Integer id;
    
    @ManyToMany
    @JoinTable(name = "APP_USER_ROLE", 
       joinColumns = { @JoinColumn(name = "USER_ID") }, 
       inverseJoinColumns = { @JoinColumn(name = "ROLE_ID") })
    private Set<Role> roles = new HashSet<Role>();
}

@Entity 
@Table(name = "APP_ROLE")
public class Role {
    @Id
    private Integer id;
    
    @ManyToMany(mappedBy = "roles")
    private Set<User> users = new HashSet<User>();
}
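One practical note on the @ManyToMany mapping above: Hibernate persists whatever is in the owning-side collection, but keeping the inverse side consistent in memory is the application’s job. A minimal POJO sketch (JPA annotations omitted, class shape simplified) of a helper that maintains both sides:

```java
import java.util.HashSet;
import java.util.Set;

public class UserRoleDemo {
    static class Role {
        Set<User> users = new HashSet<User>();
    }

    static class User {
        Set<Role> roles = new HashSet<Role>();

        void addRole(Role role) {
            roles.add(role);       // owning side (maps to the APP_USER_ROLE rows)
            role.users.add(this);  // inverse side (mappedBy = "roles")
        }
    }

    public static void main(String[] args) {
        User user = new User();
        Role role = new Role();
        user.addRole(role);
        // both sides now agree
        System.out.println(user.roles.contains(role) && role.users.contains(user)); // prints true
    }
}
```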

The second strategy uses a set of @ManyToOne mappings and requires the creation of a third “mapping” entity (note that an @IdClass must also implement Serializable and define equals()/hashCode(); both are omitted here for brevity):

public class UserRolePK {
    @ManyToOne
    @JoinColumn(name = "USER_ID", referencedColumnName = "ID")
    private User user;

    @ManyToOne
    @JoinColumn(name = "ROLE_ID", referencedColumnName = "ID")
    private Role role;
}

@Entity @IdClass(UserRolePK.class) 
@Table(name = "APP_USER_ROLE")
public class UserRole {
    @Id
    private User user;

    @Id
    private Role role;
}

@Entity 
@Table(name = "APP_USER")
public class User {
    @Id
    private Integer id;
    
    @OneToMany(mappedBy = "user")
    private Set<UserRole> userRoles;
}

@Entity 
@Table(name = "APP_ROLE")
public class Role {
    @Id
    private Integer id;
    
    @OneToMany(mappedBy = "role")
    private Set<UserRole> userRoles;
}

The most obvious pro for the @ManyToMany solution is simpler data retrieval queries. The annotation automagically generates the proper SQL under the covers, and allows access to data from the other side of the linking table with a simple join at the HQL/JPQL level. For example, to get the roles for some user:

SELECT r 
FROM User u 
JOIN u.roles r 
WHERE u.id = :someUserId

You can still retrieve the same data with the other solution, but it’s not as elegant. It requires traversing from a user to the userRoles relationship, and then accessing the roles associated with those mapping entities:

SELECT ur.role 
FROM User u 
JOIN u.userRoles ur 
WHERE u.id = :someUserId
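It’s worth noting that both HQL queries above should resolve to essentially the same SQL against the schema from earlier; the difference between the two strategies is purely at the object level. Roughly (a sketch; the exact aliases Hibernate generates will differ):

```sql
SELECT r.id
FROM app_user u
JOIN app_user_role ur ON ur.user_id = u.id
JOIN app_role r ON r.id = ur.role_id
WHERE u.id = ?
```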

The inelegance of the second strategy becomes even clearer when several many-to-many relationships must be traversed in a single query. If you had to use explicit mapping entities for each join table, the query would look like:

SELECT threeFour.four
FROM One one 
JOIN one.oneTwos oneTwo 
JOIN oneTwo.two.twoThrees twoThree 
JOIN twoThree.three.threeFours threeFour
WHERE one.id = :someId

Whereas using @ManyToMany annotations exclusively would result in a query of the following form:

SELECT four 
FROM One one 
JOIN one.twos two 
JOIN two.threes three 
JOIN three.fours four 
WHERE one.id = :someId

Some readers might wonder why, if we have explicit mapping table entities, we don’t just use them directly to make the query a little more intelligible / human-readable:

SELECT threeFour.four
FROM OneTwo oneTwo, TwoThree twoThree, ThreeFour threeFour
WHERE oneTwo.two = twoThree.two
AND twoThree.three = threeFour.three
AND oneTwo.one.id = :someId

Although I agree this query may be slightly easier to understand at a glance (especially if you’re used to writing native SQL), it definitely doesn’t save on keystrokes. Aside from that, it starts to pull away from thinking about your data model purely in terms of its high-level object relations.

In a read-mostly system, where access to data is the most frequent operation, it just makes sense to use the @ManyToMany mapping strategy. It achieves the goal while keeping the queries as simple and straightforward as possible.

However, elegance of select-statements should not be the only point considered when choosing a strategy. The more elaborate solution using the explicit mapping entities does have its merits. Consider the problem of having to delete users that have properties matching a specific condition, which, due to the foreign keys, also requires deleting the user-role relationships matching that same criteria:

DELETE FROM UserRole ur 
WHERE ur.user.id IN ( 
   SELECT u.id 
   FROM User u 
   WHERE u.someProperty = :someInterestingValue );
DELETE FROM User u WHERE u.someProperty = :someInterestingValue;

If the mapping entity did not exist, the matching user objects would have to be loaded into the session, traversed one at a time, and have all of their roles removed…after which, the user objects themselves could be deleted from the system. If your application only had a handful of users that matched this condition, either solution would probably perform just fine.

But what if you had tens of millions of users in your system, and this query happened to match 10% of them? (OK, perhaps this particular scenario is a bit contrived, but there *are* plenty of applications out there where the number of many-to-many relationships numbers in the tens of millions or more.) The logic would have to load more than a million users across the wire from the database which, as a result, might require you to implement a manual batching mechanism. You would load, say, 1000 users into memory at once, operate on them, flush/clear the session, then load the next batch, and so on. Memory requirements aside, you might find the transaction takes too long or might even time-out. In this case, you would need to execute each of the batches inside its own transaction, driving the process from outside of a transactional context.

Unfortunately, the data-load isn’t the only issue. The actual deletion work has problems too. You’re going to have to, for each user in turn, remove all of its roles (e.g., “user.getRoles().clear()”) and then delete the user itself (e.g., “entityManager.remove(user)”). These operations translate into two native SQL delete statements for each matched user – one to remove the related entries from the app_user_role table, and the other to remove the user itself from the app_user table.

All of these performance issues stem from the fact that a large amount of data has to be loaded across the wire and then manipulated, which results in a number of roundtrips proportional to the number of rows that match the criteria. However, by creating the mapping entity, it becomes possible to execute everything in two statements, neither of which loads any data across the wire.

So what’s the right solution? Well, the interesting thing about this problem space is that the two solutions described above are not mutually exclusive. There’s nothing that prevents you from using both of them simultaneously:

public class UserRolePK {
    @ManyToOne
    @JoinColumn(name = "USER_ID", referencedColumnName = "ID")
    private User user;

    @ManyToOne
    @JoinColumn(name = "ROLE_ID", referencedColumnName = "ID")
    private Role role;
}

@Entity @IdClass(UserRolePK.class) 
@Table(name = "APP_USER_ROLE")
public class UserRole {
    @Id
    private User user;

    @Id
    private Role role;
}

@Entity 
@Table(name = "APP_USER")
public class User {
    @Id
    private Integer id;
    
    @OneToMany(mappedBy = "user")
    private Set<UserRole> userRoles;
    
    @ManyToMany
    @JoinTable(name = "APP_USER_ROLE", 
       joinColumns = { @JoinColumn(name = "USER_ID") }, 
       inverseJoinColumns = { @JoinColumn(name = "ROLE_ID") })
    private Set<Role> roles = new HashSet<Role>();
}

@Entity 
@Table(name = "APP_ROLE")
public class Role {
    @Id
    private Integer id;
    
    @OneToMany(mappedBy = "role")
    private Set<UserRole> userRoles;
    
    @ManyToMany(mappedBy = "roles")
    private Set<User> users = new HashSet<User>();
}

This hybrid solution actually gives you the best of both worlds: elegant queries and efficient updates to the linking table. Granted, the boilerplate to set up all the mappings might seem tedious, but that extra effort is well worth the pay-off.

Written by josephmarques

February 22, 2010 at 12:52 pm

Posted in hibernate
