Thoughts on…

Java Middleware & Systems Management

Resource Group Versatility


When you first download and install RHQ, you’ll log in to the web console and notice that there are two different types of grouping constructs for resources – mixed and compatible. In short, compatible groups must contain resources of the same type, whereas mixed groups need not. Under the covers, both are implemented by the exact same construct, but the meaning that has been applied to each, and what you can do with each of them, is why this post got the title it did.

Mixed groups are predominantly used for security, in particular, authorization. With them you can put all sorts of resources together – Windows and Linux platforms, IIS and Apache servers, etc. Then, you can attach that mixed group to a role, and any users in that role will be able to see those resources.

If you want to be able to give someone access to an entire box, then create a mixed group with the “recursive” option enabled. By turning that option on, any resource you add to the group automatically adds all descendant resources to the group as well. For instance, if you add a platform, it will indirectly add all servers under that platform, as well as all services under all of those servers, and so on.
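The sketch below shows how recursive membership expands from a single added resource. The `Resource` class, the `add_recursively` function, and the resource names are hypothetical illustrations, not RHQ's API. The sketch assumes only that the inventory forms a parent/child tree of platforms, servers, and services.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical stand-in for an inventoried platform, server, or service."""
    name: str
    children: list[Resource] = field(default_factory=list)


def add_recursively(members: set[str], resource: Resource) -> None:
    """Add a resource plus every descendant (children, grandchildren, ...) to the group."""
    pending = [resource]
    while pending:
        current = pending.pop()
        members.add(current.name)
        pending.extend(current.children)  # keep walking down to every depth


# Adding only the platform pulls in its servers and their services as well.
vhost = Resource("vhost www.example.com")
apache = Resource("Apache HTTP Server", [vhost])
platform = Resource("linux-box-01", [apache, Resource("File System /var")])

group_members: set[str] = set()
add_recursively(group_members, platform)
print(sorted(group_members))
```

Whoever holds a role attached to such a group would see the whole box, even though only the platform was added explicitly.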

While mixed groups have one thing they’re good at, compatible groups have an array of functionality they excel at providing. First and foremost is their “compatibility” with all of the other subsystems RHQ provides: monitoring, configuration, operations, etc.

For monitoring, RHQ shows aggregate and average metrics across the group members. For configuration, RHQ enables you to change the configured connection properties for every member of the group at the same time. For operations, RHQ allows you to execute the same operation against all resources in the group – at the same time, or serially (one after the other, in rolling fashion).
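The sketch below illustrates two of these group-level behaviors. It summarizes one metric across the members, then runs one operation either serially or concurrently. Everything in it is a hypothetical stand-in, not how RHQ implements these features: the names `aggregate_metric` and `run_group_operation`, the `jboss-*` members, and the choice of a min/average/max summary.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, Dict, List


def aggregate_metric(values: Dict[str, float]) -> Dict[str, float]:
    """Summarize one metric across all group members as min / average / max."""
    nums = list(values.values())
    return {"min": min(nums), "avg": mean(nums), "max": max(nums)}


def run_group_operation(
    members: List[str], operation: Callable[[str], str], serial: bool
) -> Dict[str, str]:
    """Invoke the same operation on every member, either rolling or all at once."""
    if serial:
        # Rolling fashion: each member finishes before the next one starts.
        return {m: operation(m) for m in members}
    # Concurrent: dispatch to all members at once, collect results in member order.
    with ThreadPoolExecutor() as pool:
        return dict(zip(members, pool.map(operation, members)))


heap_used_mb = {"jboss-1": 512.0, "jboss-2": 768.0, "jboss-3": 640.0}
print(aggregate_metric(heap_used_mb))

restart_order: List[str] = []


def restart(name: str) -> str:
    """Hypothetical operation; records the order in which members were hit."""
    restart_order.append(name)
    return f"{name} restarted"


rolling = run_group_operation(list(heap_used_mb), restart, serial=True)
parallel = run_group_operation(list(heap_used_mb), restart, serial=False)
```

The serial path guarantees ordering. That is the point of a rolling restart, since some members keep serving while others restart. The concurrent path trades that guarantee for speed.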

Very recently, a customer pointed out to me how groups – mixed and compatible – can be used in a novel way. Their question was simple: what’s the easiest method to see all of the resources in their environment that are down?

In order to do this today, you have to use the Browse Resources page, go to each tab in turn – platforms, servers, and services – and sort on the availability column. Granted, this is fairly easy to do and doesn’t take all that long, but wouldn’t it be nice to be able to automatically create a group that contained any and all resources that were down in the system?

OK, maybe your initial thought is “why not just use the Problem Resources portlet?” Well, a ‘problem’ resource isn’t necessarily one that is down. If a resource has ANY alerts, or any metrics that are more than 5% outside of their baseline range (a running average calculated over time automatically by RHQ), it will also show up in this portlet. This customer JUST wanted the unavailable resources.
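The difference between "problem" and "down" can be modeled as two separate predicates. This is a hypothetical model. `MetricSample`, `outside_baseline`, `is_problem`, and `is_down` are illustrative names. Reading "5% outside of their baseline range" as 5% beyond either bound is an assumption; RHQ's exact formula may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetricSample:
    """Hypothetical metric reading with its learned baseline bounds."""
    value: float
    baseline_min: float
    baseline_max: float


def outside_baseline(m: MetricSample, tolerance: float = 0.05) -> bool:
    """True if the value lies more than `tolerance` beyond either baseline bound.

    Assumes non-negative baselines; this is an illustration, not RHQ's formula.
    """
    return (m.value < m.baseline_min * (1 - tolerance)
            or m.value > m.baseline_max * (1 + tolerance))


def is_problem(alert_count: int, metrics: List[MetricSample]) -> bool:
    """A 'problem' resource: any alert at all, or any metric out of its baseline."""
    return alert_count > 0 or any(outside_baseline(m) for m in metrics)


def is_down(availability: str) -> bool:
    """Only availability decides whether a resource is down."""
    return availability == "DOWN"


# An UP server whose response time drifted above its baseline is a "problem",
# yet it is not down, so it is noise for someone who only wants DOWN resources.
drifting = [MetricSample(value=120.0, baseline_min=50.0, baseline_max=100.0)]
print(is_problem(0, drifting), is_down("UP"))
```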

Alright, and maybe your second thought was “well, why not use alerts?” Today, we can fire alerts when a resource goes down, and you CAN use the notification mechanism so that you get an email when this happens. However, there are at least two problems with this strategy:

Problem 1

Alerts are only good at telling you what JUST happened in the system. Alerts will be created as the result of some agent sending data up to the server, such as an availability report or the results of an operation. So, if you already have resources that are down before you set up your alert definitions, you will not be notified because those resources were already down.
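Problem 1 is the difference between an edge-triggered alert and a query over current state. The sketch below uses a simplified model in which an availability alert fires only when a report moves a resource into DOWN. This model is an assumption. The functions `goes_down_alerts` and `currently_down`, and the resource ids, are hypothetical.

```python
from typing import Dict, List

Availability = Dict[str, str]  # resource id -> "UP" or "DOWN"


def goes_down_alerts(known: Availability, report: Availability) -> List[str]:
    """Edge-triggered: fire only for resources that change to DOWN in this report."""
    fired = [rid for rid, avail in report.items()
             if avail == "DOWN" and known.get(rid) != "DOWN"]
    known.update(report)  # the server now remembers the latest state
    return fired


def currently_down(known: Availability) -> List[str]:
    """Level query: everything DOWN right now, regardless of when it went down."""
    return sorted(rid for rid, avail in known.items() if avail == "DOWN")


# iis-01 was already DOWN before any alert definition existed.
known_state = {"iis-01": "DOWN", "apache-01": "UP"}
fired = goes_down_alerts(known_state, {"iis-01": "DOWN", "apache-01": "DOWN"})
print(fired)
print(currently_down(known_state))
```

Only `apache-01` triggers an alert, while the state query reports both resources. A group populated by availability behaves like the second function.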

Problem 2

Setting up availability alerts across ALL resources in the system will take a while. A lot of time could be saved by using the alert templates feature (Administration > Monitoring Defaults), which would make sure that all existing resources (and any resources that are imported in the future) automatically have alert definitions created for them. However, you’d still have to set up one template for every single resource type in the system, which, depending on how many plugins you have installed, could mean several dozen templates to create. Also, for each of those alert templates, you’d have to set up identical notification rules too, which takes more time still.

Interestingly enough, before I could even reply to the customer, they suggested a solution – a feature enhancement, to be precise – which would do the trick. They wanted to extend DynaGroups to be able to aggregate resources by availability.

I was floored by the simplicity of this suggestion. In fact, I sort of recall rubbing my eyes looking to wake up from a dream, because I thought it was so incredible that the development team hadn’t thought of this before. And I wasted no time creating the issue in JIRA to track this request.

Anyone who knows me probably already guessed I had the fix locally within an hour, but because the request came in during the final seconds before the 1.1 release, I held off on committing it. As soon as SVN was unlocked for 1.2 development, though, it was one of the first commits.

If you’re building off of trunk (or running anything rev1730 or greater), it’s easy to create a Group Definition that will always keep a DynaGroup populated with the resources that are unavailable.

resource.availability = DOWN
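Conceptually, a single-expression group definition filters the inventory. This minimal sketch models each resource as a flat dictionary keyed by the same paths the expression uses. That representation, the `evaluate` helper, and the sample plugin and type names are illustrative assumptions, not RHQ's data model.

```python
from typing import Dict, List

Resource = Dict[str, str]  # hypothetical: expression path -> value

inventory: List[Resource] = [
    {"name": "iis-01", "resource.type.plugin": "IIS",
     "resource.type.name": "IIS Server", "resource.availability": "DOWN"},
    {"name": "apache-01", "resource.type.plugin": "Apache",
     "resource.type.name": "Apache HTTP Server", "resource.availability": "UP"},
    {"name": "/mnt/data", "resource.type.plugin": "Platforms",
     "resource.type.name": "File System", "resource.availability": "DOWN"},
]


def evaluate(resources: List[Resource], path: str, value: str) -> List[str]:
    """Evaluate one 'path = value' expression; return the matching resource names."""
    return [r["name"] for r in resources if r.get(path) == value]


down_now = evaluate(inventory, "resource.availability", "DOWN")
print(down_now)

# Membership follows the inventory: re-evaluating after a state change updates the group.
inventory[1]["resource.availability"] = "DOWN"
down_later = evaluate(inventory, "resource.availability", "DOWN")
print(down_later)
```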

But let’s say you are monitoring a very large inventory, and want to break things down further to keep the groups more granular. For example, let’s say you wanted to create different DynaGroups for each type of resource that’s down. This way you can look at your IIS servers that have failed, independent from your Apache vhosts that aren’t up, separate from your File Systems that aren’t at their expected mount points. That expression set would be as follows:

resource.availability = DOWN
groupby resource.type.plugin
groupby resource.type.name
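The `groupby` lines can be read as a two-step process. First filter on availability, then split the matches into one group per distinct combination of plugin and type name. The sketch below models that process. It assumes resources are hypothetical flat dictionaries keyed by expression path, and the type names are illustrative. `down_groups_by_type` is not an RHQ function.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Resource = Dict[str, str]  # hypothetical: expression path -> value

inventory: List[Resource] = [
    {"name": "iis-01", "resource.type.plugin": "IIS",
     "resource.type.name": "IIS Server", "resource.availability": "DOWN"},
    {"name": "iis-02", "resource.type.plugin": "IIS",
     "resource.type.name": "IIS Server", "resource.availability": "DOWN"},
    {"name": "www.example.com", "resource.type.plugin": "Apache",
     "resource.type.name": "Apache Virtual Host", "resource.availability": "DOWN"},
    {"name": "apache-01", "resource.type.plugin": "Apache",
     "resource.type.name": "Apache HTTP Server", "resource.availability": "UP"},
]


def down_groups_by_type(resources: List[Resource]) -> Dict[Tuple[str, str], List[str]]:
    """Keep DOWN resources, then partition them by (plugin, type name)."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for r in resources:
        if r["resource.availability"] != "DOWN":
            continue  # the filter expression runs before any grouping
        key = (r["resource.type.plugin"], r["resource.type.name"])
        groups[key].append(r["name"])
    return dict(groups)


for key, members in down_groups_by_type(inventory).items():
    print(key, members)
```

In this sketch, a resource type with nothing down produces no group at all, because groups are created only for combinations that actually appear among the matches.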

But maybe that creates too many groups, or gives you results for resource types you aren’t interested in. Let’s say you want to focus your search because you only care about one specific type of resource failing, maybe just your Apache servers. Instead of grouping by the plugin and resource type, specify those pieces of information exactly:

resource.availability = DOWN
resource.type.plugin = Apache
resource.type.name = Apache HTTP Server
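When every expression names an exact value, the definition acts as a conjunction: a resource joins the group only if all of the lines match. The sketch below assumes the lines are ANDed together and models resources as hypothetical flat dictionaries keyed by expression path. `evaluate_all` and the sample inventory are illustrative, not part of RHQ.

```python
from typing import Dict, List

Resource = Dict[str, str]  # hypothetical: expression path -> value


def evaluate_all(resources: List[Resource], expressions: Dict[str, str]) -> List[str]:
    """Return resources that satisfy every 'path = value' expression at once."""
    return [r["name"] for r in resources
            if all(r.get(path) == value for path, value in expressions.items())]


definition = {
    "resource.availability": "DOWN",
    "resource.type.plugin": "Apache",
    "resource.type.name": "Apache HTTP Server",
}

inventory: List[Resource] = [
    {"name": "apache-01", "resource.type.plugin": "Apache",
     "resource.type.name": "Apache HTTP Server", "resource.availability": "DOWN"},
    {"name": "apache-02", "resource.type.plugin": "Apache",
     "resource.type.name": "Apache HTTP Server", "resource.availability": "UP"},
    {"name": "www.example.com", "resource.type.plugin": "Apache",
     "resource.type.name": "Apache Virtual Host", "resource.availability": "DOWN"},
    {"name": "iis-01", "resource.type.plugin": "IIS",
     "resource.type.name": "IIS Server", "resource.availability": "DOWN"},
]

down_apache_servers = evaluate_all(inventory, definition)
print(down_apache_servers)
```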

Thus, in a roundabout way, resource groups can actually be used as indirect tools for monitoring the health of your platforms, servers, and services.

This, however, just scratches the surface in terms of how groups can be used to monitor your enterprise. One major focus for the 1.2 release of RHQ is going to be on cluster management. Remember, compatible groups serve as a natural way of exposing RHQ subsystems at the group-level. So expect to see lots of new group-level services and UI functionality.

At the time of this writing, the requirements for cluster support were in their infancy, but we encourage you to read the latest requirements and post your ideas back to the resource clustering thread in the forums.


Written by josephmarques

October 17, 2008 at 7:28 pm

Posted in rhq
