Thoughts on…

Java Middleware & Systems Management

Resource Group Versatility

When you first download and install RHQ, you’ll log in to the web console and notice that there are two different types of grouping constructs for resources – mixed and compatible. In short, every member of a compatible group must be of the same resource type, whereas a mixed group can hold resources of any type. Under the covers, both are implemented by the exact same construct, but the meaning applied to each, and what you can do with each, is why this post got the title it did.

Mixed groups are predominantly used for security, in particular, authorization. With them you can put all sorts of resources together – Windows and Linux platforms, IIS and Apache servers, etc. Then, you can attach that mixed group to a role, and any users in that role will be able to see those resources.

If you want to be able to give someone access to an entire box, then create a mixed group with the “recursive” option enabled. By turning that option on, any resource you add to the group automatically adds all descendant resources to the group as well. For instance, if you add a platform, it will indirectly add all servers under that platform, as well as all services under all of those servers, and so on.
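The effect of the “recursive” option can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not RHQ’s implementation; the Resource class, the add_to_group function, and the resource names are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    # Hypothetical stand-in for an inventory resource (not an RHQ class).
    name: str
    children: list = field(default_factory=list)


def descendants(resource):
    """Yield every resource below `resource` in the hierarchy."""
    for child in resource.children:
        yield child
        yield from descendants(child)


def add_to_group(members, resource, recursive):
    """Add `resource` to the member set; if recursive, add its descendants too."""
    members.add(resource.name)
    if recursive:
        members.update(d.name for d in descendants(resource))


# A platform with one server, which in turn has one service.
vhost = Resource("vhost-a")
apache = Resource("apache", [vhost])
platform = Resource("linux-box", [apache])

flat, deep = set(), set()
add_to_group(flat, platform, recursive=False)  # only the platform itself
add_to_group(deep, platform, recursive=True)   # platform, server, and service
```

With the option off, the group holds just the platform; with it on, the server and the service come along implicitly.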

While mixed groups have one thing they’re good at, compatible groups have an array of functionality they excel at providing. First and foremost is their “compatibility” with all of the other subsystems RHQ provides: monitoring, configuration, operations, etc.

For monitoring, RHQ shows aggregate and average metrics across the group members. For configuration, RHQ enables you to change the configured connection properties across everybody in the group at the same time. For operations, RHQ allows you to execute the same operation against all resources in the group – at the same time, or serially (one after the other, in rolling fashion).
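To make the two execution modes for group operations concrete, here is a rough Python sketch. It assumes nothing about how RHQ schedules operations internally; run_on_group, restart, and the member names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_group(members, operation, rolling=False):
    """Run `operation` against every member; results come back in member order."""
    if rolling:
        # Serial ("rolling") execution: one member after the other.
        return [operation(m) for m in members]
    # Concurrent execution: all members are submitted at once.
    with ThreadPoolExecutor(max_workers=len(members) or 1) as pool:
        return list(pool.map(operation, members))


def restart(member):
    # Hypothetical operation; a real one would be carried out by the agent.
    return member + ": restarted"


members = ["jboss-1", "jboss-2", "jboss-3"]
rolling_results = run_on_group(members, restart, rolling=True)
parallel_results = run_on_group(members, restart)
```

Rolling execution is the natural choice when the members form a cluster and you never want all of them restarting at once.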

Very recently, a customer pointed out to me how groups – mixed and compatible – can be used in a novel way. Their question was simple: what’s the easiest method to see all of the resources in their environment that are down?

To do this today, you have to use the Browse Resources page, go to each tab in turn – platforms, servers, and services – and sort on the availability column. Granted, this is fairly easy to do and doesn’t take all that long, but wouldn’t it be nice to be able to automatically create a group that contained any and all resources that were down in the system?

OK, maybe your initial thought is “why not just use the Problem Resources portlet?” Well, a ‘problem’ resource isn’t necessarily one that is down. If you have ANY alerts, or if you have metrics that are more than 5% outside of their baseline range (a running average calculated over time automatically by RHQ), the resource will also show up in this portlet. This customer JUST wanted the unavailable resources.

Alright, and maybe your second thought was “well, why not use alerts?” Today, we can fire alerts when a resource goes down, and you CAN use the notification mechanism so that you get an email when this happens. However, there are at least two problems with this strategy:

Problem 1

Alerts are only good at telling you what JUST happened in the system. Alerts will be created as the result of some agent sending data up to the server, such as an availability report or the results of an operation. So, if you already have resources that are down before you set up your alert definitions, you will not be notified because those resources were already down.

Problem 2

Setting up availability alerts across ALL resources in the system will take a while. A lot of time could be saved by using the alert templates feature (Administration > Monitoring Defaults), which would make sure that all existing resources (and any resources that are imported in the future) automatically have alert definitions created for them. However, you’d still have to set up one template for every single resource type in the system, so depending on how many plugins you have installed, that could mean several dozen templates to create. Also, for each of those alert templates, you’d have to set up identical notification rules too, which takes more time still.

Interestingly enough, before I could even reply to the customer, they suggested a solution – a feature enhancement, to be precise – which would do the trick. They wanted to extend DynaGroups to be able to aggregate resources by availability.

I was floored by the simplicity of this suggestion. In fact, I sort of recall rubbing my eyes looking to wake up from a dream, because I thought it was so incredible that the development team hadn’t thought of this before. And I wasted no time creating the issue in JIRA to track this request.

Anyone who knows me has probably already guessed that I had the fix working locally within an hour, but because the request came in during the final seconds just before the 1.1 release, I held off on committing it. As soon as SVN was unlocked for 1.2 development, though, it was one of the first commits.

If you’re building off of trunk (or running anything rev1730 or greater), it’s easy to create a Group Definition that will always keep a DynaGroup populated with the resources that are unavailable.

resource.availability = DOWN

But let’s say you are monitoring a very large inventory, and want to break things down further to keep the groups more granular. For example, let’s say you wanted to create different DynaGroups for each type of resource that’s down. This way you can look at your IIS servers that have failed, independent from your Apache vhosts that aren’t up, separate from your File Systems that aren’t at their expected mount points. That expression set would be as follows:

resource.availability = DOWN
groupby resource.type.plugin
groupby resource.type.name

But maybe that creates too many groups, or gives you results for resource types you aren’t interested in. Let’s say you want to focus your search because you only care about one specific type of resource failing, maybe just your Apache servers. Instead of grouping by the plugin and resource type, specify those pieces of information exactly:

resource.availability = DOWN
resource.type.plugin = Apache
resource.type.name = Apache HTTP Server
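If it helps to see what the three expression sets above select, the following Python sketch mimics them against a tiny in-memory inventory. It is a mental model only, not how RHQ evaluates DynaGroup expressions; the sample rows, the "IIS" and "Platforms" plugin names, and the helper functions are invented for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (resource name, plugin, type name, availability)
INVENTORY = [
    ("web01 Apache", "Apache", "Apache HTTP Server", "DOWN"),
    ("web02 Apache", "Apache", "Apache HTTP Server", "UP"),
    ("win01 IIS", "IIS", "IIS Server", "DOWN"),
    ("/data", "Platforms", "File System", "DOWN"),
]


def down(rows):
    """Mimics: resource.availability = DOWN"""
    return [r for r in rows if r[3] == "DOWN"]


def group_by_type(rows):
    """Mimics the two groupby lines: one group per (plugin, type name) pair."""
    groups = defaultdict(list)
    for name, plugin, type_name, _ in rows:
        groups[(plugin, type_name)].append(name)
    return dict(groups)


def only_type(rows, plugin, type_name):
    """Mimics pinning resource.type.plugin and resource.type.name to exact values."""
    return [r for r in rows if r[1] == plugin and r[2] == type_name]


all_down = [r[0] for r in down(INVENTORY)]        # first expression set
down_by_type = group_by_type(down(INVENTORY))     # second: one group per type
down_apache = [r[0] for r in only_type(down(INVENTORY), "Apache", "Apache HTTP Server")]
```

The first expression set yields a single group, the second yields one group per resource type that has at least one member down, and the third yields a single group restricted to one type.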

Thus, in a roundabout way, resource groups can actually be used as indirect tools for monitoring the health of your platforms, servers, and services.

This, however, just scratches the surface in terms of how groups can be used to monitor your enterprise. One major focus for the 1.2 release of RHQ is going to be on cluster management. Remember, compatible groups serve as a natural way of exposing RHQ subsystems at the group-level. So expect to see lots of new group-level services and UI functionality.

At the time of this writing, the requirements for cluster support were in their infancy, but we encourage you to read the latest requirements and post your ideas back to the resource clustering thread in the forums.

Written by josephmarques

October 17, 2008 at 7:28 pm

Posted in dynagroups, inventory, rhq

Physical Enterprise vs Logical Inventory

A recent chat with a colleague reminded me today how important it is to clearly distinguish between what’s in your enterprise and what’s in your inventory. There doesn’t exist an RHQ dictionary yet, so until then, the following entries will have to do:

  • Enterprise – refers to all the physical machines connected by wires and power cords, installed in the racks in your data center, or plugged into the wall under your feet
  • Inventory – refers to the list of logical “resources” discovered by your RHQ infrastructure via some plugin

When you fire up the web console and log in, you need to keep in mind that what you’re viewing is an abstracted layer. The inventory represents the information your RHQ plugins collected and sent back up to the server. So when you want to make a change, you have to decide whether you mean to make that change on the physical or logical level.

Logical Changes

For instance, if you just want to suppress the information that RHQ discovered (likely because it found and auto-imported much more than you need to manage/monitor right now), then your inventory – not your enterprise – is what you want to change. From the resource browser, regardless of whether you’re looking at the platforms, servers, or services tab, there is an “uninventory” button at the bottom.

Clicking it tells RHQ to remove everything it knows about the selected resource (and all of its child resources) from its datastore. You’re effectively telling RHQ that you don’t want to manage this resource anymore. As a consequence, you will also lose any and all audit trails for that resource (and its children). Audit items could be anything from the results of operations you performed against it to the list of alerts that fired because the resource met some trigger condition. Don’t forget, audit items also include the entire set of configuration changes you’ve made to these resources since they’ve been in inventory.

Physical Changes

On the other hand, sometimes you actually want to make a change to your physical enterprise, whether it be adding some new user to an existing Postgres database, or uninstalling an old enterprise/web application archive (ear/war) from a JBoss Application Server. In both cases, you want to go to the inventory tab of the parent of the resource you want to manipulate.

To delete an item from your physical enterprise, simply select one of the child resources from the tabular set and click “delete”. This sends a request down to the agent managing that resource, which performs the operations required to remove that item from your enterprise. This, in turn, also removes the logical resource from your inventory, but that’s really just a convenience: RHQ knows that if the delete succeeds, the resource no longer exists, so there’s nothing left to manage/monitor about it.

Adding a new item to your enterprise is just as simple. At the bottom of the table you’ll see a combobox labeled “Create New”. It will be populated with all of the resource types the RHQ plugin managing this parent resource knows how to physically create in your enterprise. Select one of them, click the button labeled “Add”, and follow the various steps on the subsequent pages.
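The difference between the two kinds of removal can be boiled down to a toy model in Python. It is deliberately simplistic and is not RHQ code: ToyRhq and its methods are invented names, and child resources are left out to keep it short.

```python
class ToyRhq:
    """Toy model: `enterprise` is what physically exists, `inventory` is what RHQ tracks."""

    def __init__(self, enterprise):
        self.enterprise = set(enterprise)
        # Pretend every physical item was discovered; each gets an audit trail.
        self.inventory = {name: [] for name in self.enterprise}

    def uninventory(self, name):
        # Logical change: forget the resource and its audit trail.
        # The physical item is left untouched.
        self.inventory.pop(name, None)

    def delete(self, name):
        # Physical change: remove the item itself, then drop the logical
        # resource, since nothing is left to manage.
        self.enterprise.discard(name)
        self.inventory.pop(name, None)


rhq = ToyRhq(["old-app.war", "reports-db"])
rhq.uninventory("reports-db")  # still exists, RHQ just stops tracking it
rhq.delete("old-app.war")      # gone from the enterprise AND the inventory
```

After these two calls, the database still physically exists but is no longer tracked, while the web application is gone from both places.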

One last reminder…

I can’t emphasize enough how important it is to keep these two concepts separate. One deals with adding / removing meta-information from the RHQ datastore; the other is basically a primitive form of provisioning. If you accidentally deleted a physical entity when you only meant to uninventory its logical resource, don’t bother asking for help on any forum because there’s nothing that can be done. The product did what you asked it to do; your data is gone. But that’s OK because you religiously keep backups…right?

Written by josephmarques

May 7, 2008 at 10:10 pm

Posted in inventory
