Thoughts on…

Java Middleware & Systems Management

Posts Tagged ‘rhq’

RHQ’s Powerful New Search Facility

Search was developed to enable users to gain deeper insight, more quickly, into their enterprise by supporting a sophisticated method of querying system state. Some of the notable features that this powerful facility brings are:

  • Arbitrarily Complex Search Expressions
  • Search Suggestions / Auto-Completion / Search Assist
  • Results Matching / Highlighting
  • User Saved Searches

Take a look at the end-user docs (with screen shots) here. Or, if you want to interact with the Search facilities up close & personal, you can download the latest RHQ binaries here.

Considering the SearchBar was written with the primary purpose of being extensible, it will surely become a much more pervasive concept across RHQ in the future. So, let me know what you liked and/or what you would improve, especially if you can think of other features you’d like to see added. You can either post back here or subscribe to the RHQ developer mailing list.

Written by josephmarques

August 4, 2010 at 4:01 am

Posted in rhq

Jopr has embedded database support!

Jopr now supports an embedded database option during installation (using H2 under the covers). This option, aside from preparing the embedded database, also configures an embedded agent running inside the same VM. This means it takes only a few keystrokes now to install an entire Jopr system, end-to-end.

The bits can be found here.

Quick install steps:

  • download & unzip the jopr-server-2.2.1.zip binary
  • start the server using scripts at the top-level bin dir
  • go to http://localhost:7080/
  • click the “Embedded Mode” button at the top of the page (this enables the embedded db *and* the embedded agent)
  • click the install button at the bottom of the page

Comments and questions welcome on the Jopr forums.

Note: the embedded database and embedded agent options are not intended for production use. They were developed to facilitate quick installs so that new users could easily demo the product, or existing users could get a quick overview of what features the latest distribution has to offer. At the time of this writing, there is no upgrade support when choosing the embedded mode option.

Written by josephmarques

June 3, 2009 at 5:15 pm

Posted in rhq

Don’t reinvent the wheel

I can’t help but relish from time to time how powerful the Jopr / RHQ platform is. When you look at a product like Tomcat you might think that to write an end-to-end management framework around it would take a few man years at best.

Not so with Jopr / RHQ. Instead of spending time worrying about the ins and outs of how to move metric data around your network (and subsequently graph it), how to send a remote method execution down to your server, how to be notified when certain events take place on your servers, how to audit what users do and/or control who does what – you get all of that out-of-box with Jopr / RHQ.

So what does this all mean? It means that by leveraging a robust management platform like Jopr / RHQ you get to focus on what’s important – the servers you’re managing.

Jay Shaughnessy, a colleague of mine, provided the most recent proof of these claims. He wrote a plugin for end-to-end management of Tomcat. By leveraging Jopr / RHQ, he was able to accomplish an impressive amount in a relatively short period of time. But I’ll let him tell you about it…

http://jayshaughnessy.blogspot.com/2009/04/jopr-22-adds-tomcat-management.html

Written by josephmarques

May 1, 2009 at 9:09 pm

Posted in rhq

Jopr 2.2.0 released!

Jopr 2.2.0 has finally been posted to SourceForge!

I’m happy to say that if you download the bits today, assuming you’ve been a user since its initial release, you may very well have trouble recognizing that you’re using the same product as before. Although this release was originally focused on just two items – enhancements to support cluster-oriented views and support for monitoring/managing external Tomcat instances – so much more was accomplished.

As you’ll see below, and as the 2.2 sneak peek clearly revealed, usability inadvertently became a focus of this release – a new menu bar with resource and group search functions, resource and group navigation trees with right-click context menus, and a sticky tabbing infrastructure are just some of the notable enhancements.

I’m especially pleased with the number of performance fixes that made it into this release. These weren’t just constrained to back-end processes or how the agents and servers communicate, but several people have been testing the 2.2.0 BETA and have noted they can feel the difference in the user interface. If there’s any one thing that would frustrate me about using a software product I’d have to say it’s how long I wait between when I click something in the UI and when I receive feedback about that action. So we hope you – the community – appreciate the time we took to improve that part of the overall experience for you.

For those that like details, I’ve again gone through JIRA and scanned the more than 600 issues resolved this release. Below is a summary of all of the noteworthy changes:

UI Enhancements

  • complete rewrite of event history and monitoring UI pages from struts/tiles to jsf/facelets for resources, groups, and autogroups
  • addition of availability history page for resources
  • addition of xmas tree lights for autogroups and compatible groups
  • improve display of availability and membership counts for groups
  • brand new menu bar to replace old, flat link menu
  • brand new resource and group tree navigation components, with right-click context menu support
  • subsystem views – config updates, operation history, new oob metrics, alert definitions and alert history
  • several other markers/icons added to the overview>timeline page, and fixed it so it works in IE6
  • sticky tabs when navigating between resources and a split pane divider that remembers its relative percentage location
  • gave our tabbing infrastructure a facelift, moved from image rollovers to pure css, and are now using a wicked cool gradient
  • revamped resource and group favorites into the menu bar
  • support for recently visited resources and groups in the menu bar
  • auto-complete / search for resources and groups in the menu bar
  • several fixes for ops and alerts pages to improve usability
  • added new resource summary page
  • updated nearly all of our icons across the entire site
  • numerous IE6 and IE7 fixes

Notable Features

  • self-healing agents when they detect they are sick
  • clean up of authorization rules across several pages in the app
  • change to how authz is done – explicit res group denotes membership, while implicit res group is solely for authorization
  • dynagroup querying off of current res availability; also support for parent / grandparent / child context querying
  • support for dynagroup filtering and pivoting off of properties with NULL values
  • auto-recalculation of dynagroups, added some dynagroup “templates” too
  • proper use of XA transactions
  • config change detection + alerting off of that
  • support for aggregate config w/group res config & group plugin config implementations

Plugin Development / Deployment Improvements

  • standalone tomcat and EWS support
  • fix hot deploy of plugins
  • new plugin generator
  • pushed deploy of plugins to DB now, not just filesystem
  • lenient deployment of plugins whose dependency graph is not satisfied

Caching / Performance Tuning

  • fixed paginatedDataTables so they only load once (page-scoped caching)
  • all user preferences cached to the session
  • MICA icon generation scheme doesn’t require DB hits at all – all done via in-memory lookup
  • solved a dozen or so N+1 (or worse) query issues
  • revamped measurement out-of-bounds system
  • fine-grained updates for measurement baseline calculations
  • precompute of current availability for resources
  • replaced resource group SLSB methods with native sql solutions, supporting recursive rules
  • alerts cache rewritten to be near-lockless, and reloading cache doesn’t block readers
  • upgraded quartz library, resolved qrtz_table locking issues
  • using sigar proxy cache
  • async committal of measurement data on postgres
  • events throttling

So download it, try it out, and ping back here or on the Jopr forums if you have any questions.

Written by josephmarques

April 30, 2009 at 1:55 pm

Posted in rhq

Cluster Management

For a majority of the past year the RHQ team has worked diligently to deliver a stable, extensible, and fault tolerant infrastructure for managing and monitoring your enterprise. Most of the focus has been on providing services at the individual resource level – a singular Apache install, a sole IIS instance, a solitary JBoss Application Server. Nowadays, however, the name of the game is clustering. Redundant servers, clustered services, data replication, cloud computing – all have a slightly different role in the game.

There are plenty of companies out there trying to get in on the action, who have high hopes of becoming formidable players. In a similar vein, the next release of RHQ promises to make its platform technology a force to reckon with when it comes to managing and monitoring clustered resources. There are several key facets to this (in no particular order):

• aggregate and average views of metric / measurement data
• operational scheduling and execution
• fine-grained configuration control

The first two of these are actually partially implemented. Today, if you create a compatible group (a resource group that only contains a single type of resource, e.g. only JBossAS instances) you can schedule and execute operations against the members in the group, in rolling fashion or simultaneously against all of them, as well as view aggregate and average metrics.

The shortcoming here, however, is that you need to explicitly add resources to a compatible group before you can perform these aggregate business services across them. Manually adding resources to the group would be absurd for anything but the smallest inventories. This is largely why DynaGroups exist. They were created with the sole purpose of being able to create vast numbers of groups according to flexible rules for how to partition your resources.

So why isn’t this good enough? It’s because DynaGroups cannot create hierarchies of grouped resources, such as a group of HTTP Connectors under a group of Embedded Tomcat Servers under a group of JBoss Application Servers. And this group-wise, hierarchical navigation is where a lot of the power will be derived from.

If you could aggregate your JBossAS instances into a cluster group, and then have logically clustered servers and services automatically grouped beneath that, you could navigate the resource hierarchy very quickly and efficiently. Instead of having to view data from multiple contexts, a “cluster view” would aggregate the data into a single, navigable tree structure – a single context. You wouldn’t be bouncing back and forth between different pages; everything would be at your fingertips from a single, landing page.

A secondary benefit of building cluster views – instead of explicitly using compatible groups – is that you won’t clog up your resource browser with thousands of groups you might use only rarely, if ever.

Think about this: let’s say we have 60 physical hosts, 5 JBossAS instances on each of them, 3 instances to a logical cluster, which creates 100 (60*5/3) cluster groups. Let’s further say that on each instance we have 250 unique business services (this includes enterprise applications, session & entity beans, connectors, virtual hosts, datasources, queues / topics, etc). If we were to create explicit compatible groups for all nested servers and services under each of these 100 cluster groups, it would be another 25K (250*100) groups.

The cost of group creation is not the worry here; it’s the sheer volume of groups that would be shown in the application’s user interface. If you had a handful of compatible groups that you were previously using to manage your inventory, say a dozen or two, it would be much more difficult and frustrating to sift through one- to two-thousand times as much data. Granted, a very nice search interface could mitigate the situation, but it wouldn’t eliminate the underlying problem of unnecessary group creation.
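The arithmetic behind those figures is easy to check. The short sketch below only restates the numbers from the scenario above; the variable names are mine, chosen for illustration.

```python
# Figures taken from the scenario described in the text.
hosts = 60
instances_per_host = 5
instances_per_cluster = 3
services_per_instance = 250

# 60 hosts * 5 instances, 3 instances per logical cluster.
cluster_groups = hosts * instances_per_host // instances_per_cluster

# One explicit compatible group per nested service, per cluster group.
nested_groups = services_per_instance * cluster_groups

print(cluster_groups)  # 100
print(nested_groups)   # 25000

# Growth relative to an inventory previously managed with one or two dozen groups.
for existing_groups in (12, 24):
    print(existing_groups, round(nested_groups / existing_groups))
```

With a dozen existing groups the ratio is a little over two thousand, and with two dozen it is a little over one thousand, which is where the "one- to two-thousand times" estimate comes from.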

So the solution migrates back to our cluster view concept. The team has come together several times over the past 2 weeks to flesh out the details, and the current functional and user interface requirements have been kept updated on the RHQ Project site here. If you have ideas above and beyond what you see there, we encourage you to speak up and let us know how we can make the new features and product improvements around resource clustering even better. If you want to be part of the action and you’re interested in becoming a contributor to the RHQ Project, look for my handle – joseph42 – in #rhq on freenode.

As always, post backs are greatly appreciated.

Written by josephmarques

October 24, 2008 at 7:59 am

Posted in rhq

One Step Closer…

I played a little word association game with my team today. I asked them the following:

With respect to RHQ, when you hear the word ‘configuration’, what is the first word that comes to mind?

Though the answers were varied – resource, edit, product, settings, and properties – they were all perfectly in line with my suspicions. All were either synonyms of configuration, or actions you would take against something that is configurable.

So what’s wrong with that? Well, RHQ is a platform that, in a nutshell, performs systems management AND monitoring. However, these results show that people initially, predominantly, and perhaps only think of ‘management’ when they hear ‘configuration’.

Maybe the question wasn’t fair. I DID ask them to only give me one-word responses. Maybe they would’ve also mentioned monitoring had they had the ability to answer less pointedly.

Or…maybe not. More than half of the responses I got were NOT one-word answers. They were short phrases, or even multiple full sentences explaining why certain words came to mind. Granted, there is a chance the results could be skewed, because the mailing list I asked this question on has only a few dozen people subscribed to it; but it’s enough evidence to show me that the common association is management, not monitoring.

The next version of RHQ will close this gap and bring to the fore monitoring capabilities around configuration. This solution is actually twofold:

1) Detecting agent-side configuration changes

The RHQ agent, since it already knows how to discover the configuration for some managed product on demand, simply has to keep a record of what the last known configuration was. After that, it would need a mechanism to periodically scan for the current configuration, and test whether or not the last known was different than the current. If it was, the RHQ agent would send the results up to the server, which would persist it as the new configuration for that resource and at the same time add an element to the configuration audit trail, so that administrators can see what changed over time.

This development work was committed last week, and the QA for it (RHQ-988) can be tracked in JIRA here.
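The detection loop described above can be sketched roughly as follows. This is a minimal illustration in Python, not the RHQ agent's code: the `ConfigChangeDetector` class and its `discover` and `report` callbacks are hypothetical names standing in for "read the current configuration" and "send the change up to the server".

```python
from typing import Callable, Dict, List, Optional

Config = Dict[str, str]


class ConfigChangeDetector:
    """Remembers the last known configuration and reports differences."""

    def __init__(self, discover: Callable[[], Config],
                 report: Callable[[Config], None]) -> None:
        self._discover = discover  # hypothetical: reads the live configuration
        self._report = report      # hypothetical: sends a change to the server
        self._last_known: Optional[Config] = None

    def scan(self) -> bool:
        """Run one scan; return True if a change was reported."""
        current = self._discover()
        if self._last_known is None:
            # The first scan only establishes the baseline.
            self._last_known = dict(current)
            return False
        if current == self._last_known:
            return False
        self._last_known = dict(current)
        self._report(current)
        return True


# Usage: a dict stands in for the managed product, a list for the audit trail.
live = {"port": "22", "x11_forwarding": "no"}
audit_trail: List[Config] = []
detector = ConfigChangeDetector(lambda: dict(live), audit_trail.append)

detector.scan()                 # baseline
live["x11_forwarding"] = "yes"  # change made outside the management tool
changed = detector.scan()       # difference found and reported
unchanged = detector.scan()     # nothing new since the last scan
```

In a real agent, `scan()` would be driven by a periodic scheduler, and the server side would persist the reported configuration and append it to the audit trail.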

2) Alerting against changed configurations

This logic is completely server-side, and deals with the ability to set up alert definitions against resources that support configuration. As a consequence of doing this, alert templates will be able to create “monitors” across a large segment of the inventory quickly, thus making it easy to receive notifications when any managed resource in your enterprise has its configuration changed external to the RHQ infrastructure.

This feature has been on the docket for nearly 6 months, but depended on configuration change detection (see above) to be written first. So, once I saw the bits for RHQ-988 in SVN, I wasted no time implementing RHQ-342. That development work was completed this past weekend, and the QA can be tracked here.

So what does this all mean? Well, it means that any plugins that have written configuration support for the resources they define can now be managed AND monitored.

Below is a short list of the configurations provided by the base plugins found in the RHQ project:

* APT repository locations
* GRUB kernel entries
* hosts file mapping of IP to canonical names
* SSHD settings, advanced configuration, and X11 properties
* PostgreSQL configuration files as well as runtime properties; database user settings, passwords, and privileges; and table schemas
* RHQ agent configurations

At the time of writing, this author knows of at least one other project that builds extensions to the RHQ platform, and it is called Jopr. Its primary focus is to provide plugins for JBoss Application Server and related services. Simply by dropping the Jopr plugins into your RHQ distribution, you would extend the configuration monitoring capabilities to the following items:

* datasource configuration and advanced settings
* connection factory properties
* JMS queue & topic information

Configuration monitoring just scratches the surface of some of the feature enhancements targeted at the 1.2.0 release of RHQ, but it does bring the platform one step closer to being a complete, end-to-end management and monitoring solution for your enterprise.

If you’re interested in helping to improve the base platform, have ideas for new plugins or extensions to existing plugins, or just want to be closer to the action, please visit the development team in #rhq on irc.freenode.net – my handle is joseph42.

Written by josephmarques

October 21, 2008 at 5:21 am

Posted in rhq

Resource Group Versatility

When you first download and install RHQ, you’ll log in to the web console and notice that there are two different types of grouping constructs for resources – mixed and compatible. In short, compatible groups must contain the same types of resources, whereas mixed groups do not. Under the covers, these are implemented by the exact same construct, but how meaning has been applied to them, and what you can do with each of them, is why this blog got the title it did.

Mixed groups are predominantly used for security, in particular, authorization. With them you can put all sorts of resources together – Windows and Linux platforms, IIS and Apache servers, etc. Then, you can attach that mixed group to a role, and any users in that role will be able to see those resources.

If you want to be able to give someone access to an entire box, then create a mixed group with the “recursive” option enabled. By turning that option on, any resource you add to the group automatically adds all descendant resources to the group as well. For instance, if you add a platform, it will indirectly add all servers under that platform, as well as all services under all of those servers, and so on.
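To make the effect of the “recursive” option concrete, here is a small sketch of the idea. It assumes a simple in-memory resource tree; the `Resource` class and `add_to_group` function are illustrative names, not RHQ's API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Resource:
    name: str
    children: List["Resource"] = field(default_factory=list)


def add_to_group(members: Set[str], resource: Resource, recursive: bool) -> None:
    """Add a resource to the group; with recursive=True, add all descendants too."""
    members.add(resource.name)
    if recursive:
        for child in resource.children:
            add_to_group(members, child, recursive)


# A platform with one server, one nested server, and one service beneath it.
connector = Resource("HTTP Connector")
tomcat = Resource("Embedded Tomcat", [connector])
jboss = Resource("JBossAS", [tomcat])
platform = Resource("linux-box-1", [jboss])

explicit_only: Set[str] = set()
add_to_group(explicit_only, platform, recursive=False)

with_descendants: Set[str] = set()
add_to_group(with_descendants, platform, recursive=True)

print(sorted(explicit_only))
print(sorted(with_descendants))
```

Without the option only the platform itself is a member; with it, every server and service underneath comes along, which is what makes "access to an entire box" a one-step operation.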

While mixed groups have one thing they’re good at, compatible groups have an array of functionality they excel at providing. First and foremost is their “compatibility” with all of the other subsystems RHQ provides: monitoring, configuration, operations, etc.

For monitoring, RHQ shows aggregate and average metrics across the group members. For configuration, RHQ enables you to change the configured connection properties across everybody in the group at the same time. For operations, RHQ allows you to execute the same operation against all resources in the group – at the same time, or serially (one after the other, in rolling fashion).

Very recently, a customer pointed out to me how groups – mixed and compatible – can be used in a novel way. Their question was simple: what’s the easiest method to see all of the resources in their environment that are down?

In order to do this today, you have to use the Browse Resources page, go to each tab in turn – platforms, servers, and services – and sort on the availability column. Granted, this is fairly easy to do and doesn’t take all that long, but wouldn’t it be nice to be able to automatically create a group that contained any and all resources that were down in the system?

OK, maybe your initial thought is “why not just use the Problem Resources portlet?” Well, a ‘problem’ resource isn’t necessarily one that is down. If you have ANY alerts, or if you have metrics that are more than 5% outside of their baseline range (a running average calculated over time automatically by RHQ), the resource will also show up in this portlet. This customer JUST wanted the unavailable resources.

Alright, and maybe your second thought was “well, why not use alerts?” Today, we can fire alerts when a resource goes down, and you CAN use the notification mechanism so that you get an email when this happens. However, there are at least two problems with this strategy:

Problem 1

Alerts are only good at telling you what JUST happened in the system. Alerts will be created as the result of some agent sending data up to the server, such as an availability report or the results of an operation. So, if you already have resources that are down before you set up your alert definitions, you will not be notified because those resources were already down.
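The difference between "what just happened" and "what is true right now" can be shown with a toy model. This is a deliberately simplified sketch, assuming alerts fire only on a change to DOWN; `AvailabilityAlerter` and its methods are hypothetical and do not mirror RHQ's alert subsystem.

```python
from typing import Dict, List

UP, DOWN = "UP", "DOWN"


class AvailabilityAlerter:
    """Fires only when a report shows a resource changing to DOWN."""

    def __init__(self, known_state: Dict[str, str]) -> None:
        # State the server already held before the alert definition existed.
        self._state = dict(known_state)
        self.fired: List[str] = []

    def on_report(self, resource: str, availability: str) -> None:
        previous = self._state.get(resource)
        self._state[resource] = availability
        if availability == DOWN and previous != DOWN:
            self.fired.append(resource)

    def currently_down(self) -> List[str]:
        """A state query, which is what the customer actually wanted."""
        return sorted(name for name, avail in self._state.items() if avail == DOWN)


alerter = AvailabilityAlerter({"apache-1": DOWN, "iis-1": UP})
alerter.on_report("apache-1", DOWN)  # still down: no new event, no alert
alerter.on_report("iis-1", DOWN)     # UP -> DOWN transition: alert fires

print(alerter.fired)             # ['iis-1']
print(alerter.currently_down())  # ['apache-1', 'iis-1']
```

The resource that was already down never produces an alert, yet it clearly belongs in any answer to "what is down right now?"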

Problem 2

Setting up availability alerts across ALL resources in the system will take a while. A lot of time could be saved by using the alert templates feature (Administration > Monitoring Defaults), which would make sure that all existing resources (and any resources that are imported in the future) automatically have alert definitions created for them. However, you’d still have to set up one template for every single resource type in the system, which, depending on how many plugins you have installed, could mean several dozen templates to create. Also, for each of those alert templates, you’d have to set up identical notification rules too, which takes more time still.

Interestingly enough, before I could even reply to the customer, they suggested a solution – a feature enhancement, to be precise – which would do the trick. They wanted to extend DynaGroups to be able to aggregate resources by availability.

I was floored by the simplicity of this suggestion. In fact, I sort of recall rubbing my eyes looking to wake up from a dream, because I thought it was so incredible that the development team hadn’t thought of this before. And I wasted no time creating the issue in JIRA to track this request.

Anyone that knows me probably already guessed I had the fix locally within an hour, but because the request came in during the final seconds just before the 1.1 release I held off on committing it. Though, as soon as SVN was unlocked for 1.2 development it was one of the first commits.

If you’re building off of trunk (or running anything rev1730 or greater), it’s easy to create a Group Definition that will always keep a DynaGroup populated with the resources that are unavailable.

resource.availability = DOWN

But let’s say you are monitoring a very large inventory, and want to break things down further to keep the groups more granular. For example, let’s say you wanted to create different DynaGroups for each type of resource that’s down. This way you can look at your IIS servers that have failed, independent from your Apache vhosts that aren’t up, separate from your File Systems that aren’t at their expected mount points. That expression set would be as follows:

resource.availability = DOWN
groupby resource.type.plugin
groupby resource.type.name

But maybe that creates too many groups, or gives you results for resource types you aren’t interested in. Let’s say you want to focus your search because you only care about one specific type of resource failing, maybe just your Apache servers. Instead of grouping by the plugin and resource type, specify those pieces of information exactly:

resource.availability = DOWN
resource.type.plugin = Apache
resource.type.name = Apache HTTP Server
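For readers who want to see how these expression sets behave, here is a rough Python model of the semantics: plain `key = value` lines act as filters, and each `groupby` line adds one dimension to the partition. This is only my approximation for illustration, not how RHQ evaluates group definitions. The `evaluate` function, the sample inventory, and the "IIS Server" type name are all made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Resource = Dict[str, str]


def evaluate(expressions: List[str],
             inventory: List[Resource]) -> Dict[Tuple[str, ...], List[str]]:
    """Split expressions into filters and groupby keys, then partition the inventory."""
    filters: Dict[str, str] = {}
    group_keys: List[str] = []
    for line in expressions:
        line = line.strip()
        if line.startswith("groupby "):
            group_keys.append(line[len("groupby "):].strip())
        else:
            key, value = line.split("=", 1)
            filters[key.strip()] = value.strip()

    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for res in inventory:
        # A resource must satisfy every filter to land in any group.
        if all(res.get(k) == v for k, v in filters.items()):
            groups[tuple(res[k] for k in group_keys)].append(res["name"])
    return dict(groups)


# Hypothetical inventory; the property keys mirror the expressions above.
inventory = [
    {"name": "web-1", "resource.availability": "DOWN",
     "resource.type.plugin": "Apache", "resource.type.name": "Apache HTTP Server"},
    {"name": "web-2", "resource.availability": "UP",
     "resource.type.plugin": "Apache", "resource.type.name": "Apache HTTP Server"},
    {"name": "iis-1", "resource.availability": "DOWN",
     "resource.type.plugin": "IIS", "resource.type.name": "IIS Server"},
]

by_type = evaluate(["resource.availability = DOWN",
                    "groupby resource.type.plugin",
                    "groupby resource.type.name"], inventory)

apache_only = evaluate(["resource.availability = DOWN",
                        "resource.type.plugin = Apache",
                        "resource.type.name = Apache HTTP Server"], inventory)
```

In this model the `groupby` version yields one group per (plugin, type) pair that has at least one down resource, while the fully specified version yields a single group.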

Thus, in a roundabout way, resource groups can actually be used as indirect tools for monitoring the health of your platforms, servers, and services.

This, however, just scratches the surface in terms of how groups can be used to monitor your enterprise. One major focus for the 1.2 release of RHQ is going to be on cluster management. Remember, compatible groups serve as a natural way of exposing RHQ subsystems at the group-level. So expect to see lots of new group-level services and UI functionality.

At the time of this writing, the requirements for cluster support were in their infancy, but we encourage you to read the latest requirements and post your ideas back to the resource clustering thread in the forums.

Written by josephmarques

October 17, 2008 at 7:28 pm

Posted in rhq
