The Software Dinner Party
Running a successful project and putting out a successful software product that sells is much like organizing a dinner party. Your guests are going to have a wide variety of personalities and experiences, which will undoubtedly lead to a range of different tastes in music and cuisine.
If you focus solely on your favorite food and entertainment, you're very likely to find at least one or two people who don't enjoy themselves. Instead, you should concentrate mostly on your guests. Learn their likes and dislikes, their preferences, and their expectations, and then try to satisfy as many of them as possible while delivering just a bit more than they expected.
Software development can benefit from a similarly balanced plan. Learn the likes and dislikes of your user community: which bugs they want fixed, which new features they want added, and which items they consider most important for the next release cycle. Then try to satisfy as many of them as possible while delivering just a bit more than they expected.
One of the more important parts of becoming a seasoned, well-rounded software developer is not the ability to write code. It's the ability to recognize that your personal interests and stake in a project or product do not always match what is best for the community at large. This author makes no attempt whatsoever to hide that his interests lie mostly in and around the platform, as opposed to plugin development and refinement, writing an abundance of documentation, or working on audio-visual demos.
That said, even though I realize I'm just a small slice of a much larger pie, I try to advocate for what I feel is right for the product. Some customer may simply have forgotten to mention it or, more likely, may not have realized it was something they actually wanted in the first place. At the same time, I still appreciate and respect the dinner party analogy, and the fact that the software I write primarily serves others, not myself. A perfect example of this happened just recently.
The 1.1.0 release of RHQ brought with it the long-awaited high availability and failover feature set. That work took months of coordinated, distributed effort in some of the lowest-level APIs of the platform. Even so, in my eyes it only scratched the surface of what still needed to be done: higher scale, better isolation of services, visualization of the agent-server runtime topology, and greater visibility into data flow patterns across the infrastructure.
However, perhaps more than any of those, I really wanted to see us simplify the configuration of the communication layer. Today, regardless of which RHQ release you use, you need to expose two endpoints. The agent needs to contact the server on some address/port combination, and the server needs to contact the agent on some address/port combination. There's no technical reason why each side must open its own connection to the other. In theory, the communication layer could be rewritten so that only one side opens a connection, and messages flowing the other way are piggybacked on that already-open connection.
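To make the piggybacking idea a little more concrete, here is a minimal in-memory sketch. It is not RHQ's actual communication layer. The classes `Server` and `Agent` and the methods `send_to_agent`, `handle`, and `call_in` are hypothetical names used only for illustration. The key property is that the server never dials out. Anything it wants to tell an agent waits in a queue until that agent's own connection carries a message in, and then it rides back on the response.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server endpoint that never opens connections to agents.

    Commands for an agent are queued and delivered only when that agent's
    own (agent-initiated) connection brings a message in.
    """
    pending: dict = field(default_factory=lambda: defaultdict(deque))

    def send_to_agent(self, agent_id, command):
        # No outbound socket: just queue the command until the agent calls in.
        self.pending[agent_id].append(command)

    def handle(self, agent_id, message):
        # Reply to the agent's message and piggyback every queued command
        # for that agent onto the same response.
        queued = self.pending[agent_id]
        piggybacked = list(queued)
        queued.clear()
        return {"ack": message, "commands": piggybacked}


@dataclass
class Agent:
    """Hypothetical agent: the only side that ever initiates the connection."""
    agent_id: str
    server: Server
    received: list = field(default_factory=list)

    def call_in(self, message):
        # One agent-initiated round trip carries traffic in both directions.
        response = self.server.handle(self.agent_id, message)
        self.received.extend(response["commands"])
        return response
```

For example, if the server queues a command with `server.send_to_agent("agent-1", "discover-resources")`, the agent's next `call_in("heartbeat")` returns `{"ack": "heartbeat", "commands": ["discover-resources"]}`, and the call after that returns an empty command list. A real implementation would also have to decide how often agents call in, or keep the connection open, so that server-side requests are not delayed. This sketch deliberately ignores that question.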
The telephony industry solved this problem a long time ago. You pick up a phone, dial someone, then you can both talk back and forth across the line even though only one of you initiated the call. It seems rather silly and overly complicated, in retrospect, to even think of doing it any other way. Just imagine how awkward that would have been: I call you and talk on that line, but you have to call on a second line if you want to talk back to me, and we each need to use two hands to hold both our phones up.
With telephones, either person can initiate the call because both people have telephone numbers. Likewise, the unidirectional link between the server and agent could be established from either endpoint, but it makes the most practical sense to have the agent initiate the connection. From a security standpoint, assuming your network is locked down, you'd only have to open holes for incoming communication on the handful of RHQ servers in your infrastructure. If the connection were initiated in the reverse direction, you would have to open them on the hundreds (or perhaps, in the future, thousands) of agents you have installed.
Unfortunately, adding this feature complicates the semantics required to deliver some business services properly. As it stands today, RHQ has many services that were written under the assumption that every single agent is visible to the server performing the business service workflow. Since any server can initiate that workflow, this effectively requires every agent to be reachable from every server. That runs counter to the unidirectional channel idea, under which a server can only piggyback messages down to the agents that initiated connections to it.
Thus, these business services need to be refactored. However, rewriting each service in isolation would only create havoc within the code and make things rather difficult to understand and maintain over time. Instead, there needs to be something that can, in a generic fashion, distribute a single business workflow across a range of servers, according to which servers are connected to the specific agents involved.
The solution I’m hinting at is what I’ve termed a fully partitioned services framework – a mechanism by which servers can indirectly communicate with one another when they need to send a request to or get data from an agent that isn’t connected to them. By writing this logic as its own abstract mechanism, the framework can expose itself to programmers via a simple API, and any business service that needs to be partitioned would thus be written in a consistent way. The programmer wouldn’t even have to care how the request is being carried out, just that it delivers on its promise.
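A rough sketch of what such a framework might look like is below, under loudly stated assumptions. The `Cluster` registry, the `PartitionedServer.execute` entry point, and the `run_workflow` helper are all hypothetical, not an existing RHQ API. The sketch assumes each server records which agents are connected to it in some shared place, modeled here as an in-memory dict, although in practice it might be the shared database. A business service calls `execute` on whatever server it happens to be running on. The framework then either delivers the request locally or forwards it to the server that owns the agent's connection.

```python
class Cluster:
    """Hypothetical registry of servers and of which server each agent is connected to."""

    def __init__(self):
        self.servers = {}       # server name -> PartitionedServer
        self.agent_owner = {}   # agent id -> name of the server holding its connection

    def add_server(self, server):
        self.servers[server.name] = server
        server.cluster = self

    def agent_connected(self, agent_id, server_name):
        # Recorded when an agent opens its (unidirectional) connection to a server.
        self.agent_owner[agent_id] = server_name


class PartitionedServer:
    def __init__(self, name):
        self.name = name
        self.cluster = None
        self.outbox = {}  # agent id -> requests waiting to be piggybacked to that agent

    def execute(self, agent_id, request):
        """Single entry point for business services.

        Callers don't need to know which server actually holds the agent's
        connection. Returns the name of the server that queued the request.
        """
        owner = self.cluster.agent_owner.get(agent_id)
        if owner is None:
            raise LookupError(f"agent {agent_id!r} is not connected to any server")
        if owner == self.name:
            return self._deliver_locally(agent_id, request)
        # Indirect server-to-server hop to the owning server.
        return self.cluster.servers[owner].execute(agent_id, request)

    def _deliver_locally(self, agent_id, request):
        # Queue for delivery on the agent's own connection.
        self.outbox.setdefault(agent_id, []).append(request)
        return self.name


def run_workflow(entry_server, agent_ids, request):
    """Fan one business workflow out across whichever servers own the target agents."""
    return {agent_id: entry_server.execute(agent_id, request) for agent_id in agent_ids}
```

Take two servers, `server-1` and `server-2`, where agent `a1` is connected to the first and `a2` to the second. Starting the workflow on `server-1` with `run_workflow(s1, ["a1", "a2"], "inventory")` queues the request for `a1` locally and forwards the request for `a2` to `server-2`. The business service itself is written the same way in both cases, which is the consistency the framework is meant to provide.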
With these two pieces in place (unidirectional communication and full partitioning of business services), the platform architecture simplifies into an easy-to-understand, easy-to-visualize, and incredibly easy-to-paraphrase topology: agents talk to servers, servers talk to the database. It simplifies configuration, which in turn makes installation simpler, reduces the maintenance burden when you add more agents, and makes upgrading easier.
So why, with all of these benefits, have we not already done this? Well, it goes back to the dinner party. As with most things in life, there needs to be a balance of priorities. If we spent all of our time focusing on the architecture, the platform wouldn't have many business services. If we spent all of our time adding new business services, there would be no plugins that took advantage of them. If we spent all of our time writing new plugins (or improving existing ones), they might not have a solid base into which to be installed.
Being a platform guy at heart, it sometimes pains me to see that I can’t spend every waking moment improving JUST the platform. However, my role as technical lead constantly reminds me of this balance that must be maintained for the greater overall success of the product. And, when I take all of that together, my personal preferences are a concession easily made…because I want to make my guests happy. Nonetheless, I’m confident there will come a time when it’s right to work on these – it’s just not today.