This Week in OpenNMS, Friday December 12th

So far the response to TWiO has been positive, so it's time for another chapter.

Project Updates

  • Stable: Ticketing Updates

    Jonathan Sartin worked more on the OpenNMS trouble-ticketing API, finishing up a change that allows ticketer plugins to raise exceptions when errors occur. In addition, he resurrected work on an RT (Request Tracker) ticketer plugin. Both will be merged to the 1.6 branch when they're ready.

  • Stable: Bug Fixes

    I spent a little time this week cleaning up some bugs, mostly exceptions, simple bugfixes, and other minor changes.

  • Stable: Windows Updates

    Bobby Krupczak was kind enough to lend a Win64 machine with Visual Studio so I could get 64-bit binaries built. They are available now on SourceForge. (Copy the jicmp and msvcr90 DLLs to your system directory, and don't forget to rename them to remove the -win64 bit.)

    In the process of working on that, a reasonably serious bug was found in the JICMP libraries on Windows. Windows passes around HANDLEs instead of file descriptors for the purposes of file I/O (and socket I/O). We were treating them as if they were normal file handles (i.e., integers), which just happened to work on Win32, not because it was correct, but because we didn't try to access the filehandle from Java directly. 🙂 The code has now been cleaned up to treat socket operations on HANDLEs (well, SOCKETs) properly on Windows, and the new DLLs in the JICMP 1.0.9 binary release contain these changes (win32 and amd64).

    Additionally, it was determined that Windows XP, at least, is pickier about reading from raw sockets than other platforms, and requires that at least one packet is sent out the socket before reading any in (even though we're just listening for any ICMP packets on the raw socket, so it shouldn't really matter). This is fixed in the jicmp.jar in the 1.0.9 distribution; until 1.6.2 comes out, it is recommended you copy this jar over the one included in $OPENNMS_HOME/lib if you are running on Windows.

  • Trunk: More Work on the New Provisioner

    Donald continued implementing "detectors", the equivalent of Capsd's protocol plugins, including a new asynchronous API which will allow us to interact with services at a much more granular level.

    Matt has proposed we use more agile development practices, and towards that goal, he started working on a document describing the current desired featureset of the provisioner. We've broken it up into chunks which will then be used to do code sprints so we can do a better job of divvying up the work. We're working on going through and cleaning up the documentation and hopefully we'll get it into the wiki soon.

  • Trunk: Refactoring Maven (Services)

    Right now, one of the biggest chunks of code in our source tree is "opennms-services". It's kind of a dumping-ground of all of the high-level code that isn't webapp code, and it's a bit unwieldy. I've started a branch to attempt to refactor at least some of it into smaller chunks (splitting out capsd, collectd, config, eventd, mock, notifd, poller, threshd, trapd, xmlrpcd, and so on). Work will be ongoing, but hopefully eventually we'll have things a bit more manageable.

  • (Almost) Trunk: WMI Support

    Matt Raykowski has been finishing up work on his WMI (Windows Management Instrumentation) monitor and collector and we hope to be merging it into trunk in the near future. There are just a few more finishing touches to put on it, and then it will be ready for inclusion in the future 1.8 release.

Upcoming Events

(Or, as Tarus calls it, Git Yer Learnin' On.)

If you have anything to add to the events list, please let me know.

Development: A Brief Interlude

It's time for a little aside on some of the tools we use for development.

Maven is an awesome tool, but with a large project like OpenNMS, checking repositories and the like is actually a significant drag on the build process. To help alleviate this (and to speed up building and sharing artifacts on our Bamboo continuous-integration server) we set up a Nexus proxy at the office. It worked so well, I set one up at home too. =)

If you'd like to set up your own, I've documented what we did on the OpenNMS wiki. Feel free to update it (and/or ask questions) if you run into any issues.

And on the subject of Bamboo, I realized I haven't yet publicly thanked Atlassian for their generosity. They've generally been very generous to open-source projects, offering free licenses for their tools (continuous integration, bug tracking, etc.). OpenNMS has a very large build, and it can take a long time to work its way through the continuous-integration process even with a fast machine. They were kind enough to extend our open-source license to allow for more than the usual number of agents, and we're now burning rubber with our entire set of classroom machines acting as build agents. No longer do we have huge queues of waiting builds, and we get very timely Jabber notices when we break the build. Hm... maybe that's not such a good thing after all. 😉

Anyways, thanks again to Atlassian for the license upgrade, it's working great!

That's It for This Week

That's it for now. I hope you're enjoying these updates. If you have any questions or comments, or if you've done something cool with OpenNMS that you'd like included, please let me know.

This Week in OpenNMS, Friday December 5th

I was recently remembering fondly the oldskool "OpenNMS Updates" that Shane O'Donnell used to do, and I thought I'd take a crack at reviving the tradition. So, without further ado...

Project Updates

  • Stable: Current Release: 1.6.1

    OpenNMS 1.6.1 seems to be holding up without any huge bugs reported. There are no immediate plans for a 1.6.2 release but I would expect we'll look into doing one in a month or so just to roll up any bugfixes that have happened, if nothing else.

  • Stable: Configuration Tweaks

    David made some configuration changes, slated for a future 1.6 release, that automatically roll up SNMP LinkUp and LinkDown traps into an alarm. Also, Jeff added a bunch of IETF and vendor MIB data collection and graphs to the 1.6 branch.

  • Trunk: Provisiond

    Development is still churning along in trunk, with the focus on the Capsd rewrite. Since this is the first (of hopefully many) TWiO, I will go into a bit more depth.

    OpenNMS has always been written with scalability in the forefront of our minds, but JVM technology and development methodologies have moved a lot since we first started the project 9 years ago. Since the OpenNMS 1.2 series, and even a bit before, many of the subsystems of OpenNMS have been tweaked, cleaned up, and even rewritten for performance and other reasons. One of the things that's needed attention for quite some time is Capsd, the part of OpenNMS that figures out what devices are on your network, and what those devices' capabilities are (hence, Capsd).

    Matt Brozowski has been working on a complete replacement, called Provisiond. One of the benefits of the OpenNMS architecture is that its event-driven structure with distinct daemons for different subsystems makes it easy to make new implementations without breaking existing things. Thus, Provisiond will be installed (disabled, by default) alongside Capsd in 1.8, and in 1.10 (or 2.0), it will become the default capabilities scanner, with Capsd deprecated but not yet removed.

    The Provisiond architecture is such that a very robust threading model has been introduced to allow scanning huge numbers of resources in a very efficient manner, and the API has been written to make it very simple to implement new "detectors" which detect services and resources.

    Eventually this new architecture can (and should) be extended to polling/monitoring and data collection, but... one thing at a time. 😉

    Anyways, Matt has been rockin' the Java working on Provisiond, and I've been pairing with him off and on. I really like the way this code is shaping up. Donald in the meantime has been making tons of detectors so we can cover the large range of services we already detect with Capsd.

  • Trunk: Mobile UI

    Alexander Finger started work on a simple mobile UI that gives you the outages summary on a single page (at least on my iPhone <g>).

    OpenNMS Mobile UI
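
Going back to Provisiond for a moment: the "lots of small detectors run by a threaded scanner" idea can be sketched in a few lines of Python. To be clear, this is not the Provisiond API (which is Java, and still changing); the `TcpDetector` class, the `scan` function, and the pool size are all made up for illustration.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


class TcpDetector:
    """Hypothetical detector: a service counts as present if a TCP connect succeeds."""

    def __init__(self, service_name, port, timeout=1.0):
        self.service_name = service_name
        self.port = port
        self.timeout = timeout

    def detect(self, address):
        try:
            with socket.create_connection((address, self.port), timeout=self.timeout):
                return True
        except OSError:
            return False


def scan(addresses, detectors, max_workers=8):
    """Run every detector against every address on a shared thread pool.

    Returns a dict mapping each address to the list of service names detected.
    """
    jobs = [(address, detector) for address in addresses for detector in detectors]
    found = {address: [] for address in addresses}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the submitted jobs.
        results = pool.map(lambda job: job[1].detect(job[0]), jobs)
        for (address, detector), present in zip(jobs, results):
            if present:
                found[address].append(detector.service_name)
    return found
```

Anything with a `service_name` attribute and a `detect(address)` method can be handed to `scan`, which is roughly the point: adding a new kind of detector shouldn't require touching the scanning code.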

Upcoming Events

If you have anything to add to the events list, please let me know.

From the Discussion Lists - Cartographer

[original thread]

Bobby Krupczak has created a tool called Cartographer which "implements a novel approach to managing distributed systems by automatically discovering and tracking the relationships between its component systems and applications." In addition to the Cartographer protocol (XMP), software, and agent, he has written a collector plugin for OpenNMS which allows the collection and graphing of Cartographer XMP data. Very cool!

Hopefully soon we will get the collector integrated into OpenNMS proper and into a future release (1.6 if it's not too invasive, or at the latest, 1.8).

That's It!

So that's it for this week. Let me know if you think this is a useful thing for me to do, if there's anything you'd like me to talk about that's missing, or, well, if you have any comments at all. Until next week...

PostgreSQL ‘IPLIKE’ Plugin Available for Windows

Since we (regrettably <g>) support a couple of customers running OpenNMS on Windows, I spent some time yesterday getting IPLIKE built on it, finally. You no longer have to rely on the slow PL/PGSQL version of it, and can use the nice speedy native C version instead.

For details, see the wiki page for IPLIKE.

As far as I'm aware, there is not a win64 version of PostgreSQL, so I've punted investigating what it would take to get stuff built on it, but I would like to get a 64-bit JICMP built at least. Does anyone have a win64 development environment that could get it building for us? I have no Win64 licenses, much less development environment. A little investigating dug up a MinGW64 preview, but I have no idea if it would actually work or not. =)

If you run into any problems with it, let me know!

OpenNMS 1.6.0 Is Out

...and it features a ton of changes since the last stable release. Here's what I put in the release notes as an introduction to the 1.6.0 release:

Release 1.6.0 is the first stable release in the OpenNMS 1.6 series.

It's been 3 and a half years since the last OpenNMS stable version, 1.2, was branched and released as production-ready. In that time, OpenNMS as a project has changed tremendously, the community has grown exponentially, and massive numbers of new features have been incorporated into the "unstable" 1.3.x series.

In that time, the unstable codebase solidified to the point that The OpenNMS Group supported it as if it were stable; it was at least as stable as 1.2.x was, but many users held off on upgrading because of the unstable moniker.

After a lot of work, and a renewed focus on getting the next stable release out the door, we are now prepared to declare OpenNMS 1.6 release-candidate-ready.

Why 1.6 instead of 1.4? 3 years is a lot of time, and a lot has happened in that time. We're not ready to call it 2.0, we want to redo the web UI first, but 1.4 didn't really do the massive changes since 1.2 justice. So: 1.6 it is.

Since it is a lot easier to do a release than it was in the 1.2 series (now that the native code is moved out into separate packages, and OpenNMS itself is distributed as pure-java sources), the goal is to continue to be on a much faster 6-month or year cycle for new releases.

Please report any problems you run into in our Bugzilla bug tracker.

To give an idea of what's changed, I put together a list of major changes since 1.2 with a couple of the other OGP folks.

Architecture and New Subsystems

  • Alarms: The largest architectural change from a user point of view is the addition of the concept of Alarms. Events mean so many different things in OpenNMS, it made sense to have a higher-level "event" which represents significant happenings in the system. Alarms fill that role, and as we move towards 2.0, events will be de-emphasized in favor of alarms for reacting to significant events. The new alarms system will allow important events to be "reduced" into alarms. If an event comes in with the same "reduction key" as a previous event, the alarm will increment the "count" of events, yet it will still only take up a single line in the alarm browser. Clicking on the count will bring up the event browser with just the events that have been reduced.
  • Automations: It is now possible to perform a variety of automated actions through "automations". For example, if an alarm with a severity of Minor has not been acknowledged in the last 20 minutes, you might want to escalate its severity. Vacuumd has been enhanced so that you can configure these processes, which we're calling Automations, in terms of Triggers and Actions.
  • Windows: OpenNMS now runs on Windows.
  • PostgreSQL: OpenNMS supports running on top of PostgreSQL 7.4 through 8.3.
  • Syslog Improvements: The syslog daemon included with OpenNMS has been significantly enhanced, including regular-expression matching and back-reference support.
  • Model Importer: OpenNMS can now import node, interface, and service information from an external provisioning source. This facility can augment or replace the discovery functionality provided by Capsd.
  • Categories: Nodes can be assigned to one or more categories (e.g. Production/Test, Datacenter A, Datacenter B); these categories can be used in filter rules. This lets you selectively forward alarms into certain destination paths based on the node category: "Send Alarms for Production in Datacenter A to Team A, Send Alarms for Test Systems in all Datacenters into the Maintenance Queue".
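
To make the alarm reduction and automation ideas above a bit more concrete, here is a minimal Python sketch. It is not OpenNMS code: the `Alarm` and `AlarmTable` classes, the shortened severity list, and the `linkDown:42:3` reduction key are assumptions invented for the example. The 20-minute escalation of unacknowledged Minor alarms is the scenario from the Automations item.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Simplified severity ladder, for illustration only.
SEVERITIES = ["Normal", "Warning", "Minor", "Major", "Critical"]


@dataclass
class Alarm:
    reduction_key: str
    severity: str
    first_seen: datetime
    count: int = 0
    acknowledged: bool = False
    event_ids: list = field(default_factory=list)


class AlarmTable:
    """Keeps one alarm per reduction key, however many events arrive."""

    def __init__(self):
        self.alarms = {}

    def reduce(self, event_id, reduction_key, severity, when):
        alarm = self.alarms.get(reduction_key)
        if alarm is None:
            # First event with this key: create the single alarm row.
            alarm = Alarm(reduction_key, severity, when)
            self.alarms[reduction_key] = alarm
        # Every event, first or repeated, bumps the count and is remembered.
        alarm.count += 1
        alarm.event_ids.append(event_id)
        return alarm


def escalate_stale_minor_alarms(table, now, max_age=timedelta(minutes=20)):
    """Trigger: unacknowledged Minor alarms at least max_age old.
    Action: raise the severity one step. Returns the keys that were escalated."""
    escalated = []
    for alarm in table.alarms.values():
        stale = now - alarm.first_seen >= max_age
        if alarm.severity == "Minor" and not alarm.acknowledged and stale:
            alarm.severity = SEVERITIES[SEVERITIES.index("Minor") + 1]
            escalated.append(alarm.reduction_key)
    return escalated
```

Two events with the same key end up as one alarm with a count of 2, and the list of reduced event IDs is what "clicking on the count" would show you.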

Polling and Data Collection

  • Generic-indexed data collection modeling makes it easy to collect, graph, and threshold on multi-instanced performance data, such as values residing in SNMP MIB tables.
  • SNMP4J: In addition to the existing SNMPv1 and SNMPv2 support provided by our in-house JoeSNMP Java library, OpenNMS now supports SNMP v1 through v3 using SNMP4J. The SNMP4J strategy is enabled by default, but you can go back to the JoeSNMP one if you have a specific need for bug-for-bug compatibility with OpenNMS 1.2's SNMP behavior.
  • JMX: Support was added for polling and data collection.
  • HTTP Collector: Support was added for data collection via HTTP.
  • NSClient: Support has been added for NSClient (and NSClient++) polling and data collection.
  • Data Export: It is now possible to export RRD data through the web UI.
  • Windows Service Monitoring: Windows services can be monitored through the NSClient support and via a special-purpose poller monitor that uses SNMP.
  • Mail Transport Monitor: It is possible to monitor the complete round-trip availability of a mail system, from sending to checking a mailbox.
  • Page Sequence Monitor: Support has been added for monitoring a complete transaction against a web site, including cookie storage, form submission, and checking the results of the output of a URL.
  • Distributed Monitoring: There is now a distributed monitor that allows you to do service monitoring from multiple locations reported to a single OpenNMS instance.

Thresholding

  • Thresholding for collected performance data is now performed in-line with collection by default. This change makes threshold evaluation virtually instantaneous while drastically lowering the CPU and I/O overhead associated with thresholding. Thresholding for latency data (data from the poller monitors) is still done in the old asynchronous fashion.
  • Absolute Change Thresholds: A new type of threshold useful for monitoring the values of such variables as radio transmitter power (in dB) where a relative change of a given magnitude may not be noteworthy, but an absolute change above some threshold is considered significant.
  • Expression-Based Thresholds: A new type of threshold allowing the user to specify an expression, in standard mathematical terms, involving one or more data source names, operators, and constants.
  • Custom Event UEIs in Thresholds: The types of events generated when thresholds are exceeded or re-armed can now be specified on a per-threshold-definition basis, allowing for much more flexibility in using thresholds as the basis of alarms and notifications.
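
The difference between the threshold types above is easier to see with numbers, so here is a rough Python sketch. It only illustrates the concepts and does not mirror how OpenNMS configures or evaluates thresholds; the function names and the `memUsed`/`memTotal` data source names are invented, and the expression evaluator deliberately supports nothing beyond names, numeric constants, and the four basic operators.

```python
import ast
import operator


def exceeds_relative_change(previous, current, fraction):
    """True when the value moved by at least `fraction` of its previous magnitude."""
    if previous == 0:
        return False  # relative change from zero is undefined; this sketch skips it
    return abs(current - previous) / abs(previous) >= fraction


def exceeds_absolute_change(previous, current, amount):
    """True when the value moved by at least `amount`, regardless of its size."""
    return abs(current - previous) >= amount


_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_expression(expression, datasources):
    """Evaluate an arithmetic expression over named data source values."""

    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return datasources[node.id]
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported element in threshold expression")

    return walk(ast.parse(expression, mode="eval").body)
```

With a 25% relative threshold and a 3 dB absolute threshold, a reading that moves from 2.0 to 3.0 trips only the relative check, while one that moves from 40.0 to 43.0 trips only the absolute one. For the expression case, `evaluate_expression("memUsed / memTotal * 100", {"memUsed": 3, "memTotal": 4})` gives a percentage that can then be compared against an ordinary high or low threshold.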

Notifications

  • Roles: OpenNMS now supports on-call roles. If you have, say, an On-Call role where the users change over time, this feature allows you to schedule them in advance and OpenNMS will manage that schedule for you.
  • Group Duty Schedules: Works like normal duty schedules, except if a Group is listed as a target in a destination path, the duty schedule will apply to the whole group (individual users and roles also in the target are not affected).
  • JavaMail: JavaMail is now the default API used for sending e-mail notifications. This change eliminates the burden of installing, configuring, and troubleshooting a local mail transport agent such as Sendmail or Postfix on the OpenNMS server.
  • Path Outages: A basic path outage capability has been added. This feature addresses the need to suppress notifications for nodes that appear to be down to the OpenNMS system due to a failure in the network path between the nodes and OpenNMS.
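
As a rough illustration of the path outage idea above (and only that; the function, the node names, and the addresses are invented, and the real feature has more to it), the decision boils down to "is the thing between me and this node down too?":

```python
def split_notifications(down_nodes, critical_path, is_reachable):
    """Separate nodes worth notifying about from nodes hidden behind a failed path.

    down_nodes    -- names of nodes that currently appear down
    critical_path -- dict mapping a node name to the address it depends on
    is_reachable  -- callable(address) -> bool, e.g. the result of a ping
    """
    notify, suppressed = [], []
    for node in down_nodes:
        hop = critical_path.get(node)
        if hop is not None and not is_reachable(hop):
            # The path to the node is broken, so "node down" tells us nothing new.
            suppressed.append(node)
        else:
            notify.append(node)
    return notify, suppressed
```

If a branch router at 10.1.0.1 dies, the two switches behind it land in `suppressed`, and you only get paged for nodes whose path is still healthy (or that have no critical path defined at all).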

Integrations

Web UI

  • Jetty: OpenNMS has a built-in web server (including AJP support), and no longer requires Tomcat for the web UI (although it can still optionally be used).
  • JFreeChart Support: OpenNMS now supports a JFreeChart integration which lets you add charts to the web UI.
  • Zooming: It is now possible to interactively zoom in on graphs.
  • StrafePing: OpenNMS includes an implementation of SmokePing.
  • RSS Feeds: Support has been added for RSS feeds for notifications, outages, alarms, and events.
  • New Look: The OpenNMS web UI got a face lift.

OpenNMS 1.6.0 On the Horizon

So I just finished getting OpenNMS 1.5.98 out the door. This is the first release that we've left a few (small) known issues in because we're in hard freeze.

I am so ready for this release to be out; there have been a ton of improvements since 1.2.x and the sooner we can get folks to the current codebase, the better.

Of course, while I was in the process of writing this blog post, Dave found a small but not-insignificant bug that is worth doing another RC for, so here comes 1.5.99! 😉

The Age of Scrutiny

Interface

I swear, this is the only political post I will do before the elections. No, really!

As the elections get closer and closer, the more I realize Neal Stephenson is not an author, but a prophet. He (co-)wrote a book called Interface, about a politician who has a stroke and has a chip implanted in his brain by the shadow government. It restores his motor control, but has the side effect of letting an external wireless device (designed to be a kind of "pacemaker" for the chip) trigger memories.

I know it sounds pretty crazy, but in the context of the book, it actually flows pretty believably.

Anyways, he goes on to run for President, and his campaign works out a way to use this memory-trigger to their advantage. They pick a small sample of people that represent a cross-section of the country, and then hook them up to a little Dick Tracy TV with an EKG in it that transmits their immediate emotional response to whatever they show them back to the campaign (sound familiar?). The campaign then triggers various memories so he can change tactics instantly if he starts losing support during a television appearance.

It's insane, and basically completely believable with current technology. Polling has become more and more prominent, to the point where polls about people's opinions about how they feel about how they think other people will react to polls are considered normal.

My favorite part of the book is when his campaign manager, Cy Ogle, is explaining why the issues don't matter in the current political realm.

"In the 1700s, politics was all about ideas. But Jefferson came up with all the good ideas. In the 1800s, it was all about character. But no one will ever have as much character as Lincoln and Lee. For much of the 1900s it was about charisma. But we no longer trust charisma because Hitler used it to kill Jews and JFK used it to get laid and send us to Vietnam." ...

"So what's it about now?" Aaron said.

"Scrutiny. We are in the Age of Scrutiny. A public figure must withstand the scrutiny of the media," Ogle said. "The President is the ultimate public figure and must stand up under the ultimate scrutiny; he is like a man stretched out on a rack in the public square in some medieval shithole of a town, undergoing the rigors of the Inquisition. Like the medieval trial by ordeal, the Age of Scrutiny sneers at rational inquiry and debate, and presumes that mere oaths and protestations are deceptions and lies. The only way to discover the real truth is by the rite of the ordeal, which exposes the subject to such inhuman strain that any defect in his character will cause him to crack wide open, like a flawed diamond. It is a mystical procedure that skirts rationality, which is seen as the work of the Devil, instead drawing down a higher, ineffable power. Like the Roman haruspex who foretold the outcome of a battle, not by analyzing the strengths of the opposing forces, but by groping through the steaming guts of a slaughtered ram, we seek to establish a candidate's fitness for office by pinning him under the lights of a television studio and constructing the use of eye contact, monitoring his gesticulations-- whether his hands are held open or closed, toward or away from the camera, spread open forthcomingly or clenched like grasping claws."

Spooky, isn't it?

Mono 2.0 in Fink Unstable

I've got Mono 2.0 updated and packaged up for Fink unstable. It includes Cocoa#, Gtk#, and MonoDevelop 1.0, all tested and working.

Congratulations to the Mono team on getting 2.0 released!

OpenNMS 1.5.94/1.5.95/1.5.96 Released

After a few issues with an annoying poller bug and some cross-site scripting issues that ended up triggering a series of quick releases over just a few days, things are settling down again in the wake of the OpenNMS 1.5.94-1.5.96 releases.

Let me start by saying, holy crap we fixed a lot of bugs, and we're on track to get 1.6 out the door in the next month or so. There are only a few bugs left, and we're pretty much 100% focused on finishing those off.

For the first time in a while, this is more than just a suggested update, since a number of cross-site scripting issues were fixed. If you're running anything in the OpenNMS 1.3.x or 1.5.x series, it is very strongly recommended that you upgrade to 1.5.96.

As always, feedback is encouraged; please let us know if you run into issues, awesomeness, or anything in between. 😉

OpenNMS 1.5.94 Coming

So I'm getting really excited about the next OpenNMS release on the road to 1.6. We've got over 100 more bugs closed since 1.5.93, and only 22 left before we can release a 1.6 release candidate. We're holding off on 1.5.94 because of a rather interesting bug that can cause strange outage results for nodes not using the "critical service" functionality. Matt's worked out a unit test for it, so hopefully we'll have it wrapped up shortly and get the release out this week.

That said, if you're interested in helping us wrap up the bug-search for 1.6, feel free to try out the "1.6 testing" snapshot RPMs by following the yum installation instructions for the "testing" release and giving it a shot, and open a bug if you find any issues.

One of the big blockers holding up a real release is consolidating the installation documentation in one place, be it the wiki, or the out-of-date install guide. If you have suggestions on things that could be clearer for installation, configuration, or anything else, please open a bug! At this point OpenNMS is pretty solid; it's arguably more important to get documentation cleared up where it's confusing.

Google Chrome on Mac OS X (In Wine)

Just a quick note to say that based on these instructions, I was able to get Google Chrome running on Mac OS X, using Fink.

You'll need to enable unstable ("fink configure", followed by "fink selfupdate-rsync"), and then do a "fink install wine cabextract". Then start at the "offline installer" part of the instructions.

Woot!

Google Chrome on Mac OS X (Screenshot)
