Splunk Live: A Quest for Splunk Singularity
by John Bythrow, OpenSky Virtualization Engineer


I recently attended the yearly SplunkLive event in NYC in an effort to get my customer’s fledgling Splunk infrastructure monitoring effort on track. If Splunk, system monitoring, big data, or business analytics lie in your wheelhouse, I’d strongly recommend attending SplunkLive yourself sometime. I am admittedly a Splunk neophyte and was therefore encouraged to find that the event was well attended, the content relevant, and the attendees energized.

To zero in on my perspective: my focus was on getting a broad sense of Splunk’s true value & strength relative to my customer’s desire to use it as a focal point for infrastructure monitoring. My client is a large health insurance company & my team is responsible for tens of thousands of Virtual Machine instances, with all the supporting software and hardware. My gut feeling going into this was that Splunk had a key role to play, but perhaps not as the one-stop shop for infrastructure monitoring that some on my team were pushing for. We do have an existing Splunk infrastructure, but its use case has been more around security, audit and compliance. Hence, this infrastructure monitoring effort is more or less an animal unto itself, with a potential user base far exceeding our current use case.

What follows is a brief list of take-aways, conclusions & impressions, with an emphasis on purposing Splunk for IT Infrastructure monitoring.

Upgrade to Version 6.2 ASAP

Since I’m targeting a green-field deployment here (for the infrastructure monitoring), I’m more concerned with having our new users face the latest & greatest, & avoid a second learning curve when we upgrade. Also, since we have a boatload of integration ahead of us, we’ll want to leverage the new features to speed development. If you’re looking at an upgrade scenario yourself, the usual caveats of rolling upgrades apply, but are not addressed here. Splunk does recommend you run with all indexers, search heads, & collectors at the same version, although a mild mix of versions may function in a pinch.

Important new features – Splunk is pitching this as a groundbreaking revision. You can get the low-down on what’s new here: http://docs.splunk.com/Documentation/Splunk/6.2.1/ReleaseNotes/MeetSplunk.

  • Features that I found relevant to my customer are as follows:
    • A powerful new “Patterns” tab in search can aggregate thousands of results into a few unique instances. Click HERE for a demo of how it works.
    • Dashboards are a bit easier to share with “Prebuilt Panels.” You can share a prebuilt panel on multiple dashboards and, when displayed, it is essentially a pointer to the source panel. You can convert any normal panel to a prebuilt, or share via basic XML.
    • Pivots are made much easier because they build data models on the fly. Previous versions forced you to figure out a relevant data model before you could do anything. Click HERE for the Demo.
    • For a more in-depth look, you can install the 6.2 Overview Splunk App onto one of your Search Heads from: https://apps.splunk.com/app/1892/ . It gives you a quick walk-through of what’s new in 6.2.
  • HUNK: Splunk and Hadoop Get Busy – Being new to Splunk (and Hadoop for that matter), this presents more questions than answers (but cool questions & promising answers).
    • HUNK is a Splunk product that allows Splunk to probe a Hadoop cluster.
      • This could allow you to “collect it all” and let Splunk sort it out later, which in turn lets you both architect and afford a much longer retention period for analyzing trends over months and years.
      • Some monitoring solutions even use Hadoop as a back-end.
      • Yahoo, the creator of Hadoop, uses Splunk to query Hadoop.
  • Splunk Free version has a lot to offer
    • A startup health insurance company had an IMPRESSIVE demo of a huge analytic infrastructure they built all on Splunk Free.
      • Their Splunk interface is fully Dashboarded, so that regular users have no free-range searching.
    • Important Note:
      • You should lock down users to disallow “*” and “all time” searches, so that no single user can peg the system.
    • The health insurance company trains one user in each department, as the Splunk-Lead, who in turn teaches team members to search, dashboard, etc.
      • This is a key model because, whatever monitoring solution you use, you need consumers to create their own dashboards. The end-state is that application owners create application-centric dashboards with that application’s key metrics (e.g., a storage dashboard, an Exchange dashboard, an accounting dashboard, etc.).
  • Splunk for IT (VMware, NetApp, Exchange, Windows, Linux, etc.)
    • The actual builder of the Cisco UCS Splunk App gave a presentation and provided valuable insight into Splunk’s commitment to “IT” apps.
      • I told him I sensed Splunk’s commitment to “IT” apps was lukewarm, as a lot were v1.x. I said I felt Splunk had a split-brain condition between monitoring & big data analytics. He agreed that Splunk was not a monitoring solution, and never would be, and that the big money was on the Big-Data side of the business. He also agreed with my analogy that Splunk was the MRI in the hospital and standard monitoring was the thermometer in the medicine cabinet. Both essential.
    • Confirmed that data in the VMware app and the NetApp app can be cross-correlated, i.e., you see a datastore problem in the VMware app, and you can drill into the NetApp aggregate.
    • A medical company spoke about how their Splunk environment kept growing organically. Users would find out about it and continue finding use cases. The point here is that Splunk seems to lend itself to federated management; either that or the designated Splunk guru will burn out fast.
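
The lock-down note in the list above (no “*” and no “all time” searches) can be pictured as a simple pre-flight check on each search request. The Python below is only a sketch of that policy, not Splunk’s own mechanism: the function name, the 7-day limit, and the way a request is represented are all assumptions made for illustration. In a real deployment, this kind of restriction would more likely be set through Splunk’s role-based search restrictions than through custom code.

```python
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical policy limit, chosen only for illustration.
MAX_WINDOW = timedelta(days=7)


def check_search(query: str,
                 earliest: Optional[datetime],
                 latest: Optional[datetime]) -> List[str]:
    """Return a list of policy violations; an empty list means the search is allowed."""
    problems = []

    # A search with no terms, or only a bare wildcard, matches every event.
    if query.strip() in ("", "*"):
        problems.append("bare wildcard search")

    # In this sketch, a missing start time stands in for an "all time" search.
    if earliest is None:
        problems.append("all-time search")
    else:
        end = latest or datetime.now()
        if end - earliest > MAX_WINDOW:
            problems.append("time window too wide")

    return problems


now = datetime(2015, 1, 15, 12, 0)
print(check_search("*", None, now))
# ['bare wildcard search', 'all-time search']
print(check_search("index=vmware error", now - timedelta(hours=4), now))
# []
```

The point of returning a list rather than a yes/no answer is that a department’s Splunk-Lead can see exactly why a search was refused and coach the user accordingly.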

Key Take-Aways


“If I had an hour to solve a problem I’d spend 55 minutes thinking about the problem and 5 minutes thinking about solutions.” Albert Einstein – Quoted by Presenter at SplunkLive

  • So maybe we need to spend more time figuring out exactly what monitoring problem we are trying to solve before we start putting more effort into a solution that may or may not suit the problem.
  • There is a lot more momentum and excitement around Splunk as a big data solution than a strict monitoring solution.
  • There is more Splunk momentum around security monitoring than infrastructure monitoring.
  • There is more Splunk momentum around business analytics than infrastructure monitoring.
  • I sensed a certain lethargy and apathy around the infrastructure Apps.

So, in conclusion . . . to my quest for insight into Splunk for IT Infrastructure Monitoring . . .

Is a comprehensive monitor like Nagios or ScienceLogic a more practical approach, since the hard part is extracting intelligence from the data, & not the mere collection of it? A standard monitor is tuned to poll common infrastructure components (NetApp, Citrix, VMware, etc.). Splunk is a little light on apps to dive deep into vendor-specific Machine Data streams. The apps are growing in number, but Splunk is firmly in catch-up mode relative to full-blown monitoring solutions (and frankly, it has a lot more money to make elsewhere).

And, assuming Splunk is collecting ALL the machine data (its mainstay), maybe the strength is two-fold:

  • Post-mortem analysis after an event, or even trending.
  • Finding what you weren’t looking for (i.e., exposing patterns of usage, errors, etc. that you didn’t know enough to even be looking for). Splunk answers the old adage: “You don’t know what you don’t know.”

Then . . . conditions exposed via Splunk deep analysis can be piped into your standard monitoring solution as alerts, data points, or properties of objects (e.g., a red/green condition on a VM, or a fault count on a switch).
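
As one way to picture that hand-off, here is a minimal Python sketch that turns a single Splunk search result into a passive service check line in the Nagios external command format (`PROCESS_SERVICE_CHECK_RESULT`). The result field names (`host`, `fault_count`), the service name, and the threshold are assumptions for illustration only, and how the line actually reaches Nagios is left out.

```python
import time
from typing import Optional

# Nagios service states: 0 = OK, 1 = WARNING, 2 = CRITICAL.
OK, CRITICAL = 0, 2


def to_passive_check(result: dict, service: str, threshold: int,
                     now: Optional[int] = None) -> str:
    """Format one search result as a Nagios passive service check line."""
    timestamp = int(time.time()) if now is None else now
    faults = int(result["fault_count"])  # hypothetical field name
    state = CRITICAL if faults >= threshold else OK
    output = f"{faults} faults seen by Splunk search"
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{result['host']};{service};{state};{output}"
    )


row = {"host": "switch01", "fault_count": "12"}
print(to_passive_check(row, "Splunk Fault Count", threshold=10, now=1420070400))
# [1420070400] PROCESS_SERVICE_CHECK_RESULT;switch01;Splunk Fault Count;2;12 faults seen by Splunk search
```

With something along these lines, the monitoring tool keeps ownership of alerting and state, while Splunk only supplies the condition it uncovered.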

If you’re still with me, I’ll wrap up with my favorite Splunk analogy. When your child gets the sniffles you rush down to the emergency room for an MRI. <SCREECH!!!> Or . . . grab the thermometer from the medicine cabinet. Yeah, the second one. Likewise, a purpose-built infrastructure monitoring solution will always have its role as your go-to for basic visibility, your thermometer in the house. But, in the wild world there are nastier bugs than the common cold. And for that, we have Splunk, the MRI Machine, the veritable scanning tunneling microscope of the IT industry. SMBs may do fine with just Band-Aids and aspirin, but in critical, fully-scaled, enterprise infrastructures, the MRI that is Splunk is an option whose co-pay may be worth the outlay.

More Cool Splunk Resources: