AIOps for IT Ops — Part Two: Gartner Market Guide Insights
Industry analyst firm Gartner recently released a new report entitled Market Guide for AIOps Platforms, a 20-page document that offers the firm's perspective on the AIOps market. Unlike a Gartner Magic Quadrant (MQ), a Market Guide is not a vendor comparison. Market Guides are often precursors to MQs: they cover emerging markets that may eventually get an MQ. Among other things, they help explain what a market is, how it can align with buyers' future plans, and what risks those investing in the market face.
I'm writing a blog series to dig a little deeper into some aspects of this very important Market Guide. You can read the first post in this series, which covers the basics, including what AIOps is. For this post, let's start by reiterating a key point made in the Market Guide:
“There is no future of IT operations that does not include AIOps. This is due to the rapid growth in data volumes and pace of change (exemplified by rate of application delivery and event-driven business models) that cannot wait on humans to derive insights.”
As I said previously, I've been working with Gartner for over 15 years, and I don't remember ever seeing language this strong from them, including in the previous Market Guide on AIOps Platforms. I infer from this that Gartner is hearing the same message from its customers, and hearing it widely. Gartner also acknowledges in the document that, "It is simply impossible for humans to make sense of thousands of events per second being generated by their IT systems."
Whether or not you accept that premise, it certainly seems that AIOps will be a factor in IT Operations and DevOps for the foreseeable future.
So let's dig into another key topic covered in the document: topology. The document calls out three key attributes of AIOps platforms: data ingestion and handling, machine learning (ML) analytics, and remediation. Within ML analytics there are multiple approaches, and one of them is topological analysis. The guide describes it this way:
“Topological analysis. AIOps platforms may use application, network, infrastructure or other topologies to provide contextualized analysis. Deriving patterns from data within a topology will establish relevancy and illustrate hidden dependencies. Using topology as part of causality determination can greatly increase its accuracy and effectiveness.”
For years the analyst community has been saying that a key inhibitor for Gen 1 AIOps platforms is their lack of topology. (For an explanation of Gen 1 AIOps platforms, please see the first blog post in this series, mentioned above.) Topology was the key element missing from those platforms, and its absence kept them from finding the root cause of issues.
What is topology? Topology is the mapping of IT services: an understanding of which systems make up an IT service and how those systems connect to and rely on each other. As with many things, this started with a simple approach: drawing infrastructure diagrams and updating them whenever something changed. That approach doesn't last long, because scale and complexity quickly make it untenable. Add a third factor, dynamic or ephemeral systems such as virtual machines, containers and microservices, and manual diagrams become nearly useless.
Today there are monitoring tools that determine topology automatically and dynamically. This is the only practical way to get a clear understanding of all the components of an IT service and how they depend on each other. That understanding shows how the failure of one component may affect the others, and what risk it poses to the performance of the IT service.
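To make the idea concrete, here is a minimal Python sketch that treats topology as a dependency graph. It assumes the raw input is a list of observed (client, server) connection pairs; the component names and the `build_topology` and `impacted_by` helpers are hypothetical and do not reflect how any particular monitoring product discovers or stores topology. Walking the graph in reverse from a failed component shows which other components could be affected.

```python
from collections import defaultdict, deque

# Hypothetical raw observations: (client, server) pairs seen in connection data.
# A real tool might gather these from agents, flow logs or API polling;
# here they are hard-coded for illustration.
OBSERVED_CONNECTIONS = [
    ("web-frontend", "checkout-api"),
    ("web-frontend", "catalog-api"),
    ("checkout-api", "payments-db"),
    ("checkout-api", "cache"),
    ("catalog-api", "cache"),
    ("catalog-api", "catalog-db"),
]


def build_topology(connections):
    """Return a map of component -> set of components it depends on."""
    depends_on = defaultdict(set)
    for client, server in connections:
        depends_on[client].add(server)
        depends_on.setdefault(server, set())  # make sure leaf nodes appear too
    return dict(depends_on)


def impacted_by(topology, failed):
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a dependency up to its dependents.
    dependents = defaultdict(set)
    for component, deps in topology.items():
        for dep in deps:
            dependents[dep].add(component)

    # Breadth-first search over the inverted graph.
    impacted, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return impacted


if __name__ == "__main__":
    topo = build_topology(OBSERVED_CONNECTIONS)
    print(sorted(impacted_by(topo, "cache")))
```

Running it prints `['catalog-api', 'checkout-api', 'web-frontend']`: everything that could suffer if `cache` fails. Because the graph is rebuilt from observations rather than drawn by hand, it can be refreshed as containers and microservices come and go.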
Gen 1 AIOps tools had no concept of topology. They were designed to ingest events from scads of monitoring tools and use machine learning algorithms to find patterns in that data. After a few years and many failed projects, the industry concluded that this wasn't a reliable way to root-cause IT incidents, and certainly not a reliable way to prevent IT disruptions.
So monitoring platforms had topology, and AIOps tools had machine learning algorithms to sort through data. Originally, neither had what the other had. Monitoring platforms such as Zenoss Cloud began building AIOps capabilities, and that combination is the essence of Gen 2 AIOps platforms. But to date, no AIOps tool has been able to build the capability to determine topology automatically. It turns out to be much harder to go in one direction than the other. Why? Because determining topology requires the hard work of collecting raw data from machines, not just collecting events. The best algorithms in the world won't solve real-world problems without that data. That is why topology should be a key consideration when you're evaluating AIOps platforms.
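As a rough illustration of why topology helps with causality, the sketch below takes a burst of alerts and keeps only the alerting components whose own dependencies, direct or transitive, are all healthy. The topology, the component names and the `root_cause_candidates` heuristic are assumptions for illustration only; this is not how Zenoss Cloud or any other specific AIOps platform performs root-cause analysis. Without the dependency map, every alert in the burst would look equally relevant.

```python
# Hypothetical topology: component -> set of components it depends on.
TOPOLOGY = {
    "web-frontend": {"checkout-api", "catalog-api"},
    "checkout-api": {"payments-db", "cache"},
    "catalog-api": {"cache", "catalog-db"},
    "payments-db": set(),
    "cache": set(),
    "catalog-db": set(),
}


def transitive_deps(topology, component):
    """Return everything `component` depends on, directly or indirectly."""
    seen, stack = set(), list(topology.get(component, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:  # the `seen` set also guards against cycles
            seen.add(dep)
            stack.extend(topology.get(dep, ()))
    return seen


def root_cause_candidates(topology, alerting):
    """Hypothetical heuristic: an alerting component with no alerting
    dependency is a likely origin; alerting components that sit above
    another alerting component are treated as probable symptoms."""
    alerting = set(alerting)
    return {
        component
        for component in alerting
        if not (transitive_deps(topology, component) & alerting)
    }


if __name__ == "__main__":
    alerts = {"web-frontend", "checkout-api", "catalog-api", "cache"}
    print(root_cause_candidates(TOPOLOGY, alerts))  # {'cache'}
```

Here four alerts collapse to a single candidate, `cache`, because every other alerting component depends on it. The sketch ignores timing, missing alerts and alerting cycles (a cycle of alerting components would yield no candidate at all); its only purpose is to show the context a dependency map adds that event patterns alone do not.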