Large companies and institutions today run complex IT services to support business value chains that execute millions of business transactions around the clock. All this business activity needs to be monitored for problems and technical failures.
Question: How can we assure 99.7% availability and reduce cost at the same time?
In this article I will describe how a typical company monitors its IT services and how it can achieve situational awareness.
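To make the 99.7% target concrete, it helps to translate it into a downtime budget. The snippet below is a minimal sketch of that arithmetic; the function names and the choice of a 365-day year and a 30-day month are assumptions made for illustration.

```python
HOURS_PER_YEAR = 365 * 24   # assumption: non-leap year
HOURS_PER_MONTH = 30 * 24   # assumption: 30-day month


def downtime_budget_hours(availability_pct, period_hours):
    """Hours of downtime allowed in a period for a given availability target."""
    return period_hours * (1 - availability_pct / 100)


def measured_availability_pct(period_hours, downtime_hours):
    """Availability actually achieved, as a percentage of the period."""
    return 100 * (period_hours - downtime_hours) / period_hours


yearly_budget = downtime_budget_hours(99.7, HOURS_PER_YEAR)    # roughly 26.3 hours
monthly_budget = downtime_budget_hours(99.7, HOURS_PER_MONTH)  # roughly 2.2 hours

# A month with 3 hours of outage misses the 99.7% target.
march = measured_availability_pct(HOURS_PER_MONTH, 3)
target_met = march >= 99.7
```

In other words, 99.7% leaves a little over one day of downtime per year for the whole chain, which is why every avoidable incident counts.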
Current state of monitoring at many companies
Keeping your business services running smoothly and on track is challenging. Because of cost cutting, most monitoring specialists have left the company and monitoring has been delegated to the DevOps teams. These teams have been experimenting with open source tools for some time. They now use Elasticsearch and Logstash for events, and Graphite and Grafana for metrics. Some monitoring guidelines have been defined and shared, but teams interpret these guidelines differently. Every team has developed its own dashboards, and the monitoring data generated by their services cannot be compared across teams. The team dashboards show only the services managed by a single team and require team-specific knowledge to interpret. All this makes it very hard to get a global overview when handling company-wide incidents in the war room.
In addition, the DevOps teams focus on delivering new services to the business, which leaves little time to monitor and optimize existing IT services. As a result, availability drops, system complexity increases and hardware resources are used inefficiently. This leads to high costs for emergency repairs and to underutilized servers.
- Teams primarily focus on managing their own services
- End to end business value chain management is required but not in place
- Monitoring data and dashboards need to be standardised
- There is no global overview and no one is in control
- The quality of IT services is below expectations
Where do we want to go?
We are aiming for Maturity Level 4/5. This requires us to implement:
- End to end chain monitoring
- Business Transaction Monitoring
- Event tracing with UUID
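Event tracing with a UUID can be sketched as follows: a trace ID is generated once at the entry point of a business transaction, every service attaches it to the events it emits, and the monitoring side groups events by that ID to rebuild the end to end chain. The service names, step names and event fields below are hypothetical, and the sketch keeps events in memory instead of shipping them to an event store.

```python
import uuid
from collections import defaultdict


def new_trace_id():
    """Create a trace ID once, at the entry point of a business transaction."""
    return str(uuid.uuid4())


def make_event(trace_id, service, step, status):
    """Build a monitoring event that carries the shared trace ID."""
    return {"trace_id": trace_id, "service": service, "step": step, "status": status}


def group_by_trace(events):
    """Rebuild each transaction's chain by collecting its events per trace ID."""
    chains = defaultdict(list)
    for event in events:
        chains[event["trace_id"]].append(event)
    return dict(chains)


def failed_traces(events):
    """Return the trace IDs of transactions with at least one failed step."""
    return sorted({e["trace_id"] for e in events if e["status"] != "ok"})


# Two transactions pass through the same (hypothetical) services.
trace_a = new_trace_id()
trace_b = new_trace_id()
events = [
    make_event(trace_a, "web-frontend", "order_received", "ok"),
    make_event(trace_b, "web-frontend", "order_received", "ok"),
    make_event(trace_a, "orders", "order_stored", "ok"),
    make_event(trace_b, "orders", "order_stored", "ok"),
    make_event(trace_a, "payments", "payment_booked", "ok"),
    make_event(trace_b, "payments", "payment_booked", "failed"),
]

chains = group_by_trace(events)
broken = failed_traces(events)
```

Because the ID travels with the transaction, a failure in one service can be linked back to the exact business transaction it affected, without relying on timestamps or team-specific log formats.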
What do we need to get there?
- Get organized
  - Take monitoring seriously (put it on the management agenda)
  - Set up guidelines, share knowledge, inspire
  - Provide monitoring as a service (help the DevOps teams)
- Collect data (quality and value)
  - Collect event data on business transactions
  - Tag activities with channel and business value chain identifiers
  - Clean up, normalize and enrich existing monitoring events
- Provide a global overview
  - Standardize dashboards
  - Visualize end to end business value chains
  - Generate reports on availability
- Improve and simplify your infrastructure
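The data collection steps above (normalizing existing events and tagging them with channel and business value chain identifiers) could look roughly like the sketch below. The two raw event formats, the field names, the severity aliases and the `SERVICE_CATALOG` lookup are all hypothetical; in practice this kind of logic would more likely live in a pipeline such as Logstash than in a standalone script.

```python
from datetime import datetime, timezone

# Hypothetical catalog mapping each service to its channel and value chain.
SERVICE_CATALOG = {
    "payments": {"channel": "mobile", "value_chain": "order-to-cash"},
    "orders": {"channel": "web", "value_chain": "order-to-cash"},
}

# Hypothetical spellings that teams use for the same severity.
SEVERITY_ALIASES = {"warn": "WARNING", "err": "ERROR", "fatal": "CRITICAL"}


def first_present(raw, *keys, default=None):
    """Return the value of the first key that exists in the raw event."""
    for key in keys:
        if key in raw:
            return raw[key]
    return default


def normalize_timestamp(value):
    """Convert epoch seconds or an ISO 8601 string to an ISO 8601 UTC string."""
    if isinstance(value, (int, float)):
        moment = datetime.fromtimestamp(value, tz=timezone.utc)
    else:
        moment = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
        if moment.tzinfo is None:
            # Assumption: timestamps without an offset are already in UTC.
            moment = moment.replace(tzinfo=timezone.utc)
        moment = moment.astimezone(timezone.utc)
    return moment.isoformat()


def normalize_event(raw):
    """Map a team-specific event onto one shared schema and enrich it."""
    service = str(first_present(raw, "service", "svc", default="unknown")).strip().lower()
    severity = str(first_present(raw, "severity", "lvl", default="INFO")).strip()
    severity = SEVERITY_ALIASES.get(severity.lower(), severity.upper())
    raw_ts = first_present(raw, "timestamp", "ts")
    tags = SERVICE_CATALOG.get(service, {"channel": "unknown", "value_chain": "unknown"})
    return {
        "timestamp": normalize_timestamp(raw_ts) if raw_ts is not None else None,
        "service": service,
        "severity": severity,
        "message": first_present(raw, "message", "msg", default=""),
        "channel": tags["channel"],
        "value_chain": tags["value_chain"],
    }


# Two teams report the same moment in different formats.
team_a_event = {"ts": "2024-01-01T12:00:00Z", "svc": "payments", "lvl": "ERROR", "msg": "timeout"}
team_b_event = {"timestamp": 1704110400, "service": "Orders ", "severity": "warn", "message": "slow"}

normalized_a = normalize_event(team_a_event)
normalized_b = normalize_event(team_b_event)
```

Once every event has the same fields and carries its value chain identifier, dashboards and availability reports can be built per chain instead of per team.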
Smart Monitoring provides:
- Real-time support
  - End to end business value chain monitoring
  - Business transaction monitoring
  - A single view on service health
  - Incident impact analysis, predictive monitoring
- Long-term support
  - Improve system design, reduce complexity
  - Explore and find weak points
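As an illustration of a single view on service health and of incident impact analysis, the sketch below rolls individual service statuses up to business value chains and lists the chains touched by a failing service. The chain definitions, service names and status levels are hypothetical, and treating a service that reports nothing as "down" is an assumption made for this example.

```python
# Hypothetical definition of which services make up each business value chain.
VALUE_CHAINS = {
    "order-to-cash": ["web-frontend", "orders", "payments"],
    "customer-onboarding": ["web-frontend", "identity", "crm"],
}

# Higher rank means worse health.
STATUS_RANK = {"ok": 0, "degraded": 1, "down": 2}


def chain_health(service_status):
    """Report each chain's health as the worst status among its services."""
    health = {}
    for chain, services in VALUE_CHAINS.items():
        # Assumption: a service with no reported status counts as "down".
        statuses = [service_status.get(service, "down") for service in services]
        health[chain] = max(statuses, key=lambda status: STATUS_RANK[status])
    return health


def impacted_chains(failing_service):
    """List the value chains that depend on the failing service."""
    return sorted(
        chain for chain, services in VALUE_CHAINS.items() if failing_service in services
    )


current_status = {
    "web-frontend": "ok",
    "orders": "ok",
    "payments": "degraded",
    "identity": "ok",
    "crm": "ok",
}

overview = chain_health(current_status)
payment_impact = impacted_chains("payments")
frontend_impact = impacted_chains("web-frontend")
```

A shared service such as the hypothetical `web-frontend` shows up in several chains, which is exactly the kind of dependency a team-level dashboard does not reveal.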
This article is part of a series on monitoring.
I work as a consultant and developer, building and managing microservices.