Only 2…

That’s the number of KPIs you need to improve your service management across incident, change and problem management.

Only 2 you say? Yes, only 2. And they are:

  • Number of incidents (with a certain threshold, preferably 0)
  • Resolution time (with a certain threshold, preferably ASAP)
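To make the two KPIs concrete, here is a minimal Python sketch of how they could be calculated from a list of incident records. The `Incident` class, its `opened` and `resolved` fields, and both function names are my own assumptions for illustration; they don't come from any particular ticketing tool, and using the mean (rather than a median or percentile) for resolution time is just one possible choice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional


@dataclass
class Incident:
    # Hypothetical record; a real ticketing tool will have its own fields.
    opened: datetime
    resolved: Optional[datetime] = None  # None while the incident is still open


def incident_count(incidents: List[Incident], start: datetime, end: datetime) -> int:
    """KPI 1: number of incidents opened in the period [start, end)."""
    return sum(1 for i in incidents if start <= i.opened < end)


def mean_resolution_time(incidents: List[Incident]) -> Optional[timedelta]:
    """KPI 2: average time from opened to resolved, over resolved incidents only."""
    durations = [
        (i.resolved - i.opened).total_seconds()
        for i in incidents
        if i.resolved is not None
    ]
    if not durations:
        return None  # nothing resolved yet, so there is no resolution time to report
    return timedelta(seconds=mean(durations))
```

You would then compare both values against whatever thresholds you agreed on, for example a maximum number of incidents per month and a maximum average resolution time.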

Incident management
The fewer incidents you have, the better your services perform. There are 2 ways to decrease the number of incidents you have on your plate at any moment in time:

  • Prevent incidents from happening
  • Solve incidents faster

Actually, the number of incidents shouldn’t be a KPI for incident management on its own. Incident management starts when an incident happens, so it has “no” influence on the creation of incidents. It does have an influence on how fast the incidents are solved.

Problem management
The goal of problem management is to prevent incidents from happening again and/or elsewhere, so problem management directly influences the number of incidents you have. If you speed up your problem-solving process, you will have fewer incidents sooner.

Change management
We all know that a lot of incidents happen because a change wasn’t implemented correctly (the code itself, the way it was implemented, or whatever). If you perform changes really well, this directly influences the number of incidents. The faster and better you implement changes, the further the number of incidents will go down.

What are your KPIs? And can they be reduced to just these 2? Let me know and let’s see if we can find a way 🙂

DoD for incidents

So, incident management and Scrum. Unplanned work vs. planned work. The sheer definition of things that pop up unexpectedly vs. sprints with fixed work that cannot be changed. ITIL vs. Agile. And how to connect both.

One way to connect different ways of working with different goals is to try to speak the same language and align communication. With this goal in mind, I’m experimenting with what I call the Definition of Done for Incidents. That way, at the end of the incident management process, you know exactly what needs to have been done. And when asked the question: “Are you done done?”, you can answer: “Yes”.

I’m suggesting that an incident is done if:

  • All the incident’s acceptance criteria (ACs) are met
  • (Temporary) fix is available on production
  • Live (temporary) fix is verified by reporter/end user/customer
  • Service is back to normal service level (SLA)
  • Documentation is updated (if needed)
  • Problem ticket is created if the fix is temporary or the incident was critical
  • The incident ticket is updated
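
To show how the list above could be checked in practice, here is a minimal Python sketch. Everything in it is an assumption on my side: the `IncidentState` class, its field names and the two functions are made up for illustration and are not part of ITIL, Scrum or any ticketing tool. The two conditional items (documentation “if needed”, problem ticket “if temp fix or critical incident”) only count when their condition applies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IncidentState:
    # Hypothetical flags, one per item in the Definition of Done above.
    acceptance_criteria_met: bool
    fix_on_production: bool
    fix_is_temporary: bool
    fix_verified_by_reporter: bool
    service_level_restored: bool
    documentation_needed: bool
    documentation_updated: bool
    is_critical: bool
    problem_ticket_created: bool
    ticket_updated: bool


def unmet_items(s: IncidentState) -> List[str]:
    """Return the Definition of Done items that are not satisfied yet."""
    needs_problem_ticket = s.fix_is_temporary or s.is_critical
    checks = [
        ("acceptance criteria met", s.acceptance_criteria_met),
        ("fix available on production", s.fix_on_production),
        ("live fix verified by reporter", s.fix_verified_by_reporter),
        ("service back to normal service level", s.service_level_restored),
        # Only required when documentation actually needs an update.
        ("documentation updated", s.documentation_updated or not s.documentation_needed),
        # Only required for temporary fixes or critical incidents.
        ("problem ticket created", s.problem_ticket_created or not needs_problem_ticket),
        ("incident ticket updated", s.ticket_updated),
    ]
    return [name for name, ok in checks if not ok]


def is_done(s: IncidentState) -> bool:
    """An incident is "done done" when no item is left unmet."""
    return not unmet_items(s)
```

Returning the list of unmet items, instead of only a yes or no, means the answer to “Are you done done?” also tells you what is still missing.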

My questions to you are: What do you think? What am I missing? What is unnecessary? What other feedback do you have?