"It does what it says on the tin and it works every time we need it to work."
Matt Sanabria
Staff Site Reliability Engineer
Before Rootly, the SRE team at Cockroach Labs faced time-consuming manual workflows and spent countless hours piecing together post-mortems. Now, with Rootly’s seamless integrations and automation, they’ve transformed their entire process—streamlining workflows, minimizing effort, and cutting post-mortem creation time to a fraction of what it used to be.
The impact? Faster resolution times, fewer man-hours spent, and improved team morale.
Hear directly from the team at Cockroach Labs about how Rootly has empowered them to focus on solving incidents—not managing tools.
Before Rootly, incident management was basically a manual process. We’d create a Confluence page and try to pull in things from Slack, or whatever channel we used, into that page. We’d gather everyone’s story to build our timeline, figure out root causes, and ask our five whys. It was a lot of overhead that I didn’t want to deal with.
As an SRE on the on-call path for production infrastructure, that meant extra work. When an incident arises and I’m on call, I’m the one actioning it.
Being able to create an incident right from Slack and pin messages so they show up in Rootly and become part of the postmortem is probably the best feature. It keeps our engineers focused on the problem. Without that, it’s a nightmare trying to organize everyone.
The Confluence integration is also huge. We need an artifact that lives on afterwards, and Rootly automatically generates a Confluence page for you. The second you click, boom, here it is. Then we hold a short meeting—thirty to forty-five minutes—where we walk through that doc, ask the five whys, and the incident commander ensures the timeline is accurate.
The cost of getting people involved is lower now because we have escalation points and clear roles. It gives me peace of mind to focus on fixing the problem while others know what they need to do. Fewer man-hours go into each incident because we're using Rootly.
On top of that, the time between incident declaration, resolution, and postmortem is much shorter. We can focus on action items while the context is fresh in our minds. That means once the incident is resolved, we can jump straight into implementing the fixes.
It used to take many hours. Sure, you can just spin up a Confluence doc, but then you have to edit it for accuracy, check grammar, and make sure the audience can read it. Now, it’s around one to two hours total to get an incident postmortem out and shared with stakeholders.
There’s definitely better team morale. Nobody has to worry about being the scribe anymore. We can just communicate in Slack and trust that Rootly will synthesize everything. When you’re in the middle of an incident, you don’t have time to perfectly document each step. And after the incident, you don’t have to deal with a bunch of tedious tasks.
There used to be a huge checklist in a document. It wasn’t fun. But with Rootly, it just works every time. We don’t spend time fighting the tool or reconfiguring it. It does what we need, and that’s exactly what we were looking for.