The Suez Canal has been big news over the last couple of weeks. We wondered how a Site Reliability Engineer (SRE) might conduct a postmortem on what happened with the Ever Given, and what that might mean if a comparable incident occurred at a modern tech company.


By now, almost everyone who pays attention to news or social media is aware of the Ever Given, and how it blocked the Suez Canal on March 23, 2021. We thought it would be interesting to take a look at this incident from an SRE perspective, and discuss postmortem best practices as viewed from that perspective.

Before we dive in, though, it's worth talking a little about the Ever Given and the Suez Canal for context and scope.

The Ever Given

The Ever Given belongs to a class of ship called Golden-class container ships. It is one of eleven ships of this class that have been built to date. Golden-class ships are among the largest ships in the world, and are truly massive vessels.

Ever Given and her sister ships are approximately 400 meters in length, or about 1,312 feet--almost one-quarter mile long. They are about 59 meters (193 feet) wide.

For comparison, the Statue of Liberty is about 305 feet tall; the Eiffel Tower is 984 feet tall; and the Empire State Building (depending on how you measure it…) is between 1,250 feet and 1,454 feet tall. If you were to stand Ever Given upright next to the Empire State Building, they would be almost the same height.

The Titanic, arguably one of the most infamous ships that ever existed, measured in at only 269 meters (882 feet) long, and 28 meters (93 feet) wide. So in terms of length, Titanic was approximately two-thirds the length of Ever Given, and slightly less than half as wide.

To give you some perspective on just how large this class of ship is, here is an image of one of her sister ships, the Ever Glory. Each of those green boxes you see is a shipping container.

Shipping containers come in various sizes, but the standard unit of measurement for such containers is called a TEU, which stands for "Twenty-foot Equivalent Unit". In other words, the base container size is 20 feet in length.

Ever Given has a capacity of approximately 20,124 TEU. That's a lot of containers and a lot of weight these ships can carry.

The Suez Canal

The Suez Canal connects the Mediterranean Sea and the Red Sea. It provides a key route for ships of all types travelling from Asia and the Middle East to Europe. Another available route, around the tip of Africa, takes weeks longer than transiting through the canal.

It's said that an average of 50 vessels transit the canal a day. In 2020, this amounted to approximately 18,500 vessels. Approximately 12% of global trade passes through the Suez.

Opened in 1869, the Suez Canal has been enlarged over time to its present length of 193 kilometers (120 miles). It is approximately 205 meters (673 feet) wide, and 24 meters (79 feet) deep.

At the time when the canal was originally constructed, ship building technology was significantly different than it is now. As time has gone on, the Suez was enlarged to its present dimensions to accommodate these newer--much larger--classes of ships. A second channel was added along part of the canal to allow for two-way traffic through that portion.

Even with these improvements, the canal is legacy infrastructure badly in need of more updates to keep up with current ship technology and global trade demands.

The Incident

A lot of what happened on March 23, 2021 is still under investigation. The general agreement for now seems to be that a sandstorm with winds of up to 46 mph blew the ship off course, and it became lodged sideways in the canal, completely blocking it.

Considering that the canal is only 205 meters wide, and the Ever Given is approximately 400 meters long, it's not hard to imagine that a ship of this size blown off course could get wedged into the constricted space of the canal if it were to drift sideways at an angle.
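To put a rough number on that, here is a minimal sketch that treats the hull as a simple rectangle and uses only the dimensions quoted in this article. The rectangle model and the function name wedge_angle_deg are our own assumptions, and the navigable channel is narrower than the canal's surface width, so the real angle would be smaller still.

```python
import math

# Figures quoted in this article. The rectangle model is a simplification.
SHIP_LENGTH_M = 400.0
SHIP_BEAM_M = 59.0
CANAL_WIDTH_M = 205.0


def wedge_angle_deg(length: float, beam: float, width: float) -> float:
    """Angle off the canal axis at which a rectangular hull spans the canal.

    A length x beam rectangle rotated by theta occupies
    length*sin(theta) + beam*cos(theta) across the channel. Setting that
    equal to the width and solving for theta gives
    theta = asin(width / hypot(length, beam)) - atan2(beam, length).
    """
    diagonal = math.hypot(length, beam)
    if width >= diagonal:
        raise ValueError("hull can rotate freely and never spans the canal")
    if width <= beam:
        raise ValueError("hull is wider than the canal")
    return math.degrees(math.asin(width / diagonal) - math.atan2(beam, length))


angle = wedge_angle_deg(SHIP_LENGTH_M, SHIP_BEAM_M, CANAL_WIDTH_M)
print(f"Hull touches both banks at about {angle:.0f} degrees off the canal axis")
```

Under these assumptions the ship only has to swing a little over 20 degrees off its heading before bow and stern can reach opposite banks, which is not much margin in a strong crosswind.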

As the ship became grounded at a point in the canal that did not allow for two-way travel, the Suez was effectively blocked. Over the course of the six days the canal was blocked, hundreds of vessels were delayed or had to be rerouted.

Due to the enormous weight and size of the ship, a massive effort was required to free her from the banks of the canal. Dredging equipment, tug boats, and pumps to help redistribute ballast and fuel inside the ship were all required.

On March 29, 2021, Ever Given was refloated and towed to a lake in the middle of the Suez for inspection and investigation. Fortunately, there was no loss of life. So far no major damage to the ship or the canal has been reported. Global trade through the Suez has resumed.

The Postmortem

We are neither maritime nor civil engineers, so we're going to keep the focus on how one might conduct a postmortem of this incident if it were done by an SRE team at a modern tech company. Since not all of the information about the incident has been presented yet, we'll also limit our scope to what is currently available and keep things simple for now.

Best Practices

First, we'd start with an up-front statement that our investigation would be conducted as a blameless postmortem. This means the people involved would all be brought together with the understanding that there would be no blaming, shaming, or recriminations for presenting the information they have on the incident.

Second, we'd use the "five whys" to analyze the incident and try to determine a root cause for what happened. We'll demonstrate how this technique might be used shortly.

Third, a thorough analysis of the incident would be compiled into a postmortem document. This document would be made available to all interested parties to serve as a learning experience, and provide valuable information on how to prevent similar problems in the future.

Fourth, once all the information has been compiled into our postmortem document, we'd make suggestions for remediations--along with associated task tickets--on how similar incidents could be avoided in the future. Each ticket would be assigned to a team member responsible for executing it, and due dates would be established.

Finally, regular follow-ups would be conducted to track the progress of the tickets until the work is completed.
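As a rough illustration of the last two steps, here is a minimal sketch of how remediation tickets might be tracked between follow-ups. The ActionItem fields, ticket IDs, owners, and due dates are all hypothetical placeholders, not the schema of any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    ticket_id: str   # hypothetical ticket identifier
    summary: str
    owner: str       # the one person responsible for the ticket
    due: date
    done: bool = False


def open_items(items: list[ActionItem]) -> list[ActionItem]:
    """Unfinished items, earliest due date first."""
    return sorted((i for i in items if not i.done), key=lambda i: i.due)


def overdue(items: list[ActionItem], today: date) -> list[ActionItem]:
    """Unfinished items whose due date has already passed."""
    return [i for i in open_items(items) if i.due < today]


# Placeholder tickets, loosely based on the remediations discussed in this post.
items = [
    ActionItem("PM-1", "Require tug escort for very large vessels", "alice", date(2021, 4, 30)),
    ActionItem("PM-2", "Review crew and pilot training", "bob", date(2021, 6, 30), done=True),
    ActionItem("PM-3", "Draft canal modernization proposal", "carol", date(2021, 12, 31)),
]

for item in overdue(items, today=date(2021, 5, 15)):
    print(f"OVERDUE {item.ticket_id} ({item.owner}): {item.summary}")
```

Running a check like this at every follow-up keeps the owner and due date for each ticket visible until the work is actually done.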

The Analysis

Based on what we know at present, here's a simplified version of how our postmortem might go.

1. What was the expected outcome before the incident happened?

  • The ship would transit safely through the canal.

2. What actually happened?

  • The ship became stuck in the Suez, blocking the canal and preventing other ships from moving through it.

3. Why did it become stuck?

  • It appears high winds during a sandstorm may have blown her off course, causing her to veer sideways until she became wedged between the banks of the canal.

4. Why did this happen?

  • The Suez canal is a constricted space and Ever Given is a really big ship. It was probably difficult to navigate in such a tight space with the wind blowing it off course.

5. Some additional whys…

  • It doesn't appear that any escort vessels, such as tug boats, were assisting the Ever Given in navigating the canal.
  • There may have been crew and/or canal pilot training issues needing to be addressed.
  • As already established, the Ever Given is a huge vessel transiting through a canal that first opened over 150 years ago. Design requirements for the canal were significantly different at the time. It would have been hard at the time to anticipate how cargo vessels would be constructed in the 21st century.

6. How can we prevent similar incidents in the future?

  • Short term, perhaps tug boats should accompany vessels over a certain size to help keep them from drifting off course and ensure they reach their destination safely. This is common practice in many harbors and ports.
  • Review the ship crew and pilot training to ensure they are properly equipped to deal with adverse situations when navigating the canal.
  • Long term, further efforts to modernize the canal would probably be necessary to better accommodate more modern and larger shipping vessels.
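The chain of questions above can also be captured as data so that it travels with the postmortem document. This is only a sketch: the FiveWhys class and its method names are hypothetical, and the answers are paraphrased from the list above rather than taken from any official findings.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FiveWhys:
    problem: str
    chain: list[tuple[str, str]] = field(default_factory=list)

    def ask(self, why: str, because: str) -> "FiveWhys":
        """Record one question and its answer, then return self for chaining."""
        self.chain.append((why, because))
        return self

    def candidate_root_cause(self) -> str | None:
        """The deepest answer so far; a candidate, not a verdict."""
        return self.chain[-1][1] if self.chain else None

    def render(self) -> str:
        lines = [f"Problem: {self.problem}"]
        for depth, (why, because) in enumerate(self.chain, start=1):
            lines.append(f"{depth}. {why}")
            lines.append(f"   {because}")
        return "\n".join(lines)


analysis = (
    FiveWhys("The ship became stuck and the canal was blocked.")
    .ask("Why did it become stuck?",
         "High winds may have blown it off course until it wedged between the banks.")
    .ask("Why could wind push it that far?",
         "The canal is a constricted space and the ship is very large.")
    .ask("Why was nothing in place to counter the drift?",
         "It doesn't appear that escort tugs were assisting the ship.")
)
print(analysis.render())
print("Candidate root cause:", analysis.candidate_root_cause())
```

Keeping the chain explicit makes it easy to see where the questioning stopped and whether another "why" is still worth asking.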

As an SRE, there is probably one last thing we'd want to do. One of the key tenets of Site Reliability Engineering is to assume that failure is normal.

We also know this isn't the first time that this sort of problem has happened. With that in mind, we'd want to make better preparations for dealing with similar incidents in the future.

Preparations of this type would include analyzing all past failures and reducing each one to a root cause. Finding commonalities among them would allow us to formulate better responses for when another incident occurs. Doing so should help reduce both the time to respond to and the time to resolve each incident.
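Here is a minimal sketch of what that analysis might look like in practice: group past incidents by root cause, then compare how long each group took to respond to and resolve. The Incident fields and the summarize function are assumptions for illustration, and the records are made-up placeholders, not real incident data.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Incident:
    incident_id: str
    root_cause: str
    detected: datetime
    responded: datetime
    resolved: datetime


def summarize(incidents: list[Incident]) -> dict[str, dict[str, float]]:
    """Per root cause: incident count and mean minutes to respond and resolve."""
    groups: dict[str, list[Incident]] = defaultdict(list)
    for inc in incidents:
        groups[inc.root_cause].append(inc)

    summary: dict[str, dict[str, float]] = {}
    # Most frequent root causes first, since they are the likeliest to recur.
    for cause, group in sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True):
        summary[cause] = {
            "count": len(group),
            "mean_minutes_to_respond": mean(
                (i.responded - i.detected).total_seconds() / 60 for i in group
            ),
            "mean_minutes_to_resolve": mean(
                (i.resolved - i.detected).total_seconds() / 60 for i in group
            ),
        }
    return summary


# Made-up placeholder records, purely to show the shape of the analysis.
history = [
    Incident("INC-1", "example-cause-a", datetime(2021, 1, 1, 8, 0),
             datetime(2021, 1, 1, 8, 30), datetime(2021, 1, 1, 12, 0)),
    Incident("INC-2", "example-cause-a", datetime(2021, 2, 1, 10, 0),
             datetime(2021, 2, 1, 10, 10), datetime(2021, 2, 1, 12, 0)),
    Incident("INC-3", "example-cause-b", datetime(2021, 3, 1, 9, 0),
             datetime(2021, 3, 1, 9, 20), datetime(2021, 3, 1, 10, 0)),
]

for cause, stats in summarize(history).items():
    print(cause, stats)
```

Root causes that recur often, or that take unusually long to resolve, are the natural first candidates for runbooks and rehearsed responses.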

Conclusion

It will be interesting to see what happens in the coming months once investigations on Ever Given and the Suez Canal have been conducted. We'd guess that this will cause the global shipping and transit industries to rethink infrastructure deficiencies and hopefully begin modernization efforts.

Along with these efforts, there will most likely be a great deal of thought put into how to recover more quickly from these types of incidents in the future, and into better ways to work around them when they do occur.

You can coordinate response to your own Suez Canal incidents or postmortem analysis with Rootly. Try it free today or book a demo!