The Toaster Didn't Trip the Breaker: or, Why the Container Died
Ah, morning tea. My favourite. Plug the kettle in and get it going. Oh! How about some toast to go with that? Plug the toaster in and pop in a nice slice of sourdough bread. My favourite.
💥
Power goes out.
What happened? "The toaster flipped the breaker!" you might say. But I'm pedantic. No, the toaster did not flip the breaker. "Was it the kettle, then?" Again, no, not the kettle.
"Oh, come on, there was nothing else plugged in?"
Correct! So that means the combined load of the toaster and the kettle blew the breaker. Neither appliance did it by itself, because one appliance alone doesn't draw enough power to cause the breaker to trip.
Even though we say "the toaster blew the breaker," the toaster doesn't have anything to do with the breaker. The toaster just happened to be the last thing to draw power from the circuit.
Okay, okay, what does all this toaster pedantry have to do with Docker?
We make the same linguistic slip-up when we talk about processes dying on computers. And if we diagnose the failure incorrectly, we'll look in all the wrong places for solutions.
Imagine I said "the toaster blew the breaker" and then we went down a rabbit hole trying to get the toaster to draw less power! We'd end up with a poorly working toaster and a circuit that could still blow at any moment, when the real solution is to use the toaster and the kettle on different circuits.
So, next time you're looking at a process like a Docker container that dies, ask yourself which symptoms are part of the cause and which are merely correlations.
A correlation may be that you had requests running for a long time. But a long-running request doesn't kill a container. The cause may be that the long-running requests made the container's healthcheck time out one too many times.
Diagnosing long-running requests is important, for sure. But suggesting that the request killed the container is not accurate. The container died because a monitor, like the breaker in an electrical circuit, was tripped.
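To make the "monitor as breaker" idea concrete, here is a minimal sketch in Python. It is a toy model, not Docker's actual implementation: the `HealthMonitor` class, the 5-second timeout, the retry count of 3, and the response times are all made up for illustration. The point is that no single slow probe flips the status; the monitor does, once enough failures pile up in a row.

```python
from dataclasses import dataclass


@dataclass
class HealthMonitor:
    """Toy health check: the 'breaker' that watches the container."""

    timeout_s: float = 5.0   # assumed probe timeout (illustrative value)
    retries: int = 3         # assumed consecutive failures before 'unhealthy'
    failing_streak: int = 0

    def probe(self, response_time_s: float) -> str:
        # A probe fails when the response takes longer than the timeout.
        if response_time_s > self.timeout_s:
            self.failing_streak += 1
        else:
            self.failing_streak = 0  # any success resets the streak
        return "unhealthy" if self.failing_streak >= self.retries else "healthy"


monitor = HealthMonitor(timeout_s=5.0, retries=3)

# Hypothetical probe response times in seconds. The slow ones are
# probes stuck waiting behind long-running requests.
response_times = [0.2, 6.0, 0.3, 7.5, 8.0, 9.1]
statuses = [monitor.probe(t) for t in response_times]
print(statuses)
```

In this run the first slow probe changes nothing, and neither does the second or third. Only the last of three consecutive slow probes tips the monitor over, and that probe is the toaster: merely the last thing to draw power.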
Similarly, containers can churn because they run out of memory. Is the solution to give the container more memory so it doesn't run out? No! Find the memory problem and solve that. There is a good reason we have safeguards that monitor Docker container health and restart containers when needed.
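When a container does die, it helps to ask which safeguard tripped before blaming the last request. The sketch below is one hedged way to do that in Python: it reads the `State` object from the JSON that `docker inspect` prints and looks at the `OOMKilled`, `ExitCode`, and `Health` fields. The `explain_exit` function and the sample JSON are hypothetical, written for illustration, and the `Health` entry is only present when the container has a healthcheck configured.

```python
import json


def explain_exit(state: dict) -> str:
    """Guess which safeguard tripped from a container's State object."""
    health = state.get("Health") or {}  # absent if no healthcheck is configured
    if state.get("OOMKilled"):
        return "memory limit tripped: the container was OOM-killed"
    if health.get("Status") == "unhealthy":
        streak = health.get("FailingStreak", 0)
        return f"healthcheck tripped: {streak} consecutive failed probes"
    exit_code = state.get("ExitCode", 0)
    if exit_code != 0:
        return f"exited with code {exit_code}: check the application logs"
    return "no safeguard tripped: clean exit or still running"


# Hypothetical sample shaped like `docker inspect` output, which is a
# JSON array with one object per container.
sample = json.loads('[{"State": {"OOMKilled": true, "ExitCode": 137}}]')
print(explain_exit(sample[0]["State"]))
```

Knowing that the memory limit tripped is where the diagnosis starts, not where it ends. The next question is still what was drawing all that memory.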