This started off as a note to go with sharing GoDaddy’s statement on their outage yesterday on Facebook. It grew. It’s a bit geeky, so I shan’t be terribly offended if both of my readers skip over it.
The only political angle to this is that the anarchist hackers at Anonymous claimed credit (or rather, accepted blame) for it. But it appears someone was just capitalizing on events that they had nothing to do with.
GoDaddy says that they were not hacked, nor was there a DDoS (Distributed Denial of Service) attack. There was “a series of internal network events that corrupted router data tables.”
Reading between the lines, and having witnessed many stupid things while working at various networking companies (well, maybe I was a participant rather than a witness once or twice), I can easily believe this.
I suspect we won’t hear the details because it’s embarrassing to say something like “The new guy put a default route in the routing table instead of an access list, so it sucked all of the traffic from the neighboring routers into itself, then sent that traffic back to the neighbors, which sent it right back. But before they got overloaded they advertised that default route to THEIR neighbors, so that all of their neighbors sent their traffic towards the first router, which valiantly tried to send the increasing load back out. Meanwhile the second ring of routers told all of THEIR neighbors about this neat default route to anywhere on the Internet, which THEY promptly passed on to all of the routers they had for neighbors as they began happily sending their data towards the one poor overloaded router at the center of things where the problem began.” If you’re lucky the spread stops at the edge of your routing domain. If you REALLY screwed the pooch it gets into the global routing tables. This will cause other network operators to ask embarrassing questions.
There’s a reason we call a place that traffic can’t escape from a “Black Hole.” It works very much like the collapsed star. And like the stellar black hole, the thing at the center gets crushed, often so badly that you can’t log into it to issue commands to correct the problem. I recall one event where the only way to correct it was to physically pull the power connectors on a router.
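For the geekier of my two readers, here is a toy sketch of how that kind of spread might look. It is not a real routing protocol (no metrics, no filters, no timers), and the router names, the topology, and the `spread_default_route` function are all made up for illustration. It only shows the ring-by-ring effect: each router that hears about the bad default route points its traffic at whoever told it, then tells its own neighbors.

```python
from collections import deque


def spread_default_route(neighbors, origin):
    """Toy model: flood a bogus default route outward from `origin`.

    Returns {router: (rings_from_origin, next_hop_toward_origin)}.
    `neighbors` is a hypothetical adjacency map, not real router data.
    """
    learned = {origin: (0, None)}  # the origin points at nobody
    queue = deque([origin])
    while queue:
        router = queue.popleft()
        rings, _ = learned[router]
        for peer in neighbors[router]:
            if peer not in learned:
                # The peer now sends its traffic toward `router`,
                # and will pass the route on to its own neighbors.
                learned[peer] = (rings + 1, router)
                queue.append(peer)
    return learned


# Hypothetical five-router network; "core1" is where the mistake was made.
topology = {
    "core1": ["edge1", "edge2"],
    "edge1": ["core1", "dist1"],
    "edge2": ["core1", "dist2"],
    "dist1": ["edge1", "dist2"],
    "dist2": ["edge2", "dist1"],
}

result = spread_default_route(topology, "core1")
for name, (rings, next_hop) in sorted(result.items()):
    print(name, "ring", rings, "sends traffic via", next_hop)
```

In this sketch every router in the map ends up aiming its traffic at `core1`. The edge of the map stands in for the edge of the routing domain, which is where you hope the spread stops.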
Of course I don’t know that it was a black hole, though it fits. Nor does a black hole require someone making a mistake at that moment. Another way is for someone else to make a configuration change to a router, but not save it to permanent memory. The next time that router is rebooted, the changes all vanish. Depending on the nature of the changes to the configuration, the results can be quite bad.
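The unsaved-change trap can be sketched in a few lines. This is a minimal model, assuming a router that keeps a running configuration in memory and a separate saved copy that it loads at boot. The `ToyRouter` class and the setting names are hypothetical, not any vendor’s actual interface.

```python
class ToyRouter:
    """Hypothetical router with a running config and a saved (startup) config."""

    def __init__(self, startup_config):
        self.startup_config = dict(startup_config)  # survives a reboot
        self.running_config = dict(startup_config)  # what the router is doing now

    def configure(self, key, value):
        # Changes take effect immediately, but only in the running config.
        self.running_config[key] = value

    def save(self):
        # Copy the running config to permanent memory.
        self.startup_config = dict(self.running_config)

    def reboot(self):
        # Anything that was never saved is gone after this.
        self.running_config = dict(self.startup_config)


router = ToyRouter({"default_route": "upstream-a"})
router.configure("filter_bogus_routes", True)  # someone adds a safeguard...
# ...and forgets to call router.save()
router.reboot()

lost = "filter_bogus_routes" not in router.running_config
print("safeguard lost after reboot:", lost)
```

The router worked fine for as long as it stayed up, so nobody noticed that the safeguard existed only in memory.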
If you cause a black hole at 5 am, fess up at once, and everything is fixed in 20 minutes, you may survive at that employer, though you wear a virtual Cone of Shame till someone else steals the spotlight. (Don’t ask how I know.) Causing a big, public, hours-long outage during the business day may be somewhat harder to survive professionally.