Something Went Wrong Facebook
By Ega Wahyudi
Friday, September 20, 2019
What's Wrong With Facebook
The key flaw that made this outage so severe was the unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.
The automated system is meant to check the cache for invalid configuration values and replace them with up-to-date values from the persistent store. This works well when the cache has a transient problem. It fails when the value in the persistent store is itself invalid.
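To make the cache-repair idea concrete, here is a minimal sketch of such a reader. It assumes the cache and the persistent store are plain dictionaries and that validity is checked by a caller-supplied `is_valid` function. `ConfigReader`, `is_valid` and the `timeout` key are all hypothetical; this is not Facebook's code. The sketch shows both cases from the paragraph above: a transient cache problem is repaired with one query, but an invalid value in the persistent store sends every read back to the store.

```python
from typing import Callable, Dict


class ConfigReader:
    """Hypothetical cache-repair reader: refetch invalid cached values from the store."""

    def __init__(self, cache: Dict[str, str], store: Dict[str, str],
                 is_valid: Callable[[str], bool]) -> None:
        self.cache = cache
        self.store = store
        self.is_valid = is_valid
        self.store_queries = 0  # counts trips to the persistent store

    def get(self, key: str) -> str:
        value = self.cache.get(key)
        if value is not None and self.is_valid(value):
            return value  # fast path: the cached value looks fine
        # Slow path: treat the cached value as missing or corrupt and refetch it.
        self.store_queries += 1
        fresh = self.store[key]
        self.cache[key] = fresh  # cached even if invalid, so the next read repeats this
        return fresh


def is_valid(value: str) -> bool:
    return value.isdigit()  # hypothetical rule: timeouts must be numeric


# Transient cache corruption: one repair query, then cache hits.
reader = ConfigReader({"timeout": "garbage"}, {"timeout": "30"}, is_valid)
for _ in range(5):
    reader.get("timeout")
print(reader.store_queries)  # 1

# Invalid value in the persistent store: every read goes back to the store.
reader = ConfigReader({}, {"timeout": "thirty"}, is_valid)
for _ in range(5):
    reader.get("timeout")
print(reader.store_queries)  # 5
```

In the second case the repair never converges. Multiplied across every client, that turns into the query storm described next.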
Today we made a change to the persistent copy of a configuration value, and the system interpreted the new value as invalid. As a result, every single client saw the invalid value and attempted to fix it. Because the fix involves querying a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries per second.
To make matters worse, every time a client received an error while querying one of the databases, it interpreted the error as an invalid value and deleted the corresponding cache key. So even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to serve some of the requests, they were generating even more requests to themselves. We had entered a feedback loop that did not allow the databases to recover.
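The feedback loop can be reproduced with a toy tick-based simulation. This is a sketch under loudly simplified assumptions, not a model of Facebook's infrastructure:

- All clients share a single cache key.
- The database serves at most `capacity` queries per tick and fails the rest.
- Within a tick, a failure always lands after the last success.

`simulate`, `capacity` and `delete_on_error` are hypothetical names. With `delete_on_error=True`, errors are misread as invalid values and the key is deleted, so the query rate stays at its peak even after the stored value is fixed. Once errors stop deleting the key, or the load fits within capacity, the first successful read after the fix ends the storm.

```python
from typing import List


def simulate(clients: int, capacity: int, fix_at_tick: int, ticks: int,
             delete_on_error: bool = True) -> List[int]:
    """Return the number of database queries issued in each tick (toy model)."""
    cached_valid = False  # state of the single shared cache key
    history: List[int] = []
    for tick in range(ticks):
        store_valid = tick >= fix_at_tick  # the stored value is fixed at this tick
        if cached_valid:
            history.append(0)  # every client hits the cache; the database is idle
            continue
        queries = clients  # every client misses and queries the database
        history.append(queries)
        succeeded = min(queries, capacity)  # assumption: excess queries fail
        failed = queries - succeeded
        if succeeded and store_valid:
            cached_valid = True  # a successful read repopulates the key
        if failed and delete_on_error:
            # Assumption: some failure lands after the last success, and the
            # error is misread as "invalid value", so the key is deleted again.
            cached_valid = False
    return history


print(simulate(1000, 200, fix_at_tick=2, ticks=6))
# [1000, 1000, 1000, 1000, 1000, 1000]  (the loop never ends)
print(simulate(1000, 200, fix_at_tick=2, ticks=6, delete_on_error=False))
# [1000, 1000, 1000, 0, 0, 0]  (recovers on the tick after the fix)
print(simulate(100, 200, fix_at_tick=2, ticks=6))
# [100, 100, 100, 0, 0, 0]  (load within capacity also recovers)
```

The last case hints at why cutting traffic worked. Once the load fits within capacity, no errors occur, and the first successful read after the fix ends the loop.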
The only way to stop the feedback cycle was quite painful: we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we gradually allowed more people back onto the site.
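The post does not say how users were readmitted. One common way to reopen a service gradually is a percentage gate keyed on a stable hash of the user ID, which is what the sketch below assumes. `bucket`, `admitted` and the `RAMP` schedule are hypothetical. `zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is salted per process, while a user should land in the same bucket on every server. Because the bucket is fixed, anyone admitted at a lower percentage stays admitted as the gate widens.

```python
import zlib
from typing import List

# Hypothetical ramp schedule: the share of users admitted at each step.
RAMP: List[int] = [0, 10, 25, 50, 75, 100]


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in [0, 100)."""
    # crc32 gives the same result in every process, unlike the salted hash().
    return zlib.crc32(user_id.encode("utf-8")) % 100


def admitted(user_id: str, percent_open: int) -> bool:
    """Admit the user only if their bucket falls under the current ramp level."""
    return bucket(user_id) < percent_open


users = [f"user{i}" for i in range(1000)]
for pct in RAMP:
    count = sum(admitted(u, pct) for u in users)
    print(f"{pct:3d}% open -> {count} of {len(users)} users admitted")
```

In practice each step would be held until the databases show they can absorb the extra load.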
This brought the site back up today, and for now we have turned off the system that attempts to correct configuration values. We are exploring new designs for this configuration system. They follow the design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.
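The post does not name the patterns it has in mind. One widely used pattern for breaking this kind of feedback loop is a circuit breaker. After a run of consecutive failures it stops calling the struggling dependency for a cooldown period and serves a fallback, such as the last known configuration value, instead. The sketch below is a generic, single-threaded illustration of that pattern, not Facebook's design. `CircuitBreaker`, `threshold`, `cooldown` and the injectable `clock` are hypothetical names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Generic circuit breaker sketch (single-threaded, not Facebook's design)."""

    def __init__(self, threshold: int, cooldown: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold  # consecutive failures before opening
        self.cooldown = cooldown  # seconds to stay open
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T], fallback: T) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                return fallback  # open: shed load instead of hitting the database
            self.opened_at = None  # half-open: let one trial request through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()  # (re)open the circuit
            return fallback
        self.failures = 0  # success closes the circuit
        return result
```

Because an error now leads to a fallback instead of deleting the cache key and retrying, failures reduce the load on the databases rather than amplifying it.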
We apologize again for the outage, and we want you to know that we take the performance and reliability of Facebook very seriously.