Facebook Location Wrong
By Ega Wahyudi, Monday, December 30, 2019
The key flaw that made this outage so severe was the poor handling of an error condition. An automated system for verifying configuration values ended up causing far more damage than it repaired.
The automated system is meant to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it does not work when the value in the persistent store is itself invalid.
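The repair behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual system: `CountingStore`, `is_valid`, `get_config`, the `max_conn` key, and the "digits only" validity rule are all hypothetical names and assumptions introduced here.

```python
from typing import Dict, Optional


class CountingStore:
    """Hypothetical persistent store that counts how often it is read."""

    def __init__(self, data: Dict[str, str]) -> None:
        self.data = dict(data)
        self.reads = 0

    def get(self, key: str) -> Optional[str]:
        self.reads += 1
        return self.data.get(key)


def is_valid(value: Optional[str]) -> bool:
    """Assumed validity rule for this sketch: the value must be a digit string."""
    return value is not None and value.isdigit()


def get_config(key: str, cache: Dict[str, str], store: CountingStore) -> Optional[str]:
    """Return the cached value, repairing it from the store when it is invalid."""
    value = cache.get(key)
    if is_valid(value):
        return value
    # Cache entry is missing or invalid: fetch the persistent copy.
    fresh = store.get(key)
    if fresh is not None:
        cache[key] = fresh
    return fresh


# Transient cache problem: a single store read repairs the cache.
good_store = CountingStore({"max_conn": "100"})
cache = {"max_conn": "garbage"}
first = get_config("max_conn", cache, good_store)
second = get_config("max_conn", cache, good_store)

# Invalid persistent copy: every call falls through to the store.
bad_store = CountingStore({"max_conn": "oops"})
bad_cache: Dict[str, str] = {}
for _ in range(5):
    get_config("max_conn", bad_cache, bad_store)
```

In the first case the store is read once and later calls are served from the cache. In the second case the repair can never succeed, so each call turns into another store read.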
Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and tried to fix it. Because the fix involves querying a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.
To make matters worse, every time a client got an error while querying one of the databases, it interpreted the error as an invalid value and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were generating even more requests to themselves. We had entered a feedback loop that did not allow the databases to recover.
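A toy simulation can make the loop concrete. The model below is a deliberate simplification and every detail in it is an assumption: one shared cache key, a fixed number of clients that all read the same cache snapshot each tick, and a database that can serve only `db_capacity` queries per tick. The function name `simulate` and all numbers are hypothetical.

```python
from typing import Dict, List


def simulate(num_clients: int, db_capacity: int, ticks: int,
             delete_on_error: bool) -> List[int]:
    """Return the number of database queries issued on each tick."""
    store = {"cfg": "valid"}       # persistent copy has already been fixed
    cache: Dict[str, str] = {}     # the cache key was deleted earlier
    history: List[int] = []
    for _ in range(ticks):
        # All clients see the same cache snapshot during a tick.
        needs_repair = cache.get("cfg") != "valid"
        queries = num_clients if needs_repair else 0
        served = min(queries, db_capacity)
        failed = queries - served
        if served:
            cache["cfg"] = store["cfg"]   # successful repairs refill the cache
        if failed and delete_on_error:
            cache.pop("cfg", None)        # error treated as an invalid value
        history.append(queries)
    return history


looping = simulate(num_clients=1000, db_capacity=100, ticks=5, delete_on_error=True)
recovering = simulate(num_clients=1000, db_capacity=100, ticks=5, delete_on_error=False)
print(looping)
print(recovering)
```

Under these assumptions, deleting the key on error keeps the query load at its peak on every tick, while leaving the cache alone on error lets the load drop to zero after the first successful repair.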
The way to stop the feedback loop was quite painful: we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.
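The post does not say how people were let back in. One common way to ramp traffic gradually, shown here purely as an illustrative assumption, is to hash each user ID into one of 100 buckets and admit only the buckets below a percentage that operators raise over time. The function name `is_admitted` and the user IDs are hypothetical.

```python
import zlib


def is_admitted(user_id: str, admit_percent: int) -> bool:
    """Admit a user if their stable hash bucket (0-99) is below the threshold."""
    bucket = zlib.crc32(user_id.encode("utf-8")) % 100
    return bucket < admit_percent


users = [f"user-{i}" for i in range(1000)]
for percent in (0, 10, 50, 100):
    admitted = sum(is_admitted(u, percent) for u in users)
    print(f"{percent}% threshold admits {admitted} of {len(users)} users")
```

Because a user's bucket never changes, anyone admitted at a low percentage stays admitted as the threshold rises, so the load on the recovering databases grows in controlled steps.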
This got the site back up and running today, and for now we have turned off the system that attempts to correct configuration values. We are exploring new designs for this configuration system, following design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.
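The post does not name the patterns under consideration. As one hedged illustration of a client that avoids this kind of loop, the sketch below treats a database error differently from an invalid value: it keeps the existing cache entry and retries with capped exponential backoff and jitter. This is a generic technique, not Facebook's design, and `StoreUnavailable`, `backoff_delays`, and `refresh` are hypothetical names.

```python
import random
from typing import Callable, Dict, Iterable, Iterator, List, Optional


class StoreUnavailable(Exception):
    """Raised by the hypothetical fetch function when the database errors out."""


def backoff_delays(base: float = 0.1, cap: float = 30.0, attempts: int = 6,
                   rng: Callable[[], float] = random.random) -> Iterator[float]:
    """Yield capped exponential delays, each scaled by a random jitter factor."""
    for attempt in range(attempts):
        yield rng() * min(cap, base * 2 ** attempt)


def refresh(key: str, cache: Dict[str, str], fetch: Callable[[str], str],
            is_valid: Callable[[str], bool], sleep: Callable[[float], None],
            delays: Iterable[float]) -> Optional[str]:
    """Refresh a cache entry without ever deleting it because of an error."""
    for delay in delays:
        try:
            value = fetch(key)
        except StoreUnavailable:
            sleep(delay)          # an error is not an invalid value: wait, retry
            continue
        if is_valid(value):
            cache[key] = value    # only a valid value replaces the cached one
        return cache.get(key)
    return cache.get(key)         # retries exhausted: serve the stale value


# Demo: the store fails twice, then answers.
responses: List[object] = [StoreUnavailable(), StoreUnavailable(), "100"]


def flaky_fetch(key: str) -> str:
    item = responses.pop(0)
    if isinstance(item, Exception):
        raise item
    return str(item)


sleeps: List[float] = []
demo_cache = {"max_conn": "90"}
result = refresh("max_conn", demo_cache, flaky_fetch, str.isdigit,
                 sleeps.append, backoff_delays(rng=lambda: 1.0))
```

In real use `sleep` would be `time.sleep` and `rng` would stay random, so that clients do not all retry at the same instant. The demo passes a list's `append` and a fixed `rng` only to keep the run deterministic.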
We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.