What is Wrong with Facebook
By Ega Wahyudi, Friday, April 10, 2020
The key flaw that made this outage so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.
The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it does not work when the value in the persistent store is itself invalid.
Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.
To make matters worse, every time a client got an error while querying one of the databases, it interpreted the error as an invalid value and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that didn't allow the databases to recover.
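The failure mode described above can be illustrated with a minimal Python sketch. It is not Facebook's code: `StoreError`, `is_valid`, `flawed_get`, and `count_queries` are hypothetical names, and the validity rule (any non-`None` value is valid) is an assumption made only for illustration. The sketch counts how many store queries a single client issues for one key.

```python
class StoreError(Exception):
    """Raised when the (simulated) persistent store cannot answer a query."""


def is_valid(value):
    # Assumed validity rule for this sketch: any non-None value is valid.
    return value is not None


def flawed_get(key, cache, query_store):
    """Read a value the way the post describes: errors count as invalid values."""
    value = cache.get(key)
    if is_valid(value):
        return value  # cache hit, no store query needed
    try:
        value = query_store(key)
    except StoreError:
        # The flaw: a failed query is handled like an invalid value, so the
        # cache key is deleted and the very next read queries the store again.
        cache.pop(key, None)
        return None
    if is_valid(value):
        cache[key] = value
    else:
        cache.pop(key, None)
    return value


def count_queries(reads, store_value="good-value", store_is_down=False):
    """Count the store queries made by `reads` successive reads of one key."""
    cache = {}
    queries = 0

    def query_store(key):
        nonlocal queries
        queries += 1
        if store_is_down:
            raise StoreError("database overloaded")
        return store_value

    for _ in range(reads):
        flawed_get("setting", cache, query_store)
    return queries
```

With a healthy store, 100 reads cost one query because the first result is cached. With an invalid persistent value (`store_value=None`) or a failing store (`store_is_down=True`), every read becomes a query, which is the amplification that kept the databases from recovering.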
The way to stop the feedback cycle was quite painful: we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.
This got the site back up and running today, and for now we have turned off the system that attempts to correct configuration values. We're exploring new designs for this configuration system, following design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.
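The post does not say which designs are being considered. As one commonly used pattern, and purely as an assumption, the sketch below shows a client that keeps its last known value when the store fails and waits an exponentially growing, jittered delay before querying again. `BackoffConfigClient`, `mark_suspect`, and all parameters are hypothetical names chosen for this example.

```python
import random
import time


class StoreError(Exception):
    """Raised when the (simulated) persistent store cannot answer a query."""


class BackoffConfigClient:
    """Hypothetical client: serves stale values and backs off after failures."""

    def __init__(self, query_store, base_delay=1.0, max_delay=60.0,
                 clock=time.monotonic, rng=None):
        self._query_store = query_store
        self._base_delay = base_delay
        self._max_delay = max_delay
        self._clock = clock
        self._rng = rng or random.Random()
        self._value = None         # last known value, possibly stale
        self._fresh = False        # True once the value has been refreshed
        self._failures = 0         # consecutive failed store queries
        self._next_attempt = 0.0   # earliest time another query is allowed

    def mark_suspect(self):
        """Request a refresh without discarding the last known value."""
        self._fresh = False

    def get(self):
        now = self._clock()
        if self._fresh or now < self._next_attempt:
            return self._value  # serve what we have instead of querying
        try:
            self._value = self._query_store()
            self._fresh = True
            self._failures = 0
        except StoreError:
            # An error is not an invalid value: keep the old value and
            # schedule the next attempt further into the future.
            self._failures += 1
            ceiling = min(self._max_delay,
                          self._base_delay * 2 ** (self._failures - 1))
            # Half fixed delay, half random, so clients do not retry in sync.
            delay = ceiling / 2 + self._rng.uniform(0, ceiling / 2)
            self._next_attempt = now + delay
        return self._value
```

Under these assumptions, a struggling store sees a shrinking number of retries per client rather than one query per read, which gives it room to recover.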
We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.