As you may be well aware, yesterday, October 4, at around 11:40 A.M. ET, Facebook and its constituent platforms began exhibiting connectivity issues. What seemed to be localized blackouts for select users was soon revealed to be a complete outage that affected all Facebook, Messenger, Instagram, WhatsApp, and Oculus users worldwide.

Though network outages are common, the sudden “blackout” of Facebook and all of its services caused quite a stir. The more than 3.5 billion people who use Facebook and its apps were affected, as the abrupt stop hindered their day-to-day activities.

It’s normal to experience outages during routine network maintenance and service, but an outage of this scale, with no immediate statement from the company, is bound to cause some unease. Businesses, schools, firms, and even governments make use of Facebook and its family of platforms, so having all of them down at once presents a massive roadblock to operations.

So, what caused it then? It was all because of a single mistake — one caused by the company’s network engineers.

According to a report by 9to5Mac, Facebook released this statement in a blog post:

“Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.”

A Tricky Workaround

Understanding why the problem occurred, and why it took so long to fix, involves a few specific technical elements. Though exact details are unknown, the preliminary information currently available suggests that it was most likely a mix of issues involving Domain Name System (DNS) and Border Gateway Protocol (BGP) configuration.

According to 9to5Mac, the way these network protocols work is comparable to air traffic control. To reach a destination, a plane needs coordinates. The IP address is the destination ‘airport,’ and DNS is the directory that translates a name such as facebook.com into that address. BGP plays the role of air traffic control, giving the plane (your device’s traffic) the best ‘route’ to the desired IP address.
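To make the analogy concrete, here is a minimal Python sketch of the two steps: a DNS-style lookup that turns a name into an IP address, followed by a BGP-style choice of the most specific announced route. The domain name, addresses, and route labels are all made up for illustration (the addresses come from ranges reserved for documentation), and real DNS and BGP are far more involved than two dictionaries.

```python
import ipaddress

# Hypothetical DNS records: name -> IP address.
DNS_RECORDS = {"social.example": "203.0.113.10"}

# Hypothetical announced routes: network prefix -> how to get there.
ROUTES = {
    "203.0.113.0/24": "direct peering link",
    "203.0.0.0/16": "transit provider",
}


def resolve(name, records):
    """DNS step: look up the IP address for a name, or None if unknown."""
    return records.get(name)


def best_route(ip, routes):
    """BGP-style step: pick the most specific announced prefix containing ip."""
    addr = ipaddress.ip_address(ip)
    matching = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matching:
        return None  # no announced route: the destination is unreachable
    # The longest prefix (largest prefixlen) is the most specific match.
    return max(matching, key=lambda pair: pair[0].prefixlen)[1]


ip = resolve("social.example", DNS_RECORDS)
print(ip, "->", best_route(ip, ROUTES))
```

If the routes disappear from the table, `best_route` returns `None` even though the name still resolves, which is roughly the situation a plane would be in with coordinates but no air traffic control.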

The issue lay less with DNS and more with BGP. Facebook lost control of its BGP configuration, which cut off access to the network tools and servers that usually allow for remote servicing. This meant that employees had to gain physical access to data centers and receive step-by-step instructions from senior engineers. To make matters worse, the outage reportedly also locked them out at the access doors, keeping them from reaching the equipment they needed to fix. They got in somehow, though the details of how matter less. It must have been tough, and it’s good that they were able to get things back online.
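The knock-on effect described above can be sketched as a small dependency map: when one component fails, everything that relies on it fails too. The component names and the links between them below are assumptions made for illustration only, not a description of Facebook’s actual architecture.

```python
# Hypothetical dependency map: each component lists what it needs to work.
DEPENDS_ON = {
    "backbone_routes": [],
    "dns_servers": ["backbone_routes"],
    "remote_admin_tools": ["backbone_routes", "dns_servers"],
    "door_badge_system": ["backbone_routes"],
    "user_apps": ["dns_servers"],
}


def unavailable(failed, depends_on):
    """Return every component that stops working once `failed` goes down."""
    down = {failed}
    changed = True
    while changed:  # keep sweeping until no new failures appear
        changed = False
        for component, needs in depends_on.items():
            if component not in down and any(n in down for n in needs):
                down.add(component)
                changed = True
    return down


print(sorted(unavailable("backbone_routes", DEPENDS_ON)))
```

In this toy model, losing the backbone routes takes down every other component, including the remote tools that would normally be used to repair the routes.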

Beneath The Blackout

While server outages don’t typically have a severe impact on users, a close look at an incident of this magnitude brings a couple of things to light. If you’re unable to access your Facebook or Instagram newsfeeds for a few hours, chances are the most you’ve lost out on is several pet videos and some motivational fitness content. However, simultaneously losing access to newsfeeds and major messaging platforms (Messenger and WhatsApp), which have become critical communication tools, causes major disruptions to industries that have built entire methodologies around them.

And we’re only talking about the Facebook family here. Consider a comparable scenario involving an even larger service, such as Google Search, and the ramifications of a ‘complete’ outage. Google is used by billions. Individuals, companies, and entire markets would be affected, because Google has permeated the economy to the point where industries have built important processes and infrastructure around it.

What we must never forget is that a reality exists outside of what is depicted online. What we can do on the internet we can also do in real life; it’s just more complicated and more inconvenient. We say this because everyone should be aware that the entire world is linked and dependent on a relatively small number of critical servers. If they were to go offline for some reason, whether by mistake or mishap, how far would the damage go? Along with improving stability and upgrading capacity, more effort should go into backups and fail-safes to limit the disruption and loss an outage causes.


Sources

https://9to5mac.com/2021/10/05/facebook-outage-cause-mistake/

https://www.nytimes.com/2021/10/04/technology/facebook-down.html