Valued PEAK Customer,
I would like to take a moment to describe the events that led up to an Internet connectivity outage on Monday, March 14, 2011, at 1:35 pm PDT, and the steps we have taken to prevent this type of outage from happening in the future.
The outage was caused by a peering partner at the Northwest Access Exchange (NWAX) in Portland sending the full Internet routing table to PEAK while this peer tested a network routing configuration. Even though PEAK uses multiple upstream Internet providers over different paths, the routes received from this NWAX peer became the preferred path for traffic leaving the PEAK network.
During the outage, customers trying to call the PEAK support center experienced fast busy signals. This was caused by the high volume of calls, which exceeded the maximum number of concurrent calls allowed on our telephone circuits.
What is NWAX?
NWAX is the Northwest Access Exchange, located in the Pittock building in Portland, Oregon. Providers use this facility to peer with one another, exchanging routing information between their networks. This keeps local traffic local, which allows for quick access to local resources. An example of an NWAX peer is the State of Oregon Department of Administrative Services (DAS). By peering with DAS, PEAK’s customers become directly connected to the DAS network, which provides the quickest access to statewide government systems.
PEAK currently peers with around 20 regional providers at NWAX.
Border Gateway Protocol (BGP) is the core network routing protocol of the Internet. BGP maintains a table of the IP networks, or prefixes, reachable throughout the Internet. PEAK uses BGP to exchange Internet routing information with its NWAX peers.
What did we do wrong?
PEAK did not take a defensive approach when configuring BGP with NWAX peers: we did not filter the prefixes we accepted from them. As a result, the NWAX peer’s misconfigured router was able to send the full Internet routing table to PEAK. This exceptional situation caused all traffic leaving PEAK’s network to be routed to the NWAX peer, where it was effectively dropped. This situation is known as a routing black hole.
What are we doing to fix it?
Effective immediately, PEAK’s BGP configuration policy will filter the prefixes accepted from every peer. With these filters in place, a peer will no longer be able to send the full Internet routing table to PEAK, and the black hole routing situation will be prevented.
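For readers who want to see the idea behind prefix filtering, here is a minimal sketch in Python. It is an illustration only, not PEAK's router configuration: the peer name, the prefixes (taken from ranges reserved for documentation), and the function names are all hypothetical. The sketch accepts a route only when it falls inside the list of prefixes a peer is expected to announce, so a default route or an unrelated network is rejected.

```python
import ipaddress

# Hypothetical allow-list: the prefixes each peer is expected to announce.
# The peer name and the documentation-range prefixes are illustrative only.
ALLOWED_PREFIXES = {
    "example-peer": [
        ipaddress.ip_network("192.0.2.0/24"),
        ipaddress.ip_network("198.51.100.0/24"),
    ],
}


def accept_route(peer, prefix):
    """Return True only if the prefix falls inside the peer's allow-list."""
    announced = ipaddress.ip_network(prefix)
    for allowed in ALLOWED_PREFIXES.get(peer, []):
        # subnet_of() raises TypeError across IP versions, so compare versions first.
        if announced.version == allowed.version and announced.subnet_of(allowed):
            return True
    # Unknown peers and unlisted prefixes are rejected by default.
    return False


def filter_updates(peer, prefixes):
    """Keep only the announced prefixes that pass the peer's filter."""
    return [p for p in prefixes if accept_route(peer, p)]
```

In this sketch, a more specific network inside an allowed prefix is also accepted. Whether to allow that, or to require an exact match, is a policy choice that a real router filter would state explicitly.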
In addition, we are currently evaluating our options for increasing the maximum number of concurrent calls allowed on our telephone circuits, so that customers can reach the PEAK support center during periods of high call volume.
We sincerely apologize for the interruption of service this outage caused you. We take great pride in the reliability of our infrastructure. You can be confident that we will continue to work on improving our infrastructure and procedures to ensure highly available service delivery to our valued customers.
If you have specific questions or concerns, please do not hesitate to contact me directly.
Chief Technology Officer