Internet Service Updates, 3-14-2011

Valued PEAK Customer,

I would like to take a moment to describe the events that led up to an Internet connectivity outage on Monday, March 14, 2011 at 1:35pm PDT, and the steps we have taken to prevent this type of outage from happening in the future.

The outage was caused by a peering partner at the Northwest Access Exchange (NWAX) in Portland sending the full Internet routing table to PEAK while testing a network routing configuration. Even though PEAK uses multiple upstream Internet providers over different paths, the NWAX peer sending full routes became the preferred path for traffic leaving the PEAK network.

During the outage, customers trying to call the PEAK support center experienced fast busy signals. This was caused by the high volume of calls, which exceeded the maximum number of concurrent calls allowed on our telephone circuits.

What is NWAX?

NWAX is the Northwest Access Exchange, located at the Pittock building in Portland, Oregon. Providers use this facility to peer with one another and exchange routing information for each other’s networks. This keeps local traffic local, which allows for quick access to local resources. An example of an NWAX peer is the State of Oregon Department of Administrative Services (DAS). By peering with DAS, PEAK’s customers become directly connected to the DAS network, which provides the quickest access to statewide government systems.

PEAK currently peers with around 20 regional providers at NWAX.

Border Gateway Protocol (BGP) is the core network routing protocol of the Internet. BGP maintains a table of IP networks or prefixes reachable throughout the Internet. BGP is used to exchange Internet routing information with NWAX peers.

What we did wrong.

PEAK did not take a defensive approach when configuring BGP with NWAX peers: we did not filter the prefixes we accepted from those peers. As a result, the NWAX peer’s misconfigured router was able to send the full Internet routing table to PEAK. This exceptional situation caused all traffic leaving PEAK’s network to transit through the NWAX peer, which effectively dropped all of it. This situation is known as a routing black hole.

What are we doing to fix it?

Effective immediately, PEAK’s BGP configuration policy will filter the prefixes accepted from all peers. With these filters in place, a peer will not be able to send the full Internet routing table to PEAK, and the black hole routing situation will be prevented.
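The idea behind inbound prefix filtering can be sketched in a few lines of Python. This is an illustration only, not PEAK's router configuration: the peer names, the allow-lists and the helper functions `accept_route` and `filter_updates` are hypothetical, and the prefixes are documentation ranges.

```python
import ipaddress

# Hypothetical allow-lists: each peer may only announce prefixes inside
# the networks agreed with that peer. Values are documentation ranges.
ALLOWED_PREFIXES = {
    "peer-a": [ipaddress.ip_network("192.0.2.0/24")],
    "peer-b": [
        ipaddress.ip_network("198.51.100.0/24"),
        ipaddress.ip_network("203.0.113.0/24"),
    ],
}


def accept_route(peer: str, prefix: str) -> bool:
    """Return True only if the announced prefix sits inside the peer's allow-list."""
    announced = ipaddress.ip_network(prefix)
    return any(
        announced.version == allowed.version and announced.subnet_of(allowed)
        for allowed in ALLOWED_PREFIXES.get(peer, [])
    )


def filter_updates(peer: str, announcements: list[str]) -> list[str]:
    """Keep the announcements that pass the filter and drop everything else."""
    return [prefix for prefix in announcements if accept_route(peer, prefix)]


if __name__ == "__main__":
    # A peer leaking a default route and unrelated networks: only its own
    # agreed prefix survives the filter.
    leaked = ["192.0.2.0/24", "0.0.0.0/0", "10.0.0.0/8"]
    print(filter_updates("peer-a", leaked))
```

With a filter of this kind, a peer that mistakenly announces the full routing table has everything outside its agreed prefixes discarded, so its routes cannot become the preferred path for unrelated destinations.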

In addition, we are currently evaluating our options for increasing the maximum number of concurrent calls allowed on our telephone circuits to prevent unavailability when calling the PEAK support center in the future.
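One common way to reason about this kind of capacity question is the Erlang B formula, which estimates the chance that a caller gets a busy signal for a given call load and number of circuits. The sketch below is illustrative only: it assumes the standard Erlang B model (random call arrivals, blocked calls are not queued), and the helper names are hypothetical. It is not a description of how PEAK sizes its telephone circuits.

```python
def erlang_b(offered_erlangs: float, circuits: int) -> float:
    """Blocking probability from the standard iterative Erlang B recurrence."""
    blocking = 1.0  # with zero circuits every call is blocked
    for n in range(1, circuits + 1):
        blocking = (offered_erlangs * blocking) / (n + offered_erlangs * blocking)
    return blocking


def circuits_needed(offered_erlangs: float, target_blocking: float) -> int:
    """Smallest circuit count whose blocking probability is at or below the target."""
    circuits = 0
    while erlang_b(offered_erlangs, circuits) > target_blocking:
        circuits += 1
    return circuits


if __name__ == "__main__":
    # Hypothetical load: 30 erlangs offered during an outage, 1% blocking target.
    load = 30.0
    print(f"circuits for 1% blocking: {circuits_needed(load, 0.01)}")
```

The offered load in erlangs is the call arrival rate multiplied by the average call length, so an outage that multiplies the arrival rate raises the required circuit count accordingly.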

We sincerely apologize for the interruption of service this outage caused you. We take great pride in the reliability of our infrastructure. You can be confident that we will continue to work on improving our infrastructure and procedures to ensure highly available service delivery to our valued customers.

If you have specific questions or concerns, please do not hesitate to contact me directly.

Sincerely,

David Placko
Chief Technology Officer
PEAK Internet
david.placko@peakinternet.com

IPv6 set to take stage as Internet’s new addressing protocol

When the Internet and its underlying protocols were devised over 30 years ago, no one imagined it would become as integral and as widely used as it is today. The success of this worldwide network is due in part to an explosion of information and its accessibility from an increasing number of Internet-enabled devices, including not only PCs and cell phones, but also automobiles and even home appliances.


All this integration of technology and Internet presents a brave new frontier, but it has one crucial caveat — each device needs its own IP address to connect and identify itself to the network. As a result, there will soon not be enough addresses to go around as the Internet’s current network protocol, “Internet Protocol version 4 (or IPv4)”, is on pace to exhaust its available addresses early in 2011.  When that happens, all of the addresses will have been allocated to the ISPs of the world, and they will no longer be able to get additional addresses to hand out to new users. Eventually (over the next few years) the ISPs will themselves run out, and no longer be able to fulfill their own customer requests for new addresses.
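To put the size of the two address spaces in perspective, the short Python sketch below counts the addresses in the whole IPv4 and IPv6 ranges using the standard `ipaddress` module. It shows only the theoretical totals; the number of addresses that can actually be assigned is smaller because some ranges are reserved.

```python
import ipaddress

# Total addresses in each protocol's entire range (theoretical maximum).
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses  # 2**32
ipv6_total = ipaddress.ip_network("::/0").num_addresses       # 2**128

if __name__ == "__main__":
    print(f"IPv4 addresses: {ipv4_total:,}")
    print(f"IPv6 addresses: {ipv6_total:,}")
    print(f"IPv6 space is 2**{ipv6_total.bit_length() - ipv4_total.bit_length()} times larger")
```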

PEAK builds Linn County Fair & Expo Center Wi-Fi

Users of the Linn County Fair and Expo center will now be able to connect reliably to a new wireless network, recently completed by PEAK Internet. The upgraded network replaces an aging system and is designed to support the growing demand for internet access from various devices at events and expos within the facility.

The Linn County Fair and Expo center is a hub for mid-Willamette Valley activity. Events like the Linn County Fair, the Willamette Valley Ag Expo and the Linn County Home Show are held at the Fair and Expo center and draw thousands of attendees throughout the year. At these events, Internet access is an increasingly important amenity not only for attendees, but also for exhibitors, who now expect access to online resources for transactions and information.

PEAK Internet was awarded the contract to build an improved wireless network capable of meeting these requirements – quickly and affordably. In mid-November, the first phase of the new network debuted at the 2010 Willamette Valley Ag Expo, providing wireless internet access for all attendees. At the event, hundreds of users and around 60 vendors accessed the network with improved speeds and without incident.

The first phase of the project covers nearly 100% of the usable indoor facilities. The second phase is specified to cover a set of designated “hot spot” areas outdoors and in the parking lot. Completion of this second phase is expected in January 2011.

PEAK is a proud supporter of Linn County Fair and Expo center and this network was made possible, in part, by PEAK’s sponsorship of the network build. More than 25 access points and several network switches were part of the equipment deployment. As part of the agreement, PEAK will be monitoring the network 24/7 and maintaining the equipment as needs arise.

View coverage of the new Wi-Fi network debut at the Willamette Valley Ag Expo (Democrat Herald): Article – 11/25/2010

Data Center UPS Upgrade

Dear Valued PEAK Customer,

I would like to take a moment to describe the sequence of events that led up to the complete data center outage, which occurred on Saturday, October 9th at 3:15am. This outage affected all PEAK customer services.

During planned maintenance to the Uninterruptible Power Supply (UPS) we experienced an unplanned electrical interruption to the PEAK data center (DC). For more information regarding the project, see the PEAK blog at:
http://blog.peakinternet.com/monthly-newsletter/the-anatomy-of-a-data-center-ups-upgrade/

Part of our plan was to ensure continuous operation during the upgrade. To accomplish this, we first started our back-up generator to supply continuous power to the DC in the event of an interruption of city power. Second, we installed a temporary back-feed electrical circuit between the electrical panel fed by the generator and the electrical panel normally fed by the UPS, which feeds all equipment in the DC.

The electrical system of the DC is a three-phase system, which runs at 208 volts. During the weeks leading up to the upgrade, engineers moved workloads to bring the power load at the time of the upgrade below 80 amps per phase, within safe tolerance of the circuit breakers. At the time of the upgrade, loads per phase were A: 78 amps, B: 80 amps, C: 79 amps.
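The load figures above can be checked with a little arithmetic. The Python sketch below compares each phase with the 80 amp target and estimates total apparent power, assuming the 208 volts is the line-to-line voltage of a wye system, so each phase sees 208/√3 volts. The variable and function names are hypothetical, and the result is apparent power in kVA; real power depends on a power factor the notice does not state.

```python
import math

LINE_VOLTAGE = 208.0   # volts, assumed to be line-to-line
TARGET_AMPS = 80.0     # per-phase target stated in the notice
LOADS_AMPS = {"A": 78.0, "B": 80.0, "C": 79.0}  # measured loads from the notice


def headroom(loads_amps: dict[str, float], target: float = TARGET_AMPS) -> dict[str, float]:
    """Amps remaining on each phase before the target is reached."""
    return {phase: target - amps for phase, amps in loads_amps.items()}


def apparent_power_kva(loads_amps: dict[str, float], line_voltage: float = LINE_VOLTAGE) -> float:
    """Sum of per-phase apparent power, using the line-to-neutral voltage."""
    phase_voltage = line_voltage / math.sqrt(3)
    return sum(phase_voltage * amps for amps in loads_amps.values()) / 1000.0


if __name__ == "__main__":
    for phase, spare in headroom(LOADS_AMPS).items():
        print(f"phase {phase}: {spare:.0f} A below target")
    print(f"total apparent power: {apparent_power_kva(LOADS_AMPS):.2f} kVA")
```

Run against the stated loads, the check shows how little margin there was: phase B sits exactly at the 80 amp target rather than below it.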

OLD UPS

Electricians installed a circuit capable of supplying 100 amps of load between the generator panel and UPS panel. We used 100 amp three-phase breakers on both sides of the temporary back-feed circuit.

At 2:30 AM, the stand-by generator was started and electrical load was transferred from CITY to GENERATOR power.

At 3:00 AM, we initiated procedures to shut down the old Liebert UPS and put it in maintenance-bypass mode. This allowed electrical power to flow from the generator panel through the UPS to the UPS sub-panel. At this time, we closed (turned on) the temporary back-feed breakers, which put the UPS in parallel with the temporary back-feed circuit. Next, we completely shut down the UPS. This transferred the load in the DC to the temporary back-feed circuit. We amp probed the back-feed circuit and confirmed loads at A: 78 amps, B: 80 amps, C: 79 amps. The electricians felt comfortable that this circuit would hold the load properly.

PEAK engineers, equipment movers and electricians began the process of safely disconnecting and removing the existing UPS from the DC, while the generator supplied power.

OLD UPS

Around 3:15 AM, for an unknown reason, the 100-amp breaker on the generator side of the back-feed circuit tripped, causing a complete loss of electrical power to the DC. At this time, the decision was made to shut down the electrical sub-panel supplying power to the DC co-location area, which would remove around 50 amps from our workload.

Around 3:20 AM, we re-closed the tripped breaker and power was restored to the DC.

There were cascading failures caused by the sudden unplanned loss of electrical power to the DC.  Our infrastructure relies on hundreds of devices which all work together to provide Internet and information technology services to our customers.  The procedures to re-start these systems are tedious, time-consuming, and must be done in a specific order. Engineers immediately and swiftly initiated this re-start procedure.

The most significant failure during the re-start was our network switching/routing core, which runs on Juniper EX4200 switches. Two of the four switches did not re-start properly and required a re-load of their operating system.

In addition, a Network Appliance Filer for data storage did not re-start. The controller for this device completely failed and a replacement has been procured, which will arrive on Tuesday. Most of the data stored on the filer was moved to alternate servers, but a few services still rely on this filer. The most significant of these is customer personal web space.

At 5:00 AM, the existing UPS was completely removed from the DC and the process of bringing in the new UPS began.

APC TECHNICIAN

ELECTRICIAN

Around 6:00 AM, the factory service technician from APC arrived on-site. The new UPS is a modular three-cabinet system consisting of an inverter cabinet, a battery cabinet and a distribution cabinet. The electricians and the APC technician started the process of cabling the cabinets and connecting incoming and outgoing power to the distribution cabinet.

At 9:00 AM, after APC completed start-up checklist procedures, power was supplied to the UPS to confirm proper operation and cabling.

At 9:15 AM, the new UPS was put in maintenance bypass mode, which put the new UPS in parallel with the temporary back-feed circuit. At this time, the breakers to the temporary back-feed circuit were opened (turned off) and the new UPS carried the full DC load. After a few minutes, the UPS was taken out of maintenance bypass mode and it began normal operation, protecting electrical loads in the DC.

At 9:20 AM, the stand-by generator was shut down and incoming electrical load was transferred from the GENERATOR to CITY power.

At 9:35 AM, power to the co-location area was restored.

NEW UPS

This marked the completion of the UPS upgrade project. Although our customers experienced a significant, unplanned service interruption, the new UPS will provide increased reliability and capacity.

The new UPS is an APC PX80 Symmetra Modular UPS. Some of the features and benefits of this system are:

  • Modular design: Provides fast serviceability and reduced maintenance requirements via self-diagnosing, field-replaceable modules
  • Configurable for N+1 internal redundancy: Provides high availability through redundancy by allowing configuration with one more Power Module than is necessary to support the connected load.
  • Redundant Intelligence Modules: Provides higher availability to the UPS connected loads by giving redundant communication paths to critical UPS functions.
  • Hot-swappable intelligence modules: Ensures clean, uninterrupted power to protected equipment during Intelligence Module replacement.
  • Hot-swappable power modules: Ensures clean, uninterrupted power to protected equipment during Power Module replacement.
  • Hot-swappable batteries: Ensures clean, uninterrupted power to protected equipment while batteries are being replaced
  • Power Modules connected in parallel: Enhances availability by allowing immediate, seamless recovery from isolated module failures.
  • Battery modules connected in parallel: Delivers higher availability through redundant batteries.
  • Automatic internal bypass: Supplies utility power to the connected loads in the event of a UPS overload condition or fault.
  • Automatic restart of loads after UPS shutdown: Automatically starts up the connected equipment upon the return of utility power.
  • Power conditioning: Protects connected loads from surges, spikes, lightning, and other power disturbances.

In our configuration, the UPS is operating at 45% capacity, provides 15 minutes of operation (while the stand-by generator starts) and provides N+3 power module redundancy.

We sincerely apologize for the interruption of service this outage caused you. We take great pride in the reliability of our infrastructure. You can be confident that we will continue to work on improving our infrastructure and procedures to ensure highly available service delivery to our valued customers.

If you have specific questions or concerns, please do not hesitate to contact me directly.

Sincerely,

David Placko

Chief Technology Officer
PEAK Internet
david.placko@peakinternet.com

PEAK provides Wi-Fi access for da Vinci Days festival

It’s a true indicator of our digital dependence when an outdoor festival has a requirement for Wi-Fi access. But then again, when you consider the theme for da Vinci Days, “Oregon’s premier Art, Science, and Technology festival,” it starts to make sense that an innovative approach to providing Wi-Fi access in a park is a great idea, and also a technological feat worthy of display.

The rooftop of Callahan Hall was used as a transmission point to reach the festival grounds and the main PEAK office

Enter PEAK Internet’s team of industrious and resourceful wireless engineers to answer the call and build a custom network that spans several city blocks to connect the festival grounds with PEAK Internet’s Corvallis office. Custom is the keyword; this connection is chained together using different radio frequencies and several different pieces of equipment. The end result is a reliable service that hundreds of users can access throughout the festival for productivity or leisure. Here is a summary of the PEAK project to provide Wi-Fi service for the da Vinci Days Festival:

  • Equipment was staged at four locations to get bandwidth from the PEAK office on Western Blvd. to the festival location adjacent to Oregon State University.
  • The bandwidth was backhauled from PEAK over a 3.65 GHz link at 10 Mbps to a broadcast station on the top of Callahan Hall on the Oregon State University campus. From that station, the signal was broadcast over 5 GHz to two field units placed on the festival grounds. Festival users accessed the service from two omni-directional antennas discreetly attached to tents.
  • Over the course of the festival, approximately 320 users accessed the service, drawing over 50 GB of data.
  • There were no reported outages, and average speeds for end-users were between 4 and 8 Mbps.
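The figures in this summary can be put in context with some quick arithmetic. The Python sketch below works out the average data use per user and how long the 10 Mbps backhaul would need to run at full rate to carry the reported total. It treats the reported numbers as exact and uses decimal units (1 GB = 1,000 MB); both are simplifying assumptions, since the summary gives approximate figures.

```python
USERS = 320            # approximate user count from the summary
TOTAL_GB = 50.0        # reported as "over 50 GB"
BACKHAUL_MBPS = 10.0   # 3.65 GHz link rate from the summary

# Average data per user, in megabytes.
per_user_mb = TOTAL_GB * 1000.0 / USERS

# Hours the backhaul would need at full rate to move the total volume.
total_megabits = TOTAL_GB * 1000.0 * 8.0
full_rate_hours = total_megabits / BACKHAUL_MBPS / 3600.0

if __name__ == "__main__":
    print(f"average per user: {per_user_mb:.2f} MB")
    print(f"backhaul time at full rate: {full_rate_hours:.1f} hours")
```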

This is the second year that PEAK has provided pervasive Wi-Fi to the festival, where attendees and exhibitors use the access for various tasks that enrich the festival. For example, this year many festival attendees participated in the photo safari and used the Wi-Fi to upload pictures to the online service Flickr. This activity was in addition to the annually occurring usage, such as GPS mapping and the “Green Town” exhibition.

Approximately 320 users accessed the network over the weekend, consuming over 50 GB of data

This service was provided in part as a sponsorship of the festival, but more so to support a local event that needs reliable internet access. PEAK is a proud supporter of the da Vinci Days festival and a technology leader in the community. This effort was completed in that community-driven spirit and as a way to display the expertise of PEAK’s engineering staff.
