radevita at mejeticks.com
Wed Dec 30 01:17:21 UTC 2020
AT&T’s Disaster Recovery Team is probably the best in the business. The resources they can bring to the table are unmatched. This would have been 100x worse if it had hit a carrier-neutral datacenter, which doesn’t have nearly the same resources to restore something like this. They usually do a road show (pre-COVID). If you get a chance, it’s definitely something you should go check out. Very impressive.
Founder & CEO
e. radevita at mejeticks.com
From: NANOG <nanog-bounces+radevita=mejeticks.com at nanog.org> on behalf of Eric Kuhnke <eric.kuhnke at gmail.com>
Sent: Tuesday, December 29, 2020 5:06:00 PM
To: Sean Donelan <sean at donelan.com>
Cc: NANOG <nanog at nanog.org>
Subject: Re: Nashville
From a few days ago. Obviously centralizing lots of SS7/PSTN stuff all in one place has a long recovery time when it's physically damaged. Something to think about for entities that own and operate traditional telco COs and their plans for disaster recovery.
Here is the latest update: 6:46AM 12/27:
Work continues restoring service to the CRS routers in the Nashville Central Office. One router remains out of service and the other is in service with some links remaining out of service.
The working bridge will reconvene at 08:00 CT with the following action plan:
Additional cabling added to the first portable generator to enable full load capabilities (08:00 CT)
Pigtails with camlocks installed for easy swap; investigate possibility to land generator on the emergency service board to give the site N+1 with a manual ability to choose any one. (08:00 CT)
Check small power plants on floors 4 and 6 (08:00 CT)
Investigate water damage on 1st floor and energize if safe (08:00 CT)
Air handlers for floors 4, 5 and 6 (09:00 CT)
Complete all transport work
Turn up SS7
Turn up 911 service (approximately noon or after)
Turn up switching service.
TDM Switching team will reconvene at 09:00 CT and the Signaling team will reconvene at 11:00 CT on 12/27/2020.
DMS equipment on the 1st floor will be assessed for water damage. Switching teams will monitor power and HVAC restoration and will begin switch restoration as soon as the go ahead is provided by the power team.
1. 4th & 5th floors (Specify transport equipment needed to clear MTSO SS7 isolation & Datakit needed for Local Switch restoration). Transport SMEs currently working to turn up transport equipment
2. 6th floor (ESINET Groomers)
3. 10th and 8th floors (N4E) – Trunks
4. 1st floor (DMS: DS1, 5E: DS3) - Local POTS
5. 1st floor (DMS: DS0, DS2 | 5E: DS6) – Trunks
6. 11th floor (DMS: 01T) – Trunks
7. 4th floor (STP and SCP with mates up in Donelson)
The next update will be issued at approximately 09:00 CT on December 27.
As of 09:00 CT: Teams worked through the night to restore service and improve conditions at the Nashville 2nd Ave Central Office. Since the initial service impact, over 75% of the Out of Service Mobility Sites have been restored. Certain call flows may be limited and should improve as additional restoration activities complete.
The generator currently powering equipment on the 2nd and 3rd floors was refueled and ran with no issues through the night. Overnight, the batteries connected to it continued to charge. Teams have placed additional power cables which, once connected, will allow the working generator to better handle the load in the building. To accomplish this, the generator will need to be shut down for 15-30 minutes this morning so teams can connect the new cables to the system. The power team reports they are still on target to restore power and cooling to the 5th and 6th floors by approximately 12:00 CT. Also, a portable chiller will be delivered this morning and strategically placed in case it is needed to assist in cooling the office.
There is a Call Center at 333 Commerce in Nashville that does not have network or phone services available. Corporate Real Estate (CRE) reports there is some damage to that office, but the extent of the damage will not be known until they can gain access to the site. Because of this, the impacted Call Center has ceased operations until further notice.
DMS switching equipment on the 1st floor will be assessed for water damage. Switching teams will monitor power and HVAC restoration. Equipment power-ups will begin as soon as the go-ahead is provided by the power team.
Two SatCOLTs remain positioned on the East and West sides of the NSVLTNMT Central Office, providing critical communication for teams working restoration efforts. There are 17 assets deployed in the field: 15 are on air (the 2 at the CO and 13 supporting FN Customer Requests) and 2 are in hot-standby for FN Customers where macro service recently recovered. There is 1 asset staged at a deployment site in KY where macro service was restored, and 8 additional assets are en route to Nashville today to fulfill pending FN Customer requests. Incoming requests continue to be triaged. Requests in areas where service appears to have been restored are being held, while the others are being prioritized for dispatch.
The next update will be issued at approximately 14:00 CT, unless there is a significant change in status.
AT&T Nashville update below, received at 3:35PM 12/27.
Since the initial service impact, over 95% of the Out of Service Mobility Sites have been restored. Certain call flows may be limited and should improve as additional restoration activities complete.
Electricians have installed the additional power cables from the generator to the emergency bus. These new cables will allow the generator to support more of the building's load. The portable chiller requested has arrived on-site and is available to assist in cooling, if needed. Generally speaking, there are four (4) phases of restoration per floor (Air Handler restoral, Power restoral, Transport Equipment restoral, and Switch/Application Equipment restoral). Teams report that Air Handlers are up and running, and all power plants on floors 2 through 7 are online. Given the significant progress made, floors 2 through 7 are ready for technology turn up. Relative to Priority Transport related equipment, approximately 90% of the elements have been turned up on floors 2 through 7. The Power team is currently working on Floors 8 through 11 (N4E). The first floor is not accessible at this time. Once access is granted by federal and local authorities, further assessment and restoration efforts will begin.
The generator is currently supporting approximately 50% of its capacity, and alternative plans are being considered to handle the full load of the building. Teams continue to work proactively in an effort to identify potential issues and are actively engaged working to restore services and repair infrastructure.
AT&T Network Disaster Recovery (NDR) has eleven (11) SatCOLTs in service (TN, AL, GA). Two (2) of the eleven (11) are deployed at the Nashville, TN Central Office to provide coverage for the AT&T response teams as well as FirstNet (FN) customers. One (1) COLT is in hot-standby (TN). Six (6) COLTs are en route to deployment sites in TN and AL. Three (3) COLTs are being demobilized in Alabama and coming back to Nashville for new assignments, and five (5) additional COLTs are en route to the Nashville area to support additional requests.
The next update will be issued at approximately 19:00 CT, unless there is a significant change in status.
On Mon, Dec 28, 2020, 5:59 PM Sean Donelan <sean at donelan.com> wrote:
AT&T statement says nearly all services have been restored in Nashville as
of Monday, 5pm CST
They are working on permanent repairs.
AT&T's Network Disaster Recovery group faces management questions nearly
every year to justify their budget. While no one wants disasters,
business continuity has to be part of the business. There are also mutual
aid agreements between companies, but I don't know how many were invoked
for this incident.