Networking: A Beginner’s Guide, Fifth Edition

Chapter 12
Network Disaster Recovery
Network servers contain vital resources for a company, in the form of information,
knowledge, and invested work product of the company’s employees. If they
were suddenly and permanently deprived of these resources, most companies
would not be able to continue their business uninterrupted and would face losing
millions of dollars, both in the form of lost data and the effects of that loss. Therefore,
establishing a network disaster recovery plan and formulating and implementing the
network’s backup strategy are the two most important jobs in network management.
In this chapter, you learn about the issues that you should address in a disaster
recovery plan, and also about network backup strategies and systems. Before getting
into these topics, however, you should read about the City of Seattle’s disaster recovery
experiences.
Notes from the Field: The City of Seattle
The technical editor of the first through third editions of this book, Tony Ryan, had a
personal experience with network disaster recovery. Tony worked in the IT department
for the City of Seattle. On February 28, 2001, Seattle experienced an earthquake that
caused the city’s disaster recovery plans to be tested. What follows is Tony’s discussion
about Seattle’s disaster recovery operations and how it handled the problems that
occurred in the wake of the earthquake. This is an excellent example of why you need
a disaster recovery plan that encompasses all possible events that could occur during a
disaster.
Notes on the Seattle 2001 Earthquake and Its Disaster Recovery
By Tony Ryan
Seattle has seen some very unusual and attention-grabbing events over the past
few years. Notable among them were the World Trade Organization (WTO)
conference of 1999 and the violent demonstrations that accompanied it, which
were broadcast worldwide on television and the Internet. Also, riots broke out
during Mardi Gras celebrations in 2000. However, nothing compared to the
potential and realized damage wrought by the 6.8 magnitude earthquake that
struck Wednesday, February 28, 2001.
The EOC Situation
The City of Seattle has an Emergency Operations Center, or EOC, which is
activated during any event or crisis that has a potential impact on public safety,
or that might otherwise affect any number of services provided by the city to its
citizens. Sometimes that EOC can be activated ahead of time; for example, for
the Y2K event and the anniversary of the WTO demonstrations. Looking at the
preparation made for those events and comparing it to what happens during
unplanned events such as the earthquake helps to illustrate some important
principles about IT disaster recovery and disaster preparedness.
Never Assume
During the preparation for Y2K, members of my staff were asked to augment
the staff normally assigned to support the EOC’s desktop and laptop PCs, and
printers. The staff members who normally support the EOC are from a different IT
organization than ours, and as can be expected, their way of doing things differed
from ours for a number of valid reasons. However, once my staff members had
a chance to look at the EOC’s environment, they were able to share some new
perspectives and methods that were welcomed and adopted by EOC support
staff, and all involved had a new idea of what would be expected to be the
“standard” way of configuring EOC PCs. Examples ranged from hard-coding
certain models of PC network interface cards (NICs) to run better on the switches
in their wiring closet to developing and implementing a base image for all the
laptops to be deployed in the building. The Y2K event, as a result, was lauded
as an example of ideal cooperation between IT groups and excellent preparation
overall. It was a very calm Saturday morning!
Change Management?
Between events, however, there was a great deal of time and opportunity for
things to change. The facility might have been used for other business purposes;
equipment such as laptops might have been loaned out, or customers could
have come in and used the equipment; and other IT groups besides ours might
have assisted the staff and performed alterations to the configurations that went
undocumented or were not communicated to all involved.
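One way to catch this kind of undocumented change is to record a baseline of each machine's expected configuration and compare it against the current state before the next activation. The Python sketch below is only an illustration under assumed names: the machine names, setting keys, and the `find_drift` function are hypothetical and are not taken from the city's actual tooling.

```python
def find_drift(baseline, current):
    """Compare two {machine: {setting: value}} mappings.

    Returns a list of (machine, setting, expected, actual) tuples.
    All names here are hypothetical, for illustration only.
    """
    drift = []
    for machine, expected in baseline.items():
        actual = current.get(machine)
        if actual is None:
            # Machine was in the baseline but is no longer present.
            drift.append((machine, "missing", None, None))
            continue
        for setting, want in expected.items():
            have = actual.get(setting)
            if have != want:
                drift.append((machine, setting, want, have))
    for machine in current:
        if machine not in baseline:
            # Machine appeared without being recorded in the baseline.
            drift.append((machine, "unexpected", None, None))
    return drift


if __name__ == "__main__":
    baseline = {"eoc-pc-01": {"nic_speed": "100-full", "image": "base-v1"}}
    current = {
        "eoc-pc-01": {"nic_speed": "auto", "image": "base-v1"},
        "loaner-07": {},
    }
    for entry in find_drift(baseline, current):
        print(entry)
```

Running a comparison like this on a schedule, rather than only before planned events, is one way to surface changes made between activations.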
The Results
Whatever it was that might have happened remains unknown. What we did
discover following the earthquake was that when customers who normally use
the EOC in emergency situations went to use the equipment, in some cases the
machines did not work as expected. Software could not be loaded on this PC; that
laptop would not connect to the network anymore; some PCs were not the same
or had been swapped for machines with less-powerful processors. Things had changed, and the
result was that some of the emergency work that IT professionals, such as web support
technicians, had to perform took more time than we had anticipated. Ironically, the
Web played a crucial role in our overall communications “strategy.” The impact
of that equipment not immediately working was not yet evident; however, the
following events illustrate what it might have been.
A few minutes after the earthquake struck, several of the downtown buildings
in which Seattle employees worked were evacuated due to fear of structural
damage. No one was injured, and amazingly only two keyboards were broken
throughout all the buildings in which we provide support. But imagine a couple
thousand very frightened and concerned people streaming onto the sidewalks and
streets, flooding cellular telephone networks in frantic attempts to contact loved
ones, and looking for any possible focus for communication, especially managers
such as myself and other supervisory staff, all possessing varying levels of training
in disaster preparedness.
Luckily, the mayor’s office had sent representatives to the gathering sites
indicated for staff to walk to in such events, and informed everyone in the
core buildings that were directly affected that they were to go home. With that
announcement, the CTO told everyone to “check the Web” for information,
meaning the city’s internal web site. But what if the EOC PC had been swapped
out (let’s say) for a Pentium 133 with 64MB RAM and that PC could not run
Microsoft’s FrontPage 2000? If that web site had to be updated with news
and official information on a routine basis, the results could have been at best
inconvenient and confusing.
Contingency and Costs
Because we are a publicly funded entity, we are very careful about how we
spend our customers’ money, as it is subject to great scrutiny (and rightfully so).
Customers often do not have the funds to afford both modern PC equipment
to run the latest version of Windows and a spare PC to sit in the closet, “just in
case.” After the earthquake, a couple of buildings were temporarily unavailable
for occupancy until inspectors had a chance to examine the damage to see if the
buildings were safe for employees. One of those buildings actually houses a lot
of our IT staff, and as a result, not only were we trying to find “spare PCs” for
our customers to use (while they looked for office space), but as IT support staff,
we found ourselves doing the same thing. The direct impact was that we found
it difficult in a few cases to support our customers as quickly as our service-level
agreements (SLAs) required, especially since we could not immediately reenter
our building to gather our PCs or other necessary equipment.
Lesson Learned: Keep Spares … At Least a Few
So it seems that you either pay up front or pay later. It makes sense to keep
a percentage of PCs available for these rainy-day events; 10 to 15 percent of
replaceable inventory should work. Consider that businesses of any kind are
obligated in such situations to perform a kind of “triage” as to which of their
business functions are most critical and which can be postponed until their
entire stock of equipment can be reconnected or replaced, and 10 to 15 percent is
justified.
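As a rough illustration of the 10 to 15 percent guideline, the following Python sketch computes a spare-PC range for a given fleet size. The function name and the example fleet size of 400 are hypothetical, and rounding up to a whole machine is an assumption; the text does not say how to round.

```python
def spare_pc_range(fleet_size, low_pct=10, high_pct=15):
    """Return (min_spares, max_spares) for a fleet of replaceable PCs.

    Percentages default to the 10 to 15 percent guideline; results are
    rounded up to whole machines (an assumption, not from the text).
    """
    if fleet_size < 0:
        raise ValueError("fleet_size must be non-negative")
    # Integer ceiling division avoids floating-point rounding surprises.
    low = (fleet_size * low_pct + 99) // 100
    high = (fleet_size * high_pct + 99) // 100
    return low, high


if __name__ == "__main__":
    low, high = spare_pc_range(400)
    print(f"Keep between {low} and {high} spare PCs")
```

For a hypothetical fleet of 400 replaceable PCs, this gives a range of 40 to 60 spares.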
Have a Plan for Communications and How You Will Communicate
Following the CTO’s announcement, some asked, “What about those who don’t
have web access at home?” As IT staff, we asked, “What if the web servers
themselves had all been destroyed?” (In fact, ceiling debris in the room in which
they were housed fell very close to them, but the servers were not damaged
and the service was never down.) Still others asked, “What about those who
missed the message and don’t know to check the Web?” These questions, as well
as “What to do in the event of …?” could be addressed with a clear, ever-ready
communications plan. Ironically, such plans had been developed down to the
last detail for other events, but in the case of a real “emergent” event, we as a
department had not identified a plan to follow. A priority for our department now
is to reexamine that situation and develop a plan, using communications plans
developed for the Y2K event and the like as models.
Another point: As previously mentioned, our staff is not responsible for
supporting the EOC on a routine basis. We are more than happy to be directed
to assist in that support, and as evidenced, have done so on a few occasions.
Almost immediately following the earthquake, I received a page indicating that
I was to dispatch technicians to the EOC to support the city officials who report
there during emergencies. While our team was under no agreement with the EOC
to provide support even “on demand,” I immediately asked two of my senior
technicians, who had worked at the EOC in the past, to respond. They reported
for duty there and supported the facility until the assigned staff arrived. There
was never a doubt that we would pitch in whenever asked, but I made it a point
to ask our divisional director if developing some clearer expectations, or even an
SLA, between our staff and the EOC would be appropriate, and he agreed. I did
find out that those in the EOC are granted power by legislation to use “all” city
resources in the event of an emergency, but a clear agreement could also permit
me to identify a rotating on-call staff person who could be proactive and call the
EOC in such instances.
I must point out that none of these preparations can substitute for dedicated,
intelligent people. The shining example is one of my technicians who supports
programmers responsible for the city’s payroll application. He had the presence of
mind to come early to work the day after the quake, and he somehow persuaded
the construction crew and inspectors to permit him access to the building. He
walked up 13 flights of stairs, picked up a PC and peripherals, carried them back down
the stairs and to another building, and configured it to work on the segment in
the new building. This made it possible for the programmer to run the operations
necessary for the city’s payroll run that weekend, and employees received their
checks on time, as expected. You cannot ask for more than that.
