Responsible Network Management Guidelines

Thu Sep 25 03:27:17 UTC 1997

On Wed, Sep 24, 1997 at 07:29:38PM -0500, Sean Donelan wrote:
> In addition to any substantive comments, now is the time to correct
> the grammer and spelling nits.  I plan on throwing this into the
> Informational RFC process before the next IETF meeting.

Here goes.  Didn't realize it was that small...

(Warning: I got about halfway through, and realized I was editing, rather
than just copyediting -- feel free to ignore those parts if you see fit.)

> Operational Requirements Area                                 S. Donelan
> INTERNET DRAFT                                                       DRA
> <draft-donelan-rnmg-01.txt>                               September 1997
> 
> 
>                Responsible Network Management Guidelines
> 
> Status of this Memo
> 
>    This document is an Internet-Draft.  Internet-Drafts are working
>    documents of the Internet Engineering Task Force (IETF), its areas,
>    and its working groups.  Note that other groups may also distribute
>    working documents as Internet-Drafts.
> 
>    Internet-Drafts are draft documents valid for a maximum of six months
>    and may be updated, replaced, or obsoleted by other documents at any
>    time.  It is inappropriate to use Internet- Drafts as reference
                                                ^
>    material or to cite them other than as ``work in progress.''
> 
>    To learn the current status of any Internet-Draft, please check the
>    ``1id-abstracts.txt'' listing contained in the Internet- Drafts
                                                             ^
>    Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
>    munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
>    ftp.isi.edu (US West Coast).
> 
> Rational and Scope
  Rationale

>    This document provides Responsible Network Management personnel of

All three of those words likely should not be capitalized; you're using
the term generically, not as a job title.

>    Internet Service Providers (ISPs) and Internet Service Customers

I know you had to make _something_ up to call them there... but I always
have a vague, unallocated unease about new initialisms.  Might you just say
"their customers"?

>    (ISCs) with guidelines for network management when the following
>    conditions arise:
> 
>        - Routine Maintenance Activity
>        - Problem Reporting and Referral
>        - Escalation
>        - End-to-End Testing
>        - Customer Notification
>        - Emergency Communications
>        - Network Service Interuption Measurement
> 
>    Specific procedures will require negotiations between the
>    organizations involved.  These guidelines do not replace or supersede
                                               ^^^^^^
"are not intended to"?

>    agreements or any other legally binding documents.
> 
> Responsible Internet Service Provider
> 
>    A more familar term in Internet Standards is an Autonomous System.
>    Since this document has additional requirements than an entity
>    represented by an Autonomous System or Systems, this document creates a
>    new entity.

"has...than" is a clumsy construct at best.  Are you trying to say

Since this document defines requirements additional to those customarily
expected of the operators of an Autonomous System, it must define a new
entity, encompassing AS's and also other organizations.

?

>    The Responsible Internet Service Provider (RISP) has overall
>    responsibility for Internet service between its Internet Service
>    Customers and other Internet Service Providers making up the
>    Internet.

Ok, so, basically, a RISP is a repository for a contact?

>    An Internet Network, Autonomous System or group of Autonomous Systems
>    may designate another entity to act on its behalf as its Responsible
>    Internet Service Provider.  In this document, Internet Service
>    Customer (ISC) shall refer to the collective network, Autonomous
>    System or Systems which designated the Responsible Internet Service
>    Provider as their agent.

Roughly.  An agent, in legal terms.

>    The Responsible Internet Service Provider is responsible for:
> 
>    -- Providing a contact that is readily accessible 24 hours a day, 7
>    days a week.
> 
>    -- Providing trained personnel.
> 
>    -- Acting as the Internet Service Customer's (ISC) primary contact in
>    all matters involving Internet Service between Internet Providers.
> 
>    -- Accept problem reports from Internet Service Customers and casual

        Accepting

>    end users or other parties receiving Internet Service problem
>    reports.  The RISP may prioritize problem reports from its own ISCs,
>    or refer casual end users to their primary RISP, if known.

This graf sounds like it's making an assumption that _I_, at least,
apparently am not equipped to make, as I fell off a couple turns back.

The first sentence could stand to be recast.

>    -- Advising the ISC when there is an ISP failure affecting the ISC
> 
>    -- Isolating problems to determine if the reported trouble is in the
>    ISP's facilities or in other providers' service.
> 
>    -- Testing cooperatively, when necessary, with other providers to
>    further identify a problem when it has been isolated to another
>    provider's service.

Suggest moving the parenthetical ("when necessary") after "providers".

>    -- Keeping its ISC advised of the status of the trouble repair.
> 
>    -- Maintaining complete and accurate records of its own customers and

So, basically, a RISP is an administrative and technical Point of Contact
designee?

> Routine Maintenance Activity
> 
>    Responsible Internet Service Providers should perform routine
>    maintenance work during hours of minimum traffic to impact the least
>    number of customers.  In most areas, the period of lowest Internet
>    traffic is between 1am and 6am local time.  Trans-contential and
>    inter-contential connections should consider the local time on each
>    end of the connection.

It's worth noting (it came up in one of the last 4 RISKS Digests) that, for
some things -- backbone gear, NAPs, webfarms, etc. -- there _is_ _no_
good time to do maintenance.  The audience is worldwide and,
statistically, you simply can't find a good hour to do it.  It might be
suggested that each category of operator keep its own
traffic logs, at roughly hourly granularity, to facilitate the
determination of "the best time to down the router".
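
As a rough illustration of how hourly traffic logs could answer that
question, here is a minimal sketch.  The function name, the input shape
(pairs of hour-of-day and a traffic figure) and the window length are all
assumptions made up for the example, not anything the draft specifies.

```python
from collections import defaultdict


def quietest_window(samples, window_hours=2):
    """Return (start_hour, mean_traffic) for the quietest window.

    samples: iterable of (hour_of_day, traffic) pairs, hour in 0..23,
    all recorded in the same time zone.  The window may wrap past midnight.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for hour, traffic in samples:
        totals[hour % 24] += traffic
        counts[hour % 24] += 1

    # Refuse to guess about hours we have never measured.
    missing = [h for h in range(24) if counts[h] == 0]
    if missing:
        raise ValueError(f"no samples for hours: {missing}")

    # Average traffic for each hour of the day across all logged days.
    mean = [totals[h] / counts[h] for h in range(24)]

    def window_load(start):
        return sum(mean[(start + i) % 24] for i in range(window_hours))

    best_start = min(range(24), key=window_load)
    return best_start, window_load(best_start) / window_hours
```

Fed several weeks of samples, this returns the starting hour with the
lowest average load.  For a worldwide audience the "quietest" window may
still carry substantial traffic, which is the point made above.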

>    Activities which may affect other Internet Service Providers should
>    be coordinated with the affected providers.

Channels should be designed in advance for this sort of communication
(email, voice, pager, etc.), and tested regularly?

> Problem Reporting and Referral
> 
>    The Responsible Internet Service Provider is responsible for
>    performing all the necessary tests to determine the nature of the
>    problem detected, or reported by its customers or by referral from
>    other ISPs.  If the trouble is isolated to an ISC or another ISP, the
>    RISP will report the trouble to the appropriate ISC or ISP point of
>    contact.
> 
>    An example of the information exchanged in the problem referral
>    report:
> 
>    -- Description of the problem, including source address/name,
>    destination address/name, application or protocol involved, when it
>    last worked, when it stopped working, and any diagnostic messages or
>    test data (i.e. ping, traceroute).
> 
>    -- Customer reported problem severity
> 
>    -- RISP determination of problem severity
> 
>    -- The name and contact information of the person referring the
>    problem
> 
>    -- The referee's trouble ticket number, and origination date/time
> 
>    -- The name of the person accepting the report
> 
>    -- The acceptor's trouble ticket number, and acceptance data/time

Oh, _ghod_ if we could design a standardized trouble ticket interchange
format.  Excuse me, I feel an RFC coming on.  :-)
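
To make the idea concrete, here is a minimal sketch of what such an
interchange record might look like, built from the fields the draft lists
above.  The class name, field names and the choice of JSON are all
hypothetical; no such standard format is defined by the draft.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ProblemReferral:
    # Fields follow the draft's example referral report; names are invented.
    description: str
    source: str
    destination: str
    protocol: str
    last_worked: str        # timestamps as ISO 8601 strings, UTC
    stopped_working: str
    customer_severity: str
    risp_severity: str
    referrer_name: str
    referrer_contact: str
    referrer_ticket: str
    originated_at: str
    diagnostics: List[str] = field(default_factory=list)
    # Filled in by the organization accepting the report.
    acceptor_name: Optional[str] = None
    acceptor_ticket: Optional[str] = None
    accepted_at: Optional[str] = None

    def to_json(self) -> str:
        """Serialize to a JSON object with a stable key order."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ProblemReferral":
        """Rebuild a record from the output of to_json()."""
        return cls(**json.loads(text))
```

The referring side would fill in everything but the acceptor fields, and
the accepting side would send the same record back with its own ticket
number and acceptance time added.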

>    Periodic status reports shall occur when the problem has been
>    isolated, when there is a significant change in the status of the
>    problem, and when negotiated time intervals expire.  Escalation will
>    be according to negotiated procedures.

And prior negotiation should probably take place to decide on
equivalencies of severity levels and escalation justifications, etc.

Sorry; I'm a systems designer by trade; the stuff just runs out of my
fingertips.  :-)
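
The severity equivalencies mentioned above could be as simple as a lookup
table agreed on ahead of time.  A minimal sketch follows; both severity
scales and every name in it are invented for illustration.

```python
# Hypothetical scales: provider A uses sev1..sev4, provider B uses words.
A_TO_B_SEVERITY = {
    "sev1": "critical",
    "sev2": "major",
    "sev3": "minor",
    "sev4": "informational",
}


def translate_severity(level, table=A_TO_B_SEVERITY):
    """Map one provider's severity label onto the other's agreed label."""
    try:
        return table[level.strip().lower()]
    except KeyError:
        # An unmapped level means the agreement has a gap; fail loudly
        # rather than silently downgrading the ticket.
        raise ValueError(f"no agreed equivalent for severity {level!r}")
```

Failing on an unknown level is a deliberate choice here: a gap in the
table is something the two organizations need to negotiate, not something
the software should paper over.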

>    Problem isolation may require cooperative testing between the ISC and
>    ISP(s), which shall be provided when requested.  The provider making
>    the test is responsible for coordination.
> 
>    When the problem has been cleared, the ISP/ISP or ISP/ISC shall
>    advise the other the problem has been cleared.  When closing a problem
>    report between ISP/ISP or ISP/ISC, the disposition should be
>    furnished by the organization closing the ticket.

Are those slashed abbreviations _correct_?  I guess I missed something; I
don't have an expansion ready to hand that fits.

>    An example of the information exchanged in the problem disposition:
> 
>    -- Trouble ticket number
> 
>    -- Referral datetime
> 
>    -- Returned datetime
> 
>    -- Trouble identified as
> 
>    -- Resolution details
> 
>    -- Service charges, if the ticket resulted in a service charge
> 
>    If there is a disagreement about the disposition of a problem ticket,
>    the parties involved should document their respective positions and
>    the names of the individuals involved.  Escalation will be made
>    according to each organizations escalation procedures.

Glad this is in here... :-)

> Escalation
> 
>    Each ISP and ISC shall establish procedures for timely escalation of
>    problems to successive levels of management.  The procedures should
>    include the provision of status reports to the other provider or
>    customer regarding the ticket status.  Both technical and management
>    contacts should be included in the escalation procedures.

I suspect that's not enough... but we'll see...

> End-to-End Testing
> 
>    Networks may experience problems which cannot be isolated by each
>    provider individually testing and maintaining its own services.  Each
>    providers' service may appear to perform correctly, but trouble
>    appears on an end-to-end service.  The ISC's RISP should coordinate
>    end-to-end testing with each sectional provider by problem referral
>    through their Responsible Internet Service Provider.  Each Internet
             ^^^^^
Pronoun without a referent.  Whose?  The ISC?  The RISP?  The sectional
provider?  (There's another new piece of terminology.)

>    Service Provider should accept the referral request for end-to-end
>    testing coordination, and provide the contact information for the
>    next sectional provider to the original requestor.

This assumes to some extent that the customers -- even though they're
paying for the lines -- can actually _get_ the information from the
vendors... something which isn't always true.  Perhaps a statement
encouraging that?

> Customer Notification
> 
>    During a major outage a potential concern is customer goodwill and
                          ,
>    network congestion caused by repeated customer attempts to access the
>    down network.  An informed customer can reduce customer frustration,
>    and network congestion.
> 
>    Pre-planning for quick notification can be most beneficial in
>    alerting customers.
> 
>    Some example methods to notify customers include:
> 
>    -- If operational, network access equipment can display an alert when
>    customers connect.  The alert should be displayed before the customer
>    logs into the network.  If the network fails during or after
>    attempting to validate the access information, the alert should not
>    compromise any authentication information.

Consumer software in particular _really_ ought to have provision for a
messaging system, like the motd and/or wall.  The lack of this on, say,
Win95 drives me up a tree...

>    -- Customer service calls increase dramatically during network
>    failures.  An informed customer representative can advise the
>    customer on the best course of action.  A method to quickly instruct
>    customer service representatives on the available options should be
>    implemented.

Putting known outages on the automated attendant, like the cable
companies do, would be nice.  I know good engineering will _never_ win
out over paranoid management, but if I'm paying for a service, I don't
wanna _guess_ when it's broken.  I don't _care_ if the announcements
make life harder for the sales team.  Maybe they won't have so many
outages...

>    -- The media, radio or television, can be used to inform the public.
>    Pre-arrangements, and planning are needed to ensure only designated
>    contacts are made with the media.

Is there _any_ part of the net that's this globally critical?

>    -- Other automated announcements, such as World Wide Web pages or e-
>    mail distribution lists with backup through other providers, recorded
>    telephone status lines, or broadcast FAX/Pager notifications.
> 
>    Public notifications, when utilized, should not make reference by
>    name to the organization believed causing the problem unless the
                                      ^ to be
>    organization causing the problem has been confirmed.  Internet
>    network problems can be difficult to isolate, and can give misleading
>    indications to their true origin.

Confirmed is a sticky concept.  I wouldn't _ever_ announce it, myself.

Unless that party did, and "who's allowed to say you can announce it" is
something you need to track.

> Emergency Communications
> 
>    Recognizing that all Responsible Internet Service Providers have a
>    responsibility to provide an adequate level of support for their
>    service and/or products, it is recommended they participate in an
>    backup emergency communications system.

Like having valid whois(1) info?  :-)

>    The backup emergency communications system should not depend on the
>    operation of the primary network for obtaining contact,
>    authentication, or other communications information during a network
>    problem.  Each RISP is responsible for providing a Emergency Point Of
>    Contact.  It is recommended each Emergency POC have at least one
>    out-of-band contact method, such as an internationally dialable (non
>    1-800) voice and/or fax telephone number.  Each RISP should pre-
>    arrange a method for verifying the identity of the Emergency Point of
>    Contacts using alternative communications methods, such as a

     Contact

>    challange/response code-word or call-back to a known telephone

     challenge

>    number.

Note that this isn't always good enough, if the problem is an attack.
Call-forwarding and butt-sets, doncha know.
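
One way a challenge/response code-word could work without trusting the
phone line is a keyed hash over a fresh random challenge.  This is only a
sketch under assumptions: it presumes a shared secret exchanged in advance
by some out-of-band means, and the function names and the 8-character
response length are made up for the example.

```python
import hashlib
import hmac
import secrets


def new_challenge() -> str:
    """Generate a fresh random challenge to read to the other party."""
    return secrets.token_hex(8)


def respond(shared_secret: bytes, challenge: str) -> str:
    """Compute the response the holder of the shared secret should give."""
    digest = hmac.new(
        shared_secret, challenge.encode("ascii"), hashlib.sha256
    ).hexdigest()
    # Truncated so it can be read aloud; this weakens it accordingly.
    return digest[:8]


def verify(shared_secret: bytes, challenge: str, response: str) -> bool:
    """Check a spoken or typed response against the expected value."""
    expected = respond(shared_secret, challenge)
    return hmac.compare_digest(expected, response.strip().lower())
```

Because each challenge is new, overhearing one exchange does not let an
attacker answer the next one, which a static code-word or a call-back to a
forwarded number cannot promise.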

>    Each RISP should maintain a current off-line copy of the emergency
>    contact procedures for each gateway inter-connection.  Each RISP
>    should establish procedures for keeping the off-line emergency
>    contact procedures updated.  Each RISP shall test and verify its own
>    emergency POC procedures are accurate and functioning on a regular
>    basis, no less than once a year.

On the net?  Monthly...

> Network Service Interuption Measurement
> 
>    Each ISP/ISC should maintain accurate records about service
>    interruptions to measure and develop trend analysis of their network
>    availability.
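
For what it's worth, the availability figure falls out of such records
fairly directly.  A minimal sketch, assuming outages are kept as (start,
end) pairs of datetimes; the function name and record shape are
assumptions for the example.

```python
from datetime import datetime


def availability(outages, period_start, period_end):
    """Return the fraction of the period during which service was up."""
    period = (period_end - period_start).total_seconds()
    if period <= 0:
        raise ValueError("period_end must be after period_start")

    # Clip each outage to the reporting period and sort by start time.
    clipped = sorted(
        (max(start, period_start), min(end, period_end))
        for start, end in outages
        if end > period_start and start < period_end
    )

    # Merge overlapping outages so two tickets for the same event
    # are not counted twice.
    down = 0.0
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                down += (cur_end - cur_start).total_seconds()
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        down += (cur_end - cur_start).total_seconds()

    return 1.0 - down / period
```

Computing this per month over the same records gives the trend the draft
asks for.
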
> 
> Security Considerations

You may wish to choose a different section title.  "Security
Considerations" is customarily used to mean "...of implementation of the
procedures in this RFC", which is, I think, not what you mean here...

>    -- Maintain a complete and accurate record of a RISP's own customers
>    and inter-provider gateways.
> 
>    -- Public notifications, when utilized, should not make reference by
>    name to the organization believed causing the problem.
> 
>    -- If the network fails during or after attempting to validate the
>    access information, the alert should not compromise any
>    authentication information.
> 
>    -- Each RISP should pre-arrange a method for verifying the identity
>    of Point of Contacts using alternative communications methods, such
>    as a challange/response code-word or call-back to a known telephone

     challenge

>    number.
> 
> 
> Author's Address
> 
>    Sean Donelan
>    Data Research Associates, Inc.
>    1276 North Warson Road
>    Saint Louis, MO 63132
> 
>    Phone: +1-314-432-1100
>    EMail: sean at DRA.COM

Not bad.  But, from down here in the trenches, I think it could use
another round of flogging.  How much commentary have you gotten on it?

Cheers,
-- jr 'will stick fingers in others' RFCs for food' a
-- 
Jay R. Ashworth                                                jra at baylink.com
Member of the Technical Staff             Unsolicited Commercial Emailers Sued
The Suncoast Freenet      "People propose, science studies, technology
Tampa Bay, Florida          conforms."  -- Dr. Don Norman      +1 813 790 7592