NANOG Digest, Vol 26, Issue 106

Sun Mar 21 00:02:06 UTC 2010

4**

-----Original Message-----
From: nanog-request at nanog.org <nanog-request at nanog.org>
Sent: Saturday, March 20, 2010 8:00 AM
To: nanog at nanog.org <nanog at nanog.org>
Subject: NANOG Digest, Vol 26, Issue 106

Today's Topics:

   1. Re: ISC DHCP server failover (Mike)
   2. Re: CRS-3 (Steve Meuse)
   3. Re: CRS-3 (jim deleskie)
   4. Re: ISC DHCP server failover (sthaug at nethelp.no)
   5. Help with a 3561 debug (Jess Kitchen)

----------------------------------------------------------------------

Message: 1
Date: Fri, 19 Mar 2010 17:10:04 -0700
From: Mike <mike-nanog at tiedyenetworks.com>
Subject: Re: ISC DHCP server failover
To: "David W. Hankins" <David_Hankins at isc.org>
Cc: nanog at nanog.org
Message-ID: <4BA4125C.2090309 at tiedyenetworks.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

David W. Hankins wrote:
> On Wed, Mar 17, 2010 at 09:22:06AM -0500, Dan White wrote:
>   
>>   The servers stop balancing their addresses, and one server starts to
>> exhibit 'peer holds all free leases' in its logs, in which case we need to
>> restart the dhcpd process(es) to force a rebalance.
>>     
>
> If restarting one or both dhcpd processes corrects a pool balancing
> problem, then I suspect what you're looking at is a bug where the
> servers would fail to schedule a reconnection if the failover socket
> is lost in a particular way.  Because the protocol also uses a message
> exchange inside the TCP channel to determine if the socket is up
> (rather than just TCP keepalives) this can sometimes happen even
> without a network outage during load spikes or other brief hiccups on
>   
<long explanation snipped>

With all due respect and acknowledgment of the tremendous contributions 
of ISC and you yourself Mr. Hankins, I have to comment that failover in 
isc-dhcp is broken by design because it requires the amount of 
handholding and operator thinking in the event of a failure that you 
explained to us at length is required. Failure needs to be handled 
automatically and without any intervention at all, otherwise you might 
as well not have it and I think most network operators would agree.

I am certainly not prepared to develop proof of concept code or go the 
full route of developing such a server myself, however, I believe firmly 
that a failover implementation in dhcp could be designed as a 
counterpoint to the current implementation that is reliable, simple, 
scalable and requiring no special procedures once a 'break' occurs. The 
method used by isc-dhcpd, I think, creates the problem of the potential 
for unreliable failover because it's not designed for the 'right' 
problem. But there are example implementations - such as vrrp/carp - 
that would form the basis of a trustworthy dhcp failover protocol. Your 
key issues are a) broadcast discovery packets, which every listening 
host on the lan segment (such as 1 or more slaves) can easily respond 
to, and b) unicast frames from relay agents and others, which could 
easily be handled by a virtual mac/shared ip address by a group of 
slaves. This means that redundancy of more than 2 hosts is already 
possible. The last pieces are a protocol for servers to join and leave the 
pool of hosts serving dhcp, a master election protocol that 
pre-determines the order of slaves to fail over to in order to avoid the 
half-brain syndrome, a sanity checking protocol to ensure the elected 
master is sane and kicking (eg: the slaves all hit the master with, what 
else, dhcp requests), and a well defined group database update protocol 
over the network so that leases hit some fixed storage somewhere, sometime.
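
To make the election idea a bit more concrete, here is a minimal sketch
in Python of a pre-determined failover order. It is not anything isc-dhcp
implements. The Server class, the priority field, elect_master() and the
host names are all hypothetical, and the health check is just a callback
standing in for the "slaves hit the master with dhcp requests" probe.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Server:
    name: str
    priority: int  # hypothetical: fixed in config, higher wins (VRRP-style)


def elect_master(
    servers: Iterable[Server],
    is_healthy: Callable[[Server], bool],
) -> Optional[Server]:
    """Return the healthy server with the highest priority, or None."""
    candidates = [s for s in servers if is_healthy(s)]
    if not candidates:
        return None
    # Ties are broken by name so every node computes the same answer
    # from the same inputs, which is what avoids two masters.
    return max(candidates, key=lambda s: (s.priority, s.name))


# Assumed pool of three servers; dhcp-a is the preferred master.
pool = [Server("dhcp-a", 200), Server("dhcp-b", 150), Server("dhcp-c", 100)]

# Pretend the health probe against dhcp-a has failed.
down = {"dhcp-a"}
master = elect_master(pool, lambda s: s.name not in down)
print(master.name)  # dhcp-b
```

Because the order is fixed ahead of time, every surviving host that sees
the same health results picks the same master. The sketch says nothing
about how the lease database is kept in sync, which is the hard part.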

Just my $0.02 worth.

Mike-

------------------------------

Message: 2
Date: Fri, 19 Mar 2010 21:30:20 -0400
From: Steve Meuse <smeuse at mara.org>
Subject: Re: CRS-3
To: Paul Ferguson <fergdawgster at gmail.com>
Cc: "nanog at nanog.org list" <nanog at nanog.org>
Message-ID: <20100320013020.GA1574 at mara.org>
Content-Type: text/plain; charset=us-ascii

Paul Ferguson expunged (fergdawgster at gmail.com):

> -----BEGIN PGP SIGNED MESSAGE-----
> >
> > Anyone have any idea how much a fully configured CRS-3 would cost?  Or
> > how much power it would consume?  Or how much heat it would generate?
> >
> 
> Admittedly, my information on these topics comes from NPR these days. :-)
> 
> They said it costs ~US$90k, and that AT&T was in trials.

$90k is the price of the special lift jack you need to move them around :) 

-Steve

------------------------------

Message: 3
Date: Fri, 19 Mar 2010 22:37:48 -0300
From: jim deleskie <deleskie at gmail.com>
Subject: Re: CRS-3
To: Steve Meuse <smeuse at mara.org>
Cc: "nanog at nanog.org list" <nanog at nanog.org>
Message-ID:
	<ffcec29f1003191837n5b177b0exbeede90529eefbda at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

That's funny, not sure if Cisco sells one or not but back in the day, I
worked @ Avici, and we did in fact have a special jack used to move
the chassis around :)

-jim

On Fri, Mar 19, 2010 at 10:30 PM, Steve Meuse <smeuse at mara.org> wrote:
> Paul Ferguson expunged (fergdawgster at gmail.com):
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> >
>> > Anyone have any idea how much a fully configured CRS-3 would cost?  Or
>> > how much power it would consume?  Or how much heat it would generate?
>> >
>>
>> Admittedly, my information on these topics comes from NPR these days. :-)
>>
>> They said it costs ~US$90k, and that AT&T was in trials.
>
> $90k is the price of the special lift jack you need to move them around :)
>
> -Steve

------------------------------

Message: 4
Date: Sat, 20 Mar 2010 09:43:41 +0100 (CET)
From: sthaug at nethelp.no
Subject: Re: ISC DHCP server failover
To: mike-nanog at tiedyenetworks.com
Cc: nanog at nanog.org
Message-ID: <20100320.094341.74713325.sthaug at nethelp.no>
Content-Type: Text/Plain; charset=us-ascii

> With all due respect and acknowledgment of the tremendous contributions 
> of ISC and you yourself Mr. Hankins, I have to comment that failover in 
> isc-dhcp is broken by design because it requires the amount of 
> handholding and operator thinking in the event of a failure that you 
> explained to us at length is required. Failure needs to be handled 
> automatically and without any intervention at all, otherwise you might 
> as well not have it and I think most network operators would agree.

Note that this method of handling failover is inherent in the original
DHCP failover design. See

     http://tools.ietf.org/id/draft-ietf-dhc-failover-12.txt

Specifically, quoting from the above draft,

"While this technique works in some domains, having the only server to
which a DHCP client can communicate voluntarily shut itself down seems
like something worth avoiding.

The failover protocol will operate correctly while both servers are
unable to communicate, whether they are both running or not.  At some
point there may be resource contention, and if one of the servers is
actually down, then the operator can inform the operational server and
the operational server will be able to use all of the failed server's
resources."

I certainly cannot speak for "most network operators". However, I will
note that I have been aware of this behavior of the ISC DHCP server
as long as I have been running it in failover mode.
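
As a rough illustration of the behavior the draft describes, here is a
heavily simplified sketch in Python. The state names NORMAL,
COMMUNICATIONS-INTERRUPTED and PARTNER-DOWN come from the draft;
everything else (the FailoverServer class, its methods, the example
addresses) is hypothetical. The sketch ignores MCLT, lease timers and
all the other states, so treat it as a picture of the idea and not of
what dhcpd actually does.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    COMMUNICATIONS_INTERRUPTED = "communications-interrupted"
    PARTNER_DOWN = "partner-down"


class FailoverServer:
    """Toy model of one failover peer and the addresses it may hand out."""

    def __init__(self, own_free, peer_free):
        self.state = State.NORMAL
        self.own_free = set(own_free)    # free addresses assigned to us
        self.peer_free = set(peer_free)  # free addresses assigned to the peer

    def peer_contact_lost(self):
        # Losing contact does not prove the peer is dead, so the server
        # keeps running but stays within its own share of the pool.
        if self.state is State.NORMAL:
            self.state = State.COMMUNICATIONS_INTERRUPTED

    def operator_declares_partner_down(self):
        # The manual step discussed in this thread: a human confirms
        # that the peer really is down.
        self.state = State.PARTNER_DOWN

    def allocatable(self):
        if self.state is State.PARTNER_DOWN:
            return self.own_free | self.peer_free
        return set(self.own_free)


srv = FailoverServer({"192.0.2.10", "192.0.2.11"}, {"192.0.2.12"})
srv.peer_contact_lost()
print(sorted(srv.allocatable()))  # ['192.0.2.10', '192.0.2.11']
srv.operator_declares_partner_down()
print(sorted(srv.allocatable()))  # ['192.0.2.10', '192.0.2.11', '192.0.2.12']
```

The point of the sketch is that a server cut off from its peer keeps
answering clients from its own share, and only takes over the peer's
addresses once it has been told the peer is really gone.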

> I am certainly not prepared to develop proof of concept code or go the 
> full route of developing such a server myself, however, I believe firmly 
> that a failover implementation in dhcp could be designed as a 
> counterpoint to the current implementation that is reliable, simple, 
> scalable and requiring no special procedures once a 'break' occurs.

And which implements the failover protocol in the IETF draft?

Steinar Haug, Nethelp consulting, sthaug at nethelp.no

------------------------------

Message: 5
Date: Sat, 20 Mar 2010 11:27:44 +0000 (GMT)
From: Jess Kitchen <jess.kitchen at adjacentnetworks.net>
Subject: Help with a 3561 debug
To: nanog at nanog.org
Message-ID:
	<alpine.BSF.2.00.1003201122110.72001 at beaujolais.extremis.net>
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII

Hello,

If anyone is single homed via Savvis AS3561 that could spare a minute to 
help with a couple of mtr/tcptraceroute/iperfs that would be great- trying 
to drill down a peculiar and intermittent issue that has been occurring 
since some time Thursday (packets indiscriminately dropped on the floor 
but only on particular paths)

Please mail offlist, thanks

-- 
Jess Kitchen <jess.kitchen at adjacentnetworks.net>

------------------------------

End of NANOG Digest, Vol 26, Issue 106
**************************************