Broadband Subscriber Management
steve at ibctech.ca
Thu Apr 23 18:50:12 CDT 2009
Leigh Porter wrote:
> Could you have two instances of RADIUS, one for the middle-man and
> ignore the accounting from that server?
First I'd like to thank all of those who responded off-list. To not
waste everyone's time, I'd like to throw out there that this message can
technically be pruned to PPPoE DSL ops.
For completeness' sake, I'll describe the problem (in more detail), and
provide further info, as I think that we've got it solved. I'd
appreciate feedback if anyone notices a flaw in my thinking, because as
I've said, we auth users on DSL... we do not operate the DSL infrastructure.
We have (from my unconfirmed understanding):
Bell BAS/LAC---DSL LNS---ISP LNS----Me
We were receiving auth requests from the ISP LNS. We were receiving acct
requests from both the DSL LNS and the ISP LNS. The packets from both
ISP and DSL are over separate Internet paths, and don't rely on each
other for us to receive them (or respond to them).
I don't know whether it was the NASs themselves that were sending the
RADIUS packets, or whether they were sent from a RADIUS server. I'm not
familiar with those inner workings. My RADIUS logs would show the
requests coming from a DNS name that included "lns" in both cases.
Two problems were apparent. The first was cosmetic, the second affected service:
- the duplicate acct packets (one from ISP and a second from DSL) were
doubling up our accounting data for each user authentication
- users who were ``kicked'' from the ISP (according to RADIUS logs)
would not attempt to re-auth, causing a major helpdesk issue (sync, no conn)
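To illustrate the doubling, here is a minimal sketch in Python. It is an assumption-laden toy, not how any particular RADIUS server stores its data: the record fields and the source labels "isp-lns" and "dsl-lns" are made up for the example. The idea is that every accounting packet still gets ack'd on the wire, but only one source is counted when usage is tallied.

```python
from collections import defaultdict

# Hypothetical label for the source whose accounting we trust. A real setup
# would more likely match on NAS-IP-Address or the RADIUS client address.
AUTHORITATIVE_SOURCE = "isp-lns"


def usage_by_user(records, source=AUTHORITATIVE_SOURCE):
    """Sum session seconds per user from Stop records.

    If source is None, records from every source are counted, which
    reproduces the doubled-up totals described above.
    """
    totals = defaultdict(int)
    for rec in records:
        if source is not None and rec["source"] != source:
            continue  # the packet was still ack'd; it is just not counted
        if rec["status"] == "Stop":
            totals[rec["user"]] += rec["session_time"]
    return dict(totals)


# One user session, reported by both LNSs (made-up sample data).
records = [
    {"source": "isp-lns", "user": "alice", "status": "Start", "session_time": 0},
    {"source": "dsl-lns", "user": "alice", "status": "Start", "session_time": 0},
    {"source": "isp-lns", "user": "alice", "status": "Stop", "session_time": 3600},
    {"source": "dsl-lns", "user": "alice", "status": "Stop", "session_time": 3600},
]

print(usage_by_user(records))               # counts the ISP LNS only
print(usage_by_user(records, source=None))  # counts both sources
```

Counting by source rather than by Acct-Session-Id is deliberate here, since two different NASs cannot be assumed to report the same session ID for one user session.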
A colleague and I went to work on the issue, essentially trying to
reverse engineer the problem, as we have no access to the intermediary
gear, and as such, no way to access logs and/or details.
We have found so far that it appears as though a user is authenticated
once via our RADIUS server (as expected). We would then receive standard
RADIUS acct packets from BOTH LNSs, which our RADIUS server merrily ack'd.
When the connection between DSL and ISP broke, the ISP would see our
connection as down, and terminate the session with a STOP packet.
However, it appears as though the DSL provider would continue to send
interim update acct packets to our RADIUS server, as the DSL side would never
learn about the STOP. The CPE continues to think the session is still
active (as a matter of fact, in the case of gw capable CPE, the IP info
would still be retained).
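The half-closed state described above can be spotted from the accounting stream itself. The following is a rough sketch only, assuming accounting events have already been pulled out of the logs as (timestamp, source, user, status) tuples. That tuple layout, the source labels, and the function name are all hypothetical. Only the Acct-Status-Type names (Start, Stop, Interim-Update) come from the RADIUS accounting RFCs.

```python
def find_half_closed(events):
    """Return users for whom one source sent Stop while another source
    kept reporting the session as live after that Stop."""
    # Latest (timestamp, status) seen per user and per source.
    latest = {}
    for ts, source, user, status in sorted(events):
        latest.setdefault(user, {})[source] = (ts, status)

    flagged = []
    for user, per_source in latest.items():
        stops = [ts for ts, status in per_source.values() if status == "Stop"]
        if not stops:
            continue
        first_stop = min(stops)
        # Any source whose latest record is not a Stop and is newer than
        # the earliest Stop still believes the session is up.
        still_live = [
            ts for ts, status in per_source.values()
            if status != "Stop" and ts > first_stop
        ]
        if still_live:
            flagged.append(user)
    return sorted(flagged)


# Made-up sample: alice is stopped by the ISP LNS but the DSL LNS keeps
# sending interim updates; bob is stopped cleanly by both.
events = [
    (100, "isp-lns", "alice", "Start"),
    (101, "dsl-lns", "alice", "Start"),
    (400, "isp-lns", "alice", "Stop"),
    (700, "dsl-lns", "alice", "Interim-Update"),
    (100, "isp-lns", "bob", "Start"),
    (102, "dsl-lns", "bob", "Start"),
    (500, "isp-lns", "bob", "Stop"),
    (501, "dsl-lns", "bob", "Stop"),
]

print(find_half_closed(events))
```

A list like this would at least tell the helpdesk which users are likely sitting in sync-no-conn before they call.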
So, in conclusion, I'm thinking this:
- the auth was accepted once, which allowed the session
- the accounting packets have/had operational relevance to both the ISP,
and the DSL providers
- the sync-no-conn went away once I had the DSL provider turn off acct to
my RADIUS servers, so the START/STOP packets evidently matter to DSL
- we have multiple realms, and have tested on almost all of them. Each
time a realm was removed from the DSL provider's config, and only allowed
via the ISP, things went back to normal
- with this type of setup, a network op may have unwittingly reset
numerous (hundreds of) users on the ISP LNS, not realizing that the
users would never reconnect (even though traditional experience says
that the users wouldn't notice a thing)
- this type of setup should be scrutinized a bit, because if this
RADIUS acct packet issue really is the cause of all of our recent
issues, I'm glad I have 1k DSL users, not 1M.
Does this RADIUS accounting packet 'keepalive' sound reasonable? *off
to print some RADIUS RFCs for review*
ps. A few people mentioned filtering out packets to RADIUS from the
unwanted sources. I was thinking about this a few days ago, but didn't
understand the operational impact. At the switch date, we had numerous
realms, and from what I have seen today, blocking RADIUS accounting
packets from the "DSL" provider may have disconnected ALL of our users.
This migration to having the intermediary ISP came **very** quickly.
Feedback/operational experience requested...