plea for comcast/sprint handoff debug help

Christopher Morrow morrowc.lists at gmail.com
Fri Nov 6 18:48:04 UTC 2020


On Fri, Nov 6, 2020 at 5:47 AM Randy Bush <randy at psg.com> wrote:
>
> > Admittedly someone (randy) injected a pretty pathological failure
> > mode into the system
>
> really?  could you be exact, please?  turning an optional protocol off
> is not a 'failure mode'.

I suppose it depends on how you think you are serving the data.
If you thought you were serving it on both protocols, but 'suddenly' the
RRDP location was empty, that would be a failure.

Same if your RRDP location's TLS certificate dies...
One of my points was that the software appeared to treat 'bad TLS cert'
(among other things, I'm sure) as a failure, but not 'empty directory'
(or no diff file). It's possible that 'no diff' is ALSO considered a
failure, but that swapping to the alternate transport after a few
failures was not implemented. (I don't know, I have not looked at that
part of the code, and I don't think alex/tim said either way.)

I don't think alex is wrong in stating that 'ideally the operator
monitors/alerts on health of their service'. I think it's shockingly
often that this isn't actually done, though (and it isn't germane in the
case of the test / research in question).

My suggestion is that checking the alternate transport is helpful.
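
To make that concrete, here is a rough sketch of the behavior I have in
mind. It is NOT taken from any relying-party software; every name in it
(FetchError, fetch_with_fallback, max_failures) is made up for
illustration, and whether an empty result should count as a failure at
all is a policy choice, assumed here only for the sake of the example.

```python
from typing import Callable, List, Tuple


class FetchError(Exception):
    """Hypothetical error a transport raises when it cannot fetch (e.g. bad TLS cert)."""


def fetch_with_fallback(
    primary: Callable[[], List[str]],
    alternate: Callable[[], List[str]],
    max_failures: int = 3,
) -> Tuple[str, List[str]]:
    """Try the primary transport up to max_failures times, then the alternate.

    Assumption: an empty result from the primary counts as a failure,
    the same as a raised FetchError.
    """
    failures = 0
    while failures < max_failures:
        try:
            objects = primary()
            if objects:
                return "primary", objects
        except FetchError:
            pass  # counted as a failure below
        failures += 1
    # The primary kept failing (or kept coming back empty): check the other transport.
    return "alternate", alternate()
```

Here primary would stand in for the RRDP fetch and alternate for the
rsync fetch; an operator who deliberately turned RRDP off would still be
reachable this way, just a few attempts later.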

-chris

