An appeal for more bandwidth to the Internet Archive

Denys Fedoryshchenko nuclearcat at nuclearcat.com
Wed May 13 08:43:20 UTC 2020


On 2020-05-13 11:00, Mark Delany wrote:
> On 13May20, Denys Fedoryshchenko allegedly wrote:
>> What about introducing some cache offloading, like CDN doing? (Google,
>> Facebook, Netflix, Akamai, etc)
> 
>> Maybe some opensource communities can help as well
> 
> Surely someone has already thought thru the idea of a community CDN?
> Perhaps along the lines of pool.ntp.org? What became of that
> discussion?
> 
> Maybe a TOR network could be repurposed to cover the same ground.
> 
> 
> Mark.
I believe Tor is not efficient at all for this purpose. Privacy has a
very high overhead.

Several schemes exist:
1) The ISP announces, in some way, the subnets it wants to be served
from its cache.
1.A) The Apple cache way: just an HTTP(S) request will point a specific
IP at the ISP cache. Not secure at all.
1.B) BGP + DNS, the most common way. The ISP peers with the CDN, and the
CDN returns the IPs of the ISP's cache nodes in response to DNS requests.
This means that, for example, content.archive.org will have the local
node's A/AAAA records (btw, where is IPv6 for archive?) for
customers of the ISP with this node, or for anybody who is peering with it.
Huge drawback: archive.org would need to provide TLS certificates for
web.archive.org to each local node; this is bad and probably a no-go.
Yes, I know some schemes exist where the certificate is not present on
the local node and some "precalculated" result is used instead, but that
is too complex.
1.C) BGP + HTTP redirect. If the ISP has peering with archive.org, users
in all announced subnets will get a 302 or some other HTTP redirect.
The next one is almost the same and much better, but will require small
modifications to the content engine or frontend balancers.
1.D) BGP + HTTP rewrite. If the ISP has peering with archive.org (same
as before), the URL is rewritten within the content,
e.g. 
http://web.archive.org/web/20200511193226/https://git.kernel.org/torvalds/t/linux-5.7-rc5.tar.gz 
will appear as
http://emu.st.node.archive.org/web/20200511193226/https://git.kernel.org/torvalds/t/linux-5.7-rc5.tar.gz
or
http://archive-org.proxy.emu.st/web/20200511193226/https://git.kernel.org/torvalds/t/linux-5.7-rc5.tar.gz
In the second option the ISP can handle the TLS certificate itself.
2) BGP announcement of archive.org subnets locally. Prone to leaks,
requires TLS certificates, etc.; no-go.
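
To make 1.C/1.D a bit more concrete, here is a minimal sketch of the
lookup the frontend would need: match the client IP against the
prefixes an ISP announced, then swap the host part of the URL. The
prefix table, the documentation prefixes (192.0.2.0/24, 2001:db8::/32)
and the function names are my assumptions for illustration only; the
hostnames are the ones from the example above. For 1.C the rewritten
URL would go into the Location header of the redirect, for 1.D into the
links inside the page.

```python
import ipaddress
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table: prefixes learned from ISP peering sessions, mapped
# to the hostname of that ISP's local cache node (assumed values).
ANNOUNCED_PREFIXES = {
    ipaddress.ip_network("192.0.2.0/24"): "emu.st.node.archive.org",
    ipaddress.ip_network("2001:db8::/32"): "archive-org.proxy.emu.st",
}


def local_node_for(client_ip):
    """Return the cache hostname for the longest matching prefix, or None."""
    addr = ipaddress.ip_address(client_ip)
    matches = [
        net for net in ANNOUNCED_PREFIXES
        if net.version == addr.version and addr in net
    ]
    if not matches:
        return None
    # Prefer the most specific announcement, as BGP best-path would.
    best = max(matches, key=lambda net: net.prefixlen)
    return ANNOUNCED_PREFIXES[best]


def rewrite_url(url, client_ip):
    """Replace the host of `url` with the client's local node, if any."""
    node = local_node_for(client_ip)
    if node is None:
        return url  # no local node: serve from the default origin
    return urlunsplit(urlsplit(url)._replace(netloc=node))
```

Clients outside every announced prefix get the URL back unchanged, so
the default path keeps working if an ISP withdraws its announcement.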

You can still modify some of these schemes, or come up with other
options that no one has implemented yet.
For example, do everything through JavaScript (CDNs cannot afford this
because of the way they work): the website generates content links
dynamically, and for that the client requests some /config.json file
(which is dynamically generated and cached for a while). To IPs that
have a local node we give the URL of the local node; to the rest, the
default URL.
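
A rough sketch of how the server side of such a /config.json could
look, assuming the same kind of prefix table as in the BGP schemes. The
JSON field names, the cache lifetime and the documentation prefix are
hypothetical, picked only to show the idea; the page's JavaScript would
read the base URL from this file and build content links from it.

```python
import ipaddress
import json

# Assumed default origin for clients without a local node.
DEFAULT_BASE_URL = "https://web.archive.org"

# Hypothetical mapping of ISP prefixes to the base URL of a local node.
LOCAL_NODES = {
    ipaddress.ip_network("192.0.2.0/24"): "https://archive-org.proxy.emu.st",
}


def config_for(client_ip, max_age_seconds=300):
    """Build the body of /config.json for one client IP."""
    addr = ipaddress.ip_address(client_ip)
    base_url = DEFAULT_BASE_URL
    for net, node_url in LOCAL_NODES.items():
        if net.version == addr.version and addr in net:
            base_url = node_url
            break
    # max_age_seconds tells the client how long it may cache this answer.
    return json.dumps(
        {"content_base_url": base_url, "max_age_seconds": max_age_seconds}
    )
```

Because the answer depends on the client IP, any cache in front of it
would have to be keyed per prefix rather than shared globally.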





More information about the NANOG mailing list