Joe St Sauver
JOE at OREGON.UOREGON.EDU
Wed May 7 01:48:51 UTC 2003
# Howdy, if this is off-topic I certainly apologize; however, I
#believe that running an NNTP server is usually part of a 'network
#operations' sphere of influence.
Dunno about that, but I'll chime in with a couple of ideas, just because the
volumes of NNTP traffic involved have gotten to the point where the traffic
alone is probably operationally interesting, everything else aside.
#I have a few basic questions. Does anyone
#know off hand how much disk is needed for a fairly respectable NNTP server
#for a full feed?
Daily Usenet volumes are extremely sensitive to decisions with respect to
carrying (or not carrying) even single groups. See, for example,
http://www.newsadmin.com/top100bytes.htm which shows the top half dozen
groups (by bytes posted) running daily volumes of:
     Binary Newsgroup                  Bytes           % Total
  1  alt.binaries.dvdr                 30,304,095,023   5.893
  2  alt.binaries.cd.image.xbox        25,796,723,944   5.017
  3  alt.binaries.dvd                  19,428,583,576   3.778
  4  alt.binaries.multimedia           17,783,671,185   3.459
  5  alt.binaries.cd.image.games       15,303,064,035   2.976
  6  alt.binaries.svcd                 14,780,524,967   2.874
[commas added to byte counts for improved legibility] Yes, carrying or not
carrying a single group can have a 30GB/day impact.
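As a sanity check, the per-group byte counts and percentages in the table
above all imply roughly the same total daily feed volume, so the figures are
internally consistent. A quick Python sketch (numbers copied straight from
the table; nothing here is measured independently):

```python
# Each group's daily bytes divided by its share of the total should give
# (approximately) the same total daily feed volume.
top_groups = {
    "alt.binaries.dvdr": (30_304_095_023, 5.893),
    "alt.binaries.cd.image.xbox": (25_796_723_944, 5.017),
    "alt.binaries.dvd": (19_428_583_576, 3.778),
}

for group, (daily_bytes, pct_of_total) in top_groups.items():
    implied_total_gb = daily_bytes / (pct_of_total / 100) / 1e9
    print(f"{group}: implies ~{implied_total_gb:.0f} GB/day total")
```

All three work out to roughly 514 GB/day, which squares with the peak
figures mentioned below.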
Yes, daily traffic for a fullish feed *has* peaked in excess of 600GB and
3 million articles/day. If you want to carry "everything," you could
multiply ~0.6TB/day times the number of days' retention you want to keep;
however, note that over time your retention will drift downward as volumes
continue to increase.
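Put concretely, the retention arithmetic is just division. A minimal sketch
(the 0.6 TB/day figure is from above; the spool sizes are hypothetical
examples, not recommendations):

```python
DAILY_FEED_TB = 0.6  # approximate peak full-feed volume cited above

def retention_days(spool_tb: float, daily_tb: float = DAILY_FEED_TB) -> float:
    """Approximate days of retention a spool of the given size buys."""
    return spool_tb / daily_tb

for spool_tb in (0.2, 2.0, 6.0):  # hypothetical spool sizes in TB
    print(f"{spool_tb:4.1f} TB spool -> ~{retention_days(spool_tb):.1f} days retention")
```

Note that 0.2 TB (roughly the original poster's current spool) buys well
under a day of full-feed retention at these volumes.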
Also note that this is just raw article storage space, and does not
include space for article overview data, history files, etc.
Daily Usenet volumes (in bytes) are also exceptionally sensitive to maximum
article size, with the 80/20 rule roughly holding for byte traffic and article
count (e.g., 80% of the articles by article count will require just 20% of
the transfer bandwidth). If your goal is to live within a given bandwidth
budget, or to efficiently utilize a disk array of a particular size, you can
readily adjust your total article payload/day by dialing down the maximum
article size you elect to accept.
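To see how strongly a size cap trims byte volume, here's an illustrative
sketch. The log-normal article-size distribution below is purely an
assumption for demonstration (real spool statistics will differ), but it
captures the shape the 80/20 observation implies: lots of small text posts
plus a long tail of large binaries.

```python
import random

random.seed(7)

# Hypothetical article-size population (bytes): illustrative only, NOT
# measured Usenet data. Median ~8 KB with a heavy tail of large binaries.
sizes = [int(random.lognormvariate(mu=9, sigma=2.5)) for _ in range(100_000)]

def bytes_accepted(sizes, max_article_size):
    """Total payload kept if articles above max_article_size are rejected."""
    return sum(s for s in sizes if s <= max_article_size)

total = sum(sizes)
for cap in (64_000, 256_000, 1_000_000):
    kept = bytes_accepted(sizes, cap)
    print(f"cap {cap:>9,} bytes: keeps {kept / total:5.1%} of byte volume")
```

The point isn't the exact percentages (those depend entirely on the assumed
distribution), but that a modest cap rejects a small fraction of articles
while cutting byte volume dramatically.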
In case you doubt these volume stats, several sites make their daily
traffic summaries publicly accessible.
I would note that most "full" feeds today really *AREN'T* full, however.
#Also is IDE still too slow/unreliable for this type of
#operation? I know back when we got our current server IDE was very slow;
#it has sped up a bit since then.
Choice of file system and storage methodology can be as critical as, or more
critical than, whether you're using IDE or SCSI. The days when traditional
article-per-file spools in UFS file systems would work are definitely
gone -- cyclic news file systems on top of ReiserFS are a popular recipe.
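To illustrate why cyclic spools sidestep the file-per-article problem:
articles are appended into a fixed-size buffer and the oldest data is simply
overwritten, so there's no per-article create/unlink churn in the file
system. A toy Python model of the idea (this is NOT INN's actual CNFS
on-disk format, which is a raw ring buffer, just the eviction behavior):

```python
from collections import deque

class CyclicSpool:
    """Toy model of a cyclic news spool: fixed byte capacity, with the
    oldest articles evicted automatically as new ones arrive."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.articles = deque()  # (message_id, size) in arrival order

    def store(self, message_id: str, size: int) -> None:
        # Evict oldest articles until the new one fits.
        while self.used + size > self.capacity and self.articles:
            _, old_size = self.articles.popleft()
            self.used -= old_size
        self.articles.append((message_id, size))
        self.used += size

spool = CyclicSpool(capacity_bytes=1000)
for i in range(10):
    spool.store(f"<msg{i}@example>", 300)
print(len(spool.articles), spool.used)  # prints: 3 900
```

Retention in such a spool is whatever the buffer size buys you at current
volumes, which is exactly why the sizing arithmetic above matters.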
#The reason I am asking is that it has come time for the old NNTP server
#to be buried somewhere in the mountains and for me to procure a new one.
#Currently we are running a P3 600 w/ about 200 GB of storage on Solaris
#and Typhoon; the reason we are replacing this server is its poor
#performance and abhorrent retention.
If you're planning to work with a full feed, you won't regret getting as much
CPU, memory, disk, and network connectivity as you can afford. I don't want
to get into hardware/OS/server religious wars, so I'll skip any specifics
here (although feel free to contact me offlist if you're interested in talking
about some starting points for hardware options that seem to work okay).
Joe St Sauver (joe at oregon.uoregon.edu)
University of Oregon Computing Center