Scalable Mail solution with NAS
Sebastien Berube
sberube at zeroknowledge.com
Wed Jan 31 19:49:12 UTC 2001
On Wed, 31 Jan 2001, Matthew Zito wrote:
> On Wed, 31 Jan 2001, Eric Sobocinski wrote:
> > At 11:06 AM -0500, 01/31/2001, Sebastien Berube wrote:
> > >One way to fix this
> > >issue would be to use a hashing scheme to split the amount of actual
> > >mailboxes into a subdirectory structure. You could get something like
> > >
> > >johndoe at yourdomain.com would have his mailbox in
> > >
> > >/export/mailboxes/j/o/h/n/johndoe.mbox
> > >
> > >so in /export/mailboxes, in order to find the j directory, you only have
> > >about 36 directory entries or so.
> > >
> > >Although this example is not good in the case where you accept usernames
> > >with 3 or fewer characters.
> >
> > It's not hard to right-pad any short usernames before hashing. For
> > instance, the username "bo" might hash as "bo__" and thus would end up in
> > the directory "/export/mailboxes/b/o/_/_/bo.mbox". If you allow
> > non-alphanumerics you'll want to translate those to something innocuous as
> > well, or a name such as "bo.lee" will cause problems.
>
> Well, hashing like that works well from the standpoint that it's very easy
> for the software to find the mailbox. It's going to make things like backups
> very costly, though, because of all the recursive directories. Also, you're
> going to end up with some directories very imbalanced, since there are more
> frequently occurring names.
To remedy this rather easily, you can run the username through a hashing
function and use the first 'n' characters of the hash to figure out which
directory the mail(box|dir) is in. That also prevents problems with
non-alphanumeric characters such as ".".
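
As a rough sketch of what I mean (Python; the choice of MD5, a depth of 2,
the function name mailbox_path and the /export/mailboxes root are all just
assumptions for illustration, not a recommendation):

```python
import hashlib
import posixpath


def mailbox_path(username, root="/export/mailboxes", depth=2):
    """Return a mailbox path bucketed by the first `depth` hex digits of a hash.

    MD5 is used here only to spread names evenly, not for security.
    """
    digest = hashlib.md5(username.encode("utf-8")).hexdigest()
    # Each hex digit becomes one directory level, so no level of the
    # bucket tree ever holds more than 16 subdirectories.
    levels = list(digest[:depth])
    return posixpath.join(root, *levels, username + ".mbox")
```

Short names like "bo" and names containing "." need no special handling,
since only the digest is used for the directory levels.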
>
> If you're going to use NFS, you probably want to use something like maildir
> format, which is NFS-safe but becomes very costly as the number of messages
> increases. A lot of that has to do with the performance of the remote NFS
> server - the underlying filesystem's performance in reading large directories
> will make a BIG difference as far as that goes. Netapps have excellent
> large-directory performance, fwiw.
>
> If you're looking for large scalability AND high performance, my preferred
> solution would be to have a relational database as the backend, but don't
> store any messages in it - simply pointers to their location on disk. Then
> store the messages without regard to intended username in a hashed directory
> structure. The pop3 server then gets the list of new messages from the
> database server, which could just be a list of filenames. Then, the pop3
> server simply has to open the message to return it - it doesn't have to do an
> opendir(). Also, if you use the filename as the UIDL returned, there's no
> need to even stat() the file, again saving you a whole nfs call. The
> obvious downside is that you can't do a :
>
> rm -f /users/j/o/h/n/johndoe.mbx
>
> But, with 200k mailboxes, you should have an automated way to do that anyway.
It also makes backups a nightmare. In that case, you'll have to shut down
the entire mail system before you can back up, or you'll have a database
image which won't represent the actual data you have on your NAS.
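
For reference, here is roughly how I read the pointer-table design, as a
minimal sketch only. SQLite stands in for the relational database, and the
table layout, the function names (open_index, deliver, list_uidls,
retrieve) and the use of a random UUID as the filename are all my own
assumptions:

```python
import hashlib
import os
import sqlite3
import uuid


def open_index(db_path=":memory:"):
    """Open the index database that holds only pointers, never message bodies."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "mailbox TEXT NOT NULL, filename TEXT PRIMARY KEY)"
    )
    return db


def message_path(spool_root, filename, depth=2):
    """Locate a message by hashing its filename, independent of the username."""
    digest = hashlib.md5(filename.encode("ascii")).hexdigest()
    return os.path.join(spool_root, *digest[:depth], filename)


def deliver(db, spool_root, mailbox, raw_message):
    """Write the message to the hashed spool, then record a pointer to it."""
    filename = uuid.uuid4().hex
    path = message_path(spool_root, filename)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(raw_message)
    # Insert the pointer only after the file has been written.
    with db:
        db.execute(
            "INSERT INTO messages (mailbox, filename) VALUES (?, ?)",
            (mailbox, filename),
        )
    return filename


def list_uidls(db, mailbox):
    """Return the filenames for a mailbox; they double as POP3 UIDLs."""
    rows = db.execute(
        "SELECT filename FROM messages WHERE mailbox = ? ORDER BY rowid",
        (mailbox,),
    )
    return [row[0] for row in rows]


def retrieve(spool_root, filename):
    """Open the message directly by name, with no opendir() or stat() first."""
    with open(message_path(spool_root, filename), "rb") as f:
        return f.read()
```

Writing the file before inserting the row means a crash leaves an orphan
file rather than a dangling pointer, but it does nothing for the backup
consistency problem: the database and the NAS still have to be captured at
the same point in time.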
>
> Thanks,
> Matt
>
>
--
Sebastien Berube
Operation Center Systems Administrator
sberube at zeroknowledge.com
In Gary we trust.