Scalable Mail solution with NAS

Wed Jan 31 19:12:54 UTC 2001

On Wed, 31 Jan 2001, Eric Sobocinski wrote:
> At 11:06 AM -0500, 01/31/2001, Sebastien Berube wrote:
> >One way to fix this
> >issue would be to use a hashing scheme to split the amount of actual
> >mailboxes into a subdirectory structure.  You could get something like
> >
> >johndoe at yourdomain.com would have his mailbox in
> >
> >/export/mailboxes/j/o/h/n/johndoe.mbox
> >
> >so in /export/mailboxes, in order to find the j directory, you only have
> >about 36 directory entries or so.
> >
> >This example breaks down, though, if you accept usernames of three or
> >fewer characters.
>
> It's not hard to right-pad any short usernames before hashing.  For
> instance, the username "bo" might hash as "bo__" and thus would end up in
> the directory "/export/mailboxes/b/o/_/_/bo.mbox".  If you allow
> non-alphanumerics you'll want to translate those to something innocuous as
> well, or a name such as "bo.lee" will cause problems.
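A minimal Python sketch of the letter-per-level scheme quoted above, including the right-padding and the character translation. The four-level depth, the `_` pad character, lowercasing, and the function name `mailbox_path` are illustrative assumptions, not part of any particular MTA or POP3 server.

```python
import re
from pathlib import PurePosixPath

MAILBOX_ROOT = "/export/mailboxes"  # root directory from the example above
DEPTH = 4                           # hypothetical: one level per leading character
PAD = "_"                           # filler for short names and odd characters

def mailbox_path(username: str, root: str = MAILBOX_ROOT, depth: int = DEPTH) -> str:
    """Map a username to root/c1/c2/.../username.mbox."""
    name = username.lower()
    if not name or "/" in name or "\0" in name:
        raise ValueError(f"unusable username: {username!r}")
    # Replace anything outside [a-z0-9] so names like "bo.lee" can never
    # yield "." or ".." as a directory component.
    safe = re.sub(r"[^a-z0-9]", PAD, name)
    # Right-pad short names so every mailbox sits at the same depth.
    key = safe.ljust(depth, PAD)[:depth]
    return str(PurePosixPath(root, *key, name + ".mbox"))
```

Under these assumptions `johndoe` maps to `/export/mailboxes/j/o/h/n/johndoe.mbox` and `bo` to `/export/mailboxes/b/o/_/_/bo.mbox`, matching the examples in the thread, while `bo.lee` lands in `/export/mailboxes/b/o/_/l/`.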

Well, hashing like that works well in the sense that it's very easy for the
software to find the mailbox. It's going to make things like backups very
costly, though, because of all the deeply nested directories. Also, some
directories will end up badly imbalanced, since some names (and name
prefixes) occur far more often than others.
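One common way to address the imbalance (a swapped-in technique, not something proposed in this thread) is to name the directory levels after a digest of the username instead of its letters, so that common prefixes no longer pile into one branch. The sketch below uses MD5 purely as a spreading function, not for security; the two-level, two-hex-digit layout and the helper names are assumptions. It does nothing about the backup cost of a deep tree.

```python
import hashlib
from collections import Counter
from typing import Callable, Iterable

def digest_path(username: str, root: str = "/export/mailboxes", levels: int = 2) -> str:
    """Place a mailbox under directories named by hex digits of an MD5 digest.

    Each level uses two hex digits (256 subdirectories), so two levels give
    65,536 leaf directories whose occupancy does not depend on name prefixes.
    """
    name = username.lower()
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()  # spreading only, not security
    parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return "/".join([root, *parts, f"{name}.mbox"])

def bucket_counts(usernames: Iterable[str], key: Callable[[str], str]) -> Counter:
    """Count how many users fall into each top-level bucket under `key`."""
    return Counter(key(u) for u in usernames)

def letter_bucket(username: str) -> str:
    # First-letter bucket, as in the /j/o/h/n/ scheme.
    return username.lower()[0]

def digest_bucket(username: str) -> str:
    # First directory level under the digest scheme (assumes the default root).
    return digest_path(username).split("/")[3]
```

For example, `bucket_counts(["john", "johnny", "jon", "mary"], letter_bucket)` puts three of the four users under `j`; running the same list through `digest_bucket` lets you compare the spread on a real user list. The trade-off is that an administrator can no longer derive a path by eye.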

If you're going to use NFS, you probably want to use something like maildir
format, which is NFS-safe but becomes very costly as the number of messages
increases. A lot of that has to do with the performance of the remote NFS
server: the underlying filesystem's performance in reading large directories
will make a BIG difference there. NetApp filers have excellent
large-directory performance, FWIW.
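To show why maildir tolerates NFS, here is a minimal Python sketch of the delivery step: write the message to a uniquely named file in `tmp/`, flush it to disk, then move it into `new/`. Because no two deliveries share a filename, no locking is needed. Bernstein's original description uses `link()` and `unlink()` for the final step; this sketch uses a single `rename()`, a common variant. The naming scheme is a simplified assumption loosely modeled on maildir's time.unique.host convention; for real use, Python's standard `mailbox.Maildir` class already implements the format.

```python
import itertools
import os
import socket
import tempfile
import time

_seq = itertools.count()  # per-process counter to keep names unique

def _unique_name() -> str:
    # Simplified time.unique.host name; "/" and ":" in the hostname are
    # escaped as octal, a common maildir convention.
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    return f"{int(time.time())}.P{os.getpid()}Q{next(_seq)}.{host}"

def deliver(maildir: str, message: bytes) -> str:
    """Deliver one message into maildir/new via maildir/tmp; return its path."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    name = _unique_name()
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)
    with open(tmp_path, "xb") as f:  # "x": fail rather than clobber an existing file
        f.write(message)
        f.flush()
        os.fsync(f.fileno())  # data must be on disk before it becomes visible
    os.rename(tmp_path, new_path)  # message appears in new/ in one step
    return new_path

def list_new(maildir: str) -> list:
    # A reader must scan the directory; this is the step whose cost grows
    # with the number of messages, especially over NFS.
    return sorted(os.listdir(os.path.join(maildir, "new")))

if __name__ == "__main__":
    box = os.path.join(tempfile.mkdtemp(), "Maildir")
    deliver(box, b"Subject: test\r\n\r\nhello\r\n")
    print(list_new(box))
```

Delivery is cheap and lock-free; the expense the post describes sits in `list_new`, which every POP3 session has to perform.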

If you're looking for large scalability AND high performance, my preferred
solution would be to use a relational database as the backend, but not to
store any messages in it, only pointers to their location on disk. Then
store the messages in a hashed directory structure, without regard to the
intended username. The POP3 server then gets the list of new messages from
the database server, which could just be a list of filenames. The POP3
server then simply has to open each message to return it; it never has to
do an opendir(). Also, if you use the filename as the returned UIDL, there's
no need even to stat() the file, again saving you a whole NFS call. The
obvious downside is that you can't simply do:

rm -f /users/j/o/h/n/johndoe.mbx

But with 200k mailboxes, you should have an automated way to do that anyway.
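A minimal sketch of the pointer-database design, with Python's built-in `sqlite3` standing in for the relational backend (a production system would use a shared database server). The table layout, the `MessageStore` class, and the random `uuid4` filenames are assumptions for illustration. Messages land in a tree hashed on the filename rather than the username; the POP3 side gets filenames, usable directly as UIDLs, from one query and opens files by path with no opendir() or stat(). Storing each message's size in the table, so that LIST needs no stat() either, is an addition of this sketch. `delete_user` is the kind of automated cleanup that replaces the simple `rm`.

```python
import os
import sqlite3
import tempfile
import uuid

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    filename TEXT PRIMARY KEY,  -- also returned as the POP3 UIDL
    username TEXT NOT NULL,
    size     INTEGER NOT NULL   -- kept here so LIST needs no stat()
);
CREATE INDEX IF NOT EXISTS messages_user ON messages(username);
"""

class MessageStore:
    """Hypothetical store: message bodies on disk, pointers in the database."""

    def __init__(self, db_path: str, spool_root: str):
        self.db = sqlite3.connect(db_path)
        self.db.executescript(SCHEMA)
        self.root = spool_root

    def _path(self, filename: str) -> str:
        # Directory levels come from the random filename, not the username.
        return os.path.join(self.root, filename[0:2], filename[2:4], filename)

    def deliver(self, username: str, message: bytes) -> str:
        filename = uuid.uuid4().hex  # 32 hex characters
        path = self._path(filename)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "xb") as f:
            f.write(message)
        with self.db:  # commits the insert
            self.db.execute(
                "INSERT INTO messages (filename, username, size) VALUES (?, ?, ?)",
                (filename, username, len(message)))
        return filename

    def uidl(self, username: str) -> list:
        # One query answers UIDL/LIST for the whole mailbox: [(uidl, size), ...]
        cur = self.db.execute(
            "SELECT filename, size FROM messages WHERE username = ? ORDER BY rowid",
            (username,))
        return cur.fetchall()

    def retrieve(self, filename: str) -> bytes:
        # RETR: open the file directly by its known path.
        with open(self._path(filename), "rb") as f:
            return f.read()

    def delete_user(self, username: str) -> int:
        """Remove every message file for a user, then the pointers."""
        names = [name for name, _ in self.uidl(username)]
        for name in names:
            try:
                os.remove(self._path(name))
            except FileNotFoundError:
                pass  # tolerate a previous, interrupted cleanup
        with self.db:
            self.db.execute("DELETE FROM messages WHERE username = ?", (username,))
        return len(names)

if __name__ == "__main__":
    store = MessageStore(":memory:", tempfile.mkdtemp())
    store.deliver("johndoe", b"hello")
    print(store.uidl("johndoe"))
```

A 32-character hex filename fits POP3's UIDL rules (RFC 1939 allows one to 70 characters in the range 0x21 to 0x7E), so it can be returned as-is.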

Thanks,
Matt

-- 
Matthew J. Zito
Systems Engineer
Register.com, Inc., 11th Floor, 575 8th Avenue, New York, NY 10018
Ph: 212-798-9205
PGP Key Fingerprint: 4E AC E1 0B BE DD 7D BC  D2 06 B2 B0 BF 55 68 99