Paul A Vixie
paul at vix.com
Tue Apr 23 04:38:19 UTC 1996
> The thing I wonder now is that, if we have essentially unlimited
> scalability of the DNS both technically and administratively, should
> ISPs even _care_ about what names people choose to use?
People usually run out of brain cells before computers run out of memory.
So it is in this case; recall the longish message I sent to the IETF list
last year when .COM was getting a lot of air time, wherein I said:
>Who can tell the difference between ACTIVELIFE.COM and ACTIVELIFESTYLE.COM?
>Why is there an INTERNETMCI.COM and an MCI.COM? Will I ever get a piece of
>mail from TAMPAX.COM? Are AFTERMIDNIGHT.COM and AFTERMIDNITE.COM the same
>company? What about AFTERMARKET.COM and AFTMKT.COM?
(I'll include the full text at the end for those who missed it the first time.)
> Moreover, if time needs buying for your development wizard(s), how
> much do you need and what should we be telling people in the interim?
This battle has essentially been lost. .COM will not come back to usefulness
no matter what we do. The YMBK proposal from the second NSF workshop will
almost certainly be the way of the future, and we will see a lot more TLD's
and hopefully no one of them will ever be as ugly as .COM is now.
What ISP's can do is stop registering trash domains. Tell your users to put
their WWW pages in a domain park of some kind, rather than allocating a TLD
for every one-person "company" whose scope of operations is a local city or
Anyway, here's the full text of my IETF article from last year:
From: paul at vix.com (Paul A Vixie)
Subject: Re: draft-isoc-dns-role-00.txt
Date: 25 Nov 1995 01:10:29 -0800
Organization: Vixie Enterprises
Sender: daemon at vix.com
Message-ID: <9511250849.AA11703 at wisdom.home.vix.com>
X-Received: by gw.home.vix.com id AA23854; Sat, 25 Nov 95 01:10:26 -0800
X-Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa07057;
25 Nov 95 3:49 EST
X-Received: from [220.127.116.11] by IETF.CNRI.Reston.VA.US id aa07001;
25 Nov 95 3:49 EST
X-Received: from gw.home.vix.com by CNRI.Reston.VA.US id aa04311;
25 Nov 95 3:49 EST
X-Received: by gw.home.vix.com id AA22525; Sat, 25 Nov 95 00:49:00 -0800
X-Btw: vix.com is also gw.home.vix.com and vixie.sf.ca.us
X-Received: by wisdom.home.vix.com id AA11703; Sat, 25 Nov 1995 00:49:00 -0800
X-To: ietf at CNRI.Reston.VA.US
X-In-Reply-To: Your message of "Fri, 24 Nov 1995 09:08:00 PST."
<m0tJ1cJ-00050gC at roam.psg.com>
I promised that I'd answer Randy's question before I dropped out of this
discussion. A few more messages have come in that make a good backdrop,
so I'll answer in a batch.
> > .COM was full a year ago.
> Uh, could you point me to the meter?
Bill Manning says that the bigz mailing list answered this question and I
hope that any interested parties will follow up the reference he gave. I've
my own metrics and therefore my own thresholds, which I'll try to explain.
Giving an entity or object a name is better than not giving it one, since
the name is useful in quite a lot more circumstances than the object itself
would be. Without being able to call Randy "Randy" I would be limited to
attribute-based specifications ("That guy up in the Pacific Northwest who
runs part of RAINnet and is the DNSIND chairman" is quite a mouthful, eh?)
or pointing and grunting (which is impossible except when Randy and I are
in the same place at the same time, like an IETF meeting or some such.)
If we must assign names, how then shall we recognize a good one as being
better than a bad or mediocre one? We could argue from aesthetics and say
simply that Randy is a "prettier" name than Mxtlplk (though readers of classic
Superman comics will probably come forward to debate me on this point);
however, aesthetics are usually held and practiced subjectively, and I
despair of setting forth an objective system of aesthetics in this crowd.
We could argue instead from utility but we would have to discard "Randy"
since I often become confused as to which "Randy" I'm hearing about. With
human objects, there are usually additional qualifiers that can be used when
and only when ambiguity would otherwise result -- thus "Randy Bush" vs.
In DNS this "occasional qualification" isn't possible since the various
members of the class whose name would make a good second level name don't
want to share a domain and use subdomains. People want their own domain
names "so they can be as good as everybody else who has their own domain."
It's a problem in sociology and it's one we're not equipped to deal with,
partly because we are mostly not sociologists here and partly because the
die is cast and the culture set: we will not get folks to go along with
longer domain names unless we leave them no option. (I call this "domain
envy" and it's inversely proportional to the length of one's, um, thing.)
A name is good if it has uniqueness in the context where it is used, and
a name is better if it also conveys some information about the object being
named. DNS names almost always map to real world objects and those real
world objects almost always had their own names before the domain was
created. The ability to encode the real world name into the domain name
is seen as "good" because it allows the DNS name to convey some information
about the object being named. But herein lies the rub: real world objects
whose names are identical to other real world object names and who experience
no collisions in the real world due to being in different industries or
different locations or both, have to fight it out for the right to own the
DNS name that maps the "collision name." Is PAULS.COM "Paul's Cafe" or is
it "Paul's Auto Repair"? Surely we don't want PAULSAUTOREPAIR.COM but that's
where things are headed -- which is doubly sad since there are probably 75
auto repair businesses named "Paul's Auto Repair" but since they are in
different cities nobody has ever cared before.
The relationship between real world names and DNS names is crucial to the
understanding of why .COM is full. DNS names are not "proper nouns", except
in cases like HOME.NET where the company name actually is a domain name. A
DNS name is a second-level "handle" on a real world object's name. That is,
a DNS name describes not a real world object, but rather the name of a real
world object -- a name the real world object had before the DNS name was
I'll come back to this in a minute. Right now let's all take a breather and
take in some humour from Andrew:
> > continuing to find meaningful names in .COM is increasingly impossible.
> No kidding.
> Some real domains follow...
The reason these names are so ugly is that there are too many trademark
attorneys in the world and they have created demand for their own services
by hiring out to companies who then sue each other over things which don't
matter to anyone but the trademark attorneys themselves. It turns out that
if you use your trademarked company or product name with incorrect punctuation
then it weakens your hold on the name. Punctuation creates what the trademark
weenies call "distinction" and you want to be very careful not to have any
distinction among your uses of your own marks. Gag me. There oughta be a law.
> > many domain names have zero people using them.
> Many, if not most, corporations are shells, too. There still aren't
> even one for every 20 U.S. inhabitants, and we are one of the most
> densely packed nests of pro forma incorporations in the world. My gut
> is that the experience is going to be identical -- that is, that most
> domain names will be dummies, but that most people won't have domains.
Let's use a smaller number since my point is as valid with 20,000,000 names
as it is with 250,000,000. I think the 250,000,000 number is reasonable since
we have so many companies registering tripe like BATMANFOREVER.COM or
MOPAR.COM or whatever. But we've gotten sidetracked arguing about the total
number and that wasn't my intention at all.
When there are 20,000,000 or even 2,000,000 names in .COM, then statistically
speaking none of them would have any "name value," a term I'll come back to.
> You forget that people like us getting personal domains are
> ultra-eccentric hacker types who actually know how to run and use
> them. I also happen to have a bunch of corporate personae. In both
> respects I'm an oddity. Most people living in this country or
> worldwide aren't going to do that sort of thing because it's a pain and
> they won't care, just as most people don't get vanity license plates.
My experience differs significantly. PIERMONT.COM (and PSG.COM and VIX.COM)
all have some A RR's in them. Not so the average. Having a personal vanity
domain with just an MX RR pointing at a service provider is now _far_ more
common in my personal experience than technogeeks with a handful of hosts.
(Last week I refused to do business with someone simply because they were
polluting .COM with a vanity name that didn't need to be there -- and if
anyone with a stronger stomach would like a consulting lead, ask me for it.)
> > many .COM domain names are not of incorporated entities,
> > or even commercial entities. .COM is a cess pool and the
> > sewage runneth over.
> Again, I don't see it as a huge problem at the moment.
We may have to agree to disagree about this, but maybe not. You sent the
above before Andrew sent his "ugly name" list, and it may be that the numbers
below will help to change your mind. (I'm a little aghast, as it's hard to
imagine anyone looking at .COM and thinking there's no problem, but this won't
be the first time Perry and I will have seen things differently.)
> > .COM was full a year ago. And the 100,000 ugly names we have now are only
> > a molecule in a bucket compared to the 250,000,000 names we'll need by 2005
> Great! Someone willing to tell the rest of us the correct
> way to tell when a zone is "full". :-)
Yes. I hope that :-) doesn't indicate that your question was rhetorical,
since I don't think it is at all.
> I think that we can agree that from a technical perspective,
> it's tough to "fill" a zone. After all, labels is just labels.
> It's when we attach semantic meaning that things get dicey.
Doubly right. The trick to attaining "name goodness" in the form of "unique
within the context where it is used" is designing your context properly.
Having everything live in .COM (the other top levels are too small to be
mathematically significant) is a sloppy context, and that's why we have so
little uniqueness within that context.
"Name value" can work two ways. If you can often (more than half the time)
deduce an object's name by knowing something about the object, you're winning.
When the Internet was small this was easy. You took a company's name (or its
initials if the name was really long) and added ".COM" to it and if you didn't
win, it meant the company wasn't on the Internet at all. Thus "DEC.COM" or
"APPLE.COM." Occasionally you'd get a false positive, like the poor sods who
looked for Apple Records (UK) on the net some years back and got instead some
computer company in Cupertino. But back in what I longingly think of as the
good old days, false negatives and false positives made up an insignificant
portion of the total results of "domain name guessing." These days the
false results of guessing (positive and negative combined) are gaining on the
other result categories (true negatives, true positives) and will soon be
about even with them. This means names aren't as guessable as they used to
be and soon won't be guessable at all.
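
The four outcomes of "domain name guessing" described above can be sketched in a few lines of Python. This is only an illustration: the registry is a toy dictionary built from names mentioned in this message, the `classify_guess` function and the "asacomp" and "pauls" guesses are hypothetical, and treating a guess that lands on somebody else's domain as a false positive is an assumption about how the terms are meant.

```python
# Toy registry: domain -> owner. Entries echo examples from this message;
# this is illustrative data, not a real registry dump.
REGISTRY = {
    "dec.com": "Digital Equipment Corporation",
    "apple.com": "Apple Computer",
    "asa.com": "American Sailing Association",
    "asacomputers.com": "ASA Computers",
}


def classify_guess(target, short_name):
    """Classify the guess '<short_name>.com' for the organization `target`."""
    guess = short_name.lower() + ".com"
    owner = REGISTRY.get(guess)
    online = target in REGISTRY.values()
    if owner == target:
        return "true positive"    # guess resolves to the intended owner
    if owner is not None:
        return "false positive"   # guess resolves, but to someone else
    if online:
        return "false negative"   # target is online under another name
    return "true negative"        # target is not online and the guess fails


for target, short in [
    ("Digital Equipment Corporation", "DEC"),
    ("Apple Records", "apple"),
    ("ASA Computers", "asacomp"),   # hypothetical guess
    ("Paul's Cafe", "pauls"),       # hypothetical: not registered here
]:
    print(target, "->", classify_guess(target, short))
```

Counting these labels over a large sample of guesses would give the proportions the paragraph talks about.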
The other way "name value" can work is that if you can look at an object's
name and deduce something about the object itself, you're winning. Again,
things aren't as smooth as they used to be. Very few Internet folks have
ever looked at APPLE.COM and thought of the record company in the UK -- but
when most folks see EXAMINER.COM they think of the newspaper called "Examiner"
in their own city, not the one called "Examiner" in San Francisco. And when
I see "ASA.COM" I think of my computer hardware supplier (whose domain name
is in fact ASACOMPUTERS.COM since ASA.COM was taken), not the American Sailing
Association in Marina del Rey (who really ought to be in .ORG, anyway.)
"Name value" in the New Internet (sort of like New Coke?) means camping onto
a name that folks are likely to guess and hoping to get some business from
those guesses. A "good name" in this scheme is one that users will associate
with your product and which your competitors will wish they'd guessed first.
800-555-1212.COM comes to mind. Also 801-, 802-, etc all through 888-. Ick.
I guess I don't want to talk about MICROS0FT.COM other than to mention it.
Who can tell the difference between ACTIVELIFE.COM and ACTIVELIFESTYLE.COM?
Why is there an INTERNETMCI.COM and an MCI.COM? Will I ever get a piece of
mail from TAMPAX.COM? Are AFTERMIDNIGHT.COM and AFTERMIDNITE.COM the same
company? What about AFTERMARKET.COM and AFTMKT.COM?
.COM is being treated as "the stone tablets of the Internet" and it is being
used (with suboptimal results) as a directory service. Whois++ is a directory
service. God help us all, even whois and finger are directory services. And
+1 xxx 555 1212 (or just "411" locally) on your telephone is a directory
service. But DNS? DNS is _not_ a directory service, never was one, never
will be one. The essence of my proposal (I've sent the URL around several
times for those who want to see the PostScript(tm) file of my recent paper)
is to _devalue_ these names, but to do it more quickly than evolution will
otherwise do it, since I would like to move proactively toward a naming
scheme ("name use context") that will not draw so many half baked marketroids
and trademark attorneys to the conclusion that having a name under .COM is
somehow the Internet equivalent of "official existence."
There are ~140,000 .COM names today. Laid end to end they take 2MB to store.
The table below (produced with Perl on my lovely 64-bit 266MHz Alpha) shows
the number of collisions (and the percentage of the total) for each prefix
length in the set of domains under .COM.
Chars   #/Coll   %/Coll
-----   ------   ------
    1   140583   100.0%
    2   139605    99.3%
    3   127635    90.8%
    4    93747    66.7%
    5    64092    45.6%
    6    41667    29.6%
    7    24911    17.7%
    8    13340     9.5%
    9     6765     4.8%
   10     3145     2.2%
   11     1415     1.0%
   12      660     0.5%
   13      321     0.2%
   14      154     0.1%
   15       66     0.0%
   16       30     0.0%
   17       12     0.0%
   18        3     0.0%
   19        2     0.0%
At 20 characters, there were no collisions. The percentages suffer
from print truncation errors but they are substantially correct.
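
A count of this kind could be reproduced roughly as follows. The original was produced with Perl and is not shown, so this Python sketch is a guess at the method: it assumes a "collision" at length k means a name whose first k characters match the first k characters of at least one other name, and that names shorter than k are compared whole. The function name and the sample names (taken from this message, minus ".COM") are for illustration only.

```python
from collections import Counter


def prefix_collisions(names, max_len):
    """Map each prefix length k to the number of names that share
    their first k characters with at least one other name."""
    result = {}
    for k in range(1, max_len + 1):
        counts = Counter(name[:k] for name in names)
        # Every name in a prefix group of size > 1 counts as a collision.
        result[k] = sum(c for c in counts.values() if c > 1)
    return result


names = [
    "activelife", "activelifestyle",
    "aftermidnight", "aftermidnite",
    "aftermarket", "aftmkt",
]
table = prefix_collisions(names, 12)
for k, n in table.items():
    print(f"{k:5d} {n:6d} {100.0 * n / len(names):6.1f}%")
```

On this tiny sample all six names collide at one character and none collide at eleven, which is the same shape as the table above on a much smaller scale.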
I call .COM "full" because it takes eight (8) characters of typing
before a user has narrowed her possibilities to under 10% of the
total. If using a command interpreter (``shell'' to old timers)
with electric filename completion, one would say that a directory
with the above prefix distribution needed to be split into multiple
subdirectories because it was pretty much useless the way it was.
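
The "eight characters" figure can be checked against the table with a short Python sketch. The counts are copied from the table above; the total of 140583 is inferred from the 100.0% entry at one character, and the function name and the choice of threshold are assumptions for illustration.

```python
# Collision counts per prefix length, copied from the table in this message.
COLLISIONS = {
    1: 140583, 2: 139605, 3: 127635, 4: 93747, 5: 64092,
    6: 41667, 7: 24911, 8: 13340, 9: 6765, 10: 3145,
    11: 1415, 12: 660, 13: 321, 14: 154, 15: 66,
    16: 30, 17: 12, 18: 3, 19: 2,
}
# Inferred: at one character 100.0% of names collide, so this is the zone size.
TOTAL = COLLISIONS[1]


def chars_needed(threshold):
    """Smallest prefix length whose collision fraction is below `threshold`,
    or None if no length in the table gets that low."""
    for k in sorted(COLLISIONS):
        if COLLISIONS[k] / TOTAL < threshold:
            return k
    return None


print(chars_needed(0.10))  # 8, matching the figure quoted above
```

Raising the threshold to 0.50 gives five characters, which is one way to compare how "full" two zones are under the same rule of thumb.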
Randy asked "where's the meter" and I promised to try to answer him.
It's a rule of thumb as biased by all of my earlier definitions and
prose. If by typing the number of characters that the average user
is comfortable "just rattling off" quickly and from memory, you can
reach 90% of your destinations, a domain is not full. If by typing
that many characters you can only reach 50% of your destinations,
then the domain is quite full. Somewhere in between we have shades
of gray like "it's pretty full but I think I can cram another one
in there" and "this really hurts a lot but I'm going to eat one more
.COM is full. We can argue about how full and what shade of gray.
With 2,000,000 (or 20,000,000 or 250,000,000) names, it will be
absolutely absurd. Neither wise men (anybody?) nor fools (me?) would
dare to venture in that direction.
We can do PVM's MES but it will solve the registry's economic problems
without doing any good at all against the real sociological problems
that _lead_ to the registry's economic problems (and a lot of other
problems as well.)
There is no way to scale up another couple of O(mag)'s without pain.
<URL:ftp://ftp.vix.com/pri/vixie/dns-badnames.psf.gz> is my position in full.