Non-English Domain Names Likely Delayed
Neil Harris
neil at tonal.clara.co.uk
Mon Jul 18 15:40:35 UTC 2005
Michael, your idea of mapping confusable characters to a single "master"
character was one of the options that was considered, but it was rejected.
To see why, consider the Turkish dotless-i in your second example. Now,
to most non-Turkish readers, dotless-i is a homograph of the more common
dotted-i character. If we map both to ASCII code 105, we eliminate the
homograph for non-Turkish users, but we then deny Turkish users the
useful distinction between the two letters. Adding epicycles to this
scheme, such as character-set tags or filter rules based on the client's
locale setting, unfortunately makes things worse, not better.
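To make the trade-off concrete, here is a minimal Python sketch of the rejected "master character" folding. It assumes a hypothetical table `MASTER` that folds only dotless-i (U+0131) onto ASCII "i" (code 105). The labels are made-up examples, not real registrations.

```python
# Hypothetical "master character" table, for illustration only: it folds
# LATIN SMALL LETTER DOTLESS I (U+0131) onto ASCII "i" (code 105).
MASTER = {"\u0131": "i"}


def fold(label: str) -> str:
    """Replace every character that has a master character with that master."""
    return "".join(MASTER.get(ch, ch) for ch in label)


spoof = "t\u0131le"  # made-up label using dotless-i
plain = "tile"       # the same label with an ordinary dotted i

# Folding removes the homograph: both labels collapse to one string...
print(fold(spoof) == fold(plain))  # True
# ...but the distinction is now gone for every user, Turkish readers included.
print(fold(spoof))                 # tile
```

The same fold that protects a non-Turkish reader from the spoof also erases a distinction that Turkish readers rely on. A single global table cannot serve both groups.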
This example illustrates rather nicely why it is so important
that different TLDs, particularly ccTLDs, should be able to have
different rules. For example, it's possible (I don't know Turkish) that
there is some pair of Turkish names that are distinguished entirely
by the difference between dotted and dotless-i.
Any procedure for preventing spoofing must bear in mind the fact that
registries process vast numbers of registrations daily, and human
oversight is not possible in the general case.
Bundling using confusables-tables, with appropriate considerations for
cultural variations in what is confusable, is a much more effective
approach, and allows subtle distinctions to be retained for those labels
for which they are useful.
For example, a name containing a dotless-i registered in .fr could
easily be dealt with by bundling, even if, for French purposes, dotted
and dotless-i were normalized into the same equivalence set of
confusable characters, provided that no potentially confusable French
name had been registered first.
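A minimal Python sketch of per-TLD bundling, under stated assumptions. The `CONFUSABLES` tables, the `Registry` class, and the policies shown for .fr and .tr are all hypothetical. Each TLD maps characters to a representative of their confusable set, and the resulting "bundle key" is reserved to whoever registers first in that bundle. This is not any registry's actual implementation.

```python
from dataclasses import dataclass, field

# Hypothetical per-TLD confusables tables. Each maps a character to a
# representative of its equivalence set; unmapped characters stand for themselves.
CONFUSABLES = {
    "fr": {"\u0131": "i"},  # assumed .fr policy: dotted and dotless i are one set
    "tr": {},               # assumed .tr policy: the two letters stay distinct
}


@dataclass
class Registry:
    tld: str
    # Maps each bundle key to (registrant, label that created the bundle).
    bundles: dict[str, tuple[str, str]] = field(default_factory=dict)

    def bundle_key(self, label: str) -> str:
        """Fold a label using this TLD's own confusables table."""
        table = CONFUSABLES[self.tld]
        return "".join(table.get(ch, ch) for ch in label)

    def register(self, label: str, registrant: str) -> bool:
        """Accept the label unless its bundle is already held by someone else."""
        key = self.bundle_key(label)
        holder = self.bundles.get(key)
        if holder is not None and holder[0] != registrant:
            return False  # a confusable label was registered first
        self.bundles.setdefault(key, (registrant, label))
        return True


fr = Registry("fr")
print(fr.register("tile", "alice"))         # True: first in its bundle
print(fr.register("t\u0131le", "mallory"))  # False: confusable with "tile"

tr = Registry("tr")
print(tr.register("tile", "alice"))         # True
print(tr.register("t\u0131le", "bob"))      # True: .tr keeps the letters distinct
```

The check is fully automatic, which fits the volume argument above. The per-TLD table is what lets .tr keep a distinction that .fr deliberately folds away. In .fr, a dotless-i name is accepted if nothing confusable was registered before it.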
-- Neil