Non-English Domain Names Likely Delayed

Mon Jul 18 15:40:35 UTC 2005

Michael, your idea of mapping confusable characters to a single "master" 
character was one of the options which was considered, but rejected.

To see why, consider the Turkish dotless-i in your second example. Now, 
to most non-Turkish readers, dotless-i is a homograph of the more common 
dotted-i character. If we map both to ASCII code 105, we've eliminated 
the homograph for non-Turkish users, but we then deny Turkish users the 
useful distinction between the two letters. Adding epicycles to this 
scheme, such as character-set tags or filter rules based on the client's 
locale setting, unfortunately makes things worse, not better.
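To make the problem concrete, here is a minimal Python sketch of the "master character" approach. The one-entry table and the two labels are hypothetical, chosen only for illustration; they are not taken from any real registry policy.

```python
# Hypothetical "master character" table: dotless-i (U+0131) is folded
# to ASCII "i" (code 105). Illustration only, not a real policy table.
MASTER = {"\u0131": "i"}


def fold(label: str) -> str:
    """Replace every character with its master character, if it has one."""
    return "".join(MASTER.get(ch, ch) for ch in label)


# Two hypothetical labels that differ only in dotted vs. dotless i.
dotless = "k\u0131sa"
dotted = "kisa"

# Before folding, the two labels are distinct strings...
assert dotless != dotted
# ...but after folding they are the same label, so a Turkish registrant
# can no longer hold them as two different names.
assert fold(dotless) == fold(dotted)
```

The fold is lossy by design: once both labels become the same string, nothing downstream can recover which one the registrant meant.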

This example actually illustrates rather nicely why it is so important 
that different TLDs, particularly ccTLDs, should be able to have 
different rules. For example, it's possible (I don't know Turkish) that 
there is some pair of Turkish names distinguished entirely by the 
difference between dotted and dotless-i.

Any procedure for preventing spoofing must bear in mind that 
registries process vast numbers of registrations daily, so human 
oversight of each one is not generally possible.

Bundling using confusables-tables, with appropriate considerations for 
cultural variations in what is confusable, is a much more effective 
approach, and allows subtle distinctions to be retained for those labels 
for which they are useful.

For example, an attempt to register a name containing dotless-i in .fr 
could easily be dealt with by bundling: even if, for French purposes, 
dotted and dotless-i were placed in the same equivalence set of 
confusable characters, the registration could go ahead provided that no 
potentially confusable French name had been registered first.
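A rough Python sketch of how bundling with per-TLD confusables tables might behave follows. Everything in it is an assumption made for illustration: the table contents, the Registry class, and the rule that the holder of a bundle may add further variants are not taken from any actual registry's policy.

```python
from typing import Dict, Set

# Hypothetical per-TLD confusables tables (illustration only).
# Under "fr", dotless-i (U+0131) is treated as confusable with "i";
# under "tr" the two letters are kept distinct.
CONFUSABLES: Dict[str, Dict[str, str]] = {
    "fr": {"\u0131": "i"},
    "tr": {},
}


class Registry:
    """Toy registry: the first registrant of a label holds its whole bundle."""

    def __init__(self, tld: str) -> None:
        self.table = CONFUSABLES[tld]
        self.owners: Dict[str, str] = {}       # bundle key -> owner
        self.labels: Dict[str, Set[str]] = {}  # bundle key -> exact labels

    def bundle_key(self, label: str) -> str:
        # The key is used only for the conflict check; the label itself
        # is stored exactly as the registrant wrote it.
        return "".join(self.table.get(ch, ch) for ch in label)

    def register(self, label: str, owner: str) -> bool:
        key = self.bundle_key(label)
        holder = self.owners.get(key)
        if holder is not None and holder != owner:
            return False  # a confusable name was registered first
        # Assumption: the bundle holder may register further variants.
        self.owners[key] = owner
        self.labels.setdefault(key, set()).add(label)
        return True


fr = Registry("fr")
assert fr.register("k\u0131sa", "alice")    # first in the bundle: accepted
assert not fr.register("kisa", "mallory")   # confusable in .fr: refused
assert fr.register("kisa", "alice")         # same holder: allowed

tr = Registry("tr")
assert tr.register("k\u0131sa", "alice")
assert tr.register("kisa", "bob")           # distinct names under .tr
```

Unlike folding to a master character, the exact label is kept as registered, and the decision is mechanical, so no human review is needed per registration.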

-- Neil