Network Working Group                                         J. Klensin
Request for Comments: 4290                                 December 2005
Category: Informational
                Suggested Practices for Registration of
                  Internationalized Domain Names (IDN)

Status of This Memo


This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.


Copyright Notice


Copyright (C) The Internet Society (2005).




IESG Note

This RFC is not a candidate for any level of Internet Standard. The IETF disclaims any knowledge of the fitness of this RFC for any purpose and notes that the decision to publish is not based on IETF review apart from IESG review for conflict with IETF work. The RFC Editor has chosen to publish this document at its discretion. See RFC 3932 for more information.




Abstract

This document explores the issues in the registration of internationalized domain names (IDNs). The basic IDN definition allows a very large number of possible characters in domain names, and this richness may lead to serious user confusion about similar-looking names. To avoid this confusion, the IDN registration process must impose rules that disallow some otherwise-valid name combinations. This document suggests a set of mechanisms that registries might use to define and implement such rules for a broad range of languages, including adaptation of methods developed for Chinese, Japanese, and Korean domain names.


Table of Contents


   1. Introduction
      1.1. Background
      1.2. The Nature and Status of these Recommendations
      1.3. Terminology
         1.3.1. Languages and Scripts
         1.3.2. Characters, Variants, Registrations, and Other Issues
         1.3.3. Confusion, Fraud, and Cybersquatting
      1.4. A Review of the JET Guidelines
         1.4.1. JET Model
         1.4.2. Reserved Names and Label Packages
      1.5. Languages, Scripts, and Variants
         1.5.1. Languages versus Scripts
         1.5.2. Variant Selection
      1.6. Variants are not a Universal Remedy
      1.7. Reservations and Exclusions
         1.7.1. Sequence Exclusions for Valid Characters
         1.7.2. Character Pairing Issues
      1.8. The Registration Bundle
         1.8.1. Definitions and Structure
         1.8.2. Application of the Registration Bundle
   2. Some Implications of This Approach
   3. Possible Modifications of the JET Model
   4. Conclusions and Recommendations About the General Approach
   5. A Model Table Format
   6. A Model Label Registration Procedure: "CreateBundle"
      6.1. Description of the CreateBundle Mechanism
      6.2. The "no-variants" Case
      6.3. CreateBundle and Nameprep Mapping
   7. IANA Considerations
   8. Internationalization Considerations
   9. Security Considerations
   10. Acknowledgements
   11. Informative References

1. Introduction
1.1. Background

The IDNA (Internationalized Domain Names in Applications) specification [RFC3490] defines the basic model for encoding non-ASCII strings in the DNS. Additional specifications [RFC3491] [RFC3492] define the mechanisms and tables needed to support IDNA. As work on these specifications neared completion, it became apparent that it would be desirable for registries to impose additional restrictions on the names that could actually be registered (e.g., see [IESG-IDN] and [ICANN-IDN]) to reduce potential confusion among characters that were similar in some way. This document explores these IDN (internationalized domain name) registration issues and suggests a set of mechanisms that IDN registries might use.

Registration restrictions are part of a long tradition. For example, while the original DNS specifications [RFC1035] permitted any string of octets in a DNS label, they also recommended the use of a much more restricted subset. This subset was derived from the much older "hostname" rules [RFC952] and defined by the "LDH" convention (for the three permitted types of characters: letters, digits, and the hyphen). Enforcement of this restricted subset in registrations was the responsibility of the registry or domain administrator. The definition of the subset was never embedded in the DNS protocol itself, although some application protocols, notably those concerned with electronic mail, did impose and enforce similar rules.
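As an illustration, the LDH convention is simple enough to check mechanically. The Python sketch below is not taken from any specification: it assumes the form of the rule as relaxed by RFC 1123 (a label may begin with a digit), forbids a leading or trailing hyphen, and enforces the 63-octet label limit of [RFC1035]. The function name `is_ldh_label` is hypothetical.

```python
import re

# Hypothetical checker for the LDH ("letters, digits, hyphen") convention.
# Assumes the RFC 1123 relaxation (a leading digit is allowed), no leading
# or trailing hyphen, and a maximum of 63 characters per label.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Return True if `label` is a single LDH-conformant DNS label."""
    return _LDH_LABEL.fullmatch(label) is not None


if __name__ == "__main__":
    for candidate in ["example", "examp1e", "-example", "bücher", "a" * 64]:
        print(candidate[:12], is_ldh_label(candidate))
```

Note that "examp1e" passes: a purely syntactic rule of this kind restricts the repertoire, but does nothing about characters that merely look alike.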


If there are no constraints on registration in a zone, people can register characters that increase the risk of misunderstandings, cybersquatting, and other forms of confusion. A similar situation existed even before the introduction of IDNA, as exemplified by domain names such as example.com and examp1e.com (note that the latter domain contains the digit "1" instead of the letter "l").
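One simple way to see the problem is to fold each label onto a representative "skeleton" and compare skeletons. The sketch below is hypothetical: the two-entry `LOOKALIKES` table is invented for illustration, whereas a real registry would need a far larger, carefully reviewed table (Unicode Technical Standard #39 publishes confusable data of this general kind).

```python
# Hypothetical, deliberately tiny table mapping a character to the
# character it is easily mistaken for. Not a real registry table.
LOOKALIKES = {"1": "l", "0": "o"}


def skeleton(label: str) -> str:
    """Fold a label so that look-alike characters compare equal."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in label.lower())


def looks_alike(a: str, b: str) -> bool:
    """True if two distinct labels fold to the same skeleton."""
    return a.lower() != b.lower() and skeleton(a) == skeleton(b)


print(looks_alike("example", "examp1e"))  # True
print(looks_alike("example", "sample"))   # False
```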


For non-ASCII names (so-called "internationalized domain names" or "IDNs"), the problem is more complicated. In the earlier situation that led to the LDH (hostname) rules, all protocols, hosts, and DNS zones used ASCII exclusively in practice, so the LDH restriction could reasonably be applied uniformly across the Internet. Support for IDNs introduces a very large character repertoire, different geographical and political locations, and languages that require different collections of characters. The optimal registration restrictions are no longer a global matter; they may be different in different areas and, hence, in different DNS zones.

For some human writing systems, there are characters and/or strings that have equivalent or near-equivalent usages. If a name can be registered with such a character or string, the registry might want to automatically associate all of the names that have the same meaning with the registered name. The registry might also decide whether the names that are associated with, or generated by, one registration should, as a group or individually, go into the zone or should be blocked from registration by different parties.


To date, the best-developed system for handling registration restrictions for IDNs is the JET Guidelines for Chinese, Japanese, and Korean [RFC3743], the so-called "CJK" languages. The JET Guidelines are limited to the CJK languages and, in particular, to their common script base. Those languages are also the best-known and most widely-used examples of writing systems constructed on "ideographic" or "pictographic" principles. This document explores the principles behind the JET guidelines. It then examines some of the issues that might arise in adapting them to alphabetic languages, i.e., to languages whose characters primarily represent sounds rather than meanings.

This document describes five things:


1. The general background and considerations for non-ASCII scripts in names.


2. Suggested practices for describing character variants.

3. A method for using a zone's character variants to determine which names should be associated with a registration.


4. A format for publishing a zone's table of character variants; such tables are referred to below as "language tables" or simply "tables".


5. A model algorithm for name registration given the presence of language tables.


1.2. The Nature and Status of these Recommendations

The document makes recommendations for consideration by registries and, where relevant, by those who coordinate them, and by those who use their services. None of the recommendations are intended to be normative. Instead, the intent of the document is to illustrate a framework for developing variations to meet the needs of particular registries and their processing of particular languages. Of course, if registries make similar decisions and utilize similar tools, costs and confusion may be reduced -- both between registries and for users and registrars who have relationships with more than one domain.

Just as the JET Guidelines contain some suggestions that may not be applicable to alphabetic scripts, some of the suggestions here, especially the more specific ones, may be applicable to some scripts and not others.


1.3. Terminology
1.3.1. Languages and Scripts

This document uses the term "language" in what may be, to many readers, an odd way. Neither this specification, nor IDNA, nor the DNS is directly concerned with natural language, but only with the characters that make up a given label. In some respects, the term "script", used in the character coding community for a collection of characters, might be more appropriate. However, different subsets of the same script may be used with different languages, and the same language may be written using different characters (or even completely different scripts) in different locations, so "script" is not precisely correct either.


Long-standing confusion has also resulted from the fact that most scripts are, informally at least, named after one of the languages written in them. "Chinese" describes both a language and a collection of characters that are also used in writing Japanese, Korean, and, at least historically, some other languages. "Latin" describes a language, the characters used to write that language, and, often, characters used to write a number of contemporary languages that are derived from or similar to those used to write the Latin language. The script used to write the Arabic language is called "Arabic", but it is also used (typically with some additions or deletions) to write a number of other languages. Situations in which a script has a clearly-defined name that is independent of the name of a language are the exception, rather than the rule; examples include Hangul, used to write Korean, Katakana and Hiragana, used to write Japanese, and a few others. Some scholars have historically used "Roman" or "Roman-derived" for the script in an attempt to distinguish between a script and the Latin language.

The term "language" is therefore used in this document in the informal sense of a written language and is defined, for this purpose, by the characters used to write it, i.e., as a language-specific subset of a script. In this context, a "language" is defined by the combination of a code (see Section 1.4.1) and an authority that has chosen to use that code and establish a character-listing for it. Authorities are normally TLD (top-level domain) registries; see Section 7 and [IANA-language-registry]. However, it is expected that TLD registries will find appropriate experts and that advice from language and script experts selected by international neutral bodies will also become part of the registration system. In addition, as discussed below in Section 7, registries may conclude that the best interests of registrants, stakeholders, and the Internet community would be served by constructing "language tables" that mix scripts and characters in ways that conform to no known language. Conventions should be developed for such registrations that do not misleadingly reflect specific language codes.


1.3.2. Characters, Variants, Registrations, and Other Issues

1. Characters in this document are specified by their Unicode codepoints in U+xxxx format, by their official names, or both.

2. The following terms are used in this document.

* String


A "string" is a sequence of one or more characters.


* Base Character


This document discusses characters that may have equivalent or near-equivalent characters or strings. A "base character" is a character that has zero or more equivalents. In the JET Guidelines, base characters are referred to as "valid characters". In a table with variants, as described in Section 5, the base characters occupy the first column. Normally (and always, if the recommendation of Section 6.3 is adopted), the base characters will be the characters that appear in registration requests from registrants; any other character will invalidate the registration attempt.

* Native Script


Native script is the form in which the relevant string would normally be represented. For example, it might use Lower Slobbovian characters and the glyphs normally used to write them. It would not be punycode as a presentation form.


* Variant Characters/Strings


The "variant(s)" are character(s) and/or string(s) that are treated as equivalent to the base character. Note that these might not be exactly equivalent characters; a particular original character may be a base character with a mapping to a particular variant character, but that variant character may not have a mapping to the original base character. Indeed, the variant character may not appear in the base character list, and hence may not be valid for use in a registration. Usually, characters or strings to be designated as variants are considered either equivalent or sufficiently similar (by some registry-specific definition) that confusion between them and the base character might occur.


* Base Registration


The "base registration" is the single name that the registrant requested from the registry. The JET Guidelines use the term "label string" for this concept.

* Registered, Activated


A label (or "name") is described as "registered" if it is actually entered into a domain (i.e., into a zone file) by the registry, so that it can be accessed and resolved using standard DNS tools. The JET Guidelines describe a "registered" label as "activated". However, some domains use a slightly different registration logic in which a name can be registered with the registrar (if one is involved) and with the registry, but not actually entered into the zone file until an additional activation or delegation step occurs. This document does not make that distinction, but is compatible with it.

As specified in the IDNA Standard, the name actually placed in the zone file is always the internal ("punycode") form. There is no provision for actually entering any other form of an IDN into the DNS. It remains controversial, and different registrars and registries have adopted different policies, whether the registration, as submitted by the registrant, should take the form of:


o The native-script name, either in UTF-8 or in some coding specified by the registrar, or


o the internal-form ("punycode") name, or


o both forms of the name together, so that the registrar and registry can verify the intended translation.


If any of the approaches defined in this document is used, it is almost certain to be necessary that the native-script form of the requested string be available to the registry.


* Registration Bundle


A "registration bundle" is the set of all labels that come from expanding the base characters for a single name into their variants. The presence of a label in a registration bundle does not imply that it is registered. In the JET Guidelines, a registration bundle is called an "IDN Package".

* Reserved Label


A "reserved label" is a label in a registration bundle that is not actually registered.


* Registry


A "registry" is the administrative authority for a DNS zone. The registry is the body that enforces, and typically makes, policies that are used in a particular zone in the DNS.


* Coded Character Set


A "Coded Character Set" (CCS) is a list of characters and the code positions assigned to them. ASCII and Unicode are CCSs.

* Language


A "language" is something spoken by humans, independent of how it is written or coded. ISO Standard 639 and IETF BCP 47 (RFC 3066) [RFC3066] list and define codes for identifying languages.

* Script

A "script" is a collection of characters (glyphs, independent of coding) that are used together, typically to represent one or more languages. Note that the script for one language may heavily overlap the script for another. This does not imply that they have identical scripts.


* Charset


"Charset" is an IETF-invented term to describe, more or less, the combination of a script, a CCS that encodes that script, and rules for serializing encoded bytes that are stored on a computer or transmitted over the network.


The last four of these definitions are redundant with, but deliberately somewhat less precise than, the definitions in [RFC3536], which also provides sources. The two sets of definitions are intended to be consistent.
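The registration bundle defined above can be computed by taking every combination of base characters and their variants. The Python sketch below assumes a hypothetical variant table in which "ü" has the variants "u" and "ue"; because variant relationships need not be symmetric, the table is a one-way mapping from base character to variants. It also shows the internal (ACE, "xn--") form that would actually be placed in the zone file, using Python's built-in `idna` codec, which implements the IDNA 2003 rules of [RFC3490].

```python
import itertools

# Hypothetical variant table: base character -> variant characters/strings.
# Characters that do not appear as keys have no variants.
VARIANTS: dict[str, list[str]] = {
    "ü": ["u", "ue"],
}


def registration_bundle(label: str) -> set[str]:
    """Return every label obtained by replacing base characters with variants.

    The requested label itself is always a member of its own bundle.
    """
    choices = [[ch] + VARIANTS.get(ch, []) for ch in label]
    return {"".join(combo) for combo in itertools.product(*choices)}


def to_ace(label: str) -> str:
    """Return the ASCII-compatible form that would be entered in the zone."""
    return label.encode("idna").decode("ascii")


for member in sorted(registration_bundle("bücher")):
    print(member, "->", to_ace(member))
```

Because each variant-bearing character multiplies the number of combinations, bundles grow quickly for long labels; a registry may want to review or cap very large bundles.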


1.3.3. Confusion, Fraud, and Cybersquatting

The term "confusion" is used very generically in this document to cover the entire range from accidental user misperception of the relationship between characters with some characteristic in common (typically appearance, sound, or meaning) to cybersquatting and (other) deliberately fraudulent attempts to exploit those relationships based on the nature of the characters.


1.4. A Review of the JET Guidelines
1.4.1. JET Model

In the JET Guidelines model, a prospective registrant approaches the registry for a zone (perhaps through an intermediate registrar) with a candidate base registration -- a proposed name to be registered -- and a list of languages in which that name is to be interpreted. The languages are defined according to the fairly high-resolution coding of [RFC3066] or, if the registry considers it more appropriate, a coding based on scripts such as those in [LTRU-Registry]. In this way, Chinese as used on the mainland of the People's Republic of China ("zh-cn") can, at registry option, consist of a somewhat different list of characters (code points) and be represented by a separate table compared to Chinese as used in Taiwan ("zh-tw").

The design of the JET Guidelines took one important constraint as a basis: IDNA was treated as a firm standard. A procedure that modified some portion of the IDNA functions, or was a variant on them, was considered a violation of those standards and should not be encouraged (or, probably, even permitted).


Each registry is expected to construct (or obtain) a table for each language it considers relevant and appropriate. These tables list, for the particular zone, the characters permitted for that language. If a character does not appear as a base character (called a "valid code point" in the JET document) in that table, then a name containing it cannot be registered. If multiple languages are listed for the registration, then the character must appear in the tables for each of those languages.
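The validity rule just described (every character must be a base character in the table of each language listed for the registration) amounts to a subset test against each table. In the sketch below, the three tables and their repertoires are invented for illustration and do not reproduce any real registry's tables; language tags unknown to the zone are assumed to have an empty table.

```python
# Hypothetical base-character tables for one zone, keyed by language tag.
# Real tables would be defined and published by the registry.
_COMMON = set("abcdefghijklmnopqrstuvwxyz0123456789-")
TABLES: dict[str, frozenset[str]] = {
    "fi": frozenset(_COMMON | set("åäö")),
    "sv": frozenset(_COMMON | set("åäö")),
    "de": frozenset(_COMMON | set("äöüß")),
}


def valid_for_languages(label: str, languages: list[str]) -> bool:
    """True if every character of `label` is a base character in the
    table of every requested language."""
    if not label or not languages:
        return False
    chars = set(label)
    return all(chars <= TABLES.get(lang, frozenset()) for lang in languages)
```

Because membership is required in every listed table, a registration in several languages is limited to the characters those tables share: "säde" is valid for ["fi", "de"], but "åland" is not.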


The tables may also contain columns that specify alternate or variant forms of the valid character. If these variants appear, they are used to synthesize labels that are alternatives to the original one. These labels are all reserved and can be registered or "activated" (placed into the DNS) only by the action or request of the original registrant; some (the "preferred variant labels") are typically registered automatically. The zone is expected to establish appropriate policies for situations in which the variant forms of one label conflict with already-reserved or already-registered labels.


Most of these concepts were introduced because of concerns about specific issues with CJK characters, beginning from the requirement that the use of Simplified Chinese by some registrants and Traditional Chinese by others not be permitted to create confusion or opportunities for fraud. While they may be applicable to registry tables constructed for alphabetic scripts, the translation should be done with care, since many analogies are not exact.


Some of the important issues are discussed in the sections that follow, especially Section 3. The JET model may be considered as a variation on, and inspiration for, the model and method presented by the rest of this document, although the JET model has been completely developed only for CJK characters. Other languages or scripts, especially alphabetic ones, may require other variations.


1.4.2. Reserved Names and Label Packages

A basic assumption of the JET model is that, if the evolution of specific characters or the properties of Unicode [Unicode] [Unicode32] or IDNA cause two strings to appear similar enough to cause confusion, then both should be registered by the same party or one of them should become unregisterable. The definition of "appear similar enough" will differ for different cultures and circumstance, and hence DNS zones, but the principle is fairly general. In the JET model, all of the variant strings are identified, some are registered into the DNS automatically, and others are simply reserved and can be registered, if at all, only by the original registrant. Other zones might find other policies appropriate. For example, a zone might conclude that having similar strings registered in the DNS was undesirable. If so, the list of variant strings would be used only to build a list of names that would be reserved and prohibited from being registered.
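The reservation logic can be sketched as follows. This is a hypothetical model, not the JET procedure itself: it assumes a zone policy that rejects a request outright if any label in its bundle is already held by a different party, and an `activate_preferred` flag that distinguishes a JET-style zone (preferred variants go into the DNS) from a zone that only reserves variants and never places them in the DNS.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    # label -> registrant, for every label that is registered or reserved
    holder: dict[str, str] = field(default_factory=dict)
    # labels actually placed in the DNS
    in_dns: set[str] = field(default_factory=set)


def register_bundle(zone: Zone, registrant: str, base: str,
                    bundle: set[str], preferred: set[str],
                    activate_preferred: bool = True) -> set[str]:
    """Reserve a whole bundle for one registrant.

    Returns the labels that conflict with another party's holdings;
    an empty set means the request succeeded.
    """
    labels = bundle | {base}
    conflicts = {lbl for lbl in labels
                 if zone.holder.get(lbl, registrant) != registrant}
    if conflicts:
        return conflicts  # hypothetical policy: reject the entire request
    for lbl in labels:
        zone.holder[lbl] = registrant
    zone.in_dns.add(base)
    if activate_preferred:
        zone.in_dns |= preferred & labels
    return set()
```

For example, after "alice" registers "bücher" with the bundle {"bücher", "bucher", "buecher"} and preferred variant "buecher", a later request by "bob" for "bucher" returns {"bucher"} as a conflict, even though "bucher" was only reserved and never placed in the DNS.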

1.5. Languages, Scripts, and Variants
1.5.1. Languages versus Scripts

Conversations about scripts -- collections of characters associated with particular languages -- are common when discussing character sets and codes. However, the boundaries between one script and another are not well-defined. The Unicode Standard ([Unicode], [Unicode32]), for example, does not define script boundaries at all, even though it is structured in terms of usually-related blocks of characters. The issue is complicated by the common origin of most alphabetic scripts in use in the world today (see, for example, [Drucker] or the more scholarly [Daniels]).

Because of that history, certain characters (or, more precisely, symbols representing characters) appear in the scripts associated with multiple languages, sometimes with very different sounds or meanings. This differs from the CJK situation in which, if a character appears in more than one of the relevant languages, it will usually have the same interpretation in each one. For the subset of characters that actually are ideographs or pictographs, pronunciation is expected to vary widely while meaning is preserved. At least in part because of that similarity of meaning, it made sense in the JET case to permit a registration to specify multiple languages, to verify that the characters in the label string (the requested "Base registration") were valid for each, and then to generate variant labels using each language in turn. For many alphabetic languages, it may be more sensible to prohibit the label string submitted for registration from being associated with more than one language. Indeed, "one label, one language" has been suggested as an important barrier against common sources of "look-alike" confusion. For example, the imposition of that rule in a zone would prevent the insertion of a few Greek or Cyrillic characters with shapes identical to the Latin ones into what was otherwise a Latin-based string. For a particular table, the list of base characters may be thought of as the script associated with the relevant language, with the understanding that the table design does not prevent the same character from appearing in the tables for multiple languages.
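A registry that adopts "one label, one language" might approximate it at the script level by rejecting labels that mix letters from different scripts. The sketch below is a rough, hypothetical heuristic: Python's standard library does not expose the Unicode Script property, so it guesses the script from the first word of each character's Unicode name and treats digits and the hyphen as script-neutral. A production implementation would use the Script property data (Unicode Standard Annex #24) instead.

```python
import unicodedata


def rough_script(ch: str) -> str:
    """Guess a character's script from the first word of its Unicode name.

    Heuristic only: e.g. 'LATIN SMALL LETTER A' -> 'LATIN'.
    """
    if ch.isdigit() or ch == "-":
        return "COMMON"  # treated as usable with any script
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def scripts_in(label: str) -> set[str]:
    return {rough_script(ch) for ch in label} - {"COMMON"}


def is_single_script(label: str) -> bool:
    return len(scripts_in(label)) <= 1


# "ex\u0430mple" hides CYRILLIC SMALL LETTER A (U+0430) in a Latin word.
print(is_single_script("example"), is_single_script("ex\u0430mple"))  # True False
```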


Indeed, this notion of a script that is local and specifically identified can be turned around: so-called "language tables" are associated with languages only insofar as thinking about the character structure and word forms associated with a given language helps to inform the construction of the table. A country like Finland, for example, might select among:


o One table each for Finnish, Swedish, and English characters and conventions, permitting a string to be registered in one, two, or all three languages. However, a three-language registration would necessarily prohibit any characters that did not appear in all three languages, since the label would make little sense otherwise.


o One table each, but with a "one label, one language" rule for the zone.


o A combined table based on the observation that all three writing systems were based on Roman characters and that the possibilities for confusion of interest to the registry would not be reduced by "language" differentiation. This option raises an interesting issue about language labeling as described in Section 1.4.1; see the discussion in Section 7 below.

Regardless of what decisions were made about those languages and scripts, they might have a separate table for registration of labels containing Cyrillic characters. That table might contain some Roman-derived characters (either as base characters or as variants), just as some CJK tables do. See also Section 2, below.


Tables that present multiple languages, as described above, have introduced confusion and discomfort among those who have failed to understand these definitions. The consequence of these definitions is that use of a language or script code in a registration is a mnemonic, rather than a normative statement about the language or script itself. When that confusion is likely to occur, it is appropriate to simply use the registry identifier and a sequence number to identify the registration.


As the JET Guidelines stress, no tables or systems of this type -- even if identified with a language as a means of defining or describing the table -- can assure linguistic or even syntactic correctness of labels with regard to that language. That assurance may not be possible without human intervention or at least dictionary lookups of complete proposed labels. It may not even be desirable to attempt that level of correctness (see Section 2).

Of course, if any language-based tests or constraints, including "one label, one language", are to be applied to limit the associated sources of confusion, each zone must have a table for each language in which it expects to accept registrations. The notion of a single combined table for the zone is, in the general case, simply unworkable. One could use a single table for the zone if the intent were to impose only minimal restrictions, e.g., to force alphabetic and numeric characters only, excluding symbols and punctuation. That type of restriction might be useful in eliminating some problems, such as those of unreadable labels, but it would be unlikely to be very helpful with, e.g., confusion caused by similar-looking characters.
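The minimal, zone-wide restriction mentioned above (alphabetic and numeric characters only, no symbols or punctuation) can be expressed with Unicode general categories. The sketch below is a hypothetical illustration; it also admits combining marks (category M) and the hyphen, on the assumption that scripts which need vowel signs or accents should not be excluded.

```python
import unicodedata


def letters_digits_only(label: str) -> bool:
    """True if every character is a letter (L*), mark (M*), number (N*),
    or HYPHEN-MINUS; symbols and punctuation are rejected."""
    return bool(label) and all(
        ch == "-" or unicodedata.category(ch)[0] in "LMN" for ch in label
    )
```

Such a rule rejects labels like "a$b", but it does not help with look-alikes: `letters_digits_only("ex\u0430mple")` returns True even though the label mixes a Cyrillic letter into Latin text.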


1.5.2. Variant Selection

The area of character variants is rife with difficulties (and perhaps opportunities). There is no universal agreement about which base characters have variants, or if they do, what those variants are. For example, in some regions of the world and in some languages, LATIN SMALL LETTER O WITH DIAERESIS (U+00F6) and LATIN SMALL LETTER O WITH STROKE (U+00F8) are variants of each other, while in other regions, most people would think that LATIN SMALL LETTER O WITH STROKE has no variants. In some cases, the list of variants is difficult to enumerate. For example, it required several years for the Chinese language community to create variant tables for use with IDNA, and it remains, at the time of this writing, questionable how widely those tables will be accepted among users of Chinese from areas of the world other than those represented by the groups that created them.
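Disagreement of this kind can be made visible by comparing two variant tables as sets of (base, variant) pairs. In the hypothetical sketch below, one region treats U+00F6 and U+00F8 as mutual variants and another lists no variants at all; the same representation also exposes one-way (asymmetric) relationships, which the definition of variants explicitly allows.

```python
VariantTable = dict[str, set[str]]  # base character -> its variants


def pairs(table: VariantTable) -> set[tuple[str, str]]:
    return {(base, v) for base, variants in table.items() for v in variants}


def disagreements(a: VariantTable, b: VariantTable) -> set[tuple[str, str]]:
    """Pairs declared as variants in exactly one of the two tables."""
    return pairs(a) ^ pairs(b)


def one_way(table: VariantTable) -> set[tuple[str, str]]:
    """Pairs (x, y) where y is a variant of x but x is not a variant of y."""
    p = pairs(table)
    return {(x, y) for (x, y) in p if (y, x) not in p}


# Hypothetical regional tables, for illustration only.
region_a: VariantTable = {"\u00f6": {"\u00f8"}, "\u00f8": {"\u00f6"}}
region_b: VariantTable = {}
```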


Thus, the first thing a registry should ask is whether or not any of the characters that they want to permit to be used have variants. If not, the registry's work is much simpler. This is not to say that a registry should ignore variants if they exist: adding variants after a registry has started to take registrations will be nearly as difficult administratively as removing characters from the list of acceptable characters. That is, if a registry later decides that two characters are variants of each other, and there are actively-used names in the zones that differ only on the new variants, the registry might have to transfer ownership of one of the names to a different owner, using some process that is certain to be controversial.


This situation is likely to be much easier for areas and zones that use characters that previously did not occur in the DNS at all than for zones in which non-English labels have been registered in ASCII characters for some time, presumably because the language of interest uses additional "Latin" characters and has conventions for writing them when only ASCII is available. In the former case, the rules and conventions can be established before any registrations occur. In the latter, there may be conflicts or opportunities for confusion between existing registrations and now-permitted Roman-based characters that do not appear in ASCII. For example, a domain name might exist today that uses the name of a city in Canada spelled as "Montreal". If the zone in which it occurs changes its rules to permit the use of the character LATIN SMALL LETTER E WITH ACUTE (U+00E9), does the name of the city, spelled (correctly) using that character, conflict with the existing domain name registration?


Certainly, if both are permitted, and permitted to be registered by separate parties, there are many opportunities for confusion.


Of course, zone managers should inform all current registrants when the registration policy for the zone changes. This includes the times when IDN characters are first allowed in the zone, when additional characters are permitted, and when any change occurs in the character variant tables.


Many languages contain two variants for a character, one of which is strongly preferred. A registry might restrict the base registration to the preferred form, or it might allow any form for the base registration. If the variant tables are created carefully, the resulting bundles will be the same, but some registries will give special status to the base registration such as its appearance in "Whois" databases.


1.6. Variants are not a Universal Remedy

It is worth stressing that there are many obvious opportunities for confusion that variant systems, by virtue of being based on processing of individual characters, cannot address. For example, if a language can be written with more than one script, or transliterations of the language into another script are common, variant models are insufficient to prevent conflicting registration of the related forms. Avoiding those types of problems would require different mechanisms, perhaps based on phonetic or natural language processing techniques for the entire proposed base registration.


1.7. Reservations and Exclusions
1.7.1. Sequence Exclusions for Valid Characters

The JET Guidelines are based on processing only single characters. Pairs or longer sequences of characters can, at the option of the registry, be handled through what the Guidelines describe as "additional processing". These registry-specific string processing procedures are specifically permitted by the guidelines to supplement the per-character processing that generates the variants.


A different zone with different needs could use a modified version of the table structure, or different types of additional processing, to prohibit particular sequences of characters by marking them as invalid, and to accept characters by marking them as valid. Other modifications or extensions might be designed to prevent certain letters from appearing at the beginning or end of labels. The use of regular expressions in the "valid characters" column might be one way to implement these types of restrictions, but there has been no experience so far with that approach.
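As an illustration only, the following Python sketch shows how a zone might layer sequence and position rules on top of a per-character table using regular expressions. The rules themselves are hypothetical: one forbids a hyphen at the start or end of a label, and one allows the Hebrew final letter forms (U+05DA, U+05DD, U+05DF, U+05E3, U+05E5) only in the last position of a label. The names RULES and rule_violations are invented for this example.

```python
import re

# Hypothetical zone rules applied after the per-character table check.
# Each rule is (pattern, reason); a label matching any pattern is rejected.
RULES = [
    # HYPHEN-MINUS (U+002D) must not be the first or last character.
    (re.compile(r"\A-|-\Z"), "hyphen at start or end of label"),
    # Hebrew final forms (FINAL KAF, MEM, NUN, PE, TSADI) only at the very end.
    (re.compile(r"[\u05da\u05dd\u05df\u05e3\u05e5](?!\Z)"),
     "final letter form before end of label"),
]


def rule_violations(label):
    """Return the reasons a label breaks the zone's rules; an empty list means none."""
    return [reason for pattern, reason in RULES if pattern.search(label)]


print(rule_violations("\u05e9\u05dc\u05d5\u05dd"))  # final mem in last position: allowed
print(rule_violations("\u05dd\u05d0"))              # final mem in first position: rejected
```

Keeping each rule as a separate pattern with a reason lets a registry tell an applicant which rule a proposed label broke, rather than only that it was refused.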


In particular, in some scripts derived from Roman characters, sequences that have historically been typographically represented by single "ligature" or "digraph" characters may also be represented by the separate characters (e.g., "ae" for U+00E6 or "ij" for U+0133). If it is desired either to prohibit these or to treat them as variants, some extensions to the single-character JET model may be needed. Some careful thinking about IDNA (especially nameprep) may also be needed, since some of these combinations are excluded there.
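The following Python sketch shows one hypothetical way to extend the single-character model: a variant table whose keys may be multi-character sequences, applied by greedy longest-match tokenization. It also filters the results through nameprep from Python's standard encodings.idna module (the IDNA2003 Nameprep profile). Because Nameprep applies NFKC normalization, U+0133 is rewritten to "ij" and so never survives as a registrable variant, while U+00E6 passes unchanged. The table contents and function names are illustrative assumptions.

```python
import itertools
from encodings.idna import nameprep

# Hypothetical sequence-level variant table; unlike the JET tables,
# keys may be sequences of more than one character.
SEQUENCE_VARIANTS = {
    "ae": ["\u00e6"],      # "ae" <-> LATIN SMALL LETTER AE
    "\u00e6": ["ae"],
    "ij": ["\u0133"],      # "ij" <-> LATIN SMALL LIGATURE IJ
    "\u0133": ["ij"],
}


def tokenize(label, table):
    """Split a label greedily, preferring the longest table key at each position."""
    keys = sorted(table, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(label):
        for key in keys:
            if label.startswith(key, i):
                tokens.append(key)
                i += len(key)
                break
        else:
            tokens.append(label[i])   # character not covered by the table
            i += 1
    return tokens


def sequence_variants(label, table):
    """All labels obtained by replacing each matched sequence with any of its variants."""
    choices = [[tok] + table.get(tok, []) for tok in tokenize(label, table)]
    return {"".join(combo) for combo in itertools.product(*choices)}


def nameprep_stable(labels):
    """Keep only labels that nameprep leaves unchanged (U+0133 is mapped to "ij")."""
    return {label for label in labels if nameprep(label) == label}
```

Greedy matching is only one possible choice. A label such as "israel" also contains "ae" across letters a reader would not see as a digraph, which is exactly the kind of false positive a zone would have to weigh.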


1.7.2. Character Pairing Issues

Some character pairings -- the use of a character form (glyph) in one language and a different form with the same properties in a related one -- closely approximate the issues with mapping between Traditional and Simplified Chinese, although the history is different. For example, it might be useful to have "o" with a stroke (U+00F8) as a variant for "o" with diaeresis above it (U+00F6) (and the equivalent upper-case pair) in a Swedish table, and vice versa in a Norwegian one, or to prohibit one of these characters entirely in each table. In a German table, U+00F8 would presumably be prohibited, while U+00F6 might have "oe" as a variant. Obviously, if the relevant language of registration is unknown, this type of variant matching cannot be applied in any sensible way.
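A minimal Python sketch of the point above, assuming hypothetical per-language tables that contain only lowercase ASCII letters plus the characters under discussion. The same character gets different treatment depending on which table applies, and when the language is unknown there is no table to consult at all. The table contents and the function name are illustrative, not recommendations.

```python
import string

# Hypothetical tables: base character -> tuple of variant strings.
# Only lowercase forms are listed, since Nameprep case-folds before lookup.
_ASCII = {c: () for c in string.ascii_lowercase}

TABLES = {
    "sv": {**_ASCII, "\u00f6": ("\u00f8",)},  # Swedish: o-stroke is a variant of o-diaeresis
    "no": {**_ASCII, "\u00f8": ("\u00f6",)},  # Norwegian: the reverse
    "de": {**_ASCII, "\u00f6": ("oe",)},      # German: o-stroke not permitted, "oe" a variant
}


def variants_of(char, language):
    """Variants of `char` under the table for `language`."""
    if language not in TABLES:
        raise LookupError(f"no table for {language!r}; variant matching cannot be applied")
    table = TABLES[language]
    if char not in table:
        raise ValueError(f"U+{ord(char):04X} is not permitted in the {language!r} table")
    return table[char]
```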


1.8. The Registration Bundle
1.8.1. Definitions and Structure

As one of its critical innovations, the JET model defines an "IDN package", known in this document as a "registration bundle", which consists of the primary registered string (which is used as the name of the bundle), the information about the language table(s) used, the variant labels for that string, and indications of which of those labels are registered in the relevant zone file ("activated" in the JET terminology). Registration bundles are also atomic -- one cannot add or remove variant labels from one without unregistering the entire package. A label exists in only one registration bundle at a time; if a new label is registered that would generate a variant that matches one that appears in an existing package, that variant simply is not included in the second package. A subsequent de-registration of the first package does not cause the variant to be added to the second. While it might be possible to change this in other models, the JET conclusion was that other options would be far too complex to implement and operate and would cause many new types of name conflicts.
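The bundle rules above can be sketched as a small data structure. This is a hypothetical Python sketch, not the JET Guidelines' own representation: bundles are immutable once created (atomic), each label is held by at most one bundle, variants already held elsewhere are dropped, and freeing a label by removing a bundle does not hand it to any other bundle. The class and method names are illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RegistrationBundle:
    """Immutable ("atomic") bundle: variants cannot be added or removed after creation."""
    name: str            # the primary registered string; also names the bundle
    tables: tuple        # identifiers of the language table(s) used
    variants: frozenset  # variant labels held by this bundle
    activated: frozenset # labels actually placed in the zone file
    created: date        # creation date, for first-come, first-served resolution


class BundleRegistry:
    """Hypothetical in-memory store enforcing "a label is in at most one bundle"."""

    def __init__(self):
        self._bundles = {}  # bundle name -> RegistrationBundle
        self._holder = {}   # label -> name of the bundle that holds it

    def add(self, base, tables, variants, activated, created):
        if base in self._holder:
            raise ValueError(f"{base!r} is already held by bundle {self._holder[base]!r}")
        # Variants already held by an earlier bundle are simply left out.
        kept = frozenset(v for v in variants if v != base and v not in self._holder)
        # The registered label itself is always activated.
        active = frozenset(a for a in activated if a in kept) | {base}
        bundle = RegistrationBundle(base, tuple(tables), kept, active, created)
        self._bundles[base] = bundle
        for label in kept | {base}:
            self._holder[label] = base
        return bundle

    def remove(self, name):
        """Unregister a whole bundle; freed labels are not given to other bundles."""
        bundle = self._bundles.pop(name)
        for label in bundle.variants | {name}:
            del self._holder[label]
```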


1.8.2. Application of the Registration Bundle

A registry has three options for handling the case where the registration bundle contains more than one label. The policy options are:


o Register and resolve all labels in the zone, making the zone information identical to that of the registered labels. This option will allow end users to find names with variants more easily, but will result in larger zone files. For some language tables, the zone file could become so large that it could negatively affect the ability of the registry to perform name resolution. If the base registration contains several characters that have equivalents, the owner could end up having to take care of large numbers of zones. For instance, if DIGIT ONE is a variant of LATIN SMALL LETTER L, the owner of the domain name all-lollypops.example.com will have to manage 32 zones. If the intent is to keep the contents of those zones identical, the owner may then face a significant administrative problem. If other concerns dictate short times to live and absolute consistency of DNS responses, the challenges may be nearly impossible.


o Block all labels other than the registered label so they cannot be registered in the future. This option does not increase the size of the zone file and provides maximum safety against false positives, but it may cause end users to not be able to find names with variants that they would expect. If the base registration contains characters that have equivalents, Internet users who do not know what base characters were used in the registration will not know what character to type in to get a DNS response. For instance, if DIGIT ONE is a variant of LATIN SMALL LETTER L, and LATIN SMALL LETTER L is a variant of DIGIT ONE, the user who sees "" will not know whether to type a "1" or a "l" after the "pa" in the first label.


o Resolve some labels and block some other labels. This option is likely to cause the most confusion with users because including some variants will cause a name to be found, but using other variants will cause the name to be not found. For example, even if people understood that DIGIT ONE and LATIN SMALL LETTER L were variants, a typical DNS user wouldn't know which character to type because they wouldn't know whether this pair were used to register or block the labels. However, this option can be used to balance the desires of the name owner (that every possible attempt to enter their name will work) with the desires of the zone administrator (to make the zone more manageable and possibly to be compensated for greater amounts of work needed for a single registration). For many circumstances, it may be the most attractive option.


In all cases, at least the registered label should appear in the zone. It would be almost impossible to describe to name owners why the name that they asked for is not in the zone, but some other name that they now control is. By implication, if the requested label is already registered, the entire registration request must be rejected.
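Because variants are substituted independently at each character position, the number of labels a bundle can contain (before any are discarded by ToASCII) is the product, over the characters of the label, of (1 + number of variants of that character). A minimal Python sketch, assuming a hypothetical table in which DIGIT ONE and LATIN SMALL LETTER L are variants of each other: the first label of all-lollypops.example.com contains five "l" characters, giving 2^5 = 32 labels. The function name is illustrative.

```python
from math import prod


def bundle_upper_bound(label, variants):
    """Labels obtainable by variant substitution: product of (1 + #variants) per character."""
    return prod(1 + len(variants.get(ch, [])) for ch in label)


# Hypothetical table: DIGIT ONE and LATIN SMALL LETTER L are variants of each other.
ONE_L = {"l": ["1"], "1": ["l"]}

print(bundle_upper_bound("all-lollypops", ONE_L))
```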


2. Some Implications of This Approach

Historically, DNS labels were considered to be arbitrary identifier strings, without any inherent meaning. Even in ASCII, there was no requirement that labels form words. Labels that could not possibly represent words in any Romance or Germanic language (the languages that have been written in "Latin" scripts since medieval times or earlier) have actually been quite common. In general, in those languages, words contain at least one vowel and do not have embedded numbers. As a result, a string such as "bc345df" cannot possibly be a "word" in these languages. More generally, the more one moves toward "language"-based registry restrictions, the less it is going to be possible to construct labels out of fanciful strings. While fanciful strings are terrible candidates for "words", they may make very good identifiers. To take a trivial example using only ASCII characters, "rtr32w", "rtr32x", and "rtr32z" might be very good DNS labels for a particular zone and application. However, given the embedded digits and lack of vowels, they, like the "bc345df" example given above, would fail even the most superficial of tests for valid English (or German or French (etc.)) word forms.
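To make the point concrete, here is the kind of superficial test the paragraph above has in mind, as a Python sketch. The vowel set and the function name are assumptions, and real word-form checks would be far more involved; the point is only that perfectly reasonable identifiers fail even this.

```python
def passes_superficial_word_test(label, vowels="aeiou"):
    """Crude English-like check from the text: at least one vowel and no digits."""
    has_vowel = any(ch in vowels for ch in label)
    has_digit = any(ch.isdigit() for ch in label)
    return has_vowel and not has_digit


for candidate in ("bc345df", "rtr32w", "rtr32x", "rtr32z", "router"):
    print(candidate, passes_superficial_word_test(candidate))
```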


It is worth noting that several DNS experts have suggested that a number of problems could be solved by prohibiting meaningful names in labels, requiring instead that the labels be random or nonsense strings. If methods similar to those discussed in this document were used to force identifiers to be closer to meaningful words in real languages, the result would be directly contradictory to those "random name" approaches.


Interestingly, if one were trying to develop an "only words" system, a rather different -- but very restrictive -- model could be developed using lookups in a dictionary for the relevant language and a listing of valid business names for the relevant area. If a string did not appear in either, it would not be permitted to be registered. Models that require a prior national business listing (or registration) that is identical to the proposed domain name label have historically been used to restrict registrations in some country-code top level domains, so this is not a new idea. On the other hand, if look-alike characters are a concern, even that type of rule (or restriction) would still not avoid the need to consider character variants.


Consequently, registries applying the principles outlined in this document should be careful not to apply more severe restrictions than are reasonable and appropriate while, at the same time, being aware of how difficult it usually is to add restrictions at a later time.


3. Possible Modifications of the JET Model

The JET model was designed for CJK characters. The discussion above implies that some extensions to it may be needed to handle the characteristics of various alphabetic scripts and the decisions that might be made about them in different zones. Those extensions might include facilities to process:


o Two-character (or more) sequences, such as ligatures and typographic spelling conventions, as variants.


o Regular expressions or some other mechanism for dealing with string positions of characters (e.g., characters that must, or must not, appear at the beginning or end of strings).


o Delimiter breaks to permit multiple languages to be used, separately, within the same label. E.g., is it possible to define a label as consisting of two or more components, each in a different language, with some particular delimiter to define the boundaries of the components?
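The delimiter idea in the last item could be sketched as follows, purely hypothetically: the label is split on an assumed delimiter (HYPHEN-MINUS here) and each component must fit, as a set of characters, within at least one language table. The tables and the function name are illustrative only.

```python
# Illustrative tables: the set of characters each language permits (hypothetical).
EXAMPLE_TABLES = {
    "en": set("abcdefghijklmnopqrstuvwxyz0123456789"),
    "el": {chr(cp) for cp in range(0x03B1, 0x03CA)},  # GREEK SMALL LETTER ALPHA..OMEGA
}


def split_by_language(label, tables, delimiter="-"):
    """Map each delimiter-separated component to the tables that accept it.

    Returns one sorted list of language codes per component, or None if any
    component is empty or fits no table.
    """
    result = []
    for component in label.split(delimiter):
        fits = sorted(lang for lang, chars in tables.items()
                      if component and set(component) <= chars)
        if not fits:
            return None
        result.append(fits)
    return result
```

One consequence of this design is that the delimiter can no longer appear as an ordinary character inside a component.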


4. Conclusions and Recommendations About the General Approach

After examining the implications of the potential use of the full range of characters permitted by IDNA in DNS labels, multiple groups, including IESG [IESG-IDN] and ICANN [ICANN-IDN] [ICANN-IDN2], have concluded that some restrictions are needed to prevent many forms of user confusion about the actual structure of a name or the word, phrase, or term that it appears to spell out. The best way to approach such restrictions appears to be to draw on the language and culture of the community of registrants and users in the relevant zone: if particular characters are likely to be surprising or unintelligible to both of those groups, it is probably wise not to permit them to be used in registrations. Registration restrictions can be carried much further than restricting permitted characters to a selected Unicode subset. The idea of a reserved "bundle" of related labels permits probably-confusing combinations or sets of characters to be bound together, under the control of a single registrant. While that registrant might still use the package in a way that confused his or her own users (the approach outlined here will not prevent either ill-thought-out ideas or stupidity), the possibility of turning potential confusion into a hostile attack would be considerably reduced.


At the same time, excessive restrictions may make DNS identifiers less useful for their original purpose: identifying particular hosts and similar resources on the network in an orderly way. Registries creating rules and policies about what can be registered in particular zones -- whether those are based on the JET Guidelines or the suggestions in this document -- should balance the need for restrictions against the need for flexibility in constructing identifiers.


The discussion above provides many options that could be selected, defined, and applied in different ways in different registries (zones). Registrars and registrants would almost certainly prefer systems in which they can predict, at least to a first order approximation, the implications of a particular potential registration. Predictability of that sort probably requires more standards, and less flexibility, than the model itself might suggest.


5. A Model Table Format

The format of the table is meant to be machine-readable but not human-readable. It is fairly trivial to convert the table into one that can be read by people.


Each character in the table is given in the "U+" notation for Unicode characters. The lines of the table are terminated with either a carriage return character (ASCII 0x0D), a linefeed character (ASCII 0x0A), or a sequence of carriage return followed by linefeed (ASCII 0x0D 0x0A). The order of the lines in the table may or may not matter, depending on how the table is constructed.


Comment lines in the table begin with a "#" character (ASCII 0x23).


Each non-comment line in the table starts with the character that is allowed in the registry and expected to be used in registrations, which is also called the "base character". If the base character has any variants, the base character is followed by a vertical bar character ("|", ASCII 0x7C) and the variant string. If the base character has more than one variant, the variants are separated by a colon (":", ASCII 0x3A). Strings are given with a hyphen ("-", ASCII 0x2D) between each character. A line may end with a comment, which begins with a "#" (ASCII 0x23) and may be preceded by spaces (" ", ASCII 0x20).


The following is an example of how a table might look. The entries in this table are purposely silly and should not be used by any registry as the basis for choosing variants. For the example, assume that the registry:


o allows the FOR ALL character (U+2200) with no variants


o allows the COMPLEMENT character (U+2201) which has a single variant of LATIN CAPITAL LETTER C (U+0043)


o allows the PROPORTION character (U+2237) which has one variant which is the string COLON (U+003A) COLON (U+003A)


o allows the PARTIAL DIFFERENTIAL character (U+2202) which has two variants: LATIN SMALL LETTER D (U+0064) and GREEK SMALL LETTER DELTA (U+03B4)


The table contents (after any required header information, see [IANA-language-registry] and the discussion in Section 7 below) would look like:


       # An example of a table
       U+2200
       U+2201|U+0043
       U+2237|U+003A-U+003A # Note that the variant is a string
       U+2202|U+0064:U+03B4 # Two variants for the same character

Implementers of table processors should remember that there are tens of thousands of characters whose codepoints are greater than 0xFFFF. Thus, any program that assumes that each character in the table is represented in exactly six octets ("U", "+", and four octets representing the character value) will fail with tables that use characters whose value is greater than 0xFFFF.
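A minimal Python sketch of a reader for the table format described above. It accepts all three line terminators, strips trailing comments, and allows four to six hexadecimal digits after "U+" so that code points above 0xFFFF are handled. The function names are illustrative, and a real processor would also need to handle the registry header information mentioned earlier, which this sketch ignores.

```python
import re

# Lines may end in CR, LF, or CR LF.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")
# "U+" followed by 4 to 6 hex digits, so code points above 0xFFFF are accepted.
_CODE_POINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def parse_char(token):
    match = _CODE_POINT.fullmatch(token.strip())
    if match is None:
        raise ValueError(f"not a U+ code point: {token!r}")
    return chr(int(match.group(1), 16))  # chr() rejects values above 0x10FFFF


def parse_string(field):
    """A variant string is a hyphen-separated list of code points."""
    return "".join(parse_char(token) for token in field.split("-"))


def parse_table(text):
    """Map each base character to its list of variant strings."""
    table = {}
    for line in _LINE_BREAK.split(text):
        line = line.split("#", 1)[0].strip()  # drop comment and surrounding spaces
        if not line:
            continue
        base, _, rest = line.partition("|")
        variants = [parse_string(v) for v in rest.split(":")] if rest else []
        table[parse_char(base)] = variants
    return table


example = ("# An example of a table\n"
           "U+2200\r\n"
           "U+2201|U+0043\r"
           "U+2237|U+003A-U+003A # Note that the variant is a string\n"
           "U+2202|U+0064:U+03B4 # Two variants for the same character\n")
table = parse_table(example)
```

Splitting on an explicit pattern, rather than using str.splitlines(), avoids treating other Unicode line separators as line ends, which the format does not permit.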


6. A Model Label Registration Procedure: "CreateBundle"

This procedure has three inputs:


1. the proposed base registration,

2. the language (or script, if the registration is script-based, but "language" is used for convenience below) for the proposed base registration, and


3. the processing table associated with that language.

The output of the process is either failure (the base registration cannot be registered at all), or a registration bundle that contains one or more labels (always including the base registration). As described earlier, the registration bundle should be stored with its date of creation so that issues with overlapping elements between bundles can later be resolved on a first-come, first-served basis.


There are two steps to processing the registration:


1. Check whether the proposed base registration exists in any bundle. If it does, stop immediately with a failure.


2. Process the base registration with the mechanism described as "CreateBundle" in Section 6.1, below.


Note that the process must be executed only once. The process must not be performed on any output of the process, only on the proposed base registration.


6.1. Description of the CreateBundle Mechanism

The CreateBundle mechanism determines whether a registration bundle can be created and, if so, populates that bundle with valid labels.


During the processing, a "temporary bundle" contains partial labels, that is, labels that are being built and are not complete labels. The partial labels in the temporary bundle consist of strings.


The steps are:


1. Split the base registration into individual characters, called "candidate characters". Compare every candidate character against the base characters in the table. If any candidate character does not exist in the set of base characters, the system must stop and not register any names (that is, it must not register either the base registration or any labels that would have come from character variants).


2. Perform the steps in IDNA's ToASCII sequence for the base registration. If ToASCII fails for the base registration, the system must stop and not register any label (that is, it must not register either the base registration or labels that might have been created from variants of characters contained in it). If ToASCII succeeds, place the base registration into the registration bundle.


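3. For every candidate character in the base registration, do the following. (Before the first candidate character is processed, the temporary bundle holds a single empty partial label.)

       o  Create the set of characters that consists of the candidate
          character and any variants.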


       o  For each character in the set from the previous step,
          duplicate the temporary bundle that resulted from the previous
          candidate character, and add the new character to the end of
          each partial label.

4. The temporary bundle now contains zero or more labels that consist of Unicode characters. For every label in the temporary bundle, do the following:


       o  Process the label with ToASCII to see if ToASCII succeeds.  If
          it does, add the label to the registration bundle.  Otherwise,
          do not process this label from the temporary bundle any
          further; it will not go into the registration bundle.

The result of the processing outlined above is the registration bundle with the base registration and possibly other labels.
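The following Python sketch follows the two-step registration procedure and the CreateBundle steps above. It assumes the processing table is a dictionary from each base character to a list of variant strings (as a reader for the Section 5 format might produce), and it uses ToASCII from Python's standard encodings.idna module, which implements the IDNA2003 ToASCII operation of RFC 3490 with the STD3 ASCII rules turned off; a registry may need stricter checks. Names such as create_bundle and register are illustrative.

```python
from encodings.idna import ToASCII


def to_ascii_ok(label):
    """True if IDNA2003 ToASCII succeeds for this label."""
    try:
        ToASCII(label)
    except UnicodeError:
        return False
    return True


def create_bundle(base, table):
    """Return the set of labels for a new registration bundle, or None on failure."""
    # Step 1: every candidate character must be a base character in the table.
    if any(ch not in table for ch in base):
        return None
    # Step 2: the base registration itself must pass ToASCII.
    if not to_ascii_ok(base):
        return None
    bundle = {base}
    # Step 3: grow the temporary bundle one candidate character at a time.
    temporary = [""]  # a single empty partial label before the first character
    for ch in base:
        choices = [ch] + list(table[ch])
        temporary = [partial + choice for partial in temporary for choice in choices]
    # Step 4: keep only the complete labels for which ToASCII succeeds.
    bundle.update(label for label in temporary if to_ascii_ok(label))
    return bundle


def register(base, table, labels_in_existing_bundles):
    """Reject a base already in some bundle; otherwise run CreateBundle exactly once."""
    if base in labels_in_existing_bundles:
        return None
    return create_bundle(base, table)
```

With a table in which every variant list is empty (the no-variants case of Section 6.2), the bundle contains only the base registration. Because the temporary bundle multiplies in size with each character that has variants, a production implementation would presumably want to bound it.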


6.2. The "no-variants" Case

It is clear that, for many scripts, registries will choose to create tables without variants, either because variants are clearly not necessary or because they are determined to cause more confusion and overhead than is justified by the circumstances. For those situations the table model of Section 5 becomes a trivial listing of base characters and only the first two steps of CreateBundle (verifying that all candidate characters are in the base ("valid") character list and verifying that the resulting characters will succeed in the ToASCII operation) are applicable. Even the second of those steps becomes pro forma if the advice in the next subsection is followed.


6.3. CreateBundle and Nameprep Mapping

One of the functions of Nameprep, and IDNA more generally, is to map a large number of Unicode characters (code points) into a smaller number to avoid a different but overlapping set of confusion problems. For example, when a non-ASCII script makes distinctions between "upper case" and "lower case", nameprep maps the upper case characters to the lower case ones in order to simulate the DNS protocol's rule that ASCII characters are interpreted in a case-insensitive way. Unicode also contains many code points that are typographic variants of each other (e.g., forms with different widths and code points that designate font variations for mathematical uses); the Unicode standard explicitly identifies them as such, and Nameprep maps them onto base characters.


While having these mapping functions available during lookup may be quite helpful to users who type equivalent forms, registrations are probably best performed in terms of the IDNA base characters only, i.e., those characters that nameprep will not change. This will have two advantages.


o Registrants will never find themselves in the rather confusing position of having submitted one string for registration and finding a different string in the registry database (which could otherwise occur even if the relevant language table does not contain variants).


o Those who are interested in what characters are permitted by a given registry will only need to examine the relevant tables, rather than simulating the IDNA algorithm to determine the result of processing particular characters.
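A minimal Python sketch of the check suggested above, using the nameprep function from Python's standard encodings.idna module (the IDNA2003 Nameprep profile). A proposed label is treated as registrable only if nameprep leaves it unchanged; labels that nameprep rejects outright are treated as not registrable. The function name is illustrative.

```python
from encodings.idna import nameprep


def is_nameprep_stable(label):
    """True if nameprep accepts the label and returns it unchanged."""
    try:
        return nameprep(label) == label
    except UnicodeError:  # prohibited characters or bidi violations
        return False


for candidate in ("montr\u00e9al",    # precomposed e-acute, lowercase: stable
                  "Montr\u00e9al",    # upper case M is mapped to m
                  "montre\u0301al",   # e + COMBINING ACUTE is composed by NFKC
                  "\uff41"):          # FULLWIDTH LATIN SMALL LETTER A maps to "a"
    print(repr(candidate), is_nameprep_stable(candidate))
```

Only the first candidate is stable; each of the others would be stored in a different form from the one the registrant submitted.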


7. IANA Considerations

Under ICANN (not IETF) direction and management, the IANA has created a registry for language variant tables. The authoritative documentation for that registry is in [IANA-language-registry]. Since the registry exists and is being managed under ICANN direction, the material that follows is a review of the theory of this registry, rather than new instructions for IANA.


As described above and suggested in the JET Guidelines, the registration rules generally require only that:


o The application be submitted or endorsed by a TLD registry, to ensure that someone cares about the particular table.


o The table be identified by the following:


* the name -- usually the top-level domain name -- of the submitting or endorsing registry;


* one of: a language designation (consistent with [RFC3066] or with some other system approved by the IANA), a script designation, a combination of the two, or a sequence number acceptable to IANA for this purpose;


* a version number; and


* a date.


o Characters listed in the table be identified by Unicode code points, as discussed above.


o The table format may correspond to that identified in [RFC3743], or in Section 5 above, or may be some variation on those themes appropriate to the local processing model (with or without variants).


This raises some issues that will need to be worked out as experiences accumulate. For example, more standardization of table formats would be desirable to allow processing by the same computer tools for different registries and languages. But standardization seems premature at this time due to differences in languages, processing, and requirements and lack of experience with them. Similarly, if a registry concludes that it should use a table that contains characters from several scripts, it is not clear how such a table should be designated. Identifying it with a language code (either according to [RFC3066] or an independent code registered with IANA) is likely to just introduce more confusion, especially given other Internet uses of the language codes. It appears that some other convention will be needed for those cases, and it should be developed (if it has not already been established by the time this document is published).

8. Internationalization Considerations

This document specifies a model mechanism for registering Internationalized Domain Names (IDNs) that can be used to reduce confusion among similar-appearing names. The proposal is designed to facilitate internationalization while permitting a balance between internationalization concerns and concerns about keeping the Internet global and domain name system references unique in the perception of the user as well as in practice.


9. Security Considerations

Registration of labels in the DNS that contain essentially unrestricted sequences of arbitrary Unicode characters may introduce opportunities for either attacks or simple confusion. Some of these risks, such as confusion about which character (of several that look alike) is actually intended, may be associated with the presentation form of DNS names. Others may be linked to databases associated with the DNS, e.g., with the difficulty of finding an entry in a "Whois file" when it is not clear how to enter or to search for the characters that make up a name. This document discusses a family of restrictions on the names that can be registered. Restrictions of the type described can be imposed by a DNS zone ("registry"). The document also describes some possible tools for implementing such restrictions.
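The simplest restriction of this kind is that every character of a label must appear in the zone's table. A minimal Python sketch follows, assuming the table is held as a set of integer code points and that labels have already been converted to Unicode and normalized (for example, by the Nameprep profile [RFC3491]). The function names `disallowed_characters` and `may_register` are hypothetical.

```python
def disallowed_characters(label: str, table: set[int]) -> list[str]:
    """Return, in order of first appearance, the characters of `label` not in `table`."""
    found = []
    for ch in label:
        if ord(ch) not in table and ch not in found:
            found.append(ch)
    return found


def may_register(label: str, table: set[int]) -> bool:
    # Registration is refused if any character falls outside the zone's table.
    return not disallowed_characters(label, table)
```

With a table built from the basic Latin letters, the digits, the hyphen, and "ä", "ö", "ü", "ß", the label "müller" is accepted. The label "bаch", whose second character is CYRILLIC SMALL LETTER A (U+0430), is refused, and that character is the one reported. Returning the offending characters, rather than a bare yes/no, makes it easier for the registry to explain the refusal to the registrant.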


While the increased number and variety of characters made available by Unicode considerably increase the scale of the potential problems, the problems addressed by this document are not new. No plausible set of restrictions will eliminate all problems and sources of confusion. For example, it has often been pointed out that, even in ASCII, the digit one ("1") and the lowercase letter L ("l") are easily confused in some display fonts. But to the degree that security can be aided by sensible risk reduction, these techniques may be helpful.
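One way to reduce the risk described above is to fold every label onto a "skeleton" using a look-alike map, and then refuse a new label whose skeleton matches that of a label already registered. The Python sketch below assumes a tiny, hypothetical map that includes the "1"/"l" pair mentioned above. The names `LOOKALIKES`, `skeleton`, and `Zone` are illustrative. A real map would be far larger and would be a matter of registry policy. The technique is similar in spirit to the variant blocking of [RFC3743], but it is not that algorithm.

```python
from typing import Optional

# Hypothetical look-alike map: each entry folds a character onto a representative.
LOOKALIKES = {
    "1": "l",        # DIGIT ONE vs LATIN SMALL LETTER L
    "0": "o",        # DIGIT ZERO vs LATIN SMALL LETTER O
    "\u0430": "a",   # CYRILLIC SMALL LETTER A vs LATIN SMALL LETTER A
    "\u043e": "o",   # CYRILLIC SMALL LETTER O vs LATIN SMALL LETTER O
}


def skeleton(label: str) -> str:
    """Fold a label onto representatives so that look-alike labels compare equal."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in label.lower())


class Zone:
    """Toy registry that blocks labels confusable with ones already registered."""

    def __init__(self) -> None:
        self._by_skeleton: dict = {}   # skeleton -> first label registered with it

    def conflict(self, label: str) -> Optional[str]:
        """Return the already-registered label that `label` resembles, if any."""
        return self._by_skeleton.get(skeleton(label))

    def register(self, label: str) -> bool:
        """Accept `label` unless its skeleton collides with a registered label."""
        if self.conflict(label) is not None:
            return False
        self._by_skeleton[skeleton(label)] = label
        return True
```

For example, after `Zone().register("example1")` succeeds, an attempt to register "examplel" is refused, and `conflict("examplel")` reports "example1". As noted above, no such map can remove every source of confusion. It only narrows the set of names that can be mistaken for one another.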


10. Acknowledgements

Discussions in the process of developing the JET Guidelines were vital in developing this document and all of the JET participants are consequently acknowledged. Attempts to explain some of the issues uncovered there to, and feedback from, Vint Cerf, Wendy Rickard, and members of the ICANN IDN Committee were also helpful in the thinking leading up to this document.

An effort by Paul Hoffman to create a generic specification for registration restrictions of this type helped to inspire this document, which takes a somewhat different, more language-oriented, approach than his initial draft. While the initial version of that draft indicated that multiple languages (or multiple language tables) for a single zone were infeasible, more recent versions [Hoffman-reg] shifted to inclusion of language-based approaches. The current version of this document incorporates considerable text, and even more ideas, from those drafts, with Paul Hoffman's generous permission.


Feedback was provided by several registry operators (of both country code and generic TLDs), including Edmon Chung and Ram Mohan of Afilias, and by ICANN and IANA staff, notably Tina Dam and Theresa Swinehart. This feedback about issues encountered in registering tables and designing IDN implementations resulted in the addition of significant clarifying text to the current version of the document.


The opinions expressed here are the sole responsibility of the author. Some of those whose ideas and comments are reflected in this document may disagree with the conclusions the author has drawn from them. The first draft version of this document was posted in June 2003.


11. Informative References

[Daniels] Daniels, P.T. and W. Bright, "The World's Writing Systems", Oxford University Press, Oxford, 1996.

[Drucker] Drucker, J., "The Alphabetic Labyrinth: The Letters in History and Imagination", 1995.


[Hoffman-reg] Hoffman, P., "A Method for Registering Internationalized Domain Names", Work in Progress, October 2003.


[IESG-IDN] Internet Engineering Steering Group, IETF, "IESG Statement on IDN", IESG Statement available from, February 2003.

[ICANN-IDN] Internet Corporation for Assigned Names and Numbers (ICANN), "Guidelines for the Implementation of Internationalized Domain Names, Version 1.0", June 2003.


[ICANN-IDN2] Internet Corporation for Assigned Names and Numbers (ICANN), "Guidelines for the Implementation of Internationalized Domain Names, Version 2.0", September 2005.


[IANA-language-registry] Internet Assigned Numbers Authority (IANA), "IDN Language Table Registry", April 2004.

[LTRU-Registry] Phillips, A., Ed. and M. Davis, Ed., "Tags for Identifying Languages", Work in Progress, October 2005.


[RFC952] Harrenstien, K., Stahl, M., and E. Feinler, "DoD Internet host table specification", RFC 952, October 1985.

[RFC1035] Mockapetris, P., "Domain names - implementation and specification", STD 13, RFC 1035, November 1987.

[RFC3066] Alvestrand, H., "Tags for the Identification of Languages", BCP 47, RFC 3066, January 2001.

[RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, "Internationalizing Domain Names in Applications (IDNA)", RFC 3490, March 2003.

[RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)", RFC 3491, March 2003.

[RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)", RFC 3492, March 2003.

[RFC3536] Hoffman, P., "Terminology Used in Internationalization in the IETF", RFC 3536, May 2003.

[RFC3743] Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint Engineering Team (JET) Guidelines for Internationalized Domain Names (IDN) Registration and Administration for Chinese, Japanese, and Korean", RFC 3743, April 2004.

[Unicode] The Unicode Consortium, "The Unicode Standard -- Version 3.0", January 2000.

[Unicode32] The Unicode Consortium, "Unicode Standard Annex #28: Unicode 3.2", March 2002.

Author's Address


John C Klensin
1770 Massachusetts Ave, #322
Cambridge, MA 02140
USA

Phone: +1 617 491 5735 EMail:

Full Copyright Statement


Copyright (C) The Internet Society (2005).


This document is subject to the rights, licenses and restrictions contained in BCP 78 and at, and except as set forth therein, the authors retain all their rights.


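This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.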

Intellectual Property


The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at


The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.



Funding for the RFC Editor function is currently provided by the Internet Society.
