RFC 9969 - Report from the IAB Workshop on AI-CONTROL 日本語訳

原文URL : https://www.rfc-editor.org/rfc/rfc9969.html
タイトル : RFC 9969 - AI-CONTROLに関するIABワークショップ報告
翻訳編集 : 自動生成

[要約] RFC 9969は、2024年9月に開催された「AI-CONTROL」に関するIABワークショップの報告書です。大規模言語モデル（LLM）などのAIトレーニング目的でのWebクロールに対し、コンテンツ所有者がオプトアウトや利用設定を示す既存の手法（robots.txtや埋め込みメタデータ等）の課題を整理し、今後の標準化に向けた方向性や提案をまとめています。

Internet Architecture Board (IAB)                          M. Nottingham
Request for Comments: 9969                                              
Category: Informational                                      S. Krishnan
ISSN: 2070-1721                                                 May 2026

Report from the IAB Workshop on AI-CONTROL

AI-CONTROLに関するIABワークショップ報告

Abstract

概要

The AI-CONTROL Workshop was convened by the Internet Architecture Board (IAB) in September 2024. This report summarizes its significant points of discussion and identifies topics that may warrant further consideration and work.

AI-CONTROL ワークショップは、2024 年 9 月にインターネットアーキテクチャ委員会 (IAB) によって開催されました。このレポートは、その重要な議論点を要約し、さらなる検討と作業が必要と思われるトピックを特定します。

Note that this document is a report on the proceedings of the workshop. The views and positions documented in this report are those of the workshop participants and do not necessarily reflect IAB views and positions.

なお、本書はワークショップの議事に関する報告書です。この報告書に記載されている見解と立場はワークショップ参加者のものであり、必ずしも IAB の見解と立場を反映しているわけではありません。

Status of This Memo

本文書の状態

This document is not an Internet Standards Track specification; it is published for informational purposes.

この文書は Internet Standards Track 仕様ではありません。情報提供を目的として公開されています。

This document is a product of the Internet Architecture Board (IAB) and represents information that the IAB has deemed valuable to provide for permanent record. It represents the consensus of the Internet Architecture Board (IAB). Documents approved for publication by the IAB are not candidates for any level of Internet Standard; see Section 2 of RFC 7841.

この文書はインターネットアーキテクチャ委員会 (IAB) の成果物であり、IAB が永続的な記録として提供することが価値があると判断した情報を表しています。これは Internet Architecture Board (IAB) のコンセンサスを表しています。IAB によって公開が承認された文書は、どのレベルのインターネット標準の候補でもありません。RFC 7841 のセクション 2 を参照してください。

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc9969.

この文書の現在のステータス、正誤表、およびそれに対するフィードバックの提供方法に関する情報は、https://www.rfc-editor.org/info/rfc9969 で入手できます。

Copyright Notice

著作権表示

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

この文書は、BCP 78 およびこの文書の発行日に有効な IETF 文書に関する IETF トラストの法的規定 (https://trustee.ietf.org/license-info) の対象となります。これらの文書には、この文書に関するお客様の権利と制限が記載されているため、注意深くお読みください。

   1.  Introduction
     1.1.  Chatham House Rule
     1.2.  Views Expressed in This Report
   2.  Workshop Scope and Discussion
     2.1.  Crawl Time vs. Inference Time
       2.1.1.  Multiple Uses for Crawl Data
       2.1.2.  Application of Preferences
     2.2.  Trust
     2.3.  Attachment
       2.3.1.  robots.txt (and Similar)
       2.3.2.  Embedding
       2.3.3.  Registries
     2.4.  Vocabulary
   3.  Conclusions
     3.1.  Potential Standards Work
       3.1.1.  Out of Initial Scope
   4.  IANA Considerations
   5.  Security Considerations
   6.  Informative References
   Appendix A.  About the Workshop
     A.1.  Agenda
       A.1.1.  Thursday, 2024-09-19
       A.1.2.  Friday, 2024-09-20
     A.2.  Attendees
   IAB Members at the Time of Approval
   Acknowledgements
   Authors' Addresses

1. Introduction

1. はじめに

The Internet Architecture Board (IAB) holds occasional workshops designed to consider long-term issues and strategies for the Internet, and to suggest future directions for the Internet architecture. This long-term planning function of the IAB is complementary to the ongoing engineering efforts performed by working groups of the Internet Engineering Task Force (IETF).

Internet Architecture Board (IAB) は、インターネットの長期的な問題と戦略を検討し、インターネットアーキテクチャの将来の方向性を提案することを目的としたワークショップを不定期に開催します。IAB のこの長期計画機能は、インターネットエンジニアリングタスクフォース (IETF) のワーキンググループによって実行されている継続的なエンジニアリングの取り組みを補完するものです。

The Internet is one of the major sources of data used to train Large Language Models (LLMs) (or, more generally, Artificial Intelligence (AI)). Because this use was not envisioned by most publishers of information on the Internet, a means of expressing the owners' preferences regarding AI crawling has emerged, sometimes backed by law (e.g., in the European Union's AI Act [AI-ACT]).

インターネットは、大規模言語モデル (LLM) (より一般的には人工知能 (AI)) のトレーニングに使用される主要なデータソースの 1 つです。このような使用はインターネット上の情報発行者のほとんどによって想定されていなかったため、AI クローリングに関する所有者の好みを表現する手段が登場し、場合によっては法律に裏付けられています (例: 欧州連合の AI 法 [AI-ACT])。

The IAB convened the AI-CONTROL Workshop on 19-20 September 2024 to "explore practical opt-out mechanisms for AI and build an understanding of use cases, requirements, and other considerations in this space" [CFP]. In particular, the emerging practice of using the Robots Exclusion Protocol [RFC9309] -- also known as "robots.txt" -- has not been coordinated between AI crawlers, resulting in considerable differences in how they treat it. Furthermore, robots.txt may or may not be a suitable way to control AI crawlers. However, discussion was not limited to consideration of robots.txt, and approaches other than opt-out were considered.

IAB は、「AI に対する実用的なオプトアウトメカニズムを検討し、この分野におけるユースケース、要件、その他の考慮事項についての理解を深める」ことを目的として [CFP]、2024 年 9 月 19 ～ 20 日に AI-CONTROL ワークショップを開催しました。特に、「robots.txt」としても知られる Robots Exclusion Protocol [RFC9309] を利用するという新たな慣行は、AI クローラー間で調整されておらず、クローラーごとにその扱い方が大きく異なっています。さらに、robots.txt が AI クローラーを制御する手段として適しているかどうかも定かではありません。ただし、議論は robots.txt の検討に限定されず、オプトアウト以外のアプローチも検討されました。

To ensure many viewpoints were represented, the program committee invited a broad selection of technical experts, AI vendors, content publishers, civil society advocates, and policymakers.

多くの視点を確実に反映するために、プログラム委員会は技術専門家、AI ベンダー、コンテンツ発行者、市民社会の擁護者、政策立案者から幅広く選ばれた人々を招待しました。

1.1. Chatham House Rule

1.1. チャタムハウスルール

Participants agreed to conduct the workshop under the Chatham House Rule [CHATHAM-HOUSE], so this report does not attribute statements to individuals or organizations without express permission. Most submissions to the workshop were public and thus attributable; they are used here to provide substance and context.

参加者はチャタムハウスルール [CHATHAM-HOUSE] に基づいてワークショップを実施することに同意したため、この報告書では、明示的な許可がない限り、発言を個人または組織に帰属させていません。ワークショップへの提出物のほとんどは公開されており、出典を示すことができます。本報告書では、内容と文脈を補うためにそれらを使用しています。

Appendix A.2 lists the workshop participants, unless they requested that this information be withheld.

付録 A.2 に、ワークショップの参加者を掲載しています。ただし、掲載しないよう求めた参加者は除きます。

1.2. Views Expressed in This Report

1.2. この報告書で表明されている見解

This document is a report on the proceedings of the workshop. The views and positions documented in this report are expressed during the workshop by participants and do not necessarily reflect the IAB's views and positions.

本書はワークショップの議事に関する報告書です。この報告書に記載された見解および立場は、ワークショップ中に参加者によって表明されたものであり、必ずしも IAB の見解および立場を反映するものではありません。

Furthermore, the content of the report comes from presentations given by workshop participants and notes taken during the discussions, without interpretation or validation. Thus, the content of this report follows the flow and dialog of the workshop but does not attempt to capture a consensus.

さらに、報告書の内容は、ワークショップ参加者によるプレゼンテーションとディスカッション中に取られたメモに基づくものであり、解釈や検証は加えていません。したがって、この報告書の内容はワークショップの流れと対話に沿ったものですが、コンセンサスを示そうとするものではありません。

2. Workshop Scope and Discussion

2. ワークショップの範囲とディスカッション

The workshop began by surveying the state of AI control.

ワークショップはAI制御の現状調査から始まりました。

Currently, Internet publishers express their preferences for how their content is treated for the purposes of AI training using a variety of mechanisms. These include declarative mechanisms, such as terms of service, embedded metadata, and robots.txt [RFC9309], as well as active mechanisms, such as use of paywalls and selective blocking of crawlers (e.g., by IP address or User-Agent).

現在、インターネットパブリッシャーは、さまざまなメカニズムを使用して、AI トレーニングの目的でコンテンツがどのように扱われるかについて好みを表明しています。これらには、利用規約、埋め込みメタデータ、robots.txt [RFC9309] などの宣言的メカニズムだけでなく、ペイウォールの使用やクローラーの選択的ブロック (IP アドレスやユーザーエージェントなどによる) などのアクティブなメカニズムも含まれます。
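As an illustration of the declarative robots.txt mechanism mentioned above, the following minimal sketch uses Python's standard `urllib.robotparser` module to evaluate a robots.txt file that excludes one crawler by User-Agent while allowing all others. The crawler names `ExampleAIBot` and `ExampleSearchBot` and the URL are hypothetical and do not come from the report. The sketch shows only how a cooperating crawler might read such a file; robots.txt does not itself enforce anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. "ExampleAIBot" is a made-up crawler
# name used only for illustration; it is not taken from the report.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt above permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    print("ExampleAIBot allowed:", is_allowed("ExampleAIBot", page))
    print("ExampleSearchBot allowed:", is_allowed("ExampleSearchBot", page))
```

Because the file is keyed on the crawler's name, a crawler whose data feeds several products receives a single answer for all of them, which is one way to see the "multiple uses" issue discussed in Section 2.1.1.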

There was disagreement about the implications of AI opt-out overall. Research presented at the workshop [DECLINE] indicates that the use of such controls is becoming more prevalent, reducing the availability of data to AI (for purposes including training and inference-time usage). Some of the participants expressed concern about the implications of this -- although at least one AI vendor seemed less concerned by this, indicating that "there are plenty of tokens available" for training, even if many opt out. Others expressed a need to opt out of AI training because of how they perceive its effects on their control over content, seeing AI as usurping their relationships with customers and a potential threat to whole industries.

AI オプトアウト全般がもたらす影響については意見が分かれました。ワークショップで発表された研究 [DECLINE] によると、こうした制御の利用は広がりつつあり、AI が (トレーニングや推論時の利用などの目的で) 利用できるデータは減少しています。参加者の中にはこの影響に懸念を表明する人もいましたが、少なくとも 1 社の AI ベンダーはそれほど懸念していない様子で、多くがオプトアウトしたとしてもトレーニングに「利用できるトークンは十分にある」と述べました。一方、AI トレーニングがコンテンツに対する自らのコントロールに及ぼす影響への懸念から、オプトアウトの必要性を訴える参加者もいました。こうした参加者は、AI が顧客との関係を奪うものであり、業界全体に対する潜在的な脅威であると見ています。

However, there was quick agreement that both viewpoints were harmed by the current state of AI opt-out -- a situation where "no one is better off" (in the words of one participant).

しかし、どちらの立場も AI オプトアウトの現状によって損なわれているという点では、すぐに意見が一致しました。ある参加者の言葉を借りれば、「誰も得をしていない」状況です。

Much of that dysfunction was attributed to the lack of coordination and standards for AI opt-out. Currently, content publishers need to consult with each AI vendor to understand how to opt out of training their products, as there is significant variance in each vendor's behavior. Furthermore, publishers need to continually monitor both new vendors and policy updates from the vendors they are aware of.

その機能不全の多くは、AI オプトアウトに関する調整と標準の欠如に起因するとされました。現在、コンテンツ発行者は、各 AI ベンダーの製品のトレーニングからオプトアウトする方法を把握するために、ベンダーごとに確認する必要があります。これは、ベンダーによって動作が大きく異なるためです。さらに、発行者は、新たに登場するベンダーと、すでに把握しているベンダーのポリシー更新の両方を継続的に監視する必要があります。

Underlying those immediate issues, however, are significant constraints that could be attributed to uncertainties in the legal context, the nature of AI, and the implications of needing to opt out of crawling for it.

しかし、これらの差し迫った問題の根底には、法的文脈、AI の性質、および AI のクロールをオプトアウトする必要があることの影響における不確実性が原因と考えられる重大な制約があります。

2.1. Crawl Time vs. Inference Time

2.1. クロール時と推論時

Perhaps most significant is the "crawl time vs. inference time" problem. Statements of preference are apparent at crawl time, bound to content either by location (e.g., robots.txt) or embedded inside the content itself as metadata. However, the target of those directives is often disassociated from the crawler, either because the crawl data is not only used for training AI models or because the preferences could be applicable at inference time.

おそらく最も重要なのは、「クロール時と推論時」の問題です。プリファレンスの表明はクロール時に明らかになり、場所 (robots.txt など) によってコンテンツに結び付けられるか、メタデータとしてコンテンツ自体に埋め込まれます。しかし、これらの指示の対象は、クローラーから切り離されていることが少なくありません。クロールデータが AI モデルのトレーニング以外にも使用されるため、あるいはプリファレンスが推論時に適用されうるためです。

2.1.1. Multiple Uses for Crawl Data

2.1.1. クロールデータの複数の用途

A crawl's data might have multiple uses because the vendor also has another product that uses it (e.g., a search engine) or because the crawl is performed by a party other than the AI vendor. Both are very common patterns: Operators of many Internet search engines also train AI models, and many AI models use third-party crawl data. In either case, conflating different uses can change the incentives for publishers to cooperate with the crawler.

クロールのデータは、ベンダーがそれを使用する別の製品 (検索エンジンなど) も持っているか、クロールが AI ベンダー以外の当事者によって実行されるため、複数の用途がある可能性があります。どちらも非常に一般的なパターンです。多くのインターネット検索エンジンのオペレーターも AI モデルをトレーニングしており、多くの AI モデルはサードパーティのクロールデータを使用します。いずれの場合も、異なる用途を混同すると、パブリッシャーがクローラーと協力するインセンティブが変化する可能性があります。

Well-established uses of crawling, such as Internet searches, were seen by participants as at least partially aligned with the interests of publishers: They allow their sites to be crawled, and in return, they receive higher traffic and attention due to being in the search index. However, several participants pointed out that this symbiotic relationship does not exist for AI training uses -- with some viewing AI as hostile to publishers because it has the capacity to take traffic away from their sites.

インターネット検索など、確立されたクローリングの用途は、少なくとも部分的にはパブリッシャーの利益と一致していると参加者は見ていました。パブリッシャーはサイトのクロールを許可し、その見返りとして、検索インデックスに登録されることでより多くのトラフィックと注目を得られます。しかし、数名の参加者は、この共生関係は AI トレーニングの用途には存在しないと指摘しました。AI にはパブリッシャーのサイトからトラフィックを奪う力があるため、AI をパブリッシャーに敵対的なものと見なす参加者もいました。

Therefore, when a crawler has multiple uses that include AI, participants observed that "collateral damage" was likely for non-AI uses, especially when publishers take more active control measures, such as blocking or paywalls, to protect their interests.

したがって、クローラーが AI を含む複数の用途に使用されている場合、特にパブリッシャーが自社の利益を保護するためにブロックやペイウォールなどのより積極的な管理措置を講じている場合には、AI 以外の用途でも「巻き添え被害」が発生する可能性が高いと参加者は観察しました。

Several participants expressed concerns about this phenomenon's effects on the ecosystem, effectively "locking down the Web" with one opining that there were implications for freedom of expression overall.

何人かの参加者は、この現象が事実上「Web をロックダウンする」ことになるとして、エコシステムへの影響に懸念を表明しました。表現の自由全体に影響が及ぶとの意見を述べた参加者も 1 人いました。

2.1.2. Application of Preferences

2.1.2. プリファレンスの適用

When data is used to train an LLM, the resulting model does not have the ability to only selectively use a portion of it when performing a task because inference uses the whole model, and it is not possible to identify specific input data for its use in doing so.

データを使用して LLM をトレーニングする場合、推論ではモデル全体が使用され、その際に使用する特定の入力データを識別することができないため、結果のモデルにはタスクの実行時にその一部のみを選択的に使用する機能がありません。

This means that while publishers' preferences may be available when content is crawled, they generally are not when inference takes place. Those preferences that are stated in reference to use by AI -- for example, "no military uses" or "non-commercial only" -- cannot be applied by a general-purpose "foundation" model.

これは、パブリッシャーのプリファレンスはコンテンツのクロール時には利用できるとしても、推論時には通常利用できないことを意味します。AI による利用に関して表明されたプリファレンス (たとえば、「軍事利用禁止」や「非商用のみ」) は、汎用の「基盤」モデルでは適用できません。

This leaves a few unappealing choices to AI vendors that wish to comply with those preferences. They can simply omit such data from foundation models, thereby reducing their viability. Or they can create a separate model for each permutation of preferences -- with a likely proliferation of models as the set of permutations expands.

このため、これらのプリファレンスに従おうとする AI ベンダーには、魅力に乏しい選択肢しか残されていません。1 つは、そのようなデータを基盤モデルから単に除外することですが、それによってモデルの実用性が低下します。もう 1 つは、プリファレンスの組み合わせごとに個別のモデルを作成することですが、組み合わせが増えるにつれてモデルが乱立する可能性が高くなります。
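To make the proliferation concern concrete, the sketch below counts how many distinct models would be needed if each preference were an independent yes/no restriction and a separate model were built for every combination. The preference names are hypothetical; the report does not define a vocabulary, and real preferences need not be independent or binary.

```python
from itertools import product

# Hypothetical yes/no preferences, invented for illustration only.
PREFERENCES = ["no-military-use", "non-commercial-only", "no-image-generation"]


def preference_combinations(names):
    """Enumerate every on/off combination of the given preference names."""
    return [
        dict(zip(names, flags))
        for flags in product([False, True], repeat=len(names))
    ]


if __name__ == "__main__":
    # With n independent binary preferences there are 2 ** n combinations,
    # so each added preference doubles the number of models required.
    for count in range(1, len(PREFERENCES) + 1):
        subset = PREFERENCES[:count]
        print(count, "preferences ->", len(preference_combinations(subset)), "models")
```

Under these assumptions the count doubles with each additional preference, which is the growth the paragraph above describes.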

Compounding this issue was the observation that preferences change over time, whereas LLMs are created over long time frames and cannot easily be updated to reflect those changes. Of particular concern to some was how this makes an opt-out regime "stickier" because content that has no associated preference (such as that which predates the authors' knowledge of LLMs) is allowed to be used for these unforeseen purposes.

この問題をさらに複雑にしているのが、プリファレンスは時間の経過とともに変化する一方で、LLM は長い期間をかけて作成され、その変化を反映するように簡単には更新できないという指摘でした。一部の参加者が特に懸念したのは、これによってオプトアウト制度の「粘着性」が高まることです。プリファレンスが関連付けられていないコンテンツ (著者が LLM を知る以前に作成されたものなど) は、こうした予期されていなかった目的での利用が許されてしまうためです。

2.2. Trust

2.2. 信頼

Participants felt that the disconnection between the statement of preferences and its application contribute to a lack of trust in the ecosystem, along with the typical lack of attribution for data sources in LLMs, a lack of an incentive for publishers to contribute data, and finally (and most noted) a lack of any means of monitoring compliance with preferences.

参加者は、プリファレンスの記述とその適用との間の断絶が、LLM におけるデータソースの帰属の典型的な欠如、パブリッシャーがデータを提供するインセンティブの欠如、そして最後に (そして最も注目された) プリファレンスの遵守を監視する手段の欠如とともに、エコシステムへの信頼の欠如の一因となっていると感じました。

This lack of trust led some participants to question whether communicating preferences is sufficient in all cases without an accompanying way to enforce them, or even to audit adherence to them. Some participants also indicated that a lack of trust was the primary cause of the increasingly prevalent blocking of AI crawler IP addresses, among other measures.

この信頼の欠如により、一部の参加者は、好みを強制したり、好みの順守を監査したりする付随的な方法なしで、あらゆる場合に好みを伝えるだけで十分なのかと疑問を抱きました。一部の参加者は、AI クローラーの IP アドレスのブロックなどがますます広まっている主な原因は信頼の欠如であると指摘しました。

2.3. Attachment

2.3. アタッチメント

One of the primary focuses of the workshop was on _attachment_, i.e., how preferences are associated with content on the Internet. A range of mechanisms was discussed.

ワークショップの主な焦点の 1 つは、_アタッチメント_ (attachment)、つまり、プリファレンスをインターネット上のコンテンツにどのように関連付けるかでした。さまざまなメカニズムが議論されました。

2.3.1. robots.txt (and Similar)

2.3.1. robots.txt (および類似のもの)

The Robots Exclusion Protocol [RFC9309] is widely recognized by AI vendors as an attachment mechanism for preferences. Several deficiencies were discussed.

Robots Exclusion Protocol [RFC9309] は、プリファレンスのアタッチメントメカニズムとして AI ベンダーに広く認知されています。いくつかの欠点が議論されました。

First, it does not scale to offer granular control over large sites where authors might want to express different policies for a range of content (for example, YouTube).

まず、コンテンツごとに異なるポリシーを著者が表明したいと考えるような大規模サイト (YouTube など) では、きめ細かな制御を提供できるほどにはスケールしません。

robots.txt is also typically under the control of the site administrator. If a site has content from many creators (as is often the case for social media and similar platforms), the administrator may not allow them to express their preferences fully, or at all.

また、robots.txt は通常、サイト管理者の管理下にあります。サイトに多数の作成者によるコンテンツが含まれている場合 (ソーシャルメディアや同様のプラットフォームではよくあることです)、管理者は、作成者がプリファレンスを完全に表明することを許可しないか、まったく許可しない可能性があります。

If content is copied or moved to a different site, the preferences at the new site need to be explicitly transferred because robots.txt is a separate resource.

robots.txt はコンテンツとは別のリソースであるため、コンテンツを別のサイトにコピーまたは移動する場合は、プリファレンスを新しいサイトに明示的に引き継ぐ必要があります。

These deficiencies led many participants to feel that robots.txt cannot be the only solution to opt-out: Rather, it should be part of a larger system that addresses its shortcomings.

これらの欠陥により、多くの参加者は、robots.txt がオプトアウトの唯一の解決策ではない、むしろ、robots.txt はその欠陥に対処するより大きなシステムの一部であるべきであると感じました。

Participants noted that other similar attachment mechanisms have been proposed. However, none appear to have gained as much attention or implementation (both by AI vendors and content owners) as robots.txt.

参加者は、他にも同様のアタッチメントメカニズムが提案されていることを指摘しました。しかし、robots.txt ほど (AI ベンダーとコンテンツ所有者の双方から) 注目を集め、実装されているものはないようです。

2.3.2. Embedding

2.3.2. 埋め込み

Another mechanism for associating preferences with content is to embed them into the content itself. Many formats used on the Internet allow this; for example, HTML has the <meta> tag, images have Extensible Metadata Platform (XMP) and similar metadata sections, and XML and JSON have rich potential for extensions to carry such data.

プリファレンスをコンテンツに関連付けるもう 1 つのメカニズムは、コンテンツ自体に埋め込むことです。インターネットで使用される多くの形式ではこれが可能です。たとえば、HTML には <meta> タグがあり、画像には Extensible Metadata Platform (XMP) や類似のメタデータセクションがあり、XML と JSON にはそのようなデータを運ぶための拡張の余地が大いにあります。
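The HTML case can be sketched with Python's standard `html.parser` module. The `<meta>` name `example-ai-preference` and the value `no-train` are hypothetical placeholders; the report does not specify any particular name or vocabulary, and an actual embedding in HTML would need to be defined with the body that maintains that format.

```python
from html.parser import HTMLParser

# Hypothetical <meta> name, invented for this sketch.
META_NAME = "example-ai-preference"


class PreferenceExtractor(HTMLParser):
    """Collect the content of every <meta> tag whose name is META_NAME."""

    def __init__(self):
        super().__init__()
        self.preferences = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values may be None when written without "=value".
        if (attributes.get("name") or "").lower() == META_NAME:
            self.preferences.append(attributes.get("content") or "")


def extract_preferences(html: str) -> list[str]:
    """Return the embedded preference values found in an HTML document."""
    parser = PreferenceExtractor()
    parser.feed(html)
    parser.close()
    return parser.preferences


if __name__ == "__main__":
    document = (
        '<html><head><meta name="example-ai-preference" content="no-train">'
        "</head><body>Hello</body></html>"
    )
    print(extract_preferences(document))
```

A page that has had its `<meta>` tags removed yields an empty list, so a consumer cannot distinguish "no preference stated" from "preference stripped".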

Embedded preferences were seen to have the advantage of granularity, and of "traveling with" content as it is produced, when the content that embeds the preferences is moved from site to site or when it is stored offline.

埋め込まれたプリファレンスには、粒度が細かいこと、そして作成時からコンテンツに「付随して移動する」ことという利点があると考えられました。後者は、プリファレンスを埋め込んだコンテンツがサイト間で移動される場合やオフラインで保存される場合に当てはまります。

However, several participants pointed out that embedded preferences are easily stripped from most formats. This is a common practice for reducing the size of a file (thereby improving performance when downloading it) and for assuring privacy (since metadata often leaks information unintentionally).

しかし、何人かの参加者は、埋め込まれた設定はほとんどの形式から簡単に削除されてしまうと指摘しました。これは、ファイルのサイズを削減し (これによりダウンロード時のパフォーマンスが向上します)、プライバシーを確保する (メタデータから意図せず情報が漏洩することが多いため) ための一般的な方法です。

Furthermore, some types of content are not suitable for embedding. For example, it is not possible to embed preferences into purely textual content, and web pages with content from several producers (such as a social media or comment feeds) cannot easily reflect preferences for each one.

さらに、一部の種類のコンテンツは埋め込みに適していません。たとえば、純粋にテキストのコンテンツに設定を埋め込むことはできません。また、複数のプロデューサーからのコンテンツ (ソーシャルメディアやコメントフィードなど) を含む Web ページに、それぞれの設定を簡単に反映することはできません。

Participants noted that the means of embedding preferences in many formats would need to be determined by or coordinated with organizations outside the IETF. For example, HTML and many image formats are maintained by external bodies.

参加者は、多くの形式にプリファレンスを埋め込む手段は、IETF 外部の組織によって決定されるか、IETF 外部の組織と調整される必要があると指摘しました。たとえば、HTML や多くの画像形式は外部団体によって維持されています。

2.3.3. Registries

2.3.3. レジストリ

In some existing copyright management regimes, it is already common to have a registry of works that is consulted upon use. For example, this approach is often used for photographs, music, and video.

一部の既存の著作権管理体制では、使用時に参照される作品のレジストリを作成することがすでに一般的です。たとえば、このアプローチは写真、音楽、ビデオによく使用されます。

Typically, registries use hashing mechanisms to create a "fingerprint" for the content that is robust to changes.

通常、レジストリはハッシュメカニズムを使用して、変更に対して堅牢なコンテンツの「フィンガープリント」を作成します。
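The registry lookup described above can be sketched roughly as follows. This is a deliberately simplified stand-in: it normalizes text (whitespace and case) before applying SHA-256, so the fingerprint survives only those specific changes. The registries the report refers to are said to use fingerprints that are robust to broader changes, which this sketch does not attempt. The class name, method names, and the `no-train` value are all hypothetical.

```python
import hashlib
import re
from typing import Optional


def fingerprint(text: str) -> str:
    """Hash a normalized form of text so trivial edits map to the same key.

    Only whitespace and letter-case differences are normalized away here;
    any other change produces a different fingerprint.
    """
    normalized = re.sub(r"\s+", " ", text).strip().casefold()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class PreferenceRegistry:
    """In-memory registry mapping content fingerprints to a preference."""

    def __init__(self):
        self._entries = {}

    def register(self, content: str, preference: str) -> None:
        self._entries[fingerprint(content)] = preference

    def lookup(self, content: str) -> Optional[str]:
        # Returns None when the content has no registered preference.
        return self._entries.get(fingerprint(content))


if __name__ == "__main__":
    registry = PreferenceRegistry()
    registry.register("Hello,   World\n", "no-train")
    print(registry.lookup("hello, world"))
    print(registry.lookup("Hello, there"))
```

Because the key is derived from the content rather than from its URL, the lookup works regardless of where a copy of the content is found.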

Using a registry decouples the content in question from its location so that it can be found even if moved. It is also claimed to be robust against stripping of embedded metadata, which is a common practice to improve performance and/or privacy.

レジストリを使用すると、問題のコンテンツがその場所から切り離され、移動された場合でも見つけられるようになります。また、パフォーマンスやプライバシーを向上させるために一般的に行われている、埋め込みメタデータの削除に対して堅牢であるとも主張されています。

However, several participants pointed out issues with deploying registries at the scale of the Internet. While they may be effective for (relatively) closed and well-known ecosystems, such as commercial music publishing, applying them to a diverse and very large ecosystem like the Internet has proven problematic.

しかし、何人かの参加者は、インターネット規模でのレジストリの導入に関する問題を指摘しました。これらは、商業音楽出版など、（比較的）閉鎖的でよく知られたエコシステムには効果的かもしれませんが、インターネットのような多様で非常に大規模なエコシステムに適用すると、問題があることが判明しています。

2.4. Vocabulary

2.4. 語彙

Another major focus area for the workshop was on _vocabulary_ -- the specific semantics of the opt-out signal. Several participants noted that there are already many proposals for vocabularies, as well as many conflicting vocabularies already in use. Several examples were discussed, including where existing terms were ambiguous, did not address common use cases, or were used in conflicting ways by different actors.

ワークショップのもう 1 つの主要な焦点は、_語彙_ (vocabulary)、つまりオプトアウトシグナルの具体的なセマンティクスでした。何人かの参加者は、語彙についてはすでに多くの提案があり、互いに矛盾する語彙もすでに多数使用されていると指摘しました。既存の用語が曖昧である場合、一般的なユースケースに対応していない場合、異なる主体によって矛盾した方法で使用されている場合など、いくつかの例が議論されました。

Although no conclusions regarding exact vocabulary were reached, it was generally agreed that a complex vocabulary is unlikely to succeed.

正確な語彙については結論が出ませんでしたが、複雑な語彙は成功する可能性が低いということで一般的に同意されました。

3. Conclusions

3. 結論

Participants generally agreed that on its current path, the ecosystem is not sustainable. As one remarked, "robots.txt is broken and we broke it".

参加者はおおむね、現在の方向のままではエコシステムは持続可能ではないという点で一致しました。ある参加者は、「robots.txt は壊れている。壊したのは私たちだ」と述べました。

Legal uncertainty, along with fundamental limitations of opt-out regimes pointed out above, limit the effectiveness of any technical solution, which will be operating in a system unlike either robots.txt (where there is a symbiotic relationship between content owners and the crawlers) or copyright (where the default is effectively opt-in, not opt-out).

法的不確実性は、上で指摘したオプトアウト制度の基本的な制限に加えて、robots.txt (コンテンツ所有者とクローラーの間に共生関係がある) や著作権 (デフォルトがオプトアウトではなく事実上オプトインである) とは異なるシステムで運用される技術的ソリューションの有効性を制限します。

However, the workshop ended with general agreement that positive steps could be taken to improve the communication of preferences from content owners for AI use cases. In discussion, it was evident that the discovery of preferences from multiple attachment mechanisms is necessary to meet the diverse needs of content authors and, therefore, that defining how they are combined is important.

しかし、ワークショップは、AI のユースケースに関するコンテンツ所有者のプリファレンスの伝達を改善するために前向きな措置を講じられるという大筋の合意をもって終了しました。議論の中で、コンテンツ著者の多様なニーズを満たすには、複数のアタッチメントメカニズムからプリファレンスを発見できることが必要であり、したがって、それらをどのように組み合わせるかを定義することが重要であることが明らかになりました。

We outline a proposed standard program below.

以下に、提案する標準化プログラムの概要を示します。

3.1. Potential Standards Work

3.1. 想定される標準化作業

The following items were identified as good starting points for IETF work:

以下の項目が IETF の作業の適切な開始点として特定されました。

* Attachment to websites by location (in robots.txt or a similar mechanism)

* 場所による Web サイトへのアタッチメント (robots.txt または同様のメカニズム)

* Attachment via embedding in IETF-controlled formats (e.g., HTTP headers)

* IETF が管理する形式 (例: HTTP ヘッダー) への埋め込みによるアタッチメント

* Definition of a common core vocabulary

* 共通のコア語彙の定義

* Definition of the overall regime, e.g., how to combine preferences discovered from multiple attachment mechanisms

* 全体的な体制の定義（複数のアタッチメントメカニズムから発見された好みをどのように組み合わせるかなど）
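The last item, combining preferences discovered from multiple attachment mechanisms, can be illustrated with a small sketch. The workshop did not settle on any combination rule; the "an explicit disallow from any source wins" policy below is purely an assumption chosen for illustration, as are the source labels and the allow/disallow/unstated vocabulary.

```python
from typing import Optional


def combine(signals: dict[str, Optional[bool]]) -> Optional[bool]:
    """Combine per-mechanism signals into one answer.

    Each value is True (AI training allowed), False (disallowed), or
    None (that mechanism stated nothing). Assumed policy, not from the
    report: any explicit disallow wins; otherwise an explicit allow
    applies; with no signals at all the result is None.
    """
    stated = [value for value in signals.values() if value is not None]
    if not stated:
        return None
    return all(stated)


if __name__ == "__main__":
    # Hypothetical source labels for three attachment mechanisms.
    discovered = {
        "robots.txt": True,
        "http-header": False,
        "embedded-metadata": None,
    }
    print(combine(discovered))
```

Other policies are equally conceivable, for example letting the most specific mechanism override the others, which is why the report lists the definition of the overall regime as a work item of its own.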

It would be expected that the IETF would coordinate with other Standards Development Organizations (SDOs) to define embedding in other formats (e.g., HTML).

IETF が他の標準開発機関 (SDO) と連携して、他の形式 (HTML など) での埋め込みを定義することが期待されます。

3.1.1. Out of Initial Scope

3.1.1. 初期範囲外

It was broadly agreed that it would not be useful to work on the following items, at least to begin with:

少なくとも当初は、次の項目に取り組むことは有益ではないという点で広く合意されました。

* Enforcement mechanisms for preferences

* プリファレンスの強制メカニズム

* Registry-based solutions

* レジストリベースのソリューション

* Identifying or authenticating crawlers and/or content owners

* クローラーおよび/またはコンテンツ所有者の識別または認証

* Audit or transparency mechanisms

* 監査または透明性のメカニズム

4. IANA Considerations

4. IANAの考慮事項

This document has no IANA actions.

この文書には IANA のアクションはありません。

5. Security Considerations

5. セキュリティに関する考慮事項

This document is a workshop report and does not impact the security of the Internet.

この文書はワークショップの報告書であり、インターネットのセキュリティには影響しません。

6. Informative References

6. 参考文献

   [AI-ACT]   European Parliament, "Regulation (EU) 2024/1689 of the
              European Parliament and of the Council of 13 June 2024
              laying down harmonised rules on artificial intelligence
              and amending Regulations (EC) No 300/2008, (EU) No
              167/2013, (EU) No 168/2013, (EU) 2018/858, (EU) 2018/1139
              and (EU) 2019/2144 and Directives 2014/90/EU, (EU)
              2016/797 and (EU) 2020/1828 (Artificial Intelligence Act)
              (Text with EEA relevance)", 13 June 2024,
              <https://eur-lex.europa.eu/eli/reg/2024/1689/oj>.

   [CFP]      Internet Architecture Board, "IAB Workshop on AI-CONTROL",
              <https://datatracker.ietf.org/group/aicontrolws/about/>.

   [CHATHAM-HOUSE]
              Chatham House, "Chatham House Rule",
              <https://www.chathamhouse.org/about-us/chatham-house-
              rule>.

   [DECLINE]  Longpre, S., Mahari, R., Lee, A., and C. Lund, "Consent in
              Crisis: The Rapid Decline of the AI Data Commons", 2025,
              <https://www.ietf.org/slides/slides-aicontrolws-consent-
              in-crisis-the-rapid-decline-of-the-ai-data-commons-
              00.pdf>.

   [PAPERS]   Internet Architecture Board, "IAB Workshop on AI-CONTROL
              Materials",
              <https://datatracker.ietf.org/group/aicontrolws/
              materials/>.

   [RFC9309]  Koster, M., Illyes, G., Zeller, H., and L. Sassman,
              "Robots Exclusion Protocol", RFC 9309,
              DOI 10.17487/RFC9309, September 2022,
              <https://www.rfc-editor.org/info/rfc9309>.

Appendix A. About the Workshop

付録A. ワークショップについて

The AI-CONTROL Workshop was held on 2024-09-19 and 2024-09-20 at Wilkinson Barker Knauer in Washington, D.C., USA.

AI-CONTROL ワークショップは、2024 年 9 月 19 日と 2024 年 9 月 20 日に米国ワシントン D.C. の Wilkinson Barker Knauer で開催されました。

Workshop attendees were asked to submit position papers. These papers are published on the IAB website [PAPERS], unless the submitter requested it be withheld.

ワークショップの参加者にはポジションペーパーの提出が求められました。これらのペーパーは、提出者が非公開を希望した場合を除き、IAB の Web サイト [PAPERS] で公開されています。

The workshop was conducted under the Chatham House Rule [CHATHAM-HOUSE], meaning that statements cannot be attributed to individuals or organizations without explicit authorization.

このワークショップは、チャタムハウスルール [CHATHAM-HOUSE] に基づいて実施されました。これは、明示的な許可なしに発言を個人または組織に帰属させることはできないことを意味します。

A.1. Agenda

A.1. 議題

This section outlines the broad areas of discussion on each day.

このセクションでは、各日の大まかな議論の領域を示します。

A.1.1. Thursday, 2024-09-19

A.1.1. 2024 年 9 月 19 日木曜日

Setting the stage:

舞台設定:

An overview of the current state of AI opt-out, its impact, and existing work in this space

AI オプトアウトの現状、その影響、この分野における既存の取り組みの概要

Lightning talks:

ライトニングトーク:

A variety of perspectives from participants

参加者からのさまざまな視点

A.1.2. Friday, 2024-09-20

A.1.2. 2024 年 9 月 20 日金曜日

Opt-Out Attachment: robots.txt and beyond:

オプトアウトのアタッチメント: robots.txt とその先:

Considerations in how preferences are attached to content on the Internet

インターネット上のコンテンツに設定を付加する方法に関する考慮事項

Vocabulary: what opt-out means:

語彙: オプトアウトの意味:

What information the opt-out signal needs to convey

オプトアウト信号で伝える必要がある情報

Discussion and wrap-up:

ディスカッションとまとめ:

Synthesis of the workshop's topics and how future work might unfold

ワークショップのトピックの総合と将来の作業の展開

A.2. Attendees

A.2. 出席者

Attendees of the workshop are listed with their primary affiliation. Attendees from the program committee (PC) and the Internet Architecture Board (IAB) are also marked.

ワークショップの参加者を、主な所属とともに記載します。プログラム委員会 (PC) およびインターネットアーキテクチャ委員会 (IAB) からの参加者には、その旨を付記しています。

* Jari Arkko, Ericsson

* ヤリ・アークコ、エリクソン

* Hirochika Asai, Preferred Networks

* Hirochika Asai、Preferred Networks

* Farzaneh Badiei, Digital Medusa (PC)

* ファルザネ・バディエイ、デジタルメドゥーサ (PC)

* Fabrice Canel, Microsoft (PC)

* ファブリス・カネル、マイクロソフト (PC)

* Lena Cohen, EFF

* レナ・コーエン、EFF

* Alissa Cooper, Knight-Georgetown Institute (PC, IAB)

* Alissa Cooper、ナイトジョージタウン研究所 (PC、IAB)

* Marwan Fayed, Cloudflare

* マーワン・フェイド、クラウドフレア

* Christopher Flammang, Elsevier

* クリストファー・フラマン、エルゼビア

* Carl Gahnberg

* カール・ガーンバーグ

* Max Gendler, The News Corporation

* マックス・ジェンドラー、ニューズ・コーポレーション

* Ted Hardie

* テッド・ハーディ

* Dominique Hazaël-Massieux, W3C

* ドミニク・アザエル＝マシュー、W3C

* Gary Ilyes, Google (PC)

* ゲイリー・アイリーズ、Google (PC)

* Sarah Jennings, UK Department for Science, Innovation and Technology

* Sarah Jennings、英国科学・イノベーション・技術省

* Paul Keller, Open Future

* ポール・ケラー、Open Future

* Elizabeth Kendall, Meta

* エリザベス・ケンダル、メタ

* Suresh Krishnan, Cisco (PC, IAB)

* スレシュ・クリシュナン、シスコ (PC、IAB)

* Mirja Kühlewind, Ericsson (PC, IAB)

* ミルヤ・キューレヴィント、エリクソン (PC、IAB)

* Greg Leppert, Berkman Klein Center

* グレッグ・レパート、バークマン・クライン・センター

* Greg Lindahl, Common Crawl Foundation

* グレッグ・リンダール、コモン・クロール財団

* Mike Linksvayer, GitHub

* マイク・リンクスヴェイヤー、GitHub

* Fred von Lohmann, OpenAI

* フレッド・フォン・ローマン、OpenAI

* Shayne Longpre, Data Provenance Initiative

* Shayne Longpre、データ出所イニシアチブ

* Don Marti, Raptive

* ドン・マルティ、ラプティブ

* Sarah McKenna, Alliance for Responsible Data Collection; Sequentum

* サラ・マッケンナ、Alliance for Responsible Data Collection; Sequentum

* Eric Null, Center for Democracy and Technology

* エリック・ヌル、民主主義とテクノロジーセンター

* Chris Needham, BBC

* クリス・ニーダム、BBC

* Mark Nottingham, Cloudflare (PC)

* マーク・ノッティンガム、クラウドフレア (PC)

* Paul Ohm, Georgetown Law (PC)

* ポール・オーム、Georgetown Law (PC)

* Braxton Perkins, NBC Universal

* ブラクストン・パーキンス、NBC ユニバーサル

* Chris Petrillo, Wikimedia

* クリス・ペトリロ、ウィキメディア

* Sebastian Posth, Liccium

* セバスチャン・ポスト、リシウム

* Michael Prorock

* マイケル・プロロック

* Matt Rogerson, Financial Times

* マット・ロジャーソン、フィナンシャル・タイムズ

* Peter Santhanam, IBM

* ピーター・サンタナム、IBM

* Jeffrey Sedlik, IPTC/PLUS

* ジェフリー・セドリック、IPTC/PLUS

* Rony Shalit, Alliance For Responsible Data Collection; Bright Data

* ロニー・シャリット、Alliance For Responsible Data Collection; Bright Data

* Ian Sohl, OpenAI

* イアン・ソール、OpenAI

* Martin Thomson, Mozilla

* マーティン・トムソン、Mozilla

* Thom Vaughan, Common Crawl Foundation (PC)

* トム・ヴォーン、コモン・クロール財団 (PC)

* Kat Walsh, Creative Commons

* キャット・ウォルシュ、クリエイティブ・コモンズ

* James Whymark, Meta

* ジェームズ・ワイマーク、メタ

The following participants requested that their identity and/or affiliation not be revealed:

以下の参加者は、身元や所属を明らかにしないことを要求しました。

* A government official

* 政府関係者

IAB Members at the Time of Approval

承認時の IAB メンバー

Internet Architecture Board members at the time this document was approved for publication were:

この文書の発行が承認された時点のインターネットアーキテクチャ委員会のメンバーは次のとおりです。

* Matthew Bocci

* マシュー・ボッチ

* Roman Danyliw

* ローマン・ダニリュー

* Dhruv Dhody

* ドゥルブ・ドーディ

* Jana Iyengar

* ヤナ・アイアンガー

* Cullen Jennings

* カレン・ジェニングス

* Suresh Krishnan

* スレシュ・クリシュナン

* Mirja Kühlewind

* ミルヤ・キューレヴィント

* Warren Kumari

* ウォーレン・クマリ

* Jason Livingood

* ジェイソン・リヴィングッド

* Mark Nottingham

* マーク・ノッティンガム

* Tommy Pauly

* トミー・ポーリー

* Alvaro Retana

* アルバロ・レタナ

* Qin Wu

* Qin Wu

Acknowledgements

謝辞

The program committee and the IAB would like to thank Wilkinson Barker Knauer for their generosity in hosting the workshop.

プログラム委員会と IAB は、ワークショップの会場を提供してくださった Wilkinson Barker Knauer のご厚意に感謝します。

We also thank our scribes for capturing notes that assisted in the production of this report:

また、このレポートの作成に役立つメモを記録してくれた筆記者にも感謝します。

* Zander Arnao

* ザンダー・アルナオ

* Andrea Dean

* アンドレア・ディーン

* Patrick Yurky

* パトリック・ユルキー

Authors' Addresses

著者の連絡先

   Mark Nottingham
   Melbourne
   Australia
   Email: mnot@mnot.net
   URI:   https://www.mnot.net/

   Suresh Krishnan
   Email: suresh.krishnan@gmail.com