Commit 48da7a6

Merge pull request #390 from ipfs/ipfs-principles
IPFS Principles
2 parents 74711ac + 52834c6 commit 48da7a6

3 files changed, +323 -0 lines changed

src/architecture/index.html

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
---
title: Architecture
description: |
  These documents define the architectural principles that IPFS is built upon, and can be used as tools to evaluate
  implementations and applications of IPFS.
---

{% include 'header.html' %}

<main>
  <dl>
    <dt><a href="/architecture/principles/">IPFS Principles</a></dt>
    <dd>
      IPFS is a suite of specifications and tools that are defined by two key characteristics: content-addressing and
      transport-agnosticity. This document provides context and details about these characteristics. In doing so, it defines what
      is or is not an IPFS implementation.
    </dd>
  </dl>
</main>

{% include 'footer.html' %}

src/architecture/principles.md

Lines changed: 287 additions & 0 deletions
@@ -0,0 +1,287 @@
---
editors:
  - name: Robin Berjon
    email: robin@berjon.com
    url: https://berjon.com/
    github: darobin
    twitter: robinberjon
    mastodon: "@robin@mastodon.social"
    affiliation:
      name: Protocol Labs
      url: https://protocol.ai/
maturity: reliable
date: 2023-03-28
---

# IPFS Principles

The IPFS stack is a suite of specifications and tools that share two key characteristics:

1. Data is addressed by its contents using an extensible verifiability mechanism, and
2. Data is moved in ways that are tolerant of arbitrary transport methods.

This document provides context and details about these characteristics. In doing so, it defines
what is or is not an IPFS implementation. This is a **living document**; it is expected to
change over time as we define more of the principles that guide the architecture of IPFS or
find clearer ways of describing those we have already defined.

## Addressing

The web's early designers conceived it as a universal space in which identifiers map to
information resources. As the web grew, they enshrined in
[web architecture](https://www.w3.org/TR/webarch/#identification) the principle that all
resources should have an identifier, and they defined "addressability" as meaning that
"[*a URI alone is sufficient for an agent to carry out a particular type of interaction.*](https://www.w3.org/2001/tag/doc/whenToUseGet.html#uris)" (:cite[webarch])

This design has been tremendously successful. For all its flaws, the web brings together a
huge diversity of software, services, and resources under universal addressability.

Unfortunately, HTTP addressability is based on a hierarchy of authorities that places
resources under the control of a host and places hosts under the control of the DNS
(further issues with this model are discussed in the Appendix). As indicated
in :cite[RFC3986]:

> Many URI schemes include a hierarchical element for a naming
> authority so that governance of the name space defined by the
> remainder of the URI is delegated to that authority (which may, in
> turn, delegate it further).

[CIDs](https://github.com/multiformats/cid) in IPFS offer an improvement over HTTP URLs by
maintaining universal addressability while eliminating the attack vectors inherent in
hierarchical authority. Content addressability derives identifiers from the content of
an information resource, such that any party can both mint the identifier and verify
that it maps to the right resource. This eliminates the need for any authority outside
of the resource itself to certify its content, and it makes CIDs the universal,
self-certifying addressability component of the web.
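
To make "any party can both mint the identifier and verify" concrete, here is a minimal Python sketch. It assumes the simplest possible CID: version 1, the `raw` codec (0x55), a sha2-256 multihash (code 0x12, 32-byte digest), and lowercase base16 multibase encoding (prefix `f`). CIDs seen in practice are usually base32 (prefix `b`) and may use other codecs and hash functions; the function names are hypothetical and not taken from any IPFS library.

```python
import hashlib

# Multiformats codes for this sketch. Each is below 0x80, so each varint is one byte.
CID_VERSION_1 = 0x01
RAW_CODEC = 0x55        # multicodec "raw": the CID addresses the bytes exactly as given
SHA2_256 = 0x12         # multihash function code for sha2-256
SHA2_256_LENGTH = 0x20  # sha2-256 digests are 32 bytes


def mint_raw_cid(data: bytes) -> str:
    """Derive a base16 CIDv1 from the content itself; no registry or authority is consulted."""
    digest = hashlib.sha256(data).digest()
    cid_bytes = bytes([CID_VERSION_1, RAW_CODEC, SHA2_256, SHA2_256_LENGTH]) + digest
    return "f" + cid_bytes.hex()  # "f" is the multibase prefix for lowercase base16


def verify_raw_cid(cid: str, data: bytes) -> bool:
    """Anyone holding the bytes can re-derive the CID and compare it with the one they were given."""
    return mint_raw_cid(data) == cid


cid = mint_raw_cid(b"hello")
print(cid)                              # f01551220 followed by the hex SHA-256 of b"hello"
print(verify_raw_cid(cid, b"hello"))    # True
print(verify_raw_cid(cid, b"goodbye"))  # False
```

Because the identifier is a pure function of the bytes, minting and verifying are the same computation, which is why checking the mapping requires no outside authority.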

Addressing data using [CIDs](https://github.com/multiformats/cid) is the first defining
characteristic of IPFS. The second characteristic, transport-agnosticity, is made possible
by the verifiability that CIDs offer. Across a vast diversity of implementations,
architectures, and services, *IPFS is the space of resources that can be interacted with
over arbitrary transports using a CID*. As Juan Benet once put it,
"[*That's it!*](https://github.com/multiformats/cid/commit/ece08b40a6b1e9eeafc224e2757d8d1ef3317163#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R43)"

Conversely, any system that exposes interactions with resources based on CIDs is
an IPFS implementation. There are
[many contexts in which CIDs can be used for addressing](https://docs.ipfs.tech/how-to/address-ipfs-on-web/),
and [content routing delegation](https://github.com/ipfs/specs/blob/main/routing/ROUTING_V1_HTTP.md)
can support a wealth of interaction options by resolving CIDs.

## Robustness

Common wisdom about network protocol design is captured by *Postel's Law*, also known as the
*Robustness Principle*. It has been formulated in several ways over the years, but the
canonical formulation from :cite[RFC1958] ("*Architectural Principles of the Internet*") is:

> Be strict when sending and tolerant when receiving.

This principle is elegant and expresses an intuitively pleasing behavior for protocol
implementations. However, internet and web protocol designers have found over the years
that it can have detrimental effects on interoperability.
As discussed in the Internet Architecture Board's recent work on
[*Maintaining Robust Protocols*](https://datatracker.ietf.org/doc/html/draft-iab-protocol-maintenance),
implementations that silently accept faulty input can cause interoperability defects to
accumulate over time, leading the overall protocol ecosystem to decay.

There are two equilibrium points for protocol ecosystems. When deployed implementations
are strict, new implementations must be strict as well in order to interoperate, leading to a
strict ecosystem. Conversely, when deployed implementations are tolerant, new
implementations have a strong incentive to tolerate non-compliance so as to
interoperate. Tolerance is highly desirable for extensibility and adaptability to new
environments, but strictness (what the IAB calls
"[virtuous intolerance](https://datatracker.ietf.org/doc/html/draft-iab-protocol-maintenance#name-virtuous-intolerance)")
is highly desirable to prevent a protocol ecosystem from decaying into a complex collection
of corner cases with poor or difficult interoperability.

IPFS approaches this problem space with a new iteration on the robustness principle:

> Be strict about the outcomes, be tolerant about the methods.

CIDs enforce strict outcomes because the mapping from address to content is verified;
there is no room for outcomes that deviate from the intent expressed in an address.
This strictness is complemented by a design that proactively expects change thanks to
a self-describing format (CIDs are a [multiformat](https://multiformats.io/) implementation and support
an open-ended list of hashes, codecs, etc.). Because strictness can be enforced at the endpoints,
everything else, notably transport, can be tolerant. Being tolerant about methods
enables adaptability in how the protocol works, notably in how it adapts to specific
environments and in how intelligence can be applied at the endpoints in novel ways, while
being strict about outcomes guarantees that the result will be correct and interoperable.
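
The split between strict outcomes and tolerant methods can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular IPFS implementation retrieves data: a "transport" is modelled as any hypothetical function that takes a CID and returns candidate bytes (or raises), and only base16 CIDv1 with the `raw` codec and sha2-256 is handled. The loop accepts bytes from whichever transport delivers them, but returns nothing that fails to hash back to the requested CID.

```python
import hashlib
from typing import Callable, Iterable, Optional

# Hypothetical transport interface: given a CID, return candidate bytes or raise.
# How the bytes travel (HTTP, a peer-to-peer protocol, a USB stick...) is irrelevant here.
Transport = Callable[[str], bytes]

# Base16 CIDv1 prefix for the raw codec (0x55) with a 32-byte sha2-256 multihash (0x12, 0x20).
RAW_SHA256_PREFIX = "f01551220"


def raw_sha256_cid(data: bytes) -> str:
    return RAW_SHA256_PREFIX + hashlib.sha256(data).hexdigest()


def retrieve(cid: str, transports: Iterable[Transport]) -> Optional[bytes]:
    """Tolerant about methods (any transport may be tried); strict about outcomes (verified bytes only)."""
    for transport in transports:
        try:
            candidate = transport(cid)
        except Exception:
            continue  # a failing transport is not fatal: move on to the next one
        if raw_sha256_cid(candidate) == cid:
            return candidate  # the outcome matches the address exactly
        # Otherwise the bytes are discarded, no matter which transport sent them.
    return None


# Usage with three toy transports.
content = b"hello"
cid = raw_sha256_cid(content)


def unreachable(_cid: str) -> bytes:
    raise ConnectionError("peer unreachable")


def tampering(_cid: str) -> bytes:
    return b"tampered"


def honest(_cid: str) -> bytes:
    return content


print(retrieve(cid, [unreachable, tampering, honest]))  # b'hello'
print(retrieve(cid, [unreachable, tampering]))          # None
```

The transports never need to be trusted or to agree on anything beyond the CID: all of the correctness checking happens at the receiving endpoint.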

Note that this approach to robustness also covers the
[End-to-end Principle](https://en.wikipedia.org/wiki/End-to-end_principle). The end-to-end
principle states that the reliability properties of a protocol have to be
supported at its endpoints and not in intermediary nodes. For instance, you can best guarantee
the confidentiality or authenticity of a message by encrypting or signing it at one endpoint and
decrypting or verifying it at the other, rather than by asking relaying nodes to implement local
protections. IPFS's approach to robustness, via CIDs, is well aligned with that principle.

## IPFS Implementation Requirements

An :dfn[IPFS Implementation]:

- MUST support addressability using CIDs.
- MUST expose operations (e.g., retrieval, provision, indexing) on resources using CIDs. The set of
  operations that an implementation may support is open-ended, but this requirement should cover any
  interaction that the implementation exposes to other IPFS implementations.
- MUST verify that the CIDs it resolves match the resources they address, at least when it
  has access to the resources' bytes. Implementations MAY relax this requirement in
  controlled environments in which it is possible to ascertain that verification has happened
  elsewhere in a trusted part of the system.
- SHOULD name all the important resources it exposes using CIDs. Determining which resources are
  important is a matter of judgment, but anything that another agent might legitimately wish to
  access is in scope, and it is best to err on the side of inclusion.
- SHOULD expose the logical units of data that structure a resource (e.g., a CBOR document, a file or
  directory, a branch of a B-tree search index) using CIDs.
- SHOULD support incremental verifiability, notably so that it may process content of arbitrary sizes.
- MAY rely on any transport layer. The transport layer cannot dictate or constrain the way in which
  CIDs map to content.
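
The incremental-verifiability requirement listed above can be illustrated with a deliberately simplified scheme. The Python sketch below is not UnixFS or any IPFS data format: it assumes a hypothetical flat manifest that is simply the concatenation of each chunk's SHA-256 digest, with the manifest itself identified by its own SHA-256 digest (which, in a real system, would be carried in a CID). Because each chunk is checked against an already-verified manifest, a receiver can process content of any size piece by piece and reject a bad chunk as soon as it arrives, instead of buffering everything and hashing it at the end.

```python
import hashlib
from typing import Iterable, Iterator, List, Tuple

DIGEST_SIZE = 32  # bytes in a SHA-256 digest


def build_manifest(data: bytes, chunk_size: int) -> Tuple[bytes, List[bytes]]:
    """Split data into chunks and list their digests in order (hypothetical manifest format)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    manifest = b"".join(hashlib.sha256(chunk).digest() for chunk in chunks)
    return manifest, chunks


def verify_incrementally(root: bytes, manifest: bytes, chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each chunk only once it has been checked, so unverified bytes are never processed."""
    if hashlib.sha256(manifest).digest() != root:
        raise ValueError("manifest does not match the root digest")
    expected = [manifest[i:i + DIGEST_SIZE] for i in range(0, len(manifest), DIGEST_SIZE)]
    received = 0
    for index, chunk in enumerate(chunks):
        if index >= len(expected) or hashlib.sha256(chunk).digest() != expected[index]:
            raise ValueError(f"chunk {index} failed verification")
        received += 1
        yield chunk
    if received != len(expected):
        raise ValueError("stream ended before all chunks were received")


data = b"content of arbitrary size, verified piece by piece"
manifest, chunks = build_manifest(data, chunk_size=8)
root = hashlib.sha256(manifest).digest()  # in a real system this digest would sit inside a CID

reassembled = b"".join(verify_incrementally(root, manifest, chunks))
print(reassembled == data)  # True
```

Real IPFS data structures use trees of blocks rather than one flat list, so that even the manifest can be fetched and verified incrementally; the flat list is only used here to keep the idea visible.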

## Boundary Examples

These IPFS principles are broad. This is by design: like HTTP, IPFS supports an open-ended set of
use cases and is adaptable to a broad array of operating conditions. Considering cases
at the boundary may help develop an intuition for the limits that these principles draw.

### Other Content-Addressing Systems

CIDs are readily made compatible with other content-addressable systems, but this does not
entail that all content-addressable systems are part of IPFS. Git's SHA-1 hashes aren't CIDs
but can be converted into CIDs by prefixing them with `f01781114`. Likewise, BitTorrent v2
uses multihashes in the `btmh:` scheme. BitTorrent addresses aren't CIDs, but they can be
converted to CIDs by replacing `btmh:` with `f017b`.
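
The two conversions described above are plain string operations. Here is a minimal Python sketch that takes the prefixes exactly as given in the text; the byte-by-byte breakdown of the Git prefix in the comments is our reading of the multiformats code tables, and the helper names are hypothetical. For `btmh:`, we assume, as the text implies, that what follows the scheme is the hex-encoded multihash.

```python
import hashlib
import re

# Prefixes from the text above. Our reading of the Git one, byte by byte:
#   f  multibase prefix for lowercase base16
#   01 CID version 1
#   78 multicodec for raw Git objects (git-raw)
#   11 multihash function code for SHA-1
#   14 digest length: 20 bytes
GIT_SHA1_CID_PREFIX = "f01781114"
BTMH_CID_PREFIX = "f017b"


def git_sha1_to_cid(object_id: str) -> str:
    """Convert a 40-hex-digit Git object ID into a CID by prefixing it."""
    if not re.fullmatch(r"[0-9a-fA-F]{40}", object_id):
        raise ValueError("expected a 40-character hexadecimal SHA-1 object ID")
    return GIT_SHA1_CID_PREFIX + object_id.lower()


def btmh_to_cid(address: str) -> str:
    """Convert a BitTorrent v2 `btmh:` multihash address into a CID by swapping the prefix."""
    if not address.startswith("btmh:"):
        raise ValueError("expected an address starting with 'btmh:'")
    return BTMH_CID_PREFIX + address[len("btmh:"):]


# Git computes a blob's object ID as the SHA-1 of "blob <size>\0" followed by the content
# (in a SHA-1 repository), so the CID can be minted directly from the file's bytes.
blob = b"hello\n"
object_id = hashlib.sha1(b"blob %d\x00" % len(blob) + blob).hexdigest()
print(git_sha1_to_cid(object_id))
```

No lookup or translation table is involved: the hash that Git or BitTorrent already computed is reused unchanged, and the prefix only declares how to interpret it.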

Because addresses in these existing systems can be turned into CIDs simply by adding a prefix,
those systems can be exposed over IPFS with little effort, which enables radical interoperability
with other content-addressable systems.

### Verification Matters

The requirements above state that an implementation may forgo verification when "*it is
possible to ascertain that verification has happened elsewhere in a trusted part of the system.*"
This exception is meant to be read strictly, by implementers who take trustlessness seriously:
it indicates that it's okay not to spend cycles constantly re-verifying hashes in an internal
setup that you have reasons to believe is trustworthy. It is *not* a licence to trust an arbitrary
data source just because you like it.

For instance:

- A JS code snippet that fetches data from an IPFS HTTP gateway without verifying it is not an
  IPFS implementation.
- An IPFS HTTP gateway that verifies the data that it is pulling from other IPFS implementations
  before serving it over HTTP is an IPFS implementation.
- The JS snippet from the first bullet can be turned into an IPFS implementation if it
  fetches from a :cite[trustless-gateway] and verifies what it gets.
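
The verifying client from the third bullet can be sketched as follows, in Python rather than JS. It assumes that the gateway returns a single raw block when asked for the `application/vnd.ipld.raw` media type, as trustless gateways do; it only handles base16 CIDv1 with the `raw` codec and sha2-256, and the function names and the `https://gateway.example` host are placeholders. The important part is the last step: the bytes are hashed and compared with the CID before anything is returned.

```python
import hashlib
import urllib.request
from typing import Callable

# Base16 CIDv1 prefix for the raw codec (0x55) with a 32-byte sha2-256 multihash.
RAW_SHA256_PREFIX = "f01551220"
RAW_BLOCK_MEDIA_TYPE = "application/vnd.ipld.raw"


def http_get(url: str, accept: str) -> bytes:
    """Plain HTTP GET using the standard library; nothing about the gateway is trusted."""
    request = urllib.request.Request(url, headers={"Accept": accept})
    with urllib.request.urlopen(request) as response:
        return response.read()


def fetch_verified_block(gateway: str, cid: str,
                         get: Callable[[str, str], bytes] = http_get) -> bytes:
    """Fetch a raw block from a gateway and refuse it unless it hashes back to `cid`."""
    if not cid.startswith(RAW_SHA256_PREFIX):
        raise ValueError("this sketch only handles base16 raw/sha2-256 CIDs")
    block = get(f"{gateway.rstrip('/')}/ipfs/{cid}", RAW_BLOCK_MEDIA_TYPE)
    if RAW_SHA256_PREFIX + hashlib.sha256(block).hexdigest() != cid:
        raise ValueError("gateway response does not match the requested CID")
    return block  # only now is it safe to hand the bytes to the application
```

A call such as `fetch_verified_block("https://gateway.example", cid)` would perform a real request; the `get` parameter exists so that the transport can be swapped out. Removing the hash comparison turns this back into the snippet from the first bullet, which is not an IPFS implementation.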

## Self-Certifying Addressability

:dfn[Authority] is control over a given domain of competence. :dfn[Naming authority] is
control over what resources are called.

:dfn[Addressability] is the property of a naming system such that its names are sufficient
for an agent to interact with the resources being named.

:dfn[Verifiability] is the property of a naming system such that an agent can certify that
the mapping between a name it uses and a resource it is interacting with is correct without
recourse to an authority other than itself and the resource.

:dfn[Self-certifying addressability] is the property of a naming system such that it is both
addressable and verifiable: any name is sufficient to interact with a resource, and its mapping
to that resource can be certified without recourse to additional authority. Self-certifying
addressability is a key component of a
[self-certifying web](https://jaygraber.medium.com/web3-is-self-certifying-9dad77fd8d81),
and it supports capture-resistance, which can help mitigate centralization.

CIDs support :ref[self-certifying addressability]. With CIDs, the authority to name a resource
resides only with that resource and derives directly from that resource's intrinsic
property: its content. This frees interactions with CID-named resources from the power
relation implicit in a client-server architecture. CIDs are the trust model of IPFS.

An implementation may retrieve the resource for a CID without verifying that it matches the CID,
but in doing so it loses the naming authority that the resource carries. Such an implementation
would be comparable to an HTTP client that looks up DNS records through a random person's resolver:
it cannot guarantee that the addressing is authoritative. Implementers may make informed decisions
as to where in their systems they support verification, but they should ensure that the mapping
between CID and resource is verified whenever they have access to both the resource and the CID
that maps to it.

## Appendix: Historical Notes

We tend not to think about addressability because it is so foundational that we
struggle to apprehend a system without it, but that is precisely why it is important
that we get it right. There is extensive historical evidence that Tim Berners-Lee (TimBL) and others saw
URLs as arguably the most fundamental invention of the Web, and the early groups that
worked on Web architecture discussed and debated the properties of URLs at length. The
problems of centralization we face today trace their lineage back to those decisions.

The hierarchical nature of HTTP addresses was intentional, as TimBL wrote clearly in
[Web Architecture from 50,000 feet](https://www.w3.org/DesignIssues/Architecture.html):

> The HTTP space consists of two parts, one hierarchically delegated, for which the
> Domain Name System is used, and the second an opaque string whose significance is
> locally defined by the authority owning the domain name.

The model that the Web's early designers had in mind was a federated model
in which authority is delegated and addresses are *owned* based on that
authority delegation. This is notably clear in the *URI Ownership* passage of the
[*Architecture of the World Wide Web, Volume One*](https://www.w3.org/TR/webarch/#def-uri-ownership):

> URI ownership is a relation between a URI and a social entity, such as a person,
> organization, or specification. URI ownership gives the relevant social entity certain
> rights, including:
> * to pass on ownership of some or all owned URIs to another owner—delegation; and
> * to associate a resource with an owned URI—URI allocation.
>
> By social convention, URI ownership is delegated from the IANA URI scheme registry,
> itself a social entity, to IANA-registered URI scheme specifications.(…)
>
> The approach taken for the "http" URI scheme, for example, follows the pattern whereby
> the Internet community delegates authority, via the IANA URI scheme registry and the
> DNS, over a set of URIs with a common prefix to one particular owner. One consequence
> of this approach is the Web's heavy reliance on the central DNS registry.(…)
>
> URI owners are responsible for avoiding the assignment of equivalent URIs to multiple
> resources. Thus, if a URI scheme specification does provide for the delegation of
> individual or organized sets of URIs, it should take pains to ensure that ownership
> ultimately resides in the hands of a single social entity. Allowing multiple owners
> increases the likelihood of URI collisions.
>
> URI owners may organize or deploy infrastruture [sic] to ensure that representations of
> associated resources are available and, where appropriate, interaction with the resource
> is possible through the exchange of representations. There are social expectations for
> responsible representation management (§3.5) by URI owners. Additional social
> implications of URI ownership are not discussed here.

This notion of address or name ownership is
[pervasive across architectural documents](https://www.w3.org/DesignIssues/). This passage
from an interview with TimBL
([Philosophical Engineering and Ownership of URIs](https://www.w3.org/DesignIssues/PhilosophicalEngineering.html)) is explicit:

> **Alexandre Monnin**: Regarding names and URIs, a URI is not precisely a philosophical
> concept, it's an artifiact [sic]. So you can own a URI while you cannot own a philosophical
> name. The difference is entirely in this respect.\
> **Tim Berners-Lee**: For your definition of a philosophical name, you cannot own it.
> Maybe in your world, in your philosophy, you don't deal with names that are owned, but
> in the world we're talking about, names are owned.

This expectation of delegated naming authority was so strong among early Web architects
that the development of naming conventions in HTTP space (e.g., `robots.txt`, `favicon.ico`,
all the `.well-known` paths) is described as "*expropriation*" in the
[Web Architecture](https://www.w3.org/TR/webarch/), and the W3C's Technical Architecture
Group (TAG) issue on the topic stated that it "breaks the web".

Federated models have only weak capture-resistance because the federated entities can always
concede power (precisely because they have ownership) but lack established means to
support collective organization. As a result, any power imbalance will likely become hard
to dislodge. A good example is search: as a publisher (the owner of delegated authority
over your domain), you can cede the right to index your content, but you have no voice
in what is done with the indexed content (individual opt-out is not an option). This was
fine when you could barter content for links, but once search power consolidated, the
terms of trade deteriorated with no immediate recourse.

## Acknowledgements

Many thanks to the following people, listed alphabetically, whose feedback was instrumental
in producing this document:
Adin Schmahmann,
biglep,
Dietrich Ayala,
Juan Benet,
lidel,
Molly Mackinlay, and
mosh.

src/index.html

Lines changed: 15 additions & 0 deletions
@@ -78,6 +78,21 @@ <h3><a href="/ipns/">InterPlanetary Naming System</a></h3>
 </p>
 {% include 'specs/ipns.html' %}
 </section>
+<section>
+<h3><a href="/architecture/">Architecture</a></h3>
+<p>
+These documents define the architectural principles that IPFS is built upon, and can be used as tools to evaluate
+implementations and applications of IPFS.
+</p>
+<dl>
+<dt><a href="/architecture/principles/">IPFS Principles</a></dt>
+<dd>
+IPFS is a suite of specifications and tools that are defined by two key characteristics: content-addressing and
+transport-agnosticity. This document provides context and details about these characteristics. In doing so, it defines what
+is or is not an IPFS implementation.
+</dd>
+</dl>
+</section>
 <section>
 <h3><a href="/meta/">Meta</a></h3>
 <p>
