|
| 1 | +--- |
| 2 | +editors: |
| 3 | + - name: Robin Berjon |
| 4 | + email: robin@berjon.com |
| 5 | + url: https://berjon.com/ |
| 6 | + github: darobin |
| 7 | + twitter: robinberjon |
| 8 | + mastodon: "@robin@mastodon.social" |
| 9 | + affiliation: |
| 10 | + name: Protocol Labs |
| 11 | + url: https://protocol.ai/ |
| 12 | +maturity: reliable |
| 13 | +date: 2023-03-28 |
| 14 | +--- |
| 15 | + |
| 16 | +# IPFS Principles |
| 17 | + |
| 18 | +The IPFS stack is a suite of specifications and tools that share two key characteristics: |
| 19 | + |
| 20 | +1. Data is addressed by its contents using an extensible verifiability mechanism, and |
| 21 | +2. Data is moved in ways that are tolerant of arbitrary transport methods. |
| 22 | + |
| 23 | +This document provides context and details about these characteristics. In doing so it defines |
| 24 | +what is or is not an IPFS implementation. This is a **living document**; it is expected to |
| 25 | +change over time as we define more of the principles that guide the architecture of IPFS or |
| 26 | +find clearer ways of describing those we have already defined. |
| 27 | + |
| 28 | +## Addressing |
| 29 | + |
| 30 | +The web's early designers conceived it as a universal space in which identifiers map to |
| 31 | +information resources. As the web grew, they enshrined in |
| 32 | +[web architecture](https://www.w3.org/TR/webarch/#identification) that all resources |
| 33 | +should have an identifier and |
| 34 | +defined "addressability" as meaning that |
| 35 | +"[*a URI alone is sufficient for an agent to carry out a particular type of interaction.*](https://www.w3.org/2001/tag/doc/whenToUseGet.html#uris)" (:cite[webarch]) |
| 36 | + |
| 37 | +This design is tremendously successful. For all its flaws, the web brings together a |
| 38 | +huge diversity of software, services, and resources under universal addressability. |
| 39 | + |
| 40 | +Unfortunately, HTTP addressability is based on a hierarchy of authorities that places |
| 41 | +resources under the control of a host and places hosts under the control of the DNS system |
| 42 | +(further issues with this model are discussed in the Appendix). As indicated |
| 43 | +in :cite[RFC3986]: |
| 44 | + |
| 45 | +> Many URI schemes include a hierarchical element for a naming |
| 46 | +> authority so that governance of the name space defined by the |
| 47 | +> remainder of the URI is delegated to that authority (which may, in |
| 48 | +> turn, delegate it further). |
| 49 | +
|
| 50 | +[CIDs](https://github.com/multiformats/cid) in IPFS offer an improvement over HTTP URLs by |
| 51 | +maintaining universal addressability while eliminating the attack vectors inherent in |
| 52 | +hierarchical authority. Content addressability derives identifiers from the content of |
| 53 | +an information resource, such that any party can both mint the identifier and verify |
| 54 | +that it maps to the right resource. This eliminates the need for any authority outside |
| 55 | +of the resource itself to certify its content. It makes CIDs the universal |
| 56 | +self-certifying addressability component of the web. |
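As a rough illustration of how any party can both mint and verify such an identifier, the following minimal sketch derives a CIDv1 from content using only the Python standard library. It assumes the `raw` codec (`0x55`), a sha2-256 multihash (`0x12`), and base32 multibase encoding; the function names `mint_cid` and `verify` are hypothetical and not part of any IPFS library.

```python
import base64
import hashlib

CIDV1 = 0x01      # CID version 1
RAW_CODEC = 0x55  # multicodec code for raw bytes
SHA2_256 = 0x12   # multihash code for sha2-256


def mint_cid(content: bytes) -> str:
    """Derive a CIDv1 (raw codec, sha2-256, base32) from the content alone."""
    digest = hashlib.sha256(content).digest()
    # All codes used here are below 0x80, so each varint is a single byte.
    multihash = bytes([SHA2_256, len(digest)]) + digest
    binary_cid = bytes([CIDV1, RAW_CODEC]) + multihash
    # Multibase base32: lowercase, unpadded, prefixed with "b".
    encoded = base64.b32encode(binary_cid).decode("ascii").lower().rstrip("=")
    return "b" + encoded


def verify(cid: str, content: bytes) -> bool:
    """Re-derive the identifier from the content and compare it to the CID."""
    return mint_cid(content) == cid
```

No registry or host is consulted at any point: the identifier is a pure function of the bytes, so the minting party and the verifying party need not trust each other.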
| 57 | + |
| 58 | +Addressing data using [CIDs](https://github.com/multiformats/cid) is the first defining |
| 59 | +characteristic of IPFS. And the second characteristic, transport-agnosticity, can be |
| 60 | +supported thanks to the verifiability that CIDs offer. Across a vast diversity of implementations, |
| 61 | +architectures, and services, *IPFS is the space of resources that can be interacted with |
| 62 | +over arbitrary transports using a CID*. As Juan Benet once put it, |
| 63 | +"[*That's it!*](https://github.com/multiformats/cid/commit/ece08b40a6b1e9eeafc224e2757d8d1ef3317163#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R43)" |
| 64 | + |
| 65 | +Conversely, any system that exposes interactions with resources based on CIDs is |
| 66 | +an IPFS implementation. There are |
| 67 | +[many contexts in which CIDs can be used for addressing](https://docs.ipfs.tech/how-to/address-ipfs-on-web/) |
| 68 | +and [content routing delegation](https://github.com/ipfs/specs/blob/main/routing/ROUTING_V1_HTTP.md) |
| 69 | +can support a wealth of interaction options by resolving CIDs. |
| 70 | + |
| 71 | +## Robustness |
| 72 | + |
| 73 | +Common wisdom about network protocol design is captured by *Postel's Law* or the |
| 74 | +*Robustness Principle*. Over the years it has developed multiple formulations, but the |
| 75 | +canonical one from :cite[RFC1958] ("*Architectural Principles of the Internet*") is: |
| 76 | + |
| 77 | +> Be strict when sending and tolerant when receiving. |
| 78 | +
|
| 79 | +This principle is elegant, and expresses an intuitively pleasing behavior of protocol |
| 80 | +implementations. However, over the years, the experience of internet and web protocol |
| 81 | +designers has been that this principle can have detrimental effects on interoperability. |
| 82 | +As discussed in the Internet Architecture Board's recent work on |
| 83 | +[*Maintaining Robust Protocols*](https://datatracker.ietf.org/doc/html/draft-iab-protocol-maintenance), |
| 84 | +implementations that silently accept faulty input can lead to interoperability defects |
| 85 | +accumulating over time, leading the overall protocol ecosystem to decay. |
| 86 | + |
| 87 | +There are two equilibrium points for protocol ecosystems: when deployed implementations |
| 88 | +are strict, new implementations, out of necessity, are required to be strict as well, leading to a |
| 89 | +strict ecosystem; conversely, when deployed implementations are tolerant, new |
| 90 | +implementations will have a strong incentive to tolerate non-compliance so as to |
| 91 | +interoperate. Tolerance is highly desirable for extensibility and adaptability to new |
| 92 | +environments, but strictness is highly desirable to prevent a protocol ecosystem from |
| 93 | +decaying into a complex collection of corner cases with poor or difficult |
| 94 | +interoperability (what the IETF refers to as |
| 95 | +"[virtuous intolerance](https://datatracker.ietf.org/doc/html/draft-iab-protocol-maintenance#name-virtuous-intolerance)"). |
| 96 | + |
| 97 | +IPFS approaches this problem space with a new iteration on the robustness principle: |
| 98 | + |
| 99 | +> Be strict about the outcomes, be tolerant about the methods. |
| 100 | +
|
| 101 | +CIDs enforce strict outcomes because the mapping from address to content is verified; |
| 102 | +there is no room for outcomes that deviate from the intent expressed in an address. |
| 103 | +This strictness is complemented by a design that proactively expects change thanks to |
| 104 | +a self-describing format (CIDs are a [multiformat](https://multiformats.io/) implementation and support |
| 105 | +an open-ended list of hashes, codecs, etc.). The endpoints being enforceably strict means |
| 106 | +that everything else, notably transport, can be tolerant. Being tolerant about methods |
| 107 | +enables adaptability in how the protocol works, notably in how it can adapt to specific |
| 108 | +environments, and in how intelligence can be applied at the endpoints in novel ways, while |
| 109 | +being strict with outcomes guarantees that the result will be correct and interoperable. |
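A minimal sketch of "strict about the outcomes, tolerant about the methods" might look like the following. The `Transport` type and `fetch_verified` function are hypothetical names chosen for illustration, and the sketch compares a bare sha2-256 digest rather than parsing a full CID.

```python
import hashlib
from typing import Callable, Iterable, Optional

# A transport is any callable that takes a digest and may return candidate bytes.
Transport = Callable[[bytes], Optional[bytes]]


def fetch_verified(expected_sha256: bytes, transports: Iterable[Transport]) -> bytes:
    """Try transports in the given order; accept only bytes matching the digest."""
    for transport in transports:
        try:
            candidate = transport(expected_sha256)
        except Exception:
            continue  # tolerant about methods: a failing transport is skipped
        if candidate is not None and hashlib.sha256(candidate).digest() == expected_sha256:
            return candidate  # strict about outcomes: only verified bytes are returned
    raise LookupError("no transport returned content matching the digest")
```

Because the check happens at the endpoint, a transport that fails or returns tampered bytes costs time but cannot change the result.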
| 110 | + |
| 111 | +Note that this approach to robustness also covers the |
| 112 | +[End-to-end Principle](https://en.wikipedia.org/wiki/End-to-end_principle). The end-to-end |
| 113 | +principle states that the reliability properties of a protocol have to be |
| 114 | +supported at its endpoints and not in intermediary nodes. For instance, you can best guarantee |
| 115 | +the confidentiality or authenticity of a message by encrypting or signing at one endpoint and |
| 116 | +decrypting or verifying at the other rather than asking relaying nodes to implement local |
| 117 | +protections. IPFS's approach to robustness, via CIDs, is well aligned with that principle. |

| 118 | + |
| 119 | +## IPFS Implementation Requirements |
| 120 | + |
| 121 | +An :dfn[IPFS Implementation]: |
| 122 | +- MUST support addressability using CIDs. |
| 123 | +- MUST expose operations (eg. retrieval, provision, indexing) on resources using CIDs. The set of operations |
| 124 | + that an implementation may support is open-ended, but this requirement should cover any interaction |
| 125 | + which the implementation exposes to other IPFS implementations. |
| 126 | +- MUST verify that the CIDs it resolves match the resources they address, at least when it |
| 127 | + has access to the resources' bytes. Implementations MAY relax this requirement in |
| 128 | + controlled environments in which it is possible to ascertain that verification has happened |
| 129 | + elsewhere in a trusted part of the system. |
| 130 | +- SHOULD name all the important resources it exposes using CIDs. Determining which resources are |
| 131 | + important is a matter of judgment, but anything that another agent might legitimately wish to |
| 132 | + access is in scope, and it is best to err on the side of inclusion. |
| 133 | +- SHOULD expose the logical units of data that structure a resource (eg. a CBOR document, a file or |
| 134 | + directory, a branch of a B-tree search index) using CIDs. |
| 135 | +- SHOULD support incremental verifiability, notably so that it may process content of arbitrary size. |
| 136 | +- MAY rely on any transport layer. The transport layer cannot dictate or constrain the way in which |
| 137 | + CIDs map to content. |
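To make the incremental verifiability requirement more concrete, here is a deliberately simplified sketch. It assumes a flat "manifest" of sha2-256 chunk digests whose own hash serves as the root. This is an illustrative stand-in for a Merkle structure, not the layout used by any actual IPFS data format, and `build` and `stream_verified` are hypothetical names.

```python
import hashlib
from typing import Iterable, Iterator, List, Tuple

DIGEST_SIZE = 32  # sha2-256 digest length in bytes


def build(chunks: List[bytes]) -> Tuple[bytes, bytes]:
    """Return (root_digest, manifest); the manifest concatenates chunk digests."""
    manifest = b"".join(hashlib.sha256(c).digest() for c in chunks)
    return hashlib.sha256(manifest).digest(), manifest


def stream_verified(root: bytes, manifest: bytes, chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each chunk as soon as it is verified, without buffering the whole resource."""
    if hashlib.sha256(manifest).digest() != root:
        raise ValueError("manifest does not match root digest")
    expected = [manifest[i:i + DIGEST_SIZE] for i in range(0, len(manifest), DIGEST_SIZE)]
    count = 0
    for index, chunk in enumerate(chunks):
        if index >= len(expected) or hashlib.sha256(chunk).digest() != expected[index]:
            raise ValueError(f"chunk {index} failed verification")
        count += 1
        yield chunk
    if count != len(expected):
        raise ValueError("stream ended before all chunks arrived")
```

The consumer only ever holds one chunk plus the manifest in memory, and a corrupted chunk is rejected as soon as it arrives rather than after the whole resource has been downloaded.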
| 138 | + |
| 139 | +## Boundary Examples |
| 140 | + |
| 141 | +These IPFS principles are broad. This is by design because, like HTTP, IPFS supports an open-ended set of |
| 142 | +use cases and is adaptable to a broad array of operating conditions. Considering cases |
| 143 | +at the boundary may help develop an intuition for the limits that these principles draw. |
| 144 | + |
| 145 | +### Other Content-Addressing Systems |
| 146 | + |
| 147 | +CIDs are readily made compatible with other content-addressable systems, but this does not |
| 148 | +entail that all content-addressable systems are part of IPFS. Git's SHA1 hashes aren't CIDs |
| 149 | +but can be converted into CIDs by prefixing them with `f01781114`. Likewise, BitTorrent v2 |
| 150 | +uses multihashes in the `btmh:` scheme. BitTorrent addresses aren't CIDs, but can be |
| 151 | +converted to CIDs by replacing `btmh:` with `f017b`. |
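The two conversions described above can be sketched in a few lines. The helper names are hypothetical; the prefixes are the ones given in the text, read here as `f` (base16 multibase), `01` (CIDv1), a codec byte (`78` for Git, `7b` for BitTorrent), and, for Git, `11` (sha1) followed by `14` (a 20-byte digest length).

```python
GIT_SHA1_PREFIX = "f01781114"  # base16, CIDv1, git codec, sha1, 20-byte digest
BTMH_PREFIX = "f017b"          # base16, CIDv1, BitTorrent codec; multihash follows
HEX_DIGITS = set("0123456789abcdef")


def git_sha1_to_cid(hex_hash: str) -> str:
    """Turn a 40-character Git SHA1 object id into a CID by prefixing it."""
    hex_hash = hex_hash.lower()  # multibase "f" denotes lowercase hexadecimal
    if len(hex_hash) != 40 or not set(hex_hash) <= HEX_DIGITS:
        raise ValueError("expected a 40-character hexadecimal SHA1")
    return GIT_SHA1_PREFIX + hex_hash


def btmh_to_cid(address: str) -> str:
    """Turn a `btmh:` multihash address into a CID by swapping the scheme for a prefix."""
    if not address.startswith("btmh:"):
        raise ValueError("expected an address starting with 'btmh:'")
    return BTMH_PREFIX + address[len("btmh:"):].lower()


def describe_prefix(cid: str) -> dict:
    """Decode the leading single-byte fields of a base16 CID such as the Git one."""
    raw = bytes.fromhex(cid[1:])
    return {"version": raw[0], "codec": raw[1], "hash": raw[2], "length": raw[3]}
```

For example, the id of Git's empty blob, `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, becomes `f01781114e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`.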
| 152 | + |
| 153 | +The ease with which one can expose these existing systems over IPFS, by simply prefixing |
| 154 | +existing addresses to mint CIDs, enables radical interoperability with other content-addressable |
| 155 | +systems. |
| 156 | + |
| 157 | +### Verification Matters |
| 158 | + |
| 159 | +The requirements above state that an implementation may forgo verification when "*it is |
| 160 | +possible to ascertain that verification has happened elsewhere in a trusted part of the system.*" |
| 161 | +This is intended as a strict requirement in which implementers take trustlessness seriously; it is an indication |
| 162 | +that it's okay to not constantly spend cycles verifying hashes in an internal setup which you |
| 163 | +have reasons to believe is trustworthy. This is *not* a licence to trust an arbitrary data |
| 164 | +source just because you like them. |
| 165 | + |
| 166 | +For instance: |
| 167 | + |
| 168 | +- A JS code snippet that fetches data from an IPFS HTTP gateway without verifying it is not an |
| 169 | + IPFS implementation. |
| 170 | +- An IPFS HTTP gateway that verifies the data that it is pulling from other IPFS implementations |
| 171 | + before serving it over HTTP is an IPFS implementation. |
| 172 | +- That JS piece of code in the first bullet can be turned into an IPFS implementation if it |
| 173 | + fetches from a :cite[trustless-gateway] and verifies what it gets. |
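The difference between the first and third bullets can be sketched as follows, in Python rather than JS. This sketch assumes a gateway that returns a single raw block for `GET /ipfs/{cid}?format=raw` with an `Accept: application/vnd.ipld.raw` header. The gateway URL is a placeholder, the function names are hypothetical, and only base32 CIDv1 values with a sha2-256 multihash are handled. Multi-block resources would need each block verified in turn.

```python
import base64
import hashlib
import urllib.request
from typing import Callable


def sha256_from_cid(cid: str) -> bytes:
    """Extract the sha2-256 digest from a base32 CIDv1 (single-byte codec codes only)."""
    if not cid.startswith("b"):
        raise ValueError("only base32 (multibase 'b') CIDs are handled in this sketch")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding the stdlib decoder expects
    raw = base64.b32decode(body)
    version, codec, hash_code, length = raw[0], raw[1], raw[2], raw[3]
    digest = raw[4:]
    if version != 1 or codec >= 0x80 or hash_code != 0x12 or length != 32 or len(digest) != 32:
        raise ValueError("unsupported CID for this sketch")
    return digest


def gateway_fetch(cid: str, gateway: str = "https://trustless-gateway.example") -> bytes:
    """Fetch one raw block over HTTP; the gateway URL is a placeholder assumption."""
    request = urllib.request.Request(
        f"{gateway}/ipfs/{cid}?format=raw",
        headers={"Accept": "application/vnd.ipld.raw"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def fetch_and_verify(cid: str, fetch: Callable[[str], bytes] = gateway_fetch) -> bytes:
    """Return the block only if its hash matches the digest embedded in the CID."""
    block = fetch(cid)
    if hashlib.sha256(block).digest() != sha256_from_cid(cid):
        raise ValueError("gateway response does not match the CID")
    return block
```

Calling `gateway_fetch` alone corresponds to the unverified snippet in the first bullet; it is the comparison in `fetch_and_verify` that removes the need to trust the gateway.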
| 174 | + |
| 175 | +## Self-Certifying Addressability |
| 176 | + |
| 177 | +:dfn[Authority] is control over a given domain of competence. :dfn[Naming authority] is |
| 178 | +control over what resources are called. |
| 179 | + |
| 180 | +:dfn[Addressability] is the property of a naming system such that its names are sufficient |
| 181 | +for an agent to interact with the resources being named. |
| 182 | + |
| 183 | +:dfn[Verifiability] is the property of a naming system such that an agent can certify that |
| 184 | +the mapping between a name it uses and a resource it is interacting with is correct without |
| 185 | +recourse to an authority other than itself and the resource. |
| 186 | + |
| 187 | +:dfn[Self-certifying addressability] is the property of a naming system such that it is both |
| 188 | +addressable and verifiable: any name is sufficient to interact with a resource, and its mapping |
| 189 | +to that resource can be certified without recourse to additional authority. Self-certifying |
| 190 | +addressability is a key component of a |
| 191 | +[self-certifying web](https://jaygraber.medium.com/web3-is-self-certifying-9dad77fd8d81) |
| 192 | +and it supports capture-resistance, which can help mitigate centralization. |
| 193 | + |
| 194 | +CIDs support :ref[self-certifying addressability]. With CIDs, the authority to name a resource |
| 195 | +resides only with that resource and derives directly from that resource's intrinsic |
| 196 | +property: its content. This frees interactions with CID-named resources from the power |
| 197 | +relation implicit in a client-server architecture. CIDs are the trust model of IPFS. |
| 198 | + |
| 199 | +An implementation may retrieve a CID without verifying that the resource matches it, but that |
| 200 | +loses the resource's naming authority. Such an implementation would be comparable to an HTTP |
| 201 | +client looking DNS records up from a random person's resolver: it cannot guarantee that the |
| 202 | +addressing is authoritative. Implementers may make informed decisions as to where in their |
| 203 | +systems they support verification, but they should ensure that the mapping between CID and resource |
| 204 | +is verified whenever they have access to both the resource and the CID that maps to it. |
| 205 | + |
| 206 | +## Appendix: Historical Notes |
| 207 | + |
| 208 | +We tend not to think about addressability because it is so foundational that we |
| 209 | +struggle to apprehend a system without it, but that is precisely why it is important |
| 210 | +that we get it right. You can find extensive historical evidence that TimBL and others saw |
| 211 | +URLs as arguably the most fundamental invention of the Web, and the early groups that |
| 212 | +worked on Web architecture discussed and debated the properties of URLs at length. The |
| 213 | +problems of centralization we face today trace their lineage back to those decisions. |
| 214 | + |
| 215 | +The hierarchical nature of HTTP addresses was intentional, as TimBL wrote clearly in |
| 216 | +[Web Architecture from 50,000 feet](https://www.w3.org/DesignIssues/Architecture.html): |
| 217 | +> The HTTP space consists of two parts, one hierarchically delegated, for which the |
| 218 | +> Domain Name System is used, and the second an opaque string whose significance is |
| 219 | +> locally defined by the authority owning the domain name. |
| 220 | +
|
| 221 | +The model that the Web's earlier designers had in mind was a federated model |
| 222 | +in which authority is delegated and addresses are *owned* based on that |
| 223 | +authority delegation. This is notably clear in the *URI Ownership* passage of the |
| 224 | +[*Architecture of the World Wide Web, Volume One*](https://www.w3.org/TR/webarch/#def-uri-ownership): |
| 225 | +>URI ownership is a relation between a URI and a social entity, such as a person, |
| 226 | +>organization, or specification. URI ownership gives the relevant social entity certain |
| 227 | +>rights, including: |
| 228 | +> * to pass on ownership of some or all owned URIs to another owner—delegation; and |
| 229 | +> * to associate a resource with an owned URI—URI allocation. |
| 230 | +> |
| 231 | +> By social convention, URI ownership is delegated from the IANA URI scheme registry, |
| 232 | +> itself a social entity, to IANA-registered URI scheme specifications.(…) |
| 233 | +> |
| 234 | +> The approach taken for the "http" URI scheme, for example, follows the pattern whereby |
| 235 | +> the Internet community delegates authority, via the IANA URI scheme registry and the |
| 236 | +> DNS, over a set of URIs with a common prefix to one particular owner. One consequence |
| 237 | +> of this approach is the Web's heavy reliance on the central DNS registry.(…) |
| 238 | +> |
| 239 | +> URI owners are responsible for avoiding the assignment of equivalent URIs to multiple |
| 240 | +> resources. Thus, if a URI scheme specification does provide for the delegation of |
| 241 | +> individual or organized sets of URIs, it should take pains to ensure that ownership |
| 242 | +> ultimately resides in the hands of a single social entity. Allowing multiple owners |
| 243 | +> increases the likelihood of URI collisions. |
| 244 | +> |
| 245 | +> URI owners may organize or deploy infrastruture [sic] to ensure that representations of |
| 246 | +> associated resources are available and, where appropriate, interaction with the resource |
| 247 | +> is possible through the exchange of representations. There are social expectations for |
| 248 | +> responsible representation management (§3.5) by URI owners. Additional social |
| 249 | +> implications of URI ownership are not discussed here. |
| 250 | +
|
| 251 | +This notion of address or name ownership is |
| 252 | +[pervasive across architectural documents](https://www.w3.org/DesignIssues/). This passage |
| 253 | +from an interview of TimBL |
| 254 | +([Philosophical Engineering and Ownerhip of URIs](https://www.w3.org/DesignIssues/PhilosophicalEngineering.html)) is explicit: |
| 255 | +> **Alexandre Monnin**: Regarding names and URIs, a URI is not precisely a philosophical |
| 256 | +> concept, it's an artifiact [sic]. So you can own a URI while you cannot own a philosophical |
| 257 | +> name. The difference is entirely in this respect.\ |
| 258 | +> **Tim Berners-Lee**: For your definition of a philosophical name, you cannot own it. |
| 259 | +> Maybe in your world, in your philosophy, you don't deal with names that are owned, but |
| 260 | +> in the world we're talking about, names are owned. |
| 261 | +
|
| 262 | +This expectation of delegated naming authority was so strong among early Web architects |
| 263 | +that the development of naming conventions in HTTP space (eg. `robots.txt`, `favicon.ico`, |
| 264 | +all the `.well-known` paths) is described as "*expropriation*" in the |
| 265 | +[Web Architecture](https://www.w3.org/TR/webarch/) and the W3C's Technical Architecture |
| 266 | +Group (TAG) issue on the topic stated that it "breaks the web". |
| 267 | + |
| 268 | +Federated models only have weak capture-resistance because the federated entities can always |
| 269 | +concede power (precisely because they have ownership) but lack established means to |
| 270 | +support collective organization. As a result, any power imbalance will likely become hard |
| 271 | +to dislodge. A good example is search: as a publisher (the owner of delegated authority |
| 272 | +over your domain) you can cede the rights to index your content but you can't have a voice |
| 273 | +in what is done with the indexed content (individual opt out is not an option). This was |
| 274 | +fine when you could barter content for links, but once search power consolidated, the |
| 275 | +terms of trade deteriorated with no immediate recourse. |
| 276 | + |
| 277 | +## Acknowledgements |
| 278 | + |
| 279 | +Many thanks to the following people, listed alphabetically, whose feedback was instrumental |
| 280 | +in producing this document: |
| 281 | +Adin Schmahmann, |
| 282 | +biglep, |
| 283 | +Dietrich Ayala, |
| 284 | +Juan Benet, |
| 285 | +lidel, |
| 286 | +Molly Mackinlay, and |
| 287 | +mosh. |