IPIP 0499: CID Profiles #499

Open
wants to merge 5 commits into
base: main
107 changes: 107 additions & 0 deletions src/ipips/ipip-0499.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,107 @@
---
# IPIP number should match its pull request number. After you open a PR,
# please update title and update the filename to `ipip0000`.
title: "IPIP-0499: CID Profiles"
date: 2025-04-03
ipip: proposal
editors:
- name: Michelle Lee
github: mishmosh
affiliation:
name: IPFS Foundation
relatedIssues:
- https://discuss.ipfs.tech/t/should-we-profile-cids/18507
order: 0499
tags: ['ipips']
---

## Summary

<!--One paragraph explanation of the IPIP.-->
This proposal introduces profiles for IPFS CIDs. Profiles explicitly define CID version, hash algorithm, chunk size, DAG width, layout, and other parameters.

## Motivation

Currently, CIDs can be generated with a variety of settings and optimizations for chunking, DAG width, and more. This means the same file can yield multiple, different CIDs depending on which tools and settings are used, and it is not possible to reliably reproduce or verify the CID. With profiles, following the same profile will produce identical CIDs for identical content, which makes verification possible regardless of implementation.
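To illustrate why chunking settings alone change the result, here is a toy sketch. It is not real UnixFS or CID encoding: the hypothetical `toy_root_digest` function simply hashes each fixed-size chunk and then hashes the concatenated chunk digests, which is enough to show that the same bytes yield a different root once the chunk size changes.

```python
import hashlib


def toy_root_digest(data: bytes, chunk_size: int) -> str:
    """Hash each fixed-size chunk, then hash the concatenated chunk digests.

    Simplified stand-in for a chunked DAG; NOT real UnixFS/CID encoding.
    """
    chunk_digests = [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]
    return hashlib.sha256(b"".join(chunk_digests)).hexdigest()


# Same content, two chunk sizes taken from the defaults discussed in this IPIP.
data = bytes(2 * 1024 * 1024)  # 2 MiB of zero bytes
digest_256k = toy_root_digest(data, 256 * 1024)
digest_1m = toy_root_digest(data, 1024 * 1024)

print(digest_256k == digest_1m)
```

Real implementations differ in more than chunk size (DAG width, layout, leaf encoding), so each additional unpinned parameter is another way for two tools to disagree on the CID.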

## Detailed design

We introduce a profile naming system.

Each profile must specify the following characteristics:

1. CID version (currently only CIDv0 or CIDv1)
1. Hash algorithm
1. UnixFS Chunk algorithm (e.g. size-based or content-based)
1. UnixFS directory DAG layout (e.g. balanced, trickle)
1. UnixFS file DAG width (max number of links per `File` node)
1. UnixFS directory DAG width (max number of links per basic `Directory` node)
1. UnixFS HAMT directory DAG threshold (max `Directory` size before switching to `HAMTDirectory`)
1. HAMT directory DAG width (max number of fanout links per internal HAMTDirectory node)
1. Leaf Envelope (historically `dag-pb`, CIDv1 introduced `raw` leaves)
1. Empty directories (informative suggestion)
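The characteristics above could be captured in a single record type. The following is a minimal sketch, not a proposed schema: the class name `CidProfile` and all field names are assumptions made for illustration, and the example values are copied from the Kubo `test-cid-v1` column of the defaults table in this IPIP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CidProfile:
    """Hypothetical container for the parameters a profile must specify."""

    cid_version: int               # 0 or 1
    hash_algorithm: str            # e.g. "sha-256"
    chunk_algorithm: str           # e.g. "size" or a content-based chunker
    chunk_size_bytes: int          # only meaningful for size-based chunking
    dag_layout: str                # "balanced" or "trickle"
    file_max_links: int            # max links per File node
    directory_max_links: int       # max links per basic Directory node
    hamt_threshold_bytes: int      # Directory size before switching to HAMT
    hamt_max_fanout: int           # max fanout links per HAMTDirectory node
    leaf_envelope: str             # "dag-pb" or "raw"
    empty_directories: str         # informative only, e.g. "allowed"

    def __post_init__(self) -> None:
        # Only CIDv0 and CIDv1 exist today.
        if self.cid_version not in (0, 1):
            raise ValueError(f"unsupported CID version: {self.cid_version}")


# Values taken from the Kubo `test-cid-v1` column of the summary table.
kubo_test_cid_v1 = CidProfile(
    cid_version=1,
    hash_algorithm="sha-256",
    chunk_algorithm="size",
    chunk_size_bytes=1024 * 1024,
    dag_layout="balanced",
    file_max_links=174,
    directory_max_links=0,
    hamt_threshold_bytes=256 * 1024,
    hamt_max_fanout=256,
    leaf_envelope="raw",
    empty_directories="allowed",
)
```

Making the record immutable reflects the intent that a named profile is a fixed set of parameters; changing any field would amount to defining a different profile.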

Additional profiles can be added at a future date. Profile names may be chosen from the names of any botanical tree with compound leaves.

This would be specified as a table in the forthcoming UnixFS spec.

## Design rationale

The profile names are chosen to be easy to pronounce.

Here is a summary table of current (2025-Q2) defaults, thanks to input & clarifications from @2color @achingbrain @lidel:

| | Helia default | Kubo `legacy-cid-v0` (default) | Storacha default | Kubo `test-cid-v1` | Kubo `test-cid-v1-wide` | DASL |
|---------------------------------|---------------|-----------------------------------|------------------|--------------------|---------------------------|---------------|
| CID version | CIDv1 | CIDv0 | CIDv1 | CIDv1 | CIDv1 | CIDv1 |
| Hash Algo | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 |
| Chunk size | 1MiB | 256KiB | 1MiB | 1MiB | 1MiB | not specified |
| Max links `File` node | 1024 | 174 | 1024 | 174 | **1024** | not specified |
| Max links `Directory` node | ? | 0 | ? | 0 | 0 | ? |
| Max fanout `HAMTDirectory` node | 256 blocks | 256 blocks | 256 blocks | 256 blocks | **1024** | not specified |
| `HAMTDirectory` threshold | 256KiB (est) | 256KiB (est:links[name+cid]) | 1000 **links** | 256KiB | **1MiB** | not specified |
| DAG layout | balanced | balanced | balanced | balanced | balanced | not specified |
| Leaves | raw | raw | raw | raw | raw | not specified |
| Empty directories | allowed | allowed | disallowed | allowed | allowed | not specified |

Review comment (Member):

It's about the default behaviour rather than whether empty dirs are allowed.
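The defaults table can also be read as data. The sketch below encodes two of its columns (Helia default and Kubo `legacy-cid-v0`) as dictionaries and lists the parameters on which they disagree. The key names and the `differing_parameters` helper are hypothetical, and only rows with a known value in both columns are included.

```python
from typing import Any, Dict, Tuple

# Values copied from the summary table; key names are illustrative only.
HELIA_DEFAULT: Dict[str, Any] = {
    "cid_version": 1,
    "hash_algorithm": "sha-256",
    "chunk_size_bytes": 1024 * 1024,
    "file_max_links": 1024,
    "hamt_max_fanout": 256,
    "dag_layout": "balanced",
    "leaves": "raw",
}

KUBO_LEGACY_CID_V0: Dict[str, Any] = {
    "cid_version": 0,
    "hash_algorithm": "sha-256",
    "chunk_size_bytes": 256 * 1024,
    "file_max_links": 174,
    "hamt_max_fanout": 256,
    "dag_layout": "balanced",
    "leaves": "raw",
}


def differing_parameters(
    a: Dict[str, Any], b: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {parameter: (value_in_a, value_in_b)} for every mismatch."""
    return {key: (a[key], b.get(key)) for key in a if a[key] != b.get(key)}


diff = differing_parameters(HELIA_DEFAULT, KUBO_LEGACY_CID_V0)
for name, (helia_value, kubo_value) in sorted(diff.items()):
    print(f"{name}: Helia={helia_value} Kubo={kubo_value}")
```

Any non-empty difference means the two tools would generally produce different CIDs for the same input, which is the gap a shared profile is meant to close.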


See related discussion at https://discuss.ipfs.tech/t/should-we-profile-cids/18507/

### User benefit

Reliable, deterministic CIDs allow independent verification of content across tools and implementations.

### Compatibility

Implementations will need to (1) make CID generation settings configurable and (2) support user setting of profiles.

Kubo 0.35 will have [`Import.*` configuration](https://github.com/ipfs/kubo/blob/master/docs/config.md#import) options to control DAG width.

### Security

TODO

### Alternatives

Another approach could be to name profiles based on the key UnixFS/CID parameters, e.g. `v1-sha256-balanced-1mib-1024w-raw`. This is longer and more convoluted.
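For comparison, a parameter-based name would have to be parsed back into settings. The sketch below assumes a field order of CID version, hash, layout, chunk size, file DAG width, and leaf type, inferred solely from the single example name above; the IPIP does not define such a grammar, and `parse_profile_name` is a hypothetical helper.

```python
import re
from typing import NamedTuple


class ParsedProfileName(NamedTuple):
    cid_version: int
    hash_algorithm: str
    dag_layout: str
    chunk_size_bytes: int
    file_max_links: int
    leaves: str


# Assumed grammar, inferred from one example; not defined by this IPIP.
_NAME_PATTERN = re.compile(
    r"^v(?P<cid>[01])"
    r"-(?P<hash>[a-z0-9]+)"
    r"-(?P<layout>balanced|trickle)"
    r"-(?P<size>\d+)(?P<unit>kib|mib)"
    r"-(?P<width>\d+)w"
    r"-(?P<leaves>raw|dag-pb)$"
)
_UNIT_BYTES = {"kib": 1024, "mib": 1024 * 1024}


def parse_profile_name(name: str) -> ParsedProfileName:
    """Split a descriptive profile name into its parameters."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised profile name: {name!r}")
    return ParsedProfileName(
        cid_version=int(match["cid"]),
        hash_algorithm=match["hash"],
        dag_layout=match["layout"],
        chunk_size_bytes=int(match["size"]) * _UNIT_BYTES[match["unit"]],
        file_max_links=int(match["width"]),
        leaves=match["leaves"],
    )


parsed = parse_profile_name("v1-sha256-balanced-1mib-1024w-raw")
print(parsed)
```

Even this six-field name omits the directory and HAMT parameters listed in the detailed design, which hints at how unwieldy a fully descriptive name would become.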


#### Empty directories

The decision on whether empty directories should be included is left out of scope.

Review comment (Member, @2color, May 23, 2025):

@lidel I think we should include this, because the goal of this spec is to make UnixFS encoding deterministic when using the profile, and this stands in the way of this, potentially rendering this whole effort futile if not included.

I think the profile should dictate whether empty directories are included. This should also be configurable such that you can adjust to your own needs.

On the same note, we should probably also add mention of hidden files which are also excluded by Storacha, Kubo, and Helia by default.


Tools can apply arbitrary filtering before passing filesystem entries
to be converted into a DAG, thus for 1:1 CID reproducibility one should
run without any prefilters, or ensure the same prefilters are applied.
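As a rough illustration of how a prefilter changes the input to DAG construction, the sketch below drops hidden entries (any path component starting with a dot) from a list of paths. The helper names and the sample paths are hypothetical; actual tools may apply different or additional rules.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def is_hidden(path: str) -> bool:
    """True if any component of the path starts with a dot."""
    return any(part.startswith(".") for part in PurePosixPath(path).parts)


def apply_prefilter(paths: Iterable[str], include_hidden: bool) -> List[str]:
    """Return the sorted entries that would be handed to the DAG builder."""
    return sorted(p for p in paths if include_hidden or not is_hidden(p))


entries = ["site/index.html", "site/.env", ".git/config", "site/img/logo.png"]

unfiltered = apply_prefilter(entries, include_hidden=True)
filtered = apply_prefilter(entries, include_hidden=False)

print(unfiltered)
print(filtered)
```

Because the two lists differ, the resulting directory DAGs and root CIDs would differ as well, even when every other profile parameter matches.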

## Test fixtures

TODO

List relevant CIDs. Describe how implementations can use them to determine
specification compliance. This section can be skipped if IPIP does not deal
with the way IPFS handles content-addressed data, or the modified specification
file already includes this information.

### Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).