# How `ipfs add` Works

This document explains what happens when you run `ipfs add` to import files into IPFS. Understanding this flow helps when debugging, optimizing imports, or building applications on top of IPFS.
- [The Big Picture](#the-big-picture)
- [Try It Yourself](#try-it-yourself)
- [Step by Step](#step-by-step)
- [Options](#options)
- [UnixFS Format](#unixfs-format)
- [Code Architecture](#code-architecture)
- [Further Reading](#further-reading)
## The Big Picture

When you add a file to IPFS, three main things happen:

1. **Chunking** - The file is split into smaller pieces
2. **DAG Building** - Those pieces are organized into a tree structure (a Merkle DAG)
3. **Pinning** - The root of the tree is pinned so it persists in your local node
The result is a Content Identifier (CID) - a hash that uniquely identifies your content and can be used to retrieve it from anywhere in the IPFS network.
```mermaid
flowchart LR
    A["Your File<br/>(bytes)"] --> B["Chunker<br/>(split data)"]
    B --> C["DAG Builder<br/>(tree)"]
    C --> D["CID<br/>(hash)"]
```
## Try It Yourself

```shell
# Add a simple file
echo "Hello World" > hello.txt
ipfs add hello.txt
# added QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u hello.txt

# See what's inside
ipfs cat QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u
# Hello World

# View the DAG structure
ipfs dag get QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u
```
## Step by Step

### Step 1: Chunking
Big files are split into chunks because:
- Large files need to be broken down for efficient transfer
- Identical chunks across files are stored only once (deduplication)
- You can fetch parts of a file without downloading the whole thing
Chunking strategies (set with `--chunker`):
| Strategy | Description | Best For |
|---|---|---|
| `size-N` | Fixed-size chunks | General use |
| `rabin` | Content-defined chunks using a rolling hash | Deduplication across similar files |
| `buzhash` | Alternative content-defined chunking | Similar to rabin |
See `ipfs add --help` for current defaults, or the `Import` config options for making them permanent.
Content-defined chunking (rabin/buzhash) finds natural boundaries in the data. This means if you edit the middle of a file, only the changed chunks need to be re-stored - the rest can be deduplicated.
### Step 2: Building the DAG
Each chunk becomes a leaf node in a tree. If a file has many chunks, intermediate nodes group them together. This creates a Merkle DAG (Directed Acyclic Graph) where:
- Each node is identified by a hash of its contents
- Parent nodes contain links (hashes) to their children
- The root node's hash becomes the file's CID
Layout strategies:
**Balanced layout** (default):

```mermaid
graph TD
    Root --> Node1[Node]
    Root --> Node2[Node]
    Node1 --> Leaf1[Leaf]
    Node1 --> Leaf2[Leaf]
    Node2 --> Leaf3[Leaf]
```
All leaves at similar depth. Good for random access - you can jump to any part of the file efficiently.
**Trickle layout** (`--trickle`):

```mermaid
graph TD
    Root --> Leaf1[Leaf]
    Root --> Node1[Node]
    Root --> Node2[Node]
    Node1 --> Leaf2[Leaf]
    Node2 --> Leaf3[Leaf]
```
Leaves added progressively. Good for streaming - you can start reading before the whole file is added.
### Step 3: Storing Blocks
As the DAG is built, each node is stored in the blockstore:
- **Normal mode**: Data is copied into IPFS's internal storage (`~/.ipfs/blocks/`)
- **Filestore mode** (`--nocopy`): Only references to the original file are stored (saves disk space, but the original file must remain in place)
### Step 4: Pinning

By default, added content is pinned (`ipfs add --pin=true`). This tells your IPFS node to keep this data; without pinning, content may eventually be removed to free up space.
### Alternative: Organizing with MFS

Instead of pinning, you can use the Mutable File System (MFS) to organize content using familiar paths like `/photos/vacation.jpg` instead of raw CIDs:

```shell
# Add directly to MFS path
ipfs add --to-files=/backups/ myfile.txt

# Or copy an existing CID into MFS
ipfs files cp /ipfs/QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u /docs/hello.txt
```
Content in MFS is implicitly pinned and stays organized across node restarts.
## Options

Run `ipfs add --help` to see all available options for controlling chunking, DAG layout, CID format, pinning behavior, and more.
## UnixFS Format
IPFS uses UnixFS to represent files and directories. UnixFS is an abstraction layer that:
- Gives names to raw data blobs (so you can have `/foo/bar.txt` instead of just hashes)
- Represents directories as lists of named links to other nodes
- Organizes large files as trees of smaller chunks
- Makes these structures cryptographically verifiable - any tampering is detectable because it would change the hashes
With `--raw-leaves`, leaf nodes store raw data without the UnixFS wrapper. This is more efficient and is the default when using CIDv1.
## Code Architecture
The add flow spans several layers:
```mermaid
flowchart TD
    subgraph CLI ["CLI Layer (kubo)"]
        A["core/commands/add.go<br/>parses flags, shows progress"]
    end
    subgraph API ["CoreAPI Layer (kubo)"]
        B["core/coreapi/unixfs.go<br/>UnixfsAPI.Add() entry point"]
    end
    subgraph Adder ["Adder (kubo)"]
        C["core/coreunix/add.go<br/>orchestrates chunking, DAG building, MFS, pinning"]
    end
    subgraph Boxo ["boxo libraries"]
        D["chunker/ - splits data into chunks"]
        E["ipld/unixfs/ - DAG layout and UnixFS format"]
        F["mfs/ - mutable filesystem abstraction"]
        G["pinning/ - pin management"]
        H["blockstore/ - block storage"]
    end
    A --> B --> C --> Boxo
```
### Key Files

| Component | Location |
|---|---|
| CLI command | `core/commands/add.go` |
| API implementation | `core/coreapi/unixfs.go` |
| Adder logic | `core/coreunix/add.go` |
| Chunking | `boxo/chunker` |
| DAG layouts | `boxo/ipld/unixfs/importer` |
| MFS | `boxo/mfs` |
| Pinning | `boxo/pinning/pinner` |
### The Adder

The `Adder` type in `core/coreunix/add.go` is the workhorse. It:
1. **Creates an MFS root** - a temporary in-memory filesystem for building the DAG
2. **Processes files recursively** - chunks each file and builds DAG nodes
3. **Commits to the blockstore** - persists all blocks
4. **Pins the result** - keeps content from being removed
5. **Returns the root CID**
Key methods:

- `AddAllAndPin()` - main entry point
- `addFileNode()` - handles a single file or directory
- `add()` - chunks data and builds the DAG using boxo's layout builders