
# How `ipfs add` Works

This document explains what happens when you run `ipfs add` to import files into IPFS. Understanding this flow helps when debugging, optimizing imports, or building applications on top of IPFS.

## The Big Picture

When you add a file to IPFS, three main things happen:

1. **Chunking** - The file is split into smaller pieces
2. **DAG Building** - Those pieces are organized into a tree structure (a Merkle DAG)
3. **Pinning** - The root of the tree is pinned so it persists in your local node

The result is a Content Identifier (CID) - a hash that uniquely identifies your content and can be used to retrieve it from anywhere in the IPFS network.

```mermaid
flowchart LR
    A["Your File<br/>(bytes)"] --> B["Chunker<br/>(split data)"]
    B --> C["DAG Builder<br/>(tree)"]
    C --> D["CID<br/>(hash)"]
```

## Try It Yourself

```shell
# Add a simple file
echo "Hello World" > hello.txt
ipfs add hello.txt
# added QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u hello.txt

# See what's inside
ipfs cat QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u
# Hello World

# View the DAG structure
ipfs dag get QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u
```

## Step by Step

### Step 1: Chunking

Big files are split into chunks because:

- Large files need to be broken down for efficient transfer
- Identical chunks across files are stored only once (deduplication)
- You can fetch parts of a file without downloading the whole thing
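
The fixed-size strategy can be sketched in a few lines of Go. This is illustrative only, not the implementation; the real chunkers live in `boxo/chunker`:

```go
package main

import "fmt"

// splitFixed mimics the size-N chunker: cut the data into fixed-size
// pieces, with a possibly shorter final piece. Illustrative only; the
// real chunkers live in boxo/chunker.
func splitFixed(data []byte, size int) [][]byte {
	var chunks [][]byte
	for len(data) > 0 {
		n := size
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

func main() {
	for i, c := range splitFixed([]byte("hello world, this is a file"), 8) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```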

Chunking strategies (set with `--chunker`):

| Strategy | Description | Best For |
|----------|-------------|----------|
| `size-N` | Fixed-size chunks | General use |
| `rabin` | Content-defined chunks using a rolling hash | Deduplication across similar files |
| `buzhash` | Alternative content-defined chunking | Similar to `rabin` |

See `ipfs add --help` for current defaults, or the `Import` section of the config file for making them permanent.

Content-defined chunking (rabin/buzhash) finds natural boundaries in the data. This means if you edit the middle of a file, only the changed chunks need to be re-stored - the rest can be deduplicated.
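
A toy content-defined chunker shows the idea: a rolling value computed over the bytes decides where chunks end, so boundaries move with the content rather than sitting at fixed offsets. This is not the actual rabin or buzhash algorithm, just a sketch of the technique:

```go
package main

import "fmt"

// splitContentDefined sketches rabin/buzhash-style chunking: a rolling
// value over the bytes picks boundaries, so editing one region of a
// file only disturbs the chunks around it. Toy algorithm for
// illustration; see boxo/chunker for the real implementations.
func splitContentDefined(data []byte, mask uint32) [][]byte {
	var chunks [][]byte
	var roll uint32
	start := 0
	for i, b := range data {
		roll = roll*31 + uint32(b)
		if roll&mask == 0 { // a content-derived boundary
			chunks = append(chunks, data[start:i+1])
			start = i + 1
			roll = 0
		}
	}
	if start < len(data) { // trailing chunk
		chunks = append(chunks, data[start:])
	}
	return chunks
}

func main() {
	for i, c := range splitContentDefined([]byte("the quick brown fox jumps over the lazy dog"), 0x7) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```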

### Step 2: Building the DAG

Each chunk becomes a leaf node in a tree. If a file has many chunks, intermediate nodes group them together. This creates a Merkle DAG (Directed Acyclic Graph) where:

- Each node is identified by a hash of its contents
- Parent nodes contain links (hashes) to their children
- The root node's hash becomes the file's CID
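
The hash-linking that makes this a Merkle DAG can be shown with plain SHA-256. Real IPFS nodes use multihashes and CID encoding, which this sketch omits:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashNode: a node's identity is the hash of its bytes. Real CIDs add
// multicodec/multihash prefixes, omitted in this sketch.
func hashNode(content []byte) [32]byte { return sha256.Sum256(content) }

// merkleRoot builds a one-level tree: the parent's content is the
// concatenation of its children's hashes, so the root hash covers
// every byte of every chunk.
func merkleRoot(chunks [][]byte) [32]byte {
	var links []byte
	for _, c := range chunks {
		h := hashNode(c)
		links = append(links, h[:]...) // parent links to child by hash
	}
	return hashNode(links)
}

func main() {
	root := merkleRoot([][]byte{[]byte("chunk-1"), []byte("chunk-2")})
	fmt.Printf("root: %x\n", root[:8])
}
```

Because a parent's bytes contain its children's hashes, changing any chunk anywhere in the tree changes the root hash, which is why a CID verifies the whole file.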

Layout strategies:

**Balanced layout** (default):

```mermaid
graph TD
    Root --> Node1[Node]
    Root --> Node2[Node]
    Node1 --> Leaf1[Leaf]
    Node1 --> Leaf2[Leaf]
    Node2 --> Leaf3[Leaf]
```

All leaves at similar depth. Good for random access - you can jump to any part of the file efficiently.

**Trickle layout** (`--trickle`):

```mermaid
graph TD
    Root --> Leaf1[Leaf]
    Root --> Node1[Node]
    Root --> Node2[Node]
    Node1 --> Leaf2[Leaf]
    Node2 --> Leaf3[Leaf]
```

Leaves added progressively. Good for streaming - you can start reading before the whole file is added.

### Step 3: Storing Blocks

As the DAG is built, each node is stored in the blockstore:

- **Normal mode**: Data is copied into IPFS's internal storage (`~/.ipfs/blocks/`)
- **Filestore mode** (`--nocopy`): Only references to the original file are stored (saves disk space, but the original file must remain in place)
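
Conceptually, a blockstore is a content-addressed map, which is also where deduplication comes from. This is a sketch; kubo's real blockstore is `boxo/blockstore` backed by a datastore:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// blockstore: a content-addressed map from hash to bytes. Because the
// key is derived from the content, storing an identical block twice
// takes no extra space. Sketch only.
type blockstore map[[32]byte][]byte

func (bs blockstore) put(data []byte) [32]byte {
	k := sha256.Sum256(data)
	bs[k] = data // overwriting an identical block is a no-op
	return k
}

func main() {
	bs := blockstore{}
	k1 := bs.put([]byte("same chunk"))
	k2 := bs.put([]byte("same chunk")) // deduplicated
	fmt.Println(k1 == k2, len(bs))     // true 1
}
```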

### Step 4: Pinning

By default, added content is pinned (`ipfs add --pin=true`). This tells your IPFS node to keep this data - without pinning, content may eventually be removed to free up space.

## Alternative: Organizing with MFS

Instead of pinning, you can use the Mutable File System (MFS) to organize content using familiar paths like `/photos/vacation.jpg` instead of raw CIDs:

```shell
# Add directly to MFS path
ipfs add --to-files=/backups/ myfile.txt

# Or copy an existing CID into MFS
ipfs files cp /ipfs/QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u /docs/hello.txt
```

Content in MFS is implicitly pinned and stays organized across node restarts.

## Options

Run `ipfs add --help` to see all available options for controlling chunking, DAG layout, CID format, pinning behavior, and more.

## UnixFS Format

IPFS uses UnixFS to represent files and directories. UnixFS is an abstraction layer that:

- Gives names to raw data blobs (so you can have `/foo/bar.txt` instead of just hashes)
- Represents directories as lists of named links to other nodes
- Organizes large files as trees of smaller chunks
- Makes these structures cryptographically verifiable - any tampering is detectable because it would change the hashes

With `--raw-leaves`, leaf nodes store raw data without the UnixFS wrapper. This is more efficient and is the default when using CIDv1.
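
Why the wrapper matters for the hash: the same bytes produce a different block, and thus a different CID, once wrapped in an envelope. The envelope marker below is a made-up stand-in; real UnixFS uses a protobuf encoding:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// leafHash stands in for "a block's identity is the hash of its bytes".
func leafHash(block []byte) [32]byte { return sha256.Sum256(block) }

// wrapLeaf fakes the UnixFS envelope around file bytes. The marker is
// made up for illustration; real UnixFS uses a protobuf encoding.
func wrapLeaf(content []byte) []byte {
	return append([]byte("unixfs-envelope:"), content...)
}

func main() {
	content := []byte("Hello World")
	fmt.Println(leafHash(content) == leafHash(wrapLeaf(content))) // false
}
```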

## Code Architecture

The add flow spans several layers:

```mermaid
flowchart TD
    subgraph CLI ["CLI Layer (kubo)"]
        A["core/commands/add.go<br/>parses flags, shows progress"]
    end
    subgraph API ["CoreAPI Layer (kubo)"]
        B["core/coreapi/unixfs.go<br/>UnixfsAPI.Add() entry point"]
    end
    subgraph Adder ["Adder (kubo)"]
        C["core/coreunix/add.go<br/>orchestrates chunking, DAG building, MFS, pinning"]
    end
    subgraph Boxo ["boxo libraries"]
        D["chunker/ - splits data into chunks"]
        E["ipld/unixfs/ - DAG layout and UnixFS format"]
        F["mfs/ - mutable filesystem abstraction"]
        G["pinning/ - pin management"]
        H["blockstore/ - block storage"]
    end
    A --> B --> C --> Boxo
```

### Key Files

| Component | Location |
|-----------|----------|
| CLI command | `core/commands/add.go` |
| API implementation | `core/coreapi/unixfs.go` |
| Adder logic | `core/coreunix/add.go` |
| Chunking | `boxo/chunker` |
| DAG layouts | `boxo/ipld/unixfs/importer` |
| MFS | `boxo/mfs` |
| Pinning | `boxo/pinning/pinner` |

### The Adder

The `Adder` type in `core/coreunix/add.go` is the workhorse. It:

1. **Creates an MFS root** - a temporary in-memory filesystem for building the DAG
2. **Processes files recursively** - chunks each file and builds DAG nodes
3. **Commits to the blockstore** - persists all blocks
4. **Pins the result** - keeps content from being removed
5. **Returns the root CID**
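
Leaving aside MFS and on-disk persistence, those steps can be strung together in a toy pipeline. All names here are hypothetical, not kubo's actual code:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// addFile strings the steps together: chunk the data, store each chunk
// as a block, store a root block linking to the chunks, and pin the
// root. Hypothetical sketch, not kubo's actual implementation.
func addFile(data []byte, chunkSize int, store map[[32]byte][]byte, pins map[[32]byte]bool) [32]byte {
	var links []byte
	for len(data) > 0 {
		n := chunkSize
		if len(data) < n {
			n = len(data)
		}
		k := sha256.Sum256(data[:n]) // block identity = hash of bytes
		store[k] = data[:n]
		links = append(links, k[:]...)
		data = data[n:]
	}
	root := sha256.Sum256(links) // parent links to children by hash
	store[root] = links
	pins[root] = true // pinned roots survive garbage collection
	return root
}

func main() {
	store := map[[32]byte][]byte{}
	pins := map[[32]byte]bool{}
	root := addFile([]byte("Hello World"), 4, store, pins)
	fmt.Printf("root %x, blocks stored %d, pinned %v\n", root[:4], len(store), pins[root])
}
```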

Key methods:

- `AddAllAndPin()` - main entry point
- `addFileNode()` - handles a single file or directory
- `add()` - chunks data and builds the DAG using boxo's layout builders

## Further Reading