# How `ipfs add` Works This document explains what happens when you run `ipfs add` to import files into IPFS. Understanding this flow helps when debugging, optimizing imports, or building applications on top of IPFS. - [The Big Picture](#the-big-picture) - [Try It Yourself](#try-it-yourself) - [Step by Step](#step-by-step) - [Step 1: Chunking](#step-1-chunking) - [Step 2: Building the DAG](#step-2-building-the-dag) - [Step 3: Storing Blocks](#step-3-storing-blocks) - [Step 4: Pinning](#step-4-pinning) - [Alternative: Organizing with MFS](#alternative-organizing-with-mfs) - [Options](#options) - [UnixFS Format](#unixfs-format) - [Code Architecture](#code-architecture) - [Key Files](#key-files) - [The Adder](#the-adder) - [Further Reading](#further-reading) ## The Big Picture When you add a file to IPFS, three main things happen: 1. **Chunking** - The file is split into smaller pieces 2. **DAG Building** - Those pieces are organized into a tree structure (a [Merkle DAG](https://docs.ipfs.tech/concepts/merkle-dag/)) 3. **Pinning** - The root of the tree is pinned so it persists in your local node The result is a Content Identifier (CID) - a hash that uniquely identifies your content and can be used to retrieve it from anywhere in the IPFS network. ```mermaid flowchart LR A["Your File
(bytes)"] --> B["Chunker
(split data)"] B --> C["DAG Builder
(tree)"] C --> D["CID
(hash)"] ``` ## Try It Yourself ```bash # Add a simple file echo "Hello World" > hello.txt ipfs add hello.txt # added QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u hello.txt # See what's inside ipfs cat QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u # Hello World # View the DAG structure ipfs dag get QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u ``` ## Step by Step ### Step 1: Chunking Big files are split into chunks because: - Large files need to be broken down for efficient transfer - Identical chunks across files are stored only once (deduplication) - You can fetch parts of a file without downloading the whole thing **Chunking strategies** (set with `--chunker`): | Strategy | Description | Best For | |----------|-------------|----------| | `size-N` | Fixed size chunks | General use | | `rabin` | Content-defined chunks using rolling hash | Deduplication across similar files | | `buzhash` | Alternative content-defined chunking | Similar to rabin | See `ipfs add --help` for current defaults, or [Import](config.md#import) for making them permanent. Content-defined chunking (rabin/buzhash) finds natural boundaries in the data. This means if you edit the middle of a file, only the changed chunks need to be re-stored - the rest can be deduplicated. ### Step 2: Building the DAG Each chunk becomes a leaf node in a tree. If a file has many chunks, intermediate nodes group them together. This creates a Merkle DAG (Directed Acyclic Graph) where: - Each node is identified by a hash of its contents - Parent nodes contain links (hashes) to their children - The root node's hash becomes the file's CID **Layout strategies**: **Balanced layout** (default): ```mermaid graph TD Root --> Node1[Node] Root --> Node2[Node] Node1 --> Leaf1[Leaf] Node1 --> Leaf2[Leaf] Node2 --> Leaf3[Leaf] ``` All leaves at similar depth. Good for random access - you can jump to any part of the file efficiently. **Trickle layout** (`--trickle`): ```mermaid graph TD Root --> Leaf1[Leaf] Root --> Node1[Node] Root --> Node2[Node] Node1 --> Leaf2[Leaf] Node2 --> Leaf3[Leaf] ``` Leaves added progressively. Good for streaming - you can start reading before the whole file is added. ### Step 3: Storing Blocks As the DAG is built, each node is stored in the blockstore: - **Normal mode**: Data is copied into IPFS's internal storage (`~/.ipfs/blocks/`) - **Filestore mode** (`--nocopy`): Only references to the original file are stored (saves disk space but the original file must remain in place) ### Step 4: Pinning By default, added content is pinned (`ipfs add --pin=true`). This tells your IPFS node to keep this data - without pinning, content may eventually be removed to free up space. ### Alternative: Organizing with MFS Instead of pinning, you can use the [Mutable File System (MFS)](https://docs.ipfs.tech/concepts/file-systems/#mutable-file-system-mfs) to organize content using familiar paths like `/photos/vacation.jpg` instead of raw CIDs: ```bash # Add directly to MFS path ipfs add --to-files=/backups/ myfile.txt # Or copy an existing CID into MFS ipfs files cp /ipfs/QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u /docs/hello.txt ``` Content in MFS is implicitly pinned and stays organized across node restarts. ## Options Run `ipfs add --help` to see all available options for controlling chunking, DAG layout, CID format, pinning behavior, and more. ## UnixFS Format IPFS uses [UnixFS](https://specs.ipfs.tech/unixfs/) to represent files and directories. UnixFS is an abstraction layer that: - Gives names to raw data blobs (so you can have `/foo/bar.txt` instead of just hashes) - Represents directories as lists of named links to other nodes - Organizes large files as trees of smaller chunks - Makes these structures cryptographically verifiable - any tampering is detectable because it would change the hashes With `--raw-leaves`, leaf nodes store raw data without the UnixFS wrapper. This is more efficient and is the default when using CIDv1. ## Code Architecture The add flow spans several layers: ```mermaid flowchart TD subgraph CLI ["CLI Layer (kubo)"] A["core/commands/add.go
parses flags, shows progress"] end subgraph API ["CoreAPI Layer (kubo)"] B["core/coreapi/unixfs.go
UnixfsAPI.Add() entry point"] end subgraph Adder ["Adder (kubo)"] C["core/coreunix/add.go
orchestrates chunking, DAG building, MFS, pinning"] end subgraph Boxo ["boxo libraries"] D["chunker/ - splits data into chunks"] E["ipld/unixfs/ - DAG layout and UnixFS format"] F["mfs/ - mutable filesystem abstraction"] G["pinning/ - pin management"] H["blockstore/ - block storage"] end A --> B --> C --> Boxo ``` ### Key Files | Component | Location | |-----------|----------| | CLI command | `core/commands/add.go` | | API implementation | `core/coreapi/unixfs.go` | | Adder logic | `core/coreunix/add.go` | | Chunking | [boxo/chunker](https://github.com/ipfs/boxo/tree/main/chunker) | | DAG layouts | [boxo/ipld/unixfs/importer](https://github.com/ipfs/boxo/tree/main/ipld/unixfs/importer) | | MFS | [boxo/mfs](https://github.com/ipfs/boxo/tree/main/mfs) | | Pinning | [boxo/pinning/pinner](https://github.com/ipfs/boxo/tree/main/pinning/pinner) | ### The Adder The `Adder` type in `core/coreunix/add.go` is the workhorse. It: 1. **Creates an MFS root** - temporary in-memory filesystem for building the DAG 2. **Processes files recursively** - chunks each file and builds DAG nodes 3. **Commits to blockstore** - persists all blocks 4. **Pins the result** - keeps content from being removed 5. **Returns the root CID** Key methods: - `AddAllAndPin()` - main entry point - `addFileNode()` - handles a single file or directory - `add()` - chunks data and builds the DAG using boxo's layout builders ## Further Reading - [UnixFS specification](https://specs.ipfs.tech/unixfs/) - [IPLD and Merkle DAGs](https://docs.ipfs.tech/concepts/merkle-dag/) - [Pinning](https://docs.ipfs.tech/concepts/persistence/) - [MFS (Mutable File System)](https://docs.ipfs.tech/concepts/file-systems/#mutable-file-system-mfs)