mirror of
https://github.com/ipfs/kubo.git
synced 2026-02-21 10:27:46 +08:00
* docs: move kubo-specific docs * chore: note crypt cmd does not exist Context: https://github.com/ipfs/specs/pull/455 --------- Co-authored-by: Marcin Rataj <lidel@lidel.org>
132 lines
6.4 KiB
Markdown
132 lines
6.4 KiB
Markdown
#  IPFS Repo Spec
|
|
|
|
**Author(s)**:
|
|
- [Juan Benet](github.com/jbenet)
|
|
|
|
**Abstract**
|
|
|
|
This spec defines an IPFS Repo, its contents, and its interface. It does not specify how the repo data is actually stored, as that is done via swappable implementations.
|
|
|
|
# Table of Contents
|
|
|
|
- [Definition](#definition)
|
|
- [Repo Contents](#repo-contents)
|
|
- [version](#version)
|
|
- [datastore](#datastore)
|
|
- [keystore](#keystore)
|
|
- [config (state)](#config-state)
|
|
- [locks](#locks)
|
|
- [datastore\_spec](#datastore_spec)
|
|
- [hooks (TODO)](#hooks-todo)
|
|
- [Notes](#notes)
|
|
|
|
## Definition
|
|
|
|
A `repo` is the storage repository of an IPFS node. It is the subsystem that
|
|
actually stores the data IPFS nodes use. All IPFS objects are stored
|
|
in a repo (similar to git).
|
|
|
|
There are many possible repo implementations, depending on the storage media
|
|
used. Most commonly, IPFS nodes use an [fs-repo](repository_fs.md).
|
|
|
|
Repo Implementations:
|
|
- [fs-repo](repository_fs.md) - stored in the os filesystem
|
|
- mem-repo - stored in process memory
|
|
- s3-repo - stored in amazon s3
|
|
|
|
## Repo Contents
|
|
|
|
The Repo stores a collection of [IPLD](https://github.com/ipld/specs#readme) objects that represent:
|
|
|
|
- **config** - node configuration and settings
|
|
- **datastore** - content stored locally, and indexing data
|
|
- **keystore** - cryptographic keys, including node's identity
|
|
- **hooks** - scripts to run at predefined times (not yet implemented)
|
|
|
|
Note that the IPLD objects a repo stores are divided into:
|
|
- **state** (system, control plane) used for the node's internal state
|
|
- **content** (userland, data plane) which represent the user's cached and pinned data.
|
|
|
|
Additionally, the repo state must determine the following. These need not be IPLD objects, though it is of course encouraged:
|
|
|
|
- **version** - the repo version, required for safe migrations
|
|
- **locks** - process semaphores for correct concurrent access
|
|
- **datastore_spec** - array of mounting points and their properties
|
|
|
|
Finally, the repo also stores the blocks with blobs containing binary data.
|
|
|
|

|
|
|
|
### version
|
|
|
|
Repo implementations may change over time, thus they MUST include a `version` recognizable across versions. Meaning that a tool MUST be able to read the `version` of a given repo type.
|
|
|
|
For example, the `fs-repo` simply includes a `version` file with the version number. This way, the repo contents can evolve over time but the version remains readable the same way across versions.
|
|
|
|
### datastore
|
|
|
|
IPFS nodes store some IPLD objects locally. These are either (a) **state objects** required for local operation -- such as the `config` and `keys` -- or (b) **content objects** used to represent data locally available. **Content objects** are either _pinned_ (stored until they are unpinned) or _cached_ (stored until the next repo garbage collection).
|
|
|
|
The name "datastore" comes from [go-datastore](https://github.com/jbenet/go-datastore), a library for swappable key-value stores. Like its name-sake, some repo implementations feature swappable datastores, for example:
|
|
- an fs-repo with a leveldb datastore
|
|
- an fs-repo with a boltdb datastore
|
|
- an fs-repo with a union fs and leveldb datastore
|
|
- an fs-repo with an s3 datastore
|
|
- an s3-repo with a cached fs and s3 datastore
|
|
|
|
This makes it easy to change properties or performance characteristics of a repo without an entirely new implementation.
|
|
|
|
### keystore
|
|
|
|
A Repo typically holds the keys a node has access to, for signing and for encryption.
|
|
|
|
Details on operation and storage of the keystore can be found in [`repository_fs.md`](repository_fs.md) and [`keystore.md`](keystore.md).
|
|
|
|
### config (state)
|
|
|
|
The node's `config` (configuration) is a tree of variables, used to configure various aspects of operation. For example:
|
|
- the set of bootstrap peers IPFS uses to connect to the network
|
|
- the Swarm, API, and Gateway network listen addresses
|
|
- the Datastore configuration regarding the construction and operation of the on-disk storage system.
|
|
|
|
There is a set of properties, which are mandatory for the repo usage. Those are `Addresses`, `Discovery`, `Bootstrap`, `Identity`, `Datastore` and `Keychain`.
|
|
|
|
It is recommended that `config` files avoid identifying information, so that they may be re-shared across multiple nodes.
|
|
|
|
**CHANGES**: today, implementations like js-ipfs and go-ipfs store the peer-id and private key directly in the config. These will be removed and moved out.
|
|
|
|
### locks
|
|
|
|
IPFS implementations may use multiple processes, or may disallow multiple processes from using the same repo simultaneously. Others may disallow using the same repo but may allow sharing _datastores_ simultaneously. This synchronization is accomplished via _locks_.
|
|
|
|
All repos contain the following standard locks:
|
|
- `repo.lock` - prevents concurrent access to the repo. Must be held to _read_ or _write_.
|
|
|
|
### datastore_spec
|
|
|
|
This file is created according to the Datastore configuration specified in the `config` file. It contains an array with all the mounting points that the repo is using, as well as its properties. This way, the `datastore_spec` file must have the same mounting points as defined in the Datastore configuration.
|
|
|
|
It is important pointing out that the `Datastore` in config must have a `Spec` property, which defines the structure of the ipfs datastore. It is a composable structure, where each datastore is represented by a json object.
|
|
|
|
### hooks (TODO)
|
|
|
|
Like git, IPFS nodes will allow `hooks`, a set of user configurable scripts to run at predefined moments in IPFS operations. This makes it easy to customize the behavior of IPFS nodes without changing the implementations themselves.
|
|
|
|
## Notes
|
|
|
|
#### A Repo uniquely identifies an IPFS Node
|
|
|
|
A repository uniquely identifies a node. Running two different IPFS programs with identical repositories -- and thus identical identities -- WILL cause problems.
|
|
|
|
Datastores MAY be shared -- with proper synchronization -- though note that sharing datastore access MAY erode privacy.
|
|
|
|
#### Repo implementation changes MUST include migrations
|
|
|
|
**DO NOT BREAK USERS' DATA.** This is critical. Thus, any changes to a repo's implementation **MUST** be accompanied by a **SAFE** migration tool.
|
|
|
|
See https://github.com/jbenet/go-ipfs/issues/537 and https://github.com/jbenet/random-ideas/issues/33
|
|
|
|
#### Repo Versioning
|
|
|
|
A repo version is a single incrementing integer. All versions are considered non-compatible. Repos of different versions MUST be run through the appropriate migration tools before use.
|