kubo/docs/specifications/repository.md
Henrique Dias 48865a9092
docs: move kubo-specific docs (#10226)
* docs: move kubo-specific docs
* chore: note crypt cmd does not exist

Context: https://github.com/ipfs/specs/pull/455

---------

Co-authored-by: Marcin Rataj <lidel@lidel.org>
2023-11-20 23:22:45 +01:00

132 lines
6.4 KiB
Markdown

# ![](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) IPFS Repo Spec
**Author(s)**:
- [Juan Benet](github.com/jbenet)
**Abstract**
This spec defines an IPFS Repo, its contents, and its interface. It does not specify how the repo data is actually stored, as that is done via swappable implementations.
# Table of Contents
- [Definition](#definition)
- [Repo Contents](#repo-contents)
- [version](#version)
- [datastore](#datastore)
- [keystore](#keystore)
- [config (state)](#config-state)
- [locks](#locks)
- [datastore\_spec](#datastore_spec)
- [hooks (TODO)](#hooks-todo)
- [Notes](#notes)
## Definition
A `repo` is the storage repository of an IPFS node. It is the subsystem that
actually stores the data IPFS nodes use. All IPFS objects are stored
in a repo (similar to git).
There are many possible repo implementations, depending on the storage media
used. Most commonly, IPFS nodes use an [fs-repo](repository_fs.md).
Repo Implementations:
- [fs-repo](repository_fs.md) - stored in the os filesystem
- mem-repo - stored in process memory
- s3-repo - stored in amazon s3
## Repo Contents
The Repo stores a collection of [IPLD](https://github.com/ipld/specs#readme) objects that represent:
- **config** - node configuration and settings
- **datastore** - content stored locally, and indexing data
- **keystore** - cryptographic keys, including node's identity
- **hooks** - scripts to run at predefined times (not yet implemented)
Note that the IPLD objects a repo stores are divided into:
- **state** (system, control plane) used for the node's internal state
- **content** (userland, data plane) which represent the user's cached and pinned data.
Additionally, the repo state must determine the following. These need not be IPLD objects, though it is of course encouraged:
- **version** - the repo version, required for safe migrations
- **locks** - process semaphores for correct concurrent access
- **datastore_spec** - array of mounting points and their properties
Finally, the repo also stores the blocks with blobs containing binary data.
![](./ipfs-repo-contents.png)
### version
Repo implementations may change over time, thus they MUST include a `version` recognizable across versions. Meaning that a tool MUST be able to read the `version` of a given repo type.
For example, the `fs-repo` simply includes a `version` file with the version number. This way, the repo contents can evolve over time but the version remains readable the same way across versions.
### datastore
IPFS nodes store some IPLD objects locally. These are either (a) **state objects** required for local operation -- such as the `config` and `keys` -- or (b) **content objects** used to represent data locally available. **Content objects** are either _pinned_ (stored until they are unpinned) or _cached_ (stored until the next repo garbage collection).
The name "datastore" comes from [go-datastore](https://github.com/jbenet/go-datastore), a library for swappable key-value stores. Like its name-sake, some repo implementations feature swappable datastores, for example:
- an fs-repo with a leveldb datastore
- an fs-repo with a boltdb datastore
- an fs-repo with a union fs and leveldb datastore
- an fs-repo with an s3 datastore
- an s3-repo with a cached fs and s3 datastore
This makes it easy to change properties or performance characteristics of a repo without an entirely new implementation.
### keystore
A Repo typically holds the keys a node has access to, for signing and for encryption.
Details on operation and storage of the keystore can be found in [`repository_fs.md`](repository_fs.md) and [`keystore.md`](keystore.md).
### config (state)
The node's `config` (configuration) is a tree of variables, used to configure various aspects of operation. For example:
- the set of bootstrap peers IPFS uses to connect to the network
- the Swarm, API, and Gateway network listen addresses
- the Datastore configuration regarding the construction and operation of the on-disk storage system.
There is a set of properties, which are mandatory for the repo usage. Those are `Addresses`, `Discovery`, `Bootstrap`, `Identity`, `Datastore` and `Keychain`.
It is recommended that `config` files avoid identifying information, so that they may be re-shared across multiple nodes.
**CHANGES**: today, implementations like js-ipfs and go-ipfs store the peer-id and private key directly in the config. These will be removed and moved out.
### locks
IPFS implementations may use multiple processes, or may disallow multiple processes from using the same repo simultaneously. Others may disallow using the same repo but may allow sharing _datastores_ simultaneously. This synchronization is accomplished via _locks_.
All repos contain the following standard locks:
- `repo.lock` - prevents concurrent access to the repo. Must be held to _read_ or _write_.
### datastore_spec
This file is created according to the Datastore configuration specified in the `config` file. It contains an array with all the mounting points that the repo is using, as well as its properties. This way, the `datastore_spec` file must have the same mounting points as defined in the Datastore configuration.
It is important pointing out that the `Datastore` in config must have a `Spec` property, which defines the structure of the ipfs datastore. It is a composable structure, where each datastore is represented by a json object.
### hooks (TODO)
Like git, IPFS nodes will allow `hooks`, a set of user configurable scripts to run at predefined moments in IPFS operations. This makes it easy to customize the behavior of IPFS nodes without changing the implementations themselves.
## Notes
#### A Repo uniquely identifies an IPFS Node
A repository uniquely identifies a node. Running two different IPFS programs with identical repositories -- and thus identical identities -- WILL cause problems.
Datastores MAY be shared -- with proper synchronization -- though note that sharing datastore access MAY erode privacy.
#### Repo implementation changes MUST include migrations
**DO NOT BREAK USERS' DATA.** This is critical. Thus, any changes to a repo's implementation **MUST** be accompanied by a **SAFE** migration tool.
See https://github.com/jbenet/go-ipfs/issues/537 and https://github.com/jbenet/random-ideas/issues/33
#### Repo Versioning
A repo version is a single incrementing integer. All versions are considered non-compatible. Repos of different versions MUST be run through the appropriate migration tools before use.