docs: add sweeping provide worker count recommendation (#11001)

Add a worker count recommendation for the sweeping provide system for
users with millions of CIDs.
Guillaume Michel 2025-10-01 16:01:27 +02:00 committed by GitHub
parent 42a4935abf
commit cf8194a8d1
GPG Key ID: B5690EEEBB952194


@ -1633,12 +1633,14 @@ this limit in the configuration.
**Why operations fail instead of auto-flushing:** Automatic flushing once the limit
is reached was considered but rejected because it can lead to data corruption issues
that are difficult to debug. When the system decides to flush without user knowledge, it can:
- Create partial states that violate user expectations about atomicity
- Interfere with concurrent operations in unexpected ways
- Make debugging and recovery much harder when issues occur
By failing explicitly, users maintain control over when their data is persisted,
allowing them to:
- Batch related operations together before flushing
- Handle errors predictably at natural transaction boundaries
- Understand exactly when and why their data is written to disk
@ -1647,6 +1649,7 @@ If you expect automatic flushing behavior, simply use the default `--flush=true`
(or omit the flag entirely) instead of `--flush=false`.
**⚠️ WARNING:** Increasing this limit or disabling it (setting to 0) can lead to:
- **Out-of-memory errors (OOM)** - Each unflushed operation consumes memory
- **Data loss** - If the daemon crashes before flushing, all unflushed changes are lost
- **Degraded performance** - Large unflushed caches slow down MFS operations
@ -2033,6 +2036,19 @@ connections this setting can generate.
> At the same time, mind that raising this value too high may lead to increased load.
> Proceed with caution and ensure proper hardware and networking are in place.
> [!TIP]
> **When `SweepEnabled` is true:** Users providing millions of CIDs or more
> should increase the worker count accordingly. Underprovisioning can lead to
> slow provides (too few burst workers) and an inability to keep up with content
> reproviding (too few periodic workers). For nodes with sufficient resources (CPU,
> bandwidth, number of connections), dedicating `1024` [periodic
> workers](#providedhtdedicatedperiodicworkers) and `512` [burst
> workers](#providedhtdedicatedburstworkers), with `2048` [max
> workers](#providedhtmaxworkers), should be adequate even for the largest
> users. The system only uses workers as needed, so unused capacity consumes no
> resources. Ensure you adjust the swarm [connection manager](#swarmconnmgr) and
> [resource manager](#swarmresourcemgr) configuration accordingly.
Default: `16`
Type: `optionalInteger` (non-negative; `0` means unlimited number of workers)
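
As a rough sketch of how the tip's values could be applied, the commands below set them with `ipfs config --json`. The key paths are inferred from the section anchors (`#providedhtmaxworkers` and so on), and the location of `SweepEnabled` under `Provide.DHT` is an assumption. This example is illustrative only and is not part of the committed change.

```shell
# Assumption: key paths are inferred from the doc anchors; verify them
# against your Kubo version's config reference before applying.
ipfs config --json Provide.DHT.SweepEnabled true

# Values from the tip, intended for nodes with ample CPU, bandwidth,
# and connection headroom.
ipfs config --json Provide.DHT.MaxWorkers 2048
ipfs config --json Provide.DHT.DedicatedPeriodicWorkers 1024
ipfs config --json Provide.DHT.DedicatedBurstWorkers 512
```

With these values, the shared pool (MaxWorkers - DedicatedPeriodicWorkers - DedicatedBurstWorkers) would hold 2048 - 1024 - 512 = 512 workers.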
@ -2098,6 +2114,11 @@ number of workers will be dedicated to the periodic region reprovide only. The s
Any remaining workers (MaxWorkers - DedicatedPeriodicWorkers - DedicatedBurstWorkers)
form a shared pool that can be used for either type of work as needed.
> [!NOTE]
> If the provider system isn't able to keep up with reproviding all your
> content within the [Provide.DHT.Interval](#providedhtinterval), consider
> increasing this value.
Default: `2`
Type: `optionalInteger` (`0` means there are no dedicated workers, but the
@ -2121,6 +2142,10 @@ number of workers will be dedicated to burst provides only. In addition to
these, if there are available workers in the pool, they can also be used for
burst provides.
> [!NOTE]
> If CIDs aren't provided quickly enough for your needs and you can afford more
> CPU and bandwidth, consider increasing this value.
Default: `1`
Type: `optionalInteger` (`0` means there are no dedicated workers, but the