# Provide Stats

The `ipfs provide stat` command reports statistics about the local provide system. This document explains each metric in detail.
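
A typical way to read these metrics is to run the command directly, or to pair it with `watch` for a live view:

```shell
# Default summary of provide statistics
ipfs provide stat

# All metrics in a condensed layout
ipfs provide stat --all --compact

# Refresh every 5 seconds for real-time monitoring (requires a running daemon)
watch -n 5 ipfs provide stat --all --compact
```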

## Connectivity

### Status

Current connectivity status (online, disconnected, or offline) and when it last changed (see provide connectivity status).

## Queues

### Provide queue

Number of CIDs waiting for initial provide, and the number of keyspace regions they're grouped into.

### Reprovide queue

Number of regions with overdue reprovides. These regions missed their scheduled reprovide time and will be processed as soon as possible. If decreasing, the node is recovering from downtime. If increasing, either the node is offline or the provide system needs more workers (see `Provide.DHT.MaxWorkers` and `Provide.DHT.DedicatedPeriodicWorkers`).

## Schedule

### CIDs scheduled

Total CIDs scheduled for reprovide.

### Regions scheduled

Number of keyspace regions scheduled for reprovide. Each CID is mapped to a specific region, and all CIDs within the same region are reprovided together as a batch for efficient processing.
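
As an illustration of the grouping idea (a sketch, not Kubo's exact internals), a CID's DHT key is the SHA-256 hash of its multihash, and CIDs whose keys share a region's binary prefix are reprovided together. Here, sample keys are bucketed by their leading hex digit (4 prefix bits):

```shell
# Sketch: bucket keys by the leading hex digit of their SHA-256 hash.
# In the real system the prefix is binary and its length varies with swarm size.
for key in foo bar baz; do
  printf '%s -> region %s\n' "$key" "$(printf '%s' "$key" | sha256sum | cut -c1)"
done
```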

### Avg prefix length

Average length of binary prefixes identifying the scheduled regions. Each keyspace region is identified by a binary prefix, and this shows the average prefix length across all regions in the schedule. Longer prefixes indicate more DHT servers in the swarm.
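
Since prefixes of average length `p` split the keyspace into roughly `2^p` regions, the avg prefix length together with the avg region size gives a back-of-envelope swarm-size estimate (the numbers below are assumed samples, not real output):

```shell
# Rough estimate: ~2^p regions of ~s servers each => ~2^p * s DHT servers.
awk -v p=9.6 -v s=18 'BEGIN { printf "estimated swarm size: ~%.0f servers\n", 2^p * s }'
```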

### Next region prefix

Keyspace prefix of the next region to be reprovided.

### Next region reprovide

When the next region is scheduled to be reprovided.

## Timings

### Uptime

How long the provide system has been running since Kubo started, along with the start timestamp.

### Current time offset

Elapsed time in the current reprovide cycle, showing cycle progress.

### Cycle started

When the current reprovide cycle began.

### Reprovide interval

How often each CID is reprovided (the complete cycle duration).
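
Assuming the sweep advances linearly through the keyspace over one interval (a simplification; the prefix and 22 h interval below are illustrative), a region's reprovide time follows from the binary fraction its prefix encodes:

```shell
# Illustration: prefix 1010 encodes keyspace position 0.5 + 0.125 = 0.625,
# so the region is reprovided ~62.5% of the way through the cycle.
prefix=1010; interval_h=22
awk -v p="$prefix" -v T="$interval_h" 'BEGIN {
  f = 0
  for (i = 1; i <= length(p); i++)
    f += substr(p, i, 1) * 2^(-i)   # binary fraction of the keyspace
  printf "region %s: %.1f%% through the cycle, %.2f h after cycle start\n", p, f * 100, f * T
}'
```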

## Network

### Avg record holders

Average number of provider records successfully sent for each CID to distinct DHT servers. In practice, this is often lower than the replication factor due to unreachable peers or timeouts. Matching the replication factor would indicate all DHT servers are reachable.

Note: some holders may have gone offline since receiving the record.
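
A quick way to read this metric is as a share of the replication factor (the numbers below are made up):

```shell
# e.g. 15.2 avg holders against a replication factor of 20
# => records reached 76% of the target servers.
awk -v holders=15.2 -v r=20 'BEGIN { printf "%.0f%% of replication factor\n", 100 * holders / r }'
```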

### Peers swept

Number of DHT servers to which we tried to send provider records in the last reprovide cycle (sweep). Excludes peers contacted during initial provides or DHT lookups.

### Full keyspace coverage

Whether provider records were sent to all DHT servers in the swarm during the last reprovide cycle. If true, peers swept approximates the total DHT swarm size over the last reprovide interval.

### Reachable peers

Number and percentage of peers to which we successfully sent all provider records assigned to them during the last reprovide cycle.

### Avg region size

Average number of DHT servers per keyspace region.

### Replication factor

Target number of DHT servers to receive each provider record.

## Operations

### Ongoing provides

Number of CIDs and regions currently being provided for the first time. More CIDs than regions indicates efficient batching. Each region provide uses a burst worker.

### Ongoing reprovides

Number of CIDs and regions currently being reprovided. Each region reprovide uses a periodic worker.

### Total CIDs provided

Total number of provide operations since node startup (includes both provides and reprovides).

### Total records provided

Total provider records successfully sent to DHT servers since startup (includes reprovides).

### Total provide errors

Number of failed region provide/reprovide operations since startup. Failed regions are automatically retried unless the node is offline.

### CIDs provided/min

Average rate of initial provides per minute during the last reprovide cycle (excludes reprovides).

### CIDs reprovided/min

Average rate of reprovides per minute during the last reprovide cycle (excludes initial provides).
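
Together with the scheduled CID count and the reprovide interval, this rate shows whether the node keeps up: on average it must reprovide at least "CIDs scheduled / interval" CIDs per minute. A sanity check with assumed sample numbers:

```shell
# 120000 CIDs over a 1320-minute (22 h) interval => need ~90.9 CIDs/min.
awk -v cids=120000 -v interval_min=1320 -v rate=95 'BEGIN {
  need = cids / interval_min
  verdict = (rate >= need) ? "keeping up" : "falling behind"
  printf "need %.1f CIDs/min, observing %.1f -> %s\n", need, rate, verdict
}'
```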

### Region reprovide duration

Average time to reprovide all CIDs in a region during the last cycle.

### Avg CIDs/reprovide

Average number of CIDs per region during the last reprovide cycle.

### Regions reprovided (last cycle)

Number of regions reprovided in the last cycle.

## Workers

### Active workers

Number of workers currently processing provide or reprovide operations.

### Free workers

Number of idle workers not reserved for periodic or burst tasks.

### Workers stats

Breakdown of worker status by type (periodic for scheduled reprovides, burst for initial provides). For each type:

- active: currently processing
- dedicated: reserved for this type
- available: idle dedicated workers plus free workers
- queued: 0 or 1, since a worker is only acquired when needed

See provide queue and reprovide queue for the regions waiting to be processed.
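
The "available" figure for each type is simple arithmetic over the numbers above (made-up values for illustration):

```shell
# available = idle dedicated workers + shared free workers
dedicated=2; active_dedicated=1; free=3
available=$(( (dedicated - active_dedicated) + free ))
echo "available periodic workers: $available"   # prints 4
```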

### Max connections/worker

Maximum concurrent DHT server connections per worker when sending provider records for a region.