docs: clarify provide stats metric types and calculations (#11041)

add "Understanding the Metrics" section explaining three types:
- per-worker rates (multiply by active workers for total throughput)
- per-region averages (do NOT multiply by worker count)
- system totals (cumulative across all workers)

enhance metric descriptions with:
- explicit calculation examples showing which worker counts to use
- warnings about when NOT to multiply by worker count
- cross-references to relevant sections

add "Capacity Planning" section with:
- step-by-step throughput capacity calculations
- diagnostic guidance for common scenarios
- worked examples for estimating required vs actual capacity

addresses confusion from PR #11034 comments about when to multiply
metrics by worker count and how to interpret per-worker rates
Marcin Rataj 2025-11-12 03:24:43 +01:00 committed by GitHub
parent 149ca2fd3b
commit 0954d249c2
GPG Key ID: B5690EEEBB952194


@@ -4,6 +4,34 @@ The `ipfs provide stat` command gives you statistics about your local provide
system. This file provides a detailed explanation of the metrics reported by
this command.
## Understanding the Metrics
The statistics are organized into three types of measurements:
### Per-worker rates
Metrics like "CIDs reprovided/min/worker" measure the throughput of a single
worker processing one region. To estimate total system throughput, multiply by
the number of active workers of that type (see [Workers stats](#workers-stats)).
Example: If "CIDs reprovided/min/worker" shows 100 and you have 10 active
periodic workers, your total reprovide throughput is approximately 1,000
CIDs/min.
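The arithmetic in this example can be sketched as a small helper. This is only an illustration: `total_throughput` is a hypothetical name, not part of Kubo, and the inputs are assumed to be read manually from the `ipfs provide stat` output.

```python
def total_throughput(rate_per_worker: float, active_workers: int) -> float:
    """Estimate total CIDs/min from a per-worker rate and the active worker count."""
    if rate_per_worker < 0 or active_workers < 0:
        raise ValueError("rate and worker count must be non-negative")
    return rate_per_worker * active_workers


# Values from the example: 100 CIDs/min/worker and 10 active periodic workers.
print(total_throughput(100, 10))  # 1000
```

The result is an estimate only, since it assumes every active worker sustains the reported average rate.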
### Per-region averages
Metrics like "Avg CIDs/reprovide" measure properties of the work units (keyspace
regions). These represent the average size or characteristics of a region, not a
rate. Do NOT multiply these by worker count.
Example: "Avg CIDs/reprovide: 250,000" means each region contains an average of
250,000 CIDs that get reprovided together as a batch.
### System totals
Metrics like "Total CIDs provided" are cumulative counts since node startup.
These aggregate all work across all workers over time.
## Connectivity
### Status
@@ -148,19 +176,31 @@ regions are automatically retried unless the node is offline.
Average rate of initial provides per minute per worker during the last
reprovide cycle (excludes reprovides). Each worker handles one keyspace region
-at a time, providing all CIDs in that region. This rate only counts active time
-(timer doesn't run when no initial provides are being processed). The overall
-provide rate can be higher when multiple workers are providing different
-regions concurrently.
+at a time, providing all CIDs in that region. This measures the throughput of a
+single worker only.
+To estimate total system provide throughput, multiply by the number of active
+burst workers shown in [Workers stats](#workers-stats) (Burst > Active).
+Note: This rate only counts active time when initial provides are being
+processed. If workers are idle, actual throughput may be lower.
### CIDs reprovided/min/worker
Average rate of reprovides per minute per worker during the last reprovide
cycle (excludes initial provides). Each worker handles one keyspace region at a
-time, reproviding all CIDs in that region. The overall reprovide rate can be
-higher when multiple workers are reproviding different regions concurrently. To
-estimate total reprovide rate, multiply by the number of [periodic
-workers](./config.md#providedhtdedicatedperiodicworkers) in use.
+time, reproviding all CIDs in that region. This measures the throughput of a
+single worker only.
+To estimate total system reprovide throughput, multiply by the number of active
+periodic workers shown in [Workers stats](#workers-stats) (Periodic > Active).
+Example: If this shows 100 CIDs/min and you have 10 active periodic workers,
+your total reprovide throughput is approximately 1,000 CIDs/min.
+Note: This rate only counts active time when regions are being reprovided. If
+workers are idle due to network issues or queue exhaustion, actual throughput
+may be lower.
### Region reprovide duration
@@ -170,6 +210,13 @@ Average time to reprovide all CIDs in a region during the last cycle.
Average number of CIDs per region during the last reprovide cycle.
This measures the average size of a region (how many CIDs are batched together),
not a throughput rate. Do NOT multiply this by worker count.
Combined with [Region reprovide duration](#region-reprovide-duration), this
helps estimate per-worker throughput: dividing Avg CIDs/reprovide by Region
reprovide duration (expressed in minutes) gives CIDs/min/worker.
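A minimal sketch of that division follows. The function name is hypothetical and the input values are made up for illustration. The duration is assumed to be given in seconds here, so it is converted to minutes before dividing; adjust the conversion to match the unit your output actually reports.

```python
def per_worker_rate(avg_cids_per_region: float, region_duration_seconds: float) -> float:
    """CIDs/min/worker implied by the average region size and reprovide duration."""
    if region_duration_seconds <= 0:
        raise ValueError("duration must be positive")
    duration_minutes = region_duration_seconds / 60.0
    return avg_cids_per_region / duration_minutes


# Hypothetical inputs: 6,000 CIDs per region, 120 seconds to reprovide one region.
print(per_worker_rate(6_000, 120))  # 3000.0
```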
### Regions reprovided (last cycle)
Number of regions reprovided in the last cycle.
@@ -189,11 +236,16 @@ Number of idle workers not reserved for periodic or burst tasks.
Breakdown of worker status by type (periodic for scheduled reprovides, burst for
initial provides). For each type:
-- **Active**: Currently processing operations
+- **Active**: Currently processing operations (use this count when calculating total throughput from per-worker rates)
- **Dedicated**: Reserved for this type
- **Available**: Idle dedicated workers + [free workers](#free-workers)
- **Queued**: 0 or 1 (workers acquired only when needed)
Total system throughput scales with the number of active workers. For
example, if you have 10 active periodic workers, multiply
[CIDs reprovided/min/worker](#cids-reprovidedminworker) by 10 to estimate total
reprovide throughput.
See [provide queue](#provide-queue) and [reprovide queue](#reprovide-queue) for
regions waiting to be processed.
@@ -202,6 +254,31 @@ regions waiting to be processed.
Maximum concurrent DHT server connections per worker when sending provider
records for a region.
## Capacity Planning
### Estimating if your system can keep up with the reprovide schedule
To check if your provide system has sufficient capacity:
1. Calculate required throughput:
- Required CIDs/min = [CIDs scheduled](#cids-scheduled) / ([Reprovide interval](#reprovide-interval) in minutes)
- Example: 67M CIDs / (22 hours × 60 min) = 50,758 CIDs/min needed
2. Calculate actual throughput:
- Actual CIDs/min = [CIDs reprovided/min/worker](#cids-reprovidedminworker) × Active periodic workers
- Example: 100 CIDs/min/worker × 256 active workers = 25,600 CIDs/min
3. Compare:
- If actual < required: System is underprovisioned; increase [MaxWorkers](./config.md#providedhtmaxworkers) or [DedicatedPeriodicWorkers](./config.md#providedhtdedicatedperiodicworkers)
- If actual > required: System has excess capacity
- If [Reprovide queue](#reprovide-queue) is growing: System is falling behind
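The three steps above can be sketched as a short calculation using the example numbers. The function names are hypothetical, and the inputs are assumed to be copied by hand from the `ipfs provide stat` output. The final "workers needed" figure is an extra rough estimate that assumes the per-worker rate stays constant as workers are added, which may not hold in practice.

```python
import math


def required_cids_per_min(cids_scheduled: int, reprovide_interval_hours: float) -> float:
    """Step 1: throughput needed to reprovide everything within one interval."""
    return cids_scheduled / (reprovide_interval_hours * 60)


def actual_cids_per_min(rate_per_worker: float, active_periodic_workers: int) -> float:
    """Step 2: estimated throughput from the per-worker rate and active workers."""
    return rate_per_worker * active_periodic_workers


# Example values from the steps above.
required = required_cids_per_min(67_000_000, 22)
actual = actual_cids_per_min(100, 256)

# Step 3: compare the two figures.
print(f"required: {required:,.0f} CIDs/min")  # required: 50,758 CIDs/min
print(f"actual:   {actual:,.0f} CIDs/min")    # actual:   25,600 CIDs/min
print("underprovisioned" if actual < required else "keeping up")  # underprovisioned

# Rough estimate only: assumes 100 CIDs/min/worker holds at higher worker counts.
workers_needed = math.ceil(required / 100)
print(workers_needed)  # 508
```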
### Understanding worker utilization
- Many active workers with a growing reprovide queue: The node needs more workers, or network connectivity is limiting throughput
- Few active workers with a non-empty reprovide queue: Workers may be waiting on network or DHT operations
- Check [Reachable peers](#reachable-peers) to diagnose network connectivity issues
## See Also
- [Provide configuration reference](./config.md#provide)