mirror of https://github.com/ipfs/kubo.git, synced 2026-02-21 10:27:46 +08:00
docs: clarify provide stats metric types and calculations (#11041)

add "Understanding the Metrics" section explaining three types:
- per-worker rates (multiply by active workers for total throughput)
- per-region averages (do NOT multiply by worker count)
- system totals (cumulative across all workers)

enhance metric descriptions with:
- explicit calculation examples showing which worker counts to use
- warnings about when NOT to multiply by worker count
- cross-references to relevant sections

add "Capacity Planning" section with:
- step-by-step throughput capacity calculations
- diagnostic guidance for common scenarios
- worked examples for estimating required vs actual capacity

addresses confusion from PR #11034 comments about when to multiply
metrics by worker count and how to interpret per-worker rates

parent 149ca2fd3b
commit 0954d249c2

@@ -4,6 +4,34 @@ The `ipfs provide stat` command gives you statistics about your local provide
 system. This file provides a detailed explanation of the metrics reported by
 this command.

+## Understanding the Metrics
+
+The statistics are organized into three types of measurements:
+
+### Per-worker rates
+
+Metrics like "CIDs reprovided/min/worker" measure the throughput of a single
+worker processing one region. To estimate total system throughput, multiply by
+the number of active workers of that type (see [Workers stats](#workers-stats)).
+
+Example: If "CIDs reprovided/min/worker" shows 100 and you have 10 active
+periodic workers, your total reprovide throughput is approximately 1,000
+CIDs/min.
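
The per-worker multiplication in the example above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `total_throughput` is hypothetical and is not part of kubo or of the `ipfs provide stat` output.

```python
def total_throughput(rate_per_worker: float, active_workers: int) -> float:
    """Estimate system-wide CIDs/min from a per-worker rate.

    Both arguments are read manually from the stats output; this helper
    is illustrative only and does not query a running node.
    """
    if rate_per_worker < 0 or active_workers < 0:
        raise ValueError("rate and worker count must be non-negative")
    return rate_per_worker * active_workers


# Numbers from the example above: 100 CIDs/min/worker, 10 active periodic workers.
print(total_throughput(100, 10))  # 1000
```

The same helper applies to burst workers and "CIDs provided/min/worker", as long as the worker count matches the worker type of the rate.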
+
+### Per-region averages
+
+Metrics like "Avg CIDs/reprovide" measure properties of the work units (keyspace
+regions). These represent the average size or characteristics of a region, not a
+rate. Do NOT multiply these by worker count.
+
+Example: "Avg CIDs/reprovide: 250,000" means each region contains an average of
+250,000 CIDs that get reprovided together as a batch.
+
+### System totals
+
+Metrics like "Total CIDs provided" are cumulative counts since node startup.
+These aggregate all work across all workers over time.
+
 ## Connectivity

 ### Status
@@ -148,19 +176,31 @@ regions are automatically retried unless the node is offline.

 Average rate of initial provides per minute per worker during the last
 reprovide cycle (excludes reprovides). Each worker handles one keyspace region
-at a time, providing all CIDs in that region. This rate only counts active time
-(timer doesn't run when no initial provides are being processed). The overall
-provide rate can be higher when multiple workers are providing different
-regions concurrently.
+at a time, providing all CIDs in that region. This measures the throughput of a
+single worker only.
+
+To estimate total system provide throughput, multiply by the number of active
+burst workers shown in [Workers stats](#workers-stats) (Burst > Active).
+
+Note: This rate only counts active time when initial provides are being
+processed. If workers are idle, actual throughput may be lower.

 ### CIDs reprovided/min/worker

 Average rate of reprovides per minute per worker during the last reprovide
 cycle (excludes initial provides). Each worker handles one keyspace region at a
-time, reproviding all CIDs in that region. The overall reprovide rate can be
-higher when multiple workers are reproviding different regions concurrently. To
-estimate total reprovide rate, multiply by the number of [periodic
-workers](./config.md#providedhtdedicatedperiodicworkers) in use.
+time, reproviding all CIDs in that region. This measures the throughput of a
+single worker only.
+
+To estimate total system reprovide throughput, multiply by the number of active
+periodic workers shown in [Workers stats](#workers-stats) (Periodic > Active).
+
+Example: If this shows 100 CIDs/min and you have 10 active periodic workers,
+your total reprovide throughput is approximately 1,000 CIDs/min.
+
+Note: This rate only counts active time when regions are being reprovided. If
+workers are idle due to network issues or queue exhaustion, actual throughput
+may be lower.

 ### Region reprovide duration

@@ -170,6 +210,13 @@ Average time to reprovide all CIDs in a region during the last cycle.

 Average number of CIDs per region during the last reprovide cycle.

+This measures the average size of a region (how many CIDs are batched together),
+not a throughput rate. Do NOT multiply this by worker count.
+
+Combined with [Region reprovide duration](#region-reprovide-duration), this
+helps estimate per-worker throughput: dividing Avg CIDs/reprovide by Region
+reprovide duration gives CIDs/min/worker.
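
As a rough sketch of the division described above, the following Python helper derives a per-worker rate from the two region metrics. The function name `per_worker_rate` and the sample numbers (6,000 CIDs, 30 minutes) are made up for illustration, and the duration is assumed to have already been converted to minutes.

```python
def per_worker_rate(avg_cids_per_region: float, region_duration_min: float) -> float:
    """Approximate CIDs/min/worker from the two per-region averages.

    avg_cids_per_region: the "Avg CIDs/reprovide" value.
    region_duration_min: "Region reprovide duration", converted to minutes
                         by the caller (an assumption of this sketch).
    """
    if region_duration_min <= 0:
        raise ValueError("region duration must be positive")
    return avg_cids_per_region / region_duration_min


# Hypothetical values: a region of 6,000 CIDs reprovided in 30 minutes.
print(per_worker_rate(6_000, 30))  # 200.0
```

The result is still a per-worker figure, so it needs to be multiplied by the number of active periodic workers before comparing it with a system-wide target.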
+
 ### Regions reprovided (last cycle)

 Number of regions reprovided in the last cycle.
@@ -189,11 +236,16 @@ Number of idle workers not reserved for periodic or burst tasks.
 Breakdown of worker status by type (periodic for scheduled reprovides, burst for
 initial provides). For each type:

-- **Active**: Currently processing operations
+- **Active**: Currently processing operations (use this count when calculating total throughput from per-worker rates)
 - **Dedicated**: Reserved for this type
 - **Available**: Idle dedicated workers + [free workers](#free-workers)
 - **Queued**: 0 or 1 (workers acquired only when needed)

+The number of active workers determines your total system throughput. For
+example, if you have 10 active periodic workers, multiply
+[CIDs reprovided/min/worker](#cids-reprovidedminworker) by 10 to estimate total
+reprovide throughput.
+
 See [provide queue](#provide-queue) and [reprovide queue](#reprovide-queue) for
 regions waiting to be processed.

@@ -202,6 +254,31 @@ regions waiting to be processed.
 Maximum concurrent DHT server connections per worker when sending provider
 records for a region.

+## Capacity Planning
+
+### Estimating if your system can keep up with the reprovide schedule
+
+To check if your provide system has sufficient capacity:
+
+1. Calculate required throughput:
+   - Required CIDs/min = [CIDs scheduled](#cids-scheduled) / ([Reprovide interval](#reprovide-interval) in minutes)
+   - Example: 67M CIDs / (22 hours × 60 min) = 50,758 CIDs/min needed
+
+2. Calculate actual throughput:
+   - Actual CIDs/min = [CIDs reprovided/min/worker](#cids-reprovidedminworker) × Active periodic workers
+   - Example: 100 CIDs/min/worker × 256 active workers = 25,600 CIDs/min
+
+3. Compare:
+   - If actual < required: System is underprovisioned, increase [MaxWorkers](./config.md#providedhtmaxworkers) or [DedicatedPeriodicWorkers](./config.md#providedhtdedicatedperiodicworkers)
+   - If actual > required: System has excess capacity
+   - If [Reprovide queue](#reprovide-queue) is growing: System is falling behind
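
The three steps above can be reproduced with a short Python sketch using the numbers from the worked example. All function names are hypothetical, and `workers_needed` is an extra estimate that is not in the documented steps: it assumes the per-worker rate stays constant as workers are added, which may not hold if the network is the bottleneck.

```python
import math


def required_cids_per_min(cids_scheduled: int, reprovide_interval_hours: float) -> float:
    """Step 1: throughput needed to finish one cycle within the interval."""
    if reprovide_interval_hours <= 0:
        raise ValueError("reprovide interval must be positive")
    return cids_scheduled / (reprovide_interval_hours * 60)


def actual_cids_per_min(rate_per_worker: float, active_periodic_workers: int) -> float:
    """Step 2: per-worker rate multiplied by active periodic workers."""
    return rate_per_worker * active_periodic_workers


def workers_needed(required: float, rate_per_worker: float) -> int:
    """Assumption of this sketch: per-worker rate does not change with worker count."""
    if rate_per_worker <= 0:
        raise ValueError("per-worker rate must be positive")
    return math.ceil(required / rate_per_worker)


required = required_cids_per_min(67_000_000, 22)
actual = actual_cids_per_min(100, 256)

# Step 3: compare the two figures.
verdict = "underprovisioned" if actual < required else "sufficient"

print(f"required: {required:,.0f} CIDs/min")  # required: 50,758 CIDs/min
print(f"actual:   {actual:,.0f} CIDs/min")    # actual:   25,600 CIDs/min
print(verdict)                                # underprovisioned
print(workers_needed(required, 100))          # 508
```

With these example numbers the system delivers roughly half of the required throughput, which matches the "actual < required" case in step 3.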
+
+### Understanding worker utilization
+
+- High active workers with growing reprovide queue: Need more workers or network connectivity is limiting throughput
+- Low active workers with non-empty reprovide queue: Workers may be waiting for network or DHT operations
+- Check [Reachable peers](#reachable-peers) to diagnose network connectivity issues
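
The diagnostic bullets above could be encoded as a simple heuristic, sketched below in Python. Everything here is an assumption made for illustration: the function `diagnose`, the idea of comparing two queue-size samples taken some time apart, and the 0.8 cutoff for "high" utilization are not defined by kubo.

```python
def diagnose(active_workers: int, worker_limit: int,
             queue_before: int, queue_after: int,
             high_utilization: float = 0.8) -> str:
    """Map worker and queue observations to the guidance in the bullets above.

    worker_limit: the number of workers that could be active for this type
                  (hypothetical input, read from the stats by the caller).
    queue_before / queue_after: reprovide queue sizes from two samples.
    high_utilization: arbitrary cutoff for "high active workers".
    """
    if worker_limit <= 0:
        raise ValueError("worker_limit must be positive")
    utilization = active_workers / worker_limit
    queue_growing = queue_after > queue_before

    if utilization >= high_utilization and queue_growing:
        return "add workers or check network connectivity"
    if utilization < high_utilization and queue_after > 0:
        return "workers may be waiting on network or DHT operations"
    return "no issue indicated by these heuristics"


print(diagnose(250, 256, 10, 40))  # add workers or check network connectivity
print(diagnose(5, 256, 12, 12))    # workers may be waiting on network or DHT operations
```

In either of the first two cases, the reachable peers count is the next thing to inspect, as the last bullet suggests.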
+
 ## See Also

 - [Provide configuration reference](./config.md#provide)