kubo/docs/metrics.md
Marcin Rataj 71e883440e
refactor(config): migration 17-to-18 to unify Provider/Reprovider into Provide.DHT (#10951)
* refactor: consolidate Provider/Reprovider into unified Provide config

- merge Provider and Reprovider configs into single Provide section
- add fs-repo-17-to-18 migration for config consolidation
- improve migration ergonomics with common package utilities
- convert deprecated "flat" strategy to "all" during migration
- improve Provide docs

* docs: add total_provide_count metric guidance

- document how to monitor provide success rates via prometheus metrics
- add performance comparison section to changelog
- explain how to evaluate sweep vs legacy provider effectiveness

* fix: add OpenTelemetry meter provider for metrics

- set up meter provider with Prometheus exporter in daemon
- enables metrics from external libs like go-libp2p-kad-dht
- fixes missing total_provide_count_total when SweepEnabled=true
- update docs to reflect actual metric names

---------

Co-authored-by: gammazero <11790789+gammazero@users.noreply.github.com>
Co-authored-by: guillaumemichel <guillaume@michel.id>
Co-authored-by: Daniel Norman <1992255+2color@users.noreply.github.com>
Co-authored-by: Hector Sanjuan <code@hector.link>
2025-09-18 22:17:43 +02:00


Kubo metrics

By default, Kubo exposes a Prometheus endpoint at http://127.0.0.1:5001/debug/metrics/prometheus.

It includes the default Prometheus Go client metrics plus the Kubo-specific metrics listed below.
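As a rough illustration of consuming this endpoint without a Prometheus server, the sketch below fetches the text exposition and splits it into series/value pairs. The helper names `fetch_metrics_text` and `parse_samples` are hypothetical, and the parser is deliberately simplified: it assumes samples carry no trailing timestamp.

```python
import urllib.request

# Default endpoint from the text above; adjust if the daemon listens elsewhere.
METRICS_URL = "http://127.0.0.1:5001/debug/metrics/prometheus"


def fetch_metrics_text(url=METRICS_URL, timeout=5.0):
    """Return the raw Prometheus text exposition served by the daemon."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text):
    """Parse exposition text into (series, value) pairs.

    Simplified sketch: skips '# HELP' / '# TYPE' comments and assumes each
    sample line ends with the value (no timestamp after it).
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is everything after the last space; the series name
        # (including any {labels}) is everything before it.
        series, _, value = line.rpartition(" ")
        samples.append((series, float(value)))
    return samples
```

With a running daemon, `parse_samples(fetch_metrics_text())` would return one pair per exposed series. For anything beyond ad hoc inspection, a real Prometheus scraper or client library is likely the better choice.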


Warning

This documentation is incomplete. For an up-to-date list of metrics available at daemon startup, see test/sharness/t0119-prometheus-data/prometheus_metrics_added_by_measure_profile.

Additional metrics may appear at runtime, because some components (like boxo/gateway) register metrics only after their first event occurs (e.g., an HTTP request/response).

DHT RPC

Metrics from go-libp2p-kad-dht for DHT RPC operations:

Inbound RPC metrics

  • rpc_inbound_messages_total - Counter: total messages received per RPC
  • rpc_inbound_message_errors_total - Counter: total errors for received messages
  • rpc_inbound_bytes_[bucket|sum|count] - Histogram: distribution of received bytes per RPC
  • rpc_inbound_request_latency_[bucket|sum|count] - Histogram: latency distribution for inbound RPCs

Outbound RPC metrics

  • rpc_outbound_messages_total - Counter: total messages sent per RPC
  • rpc_outbound_message_errors_total - Counter: total errors for sent messages
  • rpc_outbound_requests_total - Counter: total requests sent
  • rpc_outbound_request_errors_total - Counter: total errors for sent requests
  • rpc_outbound_bytes_[bucket|sum|count] - Histogram: distribution of sent bytes per RPC
  • rpc_outbound_request_latency_[bucket|sum|count] - Histogram: latency distribution for outbound RPCs
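One way to use the outbound counters is to derive a lifetime error ratio from `rpc_outbound_request_errors_total` and `rpc_outbound_requests_total`. The sketch below sums each counter across all of its label sets in a scraped exposition text. The function names are hypothetical, and the parsing assumes sample lines carry no trailing timestamp.

```python
def sum_metric(text, name):
    """Sum every sample of `name` across all label sets in exposition text."""
    total = 0.0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        # Match the bare name or the name followed by a label block, so that
        # a longer metric sharing the same prefix is not counted.
        if series == name or series.startswith(name + "{"):
            total += float(value)
    return total


def outbound_error_ratio(text):
    """Fraction of outbound DHT requests that failed since startup.

    Returns None when no requests have been sent yet.
    """
    requests = sum_metric(text, "rpc_outbound_requests_total")
    errors = sum_metric(text, "rpc_outbound_request_errors_total")
    if requests == 0:
        return None
    return errors / requests
```

Because both counters accumulate from node startup, this ratio describes the whole process lifetime. Comparing two scrapes gives a view of recent behavior instead.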

Provide

Legacy Provider

Metrics for the legacy provider system when Provide.DHT.SweepEnabled=false:

  • provider_reprovider_provide_count - Counter: total successful provide operations since node startup
  • provider_reprovider_reprovide_count - Counter: total reprovide sweep operations since node startup

DHT Provider

Metrics for the DHT provider system when Provide.DHT.SweepEnabled=true:

  • total_provide_count_total - Counter: total successful provide operations since node startup (includes both one-time provides and periodic provides done on Provide.DHT.Interval)

Note

These metrics are exposed by go-libp2p-kad-dht. You can enable debug logging for DHT provider activity with GOLOG_LOG_LEVEL=dht/provider=debug.
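Since the provide counters only ever grow while the node is running, a throughput figure has to come from the difference between two scrapes. A minimal sketch, assuming you have already read the counter value at two points in time (the function name `counter_rate` is hypothetical):

```python
def counter_rate(earlier, later, elapsed_seconds):
    """Per-second increase of a counter between two scrapes.

    If the counter went backwards, the daemon presumably restarted in
    between, so the later value is treated as the whole increase.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    increase = later - earlier if later >= earlier else later
    return increase / elapsed_seconds
```

For example, if `total_provide_count_total` reads 100 and then 160 a minute later, `counter_rate(100, 160, 60)` gives 1.0 provides per second. The same calculation applies to `provider_reprovider_provide_count` when the legacy provider is in use, which allows a like-for-like comparison of the two systems.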

Gateway (boxo/gateway)

Tip

These metrics are limited to IPFS Gateway endpoints. For general HTTP metrics across all endpoints, consider using a reverse proxy.

Gateway metrics appear after the first HTTP request is processed:

HTTP metrics

  • ipfs_http_gw_responses_total{code} - Counter: total HTTP responses by status code
  • ipfs_http_gw_retrieval_timeouts_total{code,truncated} - Counter: requests that timed out during content retrieval
  • ipfs_http_gw_concurrent_requests - Gauge: number of requests currently being processed

Blockstore cache metrics

  • ipfs_http_blockstore_cache_hit - Counter: global block cache hits
  • ipfs_http_blockstore_cache_requests - Counter: global block cache requests
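The two blockstore cache counters are most informative as a ratio. The sketch below computes a hit ratio over the interval between two scrapes, assuming the four counter values have already been read. The function name and parameters are hypothetical.

```python
def windowed_hit_ratio(hits_before, requests_before, hits_after, requests_after):
    """Cache hit ratio over the interval between two scrapes.

    Returns None when no requests were counted in the interval, or when a
    counter went backwards (which suggests the daemon restarted).
    """
    delta_hits = hits_after - hits_before
    delta_requests = requests_after - requests_before
    if delta_requests <= 0 or delta_hits < 0:
        return None
    return delta_hits / delta_requests
```

Here the `hits_*` arguments would come from `ipfs_http_blockstore_cache_hit` and the `requests_*` arguments from `ipfs_http_blockstore_cache_requests`.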

Backend metrics

  • ipfs_gw_backend_api_call_duration_seconds_[bucket|sum|count]{backend_method} - Histogram: time spent in IPFSBackend API calls

Generic HTTP Servers

Tip

The metrics below are not very useful and exist mostly for historical reasons. If you need non-gateway HTTP metrics, it's better to put a reverse proxy in front of Kubo and use its metrics.

Core HTTP metrics (ipfs_http_*)

Prometheus metrics for the HTTP API exposed at port 5001:

  • ipfs_http_requests_total{method,code,handler} - Counter: total HTTP requests (Legacy - new metrics are provided by boxo/gateway for gateway traffic)
  • ipfs_http_request_duration_seconds[_sum|_count]{handler} - Summary: request processing duration
  • ipfs_http_request_size_bytes[_sum|_count]{handler} - Summary: request body sizes
  • ipfs_http_response_size_bytes[_sum|_count]{handler} - Summary: response body sizes

HTTP Server metrics (http_server_*)

Additional HTTP instrumentation for all handlers (Gateway, API commands, etc.):

  • http_server_request_body_size_bytes_[bucket|count|sum] - Histogram: distribution of request body sizes
  • http_server_request_duration_seconds_[bucket|count|sum] - Histogram: distribution of request processing times
  • http_server_response_body_size_bytes_[bucket|count|sum] - Histogram: distribution of response body sizes

These metrics are automatically added to Gateway handlers, Hostname Gateway, Libp2p Gateway, and API command handlers.
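Histogram metrics such as `http_server_request_duration_seconds` expose cumulative `_bucket` counts rather than raw observations, so percentiles have to be estimated. The sketch below interpolates linearly within the matching bucket, similar in spirit to PromQL's `histogram_quantile`. The function name is hypothetical and the bucket bounds in the usage note are made up for illustration.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs taken from
    the `_bucket` series, with the le="+Inf" bucket given as math.inf.
    The result is an approximation limited by the bucket layout.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if not buckets:
        raise ValueError("at least one bucket is required")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Cannot interpolate into the open-ended bucket; fall back
                # to the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

With illustrative buckets `[(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]`, the 95th percentile falls halfway through the 0.5 to 1.0 second bucket, so the estimate is 0.75 seconds.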

OpenTelemetry Metadata

Kubo uses Prometheus for metrics collection for historical reasons, but OpenTelemetry metrics are automatically exposed through the same Prometheus endpoint. These metadata metrics provide context about the instrumentation:

  • otel_scope_info - Information about instrumentation libraries producing metrics
  • target_info - Service metadata including version and instance information