address code review feedback for PR #11069:
- fix: propagate decode errors in client/rpc dag import (was silently dropping errors)
- fix: acquire pinlock before spawning goroutine to prevent race with GC
- fix: update fast-provide test to always expect failure in isolated environment
- test: add proper json compatibility test for provide stats (replaces compile-time check)
- docs: add educational comments explaining batch config defaults
- style: standardize error messages to use consistent "failed to X: %w" pattern
the pinlock fix is critical - moving acquisition before goroutine spawn prevents
blocks from being garbage collected before the lock is held. the error handling
fix ensures RPC clients receive decode errors instead of empty results.
adds RPC client support for:
- ipfs provide stat (with --lan flag for dual DHT)
- ipfs dag import (with --fast-provide-root/--fast-provide-wait)
client/rpc changes:
- dag.go: add Import() method (~70 lines)
- dag_test.go: 4 test cases for Import (new file)
- routing.go: add ProvideStats() method (~25 lines)
- routing_test.go: 3 test cases for ProvideStats (new file)
to enable RPC client, refactored commands to use CoreAPI:
- add ProvideStats() to RoutingAPI interface and implementation
- add Import() to APIDagService interface and implementation
- commands delegate to CoreAPI (provide.go, dag/import.go)
adds Import method to APIDagService interface and RPC client implementation
- new DagImportResult, DagImportRoot, DagImportStats types in coreiface
- DagImportOptions with uniform Set pattern for all params (PinRoots, Stats, FastProvideRoot, FastProvideWait)
- streaming channel API for handling multiple roots and stats
- tests covering basic import, stats, offline mode, and blocking wait
add FastProvideRoot and FastProvideWait options to UnixfsAddSettings,
allowing RPC clients to control immediate DHT providing of root CIDs
for faster content discovery
these options default to server config (Import.FastProvideRoot and
Import.FastProvideWait) when not explicitly set by the client
This allows Kubo to respond to the GetClosestPeers() HTTP Routing V1 endpoint
as specified in https://github.com/ipfs/specs/pull/476
It is based on work from https://github.com/ipfs/boxo/pull/1021
We let IpfsNode implement the contentRouter.Client interface with the new
method. We use our WAN-DHT to get the closest peers.
Additionally, the Routing V1 HTTP API is exposed by default, which enables light clients in browsers to use a Kubo Gateway as a delegated routing backend.
Co-authored-by: Marcin Rataj <lidel@lidel.org>
PathOrCidPath was returning the error from the second path.NewPath call
instead of the original error when both attempts failed. This fix preserves
the first error before attempting the fallback, ensuring users get the
most relevant error message about their input.
* fix(add): respect Provide config in fast-provide-root
fast-provide-root should honor the same config settings as the regular
provide system:
- skip when Provide.Enabled is false
- skip when Provide.DHT.Interval is 0
- respect Provide.Strategy (all/pinned/roots/mfs/combinations)
This ensures fast-provide only runs when appropriate based on user
configuration and the nature of the content being added (pinned vs
unpinned, added to MFS or not).
* feat(config): options to adjust global defaults
Add Import.FastProvideRoot and Import.FastProvideWait configuration options
to control default behavior of fast-provide-root and fast-provide-wait flags
in ipfs add command. Users can now set global defaults in config while
maintaining per-command flag overrides.
- Add Import.FastProvideRoot (default: true)
- Add Import.FastProvideWait (default: false)
- Add ResolveBoolFromConfig helper for config resolution
- Update docs with configuration details
- Add log-based tests verifying actual behavior
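A sketch of the resolution order described above (flag, then config, then built-in default). The resolveBool name and signature are assumptions for illustration and may differ from the actual ResolveBoolFromConfig helper.

```go
package main

import "fmt"

// resolveBool picks the effective value for a boolean option:
//  1. the command-line flag, if the user set it explicitly
//  2. the config value, if one is present (nil means "not set")
//  3. the built-in default otherwise
func resolveBool(flagSet, flagVal bool, cfgVal *bool, def bool) bool {
	if flagSet {
		return flagVal
	}
	if cfgVal != nil {
		return *cfgVal
	}
	return def
}

func main() {
	off := false
	fmt.Println(resolveBool(false, false, nil, true))  // nothing set: built-in default
	fmt.Println(resolveBool(false, false, &off, true)) // config overrides default
	fmt.Println(resolveBool(true, true, &off, true))   // flag overrides config
}
```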
* refactor: extract fast-provide logic into reusable functions
Extract fast-provide logic from add command into reusable components:
- Add config.ShouldProvideForStrategy helper for strategy matching
- Add ExecuteFastProvide function reusable across add and dag import commands
- Move DefaultFastProvideTimeout constant to config/provide.go
- Simplify add.go from 72 lines to 6 lines for fast-provide
- Move fast-provide tests to dedicated TestAddFastProvide function
Benefits:
- cleaner API: callers only pass content characteristics
- all strategy logic centralized in one place
- better separation of concerns
- easier to add fast-provide to other commands in future
* feat(dag): add fast-provide support for dag import
Adds --fast-provide-root and --fast-provide-wait flags to `ipfs dag import`,
mirroring the fast-provide functionality available in `ipfs add`.
Changes:
- Add --fast-provide-root and --fast-provide-wait flags to dag import command
- Implement fast-provide logic for all root CIDs in imported CAR files
- Works even when --pin-roots=false (strategy checked internally)
- Share ExecuteFastProvide implementation between add and dag import
- Move ExecuteFastProvide to cmdenv package to avoid import cycles
- Add logging when fast-provide is disabled
- Conditional error handling: return error when wait=true, warn when wait=false
- Update config docs to mention both ipfs add and ipfs dag import
- Update changelog to use "provide" terminology and include dag import examples
- Add comprehensive test coverage (TestDagImportFastProvide with 6 test cases)
The fast-provide feature allows immediate DHT announcement of root CIDs
for faster content discovery, bypassing the regular background queue.
* docs: improve fast-provide documentation
Refine documentation to better explain fast-provide and sweep provider working
together, and highlight the performance improvement.
Changelog:
- add fast-provide to sweep provider features list
- explain performance improvement: root CIDs discoverable in <1s vs 30+ seconds
- note this uses optimistic DHT operations (faster with sweep provider)
- simplify examples, point to --help for details
Config docs:
- fix: --fast-provide-roots should be --fast-provide-root (singular)
- clarify Import.FastProvideRoot focuses on root CIDs while sweep handles all blocks
- simplify Import.FastProvideWait description
Command help:
- ipfs add: explain sweep provider context upfront
- ipfs dag import: add fast-provide explanation section
- both explain the split: fast-provide for roots, sweep for all blocks
* test: add tests for ShouldProvideForStrategy
add tests covering all provide strategy combinations with focus on
bitflag OR logic (the else-if bug fix). organized by behavior:
- all strategy always provides
- single strategies match only their flag
- combined strategies use OR logic
- zero strategy never provides
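a sketch of the OR logic these tests exercise, under assumed names: the Strategy type, its flag constants, and shouldProvide are hypothetical and only mirror the behaviors listed above. independent if-checks are what make combined strategies work; an else-if chain would stop at the first flag that is set even when the content does not match it.

```go
package main

import "fmt"

// Strategy is a hypothetical bitflag set for this sketch.
type Strategy uint8

const (
	StrategyPinned Strategy = 1 << iota
	StrategyRoots
	StrategyMFS
	StrategyAll
)

// shouldProvide reports whether content with the given characteristics
// matches ANY flag in the strategy (OR logic).
func shouldProvide(s Strategy, isPinned, isPinnedRoot, isMFS bool) bool {
	if s&StrategyAll != 0 {
		return true
	}
	// Independent checks: each set flag gets a chance to match.
	if s&StrategyPinned != 0 && isPinned {
		return true
	}
	if s&StrategyRoots != 0 && isPinnedRoot {
		return true
	}
	if s&StrategyMFS != 0 && isMFS {
		return true
	}
	return false
}

func main() {
	combined := StrategyPinned | StrategyMFS
	// Content that is only in MFS still matches pinned+mfs.
	fmt.Println(shouldProvide(combined, false, false, true))
	// The zero strategy never provides.
	fmt.Println(shouldProvide(0, true, true, true))
}
```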
* refactor: error cmd on error and wait=true
change ExecuteFastProvide() to return error, enabling proper error
propagation when --fast-provide-wait=true. in sync mode, provide
failures now error the command as expected. in async mode (default),
always returns nil with errors logged in background goroutine.
also remove duplicate ExecuteFastProvide() from provide.go (75 lines),
keeping single implementation in cmdenv/env.go for reuse across add
and dag import commands.
call sites simplified:
- add.go: check and propagate error from ExecuteFastProvide
- dag/import.go: return error from ForEach callback, remove confusing
conditional error handling
semantics:
- precondition skips (DHT unavailable, etc): return nil (not failure)
- async mode (wait=false): return nil, log errors in goroutine
- sync mode (wait=true): return wrapped error on provide failure
* feat: fast provide
* Check error from provideRoot
* do not provide if nil router
* fix(commands): prevent panic from typed nil DHTClient interface
Fixes panic when ipfsNode.DHTClient is a non-nil interface containing a
nil pointer value (typed nil). This happened when Routing.Type=delegated
or when using HTTP-only routing without DHT.
The panic occurred because:
- Go interfaces can be non-nil while containing nil pointer values
- Simple `if DHTClient == nil` checks pass, but calling methods panics
- Example: `(*ddht.DHT)(nil)` stored in interface passes nil check
Solution:
- Add HasActiveDHTClient() method to check both interface and concrete value
- Update all 7 call sites to use proper check before DHT operations
- Rename provideRoot → provideCIDSync for clarity
- Add structured logging with "fast-provide" prefix for easier filtering
- Add tests covering nil cases and valid DHT configurations
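A self-contained sketch of the typed-nil problem follows. The router interface, the dht type, and hasActive are hypothetical; the real HasActiveDHTClient() may check the concrete value differently (this sketch uses reflection).

```go
package main

import (
	"fmt"
	"reflect"
)

// router and dht are hypothetical stand-ins for the DHT client interface
// and one of its concrete pointer implementations.
type router interface {
	Name() string
}

type dht struct{ name string }

// Name dereferences the receiver, so calling it on a nil *dht panics.
func (d *dht) Name() string { return d.name }

// hasActive checks both the interface and the concrete value inside it.
func hasActive(r router) bool {
	if r == nil {
		return false // truly nil interface
	}
	v := reflect.ValueOf(r)
	if v.Kind() == reflect.Ptr && v.IsNil() {
		return false // non-nil interface wrapping a nil pointer
	}
	return true
}

func main() {
	var d *dht        // nil pointer
	var r router = d  // typed nil: the interface itself is NOT nil

	fmt.Println("r == nil:", r == nil)
	fmt.Println("hasActive(r):", hasActive(r))
	fmt.Println("hasActive(&dht{}):", hasActive(&dht{name: "wan"}))
}
```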
Fixes: https://github.com/ipfs/kubo/pull/11046#issuecomment-3525313349
* feat(add): split fast-provide into two flags for async/sync control
Renames --fast-provide to --fast-provide-root and adds --fast-provide-wait
to give users control over synchronous vs asynchronous providing behavior.
Changes:
- --fast-provide-root (default: true): enables immediate root CID providing
- --fast-provide-wait (default: false): controls whether to block until complete
- Default behavior: async provide (fast, non-blocking)
- Opt-in: --fast-provide-wait for guaranteed discoverability (slower, blocking)
- Can disable with --fast-provide-root=false to rely on background reproviding
Implementation:
- Async mode: launches goroutine with detached context for fire-and-forget
- Added 10 second timeout to prevent hanging on network issues
- Timeout aligns with other kubo operations (ping, DNS resolve, p2p)
- Sufficient for DHT with sweep provider or accelerated client
- Sync mode: blocks on provideCIDSync until completion (uses req.Context)
- Improved structured logging with "fast-provide-root:" prefix
- Removed redundant "root CID" from messages (already in prefix)
- Clear async/sync distinction in log messages
- Added FAST PROVIDE OPTIMIZATION section to ipfs add --help explaining:
- The problem: background queue takes time, content not immediately discoverable
- The solution: extra immediate announcement of just the root CID
- The benefit: peers can find content right away while queue handles rest
- Usage: async by default, --fast-provide-wait for guaranteed completion
Changelog:
- Added highlight section for fast root CID providing feature
- Updated TOC and overview
- Included usage examples with clear comments explaining each mode
- Emphasized this is extra announcement independent of background queue
The feature works best with sweep provider and accelerated DHT client
where provide operations are significantly faster.
* Update core/commands/add.go
---------
Co-authored-by: gammazero <11790789+gammazero@users.noreply.github.com>
Co-authored-by: Marcin Rataj <lidel@lidel.org>
* telemetry: collect provideDHTSweepEnabled
Fixes #11055.
* telemetry: track custom Provide.DHT.Interval and MaxWorkers
collects whether users customize Interval and MaxWorkers from defaults
to help identify if defaults need adjustment
* docs: improve telemetry documentation structure and clarity
restructure docs/telemetry.md into meaningful sections (routing & discovery,
content providing, network configuration), add exact config field paths for all
tracked settings, and establish code as source of truth by linking from LogEvent
struct while removing redundant field comments
---------
Co-authored-by: Marcin Rataj <lidel@lidel.org>
adds Gateway.MaxRangeRequestFileSize configuration to protect against CDN bugs
where range requests over certain sizes return entire files instead of requested
byte ranges, causing unexpected bandwidth costs.
- default: 0 (no limit)
- returns 501 Not Implemented for oversized range requests
- protects against CDNs like Cloudflare that ignore range requests over 5GiB
also introduces OptionalBytes type to reduce code duplication when handling
byte-size configuration values, replacing manual string parsing with humanize.ParseBytes.
migrates existing byte-size configs to use this new type.
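a rough sketch of the decision this option implies, under stated assumptions: checkRange is a hypothetical helper, the limit is treated as inclusive, and the real gateway logic lives elsewhere and may differ. a limit of 0 means no limit, matching the default.

```go
package main

import (
	"fmt"
	"net/http"
)

// checkRange decides whether a range request may proceed.
// rangeHeader is the raw Range header ("" when absent), fileSize is the
// size of the requested file, and maxSize is the configured limit
// (0 = no limit). When ok is false, status is the code to return.
func checkRange(rangeHeader string, fileSize, maxSize int64) (status int, ok bool) {
	if rangeHeader == "" || maxSize == 0 || fileSize <= maxSize {
		return 0, true
	}
	return http.StatusNotImplemented, false
}

func main() {
	const gib = int64(1) << 30

	status, ok := checkRange("bytes=0-1023", 6*gib, 5*gib)
	fmt.Println(status, ok)

	status, ok = checkRange("bytes=0-1023", 6*gib, 0)
	fmt.Println(status, ok)
}
```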
Fixes: https://github.com/ipfs/boxo/issues/856
add "Understanding the Metrics" section explaining three types:
- per-worker rates (multiply by active workers for total throughput)
- per-region averages (do NOT multiply by worker count)
- system totals (cumulative across all workers)
enhance metric descriptions with:
- explicit calculation examples showing which worker counts to use
- warnings about when NOT to multiply by worker count
- cross-references to relevant sections
add "Capacity Planning" section with:
- step-by-step throughput capacity calculations
- diagnostic guidance for common scenarios
- worked examples for estimating required vs actual capacity
addresses confusion from PR #11034 comments about when to multiply
metrics by worker count and how to interpret per-worker rates
* provider: protect libp2p connections
Use latest kad-dht version, introducing connection protection and
retention of addresses in peerstore during provide operations.
* depend on kad-dht master
addresses stream frame memory pooling issue where StreamFrame objects
weren't properly returned to sync.Pool during stream cancellation
see quic-go/quic-go#5327
* chore(deps): update go-libp2p to v0.44.0
- includes self-healing UPnP port mappings after router restarts
- update go-netroute to v0.3.0
- update quic-go to v0.55.0
- add changelog entry for UPnP fix
* docs: improve provide and UPnP clarity in changelog and docs
- add alert polling rationale to changelog
- add UPnP config note with default clarification
- clarify sweep timing and prefix length explanations
- add concrete examples for time offset and record holders
- improve workers stats formatting
- add See Also section to provide-stats.md
* docs: add RISC-V prebuilt binaries to changelog and README
- highlight linux-riscv64 availability with open hardware context
- update README with arm64 builds, remove 32-bit examples
* feat: provide stats
* added N/A
* format
* workers stats alignment
* ipfs provide stat --all --compact
* consolidating compact stat
* update column alignment
* flags combinations errors
* command description
* change schedule AvgPrefixLen to float
* changelog
* alignments
* provide stat description draft
* rephrased provide-stats.md
* linking provide-stats.md from command description
* documentation test
* fix: refactor provide stat command type handling
- add extractSweepingProvider() helper to reduce nested type switching
- extract lowWorkerThreshold constant for worker availability check
- fix --lan error handling to work with buffered providers
* docs: add clarifying comments
* fix(commands): improve provide stat compact mode
- prevent panic when both columns are empty
- fix column alignment with UTF-8 characters
- only track col0MaxWidth for first column (as intended)
* test: add tests for ipfs provide stat command
- test basic functionality, flags, JSON output
- test legacy provider behavior
- test integration with content scheduling
- test disabled provider configurations
- add parseSweepStats helper with t.Helper()
* docs: improve provide command help text
- update tagline to "Control and monitor content providing"
- simplify help descriptions
- make error messages more consistent
- update tests to match new error messages
* metrics rename
```
Next reprovide at:
Next prefix:
```
updated to:
```
Next region prefix:
Next region reprovide:
```
* docs: improve Provide system documentation clarity
Enhance documentation for the Provide system to better explain how provider
records work and the differences between sweep and legacy modes.
Changes to docs/config.md:
- Provide section: add clear explanation of provider records and their role
- Provide.DHT: add provider record lifecycle and two provider systems overview
- Provide.DHT.Interval: explain relationship to expiration, contrast sweep vs legacy behavior
- Provide.DHT.SweepEnabled: rewrite to explain legacy problem, sweep solution, and efficiency gains
- Monitoring section: prioritize command-line tools (ipfs provide stat) before Prometheus
Changes to core/commands/provide.go:
- ipfs provide stat help: add explanation of provider records, TTL expiration, and how sweep batching works
Changes to docs/changelogs/v0.39.md:
- Add context about why stats matter for monitoring provider health
- Emphasize real-time monitoring workflow with watch command
- Explain what users can observe (rates, queues, worker availability)
* depend on latest kad-dht master
* docs: nits
---------
Co-authored-by: Marcin Rataj <lidel@lidel.org>
Increase default Provide.DHT.MaxProvideConnsPerWorker value to match the
DHT replication factor (16 -> 20). A similar value is used in legacy
systems (with and without accelerated DHT client).
- clarify staging environment step for FINAL releases
- mark infrastructure updates (collab cluster, bootstrappers) as FINAL only
- improve ipfs-desktop release step wording
- update discourse topic examples to v0.38.0
- reference v0.38.0 release issue in metadata comment
* test: add migration tests for Windows and macOS
- add dedicated CI workflow for migration tests on Windows/macOS
- workflow triggers on migration-related file changes only
* build: remove redundant go version checks
- remove GO_MIN_VERSION and check_go_version scripts
- go.mod already enforces minimum version (go 1.25)
- fixes make build on Windows
* fix: windows migration panic by reading config into memory
fixes migration panic on Windows when upgrading from v0.37 to v0.38
by reading the entire config file into memory before performing atomic
operations. this avoids file locking issues on Windows where open files
cannot be renamed.
also fixes:
- TestRepoDir to set USERPROFILE on Windows (not just HOME)
- CLI migration tests to sanitize directory names (remove colons)
minimal fix that solves the "panic: error can't be dealt with
transactionally: Access is denied" error without adding unnecessary
platform-specific complexity.
* fix: set PATH for CLI migration tests in CI
the CLI tests need the built ipfs binary to be in PATH
* fix: use ipfs shutdown for graceful daemon termination in tests
replaces platform-specific signal handling with ipfs shutdown command
which works consistently across all platforms including Windows
* fix: isolate PATH modifications in parallel migration tests
tests running in parallel with t.Parallel() were interfering with each
other through global PATH modifications via os.Setenv(). this caused
tests to download real migration binaries instead of using mocks,
leading to Windows failures due to path separator issues in external tools.
now each test builds its own custom PATH and passes it explicitly to
commands, preventing interference between parallel tests.
* chore: improve error messages in WithBackup
* fix: Windows CI migration test failures
- add .exe extension to mock migration binaries on Windows
- handle repo lock file properly in mock migration binary
- ensure lock is created and removed to prevent conflicts
* refactor: align atomicfile error handling with fs-repo-migrations
- check close error in Abort() before attempting removal
- leave temp file on rename failure for debugging (like fs-repo-15-to-16)
- improves consistency with external migration implementations
* fix: use req.Context in repo migrate to avoid double-lock
The repo migrate command was calling cctx.Context() which has a hidden
side effect: it lazily constructs the IPFS node by calling GetNode(),
which opens the repository and acquires repo.lock. When migrations then
tried to acquire the same lock, it failed with "lock is already held by us"
because go4.org/lock tracks locks per-process in a global map.
The fix uses req.Context instead, which is a plain context.Context with
no side effects. This provides what migrations need (cancellation handling)
without triggering node construction or repo opening.
Context types explained:
- req.Context: Standard Go context for request lifetime, cancellation,
and timeouts. No side effects.
- cctx.Context(): Kubo-specific method that lazily constructs the full
IPFS node (opens repo, acquires lock, initializes subsystems). Returns
the node's internal context.
Why req.Context is correct here:
- Migrations work on raw filesystem (only need ConfigRoot path)
- Command has SetDoesNotUseRepo(true) - doesn't need running node
- Migrations handle their own locking via lockfile.Lock()
- Need cancellation support but not node lifecycle
The bug only appeared with embedded migrations (v16+) because they run
in-process. External migrations (pre-v16) were separate processes, so
each had isolated state. Sequential migrations (forward then backward)
in the same process exposed this latent double-lock issue.
Also adds repo.lock acquisition to RunEmbeddedMigrations to prevent
concurrent migration access, and removes the now-unnecessary daemon
lock check from the migrate command handler.
* fix: use req.Context for migrations and autoconf in daemon startup
daemon.go was incorrectly using cctx.Context() in two critical places:
1. Line 337: migrations call - cctx.Context() triggers GetNode() which
opens the repo and acquires repo.lock BEFORE migrations run, causing
"lock is already held by us" errors when migrations try to lock
2. Line 390: autoconf client.Start() - uses context for HTTP timeouts
and background updater lifecycle, doesn't need node construction
Both now use req.Context (plain Go context) which provides:
- request lifetime and cancellation
- no side effects (doesn't construct node or open repo)
- correct lifecycle for HTTP requests and background goroutines
(cherry picked from commit f4834e797d)