addresses stream frame memory pooling issue where StreamFrame objects
weren't properly returned to sync.Pool during stream cancellation
see quic-go/quic-go#5327
* test: add migration tests for Windows and macOS
- add dedicated CI workflow for migration tests on Windows/macOS
- workflow triggers on migration-related file changes only
* build: remove redundant go version checks
- remove GO_MIN_VERSION and check_go_version scripts
- go.mod already enforces minimum version (go 1.25)
- fixes make build on Windows
* fix: windows migration panic by reading config into memory
fixes migration panic on Windows when upgrading from v0.37 to v0.38
by reading the entire config file into memory before performing atomic
operations. this avoids file locking issues on Windows where open files
cannot be renamed.
also fixes:
- TestRepoDir to set USERPROFILE on Windows (not just HOME)
- CLI migration tests to sanitize directory names (remove colons)
minimal fix that solves the "panic: error can't be dealt with
transactionally: Access is denied" error without adding unnecessary
platform-specific complexity.
* fix: set PATH for CLI migration tests in CI
the CLI tests need the built ipfs binary to be in PATH
* fix: use ipfs shutdown for graceful daemon termination in tests
replaces platform-specific signal handling with ipfs shutdown command
which works consistently across all platforms including Windows
* fix: isolate PATH modifications in parallel migration tests
tests running in parallel with t.Parallel() were interfering with each
other through global PATH modifications via os.Setenv(). this caused
tests to download real migration binaries instead of using mocks,
leading to Windows failures due to path separator issues in external tools.
now each test builds its own custom PATH and passes it explicitly to
commands, preventing interference between parallel tests.
* chore: improve error messages in WithBackup
* fix: Windows CI migration test failures
- add .exe extension to mock migration binaries on Windows
- handle repo lock file properly in mock migration binary
- ensure lock is created and removed to prevent conflicts
* refactor: align atomicfile error handling with fs-repo-migrations
- check close error in Abort() before attempting removal
- leave temp file on rename failure for debugging (like fs-repo-15-to-16)
- improves consistency with external migration implementations
* fix: use req.Context in repo migrate to avoid double-lock
The repo migrate command was calling cctx.Context() which has a hidden
side effect: it lazily constructs the IPFS node by calling GetNode(),
which opens the repository and acquires repo.lock. When migrations then
tried to acquire the same lock, it failed with "lock is already held by us"
because go4.org/lock tracks locks per-process in a global map.
The fix uses req.Context instead, which is a plain context.Context with
no side effects. This provides what migrations need (cancellation handling)
without triggering node construction or repo opening.
Context types explained:
- req.Context: Standard Go context for request lifetime, cancellation,
and timeouts. No side effects.
- cctx.Context(): Kubo-specific method that lazily constructs the full
IPFS node (opens repo, acquires lock, initializes subsystems). Returns
the node's internal context.
Why req.Context is correct here:
- Migrations work on raw filesystem (only need ConfigRoot path)
- Command has SetDoesNotUseRepo(true) - doesn't need running node
- Migrations handle their own locking via lockfile.Lock()
- Need cancellation support but not node lifecycle
The bug only appeared with embedded migrations (v16+) because they run
in-process. External migrations (pre-v16) were separate processes, so
each had isolated state. Sequential migrations (forward then backward)
in the same process exposed this latent double-lock issue.
Also adds repo.lock acquisition to RunEmbeddedMigrations to prevent
concurrent migration access, and removes the now-unnecessary daemon
lock check from the migrate command handler.
* fix: use req.Context for migrations and autoconf in daemon startup
daemon.go was incorrectly using cctx.Context() in two critical places:
1. Line 337: migrations call - cctx.Context() triggers GetNode() which
opens the repo and acquires repo.lock BEFORE migrations run, causing
"lock is already held by us" errors when migrations try to lock
2. Line 390: autoconf client.Start() - uses context for HTTP timeouts
and background updater lifecycle, doesn't need node construction
Both now use req.Context (plain Go context) which provides:
- request lifetime and cancellation
- no side effects (doesn't construct node or open repo)
- correct lifecycle for HTTP requests and background goroutines
(cherry picked from commit f4834e797d)
* fix: add MFS operation limit for --flush=false
adds a global counter that tracks consecutive MFS operations performed
with --flush=false and fails with clear error after limit is reached.
this prevents unbounded memory growth while avoiding the data corruption
risks of auto-flushing.
- adds Internal.MFSNoFlushLimit config
- operations fail with actionable error at limit
- counter resets on successful flush or any --flush=true operation
- operations with --flush=true reset and don't count
this commit removes automatic flush from https://github.com/ipfs/kubo/pull/10971
and instead errors to encourage users of --flush=false to develop a habit
of calling 'ipfs files flush' periodically.
boxo will no longer auto-flush (https://github.com/ipfs/boxo/pull/1041) to
avoid corruption issues, and kubo applies the limit to 'ipfs files' commands
instead.
closes#10842
* test: add tests for MFSNoFlushLimit
tests verify the new Internal.MFSNoFlushLimit config option:
- default limit of 256 operations
- custom limit configuration
- counter reset on flush=true
- counter reset on explicit flush command
- limit=0 disables the feature
- multiple MFS command types count towards limit
* docs: explain why MFS operations fail instead of auto-flushing
addresses feedback from https://github.com/ipfs/kubo/pull/10985#pullrequestreview-3256250970
- clarify that automatic flushing at limit was considered but rejected
- explain the data corruption risks of auto-flushing
- guide users who want auto-flush to use --flush=true (default)
- document benefits of explicit failure for batch operations
(cherry picked from commit a688b7eeac)
* Filestore: provide Filestore nodes
When strategy is set to "all" (the blockstore does all the providing when a
block is written), no providing was happening to Filestore blocks that were
not written to the underlying blockstore (so, the DAG leaves, as they live in
the filesystem directly). This fixes that.
* docs: clarify filestore and urlstore fix in changelog
both filestore (local file references) and urlstore (HTTP/HTTPS URL
references) blocks are now properly provided shortly after initial add
(cherry picked from commit f63887ae96)
* fix: SweepingProvider shouldn't error when missing DHT
* fix: prevent panic when SweepingProvider has no DHT
when SweepingProvider is enabled but no DHT is available (e.g., Routing.Type=none),
the daemon would panic with a nil pointer dereference in ResettableKeystore.ResetCids.
this fix:
- returns NoopProvider when no DHT implementation is available
- skips keystore initialization for NoopProvider to avoid unnecessary operations
- allows nodes to run without DHT when using HTTP-only routing or offline mode
the panic occurred because initKeyStore tried to access a nil keystore when
SweepingProvider returned nil for the keystore parameter. by checking if the
provider is NoopProvider and skipping keystore operations, we avoid the panic
while maintaining correct behavior for all other provider types.
cc #10974#10975
---------
Co-authored-by: Marcin Rataj <lidel@lidel.org>
* feat: allow custom http provide when offline
* refactor: improve offline HTTP provider handling and tests
- fixed comment/function name mismatch
- added mock server test for HTTP provide success
- clarified test names for offline scenarios
* test: simplify single-node provider tests
use h.NewNode().Init() instead of NewNodes(1) for cleaner test setup
* fix: allow SweepingProvider to work with HTTP-only routing
when no DHT is available but HTTP routers are configured for providing,
return NoopProvider instead of failing. this allows the daemon to start
and HTTP-based providing to work through the routing system.
moved HTTP provider detection to config package as HasHTTPProviderConfigured()
for better code organization and reusability.
this fix is important as SweepingProvider will become the new default in the future.
---------
Co-authored-by: Marcin Rataj <lidel@lidel.org>
* docs: improve slow reprovide warning messages
simplify warning text and provide actionable solutions in order of preference
* feat(config): add validation for Provide.DHT settings
- validate interval doesn't exceed DHT record validity (48h)
- validate worker counts and other parameters are within valid ranges
- improve slow reprovide warning messages to reference config parameter
- add tests for all validation cases
* docs: add reprovide cycle visualization
shows traffic patterns of legacy vs sweep vs accelerated DHT
* ci: optimize build workflows
- use go version from go.mod instead of hardcoding
- group platforms by OS for parallel builds
- remove legacy try-build targets
* fix: checkout before setup-go in all workflows
setup-go needs go.mod to be present, so checkout must happen first
* chore: remove deprecated // +build syntax
go 1.17+ uses //go:build, the old syntax is no longer needed
* simplify: remove nofuse tag from CI workflows
- workflows now rely on platform build constraints
- keep make nofuse target for manual builds
- remove unused appveyor.yml
* ci: remove legacy travis variable and fix gateway-conformance
- remove TRAVIS env variable from 4 workflows
- fix gateway-conformance checkout path to match working-directory
- replace deprecated cache-go-action with built-in setup-go caching
* fix(webui): show helpful errors for incompatible configurations
- show error when Gateway.NoFetch=true and WebUI is not available locally
- show error when Gateway.DeserializedResponses=false (incompatible)
- add tests for both error scenarios
* chore(webui): update to v4.9.0
https://github.com/ipfs/ipfs-webui/releases/tag/v4.9.0
* docs: add WebUI v4.9.0 update to v0.38 changelog
- highlight new diagnostics screen for troubleshooting
- include screenshots of key features in table format
- add local access URL for WebUI
- update TOC with new sections
* fix: prevent --flush=false in 'ipfs files rm' command
the 'ipfs files rm' command always flushes for safety to ensure
data integrity. this change adds an explicit error when users
try to pass --flush=false, improving ux and preventing confusion.
related to #10842
* fix: add MFS cache size limit to prevent unbounded growth
- add Internal.MFSAutoflushThreshold config (experimental)
- directories auto-flush when cache exceeds threshold with --flush=false
- prevents high memory usage issue from #10842
- default: 256 entries per directory (matching HAMT shard size)
- set to 0 to restore old behavior (risky, may cause errors)
Closes#10842
* fix: use CheckIfPinnedWithType for pin ls with names
updates to use CheckIfPinnedWithType method from https://github.com/ipfs/boxo/pull/1035,
enabling efficient pin name retrieval for 'ipfs pin ls <cid> --names'
- uses new CheckIfPinnedWithType from boxo for type-specific pin checks
- pin names are now returned when listing specific CIDs with --names flag
* test: add CLI tests for pin ls with names
tests cover:
- pin ls with specific CIDs returning names
- pin ls without CID listing all pins with names
- pin ls with --type and --names combinations
- JSON output with and without names
- pin update preserving names
- error cases (invalid CID, unpinned CID)
* docs: add pin name improvements to v0.38 changelog
covers fix for ipfs pin ls --names with specific CIDs
and RPC pin name leak fix
* fix(rpc): support pin names in Add()
passes the Name field from PinAddSettings to the API request
adds test to verify pin names work via RPC
* test: add coverage for pin names functionality
- test special characters, unicode, long names
- test concurrent operations
- test persistence across daemon restarts
- test garbage collection preservation
- fix indirect pin test logic
* chore: boxo@main with boxo#1039
* fix(pin): improve pin ls robustness and validation
- add nil check for n.Pinning with early fail-fast validation
- use pin.StringToMode() for consistent type validation
- add edge case tests for invalid types and unpinned CIDs
* refactor: consolidate Provider/Reprovider into unified Provide config
- merge Provider and Reprovider configs into single Provide section
- add fs-repo-17-to-18 migration for config consolidation
- improve migration ergonomics with common package utilities
- convert deprecated "flat" strategy to "all" during migration
- improve Provide docs
* docs: add total_provide_count metric guidance
- document how to monitor provide success rates via prometheus metrics
- add performance comparison section to changelog
- explain how to evaluate sweep vs legacy provider effectiveness
* fix: add OpenTelemetry meter provider for metrics
- set up meter provider with Prometheus exporter in daemon
- enables metrics from external libs like go-libp2p-kad-dht
- fixes missing total_provide_count_total when SweepEnabled=true
- update docs to reflect actual metric names
---------
Co-authored-by: gammazero <11790789+gammazero@users.noreply.github.com>
Co-authored-by: guillaumemichel <guillaume@michel.id>
Co-authored-by: Daniel Norman <1992255+2color@users.noreply.github.com>
Co-authored-by: Hector Sanjuan <code@hector.link>
* reprovide sweep draft
* update reprovider dep
* go mod tidy
* fix provider type
* change router type
* dual reprovider
* revert to provider.System
* back to start
* SweepingReprovider test
* fix nil pointer deref
* noop provider for nil dht
* disabled initial network estimation
* another iteration
* suppress missing self addrs err
* silence empty rt err on lan dht
* comments
* new attempt at integrating
* reverting changes in core/node/libp2p/routing.go
* removing SweepingProvider
* make reprovider optional
* add noop reprovider
* update KeyChanFunc type alias
* restore boxo KeyChanFunc
* fix missing KeyChanFunc
* test(sharness): PARALLEL=1 and timeout 30m
running sequentially to see where timeout occurs
* initialize MHStore
* revert workflow debug
* config
* config docs
* merged IpfsNode provider and reprovider
* move Provider interface to from kad-dht to node
* moved Provider interface from kad-dht to kubo/core/node
* mod_tidy
* Add Clear to Provider interface
* use latest kad-dht commit
* make linter happy
* updated boxo provide interface
* boxo PR fix
* using latest kad-dht commit
* use latest boxo release
* fix fx
* fx cyclic deps
* fix merge issues
* extended tests
* don't provide LAN DHT
* docs
* restore dual dht provider
* don't start provider before it is online
* address linter
* dual/provider fix
* add delay in provider tests for dht bootstrap
* add OfflineDelay parameter to config
* remove increase number of workers in test
* improved keystore gc process
* fix: replace incorrect logger import in coreapi
replaced github.com/labstack/gommon/log with the standard
github.com/ipfs/go-log/v2 logger used throughout kubo.
removed unused labstack dependency from go.mod files.
* fix: remove duplicate WithDefault call in provider config
* fix: use correct option method for burst workers
* fix: improve error messages for experimental sweeping provider
updated error messages to clearly indicate when commands are unavailable
due to experimental sweeping provider being enabled via Reprovider.Sweep.Enabled=true
* docs: remove obsolete KeyStoreGCInterval config
removed from config.md as option no longer exists (removed in b540fba1a)
updated keystore description to reflect gc happens at reprovide interval
* docs: add TODO placeholder changelog for experimental sweeping DHT provider
using v0.38-TODO.md name to avoid merge conflicts with master branch
and allow CI tests to run. will be renamed to v0.38.md once config
migration is added to the PR
* fix: provideKeysRec go routine
* clear keystore on close
* fix: datastore prefix
* fix: improve error handling in provideKeysRec
- close errCh channel to distinguish between nil and pending errors
- check for pending errors when provided.New closes
- handle context cancellation during error send
- prevent race condition where errors could be silently lost
this ensures DAG walk errors are always propagated correctly
* address gammazero's review
* rename BurstProvider to LegacyProvider
* use latest provider/keystore
* boxo: make mfs StartProviding async
* bump boxo
* chore: update boxo to f2b4e12fb9a8ac138ccb82aae3b51ec51d9f631c
- updated boxo dependency to specified commit
- updated go.mod and go.sum files across all modules
* use latest kad-dht/boxo
* Buffered SweepingProvider wrapper
* use latest kad-dht commit
* allow no DHT router
* use latest kad-dht & boxo
---------
Co-authored-by: Marcin Rataj <lidel@lidel.org>
Co-authored-by: gammazero <11790789+gammazero@users.noreply.github.com>
* rpc: don't reuse object during decode
Retaining the object during the loop will make fields such as `Name`
stick between iterations.
This patch decodes into a new struct each iteration, assuring we don't
retain values from other pins.
* rpc: use the Detailed option during request
* fix: enforce identity CID size limits
- validate --inline-limit against verifcid.MaxDigestSize
- add error when --hash=identity exceeds size limit
- add tests for identity CID overflow scenarios
- update help text to show maximum inline limit
This prevents creation of unbounded identity CIDs by enforcing
the 128-byte limit defined in https://github.com/ipfs/boxo/pull/1018Fixes#6011
IPIP: https://github.com/ipfs/specs/pull/512
validates Import configuration fields to prevent invalid values:
- CidVersion: must be 0 or 1
- UnixFSFileMaxLinks: must be positive
- UnixFSDirectoryMaxLinks: must be non-negative
- UnixFSHAMTDirectoryMaxFanout: power of 2, multiple of 8, ≤ 1024
- BatchMaxNodes/BatchMaxSize: must be positive
- UnixFSChunker: validates format patterns
- HashFunction: must be allowed by verifcid
* telemetry: use systemd-detect-virt for container/vm detection
Current VM detection is not very accurate and systemd-detect-virt does exactly
what's needed under a miriad of virtualization platforms.
The downside is that we are running a system command which is uglier and might
perhaps flip anti-viruses or something.
* telemetry: improve vm/container detection with pure go
replace systemd-detect-virt with file-based detection to avoid:
- security risks from executing external binaries
- unnecessary repeated detection (now cached with sync.Once)
- missing detection on non-systemd systems
removes false positives:
- cpu hypervisor flag (indicates capability, not guest status)
- generic dmi strings that match physical hardware
- overlay filesystem check (used by immutable distros)
Co-authored-by: Andrew Gillis <11790789+gammazero@users.noreply.github.com>
Co-authored-by: Marcin Rataj <lidel@lidel.org>
* fix ctrl-c prompt during run migrations prompt
At "Run migrations" prompt, hitting ctrl-c does not show the "(Hit ctrl-c again to force-shutdown the daemon.)" prompt, even though a 2nd ctrl-c does cause the daemon to exit.
The problem is that the goroutine that displays the prompt only run after the prompt to "Run migrations", so hitting ctrl-c during the "Run migrations" prompt does not show the "(Hit ctrl-c again to force-shutdown the daemon.)" prompt.
This PR fixes the problem by starting the goroutine to display the "(Hit ctrl-c again to force-shutdown the daemon.)" prompt sooner, before the "Run mirgarions" prompt. Additionally, this PR also disables showing the ctrl-c prompt if the daemon function has already exited, in which case only the exit message should be shown.
Closes#3157
* exit immediately if ctrl-c during migration prompt