kubo/config/import.go
Marcin Rataj cec7432043
feat: fast provide support in dag import (#11058)
* fix(add): respect Provide config in fast-provide-root

fast-provide-root should honor the same config settings as the regular
provide system:
- skip when Provide.Enabled is false
- skip when Provide.DHT.Interval is 0
- respect Provide.Strategy (all/pinned/roots/mfs/combinations)

This ensures fast-provide only runs when appropriate based on user
configuration and the nature of the content being added (pinned vs
unpinned, added to MFS or not).

* feat(config): options to adjust global defaults

Add Import.FastProvideRoot and Import.FastProvideWait configuration options
to control default behavior of fast-provide-root and fast-provide-wait flags
in ipfs add command. Users can now set global defaults in config while
maintaining per-command flag overrides.

- Add Import.FastProvideRoot (default: true)
- Add Import.FastProvideWait (default: false)
- Add ResolveBoolFromConfig helper for config resolution
- Update docs with configuration details
- Add log-based tests verifying actual behavior

* refactor: extract fast-provide logic into reusable functions

Extract fast-provide logic from add command into reusable components:
- Add config.ShouldProvideForStrategy helper for strategy matching
- Add ExecuteFastProvide function reusable across add and dag import commands
- Move DefaultFastProvideTimeout constant to config/provide.go
- Simplify add.go from 72 lines to 6 lines for fast-provide
- Move fast-provide tests to dedicated TestAddFastProvide function

Benefits:
- cleaner API: callers only pass content characteristics
- all strategy logic centralized in one place
- better separation of concerns
- easier to add fast-provide to other commands in future

* feat(dag): add fast-provide support for dag import

Adds --fast-provide-root and --fast-provide-wait flags to `ipfs dag import`,
mirroring the fast-provide functionality available in `ipfs add`.

Changes:
- Add --fast-provide-root and --fast-provide-wait flags to dag import command
- Implement fast-provide logic for all root CIDs in imported CAR files
- Works even when --pin-roots=false (strategy checked internally)
- Share ExecuteFastProvide implementation between add and dag import
- Move ExecuteFastProvide to cmdenv package to avoid import cycles
- Add logging when fast-provide is disabled
- Conditional error handling: return error when wait=true, warn when wait=false
- Update config docs to mention both ipfs add and ipfs dag import
- Update changelog to use "provide" terminology and include dag import examples
- Add comprehensive test coverage (TestDagImportFastProvide with 6 test cases)

The fast-provide feature allows immediate DHT announcement of root CIDs
for faster content discovery, bypassing the regular background queue.

* docs: improve fast-provide documentation

Refine documentation to better explain fast-provide and sweep provider working
together, and highlight the performance improvement.

Changelog:
- add fast-provide to sweep provider features list
- explain performance improvement: root CIDs discoverable in <1s vs 30+ seconds
- note this uses optimistic DHT operations (faster with sweep provider)
- simplify examples, point to --help for details

Config docs:
- fix: --fast-provide-roots should be --fast-provide-root (singular)
- clarify Import.FastProvideRoot focuses on root CIDs while sweep handles all blocks
- simplify Import.FastProvideWait description

Command help:
- ipfs add: explain sweep provider context upfront
- ipfs dag import: add fast-provide explanation section
- both explain the split: fast-provide for roots, sweep for all blocks

* test: add tests for ShouldProvideForStrategy

add tests covering all provide strategy combinations with focus on
bitflag OR logic (the else-if bug fix). organized by behavior:
- all strategy always provides
- single strategies match only their flag
- combined strategies use OR logic
- zero strategy never provides
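
The OR logic these tests exercise can be sketched as below. The `strategy`
type, its constants, and `shouldProvide` are hypothetical stand-ins rather
than the actual config.ShouldProvideForStrategy signature. The point is that
each flag is checked independently, so a combined strategy matches if any one
of its components matches; an else-if chain would presumably stop at the first
flag that is set:

```go
package main

import "fmt"

// strategy is a hypothetical bitflag type for provide strategies.
type strategy uint8

const (
	strategyAll strategy = 1 << iota
	strategyPinned
	strategyRoots
	strategyMFS
)

// shouldProvide reports whether content with the given characteristics
// matches the strategy. Checks are independent ifs, not an else-if chain.
func shouldProvide(s strategy, isPinned, isPinnedRoot, isMFS bool) bool {
	if s&strategyAll != 0 {
		return true
	}
	if s&strategyPinned != 0 && isPinned {
		return true
	}
	if s&strategyRoots != 0 && isPinnedRoot {
		return true
	}
	if s&strategyMFS != 0 && isMFS {
		return true
	}
	return false // includes the zero strategy
}

func main() {
	// Combined strategy: MFS content matches even though it is not pinned.
	fmt.Println(shouldProvide(strategyPinned|strategyMFS, false, false, true))
	// Single strategy: MFS content does not match a pinned-only strategy.
	fmt.Println(shouldProvide(strategyPinned, false, false, true))
}
```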

* refactor: error cmd on error and wait=true

change ExecuteFastProvide() to return error, enabling proper error
propagation when --fast-provide-wait=true. in sync mode, provide
failures now error the command as expected. in async mode (default),
always returns nil with errors logged in background goroutine.

also remove duplicate ExecuteFastProvide() from provide.go (75 lines),
keeping single implementation in cmdenv/env.go for reuse across add
and dag import commands.

call sites simplified:
- add.go: check and propagate error from ExecuteFastProvide
- dag/import.go: return error from ForEach callback, remove confusing
  conditional error handling

semantics:
- precondition skips (DHT unavailable, etc): return nil (not failure)
- async mode (wait=false): return nil, log errors in goroutine
- sync mode (wait=true): return wrapped error on provide failure
2025-11-14 21:06:25 -08:00

185 lines
5.9 KiB
Go

package config

import (
	"fmt"
	"strconv"
	"strings"

	"github.com/ipfs/boxo/ipld/unixfs/importer/helpers"
	"github.com/ipfs/boxo/ipld/unixfs/io"
	"github.com/ipfs/boxo/verifcid"
	mh "github.com/multiformats/go-multihash"
)

const (
	DefaultCidVersion      = 0
	DefaultUnixFSRawLeaves = false
	DefaultUnixFSChunker   = "size-262144"
	DefaultHashFunction    = "sha2-256"
	DefaultFastProvideRoot = true
	DefaultFastProvideWait = false

	DefaultUnixFSHAMTDirectorySizeThreshold = 262144 // 256KiB - https://github.com/ipfs/boxo/blob/6c5a07602aed248acc86598f30ab61923a54a83e/ipld/unixfs/io/directory.go#L26

	// DefaultBatchMaxNodes controls the maximum number of nodes in a
	// write-batch. The total size of the batch is limited by
	// BatchMaxNodes and BatchMaxSize.
	DefaultBatchMaxNodes = 128
	// DefaultBatchMaxSize controls the maximum size of a single
	// write-batch. The total size of the batch is limited by
	// BatchMaxNodes and BatchMaxSize.
	DefaultBatchMaxSize = 100 << 20 // 100MiB
)

var (
	DefaultUnixFSFileMaxLinks           = int64(helpers.DefaultLinksPerBlock)
	DefaultUnixFSDirectoryMaxLinks      = int64(0)
	DefaultUnixFSHAMTDirectoryMaxFanout = int64(io.DefaultShardWidth)
)

// Import configures the default options for ingesting data. This affects commands
// that ingest data, such as 'ipfs add', 'ipfs dag put', 'ipfs block put', 'ipfs files write'.
type Import struct {
	CidVersion                       OptionalInteger
	UnixFSRawLeaves                  Flag
	UnixFSChunker                    OptionalString
	HashFunction                     OptionalString
	UnixFSFileMaxLinks               OptionalInteger
	UnixFSDirectoryMaxLinks          OptionalInteger
	UnixFSHAMTDirectoryMaxFanout     OptionalInteger
	UnixFSHAMTDirectorySizeThreshold OptionalBytes
	BatchMaxNodes                    OptionalInteger
	BatchMaxSize                     OptionalInteger
	FastProvideRoot                  Flag
	FastProvideWait                  Flag
}

// ValidateImportConfig validates the Import configuration according to UnixFS spec requirements.
// See: https://specs.ipfs.tech/unixfs/#hamt-structure-and-parameters
func ValidateImportConfig(cfg *Import) error {
	// Validate CidVersion
	if !cfg.CidVersion.IsDefault() {
		cidVer := cfg.CidVersion.WithDefault(DefaultCidVersion)
		if cidVer != 0 && cidVer != 1 {
			return fmt.Errorf("Import.CidVersion must be 0 or 1, got %d", cidVer)
		}
	}

	// Validate UnixFSFileMaxLinks
	if !cfg.UnixFSFileMaxLinks.IsDefault() {
		maxLinks := cfg.UnixFSFileMaxLinks.WithDefault(DefaultUnixFSFileMaxLinks)
		if maxLinks <= 0 {
			return fmt.Errorf("Import.UnixFSFileMaxLinks must be positive, got %d", maxLinks)
		}
	}

	// Validate UnixFSDirectoryMaxLinks
	if !cfg.UnixFSDirectoryMaxLinks.IsDefault() {
		maxLinks := cfg.UnixFSDirectoryMaxLinks.WithDefault(DefaultUnixFSDirectoryMaxLinks)
		if maxLinks < 0 {
			return fmt.Errorf("Import.UnixFSDirectoryMaxLinks must be non-negative, got %d", maxLinks)
		}
	}

	// Validate UnixFSHAMTDirectoryMaxFanout if set
	if !cfg.UnixFSHAMTDirectoryMaxFanout.IsDefault() {
		fanout := cfg.UnixFSHAMTDirectoryMaxFanout.WithDefault(DefaultUnixFSHAMTDirectoryMaxFanout)

		// Check all requirements: fanout < 8 covers both non-positive and non-multiple of 8
		// Combined with power of 2 check and max limit, this ensures valid values: 8, 16, 32, 64, 128, 256, 512, 1024
		if fanout < 8 || !isPowerOfTwo(fanout) || fanout > 1024 {
			return fmt.Errorf("Import.UnixFSHAMTDirectoryMaxFanout must be a positive power of 2, multiple of 8, and not exceed 1024 (got %d)", fanout)
		}
	}

	// Validate BatchMaxNodes
	if !cfg.BatchMaxNodes.IsDefault() {
		maxNodes := cfg.BatchMaxNodes.WithDefault(DefaultBatchMaxNodes)
		if maxNodes <= 0 {
			return fmt.Errorf("Import.BatchMaxNodes must be positive, got %d", maxNodes)
		}
	}

	// Validate BatchMaxSize
	if !cfg.BatchMaxSize.IsDefault() {
		maxSize := cfg.BatchMaxSize.WithDefault(DefaultBatchMaxSize)
		if maxSize <= 0 {
			return fmt.Errorf("Import.BatchMaxSize must be positive, got %d", maxSize)
		}
	}

	// Validate UnixFSChunker format
	if !cfg.UnixFSChunker.IsDefault() {
		chunker := cfg.UnixFSChunker.WithDefault(DefaultUnixFSChunker)
		if !isValidChunker(chunker) {
			return fmt.Errorf("Import.UnixFSChunker invalid format: %q (expected \"size-<bytes>\", \"rabin-<min>-<avg>-<max>\", or \"buzhash\")", chunker)
		}
	}

	// Validate HashFunction
	if !cfg.HashFunction.IsDefault() {
		hashFunc := cfg.HashFunction.WithDefault(DefaultHashFunction)
		hashCode, ok := mh.Names[strings.ToLower(hashFunc)]
		if !ok {
			return fmt.Errorf("Import.HashFunction unrecognized: %q", hashFunc)
		}
		// Check if the hash is allowed by verifcid
		if !verifcid.DefaultAllowlist.IsAllowed(hashCode) {
			return fmt.Errorf("Import.HashFunction %q is not allowed for use in IPFS", hashFunc)
		}
	}

	return nil
}

// isPowerOfTwo checks if a number is a power of 2
func isPowerOfTwo(n int64) bool {
	return n > 0 && (n&(n-1)) == 0
}

// isValidChunker validates chunker format
func isValidChunker(chunker string) bool {
	if chunker == "buzhash" {
		return true
	}

	// Check for size-<bytes> format
	if strings.HasPrefix(chunker, "size-") {
		sizeStr := strings.TrimPrefix(chunker, "size-")
		if sizeStr == "" {
			return false
		}
		// Check if it's a valid positive integer (no negative sign allowed)
		if sizeStr[0] == '-' {
			return false
		}
		size, err := strconv.Atoi(sizeStr)
		// Size must be positive (not zero)
		return err == nil && size > 0
	}

	// Check for rabin-<min>-<avg>-<max> format
	if strings.HasPrefix(chunker, "rabin-") {
		parts := strings.Split(chunker, "-")
		if len(parts) != 4 {
			return false
		}

		// Parse and validate min, avg, max values
		values := make([]int, 3)
		for i := 0; i < 3; i++ {
			val, err := strconv.Atoi(parts[i+1])
			if err != nil {
				return false
			}
			values[i] = val
		}

		// Validate ordering: min <= avg <= max
		min, avg, max := values[0], values[1], values[2]
		return min <= avg && avg <= max
	}

	return false
}