feat(config): optional Gateway.MaxRangeRequestFileSize (#10997)
Some checks are pending
CodeQL / codeql (push) Waiting to run
Docker Check / lint (push) Waiting to run
Docker Check / build (push) Waiting to run
Gateway Conformance / gateway-conformance (push) Waiting to run
Gateway Conformance / gateway-conformance-libp2p-experiment (push) Waiting to run
Go Build / go-build (push) Waiting to run
Go Check / go-check (push) Waiting to run
Go Lint / go-lint (push) Waiting to run
Go Test / go-test (push) Waiting to run
Interop / interop-prep (push) Waiting to run
Interop / helia-interop (push) Blocked by required conditions
Interop / ipfs-webui (push) Blocked by required conditions
Sharness / sharness-test (push) Waiting to run
Spell Check / spellcheck (push) Waiting to run

adds Gateway.MaxRangeRequestFileSize configuration to protect against CDN bugs
where range requests over certain sizes return entire files instead of requested
byte ranges, causing unexpected bandwidth costs.

- default: 0 (no limit)
- returns 501 Not Implemented for oversized range requests
- protects against CDNs like Cloudflare that ignore range requests over 5GiB

also introduces OptionalBytes type to reduce code duplication when handling
byte-size configuration values, replacing manual string parsing with humanize.ParseBytes.
migrates existing byte-size configs to use this new type.

Fixes: https://github.com/ipfs/boxo/issues/856
This commit is contained in:
Marcin Rataj 2025-11-12 03:54:43 +01:00 committed by GitHub
parent 0954d249c2
commit 93f8897d7c
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
11 changed files with 256 additions and 36 deletions

View File

@ -12,8 +12,9 @@ const (
DefaultDiagnosticServiceURL = "https://check.ipfs.network"
// Gateway limit defaults from boxo
DefaultRetrievalTimeout = gateway.DefaultRetrievalTimeout
DefaultMaxConcurrentRequests = gateway.DefaultMaxConcurrentRequests
DefaultRetrievalTimeout = gateway.DefaultRetrievalTimeout
DefaultMaxConcurrentRequests = gateway.DefaultMaxConcurrentRequests
DefaultMaxRangeRequestFileSize = 0 // 0 means no limit
)
type GatewaySpec struct {
@ -100,6 +101,12 @@ type Gateway struct {
// A value of 0 disables the limit.
MaxConcurrentRequests *OptionalInteger `json:",omitempty"`
// MaxRangeRequestFileSize limits the maximum file size for HTTP range requests.
// Range requests for files larger than this limit return 501 Not Implemented.
// This protects against CDN issues with large file range requests and prevents
// excessive bandwidth consumption. A value of 0 disables the limit.
MaxRangeRequestFileSize *OptionalBytes `json:",omitempty"`
// DiagnosticServiceURL is the URL for a service to diagnose CID retrievability issues.
// When the gateway returns a 504 Gateway Timeout error, an "Inspect retrievability of CID"
// button will be shown that links to this service with the CID appended as ?cid=<CID-to-diagnose>.

View File

@ -17,7 +17,7 @@ const (
DefaultUnixFSChunker = "size-262144"
DefaultHashFunction = "sha2-256"
DefaultUnixFSHAMTDirectorySizeThreshold = "256KiB" // https://github.com/ipfs/boxo/blob/6c5a07602aed248acc86598f30ab61923a54a83e/ipld/unixfs/io/directory.go#L26
DefaultUnixFSHAMTDirectorySizeThreshold = 262144 // 256KiB - https://github.com/ipfs/boxo/blob/6c5a07602aed248acc86598f30ab61923a54a83e/ipld/unixfs/io/directory.go#L26
// DefaultBatchMaxNodes controls the maximum number of nodes in a
// write-batch. The total size of the batch is limited by
@ -45,7 +45,7 @@ type Import struct {
UnixFSFileMaxLinks OptionalInteger
UnixFSDirectoryMaxLinks OptionalInteger
UnixFSHAMTDirectoryMaxFanout OptionalInteger
UnixFSHAMTDirectorySizeThreshold OptionalString
UnixFSHAMTDirectorySizeThreshold OptionalBytes
BatchMaxNodes OptionalInteger
BatchMaxSize OptionalInteger
}

View File

@ -322,7 +322,7 @@ fetching may be degraded.
c.Import.UnixFSFileMaxLinks = *NewOptionalInteger(174)
c.Import.UnixFSDirectoryMaxLinks = *NewOptionalInteger(0)
c.Import.UnixFSHAMTDirectoryMaxFanout = *NewOptionalInteger(256)
c.Import.UnixFSHAMTDirectorySizeThreshold = *NewOptionalString("256KiB")
c.Import.UnixFSHAMTDirectorySizeThreshold = *NewOptionalBytes("256KiB")
return nil
},
},
@ -336,7 +336,7 @@ fetching may be degraded.
c.Import.UnixFSFileMaxLinks = *NewOptionalInteger(174)
c.Import.UnixFSDirectoryMaxLinks = *NewOptionalInteger(0)
c.Import.UnixFSHAMTDirectoryMaxFanout = *NewOptionalInteger(256)
c.Import.UnixFSHAMTDirectorySizeThreshold = *NewOptionalString("256KiB")
c.Import.UnixFSHAMTDirectorySizeThreshold = *NewOptionalBytes("256KiB")
return nil
},
},
@ -350,7 +350,7 @@ fetching may be degraded.
c.Import.UnixFSFileMaxLinks = *NewOptionalInteger(1024)
c.Import.UnixFSDirectoryMaxLinks = *NewOptionalInteger(0) // no limit here, use size-based Import.UnixFSHAMTDirectorySizeThreshold instead
c.Import.UnixFSHAMTDirectoryMaxFanout = *NewOptionalInteger(1024)
c.Import.UnixFSHAMTDirectorySizeThreshold = *NewOptionalString("1MiB") // 1MiB
c.Import.UnixFSHAMTDirectorySizeThreshold = *NewOptionalBytes("1MiB") // 1MiB
return nil
},
},

View File

@ -118,7 +118,7 @@ type ResourceMgr struct {
Enabled Flag `json:",omitempty"`
Limits swarmLimits `json:",omitempty"`
MaxMemory *OptionalString `json:",omitempty"`
MaxMemory *OptionalBytes `json:",omitempty"`
MaxFileDescriptors *OptionalInteger `json:",omitempty"`
// A list of multiaddrs that can bypass normal system limits (but are still

View File

@ -7,6 +7,8 @@ import (
"io"
"strings"
"time"
humanize "github.com/dustin/go-humanize"
)
// Strings is a helper type that (un)marshals a single string to/from a single
@ -425,8 +427,79 @@ func (p OptionalString) String() string {
}
var (
_ json.Unmarshaler = (*OptionalInteger)(nil)
_ json.Marshaler = (*OptionalInteger)(nil)
_ json.Unmarshaler = (*OptionalString)(nil)
_ json.Marshaler = (*OptionalString)(nil)
)
// OptionalBytes represents a byte size that has a default value
//
// When encoded in json, Default is encoded as "null".
// Stores the original string representation and parses on access.
// Embeds OptionalString to share common functionality.
type OptionalBytes struct {
OptionalString
}
// NewOptionalBytes returns an OptionalBytes from a string.
func NewOptionalBytes(s string) *OptionalBytes {
return &OptionalBytes{OptionalString{value: &s}}
}
// IsDefault returns if this is a default optional byte value.
func (p *OptionalBytes) IsDefault() bool {
if p == nil {
return true
}
return p.OptionalString.IsDefault()
}
// WithDefault resolves the byte size with the given default.
// Parses the stored string value using humanize.ParseBytes.
func (p *OptionalBytes) WithDefault(defaultValue uint64) (value uint64) {
if p.IsDefault() {
return defaultValue
}
strValue := p.OptionalString.WithDefault("")
bytes, err := humanize.ParseBytes(strValue)
if err != nil {
// This should never happen as values are validated during UnmarshalJSON.
// If it does, it indicates either config corruption or a programming error.
panic(fmt.Sprintf("invalid byte size in OptionalBytes: %q - %v", strValue, err))
}
return bytes
}
// UnmarshalJSON validates the input is a parseable byte size.
func (p *OptionalBytes) UnmarshalJSON(input []byte) error {
switch string(input) {
case "null", "undefined":
*p = OptionalBytes{}
default:
var value interface{}
err := json.Unmarshal(input, &value)
if err != nil {
return err
}
switch v := value.(type) {
case float64:
str := fmt.Sprintf("%.0f", v)
p.value = &str
case string:
_, err := humanize.ParseBytes(v)
if err != nil {
return err
}
p.value = &v
default:
return fmt.Errorf("unable to parse byte size, expected a size string (e.g., \"5GiB\") or a number, but got %T", v)
}
}
return nil
}
var (
_ json.Unmarshaler = (*OptionalBytes)(nil)
_ json.Marshaler = (*OptionalBytes)(nil)
)
type swarmLimits doNotUse

View File

@ -5,6 +5,9 @@ import (
"encoding/json"
"testing"
"time"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestOptionalDuration(t *testing.T) {
@ -509,3 +512,125 @@ func TestOptionalString(t *testing.T) {
}
}
}
func TestOptionalBytes(t *testing.T) {
makeStringPointer := func(v string) *string { return &v }
t.Run("default value", func(t *testing.T) {
var b OptionalBytes
assert.True(t, b.IsDefault())
assert.Equal(t, uint64(0), b.WithDefault(0))
assert.Equal(t, uint64(1024), b.WithDefault(1024))
assert.Equal(t, "default", b.String())
})
t.Run("non-default value", func(t *testing.T) {
b := OptionalBytes{OptionalString{value: makeStringPointer("1MiB")}}
assert.False(t, b.IsDefault())
assert.Equal(t, uint64(1048576), b.WithDefault(512))
assert.Equal(t, "1MiB", b.String())
})
t.Run("JSON roundtrip", func(t *testing.T) {
testCases := []struct {
jsonInput string
jsonOutput string
expectedValue string
}{
{"null", "null", ""},
{"\"256KiB\"", "\"256KiB\"", "256KiB"},
{"\"1MiB\"", "\"1MiB\"", "1MiB"},
{"\"5GiB\"", "\"5GiB\"", "5GiB"},
{"\"256KB\"", "\"256KB\"", "256KB"},
{"1048576", "\"1048576\"", "1048576"},
}
for _, tc := range testCases {
t.Run(tc.jsonInput, func(t *testing.T) {
var b OptionalBytes
err := json.Unmarshal([]byte(tc.jsonInput), &b)
require.NoError(t, err)
if tc.expectedValue == "" {
assert.Nil(t, b.value)
} else {
require.NotNil(t, b.value)
assert.Equal(t, tc.expectedValue, *b.value)
}
out, err := json.Marshal(b)
require.NoError(t, err)
assert.Equal(t, tc.jsonOutput, string(out))
})
}
})
t.Run("parsing byte sizes", func(t *testing.T) {
testCases := []struct {
input string
expected uint64
}{
{"256KiB", 262144},
{"1MiB", 1048576},
{"5GiB", 5368709120},
{"256KB", 256000},
{"1048576", 1048576},
}
for _, tc := range testCases {
t.Run(tc.input, func(t *testing.T) {
var b OptionalBytes
err := json.Unmarshal([]byte("\""+tc.input+"\""), &b)
require.NoError(t, err)
assert.Equal(t, tc.expected, b.WithDefault(0))
})
}
})
t.Run("omitempty", func(t *testing.T) {
type Foo struct {
B *OptionalBytes `json:",omitempty"`
}
out, err := json.Marshal(new(Foo))
require.NoError(t, err)
assert.Equal(t, "{}", string(out))
var foo2 Foo
err = json.Unmarshal(out, &foo2)
require.NoError(t, err)
if foo2.B != nil {
assert.Equal(t, uint64(1024), foo2.B.WithDefault(1024))
assert.True(t, foo2.B.IsDefault())
} else {
// When field is omitted, pointer is nil which is also considered default
t.Log("B is nil, which is acceptable for omitempty")
}
})
t.Run("invalid values", func(t *testing.T) {
invalidInputs := []string{
"\"5XiB\"", "\"invalid\"", "\"\"", "[]", "{}",
}
for _, invalid := range invalidInputs {
t.Run(invalid, func(t *testing.T) {
var b OptionalBytes
err := json.Unmarshal([]byte(invalid), &b)
assert.Error(t, err)
})
}
})
t.Run("panic on invalid stored value", func(t *testing.T) {
// This tests that if somehow an invalid value gets stored
// (bypassing UnmarshalJSON validation), WithDefault will panic
invalidValue := "invalid-size"
b := OptionalBytes{OptionalString{value: &invalidValue}}
assert.Panics(t, func() {
b.WithDefault(1024)
}, "should panic on invalid stored value")
})
}

View File

@ -111,9 +111,10 @@ func Libp2pGatewayOption() ServeOption {
PublicGateways: nil,
Menu: nil,
// Apply timeout and concurrency limits from user config
RetrievalTimeout: cfg.Gateway.RetrievalTimeout.WithDefault(config.DefaultRetrievalTimeout),
MaxConcurrentRequests: int(cfg.Gateway.MaxConcurrentRequests.WithDefault(int64(config.DefaultMaxConcurrentRequests))),
DiagnosticServiceURL: "", // Not used since DisableHTMLErrors=true
RetrievalTimeout: cfg.Gateway.RetrievalTimeout.WithDefault(config.DefaultRetrievalTimeout),
MaxConcurrentRequests: int(cfg.Gateway.MaxConcurrentRequests.WithDefault(int64(config.DefaultMaxConcurrentRequests))),
MaxRangeRequestFileSize: int64(cfg.Gateway.MaxRangeRequestFileSize.WithDefault(uint64(config.DefaultMaxRangeRequestFileSize))),
DiagnosticServiceURL: "", // Not used since DisableHTMLErrors=true
}
handler := gateway.NewHandler(gwConfig, &offlineGatewayErrWrapper{gwimpl: backend})
@ -266,13 +267,14 @@ func getGatewayConfig(n *core.IpfsNode) (gateway.Config, map[string][]string, er
// Initialize gateway configuration, with empty PublicGateways, handled after.
gwCfg := gateway.Config{
DeserializedResponses: cfg.Gateway.DeserializedResponses.WithDefault(config.DefaultDeserializedResponses),
DisableHTMLErrors: cfg.Gateway.DisableHTMLErrors.WithDefault(config.DefaultDisableHTMLErrors),
NoDNSLink: cfg.Gateway.NoDNSLink,
PublicGateways: map[string]*gateway.PublicGateway{},
RetrievalTimeout: cfg.Gateway.RetrievalTimeout.WithDefault(config.DefaultRetrievalTimeout),
MaxConcurrentRequests: int(cfg.Gateway.MaxConcurrentRequests.WithDefault(int64(config.DefaultMaxConcurrentRequests))),
DiagnosticServiceURL: cfg.Gateway.DiagnosticServiceURL.WithDefault(config.DefaultDiagnosticServiceURL),
DeserializedResponses: cfg.Gateway.DeserializedResponses.WithDefault(config.DefaultDeserializedResponses),
DisableHTMLErrors: cfg.Gateway.DisableHTMLErrors.WithDefault(config.DefaultDisableHTMLErrors),
NoDNSLink: cfg.Gateway.NoDNSLink,
PublicGateways: map[string]*gateway.PublicGateway{},
RetrievalTimeout: cfg.Gateway.RetrievalTimeout.WithDefault(config.DefaultRetrievalTimeout),
MaxConcurrentRequests: int(cfg.Gateway.MaxConcurrentRequests.WithDefault(int64(config.DefaultMaxConcurrentRequests))),
MaxRangeRequestFileSize: int64(cfg.Gateway.MaxRangeRequestFileSize.WithDefault(uint64(config.DefaultMaxRangeRequestFileSize))),
DiagnosticServiceURL: cfg.Gateway.DiagnosticServiceURL.WithDefault(config.DefaultDiagnosticServiceURL),
}
// Add default implicit known gateways, such as subdomain gateway on localhost.

View File

@ -8,7 +8,6 @@ import (
"strings"
"time"
"github.com/dustin/go-humanize"
blockstore "github.com/ipfs/boxo/blockstore"
offline "github.com/ipfs/boxo/exchange/offline"
uio "github.com/ipfs/boxo/ipld/unixfs/io"
@ -423,7 +422,10 @@ func IPFS(ctx context.Context, bcfg *BuildCfg) fx.Option {
logger.Fatal(msg) // conflicting values, hard fail
}
logger.Error(msg)
cfg.Import.UnixFSHAMTDirectorySizeThreshold = *cfg.Internal.UnixFSShardingSizeThreshold
// Migrate the old OptionalString value to the new OptionalBytes field.
// Since OptionalBytes embeds OptionalString, we can construct it directly
// with the old value, preserving the user's original string (e.g., "256KiB").
cfg.Import.UnixFSHAMTDirectorySizeThreshold = config.OptionalBytes{OptionalString: *cfg.Internal.UnixFSShardingSizeThreshold}
}
// Validate Import configuration
@ -437,11 +439,7 @@ func IPFS(ctx context.Context, bcfg *BuildCfg) fx.Option {
}
// Auto-sharding settings
shardingThresholdString := cfg.Import.UnixFSHAMTDirectorySizeThreshold.WithDefault(config.DefaultUnixFSHAMTDirectorySizeThreshold)
shardSingThresholdInt, err := humanize.ParseBytes(shardingThresholdString)
if err != nil {
return fx.Error(err)
}
shardSingThresholdInt := cfg.Import.UnixFSHAMTDirectorySizeThreshold.WithDefault(config.DefaultUnixFSHAMTDirectorySizeThreshold)
shardMaxFanout := cfg.Import.UnixFSHAMTDirectoryMaxFanout.WithDefault(config.DefaultUnixFSHAMTDirectoryMaxFanout)
// TODO: avoid overriding this globally, see if we can extend Directory interface like Get/SetMaxLinks from https://github.com/ipfs/boxo/pull/906
uio.HAMTShardingSize = int(shardSingThresholdInt)

View File

@ -19,12 +19,8 @@ var infiniteResourceLimits = rcmgr.InfiniteLimits.ToPartialLimitConfig().System
// The defaults follow the documentation in docs/libp2p-resource-management.md.
// Any changes in the logic here should be reflected there.
func createDefaultLimitConfig(cfg config.SwarmConfig) (limitConfig rcmgr.ConcreteLimitConfig, logMessageForStartup string, err error) {
maxMemoryDefaultString := humanize.Bytes(uint64(memory.TotalMemory()) / 2)
maxMemoryString := cfg.ResourceMgr.MaxMemory.WithDefault(maxMemoryDefaultString)
maxMemory, err := humanize.ParseBytes(maxMemoryString)
if err != nil {
return rcmgr.ConcreteLimitConfig{}, "", err
}
maxMemoryDefault := uint64(memory.TotalMemory()) / 2
maxMemory := cfg.ResourceMgr.MaxMemory.WithDefault(maxMemoryDefault)
maxMemoryMB := maxMemory / (1024 * 1024)
maxFD := int(cfg.ResourceMgr.MaxFileDescriptors.WithDefault(int64(fd.GetNumFDs()) / 2))
@ -142,7 +138,7 @@ Computed default go-libp2p Resource Manager limits based on:
These can be inspected with 'ipfs swarm resources'.
`, maxMemoryString, maxFD)
`, humanize.Bytes(maxMemory), maxFD)
// We already have a complete value thus pass in an empty ConcreteLimitConfig.
return partialLimits.Build(rcmgr.ConcreteLimitConfig{}), msg, nil

View File

@ -24,6 +24,11 @@ This release was brought to you by the [Shipyard](https://ipshipyard.com/) team.
### 🔦 Highlights
#### 🚦 Gateway range request limits for CDN compatibility
The new [`Gateway.MaxRangeRequestFileSize`](https://github.com/ipfs/kubo/blob/master/docs/config.md#gatewaymaxrangerequestfilesize) configuration protects against CDN bugs where range requests over a certain size are silently ignored and the entire file is returned instead ([boxo#856](https://github.com/ipfs/boxo/issues/856#issuecomment-2786431369)). This causes unexpected bandwidth costs for both gateway operators and clients who only wanted a small byte range.
Set this to your CDN's range request limit (e.g., `"5GiB"` for Cloudflare's default plan) to return 501 Not Implemented for oversized range requests, with an error message suggesting verifiable block requests as an alternative.
#### 📊 Detailed statistics for Sweep provider with `ipfs provide stat`
The experimental Sweep provider system ([introduced in

View File

@ -66,6 +66,7 @@ config file at runtime.
- [`Gateway.DisableHTMLErrors`](#gatewaydisablehtmlerrors)
- [`Gateway.ExposeRoutingAPI`](#gatewayexposeroutingapi)
- [`Gateway.RetrievalTimeout`](#gatewayretrievaltimeout)
- [`Gateway.MaxRangeRequestFileSize`](#gatewaymaxrangerequestfilesize)
- [`Gateway.MaxConcurrentRequests`](#gatewaymaxconcurrentrequests)
- [`Gateway.HTTPHeaders`](#gatewayhttpheaders)
- [`Gateway.RootRedirect`](#gatewayrootredirect)
@ -1159,6 +1160,18 @@ Default: `30s`
Type: `optionalDuration`
### `Gateway.MaxRangeRequestFileSize`
Maximum file size for HTTP range requests. Range requests for files larger than this limit return 501 Not Implemented.
Protects against CDN bugs where range requests are silently ignored and the entire file is returned instead. For example, Cloudflare's default plan returns the full file for range requests over 5GiB, causing unexpected bandwidth costs for both gateway operators and clients who only wanted a small byte range.
Set this to your CDN's range request limit (e.g., `"5GiB"` for Cloudflare's default plan). The error response suggests using verifiable block requests (application/vnd.ipld.raw) as an alternative.
Default: `0` (no limit)
Type: [`optionalBytes`](#optionalbytes)
### `Gateway.MaxConcurrentRequests`
Limits concurrent HTTP requests. Requests beyond limit receive 429 Too Many Requests.
@ -3145,7 +3158,7 @@ It is possible to inspect the runtime limits via `ipfs swarm resources --help`.
> To set memory limit for the entire Kubo process, use [`GOMEMLIMIT` environment variable](http://web.archive.org/web/20240222201412/https://kupczynski.info/posts/go-container-aware/) which all Go programs recognize, and then set `Swarm.ResourceMgr.MaxMemory` to less than your custom `GOMEMLIMIT`.
Default: `[TOTAL_SYSTEM_MEMORY]/2`
Type: `optionalBytes`
Type: [`optionalBytes`](#optionalbytes)
#### `Swarm.ResourceMgr.MaxFileDescriptors`
@ -3698,7 +3711,7 @@ Commands affected: `ipfs add`, `ipfs daemon` (globally overrides [`boxo/ipld/uni
Default: `256KiB` (may change, inspect `DefaultUnixFSHAMTDirectorySizeThreshold` to confirm)
Type: `optionalBytes`
Type: [`optionalBytes`](#optionalbytes)
## `Version`
@ -4015,6 +4028,7 @@ an implicit default when missing from the config file:
- a string value indicating the number of bytes, including human readable representations:
- [SI sizes](https://en.wikipedia.org/wiki/Metric_prefix#List_of_SI_prefixes) (metric units, powers of 1000), e.g. `1B`, `2kB`, `3MB`, `4GB`, `5TB`, …)
- [IEC sizes](https://en.wikipedia.org/wiki/Binary_prefix#IEC_prefixes) (binary units, powers of 1024), e.g. `1B`, `2KiB`, `3MiB`, `4GiB`, `5TiB`, …)
- a raw number (will be interpreted as bytes, e.g. `1048576` for 1MiB)
### `optionalString`