Mplex does not implement backpressure; instead of risking deadlocks, our implementation randomly resets streams when buffers overflow.
In the past we had a bug where kubo nodes would prefer mplex over yamux. Turning off mplex makes our connections to those nodes negotiate yamux.
Closes #9958
* fix: mark ipns pubsub router DoNotWaitForSearchValue
That means if the DHT has finished searching and no one responded over pubsub *yet*, we will not spend 1 minute searching for no reason.
This also includes other error handling bug fixes inside `go-libp2p-routing-helpers`.
Fixes: #9927
* routing: bring back the old IPNS behaviour
Stop making this configurable; let everything race like it used to.
This adds the ability to enable "optimistic provide" to the default
DHT client, which enables faster provides and reprovides.
For more information about optimistic provide, see:
https://protocollabs.notion.site/Optimistic-Provide-2c79745820fa45649d48de038516b814
Note that this feature only works when using non-custom router
types. This does not include the ability to enable optimistic provide
on custom routers for now, to minimize the footprint of this
experimental feature. We intend to continue testing this and improving
the UX, which may or may not involve adding configuration for it to
custom routers. We also plan on refactoring/redesigning custom routers
more broadly, so I don't want this to add more effort for maintainers
and confusion for users.
In order to make it easy to overwrite the path Resolvers (e.g. via
plugins), this creates resolvers as part of the Node rather than creating them
ad-hoc.
* fix: remove timeout on default DHT operations
This removes the timeout by default for DHT operations. In particular
this causes issues with ProvideMany requests which can take an
indeterminate amount of time, but really these should just respect
context timeouts by default. Users can still specify timeouts here if
they want, but by default they will be set to "0" which means "no
timeout".
This is unlikely to break existing users of custom routing: previously
there was no point in configuring a router with timeout=0, because that
would cause the router to fail immediately, so it is unlikely (and would
have been incorrect) for anybody to be using timeout=0.
* fix: remove 5m timeout on ProvideManyRouter
For context see
5fda291b66
---------
Co-authored-by: Marcin Rataj <lidel@lidel.org>
Added a comment next to the value so that people can grep the code
and find where that value is set.
Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>
This PR adds several new features to make ResourceManager easier to use:
- Resource manager now logs exceeded resources at ERROR level instead of warning level.
- The "resources exceeded" error now shows which kind of limit was reached and its scope.
- When limits are no longer being exceeded, we print a message telling the user so.
- Added `swarm limit all` command to show all set limits in the same format as `swarm stats all`
- Added `min-used-limit-perc` option to `swarm stats all` to only show stats that are above a specific percentage
- Greatly simplified default values.
- **Enable ResourceManager by default.**
Output example:
```
2022-11-09T10:51:40.565+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:59 Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:51:50.565+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:55 Resource limits were exceeded 483095 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:51:50.565+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:59 Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:52:00.565+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:55 Resource limits were exceeded 455294 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:00.565+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:59 Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:52:10.565+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:55 Resource limits were exceeded 471384 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:10.565+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:59 Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:52:20.565+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:55 Resource limits were exceeded 8 times with error "peer:12D3KooWKqcaBtcmZKLKCCoDPBuA6AXGJMNrLQUPPMsA5Q6D1eG6: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:20.565+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:55 Resource limits were exceeded 192 times with error "peer:12D3KooWPjetWPGQUih9LZTGHdyAM9fKaXtUxDyBhA93E3JAWCXj: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:20.565+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:55 Resource limits were exceeded 469746 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:20.565+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:59 Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:52:30.565+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:55 Resource limits were exceeded 484137 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:30.565+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:55 Resource limits were exceeded 29 times with error "peer:12D3KooWPjetWPGQUih9LZTGHdyAM9fKaXtUxDyBhA93E3JAWCXj: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:30.565+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:59 Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:52:40.565+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:55 Resource limits were exceeded 468843 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:40.566+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:59 Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:52:50.566+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:55 Resource limits were exceeded 366638 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:50.566+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:59 Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:53:00.566+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:55 Resource limits were exceeded 405526 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:53:00.566+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:55 Resource limits were exceeded 107 times with error "peer:12D3KooWQZQCwevTDGhkE9iGYk5sBzWRDUSX68oyrcfM9tXyrs2Q: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:53:00.566+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:59 Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:53:10.566+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:55 Resource limits were exceeded 336923 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:53:10.566+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:59 Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:53:20.565+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:55 Resource limits were exceeded 71 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:53:20.565+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:59 Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:53:30.565+0100 ERROR resourcemanager libp2p/rcmgr_logging.go:64 Resrouce limits are no longer being exceeded.
```
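The filtering behind `min-used-limit-perc` might look roughly like the following sketch; the `usage` type and `aboveThreshold` are hypothetical, and treating the threshold as inclusive is an assumption.

```go
package main

import "fmt"

// usage pairs a hypothetical scope/resource name with its current use and limit.
type usage struct {
	Name  string
	Used  int64
	Limit int64
}

// aboveThreshold keeps only entries whose usage is at least perc percent of
// their limit. Entries without a positive limit are skipped to avoid dividing
// by zero; integer math avoids floating-point rounding.
func aboveThreshold(stats []usage, perc int64) []usage {
	var out []usage
	for _, s := range stats {
		if s.Limit <= 0 {
			continue
		}
		if s.Used*100 >= perc*s.Limit {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	stats := []usage{
		{Name: "system", Used: 900, Limit: 1000},
		{Name: "transient", Used: 10, Limit: 100},
	}
	for _, s := range aboveThreshold(stats, 80) {
		fmt.Printf("%s: %d/%d\n", s.Name, s.Used, s.Limit) // system: 900/1000
	}
}
```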
## Validation tests
- Accelerated DHT client runs with no errors when ResourceManager is active. No problems were observed.
- Ran an attack with 200 connections and 1M streams using the yamux protocol. The node remained usable during the attack. With ResourceManager deactivated, the node was killed by the OS because of the amount of memory consumed.
- Actions performed while the attack was active:
  - Add files
  - Force a reprovide
  - Use the gateway to resolve an IPNS address.
It closes #9001
It closes #9351
It closes #9322
* feat: --reset flag on swarm limit command
This flag allows the user to reset limits to their default values.
Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>
* Use adjusted default limits and remove already fixed FIXME
Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>
* Apply suggestions from code review
Co-authored-by: Gus Eggert <gus@gus.dev>
* Return correct defaults
* Remove resetting all values from a map.
Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>
Co-authored-by: Gus Eggert <gus@gus.dev>
`swarm stats all` requires that the ResourceManager instance
implements `rcmgr.ResourceManagerState`, and `loggingResourceManager`
was not implementing it, so the command was failing.
Also added a sharness test to check that the command is executing
correctly, because `jq -e` doesn't return an error if the json is nil.
Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>
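The failure above is a common Go pitfall with wrapper types: a wrapper only satisfies the optional interfaces it declares itself. This hypothetical sketch (simplified stand-in interfaces, not the real `rcmgr` types) forwards the optional method explicitly and adds a compile-time assertion so the build breaks if the wrapper ever stops implementing it.

```go
package main

import "fmt"

// resourceManager and resourceManagerState are hypothetical stand-ins for the
// libp2p interfaces; they are not the real rcmgr types.
type resourceManager interface {
	OpenStream() error
}

type resourceManagerState interface {
	Stat() map[string]int
}

// baseManager implements both interfaces.
type baseManager struct{}

func (baseManager) OpenStream() error { return nil }

func (baseManager) Stat() map[string]int { return map[string]int{"streams": 3} }

// loggingManager wraps another manager. Embedding the interface type only
// promotes OpenStream, so without the explicit Stat method below the wrapper
// would silently stop satisfying resourceManagerState.
type loggingManager struct {
	resourceManager
}

// Stat forwards to the wrapped manager when it supports stats.
func (l loggingManager) Stat() map[string]int {
	if s, ok := l.resourceManager.(resourceManagerState); ok {
		return s.Stat()
	}
	return nil
}

// Compile-time check: the build fails if loggingManager stops implementing it.
var _ resourceManagerState = loggingManager{}

func main() {
	var rm resourceManager = loggingManager{resourceManager: baseManager{}}
	if st, ok := rm.(resourceManagerState); ok {
		fmt.Println(st.Stat()["streams"]) // 3
	}
}
```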
New multi-router configuration system based on https://hackmd.io/G1KRDEX5T3qyfoBMkIrBew#Methods
- Added a new routing type: "custom"
- Added specific struct types for different Routers (instead of `map[string]interface{}`)
- Added `Duration` config type, to make parsing time strings easier
- Added config documentation.
- Use the latest go-delegated-routing library version with GET support.
- Added changelog notes for this feature.
It:
- closes #9157
- closes #9079
- closes #9186
* Delegated Routing.
Implementation of Reframe specs (https://github.com/ipfs/specs/blob/master/REFRAME.md) using go-delegated-routing library.
* Requested changes.
* Init using op string
* Separate possible ContentRouters for TopicDiscovery.
If we don't do this, we have a cyclic dependency when creating TieredRouter.
Now we can first create all possible content routers, and after that,
create Routers.
* Set dht default routing type
* Add tests and remove unneeded code
* Add documentation.
* docs: Routing.Routers
* Requested changes.
Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>
* Add some documentation on new fx functions.
* Add changelog entry and integration tests
* test: sharness for 'dht' in 'routing' commands
Since 'routing' is currently the same as 'dht' (minus query command)
we need to test both, that way we won't have unnoticed divergence
in the default behavior.
* test(sharness): delegated routing via reframe URL
* Add more tests for delegated routing.
* If any put operation fails, the tiered router will fail.
* refactor: Routing.Routers: Parameters.Endpoint
As agreed in https://github.com/ipfs/kubo/pull/8997#issuecomment-1175684716
* Try to improve CHANGELOG entry.
* chore: update reframe spec link
* Update go-delegated-routing dependency
* Fix config error test
* use new changelog format
* Remove port conflict
* go mod tidy
* ProviderManyWrapper to ProviderMany
* Update docs/changelogs/v0.14.md
Co-authored-by: Adin Schmahmann <adin.schmahmann@gmail.com>
Co-authored-by: Marcin Rataj <lidel@lidel.org>
Co-authored-by: Adin Schmahmann <adin.schmahmann@gmail.com>
* fix: remove mdns_legacy
We've been running both implementations for a long, long time.
It is time to remove the legacy version and lower the number of LAN packets
an IPFS node produces.
See https://github.com/ipfs/go-ipfs/pull/9048#discussion_r906814717
for the rationale behind the Interval removal.
* feat: disable resource manager by default
We are disabling this by default for v0.13 as we work to improve the
UX around Resource Manager. It is still usable and can be enabled in
the IPFS config with "ipfs config --bool Swarm.ResourceMgr.Enabled true".
We intend to enable Resource Manager by default in a subsequent
release.
* docs(config): Swarm.ResourceMgr disabled by default
Co-authored-by: Marcin Rataj <lidel@lidel.org>
* fix: adjust rcmgr limits for accelerated DHT client rt refresh
The Accelerated DHT client periodically refreshes its routing table,
including at startup, and if Resource Manager throttling causes the
client's routing table to be incomplete, then content routing may be
degraded or broken for users.
This adjusts the default limits to a level that empirically doesn't
cause Resource Manager throttling during initial DHT client
bootstrapping. Ideally the Accelerated DHT client would handle this
scenario more gracefully, but this works for now to unblock the 0.13
release.
* Set default outbound conns unconditionally
This also sets the default overall conns as a function of the outbound
and inbound conns, since they are adjusted dynamically, and it makes
the intention of the value clear.
* increase min FD limit
This periodically logs how many times Resource Manager limits were
exceeded. If they aren't exceeded, then nothing is logged. The logs are
emitted at ERROR level so that they are shown by default.
The motivation is so that users know when they have exceeded resource
manager limits. To find what is exceeding the limits, they'll need to
turn on debug logging and inspect the errors being logged. This could
collect the specific limits being reached, but that's more complicated
to implement and could result in much longer log messages.
* update go-libp2p to v0.19.0
* chore: go-namesys v0.5.0
* refactor(config): cleanup relay handling
* docs(config): document updated defaults
* fix(tests): panic during sharness
* fix: t0160-resolve.sh
See https://github.com/ipfs/go-namesys/pull/32
* fix: t0182-circuit-relay.sh
* test: transport encryption
Old tests were no longer working because go-libp2p 0.19 removed
the undocumented 'ls' pseudoprotocol.
This replaces these tests with handshake attempts for the tls and noise
variants (the protocol name is echoed back on success, or 'na' is returned
when the protocol is not available), and adds an explicit test that
safeguards us against enabling plaintext by default by mistake.
* fix: ./t0182-circuit-relay.sh
The test is flaky; for now we just restart the testbed when we get a
NO_RESERVATION error.
* refactor: AutoRelayFeeder with exp. backoff
It starts by feeding peers every 15s, then backs off each time
until it runs once an hour.
This should be acceptable until we have a smarter mechanism in go-libp2p 0.20.
* feat(AutoRelay): prioritize Peering.Peers
This ensures we feed trusted Peering.Peers in addition to any peers
discovered over DHT.
* docs(CHANGELOG): document breaking changes
Co-authored-by: Marcin Rataj <lidel@lidel.org>
Co-authored-by: Gus Eggert <gus@gus.dev>
* feat: persist limit changes to config
This changes the "ipfs swarm limit" command so that when limit changes
are applied via the command line, they are persisted to the repo
config, so that they remain in effect when the daemon restarts.
Any existing limit.json can be dropped into the IPFS config easily
using something like:
cat ~/.ipfs/config | jq ".Swarm.ResourceMgr.Limits = $(cat limit.json)" | sponge ~/.ipfs/config
This also upgrades to Resource Manager v0.3.0, which exports the config
schema so that we don't have to maintain our own copy of it.
Co-authored-by: Marcin Rataj <lidel@lidel.org>
This adds a simple check that will scream loud and clear every time
the go-libp2p libraries change any of the implicit defaults
related to go-libp2p-resource-manager.
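Such a check can be sketched as a snapshot comparison. The keys and values in `expectedDefaults` are invented for illustration, and `diffDefaults` is a hypothetical helper; in a real Go test each diff would be reported with `t.Errorf`.

```go
package main

import (
	"fmt"
	"sort"
)

// expectedDefaults is a hypothetical snapshot of implicit limits recorded when
// the dependency was last reviewed; keys and values are illustrative only.
var expectedDefaults = map[string]int{
	"System.Conns":    128,
	"System.Streams":  1024,
	"Transient.Conns": 64,
}

// diffDefaults lists every key that changed, disappeared, or appeared relative
// to the snapshot, sorted for stable output.
func diffDefaults(expected, actual map[string]int) []string {
	var diffs []string
	for k, want := range expected {
		got, ok := actual[k]
		switch {
		case !ok:
			diffs = append(diffs, fmt.Sprintf("%s: removed (was %d)", k, want))
		case got != want:
			diffs = append(diffs, fmt.Sprintf("%s: changed from %d to %d", k, want, got))
		}
	}
	for k, got := range actual {
		if _, ok := expected[k]; !ok {
			diffs = append(diffs, fmt.Sprintf("%s: added (%d)", k, got))
		}
	}
	sort.Strings(diffs)
	return diffs
}

func main() {
	actual := map[string]int{"System.Conns": 256, "System.Streams": 1024, "Transient.Conns": 64}
	for _, d := range diffDefaults(expectedDefaults, actual) {
		fmt.Println("implicit default changed:", d)
	}
}
```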