Commit Graph

31 Commits

Author SHA1 Message Date
Jorropo
af79d841dc fix: future proof with > rcmgr.DefaultLimit for new enum rcmgr values 2023-03-20 10:05:23 +01:00
Jorropo
6ff764f9fb fix: preserve Unlimited StreamsInbound in connmgr reconciliation
Fixes #9695
2023-03-20 10:05:11 +01:00
Jorropo
7986196414 feat: Reduce RM code footprint
Co-Authored-By: Antonio Navarro Perez <antnavper@gmail.com>
2023-03-06 12:46:58 +01:00
Jorropo
0ff406170d fix: update rcmgr for go-libp2p v0.25 2023-02-14 22:19:46 +01:00
Antonio Navarro Perez
633c497f63 Adjust inbound connection limits depending on memory. 2023-01-30 11:01:03 +01:00
Jorropo
8328bab28d fix: ensure connmgr is smaller then autoscalled ressource limits
Fixes #9545
2023-01-20 19:25:38 +01:00
Antonio Navarro Perez
9662c8e3f9 Fix sharness test
Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>
2022-12-07 18:30:21 +01:00
Antonio Navarro Perez
a54cf2a95a Requested changes.
Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>
2022-12-07 18:30:21 +01:00
Antonio Navarro Perez
67886f7bd3 Fix: RM: Improve init RM message and fix final memory value.
Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>
2022-12-07 18:30:21 +01:00
Antonio Navarro Perez
7a8639ee33 Apply suggestions from code review
Co-authored-by: Steve Loeppky <biglep@protocol.ai>
2022-12-07 16:47:37 +01:00
Antonio Navarro Perez
22a03bda6d Increase MaxMemory param to use half of total memory.
Previously it was using 1/8 of the total memory.

Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>
2022-12-07 16:47:37 +01:00
Antonio Navarro Perez
96bbd53033 Fix: RM: Set no-limit value to 1e9 (1000000000).
Added a comment next to the value to make possible to people to grep
over the code and find where that value is set.

Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>
2022-12-07 15:56:41 +01:00
Steve Loeppky
83034d840c Doc improvements and changelog for resource manager (#9413)
Co-authored-by: Antonio Navarro Perez <antnavper@gmail.com>
2022-11-16 13:09:12 +01:00
Antonio Navarro Perez
dfa631e841 Apply go fmt
Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>
2022-11-15 18:43:44 +01:00
Antonio Navarro Perez
4bff042036 Update core/node/libp2p/rcmgr_defaults.go
Co-authored-by: Steve Loeppky <biglep@protocol.ai>
2022-11-15 18:43:44 +01:00
Antonio Navarro Perez
377d4e9938 Remove limitation by HighWater param.
Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>
2022-11-15 18:43:44 +01:00
Antonio Navarro Perez
fb42b53f58 Fix RM errors when acceleratedDHT is active
Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>
2022-11-15 18:43:44 +01:00
Antonio Navarro Perez
254d81a9d5
feat: Improve ResourceManager UX (#9338)
This PR adds several new functionalities to make easier the usage of ResourceManager:

- Now resource manager logs when resources are exceeded are on ERROR instead of warning.
- The resources exceeded error now shows what kind of limit was reached and the scope.
- When there was no limit exceeded, we print a message for the user saying that limits are not exceeded anymore.
- Added `swarm limit all` command to show all set limits with the same format as `swarm stats all`
- Added `min-used-limit-perc` option to `swarm stats all` to only show stats that are above a specific percentage
- Simplify a lot default values.
- **Enable ResourceManager by default.**

Output example:
```
2022-11-09T10:51:40.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:51:50.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 483095 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:51:50.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:52:00.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 455294 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:00.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:52:10.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 471384 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:10.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:52:20.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 8 times with error "peer:12D3KooWKqcaBtcmZKLKCCoDPBuA6AXGJMNrLQUPPMsA5Q6D1eG6: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:20.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 192 times with error "peer:12D3KooWPjetWPGQUih9LZTGHdyAM9fKaXtUxDyBhA93E3JAWCXj: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:20.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 469746 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:20.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:52:30.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 484137 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:30.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 29 times with error "peer:12D3KooWPjetWPGQUih9LZTGHdyAM9fKaXtUxDyBhA93E3JAWCXj: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:30.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:52:40.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 468843 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:40.566+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:52:50.566+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 366638 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:50.566+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:53:00.566+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 405526 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:53:00.566+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 107 times with error "peer:12D3KooWQZQCwevTDGhkE9iGYk5sBzWRDUSX68oyrcfM9tXyrs2Q: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:53:00.566+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:53:10.566+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 336923 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:53:10.566+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:53:20.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 71 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:53:20.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:53:30.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:64      Resrouce limits are no longer being exceeded.

```
## Validation tests

- Accelerated DHT client runs with no errors when ResourceManager is active. No problems were observed.
- Running an attack with 200 connections and 1M streams using yamux protocol. Node was usable during the attack. With ResourceManager deactivated, the node was killed by the OS because of the amount of memory consumed.
	- Actions done when the attack was active:
		- Add files 
		- Force a reprovide
		- Use the gateway to resolve an IPNS address.

It closes #9001 
It closes #9351
It closes #9322
2022-11-10 12:25:57 +01:00
Antonio Navarro Perez
bf8274f6e2
feat: --reset flag on swarm limit command (#9310)
* feat: --reset flag on swarm limit command

This flag allows to the user to reset limits to default values.

Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>

* Use adjusted default limits and remove already fixed FIXME

Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>

* Apply suggestions from code review

Co-authored-by: Gus Eggert <gus@gus.dev>

* Return correct defaults

* Remove resetting all values from a map.

Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>

Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>
Co-authored-by: Gus Eggert <gus@gus.dev>
2022-10-12 11:02:43 -04:00
Jorropo
fb22320c9a fix: error message for DefaultServiceLimits change 2022-09-21 23:16:03 +02:00
Jorropo
74aaf37cec chore: bump go-libp2p v0.23.1
This does not include any WebTransport config code in Kubo, this will be done later in an other PR.
2022-09-21 23:16:03 +02:00
Jorropo
196887cbe5 chore: bump go-libp2p v0.22.0 & go1.18&go1.19
Fixes: #9225
2022-09-09 17:09:38 +02:00
Marco Munizaga
7245e8dd11 Update checkImplicitDefaults 2022-08-15 14:12:21 -07:00
Marco Munizaga
30fb10a635 Update pinned defaults 2022-08-15 14:12:21 -07:00
Marco Munizaga
62a9829caf Reintroduce connmgr hi watermark logic 2022-08-15 14:12:21 -07:00
Marten Seemann
c3589a1728 WIP rcmgr auto limit scaling 2022-08-15 14:12:18 -07:00
Marcin Rataj
82467bc936 refactor: rename to kubo 2022-07-06 18:40:37 +02:00
Gus Eggert
b8617b9966
fix: adjust rcmgr limits for accelerated DHT client rt refresh (#8982)
* fix: adjust rcmgr limits for accelerated DHT client rt refresh

The Accelerated DHT client periodically refreshes its routing table,
including at startup, and if Resource Manager throttling causes the
client's routing table to be incomplete, then content routing may be
degraded or broken for users.

This adjusts the default limits to a level that empirically doesn't
cause Resource Manager throttling during initial DHT client
bootstrapping. Ideally the Accelerated DHT client would handle this
scenario more gracefully, but this works for now to unblock the 0.13
release.

* Set default outbound conns unconditionally

This also sets the default overall conns as a function of the outbound
and inbound conns, since they are adjusted dynamically, and it makes
the intention of the value clear.

* increase min FD limit
2022-06-02 10:23:42 -04:00
Lucas Molas
f156486208 fix(node/libp2p): disable rcmg checkImplicitDefaults 2022-05-12 15:30:31 +02:00
Marcin Rataj
5d712166b2
feat: detect changes in go-libp2p-resource-manager (#8857)
This adds simple check that will scream loud and clear every time
go-libp2p libraries change any of the implicit defaults
related to go-libp2p-resource-manager
2022-04-08 17:43:30 +02:00
Marten Seemann
514411bedb
feat: opt-in Swarm.ResourceMgr (go-libp2p v0.18) (#8680)
* update go-libp2p to v0.18.0

* initialize the resource manager

* add resource manager stats/limit commands

* load limit file when building resource manager

* log absent limit file

* write rcmgr to file when IPFS_DEBUG_RCMGR is set

* fix: mark swarm limit|stats as experimental

* feat(cfg): opt-in Swarm.ResourceMgr

This ensures we can safely test the resource manager without impacting
default behavior.

- Resource manager is disabled by default
    - Default for Swarm.ResourceMgr.Enabled is false for now
- Swarm.ResourceMgr.Limits allows user to tweak limits per specific
  scope in a way that is persisted across restarts
- 'ipfs swarm limit system' outputs human-readable json
- 'ipfs swarm limit system new-limits.json' sets new runtime limits
  (but does not change Swarm.ResourceMgr.Limits in the config)

Conventions to make libp2p devs life easier:
- 'IPFS_RCMGR=1 ipfs daemon' overrides the config and enables resource manager
- 'limit.json' overrides implicit defaults from libp2p (if present)

* docs(config): small tweaks

* fix: skip libp2p.ResourceManager if disabled

This ensures 'ipfs swarm limit|stats' work only when enabled.

* fix: use NullResourceManager when disabled

This reverts commit b19f7c9eca.
after clarification feedback from
https://github.com/ipfs/go-ipfs/pull/8680#discussion_r841680182

* style: rename IPFS_RCMGR to LIBP2P_RCMGR

preexisting libp2p toggles use LIBP2P_ prefix

* test: Swarm.ResourceMgr

* fix: location of opt-in limit.json and rcmgr.json.gz

Places these files inside of IPFS_PATH

* Update docs/config.md

* feat: expose rcmgr metrics when enabled (#8785)

* add metrics for the resource manager
* export protocol and service name in Prometheus metrics
* fix: expose rcmgr metrics only when enabled

Co-authored-by: Marcin Rataj <lidel@lidel.org>

* refactor: rcmgr_metrics.go

* refactor: rcmgr_defaults.go

This file defines implicit limit defaults used when Swarm.ResourceMgr.Enabled

We keep vendored copy to ensure go-ipfs is not impacted when go-libp2p
decides to change defaults in any of the future releases.

* refactor: adjustedDefaultLimits

Cleans up the way we initialize defaults and adds a fix for case
when connection manager runs with high limits.

It also hides `Swarm.ResourceMgr.Limits` until we have a better
understanding what syntax makes sense.

* chore: cleanup after a review

* fix: restore go-ipld-prime v0.14.2

* fix: restore go-ds-flatfs v0.5.1

Co-authored-by: Lucas Molas <schomatis@gmail.com>
Co-authored-by: Marcin Rataj <lidel@lidel.org>
2022-04-07 21:06:35 -04:00