Update tig-benchmarker master and README.

This commit is contained in:
FiveMovesAhead 2025-05-21 10:14:01 +01:00
parent a96d3e9def
commit 18a1f1db92
9 changed files with 173 additions and 89 deletions


@ -2,6 +2,46 @@
Benchmarker for TIG. Expected setup is a single master and multiple slaves on different servers.
## Overview
Benchmarking in TIG works as follows:
1. Master submits a precommit to TIG protocol, with details of what they will benchmark:
* `block_id`: start of the benchmark
* `player_id`: address of the benchmarker
* `challenge_id`: challenge to benchmark
* `algorithm_id`: algorithm to benchmark
* `difficulty`: difficulty of instances to randomly generate
* `num_nonces`: number of instances to benchmark
2. TIG protocol confirms the precommit, and assigns it a random string
3. Master starts benchmarking:
* polls TIG protocol for the confirmed precommit + random string
* creates a benchmark job, splitting it into batches
* slaves poll the master for batches to compute
* slaves do the computation, and send results back to master
4. After benchmarking is finished, master submits benchmark to TIG protocol:
* `solution_nonces`: list of nonces for which a solution was found
* `merkle_root`: Merkle root of the tree constructed using results as leafs
5. TIG protocol confirms the benchmark, and randomly samples nonces requiring proof
6. Master prepares the proof:
* polls TIG protocol for confirmed benchmark + sampled nonces
* creates proof jobs
* slaves poll the master for nonces requiring proof
* slaves send Merkle branches back to master
7. Master submits proof to TIG protocol
8. TIG protocol confirms the proof, calculating the block from which the solutions will become "active" (eligible to earn rewards)
* Verification is performed in parallel
* Solutions become inactive 120 blocks after the benchmark started
* The activation delay is determined by the number of blocks between the start and when the proof was confirmed
* Each block, active solutions which qualify will earn rewards for the Benchmarker
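Steps 4 and 6 revolve around the Merkle tree built over batch results. Below is a minimal sketch of the root/branch/verify round-trip (SHA-256 and the duplicate-last-leaf padding are illustrative assumptions; TIG's actual leaf encoding and hashing may differ):

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Root of a binary Merkle tree; pads to a power of two by repeating the last leaf."""
    level = [_h(l) for l in leaves]
    while len(level) & (len(level) - 1):  # pad until the count is a power of 2
        level.append(level[-1])
    while len(level) > 1:
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_branch(leaves: list[bytes], index: int) -> list[bytes]:
    """Sibling hashes proving leaves[index] is under the root (what slaves return in step 6)."""
    level = [_h(l) for l in leaves]
    while len(level) & (len(level) - 1):
        level.append(level[-1])
    branch = []
    while len(level) > 1:
        branch.append(level[index ^ 1])  # sibling at this level
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return branch

def verify(leaf: bytes, index: int, branch: list[bytes], root: bytes) -> bool:
    """Recompute the path from leaf to root using the branch."""
    node = _h(leaf)
    for sib in branch:
        node = _h(node + sib) if index % 2 == 0 else _h(sib + node)
        index //= 2
    return node == root

results = [f"result-{n}".encode() for n in range(5)]
root = merkle_root(results)
branch = merkle_branch(results, 3)
```

The protocol only samples a few nonces (step 5), so slaves never resend full results: a branch of `log2(num_nonces)` hashes suffices per sampled nonce.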
# Starting Your Master
Simply run:
@ -41,9 +81,53 @@ See last section on how to find your player_id & api_key.
2. Delete the database: `rm -rf db_data`
3. Start your master
## Optimising your Master Config
## Master Config
See [docs.tig.foundation](https://docs.tig.foundation/benchmarking/benchmarker-config)
The master config defines how benchmarking jobs are selected, scheduled, and distributed to slaves. It can be edited via the master UI or via the API (`http://localhost:<MASTER_PORT>/update-config`).
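For instance, a config update can be pushed from a script instead of the UI; a hedged sketch using only the Python stdlib (the port value and the minimal payload are placeholders, not defaults):

```python
import json
import urllib.request

MASTER_PORT = 80  # placeholder: replace with your master's port

# Minimal illustrative payload; a real config would populate algo_selection
config = {
    "player_id": "0x0000000000000000000000000000000000000000",
    "api_key": "00000000000000000000000000000000",
    "api_url": "https://mainnet-api.tig.foundation",
    "time_between_resubmissions": 60000,
    "max_concurrent_benchmarks": 4,
    "algo_selection": [],
    "time_before_batch_retry": 60000,
    "slaves": [
        {"name_regex": ".*", "algorithm_id_regex": ".*", "max_concurrent_batches": 1}
    ],
}

req = urllib.request.Request(
    f"http://localhost:{MASTER_PORT}/update-config",
    data=json.dumps(config).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment to actually send the update
```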
```json
{
"player_id": "0x0000000000000000000000000000000000000000",
"api_key": "00000000000000000000000000000000",
"api_url": "https://mainnet-api.tig.foundation",
"time_between_resubmissions": 60000,
"max_concurrent_benchmarks": 4,
"algo_selection": [
{
"algorithm_id": "c001_a001",
"num_nonces": 40,
"difficulty_range": [0, 0.5],
"selected_difficulties": [],
"weight": 1,
"batch_size": 8,
"base_fee_limit": "10000000000000000"
},
...
],
"time_before_batch_retry": 60000,
"slaves": [
{
"name_regex": ".*",
"algorithm_id_regex": ".*",
"max_concurrent_batches": 1
}
]
}
```
**Explanation:**
* `player_id`: Your wallet address (lowercase)
* `api_key`: See last section on how to obtain your API key
* `api_url`: mainnet (https://mainnet-api.tig.foundation) or testnet (https://testnet-api.tig.foundation)
* `time_between_resubmissions`: Time in milliseconds to wait before resubmitting a benchmark/proof that has not been confirmed into a block
* `max_concurrent_benchmarks`: Maximum number of benchmarks that can be "in flight" at once (i.e., benchmarks where the proof has not been computed yet).
* `algo_selection`: list of algorithms that can be picked for benchmarking. Each entry has:
* `algorithm_id`: ID of the algorithm (e.g., `c001_a001`). Tip: use the [list_algorithms](../scripts/list_algorithms.py) script to get the list of algorithm IDs
* `num_nonces`: Number of instances to benchmark for this algorithm
* `difficulty_range`: Bounds (0.0 = easiest, 1.0 = hardest) for random difficulty sampling. The full range is `[0.0, 1.0]`
* `selected_difficulties`: A list of difficulties `[[x1,y1], [x2, y2], ...]`. If any of them lie in the valid range, one is randomly selected instead of sampling from `difficulty_range`
* `weight`: Selection weight. An algorithm is chosen with probability `weight / total_weight`
* `batch_size`: Number of nonces per batch. Must be a power of 2. For example, with `num_nonces = 40` and `batch_size = 8`, the benchmark is split into 5 batches
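The batch arithmetic and weighted selection described above can be sketched as follows (helper names are my own, not part of the benchmarker):

```python
import math
import random

def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0

def num_batches(num_nonces: int, batch_size: int) -> int:
    """Number of batches a benchmark job is split into."""
    assert is_power_of_two(batch_size), "batch_size must be a power of 2"
    return math.ceil(num_nonces / batch_size)

print(num_batches(40, 8))  # -> 5, matching the example above

# Weighted algorithm selection, as described for `weight`:
algo_selection = [
    {"algorithm_id": "c001_a001", "weight": 1},
    {"algorithm_id": "c002_a001", "weight": 3},  # picked ~3x as often
]
pick = random.choices(algo_selection, weights=[x["weight"] for x in algo_selection])[0]
```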
# Connecting Slaves
@ -64,11 +148,60 @@ See [docs.tig.foundation](https://docs.tig.foundation/benchmarking/benchmarker-c
```
**Notes:**
* If your master is on a different server to your slave, you need to add the option `--master <SERVER_IP>`
* To set the number of workers (threads), use the option `--workers <NUM_WORKERS>`
* To use a different port, use the option `--port <MASTER_PORT>`
* If your master is on a different server, add `--master <SERVER_IP>`
* Set a custom master port with `--port <MASTER_PORT>`
* To see all options, use `--help`
## Slave Config
You can control execution limits via a JSON config:
```json
{
"max_workers": 100,
"cpus": 8,
"gpus": 0,
"algorithms": [
{
"id_regex": ".*",
"cpu_cost": 1.0,
"gpu_cost": 0.0
}
]
}
```
**Explanation:**
* `max_workers`: maximum concurrent tig-runtime processes.
* `cpus` & `gpus`: total compute limits available to the slave.
* `algorithms`: rules for matching algorithms based on `id_regex`.
* A batch is only executed if the summed `cpu_cost` and `gpu_cost` of all running batches stays within the `cpus` and `gpus` limits.
* `id_regex` is matched against algorithm IDs (e.g., `c004_a\d+` matches all vector_search algorithms).
**Example:**
This example limits c001/c002/c003 to 2 concurrent instances per CPU. It also limits c004/c005 to 4 concurrent instances per GPU:
```json
{
"max_workers": 10,
"cpus": 4,
"gpus": 2,
"algorithms": [
{
"id_regex": "c00[123].*",
"cpu_cost": 0.5,
"gpu_cost": 0.0
},
{
"id_regex": "c00[45].*",
"cpu_cost": 0.0,
"gpu_cost": 0.25
}
]
}
```
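Under this cost model, the concurrency figures quoted above can be derived mechanically; a small sketch (the function is illustrative, not the slave's actual scheduler):

```python
def max_concurrent(limit_cpus: float, limit_gpus: float, max_workers: int,
                   cpu_cost: float, gpu_cost: float) -> int:
    """How many instances of one algorithm fit within the slave's budgets."""
    caps = [max_workers]  # hard cap on tig-runtime processes
    if cpu_cost > 0:
        caps.append(int(limit_cpus / cpu_cost))
    if gpu_cost > 0:
        caps.append(int(limit_gpus / gpu_cost))
    return min(caps)

# c001/c002/c003: 4 CPUs at 0.5 per instance = 2 per CPU -> 8 total
print(max_concurrent(4, 2, 10, 0.5, 0.0))   # -> 8
# c004/c005: 2 GPUs at 0.25 per instance = 4 per GPU -> 8 total
print(max_concurrent(4, 2, 10, 0.0, 0.25))  # -> 8
```

Note that `max_workers = 10` would become the binding limit if both algorithm groups ran at full capacity simultaneously.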
# Finding your API Key
## Mainnet


@ -71,11 +71,6 @@ class ClientManager:
async def update_config(request: Request):
logger.debug("Received config update")
new_config = await request.json()
for k, v in new_config["job_manager_config"]["default_batch_sizes"].items():
if v == 0:
raise HTTPException(status_code=400, detail=f"Batch size for {k} cannot be 0")
if (v & (v - 1)) != 0:
raise HTTPException(status_code=400, detail=f"Batch size for {k} must be a power of 2")
for x in new_config["algo_selection"]:
if x["batch_size"] == 0:
raise HTTPException(status_code=400, detail=f"Batch size for {x['algorithm_id']} cannot be 0")


@ -49,7 +49,7 @@ class DataFetcher:
algorithms_data, benchmarks_data, challenges_data = list(executor.map(_get, tasks))
algorithms = {a["id"]: Algorithm.from_dict(a) for a in algorithms_data["algorithms"]}
wasms = {w["algorithm_id"]: Binary.from_dict(w) for w in algorithms_data["binarys"]}
binarys = {w["algorithm_id"]: Binary.from_dict(w) for w in algorithms_data["binarys"]}
precommits = {b["benchmark_id"]: Precommit.from_dict(b) for b in benchmarks_data["precommits"]}
benchmarks = {b["id"]: Benchmark.from_dict(b) for b in benchmarks_data["benchmarks"]}
@ -74,7 +74,7 @@ class DataFetcher:
self._cache = {
"block": block,
"algorithms": algorithms,
"wasms": wasms,
"binarys": binarys,
"precommits": precommits,
"benchmarks": benchmarks,
"proofs": proofs,


@ -24,7 +24,7 @@ class JobManager:
proofs: Dict[str, Proof],
challenges: Dict[str, Challenge],
algorithms: Dict[str, Algorithm],
wasms: Dict[str, Binary],
binarys: Dict[str, Binary],
**kwargs
):
api_url = CONFIG["api_url"]
@ -69,20 +69,20 @@ class JobManager:
}
hash_threshold = self.hash_thresholds[x.details.block_started][x.settings.challenge_id]
wasm = wasms.get(x.settings.algorithm_id, None)
if wasm is None:
logger.error(f"no wasm found for algorithm_id {x.settings.algorithm_id}")
bin = binarys.get(x.settings.algorithm_id, None)
if bin is None:
logger.error(f"batch {x.id}: no binary-blob found for {x.settings.algorithm_id}. skipping job")
continue
if wasm.details.download_url is None:
logger.error(f"no download_url found for wasm {wasm.algorithm_id}")
if bin.details.download_url is None:
logger.error(f"batch {x.id}: no download_url found for {bin.algorithm_id}. skipping job")
continue
batch_size = next(
(s["batch_size"] for s in algo_selection if s["algorithm_id"] == x.settings.algorithm_id),
None
)
if batch_size is None:
batch_size = config["default_batch_size"][x.settings.challenge_id]
logger.error(f"No batch size found for algorithm_id {x.settings.algorithm_id}, using default {batch_size}")
logger.error(f"batch {x.id}: no batch size found for {x.settings.algorithm_id}. skipping job")
continue
num_batches = math.ceil(x.details.num_nonces / batch_size)
atomic_inserts = [
(
@ -112,11 +112,11 @@ class JobManager:
x.details.num_nonces,
num_batches,
x.details.rand_hash,
json.dumps(block.config["benchmarks"]["runtime_configs"]["wasm"]),
json.dumps(block.config["benchmarks"]["runtime_config"]),
batch_size,
c_name,
a_name,
wasm.details.download_url,
bin.details.download_url,
x.details.block_started,
hash_threshold
)


@ -11,25 +11,12 @@ from master.client_manager import CONFIG
logger = logging.getLogger(os.path.splitext(os.path.basename(__file__))[0])
@dataclass
class AlgorithmSelectionConfig(FromDict):
algorithm: str
base_fee_limit: PreciseNumber
num_nonces: int
weight: float
@dataclass
class PrecommitManagerConfig(FromDict):
max_pending_benchmarks: int
algo_selection: Dict[str, AlgorithmSelectionConfig]
class PrecommitManager:
def __init__(self):
self.last_block_id = None
self.num_precommits_submitted = 0
self.algorithm_name_2_id = {}
self.challenge_name_2_id = {}
self.curr_base_fees = {}
def on_new_block(
self,
@ -43,11 +30,6 @@ class PrecommitManager:
):
self.last_block_id = block.id
self.num_precommits_submitted = 0
self.curr_base_fees = {
c.details.name: c.block_data.base_fee
for c in challenges.values()
if c.block_data is not None
}
benchmark_stats_by_challenge = {
c.details.name: {
"solutions": 0,
@ -94,12 +76,11 @@ class PrecommitManager:
"""
)["count"]
config = CONFIG["precommit_manager_config"]
algo_selection = CONFIG["algo_selection"]
num_pending_benchmarks = num_pending_jobs + self.num_precommits_submitted
if num_pending_benchmarks >= config["max_pending_benchmarks"]:
logger.debug(f"number of pending benchmarks has reached max of {config['max_pending_benchmarks']}")
if num_pending_benchmarks >= CONFIG["max_concurrent_benchmarks"]:
logger.debug(f"number of pending benchmarks has reached max of {CONFIG['max_concurrent_benchmarks']}")
return
logger.debug(f"Selecting algorithm from: {[(x['algorithm_id'], x['weight']) for x in algo_selection]}")
selection = random.choices(algo_selection, weights=[x["weight"] for x in algo_selection])[0]


@ -98,15 +98,13 @@ class SlaveManager:
@app.route('/get-batches', methods=['GET'])
def get_batch(request: Request):
config = CONFIG["slave_manager_config"]
if (slave_name := request.headers.get('User-Agent', None)) is None:
return "User-Agent header is required", 403
if not any(re.match(slave["name_regex"], slave_name) for slave in config["slaves"]):
if not any(re.match(slave["name_regex"], slave_name) for slave in CONFIG["slaves"]):
logger.warning(f"slave {slave_name} does not match any regex. rejecting get-batch request")
raise HTTPException(status_code=403, detail="Unregistered slave")
slave = next((slave for slave in config["slaves"] if re.match(slave["name_regex"], slave_name)), None)
slave = next((slave for slave in CONFIG["slaves"] if re.match(slave["name_regex"], slave_name)), None)
concurrent = []
updates = []
@ -129,7 +127,7 @@ class SlaveManager:
if (
b["slave"] is None or
b["start_time"] is None or
(now - b["start_time"]) > config["time_before_batch_retry"]
(now - b["start_time"]) > CONFIG["time_before_batch_retry"]
):
b["slave"] = slave_name
b["start_time"] = now
@ -278,7 +276,6 @@ class SlaveManager:
return {"status": "OK"}
config = CONFIG["slave_manager_config"]
thread = Thread(target=lambda: uvicorn.run(app, host="0.0.0.0", port=5115))
thread.daemon = True
thread.start()


@ -92,8 +92,6 @@ class SubmissionsManager:
)
def run(self, submit_precommit_req: Optional[SubmitPrecommitRequest]):
config = CONFIG["submissions_manager_config"]
now = int(time.time() * 1000)
if submit_precommit_req is None:
logger.debug("no precommit to submit")
@ -128,7 +126,7 @@ class SubmissionsManager:
INNER JOIN job_data B
ON A.benchmark_id = B.benchmark_id
""",
(config["time_between_retries"],)
(CONFIG["time_between_resubmissions"],)
)
if benchmark_to_submit:
@ -171,7 +169,7 @@ class SubmissionsManager:
INNER JOIN job_data B
ON A.benchmark_id = B.benchmark_id
""",
(config["time_between_retries"],)
(CONFIG["time_between_resubmissions"],)
)
if proof_to_submit:


@ -106,21 +106,8 @@ SELECT '
"player_id": "0x0000000000000000000000000000000000000000",
"api_key": "00000000000000000000000000000000",
"api_url": "https://mainnet-api.tig.foundation",
"submissions_manager_config": {
"time_between_retries": 60000
},
"job_manager_config": {
"default_batch_sizes": {
"c001": 8,
"c002": 8,
"c003": 8,
"c004": 8,
"c005": 8
}
},
"precommit_manager_config": {
"max_pending_benchmarks": 4
},
"time_between_resubmissions": 60000,
"max_concurrent_benchmarks": 4,
"algo_selection": [
{
"algorithm_id": "c001_a001",
@ -128,8 +115,7 @@ SELECT '
"difficulty_range": [0, 0.5],
"selected_difficulties": [],
"weight": 1,
"batch_size": 8,
"base_fee_limit": "10000000000000000"
"batch_size": 8
},
{
"algorithm_id": "c002_a001",
@ -137,8 +123,7 @@ SELECT '
"difficulty_range": [0, 0.5],
"selected_difficulties": [],
"weight": 1,
"batch_size": 8,
"base_fee_limit": "10000000000000000"
"batch_size": 8
},
{
"algorithm_id": "c003_a001",
@ -146,8 +131,7 @@ SELECT '
"difficulty_range": [0, 0.5],
"selected_difficulties": [],
"weight": 1,
"batch_size": 8,
"base_fee_limit": "10000000000000000"
"batch_size": 8
},
{
"algorithm_id": "c004_a001",
@ -155,8 +139,7 @@ SELECT '
"difficulty_range": [0, 0.5],
"selected_difficulties": [],
"weight": 1,
"batch_size": 8,
"base_fee_limit": "10000000000000000"
"batch_size": 8
},
{
"algorithm_id": "c005_a001",
@ -164,19 +147,16 @@ SELECT '
"difficulty_range": [0, 0.5],
"selected_difficulties": [],
"weight": 1,
"batch_size": 8,
"base_fee_limit": "10000000000000000"
"batch_size": 8
}
],
"slave_manager_config": {
"time_before_batch_retry": 60000,
"slaves": [
{
"name_regex": ".*",
"algorithm_id_regex": ".*",
"max_concurrent_batches": 1
}
]
}
"time_before_batch_retry": 60000,
"slaves": [
{
"name_regex": ".*",
"algorithm_id_regex": ".*",
"max_concurrent_batches": 1
}
]
}'
WHERE NOT EXISTS (SELECT 1 FROM config);


@ -67,7 +67,7 @@ serializable_struct_with_getters! {
lifespan_period: u32,
min_per_nonce_fee: PreciseNumber,
min_base_fee: PreciseNumber,
runtime_configs: HashMap<String, RuntimeConfig>,
runtime_config: RuntimeConfig,
target_solution_rate: u32,
hash_threshold_max_percent_delta: f64,
}