diff --git a/tig-benchmarker/README.md b/tig-benchmarker/README.md
index 2fea2b0..33ac739 100644
--- a/tig-benchmarker/README.md
+++ b/tig-benchmarker/README.md
@@ -2,6 +2,46 @@
 
 Benchmarker for TIG. Expected setup is a single master and multiple slaves on different servers.
 
+## Overview
+
+Benchmarking in TIG works as follows:
+
+1. Master submits a precommit to TIG protocol, with details of what they will benchmark:
+    * `block_id`: start of the benchmark
+    * `player_id`: address of the benchmarker
+    * `challenge_id`: challenge to benchmark
+    * `algorithm_id`: algorithm to benchmark
+    * `difficulty`: difficulty of instances to randomly generate
+    * `num_nonces`: number of instances to benchmark
+
+2. TIG protocol confirms the precommit, and assigns it a random string
+
+3. Master starts benchmarking:
+    * polls TIG protocol for the confirmed precommit + random string
+    * creates a benchmark job, splitting it into batches
+    * slaves poll the master for batches to compute
+    * slaves do the computation, and send results back to master
+
+4. After benchmarking is finished, master submits benchmark to TIG protocol:
+    * `solution_nonces`: list of nonces for which a solution was found
+    * `merkle_root`: Merkle root of the tree constructed using results as leaves
+
+5. TIG protocol confirms the benchmark, and randomly samples nonces requiring proof
+
+6. Master prepares the proof:
+    * polls TIG protocol for confirmed benchmark + sampled nonces
+    * creates proof jobs
+    * slaves poll the master for nonces requiring proof
+    * slaves send Merkle branches back to master
+
+7. Master submits proof to TIG protocol
+
+8. TIG protocol confirms the proof, calculating the block from which the solutions will become "active" (eligible to earn rewards)
+    * Verification is performed in parallel
+    * Solutions will be inactive 120 blocks from when the benchmark started
+    * The delay is determined by number of blocks between the start and when proof was confirmed
+    * Each block, active solutions which qualify will earn rewards for the Benchmarker
+
 # Starting Your Master
 
 Simply run:
@@ -41,9 +81,53 @@ See last section on how to find your player_id & api_key.
 2. Delete the database: `rm -rf db_data`
 3. Start your master
 
-## Optimising your Master Config
+## Master Config
 
-See [docs.tig.foundation](https://docs.tig.foundation/benchmarking/benchmarker-config)
+The master config defines how benchmarking jobs are selected, scheduled, and distributed to slaves. This config can be edited via the master UI or via API (`http://localhost:/update-config`).
+
+```json
+{
+  "player_id": "0x0000000000000000000000000000000000000000",
+  "api_key": "00000000000000000000000000000000",
+  "api_url": "https://mainnet-api.tig.foundation",
+  "time_between_resubmissions": 60000,
+  "max_concurrent_benchmarks": 4,
+  "algo_selection": [
+    {
+      "algorithm_id": "c001_a001",
+      "num_nonces": 40,
+      "difficulty_range": [0, 0.5],
+      "selected_difficulties": [],
+      "weight": 1,
+      "batch_size": 8,
+      "base_fee_limit": "10000000000000000"
+    },
+    ...
+  ],
+  "time_before_batch_retry": 60000,
+  "slaves": [
+    {
+      "name_regex": ".*",
+      "algorithm_id_regex": ".*",
+      "max_concurrent_batches": 1
+    }
+  ]
+}
+```
+
+**Explanation:**
+* `player_id`: Your wallet address (lowercase)
+* `api_key`: See last section on how to obtain your API key
+* `api_url`: mainnet (https://mainnet-api.tig.foundation) or testnet (https://testnet-api.tig.foundation)
+* `time_between_resubmissions`: Time in milliseconds to wait before resubmitting a benchmark/proof which has not confirmed into a block
+* `max_concurrent_benchmarks`: Maximum number of benchmarks that can be "in flight" at once (i.e., benchmarks where the proof has not been computed yet).
+* `algo_selection`: list of algorithms that can be picked for benchmarking. Each entry has:
+    * `algorithm_id`: id for the algorithm (e.g., c001_a001). Tip: use [list_algorithms](../scripts/list_algorithms.py) script to get list of algorithm ids
+    * `num_nonces`: Number of instances to benchmark for this algorithm
+    * `difficulty_range`: the bounds (0.0 = easiest, 1.0 = hardest) for a random difficulty sampling. Full range is `[0.0, 1.0]`
+    * `selected_difficulties`: A list of difficulties `[[x1,y1], [x2, y2], ...]`. If any of the difficulties are in valid range, one will be randomly selected instead of sampling from the difficulty range
+    * `weight`: Selection weight. An algorithm is chosen proportionally to `weight / total_weight`
+    * `batch_size`: Number of nonces per batch. Must be a power of 2. For example, if num_nonces = 40 and batch_size = 8, the benchmark is split into 5 batches
 
 # Connecting Slaves
 
@@ -64,11 +148,60 @@ See [docs.tig.foundation](https://docs.tig.foundation/benchmarking/benchmarker-c
 ```
 
 **Notes:**
-* If your master is on a different server to your slave, you need to add the option `--master `
-* To set the number of workers (threads), use the option `--workers `
-* To use a different port, use the option `--port `
+* If your master is on a different server, add `--master `
+* Set a custom master port with `--port `
 * To see all options, use `--help`
 
+## Slave Config
+
+You can control execution limits via a JSON config:
+
+```json
+{
+  "max_workers": 100,
+  "cpus": 8,
+  "gpus": 0,
+  "algorithms": [
+    {
+      "id_regex": ".*",
+      "cpu_cost": 1.0,
+      "gpu_cost": 0.0
+    }
+  ]
+}
+```
+
+**Explanation:**
+* `max_workers`: maximum concurrent tig-runtime processes.
+* `cpus` & `gpus`: total compute limits available to the slave.
+* `algorithms`: rules for matching algorithms based on `id_regex`.
+    * An algorithm can only be executed if it stays within the total `cpu_cost` and `gpu_cost` limits.
+    * Regex matches algorithm ids (e.g., `c004_a[\d3]` matches all vector_search algorithms).
+
+**Example:**
+
+This example limits c001/c002/c003 to 2 concurrent instances per CPU. It also limits c004/c005 to 4 concurrent instances per GPU:
+
+```json
+{
+  "max_workers": 10,
+  "cpus": 4,
+  "gpus": 2,
+  "algorithms": [
+    {
+      "id_regex": "c00[123].*",
+      "cpu_cost": 0.5,
+      "gpu_cost": 0.0
+    },
+    {
+      "id_regex": "c00[45].*",
+      "cpu_cost": 0.0,
+      "gpu_cost": 0.25
+    }
+  ]
+}
+```
+
 # Finding your API Key
 
 ## Mainnet
diff --git a/tig-benchmarker/master/master/client_manager.py b/tig-benchmarker/master/master/client_manager.py
index 5921a27..6ef4e48 100644
--- a/tig-benchmarker/master/master/client_manager.py
+++ b/tig-benchmarker/master/master/client_manager.py
@@ -71,11 +71,6 @@ class ClientManager:
         async def update_config(request: Request):
             logger.debug("Received config update")
             new_config = await request.json()
-            for k, v in new_config["job_manager_config"]["default_batch_sizes"].items():
-                if v == 0:
-                    raise HTTPException(status_code=400, detail=f"Batch size for {k} cannot be 0")
-                if (v & (v - 1)) != 0:
-                    raise HTTPException(status_code=400, detail=f"Batch size for {k} must be a power of 2")
             for x in new_config["algo_selection"]:
                 if x["batch_size"] == 0:
                     raise HTTPException(status_code=400, detail=f"Batch size for {x['algorithm_id']} cannot be 0")
diff --git a/tig-benchmarker/master/master/data_fetcher.py b/tig-benchmarker/master/master/data_fetcher.py
index 82a6642..fa4274d 100644
--- a/tig-benchmarker/master/master/data_fetcher.py
+++ b/tig-benchmarker/master/master/data_fetcher.py
@@ -49,7 +49,7 @@ class DataFetcher:
             algorithms_data, benchmarks_data, challenges_data = list(executor.map(_get, tasks))
 
         algorithms = {a["id"]: Algorithm.from_dict(a) for a in algorithms_data["algorithms"]}
-        wasms = {w["algorithm_id"]: Binary.from_dict(w) for w in algorithms_data["binarys"]}
+        binarys = {w["algorithm_id"]: Binary.from_dict(w) for w in algorithms_data["binarys"]}
 
         precommits = {b["benchmark_id"]: Precommit.from_dict(b) for b in benchmarks_data["precommits"]}
         benchmarks = {b["id"]: Benchmark.from_dict(b) for b in benchmarks_data["benchmarks"]}
@@ -74,7 +74,7 @@ class DataFetcher:
         self._cache = {
             "block": block,
             "algorithms": algorithms,
-            "wasms": wasms,
+            "binarys": binarys,
             "precommits": precommits,
             "benchmarks": benchmarks,
             "proofs": proofs,
diff --git a/tig-benchmarker/master/master/job_manager.py b/tig-benchmarker/master/master/job_manager.py
index 7c69016..b0b5687 100644
--- a/tig-benchmarker/master/master/job_manager.py
+++ b/tig-benchmarker/master/master/job_manager.py
@@ -24,7 +24,7 @@ class JobManager:
         proofs: Dict[str, Proof],
         challenges: Dict[str, Challenge],
         algorithms: Dict[str, Algorithm],
-        wasms: Dict[str, Binary],
+        binarys: Dict[str, Binary],
         **kwargs
     ):
         api_url = CONFIG["api_url"]
@@ -69,20 +69,20 @@ class JobManager:
             }
 
             hash_threshold = self.hash_thresholds[x.details.block_started][x.settings.challenge_id]
-            wasm = wasms.get(x.settings.algorithm_id, None)
-            if wasm is None:
-                logger.error(f"no wasm found for algorithm_id {x.settings.algorithm_id}")
+            bin = binarys.get(x.settings.algorithm_id, None)
+            if bin is None:
+                logger.error(f"batch {x.id}: no binary-blob found for {x.settings.algorithm_id}. skipping job")
                 continue
-            if wasm.details.download_url is None:
-                logger.error(f"no download_url found for wasm {wasm.algorithm_id}")
+            if bin.details.download_url is None:
+                logger.error(f"batch {x.id}: no download_url found for {bin.algorithm_id}. skipping job")
                 continue
             batch_size = next(
                 (s["batch_size"] for s in algo_selection if s["algorithm_id"] == x.settings.algorithm_id),
                 None
             )
             if batch_size is None:
-                batch_size = config["default_batch_size"][x.settings.challenge_id]
-                logger.error(f"No batch size found for algorithm_id {x.settings.algorithm_id}, using default {batch_size}")
+                logger.error(f"batch {x.id}: no batch size found for {x.settings.algorithm_id}. skipping job")
+                continue
             num_batches = math.ceil(x.details.num_nonces / batch_size)
             atomic_inserts = [
                 (
@@ -112,11 +112,11 @@ class JobManager:
                         x.details.num_nonces,
                         num_batches,
                         x.details.rand_hash,
-                        json.dumps(block.config["benchmarks"]["runtime_configs"]["wasm"]),
+                        json.dumps(block.config["benchmarks"]["runtime_config"]),
                         batch_size,
                         c_name,
                         a_name,
-                        wasm.details.download_url,
+                        bin.details.download_url,
                         x.details.block_started,
                         hash_threshold
                     )
diff --git a/tig-benchmarker/master/master/precommit_manager.py b/tig-benchmarker/master/master/precommit_manager.py
index 0830f7d..6a23670 100644
--- a/tig-benchmarker/master/master/precommit_manager.py
+++ b/tig-benchmarker/master/master/precommit_manager.py
@@ -11,25 +11,12 @@ from master.client_manager import CONFIG
 
 logger = logging.getLogger(os.path.splitext(os.path.basename(__file__))[0])
 
-@dataclass
-class AlgorithmSelectionConfig(FromDict):
-    algorithm: str
-    base_fee_limit: PreciseNumber
-    num_nonces: int
-    weight: float
-
-@dataclass
-class PrecommitManagerConfig(FromDict):
-    max_pending_benchmarks: int
-    algo_selection: Dict[str, AlgorithmSelectionConfig]
-
 class PrecommitManager:
     def __init__(self):
         self.last_block_id = None
         self.num_precommits_submitted = 0
         self.algorithm_name_2_id = {}
         self.challenge_name_2_id = {}
-        self.curr_base_fees = {}
 
     def on_new_block(
         self,
@@ -43,11 +30,6 @@ class PrecommitManager:
     ):
         self.last_block_id = block.id
         self.num_precommits_submitted = 0
-        self.curr_base_fees = {
-            c.details.name: c.block_data.base_fee
-            for c in challenges.values()
-            if c.block_data is not None
-        }
         benchmark_stats_by_challenge = {
             c.details.name: {
                 "solutions": 0,
@@ -94,12 +76,11 @@ class PrecommitManager:
             """
         )["count"]
 
-        config = CONFIG["precommit_manager_config"]
         algo_selection = CONFIG["algo_selection"]
 
         num_pending_benchmarks = num_pending_jobs + self.num_precommits_submitted
-        if num_pending_benchmarks >= config["max_pending_benchmarks"]:
-            logger.debug(f"number of pending benchmarks has reached max of {config['max_pending_benchmarks']}")
+        if num_pending_benchmarks >= CONFIG["max_concurrent_benchmarks"]:
+            logger.debug(f"number of pending benchmarks has reached max of {CONFIG['max_concurrent_benchmarks']}")
             return
         logger.debug(f"Selecting algorithm from: {[(x['algorithm_id'], x['weight']) for x in algo_selection]}")
         selection = random.choices(algo_selection, weights=[x["weight"] for x in algo_selection])[0]
diff --git a/tig-benchmarker/master/master/slave_manager.py b/tig-benchmarker/master/master/slave_manager.py
index d04cfb5..1ce3c77 100644
--- a/tig-benchmarker/master/master/slave_manager.py
+++ b/tig-benchmarker/master/master/slave_manager.py
@@ -98,15 +98,13 @@ class SlaveManager:
 
         @app.route('/get-batches', methods=['GET'])
         def get_batch(request: Request):
-            config = CONFIG["slave_manager_config"]
-
             if (slave_name := request.headers.get('User-Agent', None)) is None:
                 return "User-Agent header is required", 403
-            if not any(re.match(slave["name_regex"], slave_name) for slave in config["slaves"]):
+            if not any(re.match(slave["name_regex"], slave_name) for slave in CONFIG["slaves"]):
                 logger.warning(f"slave {slave_name} does not match any regex. rejecting get-batch request")
                 raise HTTPException(status_code=403, detail="Unregistered slave")
 
-            slave = next((slave for slave in config["slaves"] if re.match(slave["name_regex"], slave_name)), None)
+            slave = next((slave for slave in CONFIG["slaves"] if re.match(slave["name_regex"], slave_name)), None)
 
             concurrent = []
             updates = []
@@ -129,7 +127,7 @@ class SlaveManager:
                     if (
                         b["slave"] is None or
                         b["start_time"] is None or
-                        (now - b["start_time"]) > config["time_before_batch_retry"]
+                        (now - b["start_time"]) > CONFIG["time_before_batch_retry"]
                     ):
                         b["slave"] = slave_name
                         b["start_time"] = now
@@ -278,7 +276,6 @@ class SlaveManager:
 
             return {"status": "OK"}
 
-        config = CONFIG["slave_manager_config"]
         thread = Thread(target=lambda: uvicorn.run(app, host="0.0.0.0", port=5115))
         thread.daemon = True
         thread.start()
diff --git a/tig-benchmarker/master/master/submissions_manager.py b/tig-benchmarker/master/master/submissions_manager.py
index 16cf85f..4f8f35c 100644
--- a/tig-benchmarker/master/master/submissions_manager.py
+++ b/tig-benchmarker/master/master/submissions_manager.py
@@ -92,8 +92,6 @@ class SubmissionsManager:
         )
 
     def run(self, submit_precommit_req: Optional[SubmitPrecommitRequest]):
-        config = CONFIG["submissions_manager_config"]
-
         now = int(time.time() * 1000)
         if submit_precommit_req is None:
             logger.debug("no precommit to submit")
@@ -128,7 +126,7 @@ class SubmissionsManager:
                 INNER JOIN job_data B
                     ON A.benchmark_id = B.benchmark_id
             """,
-            (config["time_between_retries"],)
+            (CONFIG["time_between_resubmissions"],)
         )
 
         if benchmark_to_submit:
@@ -171,7 +169,7 @@ class SubmissionsManager:
                 INNER JOIN job_data B
                     ON A.benchmark_id = B.benchmark_id
             """,
-            (config["time_between_retries"],)
+            (CONFIG["time_between_resubmissions"],)
         )
 
         if proof_to_submit:
diff --git a/tig-benchmarker/postgres/init.sql b/tig-benchmarker/postgres/init.sql
index 7960733..a82698e 100644
--- a/tig-benchmarker/postgres/init.sql
+++ b/tig-benchmarker/postgres/init.sql
@@ -106,21 +106,8 @@ SELECT '
   "player_id": "0x0000000000000000000000000000000000000000",
   "api_key": "00000000000000000000000000000000",
   "api_url": "https://mainnet-api.tig.foundation",
-  "submissions_manager_config": {
-    "time_between_retries": 60000
-  },
-  "job_manager_config": {
-    "default_batch_sizes": {
-      "c001": 8,
-      "c002": 8,
-      "c003": 8,
-      "c004": 8,
-      "c005": 8
-    }
-  },
-  "precommit_manager_config": {
-    "max_pending_benchmarks": 4
-  },
+  "time_between_resubmissions": 60000,
+  "max_concurrent_benchmarks": 4,
   "algo_selection": [
     {
       "algorithm_id": "c001_a001",
@@ -128,8 +115,7 @@ SELECT '
       "difficulty_range": [0, 0.5],
       "selected_difficulties": [],
       "weight": 1,
-      "batch_size": 8,
-      "base_fee_limit": "10000000000000000"
+      "batch_size": 8
     },
     {
       "algorithm_id": "c002_a001",
@@ -137,8 +123,7 @@ SELECT '
       "difficulty_range": [0, 0.5],
       "selected_difficulties": [],
       "weight": 1,
-      "batch_size": 8,
-      "base_fee_limit": "10000000000000000"
+      "batch_size": 8
     },
     {
       "algorithm_id": "c003_a001",
@@ -146,8 +131,7 @@ SELECT '
       "difficulty_range": [0, 0.5],
       "selected_difficulties": [],
       "weight": 1,
-      "batch_size": 8,
-      "base_fee_limit": "10000000000000000"
+      "batch_size": 8
     },
     {
       "algorithm_id": "c004_a001",
@@ -155,8 +139,7 @@ SELECT '
       "difficulty_range": [0, 0.5],
       "selected_difficulties": [],
       "weight": 1,
-      "batch_size": 8,
-      "base_fee_limit": "10000000000000000"
+      "batch_size": 8
     },
     {
       "algorithm_id": "c005_a001",
@@ -164,19 +147,16 @@ SELECT '
       "difficulty_range": [0, 0.5],
       "selected_difficulties": [],
       "weight": 1,
-      "batch_size": 8,
-      "base_fee_limit": "10000000000000000"
+      "batch_size": 8
     }
   ],
-  "slave_manager_config": {
-    "time_before_batch_retry": 60000,
-    "slaves": [
-      {
-        "name_regex": ".*",
-        "algorithm_id_regex": ".*",
-        "max_concurrent_batches": 1
-      }
-    ]
-  }
+  "time_before_batch_retry": 60000,
+  "slaves": [
+    {
+      "name_regex": ".*",
+      "algorithm_id_regex": ".*",
+      "max_concurrent_batches": 1
+    }
+  ]
 }'
 WHERE NOT EXISTS (SELECT 1 FROM config);
\ No newline at end of file
diff --git a/tig-structs/src/config.rs b/tig-structs/src/config.rs
index 5479d6d..299bbcd 100644
--- a/tig-structs/src/config.rs
+++ b/tig-structs/src/config.rs
@@ -67,7 +67,7 @@ serializable_struct_with_getters! {
         lifespan_period: u32,
         min_per_nonce_fee: PreciseNumber,
         min_base_fee: PreciseNumber,
-        runtime_configs: HashMap,
+        runtime_config: RuntimeConfig,
         target_solution_rate: u32,
         hash_threshold_max_percent_delta: f64,
     }