Mirror of https://github.com/tig-pool-nk/tig-monorepo.git (synced 2026-02-21 17:27:21 +08:00)
Update tig-benchmarker master and README.
This commit is contained in: parent a96d3e9def, commit 18a1f1db92
@@ -2,6 +2,46 @@

Benchmarker for TIG. The expected setup is a single master and multiple slaves on different servers.

## Overview

Benchmarking in TIG works as follows (a short illustrative sketch of the batching and Merkle-root steps follows this list):

1. Master submits a precommit to the TIG protocol, with details of what it will benchmark:
    * `block_id`: start of the benchmark
    * `player_id`: address of the benchmarker
    * `challenge_id`: challenge to benchmark
    * `algorithm_id`: algorithm to benchmark
    * `difficulty`: difficulty of instances to randomly generate
    * `num_nonces`: number of instances to benchmark

2. TIG protocol confirms the precommit and assigns it a random string

3. Master starts benchmarking:
    * polls TIG protocol for the confirmed precommit + random string
    * creates a benchmark job, splitting it into batches
    * slaves poll the master for batches to compute
    * slaves do the computation and send results back to the master

4. After benchmarking is finished, master submits the benchmark to TIG protocol:
    * `solution_nonces`: list of nonces for which a solution was found
    * `merkle_root`: Merkle root of the tree constructed using the results as leaves

5. TIG protocol confirms the benchmark and randomly samples nonces requiring proof

6. Master prepares the proof:
    * polls TIG protocol for the confirmed benchmark + sampled nonces
    * creates proof jobs
    * slaves poll the master for nonces requiring proof
    * slaves send Merkle branches back to the master

7. Master submits the proof to TIG protocol

8. TIG protocol confirms the proof, calculating the block from which the solutions will become "active" (eligible to earn rewards)
    * Verification is performed in parallel
    * Solutions become inactive 120 blocks from when the benchmark started
    * The delay is determined by the number of blocks between the start and when the proof was confirmed
    * Each block, active solutions which qualify earn rewards for the Benchmarker

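The sketch below illustrates steps 3–4: splitting a job's nonces into power-of-two batches and building a Merkle root over per-nonce results. It is a minimal, self-contained illustration; the hashing scheme and helper names are assumptions for this example, not the actual tig-benchmarker implementation.

```python
import math
import hashlib

def split_into_batches(num_nonces: int, batch_size: int) -> list[range]:
    """Split nonces 0..num_nonces-1 into batches of `batch_size` (a power of 2)."""
    assert batch_size & (batch_size - 1) == 0, "batch_size must be a power of 2"
    num_batches = math.ceil(num_nonces / batch_size)
    return [range(i * batch_size, min((i + 1) * batch_size, num_nonces))
            for i in range(num_batches)]

def merkle_root(leaves: list[bytes]) -> bytes:
    """Toy Merkle root over per-nonce result hashes (illustrative hashing only)."""
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])          # duplicate last node on odd levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

# Example: 40 nonces with batch_size 8 -> 5 batches (matching the example config later in this README)
batches = split_into_batches(num_nonces=40, batch_size=8)
print(len(batches))                          # 5
root = merkle_root([f"result-{n}".encode() for n in range(40)])
print(root.hex())
```
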
# Starting Your Master

Simply run:
@@ -41,9 +81,53 @@ See last section on how to find your player_id & api_key.
2. Delete the database: `rm -rf db_data`
3. Start your master

## Optimising your Master Config
## Master Config

See [docs.tig.foundation](https://docs.tig.foundation/benchmarking/benchmarker-config)
The master config defines how benchmarking jobs are selected, scheduled, and distributed to slaves. This config can be edited via the master UI or via API (`http://localhost:<MASTER_PORT>/update-config`).

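As a concrete example, the sketch below pushes a config file to that endpoint. The `/update-config` path comes from the text above; the use of HTTP POST, the port value, and the `config.json` filename are assumptions for illustration.

```python
import json
import requests

MASTER_PORT = 8080  # hypothetical value; substitute your actual <MASTER_PORT>

# Load a config like the one shown below and push it to the running master.
with open("config.json") as f:
    cfg = json.load(f)

resp = requests.post(f"http://localhost:{MASTER_PORT}/update-config", json=cfg)
resp.raise_for_status()  # non-2xx responses (e.g. an invalid batch_size) raise here
print(resp.status_code)
```
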
```json
{
    "player_id": "0x0000000000000000000000000000000000000000",
    "api_key": "00000000000000000000000000000000",
    "api_url": "https://mainnet-api.tig.foundation",
    "time_between_resubmissions": 60000,
    "max_concurrent_benchmarks": 4,
    "algo_selection": [
        {
            "algorithm_id": "c001_a001",
            "num_nonces": 40,
            "difficulty_range": [0, 0.5],
            "selected_difficulties": [],
            "weight": 1,
            "batch_size": 8,
            "base_fee_limit": "10000000000000000"
        },
        ...
    ],
    "time_before_batch_retry": 60000,
    "slaves": [
        {
            "name_regex": ".*",
            "algorithm_id_regex": ".*",
            "max_concurrent_batches": 1
        }
    ]
}
```

**Explanation:**
* `player_id`: Your wallet address (lowercase)
* `api_key`: See the last section on how to obtain your API key
* `api_url`: mainnet (https://mainnet-api.tig.foundation) or testnet (https://testnet-api.tig.foundation)
* `time_between_resubmissions`: Time in milliseconds to wait before resubmitting a benchmark/proof which has not been confirmed into a block
* `max_concurrent_benchmarks`: Maximum number of benchmarks that can be "in flight" at once (i.e., benchmarks whose proof has not been computed yet)
* `algo_selection`: List of algorithms that can be picked for benchmarking. Each entry has:
    * `algorithm_id`: ID of the algorithm (e.g., c001_a001). Tip: use the [list_algorithms](../scripts/list_algorithms.py) script to get the list of algorithm IDs
    * `num_nonces`: Number of instances to benchmark for this algorithm
    * `difficulty_range`: Bounds (0.0 = easiest, 1.0 = hardest) for random difficulty sampling. The full range is `[0.0, 1.0]`
    * `selected_difficulties`: A list of difficulties `[[x1,y1], [x2, y2], ...]`. If any of these difficulties are in the valid range, one is randomly selected instead of sampling from the difficulty range
    * `weight`: Selection weight. An algorithm is chosen proportionally to `weight / total_weight` (see the selection sketch after this list)
    * `batch_size`: Number of nonces per batch. Must be a power of 2. For example, if num_nonces = 40 and batch_size = 8, the benchmark is split into 5 batches

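A minimal sketch of how an algorithm and difficulty might be picked from `algo_selection` (illustrative only; the helper names, the `valid` difficulty set, and the exact sampling logic are assumptions, not the tig-benchmarker implementation):

```python
import random

def select_algorithm(algo_selection: list[dict]) -> dict:
    """Pick an entry proportionally to weight / total_weight."""
    weights = [entry["weight"] for entry in algo_selection]
    return random.choices(algo_selection, weights=weights, k=1)[0]

def select_difficulty(entry: dict, valid: set[tuple[int, int]]) -> object:
    """Prefer a selected difficulty that is still valid; otherwise sample from difficulty_range."""
    candidates = [tuple(d) for d in entry["selected_difficulties"] if tuple(d) in valid]
    if candidates:
        return list(random.choice(candidates))
    lo, hi = entry["difficulty_range"]
    return random.uniform(lo, hi)   # normalized: 0.0 = easiest, 1.0 = hardest

# Usage with the example entry above (the valid-difficulty set here is hypothetical):
entry = {
    "algorithm_id": "c001_a001", "num_nonces": 40,
    "difficulty_range": [0, 0.5], "selected_difficulties": [], "weight": 1,
}
picked = select_algorithm([entry])
print(picked["algorithm_id"], select_difficulty(picked, valid=set()))
```
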
# Connecting Slaves

@@ -64,11 +148,60 @@ See [docs.tig.foundation](https://docs.tig.foundation/benchmarking/benchmarker-c
```

**Notes:**
* If your master is on a different server to your slave, you need to add the option `--master <SERVER_IP>`
* To set the number of workers (threads), use the option `--workers <NUM_WORKERS>`
* To use a different port, use the option `--port <MASTER_PORT>`
* If your master is on a different server, add `--master <SERVER_IP>`
* Set a custom master port with `--port <MASTER_PORT>`
* To see all options, use `--help`

## Slave Config

You can control execution limits via a JSON config:

```json
{
    "max_workers": 100,
    "cpus": 8,
    "gpus": 0,
    "algorithms": [
        {
            "id_regex": ".*",
            "cpu_cost": 1.0,
            "gpu_cost": 0.0
        }
    ]
}
```

**Explanation:**
* `max_workers`: Maximum number of concurrent tig-runtime processes.
* `cpus` & `gpus`: Total compute limits available to the slave.
* `algorithms`: Rules for matching algorithms based on `id_regex`.
    * An algorithm can only be executed if the summed `cpu_cost` and `gpu_cost` of everything running stays within the slave's `cpus` and `gpus` limits.
    * The regex matches algorithm ids (e.g., `c004_a[\d3]` matches all vector_search algorithms).

**Example:**

This example limits c001/c002/c003 to 2 concurrent instances per CPU. It also limits c004/c005 to 4 concurrent instances per GPU:

```json
{
    "max_workers": 10,
    "cpus": 4,
    "gpus": 2,
    "algorithms": [
        {
            "id_regex": "c00[123].*",
            "cpu_cost": 0.5,
            "gpu_cost": 0.0
        },
        {
            "id_regex": "c00[45].*",
            "cpu_cost": 0.0,
            "gpu_cost": 0.25
        }
    ]
}
```

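A minimal sketch of the kind of cost-based admission check this config implies (an illustration of the rule described above, not the actual slave scheduler; the function and field names are assumptions):

```python
import re

def can_start(algorithm_id: str, running: list[dict], config: dict) -> bool:
    """Return True if starting `algorithm_id` keeps total CPU/GPU cost within the slave's limits."""
    def cost_of(algo_id: str) -> tuple[float, float]:
        # First matching rule wins; unmatched algorithms are treated as not runnable.
        for rule in config["algorithms"]:
            if re.match(rule["id_regex"], algo_id):
                return rule["cpu_cost"], rule["gpu_cost"]
        raise ValueError(f"no rule matches {algo_id}")

    cpu_used = sum(cost_of(r["algorithm_id"])[0] for r in running)
    gpu_used = sum(cost_of(r["algorithm_id"])[1] for r in running)
    cpu_new, gpu_new = cost_of(algorithm_id)
    return (
        len(running) < config["max_workers"]
        and cpu_used + cpu_new <= config["cpus"]
        and gpu_used + gpu_new <= config["gpus"]
    )

# With the example config above: 8 running c001 batches fully use the 4 CPUs,
# so a 9th c001 batch is rejected while a c004 batch (GPU-only cost) still fits.
config = {
    "max_workers": 10, "cpus": 4, "gpus": 2,
    "algorithms": [
        {"id_regex": "c00[123].*", "cpu_cost": 0.5, "gpu_cost": 0.0},
        {"id_regex": "c00[45].*", "cpu_cost": 0.0, "gpu_cost": 0.25},
    ],
}
running = [{"algorithm_id": "c001_a001"}] * 8
print(can_start("c001_a001", running, config))   # False
print(can_start("c004_a001", running, config))   # True
```
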
# Finding your API Key

## Mainnet

@@ -71,11 +71,6 @@ class ClientManager:
async def update_config(request: Request):
    logger.debug("Received config update")
    new_config = await request.json()
    for k, v in new_config["job_manager_config"]["default_batch_sizes"].items():
        if v == 0:
            raise HTTPException(status_code=400, detail=f"Batch size for {k} cannot be 0")
        if (v & (v - 1)) != 0:
            raise HTTPException(status_code=400, detail=f"Batch size for {k} must be a power of 2")
    for x in new_config["algo_selection"]:
        if x["batch_size"] == 0:
            raise HTTPException(status_code=400, detail=f"Batch size for {x['algorithm_id']} cannot be 0")

@@ -49,7 +49,7 @@ class DataFetcher:
algorithms_data, benchmarks_data, challenges_data = list(executor.map(_get, tasks))

algorithms = {a["id"]: Algorithm.from_dict(a) for a in algorithms_data["algorithms"]}
wasms = {w["algorithm_id"]: Binary.from_dict(w) for w in algorithms_data["binarys"]}
binarys = {w["algorithm_id"]: Binary.from_dict(w) for w in algorithms_data["binarys"]}

precommits = {b["benchmark_id"]: Precommit.from_dict(b) for b in benchmarks_data["precommits"]}
benchmarks = {b["id"]: Benchmark.from_dict(b) for b in benchmarks_data["benchmarks"]}

@@ -74,7 +74,7 @@ class DataFetcher:
self._cache = {
    "block": block,
    "algorithms": algorithms,
    "wasms": wasms,
    "binarys": binarys,
    "precommits": precommits,
    "benchmarks": benchmarks,
    "proofs": proofs,

@@ -24,7 +24,7 @@ class JobManager:
    proofs: Dict[str, Proof],
    challenges: Dict[str, Challenge],
    algorithms: Dict[str, Algorithm],
    wasms: Dict[str, Binary],
    binarys: Dict[str, Binary],
    **kwargs
):
    api_url = CONFIG["api_url"]

@@ -69,20 +69,20 @@ class JobManager:
}
hash_threshold = self.hash_thresholds[x.details.block_started][x.settings.challenge_id]

wasm = wasms.get(x.settings.algorithm_id, None)
if wasm is None:
    logger.error(f"no wasm found for algorithm_id {x.settings.algorithm_id}")
bin = binarys.get(x.settings.algorithm_id, None)
if bin is None:
    logger.error(f"batch {x.id}: no binary-blob found for {x.settings.algorithm_id}. skipping job")
    continue
if wasm.details.download_url is None:
    logger.error(f"no download_url found for wasm {wasm.algorithm_id}")
if bin.details.download_url is None:
    logger.error(f"batch {x.id}: no download_url found for {bin.algorithm_id}. skipping job")
    continue
batch_size = next(
    (s["batch_size"] for s in algo_selection if s["algorithm_id"] == x.settings.algorithm_id),
    None
)
if batch_size is None:
    batch_size = config["default_batch_size"][x.settings.challenge_id]
    logger.error(f"No batch size found for algorithm_id {x.settings.algorithm_id}, using default {batch_size}")
    logger.error(f"batch {x.id}: no batch size found for {x.settings.algorithm_id}. skipping job")
    continue
num_batches = math.ceil(x.details.num_nonces / batch_size)
atomic_inserts = [
    (

@@ -112,11 +112,11 @@ class JobManager:
    x.details.num_nonces,
    num_batches,
    x.details.rand_hash,
    json.dumps(block.config["benchmarks"]["runtime_configs"]["wasm"]),
    json.dumps(block.config["benchmarks"]["runtime_config"]),
    batch_size,
    c_name,
    a_name,
    wasm.details.download_url,
    bin.details.download_url,
    x.details.block_started,
    hash_threshold
)

@@ -11,25 +11,12 @@ from master.client_manager import CONFIG

logger = logging.getLogger(os.path.splitext(os.path.basename(__file__))[0])

@dataclass
class AlgorithmSelectionConfig(FromDict):
    algorithm: str
    base_fee_limit: PreciseNumber
    num_nonces: int
    weight: float

@dataclass
class PrecommitManagerConfig(FromDict):
    max_pending_benchmarks: int
    algo_selection: Dict[str, AlgorithmSelectionConfig]

class PrecommitManager:
    def __init__(self):
        self.last_block_id = None
        self.num_precommits_submitted = 0
        self.algorithm_name_2_id = {}
        self.challenge_name_2_id = {}
        self.curr_base_fees = {}

    def on_new_block(
        self,

@@ -43,11 +30,6 @@ class PrecommitManager:
):
    self.last_block_id = block.id
    self.num_precommits_submitted = 0
    self.curr_base_fees = {
        c.details.name: c.block_data.base_fee
        for c in challenges.values()
        if c.block_data is not None
    }
    benchmark_stats_by_challenge = {
        c.details.name: {
            "solutions": 0,

@@ -94,12 +76,11 @@ class PrecommitManager:
"""
)["count"]

config = CONFIG["precommit_manager_config"]
algo_selection = CONFIG["algo_selection"]

num_pending_benchmarks = num_pending_jobs + self.num_precommits_submitted
if num_pending_benchmarks >= config["max_pending_benchmarks"]:
    logger.debug(f"number of pending benchmarks has reached max of {config['max_pending_benchmarks']}")
if num_pending_benchmarks >= CONFIG["max_concurrent_benchmarks"]:
    logger.debug(f"number of pending benchmarks has reached max of {CONFIG['max_concurrent_benchmarks']}")
    return
logger.debug(f"Selecting algorithm from: {[(x['algorithm_id'], x['weight']) for x in algo_selection]}")
selection = random.choices(algo_selection, weights=[x["weight"] for x in algo_selection])[0]

@@ -98,15 +98,13 @@ class SlaveManager:

@app.route('/get-batches', methods=['GET'])
def get_batch(request: Request):
    config = CONFIG["slave_manager_config"]

    if (slave_name := request.headers.get('User-Agent', None)) is None:
        return "User-Agent header is required", 403
    if not any(re.match(slave["name_regex"], slave_name) for slave in config["slaves"]):
    if not any(re.match(slave["name_regex"], slave_name) for slave in CONFIG["slaves"]):
        logger.warning(f"slave {slave_name} does not match any regex. rejecting get-batch request")
        raise HTTPException(status_code=403, detail="Unregistered slave")

    slave = next((slave for slave in config["slaves"] if re.match(slave["name_regex"], slave_name)), None)
    slave = next((slave for slave in CONFIG["slaves"] if re.match(slave["name_regex"], slave_name)), None)

    concurrent = []
    updates = []

@@ -129,7 +127,7 @@ class SlaveManager:
if (
    b["slave"] is None or
    b["start_time"] is None or
    (now - b["start_time"]) > config["time_before_batch_retry"]
    (now - b["start_time"]) > CONFIG["time_before_batch_retry"]
):
    b["slave"] = slave_name
    b["start_time"] = now

@@ -278,7 +276,6 @@ class SlaveManager:

    return {"status": "OK"}

config = CONFIG["slave_manager_config"]
thread = Thread(target=lambda: uvicorn.run(app, host="0.0.0.0", port=5115))
thread.daemon = True
thread.start()

@@ -92,8 +92,6 @@ class SubmissionsManager:
    )

    def run(self, submit_precommit_req: Optional[SubmitPrecommitRequest]):
        config = CONFIG["submissions_manager_config"]

        now = int(time.time() * 1000)
        if submit_precommit_req is None:
            logger.debug("no precommit to submit")

@@ -128,7 +126,7 @@ class SubmissionsManager:
    INNER JOIN job_data B
    ON A.benchmark_id = B.benchmark_id
    """,
    (config["time_between_retries"],)
    (CONFIG["time_between_resubmissions"],)
)

if benchmark_to_submit:

@@ -171,7 +169,7 @@ class SubmissionsManager:
    INNER JOIN job_data B
    ON A.benchmark_id = B.benchmark_id
    """,
    (config["time_between_retries"],)
    (CONFIG["time_between_resubmissions"],)
)

if proof_to_submit:

@@ -106,21 +106,8 @@ SELECT '
    "player_id": "0x0000000000000000000000000000000000000000",
    "api_key": "00000000000000000000000000000000",
    "api_url": "https://mainnet-api.tig.foundation",
    "submissions_manager_config": {
        "time_between_retries": 60000
    },
    "job_manager_config": {
        "default_batch_sizes": {
            "c001": 8,
            "c002": 8,
            "c003": 8,
            "c004": 8,
            "c005": 8
        }
    },
    "precommit_manager_config": {
        "max_pending_benchmarks": 4
    },
    "time_between_resubmissions": 60000,
    "max_concurrent_benchmarks": 4,
    "algo_selection": [
        {
            "algorithm_id": "c001_a001",
@@ -128,8 +115,7 @@ SELECT '
            "difficulty_range": [0, 0.5],
            "selected_difficulties": [],
            "weight": 1,
            "batch_size": 8,
            "base_fee_limit": "10000000000000000"
            "batch_size": 8
        },
        {
            "algorithm_id": "c002_a001",
@@ -137,8 +123,7 @@ SELECT '
            "difficulty_range": [0, 0.5],
            "selected_difficulties": [],
            "weight": 1,
            "batch_size": 8,
            "base_fee_limit": "10000000000000000"
            "batch_size": 8
        },
        {
            "algorithm_id": "c003_a001",
@@ -146,8 +131,7 @@ SELECT '
            "difficulty_range": [0, 0.5],
            "selected_difficulties": [],
            "weight": 1,
            "batch_size": 8,
            "base_fee_limit": "10000000000000000"
            "batch_size": 8
        },
        {
            "algorithm_id": "c004_a001",
@@ -155,8 +139,7 @@ SELECT '
            "difficulty_range": [0, 0.5],
            "selected_difficulties": [],
            "weight": 1,
            "batch_size": 8,
            "base_fee_limit": "10000000000000000"
            "batch_size": 8
        },
        {
            "algorithm_id": "c005_a001",
@@ -164,19 +147,16 @@ SELECT '
            "difficulty_range": [0, 0.5],
            "selected_difficulties": [],
            "weight": 1,
            "batch_size": 8,
            "base_fee_limit": "10000000000000000"
            "batch_size": 8
        }
    ],
    "slave_manager_config": {
        "time_before_batch_retry": 60000,
        "slaves": [
            {
                "name_regex": ".*",
                "algorithm_id_regex": ".*",
                "max_concurrent_batches": 1
            }
        ]
    }
    "time_before_batch_retry": 60000,
    "slaves": [
        {
            "name_regex": ".*",
            "algorithm_id_regex": ".*",
            "max_concurrent_batches": 1
        }
    ]
}'
WHERE NOT EXISTS (SELECT 1 FROM config);

@@ -67,7 +67,7 @@ serializable_struct_with_getters! {
    lifespan_period: u32,
    min_per_nonce_fee: PreciseNumber,
    min_base_fee: PreciseNumber,
    runtime_configs: HashMap<String, RuntimeConfig>,
    runtime_config: RuntimeConfig,
    target_solution_rate: u32,
    hash_threshold_max_percent_delta: f64,
}