Clarify better_than_baseline in challenge descriptions, and add extra precision for knapsack.

2026-03-10 03:07:21 +08:00 · 2025-04-25 18:05:22 +01:00 · dbb6927ffa
commit dbb6927ffa
parent f4c233aeb4
3 changed files with 11 additions and 7 deletions
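
For reviewers, the arithmetic touched by this commit can be sketched as below. This is a minimal illustration and not the challenge code: the function names and the `i32` difficulty type are assumptions made for the example, and the aggregation over the 16 sub-instances is left out because the diff shows it only in part. The formulas and scale factors are the ones stated in the docs and in `knapsack.rs` in this diff.

```rust
// Hypothetical helper names; only the formulas and divisors come from the diff.

/// Knapsack: better_than_baseline = total_value / baseline_value - 1.
fn knapsack_better_than_baseline(total_value: f64, baseline_value: f64) -> f64 {
    total_value / baseline_value - 1.0
}

/// Vehicle routing: better_than_baseline = 1 - total_distance / baseline_total_distance.
fn vehicle_routing_better_than_baseline(total_distance: f64, baseline_total_distance: f64) -> f64 {
    1.0 - total_distance / baseline_total_distance
}

/// Knapsack difficulty is an integer where each unit is 0.01%,
/// so the fractional threshold is difficulty / 10000 (the value after this commit).
fn knapsack_threshold(difficulty: i32) -> f64 {
    difficulty as f64 / 10000.0
}

/// Vehicle routing difficulty is an integer where each unit is 0.1%,
/// so the fractional threshold is difficulty / 1000 (per the docs in this diff).
fn vehicle_routing_threshold(difficulty: i32) -> f64 {
    difficulty as f64 / 1000.0
}

/// The pass condition compares the aggregated score against the threshold with `>=`.
fn passes(average: f64, threshold: f64) -> bool {
    average >= threshold
}
```

With these helpers, `knapsack_threshold(150)` is 0.015 (1.5%) and `vehicle_routing_threshold(22)` is 0.022 (2.2%), matching the examples in the updated docs.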
--- a/docs/challenges/knapsack.md
+++ b/docs/challenges/knapsack.md
@@ -54,12 +54,14 @@ When evaluating this selection, we can confirm that the total weight is less tha

 This selection is 27% better than the baseline: 
 ```
-better_than_baseline = (total_value - baseline_value) / baseline_value 
-                     = (127 - 100) / 100 
+better_than_baseline = total_value / baseline_value - 1
+                     = 127 / 100 - 1
                     = 0.27
 ```

 # Our Challenge 
 In TIG, the baseline value is determined by a two-stage approach. First, items are selected based on their value-to-weight ratio, including interaction values, until the capacity is reached. Then, a tabu-based local search refines the solution by swapping items to improve value while avoiding reversals, with early termination for unpromising swaps.

-Each instance of TIG's knapsack problem contains 16 random sub-instances with their own baseline selection & baseline value. For each sub-instance, the total value of your selection is used to calculate a `better_than_baseline`. Your "average" `better_than_baseline` over the sub-instances must be greater than or equal to the specified difficulty `better_than_baseline`, where the average uses root mean square. Please see the challenge code for a precise specification.
+Each instance of TIG's knapsack problem contains 16 random sub-instances, each with its own baseline selection and baseline value. For each sub-instance, we calculate how much your selection's total value exceeds the baseline value, expressed as a percentage improvement. This improvement percentage is called `better_than_baseline`. Your overall performance is measured by taking the root mean square of these 16 `better_than_baseline` percentages. To pass a difficulty level, this overall score must meet or exceed the specified difficulty target.
+
+For precision, `better_than_baseline` is stored as an integer where each unit represents 0.01%. For example, a `better_than_baseline` value of 150 corresponds to 150/10000 = 1.5%.
--- a/docs/challenges/vehicle_routing.md
+++ b/docs/challenges/vehicle_routing.md
@@ -75,15 +75,17 @@ When evaluating these routes, each route has demand less than 200, the number of

 These routes are 20.6% better than the baseline: 
 ```
-better_than_baseline = (baseline_total_distance - total_distance) / baseline_total_distance 
-                     = (3875 - 3074) / 3875 
+better_than_baseline = 1 - total_distance / baseline_total_distance 
+                     = 1 - 3074 / 3875 
                     = 0.206
 ```

 ## Our Challenge
 In TIG, the baseline route is determined by using Solomon's I1 insertion heuristic that iteratively inserts customers into routes based on a cost function that balances distance and time constraints. The routes are built one by one until all customers are served. 

-Each instance of TIG's vehicle routing problem contains 16 random sub-instances with their own baseline routes & baseline distance. For each sub-instance, the total distance of your routes is used to calculate a `better_than_baseline`. Your "average" `better_than_baseline` over the sub-instances must be greater than the specified difficulty `better_than_baseline`, where the average uses root mean square. Please see the challenge code for a precise specification.
+Each instance of TIG's vehicle routing problem contains 16 random sub-instances, each with its own baseline routes and baseline distance. For each sub-instance, we calculate how much your routes' total distance is shorter than the baseline distance, expressed as a percentage improvement. This improvement percentage is called `better_than_baseline`. Your overall performance is measured by taking the root mean square of these 16 `better_than_baseline` percentages. To pass a difficulty level, this overall score must meet or exceed the specified difficulty target.
+
+For precision, `better_than_baseline` is stored as an integer where each unit represents 0.1%. For example, a `better_than_baseline` value of 22 corresponds to 22/1000 = 2.2%.

 ## Applications
 * **Logistics & Delivery Services:** Optimizes parcel and ship routing by ensuring vehicles meet customer and operational time constraints, reducing operational costs and environmental impact [^1].
--- a/tig-challenges/src/knapsack.rs
+++ b/tig-challenges/src/knapsack.rs
@@ -121,7 +121,7 @@ impl crate::ChallengeTrait<Solution, Difficulty, 2> for Challenge {
            / better_than_baselines.len() as f64)
            .sqrt()
            - 1.0;
-        let threshold = self.difficulty.better_than_baseline as f64 / 1000.0;
+        let threshold = self.difficulty.better_than_baseline as f64 / 10000.0;
        if average >= threshold {
            Ok(())
        } else {