Inter-Validator Merging
Multiple validators can run simultaneously on the Connito subnet. Inter-validator merging ensures that active validators converge on the same global model state. The system operates on two parallel tracks: it synchronizes model gradients among peer validators to maintain consensus on the foundation model, and it independently submits miner evaluation scores to the blockchain.
Why Merging Is Needed
Different validators may:
- Evaluate slightly different subsets of miners (due to network drops or checkpoint download failures)
- Have different timing relative to the submit phase
- Extract different active miner sub-groups depending on their local cutoff logic
Without merging, different validators would apply different gradient updates, causing the global model to fracture. By synchronizing explicitly, the validators stay aligned.
The Merging Protocol
1. Shared Random Seed
Before each evaluation cycle, validators use get_combined_validator_seed() to agree on a shared random seed:
combined_seed = get_combined_validator_seed(config, subtensor)
This seed is derived deterministically from chain state. It is used to instantiate the validation dataloader, so all validators evaluate miner checkpoints on exactly the same data sequence and produce uniform, directly comparable loss measurements for honest miners.
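To illustrate why a seed derived from chain state gives every validator the same data order, here is a minimal sketch. It is not the implementation of get_combined_validator_seed(); the helper names derive_shared_seed and shuffled_indices, the choice of inputs (a block hash and a list of validator hotkeys), and the SHA-256 scheme are all assumptions made for the example.

```python
import hashlib
import random


def derive_shared_seed(block_hash: str, validator_hotkeys: list[str]) -> int:
    """Hash public chain data into a 64-bit seed (hypothetical scheme)."""
    # Sort the hotkeys so every validator hashes them in the same order.
    material = block_hash + "|" + ",".join(sorted(validator_hotkeys))
    digest = hashlib.sha256(material.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shuffled_indices(seed: int, dataset_size: int) -> list[int]:
    """Return a sample order that depends only on the seed."""
    order = list(range(dataset_size))
    random.Random(seed).shuffle(order)
    return order


# Two validators that read the same chain state, in any order,
# arrive at the same seed and therefore the same data sequence.
seed_a = derive_shared_seed("0xabc123", ["hk2", "hk1", "hk3"])
seed_b = derive_shared_seed("0xabc123", ["hk3", "hk2", "hk1"])
```

Because both the seed and the shuffle are pure functions of public inputs, no validator-to-validator message is needed to agree on the evaluation data.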
2. Local Eval & Aggregation
Each validator independently iterates over its downloaded miner checkpoints. It runs a forward pass and logs the validation loss. The validator then determines the top-performing miners (filtered by config.run.miner_score_cutoff) and explicitly loads their state dicts to calculate a local gradient update:
# Averages gradients from top-performing local miner checkpoints
await aggregate_miner_gradient_change(..., miner_jobs, score_aggregator)
3. Gradient Synchronization (Hivemind DHT)
After local gradients are aggregated onto the global_model, validators connect to peer validators through a Distributed Hash Table (DHT). They use hivemind.averaging.DecentralizedAverager to synchronize gradients across the network:
# Matchmakes with peers and performs a decentralized all-reduce
avg_step = avg.step(timeout=config.run.averager_step_timeout_sec)
During this step, the validator packs its gradients into a buffer, finds a matchmaking group of peer validators evaluating the same expert group, and runs a decentralized all-reduce.
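The arithmetic behind steps 2 and 3 can be sketched in plain Python. This is an illustration only: the real system works on model tensors and lets Hivemind handle matchmaking and the all-reduce. The helpers mean_gradient, local_aggregate, and all_reduce_mean are hypothetical, and treating the cutoff as a minimum score is an assumption about how config.run.miner_score_cutoff behaves.

```python
from statistics import fmean

# A gradient is modelled as parameter name -> flat list of values.
Gradient = dict[str, list[float]]


def mean_gradient(grads: list[Gradient]) -> Gradient:
    """Element-wise mean of gradients that share the same parameter names."""
    return {
        name: [fmean(values) for values in zip(*(g[name] for g in grads))]
        for name in grads[0]
    }


def local_aggregate(
    miner_grads: dict[int, Gradient], scores: dict[int, float], cutoff: float
) -> Gradient:
    """Step 2: average the gradients of miners whose score clears the cutoff."""
    kept = [
        miner_grads[uid]
        for uid, score in scores.items()
        if score >= cutoff and uid in miner_grads
    ]
    if not kept:
        raise ValueError("no miner passed the score cutoff")
    return mean_gradient(kept)


def all_reduce_mean(validator_grads: list[Gradient]) -> list[Gradient]:
    """Step 3: every group member ends up holding the same averaged gradient."""
    averaged = mean_gradient(validator_grads)
    return [{k: list(v) for k, v in averaged.items()} for _ in validator_grads]


miner_grads = {1: {"w": [1.0, 2.0]}, 2: {"w": [3.0, 4.0]}, 3: {"w": [100.0, 100.0]}}
scores = {1: 0.9, 2: 0.8, 3: 0.1}

v1 = local_aggregate(miner_grads, scores, cutoff=0.5)  # miner 3 is filtered out
v2 = {"w": [4.0, 5.0]}  # another validator's local result
synced = all_reduce_mean([v1, v2])
```

After the all-reduce, both validators hold an identical gradient even though their local aggregates differed, which is the property that keeps the global model from diverging.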
4. Merging Timeout & Fallback
The averager makes up to averager_step_max_retries attempts (default: 2), each lasting at most averager_step_timeout_sec (default: 60s).
If the averager times out or fails to find a peer group, the validator catches the TimeoutError and logs the failure. It does not abort the loop. Instead, it runs the outer optimizer step using only its locally aggregated gradients, so offline peers can delay synchronization but never permanently halt progress.
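The retry-then-fallback behavior described above can be sketched as follows. The function average_with_fallback and its step argument are hypothetical stand-ins, not the validator's actual code, and the sketch assumes that averager_step_max_retries counts total attempts.

```python
import logging
from typing import Callable

logger = logging.getLogger("validator.merge")


def average_with_fallback(
    step: Callable[[float], None],
    max_retries: int = 2,
    timeout_sec: float = 60.0,
) -> bool:
    """Try the averaging step up to max_retries times.

    Returns True if peers were averaged with, False if the caller
    should continue with its local gradients only.
    """
    for attempt in range(1, max_retries + 1):
        try:
            step(timeout_sec)
            return True
        except TimeoutError:
            # Log and retry; a failed round must not abort the main loop.
            logger.warning("averager attempt %d/%d timed out", attempt, max_retries)
    return False
```

The caller runs the outer optimizer step in both cases; the return value only records whether the gradients it applies were peer-averaged or local.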
5. Calling set_weights()
Once the global model optimization step is complete, the validator submits its evaluated miner scores to the blockchain.
Note: Validators do not submit the Hivemind-agreed weights to the chain. They submit the raw scores calculated during Step 2.
uid_weights = score_aggregator.uid_score_pairs(how="avg")
submit_weights(..., uid_weights=uid_weights, normalize=True, top_k=weight_submit_top_k)
Bittensor's Yuma Consensus algorithm aggregates these submitted scores, weighted by each validator's stake, to determine final miner emissions.
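As an illustration of what the normalize and top_k arguments could mean, the sketch below keeps the highest-scoring UIDs and rescales them to sum to 1. The function normalize_top_k is hypothetical, and this reading of the two arguments is an assumption; the real submit_weights() may handle ties, zero scores, or chain-specific encoding differently.

```python
def normalize_top_k(uid_scores: dict[int, float], top_k: int) -> dict[int, float]:
    """Keep the top_k highest scores and rescale them to sum to 1."""
    ranked = sorted(uid_scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(score for _, score in ranked)
    if total <= 0:
        # Nothing meaningful to distribute; return zero weights.
        return {uid: 0.0 for uid, _ in ranked}
    return {uid: score / total for uid, score in ranked}


# Raw average scores per miner UID, as a validator might hold after step 2.
raw_scores = {7: 3.0, 3: 1.0, 9: 1.0, 4: 0.5}
weights = normalize_top_k(raw_scores, top_k=3)  # UID 4 is dropped
```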
flowchart LR
%% Main components
subgraph DHT["Hivemind DHT"]
Match["Matchmaking"]
end
subgraph Validators["Active Validators"]
direction TB
V1["Validator 1"]
V2["Validator 2"]
V3["Validator 3"]
end
subgraph Sync["Gradient Synchronization"]
Averager["Decentralized Averager"]
end
%% Flow of operations
V1 -->|1. Searches for peers| Match
V2 -->|1. Searches for peers| Match
V3 -->|1. Searches for peers| Match
Match -->|2. Groups compatible peers| Averager
Averager -->|3. All-Reduce Gradients| V1
Averager -->|3. All-Reduce Gradients| V2
Averager -->|3. All-Reduce Gradients| V3
%% Note for clarity
classDef highlight fill:#f9f,stroke:#333,stroke-width:2px;
class Averager highlight;
Figure: Inter-validator merging flow. Validators evaluate checkpoints to calculate local gradients and scores. Gradients are synchronized via the Hivemind DHT using an all-reduce averager to step the model, while evaluation scores are sent directly to the Bittensor chain.
The inter-validator communication is entirely distinct from miner-validator communication. Miners never participate in the Hivemind DHT — they only communicate via the validator's HTTP FastAPI server.