Chain Synchronization

This page documents the chain interaction layer — the classes and patterns that coordinate miners and validators with the Bittensor blockchain across the 45-block training cycle.

WorkerChainCommit

WorkerChainCommit is the primary class for miner chain interaction. It covers posting commit hashes, reading peers' commits, and submitting final weights. Validators use the same class to read commits and to set weights.

Key Methods

commit_hash(hash_bytes: bytes, block: int) → tx_hash

Posts the SHA-256 hash of the miner's upcoming submission to the blockchain during the commit phase. This locks in the submission before the checkpoint itself is revealed, which prevents front-running.

import hashlib

hash_bytes = hashlib.sha256(checkpoint_bytes).digest()
tx = chain_commit.commit_hash(hash_bytes, current_block)

The hash is stored on-chain as a metadata commitment. The validator reads it via get_chain_commits() before accepting any submission — if the submitted checkpoint doesn't match the committed hash, the submission is rejected.
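The validator-side check described above could be sketched as follows. This is a minimal illustration, not BlockZero's actual code: `verify_submission` is a hypothetical helper, and it assumes the commitment is the raw 32-byte SHA-256 digest of the checkpoint bytes.

```python
import hashlib
import hmac


def verify_submission(checkpoint_bytes: bytes, committed_hash: bytes) -> bool:
    """Return True if the checkpoint hashes to the previously committed digest.

    Hypothetical helper for illustration only.
    """
    actual = hashlib.sha256(checkpoint_bytes).digest()
    # compare_digest avoids leaking where the two digests first differ
    return hmac.compare_digest(actual, committed_hash)


checkpoint = b"example checkpoint bytes"
commitment = hashlib.sha256(checkpoint).digest()

print(verify_submission(checkpoint, commitment))         # matching checkpoint
print(verify_submission(checkpoint + b"x", commitment))  # altered checkpoint
```

Any change to the checkpoint after the commit, even a single byte, produces a different digest, so the second call is rejected.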

get_chain_commits(netuid: int, block: int) → Dict[str, bytes]

Reads all committed hashes for the current cycle from chain state. Returns a mapping of {hotkey: hash_bytes}. Validators call this to know which miners have committed and what their expected checkpoint hashes are.

commits = chain_commit.get_chain_commits(netuid=42, block=current_block)
# {'5FHne...': b'\xab\xcd...', '5DAna...': b'\xef\x01...'}

submit_weights(uids: List[int], weights: List[float]) → tx_hash

Calls Bittensor's set_weights() to post the final miner weight vector to the blockchain. Only validators call this. The weight vector determines TAO emission distribution among miners.

chain_commit.submit_weights(
    uids=[12, 34, 56],
    weights=[0.45, 0.30, 0.25],
)
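The example weights above sum to 1. Assuming that is the intended convention, a validator could derive such a vector from raw per-miner scores as in the sketch below. `normalize_weights` and the example scores are hypothetical and not part of BlockZero.

```python
from typing import Dict, List, Tuple


def normalize_weights(scores: Dict[int, float]) -> Tuple[List[int], List[float]]:
    """Turn raw per-UID scores into parallel uid/weight lists that sum to 1."""
    # Negative scores are clamped to zero so no weight is negative
    clamped = {uid: max(score, 0.0) for uid, score in scores.items()}
    total = sum(clamped.values())
    if total == 0:
        raise ValueError("at least one score must be positive")
    uids = sorted(clamped)
    weights = [clamped[uid] / total for uid in uids]
    return uids, weights


uids, weights = normalize_weights({12: 9.0, 34: 6.0, 56: 5.0})
print(uids, weights)
```

The two returned lists are index-aligned, which matches the `uids` and `weights` parameters of `submit_weights`.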

The Global WebSocket Lock

All chain interactions use a single WebSocket connection to the Bittensor subtensor. Concurrent writes to this connection from multiple threads or processes cause message corruption and connection resets.

BlockZero enforces a global WebSocket lock — a threading lock (or asyncio lock in async contexts) that serializes all chain writes:

with WEBSOCKET_LOCK:
    result = subtensor.commit(wallet, netuid, data)

The lock is per-process: every thread in a process shares it, but it is not shared across processes. If you run multiple miners in one process, they all contend for the same lock. If you run one process per miner, each process has its own lock and its own WebSocket connection; this is the recommended approach.

Why This Matters

Without the lock:

  • Two threads attempt to write to the WebSocket simultaneously
  • The messages are interleaved at the byte level
  • The subtensor server receives a malformed message and closes the connection
  • Both writes fail

With the lock:

  • One write completes fully before the next begins
  • Writes are slightly slower, but messages are never interleaved
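The effect of the lock can be demonstrated without a real chain connection. In the sketch below, `FakeSocket` is a hypothetical stand-in that sends each message in two parts, and `WEBSOCKET_LOCK` is modelled as a plain `threading.Lock`. It illustrates the serialization idea only and is not BlockZero's implementation.

```python
import threading
import time
from typing import List

WEBSOCKET_LOCK = threading.Lock()  # one lock shared by every thread in the process


class FakeSocket:
    """Stand-in for a WebSocket: each message is written in two parts."""

    def __init__(self) -> None:
        self.wire: List[str] = []

    def send(self, name: str) -> None:
        self.wire.append(f"{name}:begin")
        time.sleep(0.001)  # give other threads a chance to interleave
        self.wire.append(f"{name}:end")


def locked_send(sock: FakeSocket, name: str) -> None:
    with WEBSOCKET_LOCK:  # only one sender may be between begin and end
        sock.send(name)


def is_interleaved(wire: List[str]) -> bool:
    """True if any message's begin is not immediately followed by its own end."""
    for i in range(0, len(wire), 2):
        begin, end = wire[i], wire[i + 1]
        if not (begin.endswith(":begin") and end == begin.replace(":begin", ":end")):
            return True
    return False


sock = FakeSocket()
threads = [threading.Thread(target=locked_send, args=(sock, f"msg{i}")) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(is_interleaved(sock.wire))
```

Calling `sock.send` directly from the threads, without `locked_send`, lets the begin and end parts of different messages mix on the wire, which is the failure mode listed above.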

Subtensor Connection Pooling

BlockZero maintains a small pool of subtensor connections to avoid the overhead of repeated WebSocket handshakes:

# Pool size: typically 2-3 connections
connection_pool = SubtensorConnectionPool(
    endpoint=chain_endpoint,
    pool_size=3,
)

Before each operation, the pool provides a healthy connection:

  1. If a connection in the pool is idle and healthy, it is returned
  2. If all connections are busy, a new connection is opened (up to pool_size)
  3. Dead connections (ping timeout, WebSocket closed) are replaced automatically

Connection health is checked via a lightweight subtensor.block call. If the call fails or times out, the connection is marked dead and a fresh connection is opened.
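The acquire logic in the three steps above might look roughly like the following. `SimplePool` and `FakeConn` are hypothetical names for illustration, and the internals of `SubtensorConnectionPool` are not shown in this page. The sketch is single-threaded and raises when the pool is exhausted, which is an assumption about behavior the text leaves unspecified.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class SimplePool(Generic[T]):
    """Toy pool: hands out idle healthy connections and replaces dead ones."""

    def __init__(self, factory: Callable[[], T],
                 is_healthy: Callable[[T], bool], pool_size: int) -> None:
        self._factory = factory
        self._is_healthy = is_healthy
        self._pool_size = pool_size
        self._idle: List[T] = []
        self._busy: List[T] = []

    def acquire(self) -> T:
        # Prefer an idle connection, discarding any that fail the health check
        while self._idle:
            conn = self._idle.pop()
            if self._is_healthy(conn):
                self._busy.append(conn)
                return conn
        # Nothing usable is idle: open a new connection if under the limit
        if len(self._busy) >= self._pool_size:
            raise RuntimeError("pool exhausted")  # assumed behavior
        conn = self._factory()
        self._busy.append(conn)
        return conn

    def release(self, conn: T) -> None:
        self._busy.remove(conn)
        self._idle.append(conn)


class FakeConn:
    def __init__(self) -> None:
        self.alive = True


pool = SimplePool(factory=FakeConn, is_healthy=lambda c: c.alive, pool_size=3)
first = pool.acquire()
pool.release(first)
reused = pool.acquire()       # idle and healthy, so the same object comes back
pool.release(reused)
reused.alive = False          # simulate a closed WebSocket
replacement = pool.acquire()  # dead connection is dropped, a fresh one is opened
print(reused is first, replacement is first)
```

In a real pool the `is_healthy` callback would wrap the lightweight block query with a timeout, and access to the idle and busy lists would need its own lock.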

Block-Phase Timing

Chain operations are phase-aware. The current block number determines which phase the cycle is in:

def get_current_phase(block: int, cycle_start: int) -> Phase:
    offset = (block - cycle_start) % CYCLE_LENGTH  # 45 blocks
    if offset < 5:
        return Phase.DISTRIBUTE
    elif offset < 35:
        return Phase.TRAIN
    elif offset < 40:
        return Phase.COMMIT
    else:
        return Phase.SUBMIT

model_io.py polls the chain block number periodically and triggers the appropriate thread (download, commit, submit) when the phase transitions.
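Detecting a phase transition from polled block numbers can be sketched as below. The `Phase` enum values and the `phase_transitions` generator are assumptions made so the example is self-contained. The real `model_io.py` triggers threads, whereas this sketch only reports where the transitions occur.

```python
from enum import Enum
from typing import Iterable, Iterator, Optional, Tuple

CYCLE_LENGTH = 45  # blocks per training cycle


class Phase(Enum):
    DISTRIBUTE = "distribute"
    TRAIN = "train"
    COMMIT = "commit"
    SUBMIT = "submit"


def get_current_phase(block: int, cycle_start: int) -> Phase:
    offset = (block - cycle_start) % CYCLE_LENGTH
    if offset < 5:
        return Phase.DISTRIBUTE
    elif offset < 35:
        return Phase.TRAIN
    elif offset < 40:
        return Phase.COMMIT
    return Phase.SUBMIT


def phase_transitions(blocks: Iterable[int],
                      cycle_start: int) -> Iterator[Tuple[int, Phase]]:
    """Yield (block, phase) whenever the phase differs from the previous block's."""
    previous: Optional[Phase] = None
    for block in blocks:
        phase = get_current_phase(block, cycle_start)
        if phase is not previous:
            yield block, phase
            previous = phase


# Simulate polling every block across one full cycle plus one block
transitions = list(phase_transitions(range(100, 146), cycle_start=100))
for block, phase in transitions:
    print(block, phase.name)
```

With a cycle starting at block 100, the transitions fall at blocks 100, 105, 135, 140, and then 145, where the next cycle's distribute phase begins.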

Error Handling and Retries

Chain operations can fail due to network issues, chain congestion, or temporary validator downtime. All chain writes include automatic retry logic:

for attempt in range(MAX_RETRIES):
    try:
        result = chain_commit.commit_hash(hash_bytes, block)
        break
    except (WebSocketException, TimeoutError) as e:
        wait = RETRY_BASE_DELAY * (2 ** attempt)  # exponential backoff
        logger.warning(f"Chain op failed (attempt {attempt+1}/{MAX_RETRIES}): {e}. Retrying in {wait}s")
        time.sleep(wait)
else:
    # Every attempt failed, so result was never assigned
    raise RuntimeError(f"Chain op failed after {MAX_RETRIES} attempts")

Total retry time is capped so that retries never run past the end of the operation's phase window.
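One way to bound total retry time is to stop as soon as the next backoff sleep would overrun a deadline. The sketch below assumes this approach. `retry_until_deadline` and `FakeClock` are hypothetical, and `ConnectionError` stands in for a library-specific WebSocket exception. The clock and sleep functions are injectable so the demo runs instantly.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_until_deadline(
    op: Callable[[], T],
    deadline_s: float,
    base_delay_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry op with exponential backoff until the next sleep would pass deadline_s."""
    start = clock()
    attempt = 0
    while True:
        try:
            return op()
        except (ConnectionError, TimeoutError) as exc:
            wait = base_delay_s * (2 ** attempt)
            elapsed = clock() - start
            if elapsed + wait > deadline_s:
                # Sleeping again would overrun the window, so surface the failure
                raise TimeoutError(f"gave up after {attempt + 1} attempts") from exc
            sleep(wait)
            attempt += 1


class FakeClock:
    """Simulated time source: sleeping just advances the clock."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated drop")
    return "ok"


fake = FakeClock()
result = retry_until_deadline(flaky, deadline_s=60.0, clock=fake.time, sleep=fake.sleep)
print(result, calls["n"], fake.now)
```

Here the operation fails twice, the helper sleeps for 1 s and then 2 s of simulated time, and the third attempt succeeds well inside the 60 s budget.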

Don't miss the commit window

The commit phase lasts only ~5 blocks (~60 seconds). If retries exhaust this window, the miner misses the cycle entirely. Ensure your chain connection is reliable before starting.
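To size the retry settings against this window, it helps to count how many backoff sleeps fit into it. The sketch below takes the 12-second block time implied by the figures above. `max_backoff_sleeps` is a hypothetical helper, and it ignores the time spent inside each chain call, so real budgets are tighter.

```python
BLOCK_TIME_S = 12   # implied by "~5 blocks (~60 seconds)"
COMMIT_BLOCKS = 5


def max_backoff_sleeps(window_s: float, base_delay_s: float) -> int:
    """Count how many exponential-backoff sleeps fit inside window_s."""
    if base_delay_s <= 0:
        raise ValueError("base_delay_s must be positive")
    total, sleeps = 0.0, 0
    # Sleep k lasts base_delay_s * 2**k; stop before the sum exceeds the window
    while total + base_delay_s * (2 ** sleeps) <= window_s:
        total += base_delay_s * (2 ** sleeps)
        sleeps += 1
    return sleeps


window = COMMIT_BLOCKS * BLOCK_TIME_S
print(window, max_backoff_sleeps(window, base_delay_s=1.0))
```

With a 1 s base delay, the sleeps of 1, 2, 4, 8, and 16 s total 31 s, and a sixth sleep of 32 s would overrun the 60 s window.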