Running Your Validator
Like miners, validators consist of two cooperating processes. Both must be running for the validator to function.
Starting Your Validator
Start each process in its own terminal (or in a tmux/screen session):
Terminal A — Constant Evaluation:
python mycelia/validator/run.py \
--path /path/to/checkpoints/validator/<your hotkey>/<run name>/
Terminal B — Model Serving:
python3 mycelia/shared/server.py \
--path /path/to/checkpoints/validator/<your hotkey>/<run name>/
Both processes must point at the same config directory via the --path argument. When --path is omitted, the default config from mycelia/config.py is used.
Use a separate directory for each validator hotkey to keep their artifacts apart.
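The two commands above can also be started from a single script. The following is a minimal sketch of such a launcher, not something shipped with mycelia. It assumes it is run from the repository root, reuses the current Python interpreter (sys.executable) for both scripts, and passes the identical --path to each. build_commands and launch are hypothetical names.

```python
import subprocess
import sys
import time
from pathlib import Path


def build_commands(config_dir: str) -> list[list[str]]:
    """Return argv lists for both validator processes, sharing one --path."""
    path = str(Path(config_dir))
    return [
        # Terminal A: constant evaluation
        [sys.executable, "mycelia/validator/run.py", "--path", path],
        # Terminal B: model serving
        [sys.executable, "mycelia/shared/server.py", "--path", path],
    ]


def launch(config_dir: str, poll_seconds: float = 1.0) -> int:
    """Run both processes; as soon as either exits, stop the other."""
    procs = [subprocess.Popen(cmd) for cmd in build_commands(config_dir)]
    try:
        while all(p.poll() is None for p in procs):
            time.sleep(poll_seconds)
    finally:
        # Terminate whichever process is still running, then reap both.
        for p in procs:
            if p.poll() is None:
                p.terminate()
        for p in procs:
            p.wait()
    # First non-zero exit code (a terminated process reports a negative one on POSIX).
    return next((p.returncode for p in procs if p.returncode), 0)
```

For example, launch("/path/to/checkpoints/validator/<your hotkey>/<run name>/") blocks until one process exits and then stops the other, since the validator cannot function with only one of them running. Using sys.executable avoids the python vs python3 ambiguity between the two commands.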
Two-Process Architecture
Figure: The validator two-process architecture. server.py handles model distribution and checkpoint ingestion; run.py handles evaluation, synchronization, and chain interaction.
Process 1: run.py — Constant Evaluation
run.py is the evaluation and synchronization process. Each cycle it:
- Discovers miners ready for evaluation by polling on-chain state
- Fetches miner submissions and resolves model metadata (version/hash, hotkey, UID)
- Downloads or loads submitted checkpoints (with resume/retry for partial downloads)
- Runs Proof-of-Loss evaluation on each submission
- Aggregates scores per UID/hotkey and resets history on hotkey changes
- Synchronizes with peer validators via inter-validator merging
- Publishes scores to the chain via set_weights()
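The aggregation step can be pictured as a per-UID moving average that starts over whenever the hotkey registered on a UID changes. This is a sketch under assumptions, not run.py's actual code: ScoreBook is a hypothetical name, and it assumes score_ema_alpha (0.9 in the configuration file) is the weight kept on the previous score. Normalizing scores to sum to 1 is likewise an assumption about what is handed to set_weights().

```python
from dataclasses import dataclass, field


@dataclass
class ScoreBook:
    """Hypothetical per-UID score history kept as an exponential moving average."""
    alpha: float = 0.9  # assumed meaning of score_ema_alpha: weight on the old score
    scores: dict[int, float] = field(default_factory=dict)
    hotkeys: dict[int, str] = field(default_factory=dict)

    def update(self, uid: int, hotkey: str, new_score: float) -> float:
        if self.hotkeys.get(uid) != hotkey:
            # New (or first) hotkey on this UID: discard the previous history.
            self.hotkeys[uid] = hotkey
            self.scores[uid] = new_score
        else:
            # EMA: keep alpha of the old score, blend in (1 - alpha) of the new one.
            self.scores[uid] = self.alpha * self.scores[uid] + (1 - self.alpha) * new_score
        return self.scores[uid]

    def normalized(self) -> dict[int, float]:
        """Scale scores so they sum to 1 (all zeros if there is nothing positive)."""
        total = sum(self.scores.values())
        if total <= 0:
            return {uid: 0.0 for uid in self.scores}
        return {uid: s / total for uid, s in self.scores.items()}
```

Resetting on a hotkey change keeps a newly registered miner from inheriting the score of whoever held that UID before.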
Process 2: server.py — Model Distribution Server
server.py is the FastAPI HTTP server. It:
- Serves expert group slices to miners (GET /model/partial) during the distribute phase
- Accepts checkpoint uploads from miners (POST /submit) during the submit phase
- Serves the full model to peer validators (GET /model/full)
- Handles bearer token authentication for all endpoints
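Bearer authentication means every request carries an "Authorization: Bearer <token>" header that the server compares against its configured secret. The sketch below shows that check using only the standard library; it is not server.py's implementation (a FastAPI app may rely on FastAPI's own security utilities), and it assumes the secret comes from the BZ_AUTH_TOKEN environment variable mentioned in the configuration file. expected_token and is_authorized are hypothetical names.

```python
import hmac
import os
from typing import Optional


def expected_token() -> Optional[str]:
    # Assumption: the server's secret is supplied via BZ_AUTH_TOKEN.
    return os.environ.get("BZ_AUTH_TOKEN")


def is_authorized(authorization_header: Optional[str], token: Optional[str]) -> bool:
    """Validate an 'Authorization: Bearer <token>' header value."""
    if not token or not authorization_header:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    presented = presented.strip()
    if scheme.lower() != "bearer" or not presented:
        return False
    # Constant-time comparison so the token cannot be guessed via response timing.
    return hmac.compare_digest(presented.encode(), token.encode())
```

Rejecting requests when no token is configured (token is None) fails closed rather than leaving the endpoints open.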
What to Watch in Logs
server.py logs
[INFO] Loading base model: Qwen/Qwen3-VL-30B-Instruct
[INFO] Model loaded. Expert mapping built: 4 groups, 64 experts/layer
[INFO] Server ready on port 8080
[INFO] GET /model/partial | hotkey: 5FHne... | group: 0 | 7.2GB served in 12.3s
[INFO] POST /submit | hotkey: 5FHne... | group: 0 | block: 12345 | checkpoint accepted
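The GET /model/partial line reports both size and duration, so serving throughput can be derived from it; in the example, 7.2 GB in 12.3 s is roughly 0.59 GB/s. The helper below is a hypothetical sketch that parses that exact line format (any change to the log format breaks the regex). A sustained drop in throughput is worth investigating, since the distribute phase spans only about five blocks.

```python
import re
from typing import Optional

# Tied to the "GET /model/partial" line format in the example logs above.
_SERVED = re.compile(
    r"GET /model/partial \| hotkey: (?P<hotkey>\S+) \| group: (?P<group>\d+) \| "
    r"(?P<gb>[\d.]+)GB served in (?P<secs>[\d.]+)s"
)


def partial_download_stats(line: str) -> Optional[dict]:
    """Return size, duration and GB/s for a partial-model download log line."""
    m = _SERVED.search(line)
    if m is None:
        return None
    gb, secs = float(m["gb"]), float(m["secs"])
    return {
        "hotkey": m["hotkey"],
        "group": int(m["group"]),
        "gb": gb,
        "seconds": secs,
        "gb_per_s": gb / secs if secs > 0 else float("inf"),
    }
```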
run.py logs
[INFO] Current block: 12345 | Phase: TRAIN
[INFO] Found 12 committed hashes for this cycle
[INFO] Submit phase opened at block 12379
[INFO] Downloading checkpoint: 5FHne... group 0
[INFO] Evaluating 5FHne... | loss_before: 4.321 | loss_after: 3.512 | w_i: 0.187
[INFO] Evaluated 12/12 submissions in 8.3 minutes
[INFO] Inter-validator merging complete: 3/3 validators synced
[INFO] set_weights() called at block 12389 | tx_hash: 0xabcd...
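Lines like the "Evaluating ..." entry can be scraped for a quick per-cycle summary. The sketch below is a hypothetical helper that relies only on the log format shown above; it does not compute w_i itself, it only reads the value run.py logged, and it flags submissions whose loss did not go down.

```python
import re
from statistics import mean
from typing import Iterable

# Tied to the "Evaluating ..." line format in the example logs above.
_EVAL = re.compile(
    r"Evaluating (?P<hotkey>\S+) \| loss_before: (?P<before>-?[\d.]+) \| "
    r"loss_after: (?P<after>-?[\d.]+) \| w_i: (?P<w>-?[\d.]+)"
)


def parse_eval_lines(lines: Iterable[str]) -> list[dict]:
    """Extract per-submission evaluation results, skipping unrelated lines."""
    results = []
    for line in lines:
        m = _EVAL.search(line)
        if m:
            results.append({
                "hotkey": m["hotkey"],
                "loss_before": float(m["before"]),
                "loss_after": float(m["after"]),
                "w_i": float(m["w"]),
            })
    return results


def summarize_evals(results: list[dict]) -> dict:
    """Count evaluations, average w_i, and list hotkeys whose loss did not drop."""
    return {
        "evaluated": len(results),
        "avg_w_i": mean(r["w_i"] for r in results) if results else None,
        "not_improved": [r["hotkey"] for r in results if r["loss_after"] >= r["loss_before"]],
    }
```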
The Cycle from Your Perspective
Blocks 0–5 (Distribute): server.py serves partial models to miners making GET /model/partial requests.
Blocks 5–35 (Train): server.py serves no new model downloads. run.py monitors the chain for committed hashes.
Blocks 35–40 (Commit): run.py reads all committed hashes from the chain via get_chain_commits().
Blocks 40–45 (Submit/Evaluate):
- server.py accepts checkpoint submissions via POST /submit
- run.py downloads and evaluates each submission
- run.py synchronizes gradients with peer validators via inter-validator merging
- run.py calls set_weights() with the agreed-upon weight vector
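The block ranges above map directly onto a lookup. The sketch below assumes the ranges are half-open (a boundary block belongs to the phase that starts there), that one cycle is exactly 45 blocks long, and that you know the block at which the current cycle started. cycle_start, Phase and phase_for_block are hypothetical; how mycelia actually anchors its cycles is not described here.

```python
from enum import Enum


class Phase(str, Enum):
    DISTRIBUTE = "DISTRIBUTE"
    TRAIN = "TRAIN"
    COMMIT = "COMMIT"
    SUBMIT = "SUBMIT"


# Offsets within a cycle, from the list above, treated as [start, end).
PHASES = [
    (0, 5, Phase.DISTRIBUTE),
    (5, 35, Phase.TRAIN),
    (35, 40, Phase.COMMIT),
    (40, 45, Phase.SUBMIT),
]
CYCLE_LENGTH = 45  # assumption: the next cycle starts right after the submit phase


def phase_for_block(block: int, cycle_start: int) -> Phase:
    """Return the phase a block falls in, given the first block of some cycle."""
    offset = (block - cycle_start) % CYCLE_LENGTH
    for start, end, phase in PHASES:
        if start <= offset < end:
            return phase
    raise ValueError(f"offset {offset} is outside every phase")
```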
Configuration File
model_path: ./models/Qwen3-VL-30B-Instruct
chain_endpoint: wss://entrypoint-finney.opentensor.ai:443
netuid: 42
wallet:
  name: my-wallet
  hotkey: my-validator-hotkey
checkpoint_cache_dir: ./validator-cache
server_port: 8080
server_host: 0.0.0.0
# auth_token: set via BZ_AUTH_TOKEN environment variable
eval_dataset_path: /data/validation/held-out.jsonl
eval_batch_size: 8
score_ema_alpha: 0.9
consensus_timeout: 60
max_checkpoints_cached: 50
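Once the file is parsed into a dictionary (for example with PyYAML's yaml.safe_load), a typed wrapper can catch mistakes before either process starts. This is a hypothetical sketch: the field names follow the example file, the defaults simply copy its example values rather than verified project defaults, and the only checks shown are on score_ema_alpha and server_port. auth_token is read from BZ_AUTH_TOKEN, matching the comment in the file, so the secret never has to live in the YAML.

```python
import os
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ValidatorConfig:
    """Hypothetical typed view of the example configuration file."""
    model_path: str
    chain_endpoint: str
    netuid: int
    wallet_name: str
    wallet_hotkey: str
    eval_dataset_path: str
    # Defaults below copy the example file's values.
    checkpoint_cache_dir: str = "./validator-cache"
    server_host: str = "0.0.0.0"
    server_port: int = 8080
    eval_batch_size: int = 8
    score_ema_alpha: float = 0.9
    consensus_timeout: int = 60
    max_checkpoints_cached: int = 50
    auth_token: Optional[str] = None

    def __post_init__(self) -> None:
        if not 0.0 <= self.score_ema_alpha <= 1.0:
            raise ValueError("score_ema_alpha must be in [0, 1]")
        if not 0 < self.server_port < 65536:
            raise ValueError("server_port must be a valid TCP port")

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "ValidatorConfig":
        wallet = raw.get("wallet") or {}
        # Optional top-level keys that map one-to-one onto fields.
        optional_keys = (
            "checkpoint_cache_dir", "server_host", "server_port", "eval_batch_size",
            "score_ema_alpha", "consensus_timeout", "max_checkpoints_cached",
        )
        extras = {k: raw[k] for k in optional_keys if k in raw}
        return cls(
            model_path=raw["model_path"],
            chain_endpoint=raw["chain_endpoint"],
            netuid=int(raw["netuid"]),
            wallet_name=wallet["name"],
            wallet_hotkey=wallet["hotkey"],
            eval_dataset_path=raw["eval_dataset_path"],
            # The token stays out of the file and comes from the environment.
            auth_token=os.environ.get("BZ_AUTH_TOKEN"),
            **extras,
        )
```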
Monitoring
Validator metrics to watch:
- eval/submissions_per_cycle: how many miners are submitting
- eval/avg_w_i: average quality score across submissions
- chain/set_weights_success: confirmation that weight setting succeeded
- consensus/agreement_rate: fraction of cycles with full validator agreement
For a production validator, put server.py behind nginx with TLS termination, manage both processes with systemd, and alert whenever chain/set_weights_success is false, since that indicates a synchronization or chain issue.
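The four metrics and the set_weights alert can be derived from per-cycle records. This is a hypothetical sketch: CycleRecord, summarize and alerts are illustrative names, eval/avg_w_i is taken over the latest cycle, and consensus/agreement_rate over whatever window of cycles you pass in.

```python
from dataclasses import dataclass, field
from typing import Sequence


@dataclass
class CycleRecord:
    """Hypothetical summary of one validator cycle."""
    submissions: int
    set_weights_success: bool
    full_agreement: bool
    w_i_values: list[float] = field(default_factory=list)


def summarize(cycles: Sequence[CycleRecord]) -> dict:
    """Compute the monitoring metrics from a window of cycles (latest last)."""
    if not cycles:
        raise ValueError("need at least one cycle")
    latest = cycles[-1]
    w = latest.w_i_values
    return {
        "eval/submissions_per_cycle": latest.submissions,
        "eval/avg_w_i": sum(w) / len(w) if w else None,
        "chain/set_weights_success": latest.set_weights_success,
        # Fraction of cycles in the window where all validators agreed.
        "consensus/agreement_rate": sum(c.full_agreement for c in cycles) / len(cycles),
    }


def alerts(metrics: dict) -> list[str]:
    """Return alert messages; a failed set_weights() is the condition to page on."""
    messages = []
    if metrics["chain/set_weights_success"] is False:
        messages.append("chain/set_weights_success = false: check synchronization and chain connectivity")
    return messages
```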