
Running Your Validator

Like miners, validators consist of two cooperating processes. Both must be running for the validator to function.

Starting Your Validator

Run each process in its own terminal (or in separate tmux/screen panes):

Terminal A — Constant Evaluation:

python3 mycelia/validator/run.py \
--path /path/to/checkpoints/validator/<your hotkey>/<run name>/

Terminal B — Model Serving:

python3 mycelia/shared/server.py \
--path /path/to/checkpoints/validator/<your hotkey>/<run name>/

Both processes must point at the same config directory (the --path argument). When --path is not provided, the default config from mycelia/config.py is used.

tip

Use a separate config directory for each validator/hotkey so that checkpoints and other artifacts from different validators never mix.
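As an alternative to two terminals, the two commands above can be started from one small launcher, which removes the risk of the two --path values drifting apart. The following is a minimal sketch, not part of the project: the function names build_commands and launch are hypothetical, and it assumes the script is run from the repository root so the relative script paths resolve.

```python
import subprocess
import sys


def build_commands(config_dir: str) -> list:
    """Return both command lines, each pointing at the same --path."""
    return [
        [sys.executable, "mycelia/validator/run.py", "--path", config_dir],
        [sys.executable, "mycelia/shared/server.py", "--path", config_dir],
    ]


def launch(config_dir: str) -> list:
    """Start both processes and return their handles.

    Supervision (restart on crash, log capture) is left to the caller.
    """
    return [subprocess.Popen(cmd) for cmd in build_commands(config_dir)]


# Example usage (commented out so importing this file starts nothing):
# procs = launch("/path/to/checkpoints/validator/<your hotkey>/<run name>/")
# for proc in procs:
#     proc.wait()
```

For anything long-running, a process manager is still the better choice; this sketch only shows how to share one config directory between both processes.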

Two-Process Architecture

Figure: The validator two-process architecture. server.py handles model distribution and checkpoint ingestion; run.py handles evaluation, synchronization, and chain interaction.

Process 1: run.py — Constant Evaluation

run.py is the evaluation and synchronization process. Each cycle it:

  1. Discovers miners ready for evaluation by polling on-chain state
  2. Fetches miner submissions and resolves model metadata (version/hash, hotkey, UID)
  3. Downloads or loads submitted checkpoints (with resume/retry for partial downloads)
  4. Runs Proof-of-Loss evaluation on each submission
  5. Aggregates scores per UID/hotkey and resets history on hotkey changes
  6. Synchronizes with peer validators via inter-validator merging
  7. Publishes scores to the chain via set_weights()
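Step 5 above (aggregate per UID/hotkey, reset history on hotkey changes) can be illustrated with an exponential moving average. This is a sketch under assumptions rather than the actual run.py logic: the ScoreBook class is hypothetical, and it assumes the smoothing factor (compare score_ema_alpha in the configuration) is the weight kept on the previous score.

```python
from dataclasses import dataclass, field


@dataclass
class ScoreBook:
    # Assumed convention: alpha is the weight kept on the previous score.
    alpha: float = 0.9
    hotkeys: dict = field(default_factory=dict)  # uid -> hotkey last seen
    scores: dict = field(default_factory=dict)   # uid -> smoothed score

    def update(self, uid: int, hotkey: str, w_i: float) -> float:
        """Fold a new per-cycle score into the running score for this UID."""
        if self.hotkeys.get(uid) != hotkey:
            # New UID, or the UID was re-registered under another hotkey:
            # discard the old history and start from the new score.
            self.hotkeys[uid] = hotkey
            self.scores[uid] = w_i
        else:
            previous = self.scores[uid]
            self.scores[uid] = self.alpha * previous + (1 - self.alpha) * w_i
        return self.scores[uid]
```

Resetting on a hotkey change keeps a new registrant from inheriting the score history of whoever held the UID before.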

Process 2: server.py — Model Distribution Server

server.py is the FastAPI HTTP server. It:

  • Serves expert group slices to miners (GET /model/partial) during the distribute phase
  • Accepts checkpoint uploads from miners (POST /submit) during the submit phase
  • Serves the full model to peer validators (GET /model/full)
  • Handles bearer token authentication for all endpoints
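Because every endpoint expects a bearer token, a client request has to carry an Authorization header. The sketch below builds such a request with the standard library only and does not send it. The helper name build_request is hypothetical, and any query parameters or request bodies the endpoints expect are not specified here, so they are left out.

```python
import urllib.request


def build_request(base_url: str, endpoint: str, token: str,
                  method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) an authenticated request to the server."""
    url = base_url.rstrip("/") + endpoint
    request = urllib.request.Request(url, method=method)
    # Standard bearer scheme: "Authorization: Bearer <token>".
    request.add_header("Authorization", f"Bearer {token}")
    return request


# Example usage (commented out: sending requires a running server):
# request = build_request("http://localhost:8080", "/model/full", "<token>")
# with urllib.request.urlopen(request) as response:
#     payload = response.read()
```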

What to Watch in Logs

server.py logs

[INFO] Loading base model: Qwen/Qwen3-VL-30B-Instruct
[INFO] Model loaded. Expert mapping built: 4 groups, 64 experts/layer
[INFO] Server ready on port 8080
[INFO] GET /model/partial | hotkey: 5FHne... | group: 0 | 7.2GB served in 12.3s
[INFO] POST /submit | hotkey: 5FHne... | group: 0 | block: 12345 | checkpoint accepted

run.py logs

[INFO] Current block: 12345 | Phase: TRAIN
[INFO] Found 12 committed hashes for this cycle
[INFO] Submit phase opened at block 12379
[INFO] Downloading checkpoint: 5FHne... group 0
[INFO] Evaluating 5FHne... | loss_before: 4.321 | loss_after: 3.512 | w_i: 0.187
[INFO] Evaluated 12/12 submissions in 8.3 minutes
[INFO] Inter-validator merging complete: 3/3 validators synced
[INFO] set_weights() called at block 12389 | tx_hash: 0xabcd...
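The sample evaluation line above is numerically consistent with w_i being the relative loss reduction, (loss_before - loss_after) / loss_before. That reading is an inference from this single log line, not a statement of the Proof-of-Loss formula, so treat the sketch below only as a sanity check for your own logs. The function name is hypothetical.

```python
def relative_loss_improvement(loss_before: float, loss_after: float) -> float:
    """Fraction by which the loss dropped; negative if the loss got worse."""
    if loss_before <= 0:
        raise ValueError("loss_before must be positive")
    return (loss_before - loss_after) / loss_before


# Values from the sample log line: loss_before 4.321, loss_after 3.512.
sample = relative_loss_improvement(4.321, 3.512)
print(round(sample, 3))  # 0.187
```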

The Cycle from Your Perspective

Blocks 0–5 (Distribute): server.py serves partial models to miners making GET /model/partial requests.

Blocks 5–35 (Train): server.py is idle for new downloads. run.py monitors the chain for committed hashes.

Blocks 35–40 (Commit): run.py reads all committed hashes from the chain via get_chain_commits().

Blocks 40–45 (Submit/Evaluate):

  1. server.py accepts checkpoint submissions via POST /submit
  2. run.py downloads and evaluates each submission
  3. run.py synchronizes gradients with peer validators via inter-validator merging
  4. run.py calls set_weights() with the agreed-upon weight vector
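The block ranges above can be read as a lookup from a block's position in the cycle to a phase. The sketch below assumes the ranges are half-open (block 5 belongs to Train, not Distribute) and that the 45-block cycle repeats. How the start of a cycle is anchored on chain is not stated here, so cycle_start is a hypothetical parameter, as are the phase names other than TRAIN, which appears in the logs.

```python
# (first offset, one-past-last offset, phase name)
PHASES = [
    (0, 5, "DISTRIBUTE"),
    (5, 35, "TRAIN"),
    (35, 40, "COMMIT"),
    (40, 45, "SUBMIT"),
]
CYCLE_LENGTH = 45


def phase_at(block: int, cycle_start: int) -> str:
    """Return the phase for a block, given the block where a cycle began."""
    offset = (block - cycle_start) % CYCLE_LENGTH
    for first, end, name in PHASES:
        if first <= offset < end:
            return name
    raise AssertionError("offset is always covered by PHASES")
```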

Configuration File

model_path: ./models/Qwen3-VL-30B-Instruct
chain_endpoint: wss://entrypoint-finney.opentensor.ai:443
netuid: 42
wallet:
  name: my-wallet
  hotkey: my-validator-hotkey
checkpoint_cache_dir: ./validator-cache
server_port: 8080
server_host: 0.0.0.0
# auth_token: set via BZ_AUTH_TOKEN environment variable
eval_dataset_path: /data/validation/held-out.jsonl
eval_batch_size: 8
score_ema_alpha: 0.9
consensus_timeout: 60
max_checkpoints_cached: 50
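A few of these settings are easy to get wrong, so a quick check before starting both processes can help. The sketch below works on an already-parsed config (a plain mapping) and on the environment. The check_config helper and the specific checks are hypothetical; only the key names and the BZ_AUTH_TOKEN variable come from the file above. Parsing the file itself is left out because it would need a third-party YAML parser.

```python
import os
from typing import Mapping, Optional


def check_config(cfg: Mapping, env: Optional[Mapping] = None) -> list:
    """Return a list of human-readable problems; empty means no issue found."""
    env = os.environ if env is None else env
    problems = []
    # The token is read from the environment, never from the config file.
    if not env.get("BZ_AUTH_TOKEN"):
        problems.append("BZ_AUTH_TOKEN is not set")
    port = cfg.get("server_port")
    if not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append("server_port must be an integer in 1..65535")
    alpha = cfg.get("score_ema_alpha")
    if not isinstance(alpha, (int, float)) or not 0.0 <= alpha <= 1.0:
        problems.append("score_ema_alpha must be in [0, 1]")
    wallet = cfg.get("wallet") or {}
    for key in ("name", "hotkey"):
        if not wallet.get(key):
            problems.append(f"wallet.{key} is missing")
    return problems
```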

Monitoring

Validator metrics to watch:

  • eval/submissions_per_cycle — how many miners are submitting
  • eval/avg_w_i — average quality score across submissions
  • chain/set_weights_success — confirmation that weight setting succeeded
  • consensus/agreement_rate — fraction of cycles with full validator agreement

Production deployment

For a production validator, put server.py behind nginx with TLS termination, manage both processes with systemd, and set up alerting for when chain/set_weights_success is false, which indicates a synchronization or chain issue.