Running Your Validator
There are two ways to run the validator. Choose one:
- Bare-metal — run the processes directly on the host. Good for development and debugging.
- Docker — run as a container with automatic updates via Watchtower. Recommended for production.
Starting Your Validator (Bare-metal)
Like miners, the bare-metal validator consists of two cooperating processes, and both must be running for the validator to function. Run them in two terminals (or in tmux/screen):
Terminal A — Constant Evaluation:
python connito/validator/run.py \
--path /path/to/checkpoints/validator/<your hotkey>/<run name>/
Terminal B — Model Serving:
python connito/validator/server.py \
--path /path/to/checkpoints/validator/<your hotkey>/<run name>/
Both processes must point at the same config directory (the --path argument). When --path is not provided, the default config from connito/config.py is used.
Use separate directories per validator/hotkey to keep artifacts clean.
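Because both processes must share one config directory, it can help to derive both commands from a single value so they cannot drift apart. The following is a minimal sketch and not part of the connito repository: `build_commands` and `launch_both` are hypothetical helpers, and only the two script paths and the `--path` flag are taken from the commands above.

```python
import subprocess
import sys
from pathlib import Path

# Script locations as given in the commands above.
SCRIPTS = ("connito/validator/run.py", "connito/validator/server.py")


def build_commands(config_dir: str) -> list[list[str]]:
    """Return one argv list per process, all sharing the same --path value."""
    path = str(Path(config_dir))
    return [[sys.executable, script, "--path", path] for script in SCRIPTS]


def launch_both(config_dir: str) -> list[subprocess.Popen]:
    """Start run.py and server.py as child processes (hypothetical helper)."""
    return [subprocess.Popen(cmd) for cmd in build_commands(config_dir)]
```

Calling `launch_both("/path/to/checkpoints/validator/<your hotkey>/<run name>/")` from the repository root would start both scripts with an identical `--path`. For day-to-day operation, two terminals or tmux panes remain easier to inspect.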
Running with Docker
If you followed the Docker setup, you run the validator as a container with Watchtower handling automatic updates. This replaces the two-terminal bare-metal approach above.
Starting
From connito/validator/docker/:
docker compose up -d
This starts three services:
- validator — constant evaluation process (equivalent to run.py).
- server — model distribution server (equivalent to server.py).
- watchtower — polls the container registry every 5 minutes and auto-restarts validator and server when a new release is tagged.
Checking Logs
docker compose logs -f validator # evaluation / chain interaction
docker compose logs -f server # model serving / miner submissions
docker compose logs -f watchtower # upgrade decisions
On a healthy startup you should see:
- Docker detected — paths remapped — container path overrides are active.
- Wrote config — config loaded successfully.
- Dataset downloads from Hugging Face (first run only — cached on subsequent runs).
- (0) Commit new seed for next validation — the validator has entered its first cycle.
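The startup checklist above can be turned into a quick scripted check. This is a minimal sketch, assuming the log messages appear verbatim in the captured output; `missing_markers` is a hypothetical helper, and the first marker is shortened to `Docker detected` so the match does not depend on how the dash in that message is encoded.

```python
# Log fragments from the healthy-startup list above.
# "Docker detected" is a shortened prefix of the first message (assumption:
# matching the prefix is sufficient to identify it).
EXPECTED_MARKERS = (
    "Docker detected",
    "Wrote config",
    "Commit new seed for next validation",
)


def missing_markers(log_text: str) -> list[str]:
    """Return the expected startup markers that do not occur in log_text."""
    return [marker for marker in EXPECTED_MARKERS if marker not in log_text]
```

An empty list means every marker was seen. The text to check could come from the output of `docker compose logs validator`, captured to a file or a variable.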
Common Startup Errors
| Error | Fix |
|---|---|
| DatasetNotFoundError: must be authenticated | HF_TOKEN is missing in .env, or you haven't accepted the gated dataset license on Hugging Face. See Hugging Face Setup. |
| FileNotFoundError: ... expert_groups/ | expert_groups/ directory not present at the repo root, or DATA_DIR in .env doesn't point at the repo root. |
| wallet not found | WALLET_NAME / HOTKEY_NAME in .env don't match the wallet under BITTENSOR_WALLET_PATH. |
| permission denied ... docker.sock | Run newgrp docker or log out and back in. See Docker setup. |
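The table above can also be expressed as a lookup that maps a log line to the matching fix. This is a sketch under assumptions: the regular expressions are guesses at how each message appears in the logs, and `suggest_fix` is a hypothetical helper, not a connito function. The fix strings paraphrase the table.

```python
import re
from typing import Optional

# (pattern, fix) pairs derived from the table above. The patterns are
# assumptions about the exact log wording.
KNOWN_ERRORS = [
    (re.compile(r"DatasetNotFoundError"),
     "Set HF_TOKEN in .env and accept the gated dataset license on Hugging Face."),
    (re.compile(r"FileNotFoundError.*expert_groups/"),
     "Ensure expert_groups/ exists at the repo root and DATA_DIR points at the repo root."),
    (re.compile(r"wallet not found"),
     "Make WALLET_NAME / HOTKEY_NAME match the wallet under BITTENSOR_WALLET_PATH."),
    (re.compile(r"permission denied.*docker\.sock"),
     "Run newgrp docker, or log out and back in."),
]


def suggest_fix(log_line: str) -> Optional[str]:
    """Return the fix for the first known error found in log_line, else None."""
    for pattern, fix in KNOWN_ERRORS:
        if pattern.search(log_line):
            return fix
    return None
```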
Monitoring Health
Check that all three services are running:
docker compose ps
All services should show Up (or Up (healthy) once healthchecks pass).
To watch for auto-upgrades and any Watchtower errors:
docker compose logs -f watchtower
On each poll you should see either Found new image (upgrade happening)
or a quiet cycle (no update available). If you see repeated errors, check
that the image is reachable (docker pull ghcr.io/connito-ai/connito-validator:stable).
Stopping
docker compose down # stops validator, server, and watchtower
Auto-Update Flow
Maintainer tags vX.Y.Z
→ GitHub Actions builds and pushes :stable to GHCR
→ Watchtower polls, detects new digest
→ Pulls new image, stops old container, starts new one
→ Validator resumes from latest checkpoint
No manual intervention needed. The entire upgrade happens within
WATCHTOWER_POLL_INTERVAL seconds (default 5 minutes) of the release
being published.
Two-Process Architecture
flowchart LR
M(Miners)
BT("Bittensor Chain")
Peer("Peer Validators")
subgraph ValidatorNode ["Validator Node"]
direction TB
Server["server.py<br/>Model Distribution Server"]
FS[("Shared File System<br/>Checkpoints Directory")]
Run["run.py<br/>Constant Evaluation"]
Server -- "Saves miner submissions" --> FS
FS -- "Loads checkpoints for Proof-of-Loss" --> Run
Run -- "Saves agreed-upon global model" --> FS
FS -- "Loads models for distribution" --> Server
end
M -- "GET /model/partial<br/>POST /submit" --> Server
Run -- "Poll state<br/>get_chain_commits()" --> BT
Run -- "set_weights()" --> BT
Run <-->|"Inter-validator merging"| Peer
Figure: The validator two-process architecture. server.py handles model distribution and checkpoint ingestion; run.py handles evaluation, synchronization, and chain interaction.
Process 1: run.py — Constant Evaluation
run.py is the evaluation and synchronization process. Each cycle it:
- Discovers miners ready for evaluation by polling on-chain state
- Fetches miner submissions and resolves model metadata (version/hash, hotkey, UID)
- Downloads or loads submitted checkpoints (with resume/retry for partial downloads)
- Runs Proof-of-Loss evaluation on each submission
- Aggregates scores per UID/hotkey and resets history on hotkey changes
- Synchronizes with peer validators via inter-validator merging
- Publishes scores to the chain via set_weights()
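The aggregation step in the list above, keeping scores per UID and resetting history when the hotkey changes, can be illustrated as follows. This is a minimal sketch and not the connito implementation: `ScoreBook` is a hypothetical class, and averaging the history is an assumption made only for illustration.

```python
from collections import defaultdict


class ScoreBook:
    """Per-UID score history that is cleared when the UID's hotkey changes."""

    def __init__(self) -> None:
        self._hotkeys: dict[int, str] = {}
        self._scores: dict[int, list[float]] = defaultdict(list)

    def record(self, uid: int, hotkey: str, score: float) -> None:
        """Store a score, dropping old history if the UID has a new hotkey."""
        if self._hotkeys.get(uid) != hotkey:
            # A different hotkey now owns this UID: discard the old history.
            self._scores[uid] = []
            self._hotkeys[uid] = hotkey
        self._scores[uid].append(score)

    def mean(self, uid: int) -> float:
        """Average score for a UID (assumed aggregation rule); 0.0 if empty."""
        history = self._scores.get(uid, [])
        return sum(history) / len(history) if history else 0.0
```

The reset matters because a UID can be re-registered to a different miner. Without it, a newcomer would inherit the previous owner's scores.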
Process 2: server.py — Model Distribution Server
server.py is the FastAPI HTTP server. It:
- Serves expert group slices to miners (GET /model/get-checkpoint) during the distribute phase
- Accepts checkpoint uploads from miners (POST /submit-checkpoint) during the submit phase
- Handles bearer token authentication for all endpoints
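Bearer token authentication generally means checking an `Authorization: Bearer <token>` header on every request. The sketch below shows that check with the standard library only. It is an illustration, not the code in server.py: `bearer_token_ok` is a hypothetical helper, and how the real server stores or issues tokens is not covered here.

```python
import hmac
from typing import Optional


def bearer_token_ok(authorization: Optional[str], expected_token: str) -> bool:
    """Check an HTTP Authorization header of the form 'Bearer <token>'."""
    if not authorization:
        return False
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return False
    # compare_digest takes time independent of where the inputs differ.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

Using `hmac.compare_digest` instead of `==` is a common precaution against timing attacks on secret comparison.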
The Cycle from Your Perspective
Blocks 0–20 (Distribute): server.py serves partial models to miners making GET /model/partial requests.
Blocks 20–320 (Train): server.py is idle for new downloads. run.py monitors chain for committed hashes.
Blocks 320–336 (MinerCommit1 & 2): 16 blocks (8 + 8) dedicated for miners to commit model hashes and validators to commit a seed. run.py reads all committed hashes from the chain via get_chain_commits().
Blocks 336–456 (Submission/Validate/Merge):
- server.py accepts checkpoint submissions via POST /submit (Submission Phase, 20 blocks)
- run.py downloads and evaluates each submission (Validate Phase, 50 blocks)
- run.py synchronizes gradients with peer validators via inter-validator merging (Merge Phase, 50 blocks)
Blocks 456–472 (ValidatorCommit1 & 2):
- 16 blocks (8 + 8) for validators to commit signed_model_hash and model_hash.
- run.py calls set_weights() with the agreed-upon weight vector.
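The block ranges above can be encoded as a lookup table to answer "which phase is the validator in at block N?". This is a sketch under several assumptions that the text does not state explicitly: block numbers are offsets from the start of a cycle, each range includes its start and excludes its end, the sub-phases occur in the order listed (Submission, Validate, Merge), and the cycle repeats after block 472. `phase_at` is a hypothetical helper.

```python
import bisect

# (start_offset, phase_name); offsets are relative to the start of a cycle.
PHASES = [
    (0, "Distribute"),
    (20, "Train"),
    (320, "MinerCommit1"),
    (328, "MinerCommit2"),
    (336, "Submission"),        # 20 blocks
    (356, "Validate"),          # 50 blocks
    (406, "Merge"),             # 50 blocks
    (456, "ValidatorCommit1"),
    (464, "ValidatorCommit2"),
]
CYCLE_LENGTH = 472  # assumption: the cycle wraps after the last listed block

_STARTS = [start for start, _ in PHASES]


def phase_at(block: int) -> str:
    """Return the phase name for a block number (assumed half-open ranges)."""
    offset = block % CYCLE_LENGTH
    # Index of the last phase whose start is <= offset.
    return PHASES[bisect.bisect_right(_STARTS, offset) - 1][1]
```

For example, `phase_at(100)` falls in the Train phase, and `phase_at(472)` wraps around to Distribute under the assumed cycle length.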
Configuration
See Configuration for the full field reference and example YAML.