Running Your Validator

There are two ways to run the validator. Choose one:

Bare-metal — run the processes directly on the host. Good for development and debugging.
Docker — run as a container with automatic updates via Watchtower. Recommended for production.

Starting Your Validator (Bare-metal)

Like miners, the bare-metal validator consists of two cooperating processes, and both must be running at the same time for the validator to function. Use two terminals (or tmux/screen):

Terminal A — Constant Evaluation:

python connito/validator/run.py \
  --path /path/to/checkpoints/validator/<your hotkey>/<run name>/

Terminal B — Model Serving:

python connito/validator/server.py \
  --path /path/to/checkpoints/validator/<your hotkey>/<run name>/

Both processes must point at the same config directory (the --path argument). When --path is not provided, the default config from connito/config.py is used. Use a separate directory per validator/hotkey to keep artifacts clean.
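If tmux or screen is not available, a small Python launcher can start both processes against the same config directory and stop both as soon as either one exits. This is a minimal sketch, not part of the repository: `build_commands` and `launch` are hypothetical helpers, the script is assumed to be run from the repo root so the relative `connito/validator/...` paths resolve, and it uses the current interpreter (`sys.executable`) in place of `python`.

```python
import subprocess
import sys
import time


def build_commands(config_dir: str) -> list[list[str]]:
    """Return the argv for run.py and server.py, both using the same --path.

    Hypothetical helper: assumes it is called from the repository root.
    """
    return [
        [sys.executable, "connito/validator/run.py", "--path", config_dir],
        [sys.executable, "connito/validator/server.py", "--path", config_dir],
    ]


def launch(config_dir: str, poll_seconds: float = 5.0) -> int:
    """Start both processes; when either exits, stop the other.

    Returns the exit code of the process that stopped first.
    """
    procs = [subprocess.Popen(cmd) for cmd in build_commands(config_dir)]
    first_exit = None
    try:
        while first_exit is None:
            for proc in procs:
                if proc.poll() is not None:
                    first_exit = proc.returncode
                    break
            else:
                # Neither process has exited yet; check again later.
                time.sleep(poll_seconds)
    finally:
        # Runs on normal exit and on Ctrl-C, so one process is never left orphaned.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in procs:
            proc.wait()
    return first_exit
```

Call it as `launch("/path/to/checkpoints/validator/<your hotkey>/<run name>/")`. Tearing both processes down together reflects the requirement above that the validator only functions while both are running.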

Running with Docker

If you followed the Docker setup, you run the validator as a container with Watchtower handling automatic updates. This replaces the two-terminal bare-metal approach above.

Starting

From connito/validator/docker/:

docker compose up -d

This starts three services:

validator — constant evaluation process (equivalent to run.py).
server — model distribution server (equivalent to server.py).
watchtower — polls the container registry every 5 minutes and auto-restarts validator and server when a new release is tagged.

Checking Logs

docker compose logs -f validator   # evaluation / chain interaction
docker compose logs -f server      # model serving / miner submissions
docker compose logs -f watchtower  # upgrade decisions

On a healthy startup you should see:

Docker detected — paths remapped — container path overrides are active.
Wrote config — config loaded successfully.
Dataset downloads from Hugging Face (first run only — cached on subsequent runs).
(0) Commit new seed for next validation — the validator has entered its first cycle.
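To confirm a healthy startup without reading the whole log, you can scan the output of `docker compose logs --no-color validator` for the markers listed above, in order. A minimal sketch, assuming each marker appears verbatim as a substring of a log line (the exact wording may change between releases); `STARTUP_MARKERS` and `missing_startup_markers` are hypothetical names. The Hugging Face download marker is left out because it only appears on the first run.

```python
# Hypothetical marker list, taken from the healthy-startup lines above.
STARTUP_MARKERS = [
    "Docker detected",
    "Wrote config",
    "Commit new seed for next validation",
]


def missing_startup_markers(log_text: str, markers=STARTUP_MARKERS) -> list[str]:
    """Return the markers that do not appear, in order, in log_text."""
    missing = []
    pos = 0
    for marker in markers:
        idx = log_text.find(marker, pos)
        if idx == -1:
            missing.append(marker)
        else:
            # Later markers must appear after this one.
            pos = idx + len(marker)
    return missing
```

An empty list means every expected stage was reached; a non-empty list tells you which stage to look at first.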

Common Startup Errors

| Error | Fix |
|---|---|
| `DatasetNotFoundError: must be authenticated` | `HF_TOKEN` is missing in `.env`, or you haven't accepted the gated dataset license on Hugging Face. See Hugging Face Setup. |
| `FileNotFoundError: ... expert_groups/` | `expert_groups/` directory not present at the repo root, or `DATA_DIR` in `.env` doesn't point at the repo root. |
| `wallet not found` | `WALLET_NAME` / `HOTKEY_NAME` in `.env` don't match the wallet under `BITTENSOR_WALLET_PATH`. |
| `permission denied ... docker.sock` | Run `newgrp docker` or log out and back in. See Docker setup. |
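Most of these errors trace back to `.env`, so a pre-flight check before `docker compose up -d` can catch them early. This is a minimal sketch under several assumptions: `.env` uses simple `KEY=VALUE` lines, the variable names in the table are the ones your compose file reads, `DATA_DIR` and `BITTENSOR_WALLET_PATH` are host paths, and the wallet follows the default Bittensor layout `<BITTENSOR_WALLET_PATH>/<WALLET_NAME>/hotkeys/<HOTKEY_NAME>`. `parse_env` and `preflight` are hypothetical helpers, and no local check can tell whether you accepted the gated dataset license.

```python
from pathlib import Path

# Variables named in the error table above.
REQUIRED_KEYS = ["HF_TOKEN", "WALLET_NAME", "HOTKEY_NAME", "BITTENSOR_WALLET_PATH", "DATA_DIR"]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def preflight(env: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means nothing was found."""
    problems = []
    for key in REQUIRED_KEYS:
        if not env.get(key):
            problems.append(f"{key} is missing or empty in .env")
    data_dir = env.get("DATA_DIR")
    if data_dir and not (Path(data_dir).expanduser() / "expert_groups").is_dir():
        problems.append("expert_groups/ not found under DATA_DIR")
    wallet_root = env.get("BITTENSOR_WALLET_PATH")
    wallet, hotkey = env.get("WALLET_NAME"), env.get("HOTKEY_NAME")
    if wallet_root and wallet and hotkey:
        # Assumed default Bittensor wallet layout.
        hotkey_file = Path(wallet_root).expanduser() / wallet / "hotkeys" / hotkey
        if not hotkey_file.is_file():
            problems.append(f"hotkey file not found: {hotkey_file}")
    return problems
```

Run it as `preflight(parse_env(Path(".env").read_text()))` from the directory that holds your `.env`.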

Monitoring Health

Check that all three services are running:

docker compose ps

All services should show Up (or Up (healthy) once healthchecks pass). To watch for auto-upgrades and any Watchtower errors:

docker compose logs -f watchtower

On each poll you should see either Found new image (upgrade happening) or a quiet cycle (no update available). If you see repeated errors, check that the image is reachable (docker pull ghcr.io/connito-ai/connito-validator:stable).
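The `docker compose ps` check can also be scripted, for example from a cron job. A minimal sketch that parses `docker compose ps --format json`, assuming the Compose v2 fields `Service`, `State` and `Health`; depending on the Compose version the output is either a JSON array or one JSON object per line, so both are accepted. `parse_ps_json` and `unhealthy_services` are hypothetical helpers.

```python
import json

EXPECTED_SERVICES = {"validator", "server", "watchtower"}


def parse_ps_json(output: str) -> list[dict]:
    """Accept either a JSON array or one JSON object per line."""
    output = output.strip()
    if not output:
        return []
    if output.startswith("["):
        return json.loads(output)
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def unhealthy_services(entries: list[dict], expected=EXPECTED_SERVICES) -> list[str]:
    """List services that are absent, not running, or reported unhealthy."""
    problems = []
    seen = {e.get("Service") for e in entries}
    # `docker compose ps` without -a hides stopped containers, so absence is a failure.
    for name in sorted(expected - seen):
        problems.append(f"{name}: not listed")
    for e in entries:
        if e.get("State") != "running":
            problems.append(f"{e.get('Service')}: state={e.get('State')}")
        elif e.get("Health") == "unhealthy":
            problems.append(f"{e.get('Service')}: unhealthy")
    return problems
```

Feed it the output of `subprocess.run(["docker", "compose", "ps", "--format", "json"], capture_output=True, text=True, check=True).stdout`, run from `connito/validator/docker/`.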

Stopping

docker compose down    # stops validator, server, and watchtower

Auto-Update Flow

Maintainer tags vX.Y.Z
  → GitHub Actions builds and pushes :stable to GHCR
  → Watchtower polls, detects new digest
  → Pulls new image, stops old container, starts new one
  → Validator resumes from latest checkpoint

No manual intervention is needed. Watchtower detects the new image on its next poll, within WATCHTOWER_POLL_INTERVAL seconds (default 300, i.e. 5 minutes) of :stable being pushed to GHCR, and then pulls it and restarts the containers.
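Why the delay is bounded by one poll interval can be seen with a toy model of digest-based polling. This is not Watchtower's code: it assumes polls happen at `interval`, `2 * interval`, … after Watchtower starts, that the `:stable` tag always points at the most recent push, and it ignores the GitHub Actions build time and the time to pull and restart. `simulate_polls` is a hypothetical name.

```python
def simulate_polls(pushes, running_digest, interval=300, horizon=3600):
    """Return (poll_time, digest) for every poll that would trigger an upgrade.

    pushes: (time_seconds, digest) pairs for the :stable tag, sorted by time.
    """
    upgrades = []
    t = interval
    while t <= horizon:
        # Digest :stable points at as of this poll: the latest push at or before t.
        tagged = running_digest
        for push_t, digest in pushes:
            if push_t <= t:
                tagged = digest
        if tagged != running_digest:
            # New digest detected: pull, stop old container, start new one.
            upgrades.append((t, tagged))
            running_digest = tagged
        t += interval
    return upgrades
```

With the default 300-second interval, a push at t=400 is picked up by the poll at t=600, 200 seconds later. Two pushes between the same pair of polls result in a single upgrade straight to the newer digest, because only the digest the tag currently points at is compared.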

Two-Process Architecture

flowchart LR
    M(Miners)
    BT("Bittensor Chain")
    Peer("Peer Validators")

    subgraph ValidatorNode ["Validator Node"]
        direction TB
        Server["server.py<br/>Model Distribution Server"]
        FS[("Shared File System<br/>Checkpoints Directory")]
        Run["run.py<br/>Constant Evaluation"]

        Server -- "Saves miner submissions" --> FS
        FS -- "Loads checkpoints for Proof-of-Loss" --> Run
        Run -- "Saves agreed-upon global model" --> FS
        FS -- "Loads models for distribution" --> Server
    end

    M -- "GET /model/partial<br/>POST /submit" --> Server
    Run -- "Poll state<br/>get_chain_commits()" --> BT
    Run -- "set_weights()" --> BT
    Run <-->|"Inter-validator merging"| Peer

Figure: The validator two-process architecture. server.py handles model distribution and checkpoint ingestion; run.py handles evaluation, synchronization, and chain interaction.
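In this architecture the two processes never call each other; they hand work off through the checkpoints directory. A common way to make such a handoff safe is write-to-temp-then-rename, so the reader never opens a half-written file. The sketch below shows that generic pattern, not the repository's implementation: `save_submission`, `list_ready_submissions`, the `submissions/` subdirectory and the `.pt` suffix are all hypothetical.

```python
import os
import tempfile
from pathlib import Path


def save_submission(root: Path, hotkey: str, data: bytes) -> Path:
    """Writer side (server.py's role): store a submission atomically."""
    dest_dir = root / "submissions"
    dest_dir.mkdir(parents=True, exist_ok=True)
    final = dest_dir / f"{hotkey}.pt"
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, final)  # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return final


def list_ready_submissions(root: Path) -> list[Path]:
    """Reader side (run.py's role): only complete files, never in-progress .tmp."""
    dest_dir = root / "submissions"
    if not dest_dir.is_dir():
        return []
    return sorted(p for p in dest_dir.iterdir() if p.suffix == ".pt")
```

Keeping the temp file in the same directory as the target matters: a rename across filesystems is not atomic.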

Process 1: run.py — Constant Evaluation

run.py is the evaluation and synchronization process. Each cycle it:

Discovers miners ready for evaluation by polling on-chain state
Fetches miner submissions and resolves model metadata (version/hash, hotkey, UID)
Downloads or loads submitted checkpoints (with resume/retry for partial downloads)
Runs Proof-of-Loss evaluation on each submission
Aggregates scores per UID/hotkey and resets history on hotkey changes
Synchronizes with peer validators via inter-validator merging
Publishes scores to the chain via set_weights()
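The aggregation step (per-UID scores, with history reset when a UID's hotkey changes) can be sketched as follows. This illustrates the idea under assumptions and is not `run.py`'s logic: it assumes a higher Proof-of-Loss score is better, keeps a bounded moving window per UID, and normalizes the means into a weight vector of the kind passed to `set_weights()`. `ScoreBook`, the window size and the normalization are hypothetical choices.

```python
from collections import deque


class ScoreBook:
    """Per-UID score history that resets when a UID changes hotkey."""

    def __init__(self, window: int = 10):
        self.window = window
        self.hotkeys: dict[int, str] = {}
        self.history: dict[int, deque] = {}

    def record(self, uid: int, hotkey: str, score: float) -> None:
        # A new hotkey on this UID means a different miner: drop old scores.
        if self.hotkeys.get(uid) != hotkey:
            self.hotkeys[uid] = hotkey
            self.history[uid] = deque(maxlen=self.window)
        self.history[uid].append(score)

    def mean(self, uid: int) -> float:
        h = self.history.get(uid)
        return sum(h) / len(h) if h else 0.0

    def weights(self) -> dict[int, float]:
        # Clamp negative means to zero, then normalize so weights sum to 1.
        means = {uid: max(self.mean(uid), 0.0) for uid in self.history}
        total = sum(means.values())
        if total == 0:
            return {uid: 0.0 for uid in means}
        return {uid: m / total for uid, m in means.items()}
```

Resetting on a hotkey change prevents a newly registered miner from inheriting the previous occupant's score history on the same UID.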

Process 2: server.py — Model Distribution Server

server.py is the FastAPI HTTP server. It:

Serves expert group slices to miners (GET /model/get-checkpoint) during the distribute phase
Accepts checkpoint uploads from miners (POST /submit-checkpoint) during the submit phase
Handles bearer token authentication for all endpoints
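Bearer-token checking comes down to parsing the `Authorization` header and comparing the token in constant time. A minimal, framework-agnostic sketch using only the standard library; this page does not describe how `server.py` issues or validates tokens, so `check_bearer` and the token set are hypothetical.

```python
import hmac
from typing import Optional


def check_bearer(authorization: Optional[str], valid_tokens: set[str]) -> bool:
    """Return True if the header is 'Bearer <token>' with a known token."""
    if not authorization:
        return False
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return False
    token = token.strip()
    # compare_digest does not leak how many leading characters matched.
    return any(hmac.compare_digest(token.encode(), t.encode()) for t in valid_tokens)
```

In a FastAPI app, a check like this would typically sit in a dependency shared by all endpoints; FastAPI ships `fastapi.security.HTTPBearer` for extracting the credentials from the header.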

The Cycle from Your Perspective

Blocks 0–20 (Distribute): server.py serves partial models to miners making GET /model/partial requests.

Blocks 20–320 (Train): server.py serves no new model downloads. run.py monitors the chain for committed hashes.

Blocks 320–336 (MinerCommit1 & 2): 16 blocks (8 + 8) dedicated to miners committing model hashes and validators committing a seed. run.py reads all committed hashes from the chain via get_chain_commits().

Blocks 336–456 (Submission/Validate/Merge):

server.py accepts checkpoint submissions via POST /submit (Submission Phase, 20 blocks)
run.py downloads and evaluates each submission (Validate Phase, 50 blocks)
run.py synchronizes gradients with peer validators via inter-validator merging (Merge Phase, 50 blocks)

Blocks 456–472 (ValidatorCommit1 & 2):

16 blocks (8 + 8) for validators to commit signed_model_hash and model_hash.
run.py calls set_weights() with the agreed-upon weight vector.
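The block ranges above can be turned into a lookup that tells you which phase a block falls in and how many blocks remain in it. A minimal sketch under stated assumptions: the cycle is exactly 472 blocks with no gap after ValidatorCommit2, each commit window splits evenly into 8 + 8 blocks, Submission, Validate and Merge run back to back inside 336–456, and `cycle_start` (a hypothetical parameter) is the absolute block at which a cycle begins. `PHASES` and `phase_for` are hypothetical names.

```python
from typing import NamedTuple


class Phase(NamedTuple):
    name: str
    start: int  # inclusive block offset within the cycle
    end: int    # exclusive block offset


# Boundaries from the cycle description; the 20/50/50 split of 336–456 is assumed contiguous.
PHASES = [
    Phase("Distribute", 0, 20),
    Phase("Train", 20, 320),
    Phase("MinerCommit1", 320, 328),
    Phase("MinerCommit2", 328, 336),
    Phase("Submission", 336, 356),
    Phase("Validate", 356, 406),
    Phase("Merge", 406, 456),
    Phase("ValidatorCommit1", 456, 464),
    Phase("ValidatorCommit2", 464, 472),
]
CYCLE_LENGTH = PHASES[-1].end  # 472 blocks


def phase_for(block: int, cycle_start: int = 0) -> tuple[str, int]:
    """Return (phase name, blocks remaining in that phase) for an absolute block."""
    offset = (block - cycle_start) % CYCLE_LENGTH
    for p in PHASES:
        if p.start <= offset < p.end:
            return p.name, p.end - offset
    raise AssertionError("unreachable: phases cover the whole cycle")
```

For example, `phase_for(340)` gives `("Submission", 16)`: the submit window is open and closes in 16 blocks. Block 472 wraps back to `("Distribute", 20)`.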

Configuration

See Configuration for the full field reference and example YAML.
