Running Your Miner

BlockZero miners consist of two cooperating processes. Both must be running for a complete, reward-generating setup.

Starting Your Miner

The miner requires two processes running simultaneously. Use two terminals (or tmux/screen):

Terminal A — Local Training:

env TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1 \
python mycelia/miner/train.py \
--path /path/to/checkpoints/miner/<your hotkey>/<run name>/

Terminal B — Model I/O (chain communication):

python mycelia/miner/model_io.py \
--path /path/to/checkpoints/miner/<your hotkey>/<run name>/

Both processes must point at the same config directory (the --path argument). When --path is not provided, the default config from mycelia/config.py is used.

tip

Use a separate directory per hotkey (e.g., hk1/, hk2/) to avoid mixing artifacts when running multiple miners.

Two-Process Architecture

Figure: The two-process miner architecture. model_io.py manages all chain communication and timing; train.py handles GPU training. They communicate via shared checkpoints and the subprocess coordination layer.

Process 1: train.py — GPU Training Loop

train.py is the GPU-bound process. It:

  • Runs the inner optimization loop (AdamW + cosine LR schedule, fp16 GradScaler)
  • Loads the expert group from the latest checkpoint written by model_io.py
  • Writes training checkpoints for model_io.py to pick up and submit
  • Supports multi-GPU via torch.multiprocessing.spawn for DDP
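The inner loop described above can be sketched as follows. This is a hedged, minimal illustration only: the function name `train_cycle`, the learning rate, and the toy `model(x, y) -> loss` interface are assumptions, and the real train.py additionally wires in the expert group, domain data loader, DDP spawning, and checkpoint writes.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

def train_cycle(model, loader, max_steps=500, lr=3e-5, device="cuda"):
    # Hypothetical sketch of the AdamW + cosine LR + fp16 GradScaler loop.
    use_amp = device.startswith("cuda") and torch.cuda.is_available()
    opt = AdamW(model.parameters(), lr=lr)
    sched = CosineAnnealingLR(opt, T_max=max_steps)       # cosine decay over the cycle
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)   # fp16 loss scaling
    model.to(device).train()
    for step, (x, y) in zip(range(max_steps), loader):
        opt.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast(enabled=use_amp):
            loss = model(x.to(device), y.to(device))      # model returns a scalar loss
        scaler.scale(loss).backward()
        scaler.step(opt)
        scaler.update()
        sched.step()
    return loss.item()
```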

Process 2: model_io.py — Chain Sync Daemon

model_io.py manages all interaction with the Bittensor blockchain and validator. It runs three background threads:

| Thread | What it does | When it runs |
| --- | --- | --- |
| Download thread | Downloads the latest expert group from the validator's HTTP endpoint | Phase 1 (Distribute) |
| Commit thread | Computes a hash of the current best checkpoint and posts it to the blockchain | Phase 3 (Commit) |
| Submit thread | Uploads the committed checkpoint to the validator server | Phase 4 (Submit) |

The daemon is phase-aware — it watches the chain block number and activates each thread only during the correct window in the 45-block cycle.
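Phase-aware scheduling of this kind can be sketched as a pure function of the block number. The names `PHASES` and `phase_for_block` are hypothetical, and the sketch assumes cycles begin at blocks divisible by 45; the real daemon reads the current block from the chain and may use a different cycle origin.

```python
# Window boundaries follow the cycle description on this page (blocks 0-5,
# 5-35, 35-40, 40-45 within each 45-block cycle).
PHASES = [
    ("DISTRIBUTE", 0, 5),
    ("TRAIN", 5, 35),
    ("COMMIT", 35, 40),
    ("SUBMIT", 40, 45),
]

def phase_for_block(block: int, cycle_len: int = 45) -> str:
    offset = block % cycle_len
    for name, start, end in PHASES:
        if start <= offset < end:
            return name
    raise ValueError(f"offset {offset} outside cycle")
```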

What to Watch in Logs

model_io.py logs

[INFO] Chain sync ready, current block: 12340
[INFO] Entering DISTRIBUTE phase at block 12340
[INFO] Downloading expert group 0 from validator...
[INFO] Download complete: expert_group_0_step_1000.pt (7.2GB)
[INFO] Entering TRAIN phase at block 12345
[INFO] Entering COMMIT phase at block 12374
[INFO] Hash committed: sha256:abcdef... at block 12376
[INFO] Entering SUBMIT phase at block 12379
[INFO] Weights submitted: checkpoint_500_12379_group0.pt
[INFO] Submission accepted, evaluation_eta_blocks: 5

train.py logs

[INFO] Loading expert group from checkpoint...
[INFO] Starting training: step 0/500
[INFO] step 100/500 | loss: 8.234 | lr: 2.1e-5 | throughput: 12.3 tok/s
[INFO] step 200/500 | loss: 6.891 | lr: 1.8e-5
[INFO] step 500/500 | loss: 3.512 | lr: 8.4e-6
[INFO] Training cycle complete. Checkpoint saved: checkpoint_500_12379_group0.pt

The Cycle from Your Perspective

Each 45-block (~9 minute) cycle proceeds as follows:

  1. Blocks 0–5 (Distribute): model_io.py downloads your expert group from the validator. train.py waits for the download to complete.

  2. Blocks 5–35 (Train): train.py runs max_steps of gradient descent on your local domain data. model_io.py monitors the checkpoint directory.

  3. Blocks 35–40 (Commit): model_io.py takes the best checkpoint from train.py, computes its hash, and posts the hash to the blockchain.

  4. Blocks 40–45 (Submit/Evaluate): model_io.py uploads the checkpoint to the validator. The validator scores your submission via Proof-of-Loss and updates weights.
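The Commit step's hash can be illustrated with a short sketch. The function name `checkpoint_commitment` is hypothetical; it only shows producing a `sha256:...` digest like the one in the model_io.py logs, and the on-chain posting itself is omitted.

```python
import hashlib

def checkpoint_commitment(path: str, chunk: int = 1 << 20) -> str:
    # Stream the checkpoint file in 1 MiB chunks to avoid loading
    # a multi-GB file into memory at once.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return "sha256:" + h.hexdigest()
```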

Monitoring

Enable W&B logging for live training curves:

wandb: true
wandb_project: blockzero-miner
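With logging enabled, metrics can be mirrored to W&B from your own tooling. This is a hedged sketch: the helper `log_metrics` is hypothetical, and it only assumes the standard wandb client API (`wandb.init` / `run.log`); it degrades to returning a plain dict when no run is supplied.

```python
def log_metrics(step, loss, w_i_score, status, run=None):
    # Keys match the metric names documented below.
    metrics = {
        "train/loss": loss,
        "chain/w_i_score": w_i_score,
        "chain/submission_status": status,
    }
    if run is not None:  # e.g. run = wandb.init(project="blockzero-miner")
        run.log(metrics, step=step)
    return metrics
```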

Key metrics to watch:

  • train/loss — should decrease each cycle as your expert improves
  • chain/w_i_score — your Proof-of-Loss score for the last cycle (higher = more reward)
  • chain/submission_status — accepted, rejected, or missed

Stopping Gracefully

kill -SIGTERM <model_io_pid>  # allows clean chain state save
kill -SIGTERM <train_pid> # allows checkpoint write

Avoid kill -9 (SIGKILL) — it can leave a WebSocket lock held on the chain connection.
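A common pattern for honoring SIGTERM this way is a flag checked at a safe point in the loop. This sketch is illustrative only (the class name is hypothetical, and the actual handlers in train.py and model_io.py are not shown on this page).

```python
import signal

class GracefulShutdown:
    """Set a stop flag on SIGTERM instead of dying mid-write."""

    def __init__(self):
        self.stop = False
        signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame):
        # Checked between training steps / after the current upload,
        # so checkpoints and chain state are saved before exit.
        self.stop = True
```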