Running Your Miner
Connito miners consist of two cooperating processes. Both must be running for a complete, reward-generating setup.
Starting Your Miner
The miner requires two processes running simultaneously. Use two terminals (or tmux/screen):
The --path argument /path/to/checkpoints/miner/<your hotkey>/<run name>/ is the directory that was automatically generated when you ran python connito/shared/config.py in the Configuration step. Here, /path/to refers to the absolute path where you cloned the Connito repository.
Terminal A — Local Training:

```bash
env TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1 \
  python connito/miner/train.py \
  --path /path/to/checkpoints/miner/<your hotkey>/<run name>/
```

Terminal B — Model I/O (chain communication):

```bash
python connito/miner/model_io.py \
  --path /path/to/checkpoints/miner/<your hotkey>/<run name>/
```

Both processes must point at the same config directory (the --path argument). When --path is not provided, the default config from config.py is used (see the sketch below).
Use a separate directory per hotkey (e.g., hk1/, hk2/) to avoid mixing artifacts when running multiple miners.
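The path-resolution behavior can be pictured with a short sketch. DEFAULT_PATH and the argparse wiring here are assumptions standing in for whatever connito/shared/config.py actually provides, not the project's real code:

```python
import argparse
from pathlib import Path

# Hypothetical fallback; in reality the default comes from connito/shared/config.py.
DEFAULT_PATH = Path("checkpoints/miner/hk1/run1")

parser = argparse.ArgumentParser()
parser.add_argument(
    "--path",
    type=Path,
    default=DEFAULT_PATH,
    help="shared config/checkpoint directory used by both train.py and model_io.py",
)
args = parser.parse_args()
print(f"using checkpoint dir: {args.path}")
```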
Two-Process Architecture
```mermaid
flowchart LR
    Net(("Validator &\nBlockchain"))
    IO["model_io.py\n(Sync Process)"]
    Dir[("Checkpoint Dir\n(--path)")]
    Train["train.py\n(GPU Process)"]
    Net -->|1. Distribute| IO
    IO -->|2. Write Baseline| Dir
    Dir -->|3. Load| Train
    Train -->|4. Train & Save| Dir
    Dir -->|5. Read Best| IO
    IO -->|6. Commit & Submit| Net
    classDef process fill:#e1f5fe,stroke:#01579b,stroke-width:2px,color:#000;
    classDef storage fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#000;
    classDef external fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,color:#000;
    class IO,Train process;
    class Dir storage;
    class Net external;
```

Figure: The two-process miner architecture. model_io.py manages all chain communication and timing across three phase-aware workers; train.py handles Distributed Data Parallel (DDP) GPU training and telemetry. The two processes communicate via the shared --path checkpoint directory.
Process 1: train.py — GPU Training Loop
train.py is the GPU-bound process. It:

- Runs the inner optimization loop (AdamW with cosine LR scheduling, optionally using a GradScaler with fp16 or bf16 mixed precision).
- Loads the correct model baseline for your expert group, as written by model_io.py.
- Evaluates and writes periodic checkpoints to disk for model_io.py to pick up and submit.
- Automatically spins up torch.multiprocessing.spawn for DDP if world_size > 1 (see the sketch after this list).
- Runs a built-in TelemetryManager and SystemStatePoller to monitor the subnet's current cycle phase directly.
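The DDP spawn path can be pictured with a minimal sketch. Everything here is illustrative: train_worker and its arguments are hypothetical stand-ins, not Connito's actual entry points.

```python
import torch
import torch.multiprocessing as mp

def train_worker(rank: int, world_size: int, path: str) -> None:
    # Hypothetical per-GPU worker: the real train.py would init the DDP
    # process group here, build the model and optimizer (AdamW + cosine LR),
    # and run the inner loop, writing checkpoints under `path`.
    print(f"worker {rank}/{world_size} training into {path}")

def main(path: str) -> None:
    world_size = max(1, torch.cuda.device_count())
    if world_size > 1:
        # One process per GPU, mirroring train.py's world_size > 1 behavior.
        mp.spawn(train_worker, args=(world_size, path), nprocs=world_size)
    else:
        train_worker(rank=0, world_size=1, path=path)

if __name__ == "__main__":
    main("/path/to/checkpoints/miner/hk1/run1/")
```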
Process 2: model_io.py — Chain Sync Daemon
model_io.py manages all interactions with the Bittensor blockchain and your assigned validator. Using a central scheduler, it orchestrates jobs across three background threads:
| Thread | What it does | When it runs |
|---|---|---|
| Download thread | Downloads the best model's metadata and checkpoint from the chain and validator | distribute phase |
| Commit thread | Computes a secure hash of the best checkpoint (model_expgroup_{id}.pt) and commits it to the blockchain (miner_commit_1 and miner_commit_2) | miner_commit_1 and miner_commit_2 phases |
| Submit thread | Posts the actual signed checkpoint file straight to the validator's HTTP endpoint | submission phase |
The daemon uses wait_till() to stay synchronized with the subnet's block cycle, activating each thread only during its correct block window.
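The scheduling loop might look like the sketch below. wait_till() is named above, but its signature, the phase-to-worker mapping, and the worker functions are all assumptions for illustration:

```python
import threading

def wait_till(phase: str) -> None:
    # Stand-in for the daemon's wait_till(); the real one blocks until the
    # named phase's block window opens on chain.
    ...

def run_download() -> None: ...   # distribute-phase work (illustrative stub)
def run_commit() -> None: ...     # miner_commit_1/2 work (illustrative stub)
def run_submit() -> None: ...     # submission-phase work (illustrative stub)

# Phase names match the log prefixes shown below; the ordering and
# (phase -> worker) pairing are assumptions.
PHASES = [
    ("distribute", run_download),
    ("miner_commit_1", run_commit),
    ("miner_commit_2", run_commit),
    ("submission", run_submit),
]

def scheduler(stop: threading.Event) -> None:
    while not stop.is_set():
        for name, worker in PHASES:
            wait_till(name)  # sleep until this phase's window opens
            t = threading.Thread(target=worker, name=name)
            t.start()
            t.join()         # one task per phase, in cycle order
```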
What to Watch in Logs
model_io.py logs
```text
[info ] <distribute> downloaded model metadata from chain: <ModelCheckpoint ...>
[info ] <distribute> task completed.
[info ] <miner_commit_1> committing hash=abcdef... model_version=15 path=/path/to/...
[info ] <miner_commit_2> committing hash=abcdef... model_version=15 path=/path/to/...
[info ] <submission> submitted model block=2150493 destination={'hotkey1...'} hash=abcdef... path=/path/to/model_expgroup_0.pt
[info ] <submission> task completed.
```

train.py logs
```text
[info ] (0) Setup training
[info ] setup_training: success!
[info ] batch loss inner_opt_step=0 loss=8.234
[info ] batch loss inner_opt_step=1 loss=7.891
[info ] (1) Start epoch training current_model_meta=... gradient_accumulation_steps=1 inner_opt_step=2 is_inner_optimizer_step=True step=2
[info ] GradScaler for optimizer step grad_norm=1.5 grad_sum=120.4 scale_after=None scale_before=65536.0 skipped=False
[info ] Checkpoint saved checkpoint_interval=100 inner_opt_step=100
```

The Cycle from Your Perspective
A typical miner cycle proceeds through the subnet's fixed-length phases as follows:
- Distribute Phase: model_io.py searches the network for a qualifying validator destination and downloads your assigned expert group's state. train.py detects this synchronization and begins calculating gradients against the fresh baseline.
- Train Phase: train.py runs gradient descent over your local dataset. As it identifies improving checkpoints, it writes them into your --path directory.
- Commit Phase: During miner_commit_1 and miner_commit_2, model_io.py calculates a unique hash of your best training checkpoint (model_expgroup_{id}.pt), signs it with your wallet, and registers the commit on the blockchain (see the hashing sketch after this list).
- Submit Phase: model_io.py finds the correct destination validator and POSTs the physical .pt file weights. The validator verifies the checkpoint against your blockchain-committed hash as proof that the submitted file matches your commitment.
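For intuition, a checkpoint hash could be computed as in the sketch below. The sha256 choice, chunk size, and function name are assumptions; Connito's actual hashing and signing scheme may differ.

```python
import hashlib
from pathlib import Path

def checkpoint_hash(path: Path) -> str:
    # Stream the checkpoint in 1 MiB chunks so large .pt files
    # never need to fit in memory at once.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

print(checkpoint_hash(Path("model_expgroup_0.pt")))
```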
Monitoring
If wandb settings are enabled in your config, train.py uses the integrated MetricLogger to push live training metrics to Weights & Biases (see the sketch below).
Key metrics to watch correspond to your local training state and network submissions:

- train/loss — should decrease as inner optimization proceeds.
- Proof-of-loss equivalents calculated post-submission.
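As a rough picture of what MetricLogger likely wraps, here is standard wandb usage; the project name and loss values are made up, and MetricLogger's actual API is not shown:

```python
import wandb

run = wandb.init(project="connito-miner")  # illustrative project name
for step, loss in enumerate([8.234, 7.891, 7.512]):
    run.log({"train/loss": loss}, step=step)  # mirrors the train/loss metric above
run.finish()
```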
Stopping Gracefully
```bash
kill -SIGTERM <model_io_pid>
kill -SIGTERM <train_pid>
```

A SIGTERM lets train.py catch the shutdown cleanly, flush any in-flight state to a mycelia_final.pt emergency save, and clear the GPU cache (free_cuda_models). Likewise, model_io.py uses the signal to cleanly join its background threads via a poison-pill pattern (sketched below).
Avoid kill -9 (SIGKILL) — it bypasses this cleanup and can leave stale file locks or unreleased memory.
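A minimal illustration of the poison-pill pattern mentioned above; the queue, sentinel, and handler wiring here are generic Python, not Connito's actual shutdown code:

```python
import queue
import signal
import threading

jobs: queue.Queue = queue.Queue()
POISON = object()  # sentinel telling the worker thread to exit cleanly

def worker() -> None:
    while True:
        job = jobs.get()
        if job is POISON:
            return  # poison pill received: stop after finishing current work
        # ... handle a download/commit/submit job here ...

t = threading.Thread(target=worker)
t.start()

# On SIGTERM, enqueue the sentinel instead of dying mid-task.
signal.signal(signal.SIGTERM, lambda signum, frame: jobs.put(POISON))

t.join()  # main thread waits until the worker drains its queue and exits
```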