Connito AI
Miner Guide

Running Your Miner

Connito miners consist of two cooperating processes. Both must be running for a complete, reward-generating setup.

Starting Your Miner

The miner requires two processes running simultaneously. Use two terminals (or tmux/screen):

The --path argument /path/to/checkpoints/miner/<your hotkey>/<run name>/ is the directory that was automatically generated when you ran python connito/shared/config.py in the Configuration step. Here, /path/to refers to the absolute path where you cloned the Connito repository.

Terminal A — Local Training:

env TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1 \
python connito/miner/train.py \
  --path /path/to/checkpoints/miner/<your hotkey>/<run name>/

Terminal B — Model I/O (chain communication):

python connito/miner/model_io.py \
  --path /path/to/checkpoints/miner/<your hotkey>/<run name>/

Both processes must point at the same config directory (the --path argument). When --path is not provided, the default config from config.py is used.

Use a separate directory per hotkey (e.g., hk1/, hk2/) to avoid mixing artifacts when running multiple miners.
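
For example, two miners on the same machine could point their process pairs at separate directories such as:

/path/to/checkpoints/miner/hk1/run-a/
/path/to/checkpoints/miner/hk2/run-a/

where run-a stands in for whatever run name you configured for each hotkey.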

Two-Process Architecture

flowchart LR
    Net(("Validator &\nBlockchain"))
    IO["model_io.py\n(Sync Process)"]
    Dir[("Checkpoint Dir\n(--path)")]
    Train["train.py\n(GPU Process)"]

    Net -->|1. Distribute| IO
    IO -->|2. Write Baseline| Dir
    Dir -->|3. Load| Train
    Train -->|4. Train & Save| Dir
    Dir -->|5. Read Best| IO
    IO -->|6. Commit & Submit| Net
    
    classDef process fill:#e1f5fe,stroke:#01579b,stroke-width:2px,color:#000;
    classDef storage fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#000;
    classDef external fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,color:#000;
    
    class IO,Train process;
    class Dir storage;
    class Net external;

Figure: The two-process miner architecture. model_io.py manages all chain communication and timing across three phase-aware workers; train.py handles Distributed Data Parallel (DDP) GPU training and telemetry. They communicate via the shared --path checkpoint directory.
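
The handoff between the two processes is entirely file-based. A minimal Python sketch of that pattern, assuming an illustrative baseline file name and a simple polling interval (the real layout inside the --path directory is defined by Connito, not by this example):

import time
from pathlib import Path

import torch

CKPT_DIR = Path("/path/to/checkpoints/miner/hk1/run-a")   # the shared --path directory
BASELINE = CKPT_DIR / "baseline.pt"                        # illustrative file name
BEST = CKPT_DIR / "model_expgroup_0.pt"                    # best local checkpoint

def wait_for_baseline(poll_seconds: float = 10.0) -> dict:
    """Block until the sync process has written a fresh baseline, then load it."""
    while not BASELINE.exists():
        time.sleep(poll_seconds)
    return torch.load(BASELINE, map_location="cpu")

def publish_best(state_dict: dict) -> None:
    """Write atomically so the sync process never reads a half-written checkpoint."""
    tmp = BEST.with_suffix(".tmp")
    torch.save(state_dict, tmp)
    tmp.replace(BEST)   # atomic rename within the same filesystem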

Process 1: train.py — GPU Training Loop

train.py is the GPU-bound process. It:

  • Runs the inner optimization loop (AdamW with cosine LR scheduling, optionally using a GradScaler for fp16- or bf16-mixed precision); a simplified sketch follows this list.
  • Loads the correct model baseline for your expert group written by model_io.py.
  • Evaluates and writes periodic checkpoints to disk for model_io.py to pick up and submit.
  • Automatically spins up torch.multiprocessing.spawn for DDP if world_size > 1.
  • Runs a built-in TelemetryManager and SystemStatePoller to monitor the subnet's current cycle phase directly.
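
A simplified Python sketch of such an inner loop, assuming an fp16 GradScaler, a Hugging Face-style model that returns a .loss, and illustrative hyperparameters (this is not the actual connito/miner/train.py code):

import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

def inner_loop(model, dataloader, device="cuda", total_steps=1000, checkpoint_interval=100):
    # Illustrative inner loop only; names and hyperparameters are assumptions.
    model.to(device).train()
    optimizer = AdamW(model.parameters(), lr=3e-4)
    scheduler = CosineAnnealingLR(optimizer, T_max=total_steps)
    scaler = torch.cuda.amp.GradScaler()              # fp16 mixed-precision scaler

    for step, batch in enumerate(dataloader):
        if step >= total_steps:
            break
        optimizer.zero_grad(set_to_none=True)
        inputs = {k: v.to(device) for k, v in batch.items()}
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            loss = model(**inputs).loss               # assumes an HF-style model output
        scaler.scale(loss).backward()
        scaler.step(optimizer)                        # skipped internally if grads overflowed
        scaler.update()
        scheduler.step()
        if (step + 1) % checkpoint_interval == 0:
            torch.save(model.state_dict(), f"checkpoint_{step + 1}.pt")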

Process 2: model_io.py — Chain Sync Daemon

model_io.py manages all interactions with the Bittensor blockchain and your assigned validator. Using a central scheduler, it orchestrates jobs across three background threads:

| Thread | What it does | When it runs |
| --- | --- | --- |
| Download thread | Downloads the optimal model metadata and checkpoint from the chain validator | distribute phase |
| Commit thread | Computes a secure hash of the best checkpoint (model_expgroup_{id}.pt) and commits it to the blockchain | miner_commit_1 and miner_commit_2 phases |
| Submit thread | Posts the signed checkpoint file directly to the validator's HTTP endpoint | submission phase |

The daemon uses wait_till() to stay synchronized with the subnet's block cycle, activating each thread only during its designated block window.
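
The exact interface of wait_till() is internal to Connito, but the pattern is a straightforward block-window scheduler: sleep until the chain reaches a phase's starting block, run that phase's job, then wait for the next window. A hypothetical sketch, where the cycle length, phase boundaries, and current_block() helper are all assumptions:

import time

# Hypothetical block offsets within a fixed-length cycle; not Connito's real schedule.
CYCLE_LENGTH = 360
PHASES = {
    "distribute":     (0, 60),
    "miner_commit_1": (240, 280),
    "miner_commit_2": (280, 320),
    "submission":     (320, 360),
}

def current_block() -> int:
    """Stand-in for querying the chain's current block height."""
    raise NotImplementedError

def wait_till(target_block: int, poll_seconds: float = 6.0) -> None:
    """Sleep until the chain reaches the given block height."""
    while current_block() < target_block:
        time.sleep(poll_seconds)

def run_cycle(jobs: dict) -> None:
    cycle_start = (current_block() // CYCLE_LENGTH) * CYCLE_LENGTH
    for phase, (start, end) in PHASES.items():
        if phase not in jobs:
            continue
        wait_till(cycle_start + start)
        if current_block() < cycle_start + end:    # still inside this phase's window
            jobs[phase]()                          # e.g. download, commit, or submit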

What to Watch in Logs

model_io.py logs

[info     ] <distribute> downloaded model metadata from chain: <ModelCheckpoint ...>
[info     ] <distribute> task completed.
[info     ] <miner_commit_1> committing           hash=abcdef... model_version=15 path=/path/to/...
[info     ] <miner_commit_2> committing           hash=abcdef... model_version=15 path=/path/to/...
[info     ] <submission> submitted model          block=2150493 destination={'hotkey1...'} hash=abcdef... path=/path/to/model_expgroup_0.pt
[info     ] <submission> task completed.

train.py logs

[info     ] (0) Setup training
[info     ] setup_training: success!
[info     ] batch loss                     inner_opt_step=0 loss=8.234
[info     ] batch loss                     inner_opt_step=1 loss=7.891
[info     ] (1) Start epoch training       current_model_meta=... gradient_accumulation_steps=1 inner_opt_step=2 is_inner_optimizer_step=True step=2
[info     ] GradScaler for optimizer step  grad_norm=1.5 grad_sum=120.4 scale_after=None scale_before=65536.0 skipped=False
[info     ] Checkpoint saved               checkpoint_interval=100 inner_opt_step=100

The Cycle from Your Perspective

A typical miner cycle proceeds through strict phase lengths in the subnet as follows:

  1. Distribute Phase: model_io.py locates a qualifying validator and downloads your assigned expert group's current state. train.py detects the freshly synced baseline and begins computing gradients against it.
  2. Train Phase: train.py strictly handles gradient descent over your local dataset. As it identifies improving checkpoints, it writes them into your --path directory.
  3. Commit Phase: During miner_commit_1 and miner_commit_2, model_io.py calculates a unique hash of your best training checkpoint (model_expgroup_{id}.pt), signs it with your wallet, and securely registers the commit on the blockchain.
  4. Submit Phase: model_io.py finds the correct destination validator and POSTs the .pt checkpoint file over HTTP. The validator verifies the checkpoint against your blockchain-committed hash, guaranteeing proof of work. A rough sketch of the commit-and-submit flow follows this list.
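
A rough Python sketch of that commit-and-submit flow, assuming SHA-256 hashing, an illustrative /submit endpoint, and placeholder field names (the real on-chain commit goes through the Bittensor wallet and subtensor APIs, which are not shown here):

import hashlib
from pathlib import Path

import requests

def checkpoint_hash(path: Path) -> str:
    """Hash the checkpoint in chunks so large .pt files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def submit_checkpoint(path: Path, validator_url: str, hotkey: str, signature: str) -> None:
    """POST the checkpoint; the validator re-hashes it and checks the on-chain commit."""
    with path.open("rb") as f:
        response = requests.post(
            f"{validator_url}/submit",                 # hypothetical endpoint path
            files={"checkpoint": (path.name, f)},
            data={"hotkey": hotkey, "signature": signature},
            timeout=300,
        )
    response.raise_for_status()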

Monitoring

If enabled via the wandb settings in your config, train.py uses the integrated MetricLogger to push live training metrics to Weights & Biases; a minimal sketch of equivalent wandb logging appears after the metric list below.

Key metrics to watch directly correspond to your local training state and network submissions:

  • train/loss — should decrease as inner optimization proceeds.
  • Proof-of-loss equivalents calculated post-submission.
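
If you want to mirror those metrics outside of the built-in MetricLogger, the underlying pattern is plain wandb logging. A minimal sketch with illustrative project, run, and metric values:

import wandb

# Illustrative values only; mirrors the train/loss metric pushed by MetricLogger.
run = wandb.init(project="connito-miner", name="hk1-run-a")

example_losses = [8.2, 7.9, 7.5, 7.1]          # stand-ins for real per-step losses
for step, loss in enumerate(example_losses):
    run.log({"train/loss": loss, "inner_opt_step": step}, step=step)

run.finish()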

Stopping Gracefully

kill -SIGTERM <model_io_pid>   # stop the chain-sync daemon
kill -SIGTERM <train_pid>      # stop the GPU training process

A SIGTERM lets train.py catch the shutdown cleanly, flush any in-memory state to a mycelia_final.pt emergency save, and clear the GPU cache (free_cuda_models). Likewise, model_io.py handles the same signal by cleanly joining its background threads via a poison-pill pattern.
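
A generic Python sketch of that shutdown pattern, assuming a single queue-fed worker thread and illustrative names (Connito's actual workers and cleanup hooks differ):

import queue
import signal
import threading
import time

tasks: "queue.Queue[object]" = queue.Queue()
STOP = object()                          # poison pill: tells the worker to exit
shutdown = threading.Event()

def worker() -> None:
    while True:
        item = tasks.get()
        if item is STOP:                 # poison pill received: exit the loop cleanly
            break
        # ... handle a download / commit / submit job here ...

def handle_sigterm(signum, frame) -> None:
    shutdown.set()                       # main loop notices and unwinds

signal.signal(signal.SIGTERM, handle_sigterm)

thread = threading.Thread(target=worker)
thread.start()

while not shutdown.is_set():             # main loop; real code would schedule jobs here
    time.sleep(1)

tasks.put(STOP)                          # send the poison pill
thread.join()                            # wait for in-flight work to finish
# train.py's handler would also flush state to disk and free GPU memory at this point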

Avoid kill -9 (SIGKILL): it skips the synchronization cleanup and can leave files locked or GPU memory unreleased.