Distributed System Design
Connito runs as a subnet on the Bittensor network. The distributed system orchestrates how expert training tasks are assigned, executed by miners, validated, and merged back into the model, all within a repeating 45-block cycle (roughly 9 minutes at Bittensor's ~12-second block time).
This section covers the system architecture and the expert partitioning strategy that makes it work.
Subnet Architecture Overview
The training cycle has four phases that repeat every 45 blocks (a sketch of the block-to-phase mapping follows the list):
- Distribute — validators serve expert slices and training data to miners
- Train — miners fine-tune their assigned experts on domain data
- Commit — miners submit a hash of their updates (two-phase commit to prevent copying)
- Evaluate — validators score updates using Proof-of-Loss, sync gradients across the validator set, and run a DiLoCo-style outer optimizer to merge the best contributions
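The section above doesn't pin down how the 45 blocks are split across the four phases, so the sketch below uses an assumed schedule (the PHASE_SCHEDULE boundaries are illustrative, not Connito's actual split) to show how a node could map the current block height to a phase.

```python
# Minimal sketch: map the current block height to a phase of the 45-block cycle.
# The per-phase block counts below are illustrative assumptions, not the real schedule.
CYCLE_LENGTH = 45  # blocks per training cycle

PHASE_SCHEDULE = [
    ("distribute", range(0, 10)),   # validators serve expert slices + training data
    ("train",      range(10, 35)),  # miners fine-tune their assigned experts
    ("commit",     range(35, 40)),  # miners submit hashes of their updates
    ("evaluate",   range(40, 45)),  # validators score, sync, and merge
]

def current_phase(block_height: int) -> str:
    """Return the phase this block falls into within the repeating cycle."""
    offset = block_height % CYCLE_LENGTH
    for phase, window in PHASE_SCHEDULE:
        if offset in window:
            return phase
    raise RuntimeError("unreachable: every offset falls in one window")

print(current_phase(1_234_567))  # -> "commit" (offset 37 under this illustrative schedule)
```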
Three roles coordinate the process: the subnet owner (configures tasks, manages datasets, monitors model health), miners (train expert modules and submit weight updates), and validators (distribute model slices, score submissions, and maintain consensus on the canonical model state).
The two-phase commit prevents miners from copying each other's work, and Proof-of-Loss provides an objective, lightweight quality signal without requiring expensive inference-based evaluation.
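To make those two mechanisms concrete, here is a minimal sketch of a commit-reveal flow and a loss-delta score; the function names, the nonce scheme, and the exact scoring formula are assumptions for illustration rather than the subnet's actual implementation.

```python
import hashlib

# Sketch of the two-phase commit: a miner first publishes only a hash of its
# serialized weight update plus a private nonce, then reveals the actual bytes
# in a later phase. Copying another miner's revealed update is useless because
# the copier's earlier commitment hash will not match it.
def commit(update_bytes: bytes, nonce: bytes) -> str:
    return hashlib.sha256(update_bytes + nonce).hexdigest()

def verify_reveal(commitment: str, update_bytes: bytes, nonce: bytes) -> bool:
    return commit(update_bytes, nonce) == commitment

# Sketch of a Proof-of-Loss style score: reward the improvement in held-out
# loss produced by applying the miner's update to its assigned experts.
def proof_of_loss_score(loss_before: float, loss_after: float) -> float:
    return max(0.0, loss_before - loss_after)

# Usage: the miner commits, later reveals; a validator checks and scores.
nonce = b"miner-secret-nonce"
update = b"...serialized expert deltas..."
c = commit(update, nonce)
assert verify_reveal(c, update, nonce)
print(proof_of_loss_score(loss_before=2.41, loss_after=2.33))  # ~0.08 (illustrative)
```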
Expert Partitioning & Assignment
A 100B+ parameter MoE model can't fit on a single miner's GPU. Expert partitioning solves this by dividing the model's experts into groups, each holding roughly total_params / num_groups parameters, which is small enough to fit in a single miner's memory.
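A back-of-the-envelope sizing example makes the memory argument concrete; the parameter count, expert count, group count, and bf16 precision below are all assumed figures, not Connito's configuration.

```python
# Back-of-the-envelope sizing: how much of the model one expert group holds.
# All numbers here are illustrative assumptions, not Connito's actual config.
TOTAL_PARAMS = 100e9      # ~100B-parameter MoE model
NUM_EXPERTS = 256         # total experts in the model
NUM_GROUPS = 16           # expert groups, each assigned to a set of miners
BYTES_PER_PARAM = 2       # bf16 weights

params_per_group = TOTAL_PARAMS / NUM_GROUPS    # ~6.25B params per group
experts_per_group = NUM_EXPERTS // NUM_GROUPS   # 16 experts per group
group_memory_gb = params_per_group * BYTES_PER_PARAM / 1e9

print(f"{experts_per_group} experts, ~{group_memory_gb:.0f} GB of weights per group")
# -> "16 experts, ~12 GB of weights per group": within reach of a mid-tier GPU,
#    whereas the full model (~200 GB in bf16) is not.
```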
The system uses an ExpertManager to handle group assignment and an ExpertMapping structure to track which miner trains which experts. Within each group, miners compete — but across groups, work proceeds in parallel without conflicts.
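The section names ExpertManager and ExpertMapping without showing their shape; one plausible sketch follows, where the field names and the pick-the-least-loaded-group policy are assumptions rather than the real interfaces.

```python
from dataclasses import dataclass, field

# Possible shape of the bookkeeping structures named above; field names and the
# least-loaded-group assignment policy are assumptions, not the actual code.
@dataclass
class ExpertMapping:
    """Tracks which miner is training which expert IDs within a group."""
    group_id: int
    expert_ids: list[int]
    miner_hotkey: str

@dataclass
class ExpertManager:
    """Assigns miners to expert groups and records the resulting mappings."""
    groups: dict[int, list[int]]                      # group_id -> expert IDs
    mappings: list[ExpertMapping] = field(default_factory=list)

    def assign(self, miner_hotkey: str) -> ExpertMapping:
        # Load balancing sketch: pick the group with the fewest miners so far.
        load = {gid: 0 for gid in self.groups}
        for m in self.mappings:
            load[m.group_id] += 1
        group_id = min(load, key=load.get)
        mapping = ExpertMapping(group_id, self.groups[group_id], miner_hotkey)
        self.mappings.append(mapping)
        return mapping
```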
A two-level selection process determines what each miner works on (see the sketch after this list):
- Group assignment — the miner is assigned to an expert group based on availability and load balancing
- ESFT selection — within that group, Expert-Specialized Fine-Tuning (ESFT) selects only the experts relevant to the current domain task
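Group assignment is sketched in the ExpertManager above; the sketch below fills in the second level. The top-k criterion and the router-statistics interface are assumptions about how the ESFT step could pick in-group experts from a routing trace over the domain's sample data.

```python
from collections import Counter

# Sketch of the second selection level: given the experts in the miner's group,
# keep only those the router activates most often on the domain's sample batch.
def select_relevant_experts(
    group_expert_ids: list[int],
    routed_expert_ids_on_domain_batch: list[int],
    top_k: int = 4,
) -> list[int]:
    counts = Counter(routed_expert_ids_on_domain_batch)
    in_group = [(eid, counts[eid]) for eid in group_expert_ids]
    in_group.sort(key=lambda pair: pair[1], reverse=True)
    return [eid for eid, _ in in_group[:top_k]]

# Usage: experts 7 and 12 dominate routing for this domain, so only they are
# fine-tuned; the rest of the group's experts stay frozen.
group = [3, 7, 12, 21, 30]
routing_trace = [7, 7, 12, 7, 12, 3, 7, 12, 12, 7]
print(select_relevant_experts(group, routing_trace, top_k=2))  # -> [7, 12]
```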
This design keeps per-miner hardware requirements low (mid-tier GPUs can participate meaningfully) while enabling the full model to be trained collaboratively across the network.