The Future of AI Is Continuous Improvement
From model releases to reusable expertise
For a long time, AI teams have treated the model as the product. If the model needed to become better at math, the answer was to train the model. If it needed stronger coding, safer behavior, better tool use, or a new enterprise domain, the answer was still some version of the same thing: update the checkpoint, evaluate the new version, and ship it.
That made sense when the main story of AI progress was scale. Bigger models, more data, larger training runs, broader post-training. The model improved because the whole model changed. But as models become more capable, that assumption starts to feel too blunt.
Not every customer-facing product problem is a whole-model problem. A company may want to improve coding without weakening safety. A customer may need a private legal capability without a full model fork. A data partner may want its contribution included for some users and excluded for others. A model may need a new language or domain without disturbing everything it already knows.
Today, we are moving from improving models as monolithic systems to improving them as collections of reusable capabilities.
That is why modular models are starting to look less like an efficiency trick and more like product architecture. The important shift is not only that Mixture-of-Experts models activate fewer parameters per token. It is that expertise is becoming a first-class asset inside the model.
Across these research threads, the same pattern is becoming visible. LoRAHub and Microsoft’s LoRA library work treat adapters as reusable expert modules. BAR and Branch-Train-Stitch move that logic into model architecture, recombining specialists through routers or stitch layers. EMO goes further, asking whether experts can emerge during training itself. Together, they suggest that expertise is becoming something models can store, compose, govern, and improve over time.
This is the direction Connito is exploring: how to make modular expertise easier to identify, organize, and reuse. If models are becoming collections of capabilities, then the next infrastructure problem is not only training better experts. It is helping teams understand which experts exist, what they are good at, and how they can compound over time.
LoRA Libraries as Modular Model Architecture
The clearest early “expert library” pattern in LLM architecture comes from LoRA (low-rank adaptation): small, task-specific adapters that can be trained separately, stored, reused, and composed around a shared, frozen base model.
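To make the pattern concrete, here is a minimal sketch of how a LoRA adapter sits beside a frozen base weight: the adapted layer computes W x + (alpha / r) · B(A x), so each adapter is just a pair of small matrices that can be stored and swapped on its own. Plain Python lists stand in for tensors, and the `LoRAAdapter` class, the adapter names, and all numbers are hypothetical illustrations, not any library's API.

```python
from dataclasses import dataclass
from typing import Optional

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


@dataclass
class LoRAAdapter:
    """Hypothetical low-rank adapter whose weight update is (alpha / r) * B @ A."""

    a: Matrix  # shape (r, d_in): down-projection into the rank-r space
    b: Matrix  # shape (d_out, r): up-projection back to the layer's output
    alpha: float = 1.0

    @property
    def rank(self) -> int:
        return len(self.a)

    def delta(self, x: Vector) -> Vector:
        """Low-rank update applied to x, scaled by alpha / r."""
        scale = self.alpha / self.rank
        return [scale * v for v in matvec(self.b, matvec(self.a, x))]


def adapted_forward(w: Matrix, x: Vector, adapter: Optional[LoRAAdapter] = None) -> Vector:
    """Frozen base projection W x, plus the adapter's update when one is attached."""
    y = matvec(w, x)
    if adapter is None:
        return y
    return [yi + di for yi, di in zip(y, adapter.delta(x))]


# Shared, frozen base weight (a 2x2 identity keeps the arithmetic readable).
base_w = [[1.0, 0.0], [0.0, 1.0]]

# A tiny adapter "library": rank-1 adapters with made-up values.
library = {
    "math": LoRAAdapter(a=[[1.0, 0.0]], b=[[0.5], [0.0]]),
    "code": LoRAAdapter(a=[[0.0, 1.0]], b=[[0.0], [2.0]]),
}

x = [1.0, 1.0]
print(adapted_forward(base_w, x))                   # [1.0, 1.0]
print(adapted_forward(base_w, x, library["math"]))  # [1.5, 1.0]
print(adapted_forward(base_w, x, library["code"]))  # [1.0, 3.0]
```

Switching capabilities is a dictionary lookup, and `base_w` is never modified, which is what lets adapters be trained, stored, and versioned independently of the base model.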
The most interesting shift in modular LLM research is that “model capability” no longer has to live inside one monolithic checkpoint. In Towards Modular LLMs by Building and Reusing a Library of LoRAs, Microsoft researchers frame LoRA adapters as reusable experts: train lightweight adapters for tasks, organize them into a library, and route new inputs to the most relevant modules. Their Model-Based Clustering method groups tasks by the similarity of their LoRA parameters, while Arrow provides zero-shot routing, selecting useful adapters without training a separate router or requiring access to the original training data. This turns specialization into an architectural layer: capabilities can be added, clustered, reused, and composed rather than baked permanently into the base model.
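The zero-shot routing idea can be sketched roughly as follows. Arrow derives a "prototype" for each adapter from the adapter's own weights (the top right singular vector of its low-rank update B A) and scores an input by the absolute dot product with each prototype. This is a simplified, single-layer sketch under assumptions: power iteration stands in for a full SVD, and the `top_k` value, the softmax over kept scores, and the adapter values are illustrative choices, not the paper's exact configuration.

```python
import math

Matrix = list[list[float]]
Vector = list[float]


def matmul(p: Matrix, q: Matrix) -> Matrix:
    cols = list(zip(*q))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in p]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_right_singular_vector(delta_w: Matrix, iters: int = 50) -> Vector:
    """Power iteration on delta_w^T delta_w: the input direction the adapter changes most.

    Assumes the all-ones starting vector is not orthogonal to that direction.
    """
    gram = matmul(transpose(delta_w), delta_w)
    v = normalize([1.0] * len(gram))
    for _ in range(iters):
        v = normalize([sum(g * x for g, x in zip(row, v)) for row in gram])
    return v


def arrow_route(x: Vector, prototypes: dict[str, Vector], top_k: int = 1) -> dict[str, float]:
    """Score adapters by |<prototype, x>|, keep the top-k, softmax over the kept scores."""
    scores = {name: abs(sum(p * xi for p, xi in zip(v, x))) for name, v in prototypes.items()}
    kept = sorted(scores, key=scores.get, reverse=True)[:top_k]
    z = sum(math.exp(scores[n]) for n in kept)
    return {n: math.exp(scores[n]) / z for n in kept}


# Rank-1 LoRA factors (B, A) with made-up values; each adapter's update is B @ A.
adapters = {
    "math": ([[0.5], [0.0]], [[1.0, 0.0]]),
    "code": ([[0.0], [2.0]], [[0.0, 1.0]]),
}

# Prototypes come from adapter weights alone: no router training, no training data.
prototypes = {name: top_right_singular_vector(matmul(b, a)) for name, (b, a) in adapters.items()}

print(arrow_route([0.9, 0.1], prototypes))           # {'math': 1.0}
print(arrow_route([0.2, 0.8], prototypes, top_k=2))  # "code" receives the larger weight
```

The absolute value matters because a singular vector's sign is arbitrary; without it, an adapter could be penalized simply because its prototype happened to point the "wrong" way.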
MoLoRA extends this idea further. Instead of routing an entire request to a single adapter, it routes at the token level, allowing one response to draw on multiple specialized LoRAs. This matters for mixed-capability tasks: “write code to solve this equation” may need both mathematical reasoning and code-generation expertise. MoLoRA’s core claim is that specialization can beat scale: smaller models equipped with composable adapters can outperform larger general models on targeted benchmarks.
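A minimal sketch of token-level routing over a LoRA library, in the spirit of MoLoRA but not its actual implementation: a router scores every token independently, so a single request can draw on several adapters. The router matrix, the dense softmax mixing over all adapters, and the adapter values are assumptions for illustration; in a real system the router would be trained and would often keep only the top-k adapters per token.

```python
import math

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(z: Vector) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def token_level_mixture(tokens, base_w, deltas, router):
    """For each token t: y_t = W x_t + sum_i p_i(x_t) * (dW_i x_t), with p = softmax(R x_t)."""
    outputs, mixes = [], []
    for x in tokens:
        p = softmax(matvec(router, x))  # per-token adapter weights
        y = matvec(base_w, x)
        for p_i, dw in zip(p, deltas):
            y = [yj + p_i * dj for yj, dj in zip(y, matvec(dw, x))]
        outputs.append(y)
        mixes.append(p)
    return outputs, mixes


base_w = [[1.0, 0.0], [0.0, 1.0]]
# Low-rank updates (B @ A) for two hypothetical adapters, in router order.
deltas = [
    [[0.5, 0.0], [0.0, 0.0]],  # "math"
    [[0.0, 0.0], [0.0, 2.0]],  # "code"
]
# Hypothetical router weights; in practice these are learned.
router = [[4.0, 0.0], [0.0, 4.0]]

# Three token representations from one request: math-like, code-like, and mixed.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, mixes = token_level_mixture(tokens, base_w, deltas, router)
print([[round(w, 2) for w in p] for p in mixes])  # [[0.98, 0.02], [0.02, 0.98], [0.5, 0.5]]
```

The third token splits its weight evenly, which is the case request-level routing cannot express: a prompt like "write code to solve this equation" gets both adapters at once instead of being forced to pick one.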
As product architecture, this suggests a new pattern: ship a stable base model, maintain a growing library of domain adapters, and use routing as the orchestration layer. The product surface becomes modular, extensible, and cheaper to update — closer to a plugin ecosystem than a single model release.
Branching as Product Architecture
BAR, short for Branch-Adapt-Route, is a practical example of modular model development. Instead of treating every improvement as a whole-model update, BAR starts with an existing post-trained model, branches it into separate domain experts, adapts those experts independently, and then routes between them inside a shared Mixture-of-Experts system.
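The three BAR stages can be sketched as a toy, under loud assumptions: each "expert" is a single weight matrix rather than a transformer block, `adapt` is a hypothetical stand-in for domain post-training, and the router weights are made up rather than trained. What it does show is the structure described above: branches start as copies, are adapted independently, and the original model stays in the mixture as an anchor expert.

```python
import copy
import math

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def branch(anchor: Matrix, domains: list[str]) -> dict[str, Matrix]:
    """Branch: each domain expert starts as an exact, independent copy of the anchor."""
    experts = {"anchor": anchor}
    for domain in domains:
        experts[domain] = copy.deepcopy(anchor)
    return experts


def adapt(expert: Matrix, update: Matrix) -> None:
    """Hypothetical stand-in for domain post-training: apply a fixed update in place."""
    for i, row in enumerate(update):
        for j, u in enumerate(row):
            expert[i][j] += u


def route(x: Vector, experts: dict[str, Matrix], router: dict[str, Vector], top_k: int = 2):
    """Top-k softmax gating; the output is the gate-weighted sum of expert outputs."""
    logits = {n: sum(r * xi for r, xi in zip(router[n], x)) for n in experts}
    chosen = sorted(logits, key=logits.get, reverse=True)[:top_k]
    m = max(logits[n] for n in chosen)
    weights = {n: math.exp(logits[n] - m) for n in chosen}
    z = sum(weights.values())
    gates = {n: w / z for n, w in weights.items()}
    y = [0.0] * len(experts["anchor"])
    for n, g in gates.items():
        y = [yi + g * v for yi, v in zip(y, matvec(experts[n], x))]
    return y, gates


# 1. Branch: start from an existing post-trained layer (identity, for readability).
anchor = [[1.0, 0.0], [0.0, 1.0]]
experts = branch(anchor, ["math", "code"])

# 2. Adapt: each branch is updated on its own schedule; here only "code" changes.
adapt(experts["code"], [[0.0, 0.0], [0.0, 1.0]])

# 3. Route: made-up router weights (BAR trains the router as its final step).
router = {"anchor": [0.5, 0.5], "math": [1.0, 0.0], "code": [0.0, 1.0]}
y, gates = route([0.0, 1.0], experts, router)
print(sorted(gates))                                  # ['anchor', 'code']
print(experts["anchor"] == [[1.0, 0.0], [0.0, 1.0]])  # True: the anchor is untouched
```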
The important shift is architectural. Math, code, tool use, and safety are not treated as generic benchmark categories. They become reusable capabilities that can be trained, evaluated, upgraded, and governed on their own timelines. The original model remains as an anchor expert, preserving general behavior, while new experts specialize around domains where the system needs to improve.
In the paper’s 7B experiments, BAR reaches an average score of 49.1 across 19 benchmarks. That is stronger than continual post-training and other modular baselines, and surprisingly close to a much more expensive retraining pipeline with mid-training, which scores 50.5. Mid-training is the phase between general pretraining and final post-training, where a model is exposed to focused domain data such as code, math, legal, or scientific text so it develops stronger specialist knowledge before instruction tuning. BAR lands within 1.4 points of that pipeline, which suggests that modular post-training can capture much of its benefit through separate experts, without forcing every capability back through the same full-model training pipeline.
| Approach | Avg. score (19 benchmarks) | What it requires |
|---|---|---|
| Full retrain | 47.8 | Retrain without extra mid-training |
| BAR | 49.1 | Branch experts, adapt separately, train router |
| Full retrain + mid-training | 50.5 | Retrain with additional domain mid-training |
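Reading the table as gaps makes the trade-off explicit. The arithmetic below uses only the three averages listed in the table and shows two different ways of measuring "close": how far BAR sits below the mid-training pipeline, and how much of the mid-training gain over plain retraining it recovers.

```python
# Average scores over the 19 benchmarks, as listed in the table.
scores = {
    "Full retrain": 47.8,
    "BAR": 49.1,
    "Full retrain + mid-training": 50.5,
}

gap_to_best = scores["Full retrain + mid-training"] - scores["BAR"]
gain_over_retrain = scores["BAR"] - scores["Full retrain"]
midtraining_gain = scores["Full retrain + mid-training"] - scores["Full retrain"]

share_recovered = gain_over_retrain / midtraining_gain
relative_score = scores["BAR"] / scores["Full retrain + mid-training"]

print(f"BAR trails mid-training by {gap_to_best:.1f} points")          # 1.4
print(f"BAR beats plain retraining by {gain_over_retrain:.1f} points")  # 1.3
print(f"Share of mid-training gain recovered: {share_recovered:.0%}")  # 48%
print(f"BAR relative to the best score: {relative_score:.1%}")         # 97.2%
```

Both views are fair: BAR reaches about 97% of the best pipeline's average score, while recovering a little under half of the specific gain that mid-training adds over plain retraining.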
The broader point is that a model roadmap does not have to be a sequence of monolithic checkpoint releases. A code expert can be upgraded without retraining the safety expert. A math expert can improve without disturbing tool use. BAR makes expertise look less like a hidden property of a checkpoint and more like infrastructure that can compound over time.
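A roadmap built this way needs bookkeeping per expert rather than per checkpoint. Below is a minimal, hypothetical registry (the class names and checkpoint identifiers are invented for illustration and are not part of BAR) in which each domain expert has its own release history, so upgrading the code expert leaves the deployed safety expert exactly as it was.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExpertRelease:
    domain: str
    version: int
    checkpoint: str  # hypothetical artifact identifier


@dataclass
class ExpertRegistry:
    """Hypothetical registry: every domain expert keeps its own release history."""

    history: dict[str, list[ExpertRelease]] = field(default_factory=dict)

    def publish(self, domain: str, checkpoint: str) -> ExpertRelease:
        releases = self.history.setdefault(domain, [])
        release = ExpertRelease(domain, len(releases) + 1, checkpoint)
        releases.append(release)
        return release

    def rollback(self, domain: str) -> ExpertRelease:
        """Revert one domain to its previous release; other domains are unaffected."""
        releases = self.history[domain]
        if len(releases) < 2:
            raise ValueError(f"{domain} has no earlier release to roll back to")
        releases.pop()
        return releases[-1]

    def active(self) -> dict[str, ExpertRelease]:
        """Current deployment: the latest release of each domain expert."""
        return {domain: releases[-1] for domain, releases in self.history.items()}


registry = ExpertRegistry()
for domain in ["anchor", "math", "code", "tool_use", "safety"]:
    registry.publish(domain, f"experts/{domain}-v1")

safety_before = registry.active()["safety"]
registry.publish("code", "experts/code-v2")  # upgrade only the code expert

print(registry.active()["code"].version)             # 2
print(registry.active()["safety"] == safety_before)  # True
```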
Branch-Train-Stitch (BTS) from Meta AI pushes the same idea through a different mechanism. Instead of combining specialists through a router, it branches a seed model into independently trained experts, freezes them, and learns lightweight stitch layers that connect their representations back into one generalist system. BAR routes between experts; BTS stitches them together.
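The exact form of the BTS stitch layers is not described here, so the block below is a deliberately simplified, hypothetical version of our own: a gated sum of linearly projected expert hidden states added to the seed model's hidden state. It is meant only to show the division of labor, with frozen experts supplying representations and a small number of trainable stitch parameters deciding how to combine them.

```python
import math

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(z: Vector) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class StitchLayer:
    """Simplified, hypothetical stitch layer: h' = h + sum_i g_i(h) * P_i e_i.

    h is a hidden state from the seed model and e_i are hidden states from the
    frozen experts. Only the projections P_i and the gate matrix are trainable.
    """

    def __init__(self, projections: list[Matrix], gate: Matrix):
        self.projections = projections
        self.gate = gate

    def __call__(self, h: Vector, expert_states: list[Vector]) -> Vector:
        g = softmax(matvec(self.gate, h))  # how much to draw from each expert
        out = list(h)
        for g_i, proj, e in zip(g, self.projections, expert_states):
            out = [o + g_i * v for o, v in zip(out, matvec(proj, e))]
        return out

    def num_trainable(self) -> int:
        mats = self.projections + [self.gate]
        return sum(len(m) * len(m[0]) for m in mats)


identity = [[1.0, 0.0], [0.0, 1.0]]
stitch = StitchLayer(projections=[identity, identity], gate=[[0.0, 0.0], [0.0, 0.0]])

h = [1.0, 0.0]                            # seed-model hidden state
expert_states = [[0.5, 0.5], [2.0, 0.0]]  # outputs of two frozen experts
print(stitch(h, expert_states))           # [2.25, 0.25]
print(stitch.num_trainable())             # 12
```

With an all-zero gate, both experts contribute equally; training the gate would let the stitch layer lean on whichever expert's representation is useful for a given input.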
Emergent Modularity
But what if we don’t already know what the experts should be?
BAR starts with predefined capabilities: math, code, tool use, safety. EMO approaches the same modular future from the opposite direction. Instead of assigning experts to domains upfront, it asks whether useful expert structure can emerge during pretraining itself.
The intuition is simple. Tokens from the same document usually belong to the same broad context. A code file, a math proof, a scientific article, and a general web page each carry different patterns of knowledge. EMO uses this document-level structure as a weak signal. During training, each document is routed through a shared pool of experts, encouraging tokens from the same document to rely on similar expert groups. Over time, those groups begin to specialize around broader themes without needing manually labeled domains.
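The document-level constraint can be illustrated with a simplified sketch. Here each document first picks a shared group of experts (the ones with the highest mean router logit across its tokens), and every token then routes only within that group. The hard group selection, the group size, and the logits are all assumptions for illustration; EMO's actual training objective may impose the constraint differently, and in training the router logits would be learned rather than fixed.

```python
def document_expert_group(token_logits: list[list[float]], group_size: int) -> list[int]:
    """Pick the experts with the highest mean router logit over the whole document."""
    n_experts = len(token_logits[0])
    mean = [sum(t[e] for t in token_logits) / len(token_logits) for e in range(n_experts)]
    return sorted(range(n_experts), key=lambda e: mean[e], reverse=True)[:group_size]


def route_document(token_logits: list[list[float]], group_size: int, top_k: int):
    """Every token picks its top-k experts, but only from the document's shared group."""
    group = document_expert_group(token_logits, group_size)
    routes = [
        sorted(sorted(group, key=lambda e: logits[e], reverse=True)[:top_k])
        for logits in token_logits
    ]
    return group, routes


# Made-up router logits for a 3-token document over 6 experts.
doc = [
    [2.0, 0.1, 1.5, 0.0, 1.8, 0.2],
    [1.0, 0.3, 2.2, 0.1, 0.5, 2.5],  # on its own, this token would pick experts 5 and 2
    [1.9, 0.0, 1.1, 0.4, 1.6, 0.3],
]
group, routes = route_document(doc, group_size=3, top_k=2)
print(group)   # [0, 2, 4]
print(routes)  # [[0, 4], [0, 2], [0, 4]]
```

The second token is pulled away from expert 5 toward the document's shared group. Repeated over many documents, that kind of pressure is what lets expert groups line up with broad themes without manual domain labels.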
The result is a model that behaves less like one undifferentiated MoE and more like a system with recoverable expert subsets. In the paper’s experiments, EMO retains nearly full performance when using only part of its expert library: keeping 25% of experts leads to about a 1% absolute performance drop, while keeping 12.5% leads to about a 3% drop.
| EMO expert subset used | Experts removed | Reported performance drop |
|---|---|---|
| 100% | 0% | 0% |
| 25% | 75% | About 1% absolute |
| 12.5% | 87.5% | About 3% absolute |
That matters because it changes what an MoE can be used for. The point is not only sparse activation during inference. It is the possibility of identifying smaller expert groups that carry useful capabilities. BAR builds an expert library deliberately; EMO shows that, under the right training pressure, part of that library can emerge from the data itself.
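One simple way to use that property is to keep only the experts a workload actually relies on. The sketch below is our own illustration, not EMO's procedure: it counts how often each expert is selected on a sample of domain data, keeps the most-used fraction (25% or 12.5%, matching the settings in the table), and renormalizes the router over the retained experts. The routing log and logits are made up, and how much quality survives has to be measured, as the paper does.

```python
import math
from collections import Counter


def select_subset(routing_log: list[list[int]], n_experts: int, keep_fraction: float) -> list[int]:
    """Keep the experts chosen most often on a sample of domain data."""
    counts = Counter(e for chosen in routing_log for e in chosen)
    k = max(1, round(n_experts * keep_fraction))
    most_used = sorted(range(n_experts), key=lambda e: counts[e], reverse=True)[:k]
    return sorted(most_used)


def restricted_router(logits: list[float], subset: list[int]) -> dict[int, float]:
    """Softmax over the retained experts only, so routing mass is renormalized."""
    m = max(logits[e] for e in subset)
    weights = {e: math.exp(logits[e] - m) for e in subset}
    z = sum(weights.values())
    return {e: w / z for e, w in weights.items()}


# Made-up top-2 routing decisions for six tokens of a domain sample, 8 experts total.
log = [[1, 5], [1, 5], [5, 2], [1, 3], [5, 1], [0, 1]]
print(select_subset(log, n_experts=8, keep_fraction=0.25))   # [1, 5]
print(select_subset(log, n_experts=8, keep_fraction=0.125))  # [1]

probs = restricted_router([0.1, 2.0, 0.3, 0.2, 0.0, 1.0, 0.1, 0.0], [1, 5])
print({e: round(p, 3) for e, p in probs.items()})  # {1: 0.731, 5: 0.269}
```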
Connito’s Place in the Modular Stack
The research direction is becoming clear: models are starting to separate into reusable capabilities. The next question is how those capabilities become operational.
An expert is only useful if a team can find it, understand what it does, trust its performance, and decide when to reuse it. Without that layer, modularity risks becoming another internal training artifact: technically interesting, but hard to manage as a product system.
This is where Connito fits into the modular model movement. Connito is building the infrastructure around expert discovery, training, validation, and reuse. Rather than treating every customer request as a new model project, Connito helps identify the specialist capability a task actually needs, route training through a distributed Bittensor network, evaluate the resulting expert, and fold successful work back into a growing capability library.
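The "evaluate, then fold back" step can be sketched as a promotion gate. Everything here is hypothetical: the class names, eval-suite names, and thresholds are illustrative choices, this is not Connito's API, and the sketch leaves out how training is distributed over the Bittensor network. The point is the policy: a candidate expert joins the library only if it improves its target domain without regressing the others.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    domain: str               # the capability this expert was trained for
    scores: dict[str, float]  # eval-suite name -> score (hypothetical suites)


@dataclass
class CapabilityLibrary:
    """Hypothetical library that admits an expert only after it passes validation."""

    baseline: dict[str, float]   # current system's score on each eval suite
    min_gain: float = 0.5        # required improvement on the target domain
    max_regression: float = 0.2  # largest drop tolerated on any other suite
    experts: dict[str, Candidate] = field(default_factory=dict)

    def validate(self, c: Candidate) -> bool:
        gain = c.scores[c.domain] - self.baseline[c.domain]
        drops = [self.baseline[s] - c.scores[s] for s in self.baseline if s != c.domain]
        return gain >= self.min_gain and all(d <= self.max_regression for d in drops)

    def submit(self, c: Candidate) -> bool:
        """Fold a validated expert back into the library; reject it otherwise."""
        if not self.validate(c):
            return False
        self.experts[c.name] = c
        return True


library = CapabilityLibrary(baseline={"legal": 60.0, "code": 70.0, "safety": 90.0})

legal_v1 = Candidate("legal-v1", "legal", {"legal": 66.0, "code": 69.9, "safety": 90.0})
legal_v2 = Candidate("legal-v2", "legal", {"legal": 68.0, "code": 70.0, "safety": 87.0})

print(library.submit(legal_v1))  # True: clear gain, negligible regression
print(library.submit(legal_v2))  # False: safety drops by 3 points
print(sorted(library.experts))   # ['legal-v1']
```

Note that `legal-v2` scores higher on its own domain yet is rejected; making regressions on other suites a hard constraint is one way to keep a growing library trustworthy.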
The goal is not only to make individual models better. It is to make model improvement more cumulative. A legal expert, a coding expert, or a customer-specific domain expert should not vanish after one deployment. It should become something that can be tested, governed, combined with other experts, and improved as new demand appears.
For Connito, the opportunity is to turn modular model behavior into a repeatable product layer. The checkpoint still matters, but the durable asset is the system around it: the expert library, the validation process, and the network that keeps expanding what the library can do.