3 posts tagged with "Distributed Training"

Topics covering distributed ML training, parallelism strategies, and communication efficiency.

TEFT: Targeted Expert Fine-Tuning — How We Reduce Communication Overhead by Orders of Magnitude

· 7 min read
Research & Engineering

This post introduces TEFT (Targeted Expert Fine-Tuning) — the protocol at the core of BlockZero — and explains how it achieves communication-efficient, quality-gated distributed MoE fine-tuning over a permissionless network. It is the research paper translated into blog form.

Why We Built BlockZero on Mixture-of-Experts (And Why Data Parallel Would Have Failed)

· 6 min read
Research & Engineering

The single most important architectural decision in BlockZero's design was the choice of expert parallelism over data-parallel or pipeline-parallel training. This post explains why the other approaches would have failed — and why MoE expert parallelism is uniquely suited to decentralized networks.