3 posts tagged with "Distributed Training"

Topics covering distributed ML training, parallelism strategies, and communication efficiency.

TEFT: Targeted Expert Fine-Tuning — How We Reduce Communication Overhead by Orders of Magnitude

· 7 min read
Research & Engineering

This post introduces TEFT (Targeted Expert Fine-Tuning) — the protocol at the core of BlockZero — and explains how it achieves communication-efficient, quality-gated distributed MoE fine-tuning over a permissionless network. It is the research paper translated into blog form.

Why We Built BlockZero on Mixture-of-Experts (And Why Data Parallel Would Have Failed)

· 6 min read
Research & Engineering

The single most important architectural decision in BlockZero's design was the choice of expert parallelism over data-parallel or pipeline-parallel training. This post explains why the other approaches would have failed — and why MoE expert parallelism is uniquely suited to decentralized networks.