build on the architecture of the original Hibiki but introduce a new training method based on reinforcement learning (RL). While Hibiki relied on complex heuristics to create aligned synthetic data, Hibiki-Zero only requires sentence-level alignment and learns word-level alignments through RL. This simplifies the creation of synthetic data, reduces latency, and scales readily to multiple languages.
story on X: https://x.com/kyutai_labs/status/2022007408898511113
github: https://github.com/kyutai-labs/hibiki-zero
arXiv: https://arxiv.org/abs/2602.11072