Passing rate on TB2 vs TB2-Fn, neutral tasks only (n = 55), by model, pooled over Terminus 2, Kira, and the model's native CLI agent. The red bracket marks the TB2 → TB2-Fn drop.
† Error bars show the margin of error (MOE): 95% confidence interval half-width (±1.96σ).