All harnesses

Pass rates on the original task (TB2), TB2-Fn-I (strict difficulty-invariant set), TB2-Fn-C (difficulty-controlled set), and TB2-Fn. Red deltas show the change in performance relative to the original TB2. † Error bars show the margin of error (MOE): the 95% CI half-width (±1.96σ).
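
As a rough illustration of the ±1.96σ half-width, the sketch below computes a margin of error for a pass rate. It assumes σ is the standard error of a binomial proportion over independent trials, which the caption does not state; the function name `pass_rate_moe` and the counts in the usage line are hypothetical and are not taken from the reported results.

```python
import math


def pass_rate_moe(passes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Return (pass_rate, margin_of_error) for a pass rate.

    Assumption: sigma is the binomial standard error sqrt(p * (1 - p) / n),
    with independent trials. The source only states the half-width is 1.96 * sigma.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= passes <= trials:
        raise ValueError("passes must be between 0 and trials")
    rate = passes / trials
    # Standard error of the observed proportion (assumed definition of sigma).
    sigma = math.sqrt(rate * (1.0 - rate) / trials)
    # Half-width of the 95% confidence interval.
    return rate, z * sigma


# Hypothetical counts, for illustration only.
rate, moe = pass_rate_moe(passes=50, trials=100)
print(f"pass rate = {rate:.3f} +/- {moe:.3f}")
```

If the error bars were instead computed from the spread across repeated runs, σ would be the standard error of the mean over those runs, and only the `sigma` line would change.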