Comparison by task

Mean reward (quality) and mean cost in USD for each task, over n = 3 trials. Gap = Sonnet with Sourcegraph minus Fable on a plain checkout, in quality only; rows are sorted by gap, largest first. The Average row reports each arm's mean across the 9 tasks, plus the mean quality and cost differences of Sonnet + Sourcegraph (MCP) relative to Fable.

| Task | Sonnet + Sourcegraph quality | Sonnet + Sourcegraph cost | Fable, without MCP quality | Fable, without MCP cost | Gap (quality) |
|---|---|---|---|---|---|
| ccx-crossorg-217 | 0.695 | $0.46 | 0.104 | $1.07 | +0.591 |
| ccx-vuln-remed-135 | 0.611 | $0.23 | 0.231 | $0.93 | +0.380 |
| ccx-migration-289 | 0.681 | $0.85 | 0.574 | $1.32 | +0.107 |
| ccx-agentic-223 | 0.625 | $0.56 | 0.537 | $0.99 | +0.088 |
| ccx-migration-274 | 1.000 | $0.94 | 0.928 | $1.09 | +0.072 |
| ccx-vuln-remed-126 | 0.759 | $1.34 | 0.735 | $0.84 | +0.024 |
| ccx-config-trace-010 | 1.000 | $0.18 | 1.000 | $0.46 | 0.000 |
| ccx-incident-145 | 0.172 | $0.55 | 0.172 | $1.19 | 0.000 |
| ccx-crossorg-288 | 0.551 | $1.25 | 0.833 | $1.45 | -0.282 |
| Average (9 tasks) | 0.677 | $0.71 | 0.568 | $1.04 | +0.109 reward; -$0.33 cost |
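The Average row can be reproduced from the per-task figures. What follows is a minimal Python sketch that assumes only the rounded values displayed in the table; the underlying per-trial rewards are not reproduced here. The helper names `arm_means` and `gaps` are hypothetical. The sketch also shows why the averaged gap equals the difference of the two arm means. Both arms are scored on the same nine tasks, so the mean of the per-task differences equals the difference of the means.

```python
# Per-task means copied from the table: task id -> (quality, USD cost).
# These are the rounded figures as displayed, so recomputed averages could in
# principle differ from the page's in the last digit.
sonnet_mcp = {
    "ccx-crossorg-217": (0.695, 0.46),
    "ccx-vuln-remed-135": (0.611, 0.23),
    "ccx-migration-289": (0.681, 0.85),
    "ccx-agentic-223": (0.625, 0.56),
    "ccx-migration-274": (1.000, 0.94),
    "ccx-vuln-remed-126": (0.759, 1.34),
    "ccx-config-trace-010": (1.000, 0.18),
    "ccx-incident-145": (0.172, 0.55),
    "ccx-crossorg-288": (0.551, 1.25),
}
fable = {
    "ccx-crossorg-217": (0.104, 1.07),
    "ccx-vuln-remed-135": (0.231, 0.93),
    "ccx-migration-289": (0.574, 1.32),
    "ccx-agentic-223": (0.537, 0.99),
    "ccx-migration-274": (0.928, 1.09),
    "ccx-vuln-remed-126": (0.735, 0.84),
    "ccx-config-trace-010": (1.000, 0.46),
    "ccx-incident-145": (0.172, 1.19),
    "ccx-crossorg-288": (0.833, 1.45),
}


def arm_means(rows):
    """Hypothetical helper: mean quality and mean USD cost across tasks for one arm."""
    qualities = [q for q, _ in rows.values()]
    costs = [c for _, c in rows.values()]
    return sum(qualities) / len(qualities), sum(costs) / len(costs)


def gaps(a, b):
    """Hypothetical helper: per-task quality gap (arm a minus arm b), matched by task id."""
    return {task: a[task][0] - b[task][0] for task in a}


q_a, c_a = arm_means(sonnet_mcp)
q_b, c_b = arm_means(fable)
per_task_gap = gaps(sonnet_mcp, fable)
mean_gap = sum(per_task_gap.values()) / len(per_task_gap)

print(f"Sonnet + Sourcegraph: quality {q_a:.3f}, cost ${c_a:.2f}")
print(f"Fable, without MCP:   quality {q_b:.3f}, cost ${c_b:.2f}")
# Same task set in both arms, so mean_gap equals q_a - q_b (up to float error).
print(f"delta: {mean_gap:+.3f} reward, {c_a - c_b:+.2f} USD cost")
```

Run as-is, it prints the same 0.677 / $0.71, 0.568 / $1.04 and +0.109 reward / -0.33 USD figures as the Average row.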