Replay playground
Step through a real recorded agent solve — issue → localized patch → deterministic test verdict → public trace. Replay-first: no live model call, no quota risk; the run was scored once, deterministically, and is reproducible.
🟢 Replay mode — these are real, previously-graded runs from the leaderboard. (A guarded
live runner on a Hugging Face Space is the optional, rate-limited counterpart.)
Loading recorded runs…