ForgeJudge

Replay playground

Step through a real recorded agent solve — issue → localized patch → deterministic test verdict → public trace. Replay-first: no live model call, no quota risk; the run was scored once, deterministically, and is reproducible.

🟢 Replay mode — these are real, previously-graded runs from the leaderboard. (A guarded live runner on a Hugging Face Space is the optional, rate-limited counterpart.)

Loading recorded runs…