Frontier models have rendered traditional open Capture The Flag events ineffective. A recent Hacker News thread with 188 points and 153 comments documents how current LLMs complete most public challenges without human intervention.
What Happened to Open CTF
Organizers released standard web, crypto, and reverse-engineering tasks. Models such as Claude 3.5 Sonnet and GPT-4o completed the majority of these within minutes. Participants reported that flag submission rates for unaided teams dropped while AI-assisted entries rose sharply.
The core issue is scale. One model can test hundreds of payload variations per minute, removing the time pressure that once separated skilled players from novices.
Technical Breakdown
LLMs now chain tools directly. They call Ghidra for disassembly, run z3 solvers for constraints, and craft SQL injection strings without manual debugging. Success rates on medium-difficulty challenges reached 70-85 % in the reported tests.
Earlier models required heavy scaffolding. Current frontier systems need only a short system prompt listing available tools and a copy of the challenge description.
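To give a sense of how little scaffolding that is, the following minimal sketch builds such a prompt from a tool list and routes a model's tool request to a local handler. Everything named here is hypothetical and not taken from the reported tests: the `TOOLS` registry, the JSON request shape, and the stub handlers, which stand in for real Ghidra or z3 invocations.

```python
import json
from typing import Callable, Dict, Tuple

# Stub handlers (assumption): a real scaffold would shell out to Ghidra, z3, etc.
def disassemble(arg: str) -> str:
    return f"[stub] disassembly of {arg}"

def solve_constraints(arg: str) -> str:
    return f"[stub] solver result for {arg}"

# Hypothetical tool registry: name -> (description, handler).
TOOLS: Dict[str, Tuple[str, Callable[[str], str]]] = {
    "disassemble": ("Disassemble a binary at the given path", disassemble),
    "solve": ("Solve a constraint expression", solve_constraints),
}

def build_system_prompt(challenge: str) -> str:
    """List the available tools, then append the challenge description."""
    lines = ["You are solving a CTF challenge. Available tools:"]
    for name, (description, _) in TOOLS.items():
        lines.append(f"- {name}: {description}")
    lines.append('Request a tool with JSON: {"tool": <name>, "arg": <string>}')
    lines.append("")
    lines.append("Challenge description:")
    lines.append(challenge)
    return "\n".join(lines)

def dispatch(request_json: str) -> str:
    """Run the tool named in a model's JSON request; reject unknown tools."""
    request = json.loads(request_json)
    entry = TOOLS.get(request.get("tool"))
    if entry is None:
        return f"error: unknown tool {request.get('tool')!r}"
    _, handler = entry
    return handler(str(request.get("arg", "")))
```

In a full agent loop, the output of `dispatch` would be appended to the conversation and the model queried again until it produces a flag; that loop is omitted here because it depends on the model API in use.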
New Formats Already in Testing
Several organizers have shifted to private or AI-prohibited events. These require live video verification or hardware tokens that models cannot access. Others introduced “AI-assisted” tracks that explicitly allow model use and score on solution elegance instead of speed.
| Attribute | Traditional Open CTF | AI-Prohibited Events | AI-Assisted Tracks |
|---|---|---|---|
| Time limit | 48 hours | 6 hours | 48 hours |
| Verification | Flag submission | Video + token | Code review |
| Typical winner | Top 5 % humans | Top 10 % humans | Hybrid teams |
| Model success rate | 70-85 % | <10 % | N/A (model use allowed) |
Pros and Cons of the Shift
- Public challenges now serve mainly as benchmarks rather than competitions.
- New verification methods raise setup costs for organizers.
- Hybrid events reward prompt engineering alongside traditional skills.
- Smaller CTF clubs lose visibility when events move behind closed doors.
Who Should Adapt
CTF organizers running public leaderboards should move to private qualifiers or hardware-bound rounds within the next two seasons. Security researchers who use CTFs for training can continue with public sets but must treat them as model benchmarks, not skill measures. Students preparing for job interviews should still practice on platforms such as Hack The Box and picoCTF, bearing in mind that many public write-ups for these tasks now contain model-generated solutions.
Practical Next Steps
Event creators can start with lightweight changes: require a short video of the final exploit or add a one-time hardware token. Tool builders should publish updated agent scaffolds that integrate Ghidra and z3 so teams can focus on higher-level strategy.
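As one possible shape for the hardware-token check, the sketch below verifies a one-time code on the server side. It assumes the token is a counter-based HOTP device as specified in RFC 4226; the article does not say which token type organizers use, and `verify_submission`, its `window` parameter, and the per-team counter bookkeeping are illustrative names rather than any platform's API.

```python
import hashlib
import hmac
import struct
from typing import Optional

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Compute an RFC 4226 HOTP code for the given counter value."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def verify_submission(secret: bytes, last_counter: int, code: str,
                      window: int = 3) -> Optional[int]:
    """Return the matching counter if the code is valid, else None.

    Only counters after last_counter are checked, so a code that was
    already accepted cannot be replayed. The small look-ahead window
    tolerates button presses that never reached the server.
    """
    for counter in range(last_counter + 1, last_counter + 1 + window):
        if hmac.compare_digest(hotp(secret, counter), code):
            return counter
    return None
```

On a successful check, the organizer would store the returned counter as the team's new `last_counter` alongside the accepted flag. The shared secret has to be provisioned onto each token before the event, which is part of the setup cost noted earlier.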
Bottom line: Open CTF as a public, unaided competition is no longer viable; organizers must choose between verification overhead and explicit AI-inclusive scoring.
Frontier models will continue to compress the time required for standard exploitation tasks, pushing the community toward either closed events or entirely new challenge categories that models cannot yet address.
