Darwinian natural selection breeds AI browser agents that survive hostile websites. Can evolution teach an AI to avoid traps?

| Metric | Naive | Evolved | Delta (percentage points) |
|---|---|---|---|
| Task Completion | 50% | 100% | +50 |
| Efficiency | 50% | 80% | +30 |
| Resilience (LLM Judge) | 60% | 100% | +40 |
| Strategy (LLM Judge) | 30% | 90% | +60 |
| Composite Fitness | 47.5% | 94.5% | +47 |
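
The table does not say how Composite Fitness is derived from the four component scores. As a rough sketch only, the function below computes a weighted mean, defaulting to equal weights (an assumption, as are the function and key names). Equal weights reproduce the naive composite of 47.5 but give 92.5 for the evolved agent rather than the reported 94.5, so the project's actual weighting presumably differs.

```python
from typing import Optional


def composite_fitness(scores: dict[str, float],
                      weights: Optional[dict[str, float]] = None) -> float:
    """Weighted mean of component scores (percent).

    Equal weights are an assumption made for illustration; the project's
    real weighting is not given in this README.
    """
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Component scores copied from the results table above.
naive = {"task_completion": 50, "efficiency": 50, "resilience": 60, "strategy": 30}
evolved = {"task_completion": 100, "efficiency": 80, "resilience": 100, "strategy": 90}

print(composite_fitness(naive))    # equal-weight mean of the naive column
print(composite_fitness(evolved))  # equal-weight mean of the evolved column
```

Passing an explicit `weights` dictionary lets you test other weightings against the reported composite.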
The Gauntlet is a hostile e-commerce site with 4 difficulty levels. At Level 3 (Nightmare):
1. Create random 6-gene browser agent genomes. Each gene controls a different skill.
2. Run agents against The Gauntlet. Gemini 3 Flash judges their screenshots.
3. Kill the weak. Survivors breed via crossover + mutation. LLM evolves genes.
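
The three steps above can be sketched as a small generational loop. This is a minimal illustration, not the project's engine. The gene names and values, population sizes, and mutation rate are all hypothetical. The fitness function is a toy stand-in for running a browser agent and having the LLM judge score it, and the LLM-driven gene evolution step is left out.

```python
import random
from typing import Callable

# Hypothetical gene names and allowed values; the real six genes are not listed here.
GENE_POOL: dict[str, list[str]] = {
    "popup_handling": ["ignore", "close_first", "escape_key"],
    "navigation": ["click_first_link", "read_then_click", "use_search"],
    "form_filling": ["fast", "validate_each_field"],
    "trap_detection": ["none", "check_hidden_elements", "verify_urls"],
    "retry_policy": ["give_up", "retry_once", "retry_with_backoff"],
    "checkout": ["first_button", "confirm_total_first"],
}

Genome = dict[str, str]


def random_genome(rng: random.Random) -> Genome:
    """Step 1: pick a random value for each of the six genes."""
    return {gene: rng.choice(values) for gene, values in GENE_POOL.items()}


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Uniform crossover: each gene is inherited from one parent at random."""
    return {gene: rng.choice((a[gene], b[gene])) for gene in GENE_POOL}


def mutate(genome: Genome, rng: random.Random, rate: float = 0.1) -> Genome:
    """With probability `rate`, resample a gene from its allowed values."""
    child = dict(genome)
    for gene, values in GENE_POOL.items():
        if rng.random() < rate:
            child[gene] = rng.choice(values)
    return child


def evolve(fitness: Callable[[Genome], float], pop_size: int = 8,
           generations: int = 5, survivors: int = 4, seed: int = 0) -> Genome:
    """Steps 2 and 3: score every genome, keep the fittest, breed replacements."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:survivors]  # "kill the weak"
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            children.append(mutate(crossover(mother, father, rng), rng))
        population = parents + children
    return max(population, key=fitness)


# Toy fitness: count genes matching an arbitrary target genome.
TARGET: Genome = {gene: values[-1] for gene, values in GENE_POOL.items()}


def toy_fitness(genome: Genome) -> float:
    return float(sum(genome[gene] == TARGET[gene] for gene in GENE_POOL))


best = evolve(toy_fitness)
print(best, toy_fitness(best))
```

Because the surviving parents are carried into the next generation unchanged, the best score in this sketch never decreases from one generation to the next.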

| Component | Technology |
|---|---|
| Agent Brain | Gemini 2.5 Flash (via browser-use) |
| LLM Judges | Gemini 3 Flash Preview (multimodal, screenshot-grounded) |
| Evolution Engine | Custom Darwinian system (crossover + mutation + LLM gene evolution) |
| Observability | W&B Weave (full trace lineage) |
| Gauntlet | Flask hostile website (4 mutation levels) |
| Deployment | Vercel (gauntlet) + Railway (agent) |

Launch real browser agents against the Nightmare Gauntlet (Level 3). Watch the naive agent get trapped while the evolved agent completes checkout.