Gemini/DeepMind Competition - BrowserUse Track

Browser Evolution

Darwinian natural selection breeds AI browser agents that survive hostile websites. Can evolution teach an AI to avoid traps?

Head-to-Head: Naive vs Evolved

Naive Prompt: 47.5% composite fitness, FAIL (30 actions, got trapped)
Evolved Genome: 94.5% composite fitness, PASS (16 actions, reached ORDER_CONFIRMED)
Metric                   Naive   Evolved   Delta (pts)
Task Completion          50%     100%      +50
Efficiency               50%     80%       +30
Resilience (LLM Judge)   60%     100%      +40
Strategy (LLM Judge)     30%     90%       +60
Composite Fitness        47.5%   94.5%     +47
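The page does not say how the four metrics combine into Composite Fitness. A plain average reproduces the naive agent's 47.5% but gives 92.5% for the evolved agent, so the weighting is evidently unequal. The Python sketch below assumes a weighted sum. The weights (0.35, 0.15, 0.25, 0.25) and the metric keys are hypothetical: they were chosen only because they reproduce both rows of the table, and other weightings would too.

```python
# Hypothetical weights: the page does not publish its fitness formula.
# These values are one choice that reproduces both rows of the table above.
WEIGHTS = {
    "task_completion": 0.35,
    "efficiency": 0.15,
    "resilience": 0.25,  # LLM-judged
    "strategy": 0.25,    # LLM-judged
}


def composite_fitness(scores: dict) -> float:
    """Weighted sum of per-metric scores, each given in percent (0-100)."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


naive = {"task_completion": 50, "efficiency": 50, "resilience": 60, "strategy": 30}
evolved = {"task_completion": 100, "efficiency": 80, "resilience": 100, "strategy": 90}

print(round(composite_fitness(naive), 1))    # 47.5
print(round(composite_fitness(evolved), 1))  # 94.5
```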

The Gauntlet - A Website That Fights Back

The Gauntlet is a hostile e-commerce site with 4 difficulty levels. At Level 3 (Nightmare):

  • 💥 4 trap buttons that look like the real, prominent calls to action: all lead to dead-end error pages
  • 🔎 Real buttons are tiny grey links: barely visible and easily missed
  • 📐 Misleading form labels: the field labeled "Phone Number" actually expects an email address
  • 📦 Fake success page: shows "PAYMENT_PENDING" instead of an order confirmation
  • 💬 Popup overlays: newsletter modals with a tempting "GET MY DISCOUNT" button
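The misleading-label trap is why an agent should read an input's name attribute rather than its visible label. As a minimal illustration of that principle (not the agent's actual code, which drives the page through browser-use), the stdlib-only sketch below parses a hypothetical form whose "Phone Number" label sits on an input named `email` and compares label-based lookup with name-based lookup. The form markup and the helper names `find_by_label` and `find_field` are assumptions.

```python
from __future__ import annotations

from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collects <label for=...> text and <input> attributes from a form."""

    def __init__(self) -> None:
        super().__init__()
        self.labels: dict[str, str] = {}  # input id -> visible label text
        self.inputs: list[dict[str, str]] = []
        self._label_for: str | None = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "label":
            self._label_for = attr_map.get("for")
        elif tag == "input":
            # Boolean attributes arrive with value None; normalise to "".
            self.inputs.append({k: v or "" for k, v in attr_map.items()})

    def handle_endtag(self, tag):
        if tag == "label":
            self._label_for = None

    def handle_data(self, data):
        if self._label_for:
            prev = self.labels.get(self._label_for, "")
            self.labels[self._label_for] = (prev + " " + data.strip()).strip()


def _parse(html: str) -> FormFieldCollector:
    parser = FormFieldCollector()
    parser.feed(html)
    parser.close()
    return parser


def find_by_label(html: str, label_text: str) -> str | None:
    """Naive strategy: trust the visible label (falls for the trap)."""
    parser = _parse(html)
    for field in parser.inputs:
        if parser.labels.get(field.get("id", ""), "").lower() == label_text.lower():
            return field.get("name")
    return None


def find_field(html: str, purpose: str) -> str | None:
    """Trap-aware strategy: match on the input's name attribute instead."""
    for field in _parse(html).inputs:
        if purpose.lower() in field.get("name", "").lower():
            return field["name"]
    return None


# Hypothetical Gauntlet-style form: visible labels disagree with the input names.
FORM = """
<form>
  <label for="f1">Phone Number</label> <input id="f1" name="email" type="text">
  <label for="f2">Email</label> <input id="f2" name="phone" type="text">
</form>
"""

print(find_by_label(FORM, "Email"))  # phone  (wrong field)
print(find_field(FORM, "email"))     # email  (right field)
```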

How Evolution Works

1. Spawn Population
Create a population of random browser-agent genomes, each with 6 genes; each gene controls a different skill.

2. Evaluate & Judge
Run each agent against The Gauntlet; Gemini 3 Flash judges the screenshots from its run.

3. Cull & Breed
Cull the weakest agents. Survivors breed via crossover and mutation, and an LLM evolves the genes themselves.
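The page describes the engine only as crossover + mutation + LLM gene evolution, scored by running agents against The Gauntlet. The Python sketch below shows the spawn / evaluate / cull / breed loop under loud assumptions: each gene picks from a small fixed pool of placeholder strategy strings, random resampling stands in for the LLM gene-evolution step, and `toy_fitness` is a hypothetical stand-in for a Gauntlet run plus Gemini judging. The gene names follow the six genes listed on this page.

```python
import random

GENES = ["navigation", "element_selection", "error_recovery",
         "distraction_handling", "form_interaction", "verification"]

# Hypothetical allele pools. In the real system an LLM evolves each gene;
# here each gene simply picks one of four placeholder strategy strings.
ALLELES = {gene: [f"{gene}-strategy-{i}" for i in range(4)] for gene in GENES}


def random_genome(rng):
    return {gene: rng.choice(ALLELES[gene]) for gene in GENES}


def crossover(a, b, rng):
    # Uniform crossover: each gene is inherited from either parent with p = 0.5.
    return {gene: (a if rng.random() < 0.5 else b)[gene] for gene in GENES}


def mutate(genome, rng, rate=0.1):
    # Each gene is resampled with probability `rate` (stand-in for LLM evolution).
    return {gene: rng.choice(ALLELES[gene]) if rng.random() < rate else allele
            for gene, allele in genome.items()}


def toy_fitness(genome):
    # Stand-in for "run the agent on The Gauntlet and have Gemini judge it":
    # a gene scores 1 if it holds the last strategy in its pool.
    return sum(allele.endswith("-3") for allele in genome.values())


def evolve(fitness, pop_size=8, survivors=4, generations=5, seed=0):
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]   # 1. spawn
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)               # 2. evaluate
        parents = population[:survivors]                         # 3. cull
        children = [mutate(crossover(*rng.sample(parents, 2), rng), rng)
                    for _ in range(pop_size - survivors)]        # 3. breed
        population = parents + children
    return max(population, key=fitness)


best = evolve(toy_fitness, generations=10)
print(toy_fitness(best), "of", len(GENES))
```

Keeping the survivors unchanged (elitism) means the best fitness never decreases from one generation to the next; whether the real engine does this is not stated.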

The Evolved Genome - 6 Genes

  • Navigation: goal decomposition with alternative-path scanning
  • Element Selection: semantic understanding plus trap awareness (large, prominent buttons are treated as likely traps)
  • Error Recovery: adaptive strategy switching and trap-page detection
  • Distraction Handling: immediate popup dismissal via the modal's dismiss links
  • Form Interaction: relies on field name attributes rather than misleading labels
  • Verification: defensive checking of the final page, including detection of the fake PAYMENT_PENDING status
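The Verification gene's job is to refuse to trust a success-looking screen. Below is a minimal sketch of that check, assuming the agent can read the final page's text. The ORDER_CONFIRMED and PAYMENT_PENDING markers come from this page; the trap-page patterns and the `verify_checkout` name are hypothetical placeholders.

```python
import re

# Markers named on this page.
SUCCESS = "ORDER_CONFIRMED"
FAKE_SUCCESS = "PAYMENT_PENDING"
# Hypothetical placeholders, not the Gauntlet's real error-page text.
TRAP_PATTERNS = [r"\berror\b", r"page not found", r"dead end"]


def verify_checkout(page_text: str) -> str:
    """Classify the final page instead of trusting any success-looking screen."""
    # Check the fake marker first: a page showing PAYMENT_PENDING is never a win.
    if FAKE_SUCCESS in page_text:
        return "fake_success"
    if SUCCESS in page_text:
        return "confirmed"
    if any(re.search(p, page_text, re.IGNORECASE) for p in TRAP_PATTERNS):
        return "trap_page"
    return "unknown"


for text in ["Thank you! Status: PAYMENT_PENDING",
             "Order ORDER_CONFIRMED #123",
             "Error: something went wrong"]:
    print(verify_checkout(text))
```

An agent would treat anything other than `"confirmed"` as a cue to back out and try another path rather than report success.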

Tech Stack

  • Agent Brain: Gemini 2.5 Flash (via browser-use)
  • LLM Judges: Gemini 3 Flash Preview (multimodal, screenshot-grounded)
  • Evolution Engine: custom Darwinian system (crossover + mutation + LLM gene evolution)
  • Observability: W&B Weave (full trace lineage)
  • Gauntlet: hostile Flask website (4 mutation levels)
  • Deployment: Vercel (Gauntlet) + Railway (agent)

Live Demo — Run Agents Now

Launch real browser agents against the Nightmare Gauntlet (Level 3). Watch the naive agent get trapped while the evolved agent completes checkout.
