Snake AI

Trained with Proximal Policy Optimization


This agent learned to play Snake through 37.1 million timesteps of reinforcement learning on a 20×20 grid. The model achieves a best score of 71 cells (17.8% grid fill) and an average of 32 cells across continuous play.
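The grid-fill percentages follow directly from the board size; a quick check on the 20×20 (400-cell) grid:

```python
# Sanity-check the reported grid-fill figures on a 20x20 board (400 cells).
GRID_CELLS = 20 * 20

best_fill = 71 / GRID_CELLS * 100   # best score of 71 cells
avg_fill = 32 / GRID_CELLS * 100    # average score of 32 cells

print(f"best fill: {best_fill:.1f}%")   # rounds to 17.8%, matching the reported stat
print(f"avg fill: {avg_fill:.1f}%")     # 8.0%
```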

Training was conducted directly on the target grid size without curriculum learning, using 8 parallel environments and PPO with entropy regularization to encourage exploration.
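The core of this setup is PPO's clipped surrogate objective plus an entropy bonus. A minimal numpy sketch of that loss is below; the `clip_eps` and `ent_coef` values are illustrative defaults, not hyperparameters taken from this training run:

```python
import numpy as np

def ppo_loss(logp_new, logp_old, advantages, probs, clip_eps=0.2, ent_coef=0.01):
    """Clipped PPO policy loss with an entropy bonus (returned as a loss to minimize)."""
    ratio = np.exp(logp_new - logp_old)                       # pi_new(a|s) / pi_old(a|s)
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps)
    policy_obj = np.minimum(ratio * advantages, clipped * advantages)
    entropy = -np.sum(probs * np.log(probs + 1e-8), axis=-1)  # per-state policy entropy
    # PPO maximizes the clipped objective plus ent_coef * entropy;
    # negating turns it into a loss for a gradient-descent optimizer.
    return -(policy_obj.mean() + ent_coef * entropy.mean())
```

The entropy term is what "entropy regularization to encourage exploration" refers to: a uniform policy has maximal entropy, so the bonus penalizes premature collapse onto a single action.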

Best Performance: 71 cells
Training Steps: 37.1M
Grid Fill: 17.8%
Architecture: MLP

Architecture

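The summary above lists an MLP policy over feature-based inputs. The source does not specify the feature set or layer sizes, so the shapes below (11 input features, one 64-unit hidden layer, 4 actions) are purely illustrative assumptions:

```python
import numpy as np

# Hypothetical feature-based MLP policy: the source only states "MLP", so the
# input features and hidden width below are illustrative assumptions.
rng = np.random.default_rng(0)

N_FEATURES = 11   # e.g. danger flags, heading, food direction (assumed encoding)
HIDDEN = 64       # assumed hidden-layer width
N_ACTIONS = 4     # up, down, left, right

W1 = rng.normal(scale=0.1, size=(N_FEATURES, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, N_ACTIONS))
b2 = np.zeros(N_ACTIONS)

def policy(features):
    """Map a feature vector to a probability distribution over the 4 moves."""
    h = np.tanh(features @ W1 + b1)        # single hidden layer
    logits = h @ W2 + b2
    exp = np.exp(logits - logits.max())    # numerically stable softmax
    return exp / exp.sum()
```

A feature-based input like this (rather than the raw grid) is what bounds the "theoretical ceiling" mentioned below: the policy can only act on information the hand-crafted features expose.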
The agent demonstrates learned strategies including food-seeking behavior, obstacle avoidance, and basic space management. Performance continues to improve with extended training, approaching the theoretical ceiling for feature-based representations.