Swarm
Emergent flocking from local rewards
Move your mouse over the simulation to guide the swarm. Each agent sees only its nearest neighbors, yet the group moves as one—splitting, merging, and flowing like a living fluid.
We trained a single PPO policy shared by every particle. Each agent receives only local information: its own velocity, the direction to an attractor, and the relative positions and velocities of its 7 nearest neighbors. The reward encourages following the attractor while maintaining cohesion and separation; the network must discover how to balance the three.
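Training used PyTorch (see the note at the end of this post). A minimal sketch of what the shared actor-critic network might look like; the hidden sizes and activations are illustrative assumptions, since the post doesn't specify the architecture:

```python
import torch
import torch.nn as nn

OBS_DIM = 32   # local observation size (see "Local Observations" below)
N_ACTIONS = 9  # 8 movement directions + stay

class SharedPolicy(nn.Module):
    """One actor-critic network, shared by every agent in the swarm."""

    def __init__(self, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(OBS_DIM, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.policy_head = nn.Linear(hidden, N_ACTIONS)  # action logits
        self.value_head = nn.Linear(hidden, 1)           # state value for PPO

    def forward(self, obs):
        # obs: (num_agents, 32) -- all agents are evaluated as one batch
        h = self.body(obs)
        return self.policy_head(h), self.value_head(h)
```

Because the policy is shared, every agent's observation can be stacked into a single batch and evaluated in one forward pass.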
The Three Rules
Flocking behavior emerges from three simple local rules, first demonstrated in Craig Reynolds's 1986 boids model:
- Cohesion — Steer toward the average position of neighbors
- Separation — Avoid crowding neighbors
- Alignment — Match the heading of neighbors
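For contrast with the learned approach described next, a hand-coded version of these rules computes three steering vectors per agent. A minimal NumPy sketch, where the weights and separation radius are illustrative (pos and vel are (n, 2) float arrays):

```python
import numpy as np

def boids_step(pos, vel, k=7, sep_radius=0.5, w=(1.0, 1.5, 0.8)):
    """One step of hand-coded Reynolds rules over each agent's k nearest neighbors."""
    steer = np.zeros_like(pos)
    for i in range(len(pos)):
        d = np.linalg.norm(pos - pos[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]                  # k nearest, excluding self
        cohesion = pos[nbrs].mean(axis=0) - pos[i]     # toward neighbor centroid
        close = nbrs[d[nbrs] < sep_radius]             # neighbors inside personal space
        separation = (pos[i] - pos[close]).sum(axis=0) # push away from crowders
        alignment = vel[nbrs].mean(axis=0) - vel[i]    # match neighbor velocity
        steer[i] = w[0] * cohesion + w[1] * separation + w[2] * alignment
    return steer                                       # acceleration to apply to vel
```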
Rather than hardcoding these rules, we provide them as reward signals and let the neural network learn the appropriate response to any local configuration.
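The post names the reward components but not their functional form. A minimal sketch of one plausible shaping; the weights, separation radius, and exact terms are assumptions:

```python
import numpy as np

def local_reward(agent_pos, agent_vel, attractor, nbr_pos, sep_radius=0.5,
                 w_follow=1.0, w_cohesion=0.3, w_separation=0.5):
    """Per-agent reward computed from purely local quantities."""
    # Follow the attractor: reward velocity projected onto the attractor direction.
    direction = attractor - agent_pos
    direction = direction / (np.linalg.norm(direction) + 1e-8)
    r_follow = float(np.dot(agent_vel, direction))

    # Cohesion: penalize distance from the centroid of the nearest neighbors.
    r_cohesion = -float(np.linalg.norm(nbr_pos.mean(axis=0) - agent_pos))

    # Separation: penalize neighbors closer than a personal-space radius.
    dists = np.linalg.norm(nbr_pos - agent_pos, axis=1)
    r_separation = -float(np.sum(np.maximum(0.0, sep_radius - dists)))

    return w_follow * r_follow + w_cohesion * r_cohesion + w_separation * r_separation
```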
Local Observations
Each agent observes a 32-dimensional vector containing only local information: its own velocity (2 values), the unit direction to the attractor (2), and the relative position (2) and relative velocity (2) of each of its 7 nearest neighbors, for 2 + 2 + 7 × 4 = 32 values in total.
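A sketch of how such a vector might be assembled. The ordering, and whether neighbor velocities are expressed relative to the agent, are assumptions:

```python
import numpy as np

def build_observation(i, pos, vel, attractor, k=7):
    """Assemble agent i's local observation: 2 + 2 + 7 * 4 = 32 values."""
    direction = attractor - pos[i]
    direction = direction / (np.linalg.norm(direction) + 1e-8)

    d = np.linalg.norm(pos - pos[i], axis=1)
    nbrs = np.argsort(d)[1:k + 1]          # 7 nearest neighbors, excluding self

    return np.concatenate([
        vel[i],                            # own velocity              (2)
        direction,                         # unit vector to attractor  (2)
        (pos[nbrs] - pos[i]).ravel(),      # relative positions        (14)
        (vel[nbrs] - vel[i]).ravel(),      # relative velocities       (14)
    ])
```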
No agent knows the global state. No agent is given explicit rules. Yet from these simple rewards, the shared policy discovers how to create fluid collective motion.
Emergent Properties
Watch the simulation and you'll notice emergent properties:
- Fluid motion — The swarm moves smoothly, with direction changes propagating like waves
- Shape-shifting — The swarm constantly changes shape while maintaining cohesion
- No leader — Any agent can influence the group's direction
- Robustness — Even scattered agents form a cohesive swarm
This demonstration uses a shared PPO policy with discrete actions (8 directions + stay). Training was done with PyTorch; inference runs in-browser with TensorFlow.js.
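The post doesn't say how the 9 discrete actions translate into motion. One plausible decoding treats each action as an acceleration direction with a speed clamp; the constants are assumptions:

```python
import numpy as np

# Nine discrete actions: unit vectors for 8 compass directions, plus "stay".
angles = np.arange(8) * (np.pi / 4)
ACTIONS = np.vstack([np.stack([np.cos(angles), np.sin(angles)], axis=1),
                     [[0.0, 0.0]]])       # index 8 = stay

def apply_action(vel, action_idx, accel=0.1, max_speed=1.0):
    """Accelerate in the chosen direction and clamp speed."""
    vel = vel + accel * ACTIONS[action_idx]
    speed = np.linalg.norm(vel)
    return vel if speed <= max_speed else vel * (max_speed / speed)
```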