
Swarm

Emergent flocking from local rewards

An interactive demonstration

Move your mouse over the simulation to guide the swarm. Each agent sees only its nearest neighbors, yet the group moves as one—splitting, merging, and flowing like a living fluid.

Every particle runs the same PPO-trained policy network, receiving only local information: its velocity, the direction to an attractor, and the relative positions and velocities of its 7 nearest neighbors. The reward encourages following the attractor while maintaining cohesion, alignment, and separation—the neural network discovers how to balance these terms.

The Three Rules

Flocking behavior emerges from three simple local rules, first described by Craig Reynolds in 1986:

Separation: steer to avoid crowding nearby flockmates.
Alignment: steer toward the average heading of nearby flockmates.
Cohesion: steer toward the average position of nearby flockmates.
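
For contrast, a hand-coded boids controller blends one fixed steering vector per rule. The sketch below is purely illustrative: the weights, radius, and function name are arbitrary and not taken from this project.

```python
import numpy as np

def boids_steering(agent_pos, agent_vel, neighbor_pos, neighbor_vel,
                   w_sep=1.5, w_align=1.0, w_coh=1.0, sep_radius=0.05):
    """Classic rule-based boids: one steering vector per rule, blended with fixed weights.
    Weights and radius are arbitrary illustrative values, not tuned for this demo."""
    # Separation: push away from neighbors closer than sep_radius.
    offsets = agent_pos - neighbor_pos
    dists = np.linalg.norm(offsets, axis=1, keepdims=True) + 1e-8
    separation = (offsets / dists * (dists < sep_radius)).sum(axis=0)

    # Alignment: match the neighbors' average velocity.
    alignment = neighbor_vel.mean(axis=0) - agent_vel

    # Cohesion: move toward the neighbors' centroid.
    cohesion = neighbor_pos.mean(axis=0) - agent_pos

    return w_sep * separation + w_align * alignment + w_coh * cohesion
```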

Rather than hardcoding these rules, we provide them as reward signals and let the neural network learn the appropriate response to any local configuration:

reward = attractor_following + cohesion_bonus + alignment_bonus − separation_penalty
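
A minimal sketch of how such a per-agent reward could be computed is shown below; the weight values, separation radius, and helper names are assumptions for illustration, not the trained configuration.

```python
import numpy as np

def compute_reward(agent_pos, agent_vel, attractor_pos, neighbor_pos, neighbor_vel,
                   w_attract=1.0, w_cohesion=0.3, w_align=0.3, w_separate=0.5,
                   sep_radius=0.05):
    """Per-agent reward combining attractor following with the flocking terms.
    Weights, the separation radius, and all names are illustrative assumptions."""
    # Attractor following: reward velocity projected onto the direction of the attractor.
    to_attractor = attractor_pos - agent_pos
    to_attractor = to_attractor / (np.linalg.norm(to_attractor) + 1e-8)
    attractor_following = float(np.dot(agent_vel, to_attractor))

    # Cohesion bonus: closer to the neighbors' centroid is better (less negative).
    cohesion_bonus = -float(np.linalg.norm(neighbor_pos.mean(axis=0) - agent_pos))

    # Alignment bonus: agree with the neighbors' average heading.
    mean_dir = neighbor_vel.mean(axis=0)
    mean_dir = mean_dir / (np.linalg.norm(mean_dir) + 1e-8)
    agent_dir = agent_vel / (np.linalg.norm(agent_vel) + 1e-8)
    alignment_bonus = float(np.dot(agent_dir, mean_dir))

    # Separation penalty: grows as neighbors get closer than sep_radius.
    dists = np.linalg.norm(neighbor_pos - agent_pos, axis=1)
    separation_penalty = float(np.sum(np.maximum(0.0, sep_radius - dists)))

    return (w_attract * attractor_following + w_cohesion * cohesion_bonus
            + w_align * alignment_bonus - w_separate * separation_penalty)
```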

Local Observations

Each agent observes a 32-dimensional vector containing only local information:

[velocity, attractor_dir, neighbor₁_pos, neighbor₁_vel, ..., neighbor₇_pos, neighbor₇_vel]
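
One plausible way to assemble that vector in 2D is sketched below; the function name and normalization details are assumptions, while the layout (4 own values plus 4 values for each of 7 neighbors = 32) follows the description above.

```python
import numpy as np

def build_observation(agent_idx, positions, velocities, attractor_pos, k=7):
    """32-dim local observation: [own velocity (2), unit direction to attractor (2),
    then for each of the k nearest neighbors: relative position (2), relative velocity (2)].
    4 + 4 * 7 = 32. Function name and normalization choices are illustrative."""
    pos, vel = positions[agent_idx], velocities[agent_idx]

    # Unit direction from the agent to the attractor (e.g. the mouse position).
    to_attractor = attractor_pos - pos
    to_attractor = to_attractor / (np.linalg.norm(to_attractor) + 1e-8)

    # Indices of the k nearest neighbors, excluding the agent itself.
    dists = np.linalg.norm(positions - pos, axis=1)
    dists[agent_idx] = np.inf
    nearest = np.argsort(dists)[:k]

    # Relative positions and velocities of those neighbors, interleaved per neighbor.
    rel_pos = positions[nearest] - pos
    rel_vel = velocities[nearest] - vel
    neighbors = np.column_stack([rel_pos, rel_vel]).ravel()

    return np.concatenate([vel, to_attractor, neighbors])  # shape (32,) when k = 7
```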

No agent knows the global state. No agent is given explicit rules. Yet from these simple rewards, the shared policy discovers how to create fluid collective motion.

Key insight: The policy learns a continuous mapping from local observations to actions. Unlike rule-based boids, it can adapt its behavior to different configurations—turning sharply when crowded, cruising smoothly when aligned.

Emergent Properties

Watch the simulation and you'll notice emergent properties that no single agent was programmed to produce: the swarm splits, merges, and flows as a coordinated whole.

This demonstration uses a shared PPO policy with discrete actions (8 directions + stay). Training was done with PyTorch; inference runs in-browser with TensorFlow.js.
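
As a rough picture of what such a shared actor-critic policy might look like in PyTorch: the hidden sizes and two-layer architecture below are assumptions; only the 32-dimensional observation and the 9 discrete actions come from the text above.

```python
import torch
import torch.nn as nn

class SwarmPolicy(nn.Module):
    """Shared actor-critic used by every agent; hidden sizes are illustrative guesses."""
    def __init__(self, obs_dim=32, n_actions=9, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.actor = nn.Linear(hidden, n_actions)  # logits over 8 directions + stay
        self.critic = nn.Linear(hidden, 1)         # state-value head used by PPO

    def forward(self, obs):
        h = self.body(obs)
        return self.actor(h), self.critic(h)

# Every agent shares the same weights, so one batched forward pass serves the whole swarm.
policy = SwarmPolicy()
obs_batch = torch.randn(128, 32)  # e.g. 128 agents, each with a 32-dim local observation
logits, values = policy(obs_batch)
actions = torch.distributions.Categorical(logits=logits).sample()  # one action per agent
```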