Double Pendulum Swingup

A neural network learns to swing up and balance a chaotic double pendulum on a cart

Both links start hanging straight down. The agent controls only the cart, pushing it left and right to pump energy into the system. Within about 1.5 seconds it swings both links to vertical, recenters the cart, and holds them upright for the remaining 8.5 seconds of each 10-second episode. Tap the left and right arrow keys to shove the cart and watch the policy recover.

The base policy was trained with SAC (Soft Actor-Critic) using curriculum learning. The shipped weights were then refined with a closed-loop distillation pass that teaches the same 256×256 actor to recenter the cart after swing-up, so no separate stabilization controller is needed at runtime.
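To make the actor's shape concrete, here is a minimal pure-Python sketch of a forward pass through a 256×256 MLP with ReLU hidden layers and a tanh output. Several details are assumptions rather than facts about the demo: the observation size `OBS_DIM`, the random placeholder weights (the real page loads trained ones), the single deterministic output head, and the scaling of the tanh output by the 100 N force limit. All function names are hypothetical.

```python
import math
import random

MAX_FORCE = 100.0  # N, the force limit listed under system parameters
HIDDEN = 256       # two hidden layers of 256 units each
OBS_DIM = 8        # ASSUMPTION: the observation size is not stated in the source


def init_layer(n_in, n_out, rng):
    """Random placeholder weights; the demo would load trained weights instead."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(weights, biases, x):
    """Dense layer: one dot product plus bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def actor_force(params, obs):
    """Map an observation to a cart force in [-MAX_FORCE, MAX_FORCE]."""
    (w1, b1), (w2, b2), (w3, b3) = params
    hidden = relu(linear(w1, b1, obs))
    hidden = relu(linear(w2, b2, hidden))
    raw = linear(w3, b3, hidden)[0]
    # ASSUMPTION: tanh squashes to [-1, 1], then the result is scaled to newtons.
    return MAX_FORCE * math.tanh(raw)


rng = random.Random(0)
params = [
    init_layer(OBS_DIM, HIDDEN, rng),
    init_layer(HIDDEN, HIDDEN, rng),
    init_layer(HIDDEN, 1, rng),
]
force = actor_force(params, [0.5] * OBS_DIM)
```

Because the output passes through tanh before scaling, the commanded force can never exceed the limit, whatever the weights are.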

System parameters

Cart mass: 1.0 kg
Rod masses: 0.5 / 0.5 kg
Rod lengths: 1.0 / 1.0 m
Max force: 100 N
Policy: MLP 256×256
Activation: ReLU → tanh
Timestep: 0.02 s (RK4)
Episode: 500 steps
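
The timestep and episode length above imply a fixed-step loop of 500 RK4 updates covering 10 seconds. The sketch below shows one way such a loop could look. It does not implement the cart and double-pendulum equations of motion: `toy_dynamics` is a stand-in (a unit mass on a unit spring) chosen only so the integrator can be checked against a known solution, and holding the force constant across each step is an assumption. All names are hypothetical.

```python
import math

DT = 0.02     # s, timestep from the parameter list
STEPS = 500   # steps per episode, so one episode lasts 500 * 0.02 = 10 s


def rk4_step(f, state, u, dt):
    """One classical fourth-order Runge-Kutta step with the control u held constant."""
    k1 = f(state, u)
    k2 = f([s + 0.5 * dt * k for s, k in zip(state, k1)], u)
    k3 = f([s + 0.5 * dt * k for s, k in zip(state, k2)], u)
    k4 = f([s + dt * k for s, k in zip(state, k3)], u)
    return [
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    ]


def toy_dynamics(state, u):
    """Stand-in dynamics (NOT the double pendulum): x'' = -x + u."""
    x, v = state
    return [v, -x + u]


def run_episode(f, state, policy):
    """Query the policy once per step, then advance the state by DT."""
    for _ in range(STEPS):
        u = policy(state)
        state = rk4_step(f, state, u, DT)
    return state


# With zero force the toy system is a harmonic oscillator, x(t) = cos(t).
final = run_episode(toy_dynamics, [1.0, 0.0], lambda s: 0.0)
```

Swapping `toy_dynamics` for the real equations of motion and the lambda for the trained actor would leave the loop structure unchanged.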