Double Pendulum Swingup

A neural network learns to swing up and balance a chaotic double pendulum on a cart


Both links start hanging downward. The agent controls only the cart, pushing it left and right to pump energy into the system. Within about 1.5 seconds it swings both links to vertical, then holds them there for the remaining 8.5 seconds of each 10-second episode.
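The timing above can be checked against the 0.02 s timestep and 500-step episode length listed under System parameters. The sketch below only converts the quoted durations into step counts. The variable names are illustrative, and the 1.5 s swing-up time is the approximate figure from the text, not a measured value.

```python
DT = 0.02            # s per simulation step (from the parameter list)
EPISODE_STEPS = 500  # steps per episode (from the parameter list)
SWING_UP_SECONDS = 1.5  # approximate swing-up time quoted in the text

episode_seconds = EPISODE_STEPS * DT          # total episode duration
swing_steps = round(SWING_UP_SECONDS / DT)    # steps spent swinging up
hold_steps = EPISODE_STEPS - swing_steps      # steps spent balancing
hold_seconds = hold_steps * DT                # balancing time in seconds

print(episode_seconds, swing_steps, hold_steps, hold_seconds)
```

Under these numbers the agent spends roughly 15% of each episode swinging up and the rest balancing.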

Trained with SAC (Soft Actor-Critic) using curriculum learning: the agent first learned to swing up the outer pole while the inner pole was held near vertical, then gradually progressed to swinging up both poles from the hanging position. Training took 2M steps on a single CPU.
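One way such a curriculum could be scheduled is sketched below. This is an assumption for illustration, not the training code used here: the curriculum is modelled purely as a choice of initial condition, with the inner pole's start angle drifting from upright to hanging as a hypothetical `progress` value goes from 0 to 1. The function name, the angle convention (0 = upright, pi = hanging), and the 0.1 rad jitter are all made up for the example.

```python
import math
import random


def inner_pole_start_angle(progress, rng, jitter=0.1):
    """Sample the inner pole's initial angle in radians.

    ASSUMED convention: 0 = upright, pi = hanging straight down.
    progress = 0 is the easiest stage (inner pole near vertical),
    progress = 1 is the full task (inner pole hanging).
    """
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    centre = progress * math.pi              # drifts from upright to hanging
    return centre + rng.uniform(-jitter, jitter)


rng = random.Random(0)
early = inner_pole_start_angle(0.0, rng)  # near upright
late = inner_pole_start_angle(1.0, rng)   # near hanging
print(early, late)
```

In a training loop, `progress` might be tied to the number of environment steps taken or to the recent average return; either choice is a design decision that the description above does not specify.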

System parameters

Cart mass: 1.0 kg
Rod masses: 0.5 / 0.5 kg
Rod lengths: 1.0 / 1.0 m
Max force: 100 N
Policy: MLP 256×256
Activation: ReLU → tanh
Timestep: 0.02 s (RK4)
Episode: 500 steps
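To show how these parameters might fit together in one control step, here is a minimal sketch: a 256×256 MLP with ReLU hidden layers and a tanh output scaled to the 100 N limit, followed by a classical RK4 step at 0.02 s. Several parts are assumptions rather than details from this page: the six-element observation layout, the angle convention, the random stand-in weights (the trained weights are not reproduced), and above all the dynamics function, which is a bare cart with no rods and stands in for the real double-pendulum equations.

```python
import math
import random

MAX_FORCE = 100.0  # N, from the parameter list
DT = 0.02          # s, from the parameter list
HIDDEN = 256       # hidden layer width, from the parameter list
CART_MASS = 1.0    # kg, from the parameter list
OBS_DIM = 6        # ASSUMPTION: [x, x_dot, th1, th2, th1_dot, th2_dot]


def init_layer(n_in, n_out, rng):
    """Random stand-in weights; NOT the trained policy weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x):
    """Affine layer: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def policy_force(layers, obs):
    """Two ReLU hidden layers, then tanh scaled to the force limit."""
    h = [max(0.0, v) for v in dense(layers[0], obs)]
    h = [max(0.0, v) for v in dense(layers[1], h)]
    (mean,) = dense(layers[2], h)
    return MAX_FORCE * math.tanh(mean)


def rk4_step(deriv, state, force, dt=DT):
    """One classical RK4 step with the force held constant over the step."""
    def shifted(k, scale):
        return [s + scale * ki for s, ki in zip(state, k)]

    k1 = deriv(state, force)
    k2 = deriv(shifted(k1, dt / 2), force)
    k3 = deriv(shifted(k2, dt / 2), force)
    k4 = deriv(shifted(k3, dt), force)
    return [s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def cart_only_deriv(state, force):
    """PLACEHOLDER dynamics: a bare cart without rods, not the real system."""
    _x, x_dot = state
    return [x_dot, force / CART_MASS]


rng = random.Random(0)
layers = [init_layer(OBS_DIM, HIDDEN, rng),
          init_layer(HIDDEN, HIDDEN, rng),
          init_layer(HIDDEN, 1, rng)]

# ASSUMED convention: both angles equal to pi means both links hang down.
force = policy_force(layers, [0.0, 0.0, math.pi, math.pi, 0.0, 0.0])
next_state = rk4_step(cart_only_deriv, [0.0, 0.0], MAX_FORCE)
print(force, next_state)
```

Because tanh is bounded, the commanded force can never exceed the 100 N limit regardless of the network's raw output. Swapping `cart_only_deriv` for the full cart-and-two-rod equations of motion would leave `rk4_step` unchanged, since the integrator only needs a function that returns state derivatives.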