A neural network learns to swing up and balance two chaotic pendulums
Both pendulums start hanging downward. The agent controls only the cart, pushing it left and right to pump energy into the system. Within about 1.5 seconds it swings both links to vertical, then holds them there for the remaining 8.5 seconds of each episode.
Trained with SAC (Soft Actor-Critic) using curriculum learning: the agent first learned to swing up the outer pole while the inner pole was held near vertical, then gradually moved on to swinging up both poles from the hanging position. Training took 2M steps on a single CPU.
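
The curriculum described above could be sketched roughly as follows. This is only an illustration, not the actual training code: the function names, the linear ramp, the warmup and ramp fractions, the noise width, and the angle convention (0 = upright, pi = hanging) are all assumptions. It also assumes the inner pole is made "near vertical" through its starting angle, which the description does not confirm.

```python
import math
import random


def curriculum_level(step, total_steps=2_000_000, warmup_frac=0.25, ramp_end_frac=0.75):
    """Map a training step to a difficulty in [0, 1].

    Hypothetical schedule: difficulty stays at 0 during a warmup phase,
    rises linearly, then stays at 1 for the rest of training.
    """
    warmup = warmup_frac * total_steps
    ramp_end = ramp_end_frac * total_steps
    if step <= warmup:
        return 0.0
    if step >= ramp_end:
        return 1.0
    return (step - warmup) / (ramp_end - warmup)


def sample_initial_angles(level, rng, noise=0.05):
    """Sample start angles (inner, outer) in radians for one episode.

    Assumed convention: 0 = upright, pi = hanging straight down.
    At level 0 the inner pole starts near upright and only the outer
    pole hangs; at level 1 both poles start hanging.
    """
    inner = level * math.pi + rng.uniform(-noise, noise)
    outer = math.pi + rng.uniform(-noise, noise)
    return inner, outer


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 1_000_000, 2_000_000):
        level = curriculum_level(step)
        inner, outer = sample_initial_angles(level, rng)
        print(f"step={step} level={level:.2f} inner={inner:.2f} outer={outer:.2f}")
```

In a setup like this, the environment's reset would call `sample_initial_angles` with the current level, so early episodes only require swinging up the outer pole and later episodes start with both poles hanging.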