Inverted Pendulum Swingup

Learning continuous control with Soft Actor-Critic


Training Details

This demonstration shows a policy trained with Soft Actor-Critic (SAC) to swing an inverted pendulum up from the downward position and then balance it. The policy must learn to build energy through coordinated cart movements, then stabilize the pendulum once it is upright.

Training Steps: 1,000,000
Algorithm: SAC
Balance Rate: 87.7%
Network: [256, 256]

The continuous action space allows smooth cart movements: each action is mapped to a target cart velocity in [-5, 5] m/s. A shaped reward combining upright-position, energy, and height terms enabled convergence within the 1M training steps.
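
The action mapping and reward shaping described above can be sketched roughly as follows. This is an illustrative sketch only, not the demo's actual code: the normalized [-1, 1] policy output, the point-mass pendulum model, the angle convention (theta = 0 upright), the Gaussian upright bonus, and all weights and constants other than the 5 m/s limit are assumptions made for the example.

```python
import math

MAX_CART_SPEED = 5.0  # m/s, the velocity limit stated in the text


def action_to_target_velocity(action: float) -> float:
    """Map a policy action to a target cart velocity in m/s.

    Assumes (hypothetically) that the policy outputs a value in [-1, 1];
    out-of-range values are clipped before scaling.
    """
    clipped = max(-1.0, min(1.0, action))
    return clipped * MAX_CART_SPEED


def shaped_reward(
    theta: float,
    theta_dot: float,
    length: float = 1.0,    # assumed pendulum length (m)
    mass: float = 1.0,      # assumed point mass (kg)
    g: float = 9.81,
    w_upright: float = 1.0,  # assumed weights, not from the source
    w_energy: float = 0.5,
    w_height: float = 0.5,
    sigma: float = 0.2,      # assumed width of the upright bonus (rad)
) -> float:
    """Combine upright, energy, and height terms into one scalar reward.

    Convention assumed here: theta = 0 is upright, theta = pi hangs down.
    """
    # Upright term: sharp bonus near the top, using the wrapped angle error.
    angle_err = math.atan2(math.sin(theta), math.cos(theta))
    upright = math.exp(-(angle_err ** 2) / (2.0 * sigma ** 2))

    # Energy term: 1 when total energy matches "upright at rest",
    # 0 when it is at least as far off as "hanging at rest".
    energy = 0.5 * mass * (length * theta_dot) ** 2 + mass * g * length * math.cos(theta)
    target = mass * g * length
    energy_range = 2.0 * mass * g * length
    energy_term = 1.0 - min(1.0, abs(energy - target) / energy_range)

    # Height term: normalized height of the pendulum tip in [0, 1].
    height = 0.5 * (1.0 + math.cos(theta))

    return w_upright * upright + w_energy * energy_term + w_height * height


if __name__ == "__main__":
    print(action_to_target_velocity(0.5))
    print(shaped_reward(0.0, 0.0), shaped_reward(math.pi, 0.0))
```

The energy and height terms give a smooth gradient during the swing-up phase, where the narrow upright bonus is nearly zero, which is one plausible reason a shaped reward of this kind helps on this task.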