Inverted Pendulum Swing-Up
Learning continuous control with Soft Actor-Critic
Training Details
This demonstration shows a policy trained with SAC (Soft Actor-Critic) to swing an inverted pendulum up from the hanging-down position and balance it there. The task requires learning to pump energy into the pendulum through coordinated back-and-forth cart movements, then to stabilize it once it is upright.
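To make "building energy" concrete, here is a minimal Python sketch of the pendulum's mechanical energy measured relative to the balanced upright state. It assumes a point-mass pole and hypothetical constants (`M_POLE`, `L_POLE`), because the demo does not state its physical parameters; `pendulum_energy` is an illustrative helper, not code from this project. With this convention the energy is 0 when balanced at rest and -2mgl when hanging at rest, so swinging up amounts to raising the energy toward 0.

```python
import math

# Hypothetical physical parameters; the demo does not publish its values.
M_POLE = 0.1   # pole mass in kg, treated as a point mass at the tip
L_POLE = 0.5   # pivot-to-mass distance in m
G = 9.81       # gravitational acceleration in m/s^2


def pendulum_energy(theta: float, theta_dot: float,
                    m: float = M_POLE, l: float = L_POLE, g: float = G) -> float:
    """Mechanical energy of a point-mass pendulum relative to resting upright.

    theta is measured from upright (0 = straight up, pi = hanging down),
    theta_dot is the angular velocity in rad/s.
    Returns 0 when balanced at rest and -2*m*g*l when hanging at rest.
    """
    kinetic = 0.5 * m * l**2 * theta_dot**2
    potential = m * g * l * (math.cos(theta) - 1.0)  # zero at the top
    return kinetic + potential


# Energy the cart must inject, starting from the hanging rest position.
deficit = -pendulum_energy(math.pi, 0.0)
print(f"energy to inject: {deficit:.4f} J")
```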
Training steps: 1,000,000
Algorithm: SAC
Balance rate: 87.7%
Network: [256, 256]
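The stats above list SAC with a [256, 256] network, which usually denotes two hidden layers of 256 units each. The NumPy sketch below shows what an actor could look like under that reading: a tanh-squashed Gaussian policy of the kind used in SAC, with the log-probability corrected for the tanh change of variables. The observation layout, ReLU activations, log-std clamp range, and random untrained weights are all assumptions for illustration; the demo's actual framework and architecture details are not given.

```python
import numpy as np

OBS_DIM = 4        # hypothetical observation: cart pos/vel, pole angle/angular vel
ACT_DIM = 1        # one continuous action: the cart command
HIDDEN = [256, 256]
LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0  # assumed clamp range for the log-std head

rng = np.random.default_rng(0)


def init_layer(n_in: int, n_out: int) -> tuple[np.ndarray, np.ndarray]:
    """Random weights for one dense layer (illustrative, untrained)."""
    w = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
    b = np.zeros(n_out)
    return w, b


# Two hidden layers of 256 units, then separate heads for mean and log-std.
sizes = [OBS_DIM] + HIDDEN
hidden_layers = [init_layer(a, b) for a, b in zip(sizes[:-1], sizes[1:])]
mean_head = init_layer(HIDDEN[-1], ACT_DIM)
log_std_head = init_layer(HIDDEN[-1], ACT_DIM)


def sample_action(obs: np.ndarray) -> tuple[np.ndarray, float]:
    """Sample a tanh-squashed Gaussian action in [-1, 1] and its log-probability."""
    h = obs
    for w, b in hidden_layers:
        h = np.maximum(0.0, h @ w + b)  # ReLU hidden layer
    mean = h @ mean_head[0] + mean_head[1]
    log_std = np.clip(h @ log_std_head[0] + log_std_head[1], LOG_STD_MIN, LOG_STD_MAX)
    std = np.exp(log_std)

    u = mean + std * rng.standard_normal(ACT_DIM)  # reparameterized Gaussian sample
    action = np.tanh(u)                            # squash into [-1, 1]

    # Gaussian log-density of u, minus the log-Jacobian of the tanh squash.
    log_prob = -0.5 * (((u - mean) / std) ** 2 + 2.0 * log_std + np.log(2.0 * np.pi))
    log_prob -= np.log(1.0 - action**2 + 1e-6)
    return action, float(log_prob.sum())


action, log_prob = sample_action(np.array([0.0, 0.0, np.pi, 0.0]))
```

Squashing with tanh keeps actions bounded, which fits a cart command with hard limits; the small epsilon guards against taking the log of zero when tanh saturates.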
The action space is continuous, which allows smooth cart movements: each action is mapped to a target cart velocity in [-5, 5] m/s. A shaped reward combining upright orientation, pendulum energy, and pendulum height enabled training to converge within 1M steps.
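Here is a minimal sketch of the two mechanisms in the paragraph above: scaling a policy output in [-1, 1] to a target velocity in [-5, 5] m/s, and a reward that mixes upright, energy, and height terms. Only the velocity limit comes from the text. The term shapes, the weights `W_UPRIGHT`, `W_ENERGY`, `W_HEIGHT`, the width `UPRIGHT_SIGMA`, the pole parameters, and the reading of "height" as pole-tip height are hypothetical, because the demo does not publish its reward function.

```python
import math

V_MAX = 5.0  # target cart velocity limit in m/s (stated in the text)

# Hypothetical reward weights and parameters; not the demo's actual values.
W_UPRIGHT, W_ENERGY, W_HEIGHT = 1.0, 0.5, 0.5
UPRIGHT_SIGMA = 0.3          # rad; width of the upright bonus
M, L, G = 0.1, 0.5, 9.81     # assumed point-mass pole: kg, m, m/s^2


def action_to_velocity(a: float) -> float:
    """Map a policy output in [-1, 1] to a target cart velocity in m/s."""
    return V_MAX * max(-1.0, min(1.0, a))


def wrap_angle(theta: float) -> float:
    """Wrap an angle to [-pi, pi), with 0 meaning upright."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def shaped_reward(theta: float, theta_dot: float) -> float:
    """Illustrative reward mixing upright, energy, and height terms."""
    phi = wrap_angle(theta)
    upright = math.exp(-(phi / UPRIGHT_SIGMA) ** 2)  # sharp bonus near vertical
    # Energy relative to resting upright: 0 at the top, -2*M*G*L hanging at rest.
    energy = 0.5 * M * L**2 * theta_dot**2 + M * G * L * (math.cos(theta) - 1.0)
    energy_term = -abs(energy) / (2.0 * M * G * L)   # -1 hanging at rest, 0 at upright energy
    height = (math.cos(theta) + 1.0) / 2.0           # pole-tip height scaled to [0, 1]
    return W_UPRIGHT * upright + W_ENERGY * energy_term + W_HEIGHT * height
```

In this sketch the energy and height terms give a dense signal during the swing, while the narrow upright bonus only matters near the top; that split is one plausible motivation for mixing terms, not a statement about how this demo weighted them.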