Proximal Policy Optimization for robotic manipulation
How it works
This demo showcases a Proximal Policy Optimization (PPO) agent trained with Stable Baselines3 to control a two-segment robot arm. The agent autonomously (see the rollout sketch after this list):
1. Navigates the arm to reach the red block
2. Closes the claw to securely grasp the block
3. Lifts the block above the green target line
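A minimal Python sketch of such a rollout with the trained Stable Baselines3 policy; the environment id "ArmPickAndLift-v0" and the checkpoint name "ppo_arm" are assumptions, and the demo itself runs the exported model in the browser rather than this loop:

    import gymnasium as gym
    from stable_baselines3 import PPO

    # "ArmPickAndLift-v0" is a hypothetical id; the demo's actual env is not named.
    env = gym.make("ArmPickAndLift-v0")
    model = PPO.load("ppo_arm", env=env)  # assumed checkpoint name

    obs, _ = env.reset()
    done = False
    while not done:
        # Deterministic inference: take the most likely discrete action each step.
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated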
The model was trained for 3 million timesteps with PPO, a state-of-the-art policy gradient method, and the resulting PyTorch model was exported to TensorFlow.js for browser-based inference.
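A training and export run of that shape might look like the following; the hyperparameters are Stable Baselines3 defaults, and the ONNX round-trip is one plausible route from PyTorch to TensorFlow.js, since the source does not specify the exact export path:

    import gymnasium as gym
    import torch
    from stable_baselines3 import PPO

    env = gym.make("ArmPickAndLift-v0")  # hypothetical id, as in the sketch above

    model = PPO("MlpPolicy", env, verbose=1)  # clipped-objective PPO, SB3 defaults
    model.learn(total_timesteps=3_000_000)    # the 3M timesteps quoted above
    model.save("ppo_arm")

    class OnnxablePolicy(torch.nn.Module):
        """Wrap the policy so tracing follows only the deterministic action
        path (the pattern used in Stable Baselines3's model-export guide)."""

        def __init__(self, policy):
            super().__init__()
            self.policy = policy

        def forward(self, observation):
            return self.policy(observation, deterministic=True)

    # One plausible chain: PyTorch -> ONNX -> TensorFlow SavedModel (e.g. via
    # onnx-tf) -> TensorFlow.js (via the tensorflowjs_converter CLI).
    dummy_obs = torch.zeros(1, 11)  # matches the 11-dimensional observation
    torch.onnx.export(OnnxablePolicy(model.policy), dummy_obs, "ppo_arm.onnx",
                      input_names=["observation"], opset_version=17)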
Technical details: The agent receives 11-dimensional observations (joint angles, block position, claw state, and valid-action masks) and outputs discrete actions to control the arm. PPO's clipped surrogate objective limits how far each policy update can drift from the previous policy, which keeps learning stable while the agent maximizes task-completion rewards.
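For reference, the clipped surrogate objective from the PPO paper (Schulman et al., 2017), where r_t(θ) is the probability ratio between the updated and previous policies and Â_t is the estimated advantage:

    L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[ \min\!\left( r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\!\left( r_t(\theta),\, 1-\epsilon,\, 1+\epsilon \right) \hat{A}_t \right) \right],
    \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

The clip term removes any incentive to push r_t(θ) outside [1−ε, 1+ε], which is what bounds the size of each policy update.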