Proximal Policy Optimization for robotic manipulation
How it works
This demo showcases a Proximal Policy Optimization (PPO) agent trained with Stable Baselines3 to control a two-segment robot arm. The agent autonomously (see the rollout sketch after this list):
1. Navigates the arm to reach the red block
2. Closes the claw to securely grasp the block
3. Lifts the block above the green target line
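A minimal Python sketch of such a rollout with the trained Stable Baselines3 policy; the environment id "ArmPickAndLift-v0" and the checkpoint name "ppo_arm" are assumptions, and the demo itself runs the exported model in the browser rather than this loop:

    import gymnasium as gym
    from stable_baselines3 import PPO

    # "ArmPickAndLift-v0" is a hypothetical id; the demo's actual env is not named.
    env = gym.make("ArmPickAndLift-v0")
    model = PPO.load("ppo_arm", env=env)  # assumed checkpoint name

    obs, _ = env.reset()
    done = False
    while not done:
        # Deterministic inference: take the most likely discrete action each step.
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated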
The model was trained for 3 million timesteps with PPO, a state-of-the-art policy gradient method, and the resulting PyTorch model was exported to TensorFlow.js for browser-based inference.
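A training and export run of that shape might look like the following; the hyperparameters are Stable Baselines3 defaults, and the ONNX round-trip is one plausible route from PyTorch to TensorFlow.js, since the source does not specify the exact export path:

    import gymnasium as gym
    import torch
    from stable_baselines3 import PPO

    env = gym.make("ArmPickAndLift-v0")  # hypothetical id, as in the sketch above

    model = PPO("MlpPolicy", env, verbose=1)  # clipped-objective PPO, SB3 defaults
    model.learn(total_timesteps=3_000_000)    # the 3M timesteps quoted above
    model.save("ppo_arm")

    class OnnxablePolicy(torch.nn.Module):
        """Wrap the policy so tracing follows only the deterministic action
        path (the pattern used in Stable Baselines3's model-export guide)."""

        def __init__(self, policy):
            super().__init__()
            self.policy = policy

        def forward(self, observation):
            return self.policy(observation, deterministic=True)

    # One plausible chain: PyTorch -> ONNX -> TensorFlow SavedModel (e.g. via
    # onnx-tf) -> TensorFlow.js (via the tensorflowjs_converter CLI).
    dummy_obs = torch.zeros(1, 11)  # matches the 11-dimensional observation
    torch.onnx.export(OnnxablePolicy(model.policy), dummy_obs, "ppo_arm.onnx",
                      input_names=["observation"], opset_version=17)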
Technical details: The agent receives 11-dimensional observations (joint angles, block position, claw state, and valid-action masks) and outputs discrete actions to control the arm. PPO's clipped surrogate objective limits how far each policy update can drift from the previous policy, which keeps learning stable while the agent maximizes task-completion rewards.
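For reference, the clipped surrogate objective from the PPO paper (Schulman et al., 2017), where r_t(θ) is the probability ratio between the updated and previous policies and Â_t is the estimated advantage:

    L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[ \min\!\left( r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\!\left( r_t(\theta),\, 1-\epsilon,\, 1+\epsilon \right) \hat{A}_t \right) \right],
    \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

The clip term removes any incentive to push r_t(θ) outside [1−ε, 1+ε], which is what bounds the size of each policy update.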