How It Works

Proximal Policy Optimization (PPO)

This AI uses PPO, a deep reinforcement learning algorithm that learns through trial and error. PPO is more stable than traditional policy gradient methods because it limits how much the AI's policy can change in each training step, which guards against a single oversized update wiping out behavior the AI has already learned.
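
To make the "limited change" idea concrete, here is a minimal sketch of the standard PPO clipped objective for a single action. It is not taken from this project's code: the function name, its arguments, and the clip range of 0.2 (a commonly used default) are assumptions for illustration.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantage, clip_eps=0.2):
    """Standard PPO clipped objective for one action (illustrative sketch).

    new_logp / old_logp: log-probability of the action under the updated
    and the pre-update policy. clip_eps is an assumed value, not a setting
    read from this project.
    """
    # Probability ratio between the updated and the old policy.
    ratio = math.exp(new_logp - old_logp)
    # Keep the ratio inside [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes any incentive to move the policy
    # further than the clip range in a single update.
    return min(ratio * advantage, clipped_ratio * advantage)
```

With a positive advantage, doubling an action's probability is credited only as a 1.2x change, so the optimizer gains nothing by pushing the policy further in one step.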

Self-Play Training

The AI learns by playing against itself. Both the top and bottom paddles are controlled by the same neural network, but each paddle sees the game from its own side of the table. Because the opponent improves at the same rate as the learner, this self-play approach pushes the AI to discover and adapt to increasingly sophisticated strategies.
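
One common way to let a single network drive both paddles is to rotate the top paddle's view by 180 degrees so that both sides appear to play "from the bottom". The sketch below shows that idea only. The State fields, the normalized 1.0 x 1.0 table, and the observation layout are all hypothetical and are not taken from this project.

```python
from dataclasses import dataclass

# Assumed normalized table size (hypothetical, for illustration only).
FIELD_WIDTH = 1.0
FIELD_HEIGHT = 1.0


@dataclass
class State:
    bottom_paddle: tuple  # (x, y)
    top_paddle: tuple     # (x, y)
    puck: tuple           # (x, y)
    puck_velocity: tuple  # (vx, vy)


def rotate_position(p):
    """Rotate a position 180 degrees around the table center."""
    return (FIELD_WIDTH - p[0], FIELD_HEIGHT - p[1])


def rotate_velocity(v):
    """Rotate a velocity 180 degrees (both components change sign)."""
    return (-v[0], -v[1])


def observe(state, side):
    """Build the observation for one paddle: own paddle, opponent, puck, velocity."""
    if side == "bottom":
        own, opp = state.bottom_paddle, state.top_paddle
        puck, vel = state.puck, state.puck_velocity
    else:
        # The top paddle sees a rotated table, so it also "plays from the bottom".
        own = rotate_position(state.top_paddle)
        opp = rotate_position(state.bottom_paddle)
        puck = rotate_position(state.puck)
        vel = rotate_velocity(state.puck_velocity)
    return [*own, *opp, *puck, *vel]
```

Under this assumed scheme, the action chosen for the top paddle would need the same rotation applied in reverse before it is sent to the game.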

Training Curriculum

The AI learns in three stages:

  1. Hit Puck: Learns basic puck interaction (advances after 500 successful hits)
  2. Score Goal: Learns to aim shots toward the goal (advances after 50 goals)
  3. Strategy: Develops advanced offensive and defensive tactics (final stage)
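
The stage progression above can be sketched as a small counter-based tracker. Only the thresholds (500 hits, 50 goals) come from the description. The class name, the event names, and the choice to reset the counter at each stage change are assumptions for illustration.

```python
class Curriculum:
    """Illustrative three-stage curriculum tracker (hypothetical names)."""

    STAGES = ("hit_puck", "score_goal", "strategy")
    # Successes needed to leave each stage; the final stage has no exit.
    THRESHOLDS = {"hit_puck": 500, "score_goal": 50}
    # Which event counts as progress in each stage (assumed event names).
    COUNTED_EVENT = {"hit_puck": "hit", "score_goal": "goal"}

    def __init__(self):
        self.stage_index = 0
        self.progress = 0

    @property
    def stage(self):
        return self.STAGES[self.stage_index]

    def record(self, event):
        """Register a 'hit' or 'goal' event and advance the stage if due."""
        if event != self.COUNTED_EVENT.get(self.stage):
            return  # event does not count toward the current stage
        self.progress += 1
        if self.progress >= self.THRESHOLDS[self.stage]:
            self.stage_index += 1
            self.progress = 0  # assumed: counting restarts in the new stage
```

A training loop would call record("hit") or record("goal") as those events occur and read the stage property to decide which rewards apply.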

Reward System

The AI receives rewards for:

It receives penalties for:

Toggle Button

The toggle button switches between:

Learning Time

The AI typically needs about 20,000 training steps to develop basic gameplay skills. During this time, it progresses from random movements to purposeful hits, and eventually to strategic gameplay.