Mountain Car

About This Model

This agent was trained with a Deep Q-Network (DQN) using Stable Baselines3, a PyTorch-based reinforcement learning library for Python. The trained model was then converted to TensorFlow.js so it can run directly in your browser.

Training Details

Algorithm: DQN (Deep Q-Network) with sparse rewards
Training Duration: 120,000 timesteps (~40 seconds on CPU)
Network Architecture: 2 inputs → [256, 256] hidden → 3 actions
Best Performance: Achieved at 60,000 steps (mean reward -148)
Episode Length: Solves the task in 136 steps on every run
Success Rate: 100% after convergence
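
To make the 2 → [256, 256] → 3 architecture concrete, here is a minimal sketch of a forward pass and greedy action selection in plain Python. It assumes ReLU activations between layers and an observation of (position, velocity), neither of which is stated above. The weights are random placeholders rather than the trained ones, and the helper names (dense, q_values, greedy_action) are hypothetical, not taken from the project's code.

```python
import random


def relu(xs):
    return [max(0.0, x) for x in xs]


def dense(xs, weights, biases):
    # weights[j][i] connects input i to output j.
    return [sum(w * x for w, x in zip(row, xs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    # Random placeholder weights; a real model would load trained values.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def q_values(obs, layers):
    # Apply ReLU after every layer except the last (assumed activation).
    h = list(obs)
    for i, (weights, biases) in enumerate(layers):
        h = dense(h, weights, biases)
        if i < len(layers) - 1:
            h = relu(h)
    return h


def greedy_action(obs, layers):
    # Pick the action index with the highest estimated Q-value.
    q = q_values(obs, layers)
    return max(range(len(q)), key=lambda a: q[a])


rng = random.Random(0)
sizes = [2, 256, 256, 3]  # inputs, two hidden layers, actions
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

obs = (-0.5, 0.0)  # assumed layout: (position, velocity)
action = greedy_action(obs, layers)
```

At inference time only this greedy step is needed: the network scores all three actions and the agent takes the highest-scoring one.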

How It Learned

The agent learned through trial and error, discovering an effective strategy over 120,000 environment interactions. It receives a small penalty at every time step, which pushes it to reach the goal as quickly as possible.
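
The per-step penalty enters learning through the one-step target that DQN regresses its Q-values toward. The sketch below shows that target in plain Python. The reward of -1 per step and the discount factor of 0.99 are assumptions (the classic Mountain Car reward and the Stable Baselines3 default), not values confirmed by this page, and td_target is a hypothetical helper name.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step DQN target: r + gamma * max_a Q_target(s', a).

    At a terminal state there is no next state to bootstrap from,
    so the target is just the immediate reward.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Ordinary step: the step penalty plus the discounted best next value.
y_mid = td_target(-1.0, [-20.0, -18.5, -19.0], done=False)

# Final step (goal reached): only the immediate reward remains.
y_end = td_target(-1.0, [0.0, 0.0, 0.0], done=True)
```

Because every extra step adds another penalty to the target, states from which the goal is many steps away end up with lower Q-values, which is what steers the agent toward shorter episodes.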

Because the car's engine is too weak to climb the hill directly, the agent had to discover that building momentum by rocking back and forth is the key to reaching the goal on the right peak.
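
The value of rocking can be seen with a small simulation. The sketch below assumes the dynamics and constants of the classic Mountain Car task (force 0.001, gravity 0.0025, goal at position 0.5). The custom environment used here may differ. It compares always pushing right against a hand-written rule that pushes in the direction the car is already moving. This rule is an illustration only, not the trained agent's policy.

```python
import math

# Classic Mountain Car constants (assumed; the custom environment may differ).
MIN_POS, MAX_POS, GOAL_POS = -1.2, 0.6, 0.5
MAX_SPEED, FORCE, GRAVITY = 0.07, 0.001, 0.0025


def step(pos, vel, action):
    """Advance one time step; action is 0 (left), 1 (none), or 2 (right)."""
    vel += (action - 1) * FORCE - GRAVITY * math.cos(3 * pos)
    vel = max(-MAX_SPEED, min(MAX_SPEED, vel))
    pos = max(MIN_POS, min(MAX_POS, pos + vel))
    if pos == MIN_POS and vel < 0:
        vel = 0.0  # the car stops when it hits the left wall
    return pos, vel


def run(policy, pos=-0.5, vel=0.0, max_steps=200):
    """Return the number of steps taken to reach the goal, or None on timeout."""
    for t in range(1, max_steps + 1):
        pos, vel = step(pos, vel, policy(pos, vel))
        if pos >= GOAL_POS:
            return t
    return None


def always_right(pos, vel):
    return 2


def follow_velocity(pos, vel):
    # Push in the direction of travel so every push adds energy.
    return 2 if vel >= 0 else 0


steps_naive = run(always_right)       # never reaches the goal
steps_rocking = run(follow_velocity)  # reaches the goal within the limit
```

Pushing right the whole time stalls partway up the slope, while pushing along the current direction of travel adds energy on every swing until the car clears the right peak.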

Technical Implementation

Training Framework: PyTorch + Stable Baselines3
Inference Framework: TensorFlow.js (browser)
Model Size: 543 KB (256x256 network weights)
Conversion Accuracy: <0.000005 error vs original model
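
One common way to check a conversion like this is to feed the same observations to both models and compare their outputs. The sketch below assumes the reported error is a maximum absolute difference between Q-values, which this page does not state. The numbers are placeholders, not real model outputs, and max_abs_error is a hypothetical helper name.

```python
def max_abs_error(reference, converted):
    """Largest element-wise absolute difference between two batches of outputs."""
    return max(
        abs(r - c)
        for ref_row, conv_row in zip(reference, converted)
        for r, c in zip(ref_row, conv_row)
    )


# Placeholder Q-values; in practice both batches would come from running the
# original and the converted model on the same observations.
original_q = [[-19.2, -18.7, -18.9], [-5.1, -4.8, -4.2]]
converted_q = [[-19.200001, -18.699998, -18.9], [-5.1, -4.800002, -4.2]]

TOLERANCE = 5e-6  # matches the <0.000005 bound quoted above
error = max_abs_error(original_q, converted_q)
conversion_ok = error < TOLERANCE
```

Differences of this size are typical of float32 rounding and are far too small to change which action has the highest Q-value in ordinary states.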

View the code:
Custom Gym Environment | Training Script | Model Converter | Browser Inference Code