Abstract/Preview:
Can AI play games like Pacman? Yes, it can. Reinforcement learning algorithms let a computer learn to play games like Pacman all by itself. As the AI plays the game millions of times, the algorithm trains it to predict the outcomes of its actions and to pick out useful features from the game screen.
Main post:
You may have seen many advances in AI in recent years. ChatGPT for text generation and Midjourney for image generation are two examples. But can AI plan and make decisions like humans?
Yes, it can. Enter reinforcement learning. Today, we will walk step by step through roughly how a reinforcement learning AI can learn to play Pacman, all by itself. It does this by playing the game millions of times, detecting objects on the game’s screen, and predicting the outcome of each action in terms of points scored.
First, if you don’t know what Pacman is, it is a game where you try to eat as many pellets as you can while avoiding the ghosts.
Now, let’s consider an AI that has already learned to play Pacman. One way to detect the objects on the screen is to use a type of model called a convolutional neural network (CNN). Imagine sliding a ghost detector across the game screen. The detector is a small grid of numbers that you multiply with the values of the pixels it is on top of, and then add up the results. The detector is designed so that when it is on top of a ghost, this sum is large, and so the AI can guess that a ghost is there. We can have detectors like this for other objects too.
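To make the sliding detector concrete, here is a minimal sketch in Python with NumPy. The 5x5 screen, the 2x2 "ghost" of bright pixels, and the all-ones detector are made-up toy values for illustration, not real Pacman data, and a real CNN learns many such detectors rather than having one written by hand.

```python
import numpy as np

def slide_detector(screen, detector):
    """Slide the detector over the screen and record its response at each position."""
    screen_h, screen_w = screen.shape
    det_h, det_w = detector.shape
    responses = np.zeros((screen_h - det_h + 1, screen_w - det_w + 1))
    for row in range(responses.shape[0]):
        for col in range(responses.shape[1]):
            # Pixels currently underneath the detector
            patch = screen[row:row + det_h, col:col + det_w]
            # Multiply element by element, then add everything up
            responses[row, col] = np.sum(patch * detector)
    return responses

# Toy screen (assumption): all dark except a 2x2 "ghost" of bright pixels
screen = np.zeros((5, 5))
screen[2:4, 1:3] = 1.0

# Toy ghost detector (assumption): responds strongly to a bright 2x2 block
detector = np.ones((2, 2))

responses = slide_detector(screen, detector)
# The position with the largest response is the AI's best guess for the ghost
ghost_row, ghost_col = np.unravel_index(np.argmax(responses), responses.shape)
```

Here the largest response appears where the detector sits exactly on top of the bright block, which is how the AI can guess where the ghost is.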
Now that the AI knows where the important objects are, how does it decide what actions to take?
Well, it keeps track of the values of actions in a model called the action-value function, often denoted by Q. Given where all the important objects are in the game, Q predicts, from experience, the total points the AI will collect over the next several timesteps for each action it can take. Now, all the AI needs to do is take the action with the highest predicted total.
So now we know how the AI can play Pacman, given that it has a CNN to detect the objects and a Q model to predict the points. But how do we get these models in the first place?
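As a rough sketch of that last step, picking the action with the highest predicted value takes only a couple of lines. The action names and the Q values below are invented for illustration; in the real system the values would come from the Q model.

```python
import numpy as np

# The four moves available in Pacman (names chosen for this example)
ACTIONS = ["up", "down", "left", "right"]

def greedy_action(q_values):
    """Return the action whose predicted future points are highest."""
    return ACTIONS[int(np.argmax(q_values))]

# Hypothetical Q predictions for the current screen, one per action
q_values = np.array([1.5, -0.2, 4.0, 0.3])
best = greedy_action(q_values)
```

With these made-up numbers, the AI would move left, since that action has the largest predicted total.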
Our AI starts with a CNN that detects random features of the screen, and a Q model that outputs random values for the sum of points it will obtain in the future. It first makes a prediction, and then it plays the game and observes the points it gets. It checks how good the prediction was by taking the difference between its previous prediction and the points it actually got plus its current prediction of the sum of future points. It then updates both Q and the CNN at the same time so that if it were to measure this difference again, it would be smaller. It does this using a method called gradient descent, which you may have heard of before.
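The update described above can be sketched for a much simpler Q model than a real one. The code below assumes Q is linear in a handful of features (standing in for the CNN's output), and it weights the prediction of future points by a discount factor gamma, which is a common choice but an assumption here. The function name, feature vectors, reward, and learning rate are all hypothetical.

```python
import numpy as np

def td_update(weights, features, action, reward, next_features, lr=0.1, gamma=0.9):
    """One gradient-descent step that shrinks the prediction error for one move."""
    # Previous prediction: expected future points for the action that was taken
    prediction = weights[action] @ features
    # Points actually received, plus the current prediction of future points
    target = reward + gamma * np.max(weights @ next_features)
    error = target - prediction
    # Nudge the weights so the same error would be smaller next time
    weights[action] += lr * error * features
    return error

# Toy setup (assumption): 4 actions, 3 features describing the screen
weights = np.zeros((4, 3))
features = np.array([1.0, 0.0, 0.5])        # screen before the move
next_features = np.array([0.0, 1.0, 0.5])   # screen after the move

first_error = td_update(weights, features, action=2, reward=10.0,
                        next_features=next_features)
second_error = td_update(weights, features, action=2, reward=10.0,
                         next_features=next_features)
```

Running the same update twice on the same experience gives a smaller error the second time, which is the behaviour the paragraph describes. A full system would apply the same idea to the weights of a neural network rather than a small table of numbers.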
As we repeat this millions of times, Q starts making better predictions, and the CNN starts detecting objects that are more useful for Q to make its predictions from. Eventually, our AI will be scoring many points in Pacman!
Joel Woodfield
The University of Queensland