From the lab

AI as self-learning gamer

Through reinforcement learning, the team wanted to see whether an AI could learn to play Snake better than a human, with no instructions beyond a carrot-and-stick approach.

Tech used

  • Stable Baselines3
  • Gymnasium
  • PettingZoo

Background

Reinforcement learning (RL) is a branch of machine learning in which a model is tasked with learning something on its own through rewards and punishments, a kind of “carrot-and-stick” philosophy. The technique is used in fields such as games, robotics, and self-driving cars.
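As a minimal sketch of that idea (not the team's code), here is the basic agent-environment loop in Gymnasium, one of the libraries the team used: the agent acts, and the environment answers with a new observation and a reward signal.

```python
import gymnasium as gym

# A minimal agent-environment loop: after every action the environment
# hands back a reward (the "carrot") or a penalty (the "stick").
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=42)

for _ in range(1000):
    action = env.action_space.sample()  # random policy; an RL agent learns to do better
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```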

Armed with readily available libraries and algorithms, Christina, Lucia, Adam, Philipp, Balthazar, and Joakim explored whether they could build this type of model without any prior experience in the field. The goal was to create an agent that could learn how to play a game better than a human.

The Process & Challenges

After initial research, the team began testing various game environments from the Gymnasium library, built on OpenAI’s Gym API, along with algorithms from Stable Baselines3.
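The article does not say exactly which pairing these early tests used; a typical starting point, shown here purely as a sketch, is an off-the-shelf algorithm such as PPO from Stable Baselines3 trained on a stock Gymnasium environment:

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Train a standard algorithm (here PPO) on a stock Gymnasium environment.
# Environment and hyperparameters are placeholders, not the team's setup.
env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
model.save("ppo_cartpole")
```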

After training the model on both Snake and Asteroids in quick iteration loops, they soon realized that the Asteroids environment required significantly more training data. With limited time, they settled on Snake, and all focus went into developing that game environment so that multiple agents could be trained simultaneously.
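In Stable Baselines3, running several game instances at once is usually done with vectorized environments, so each model update sees experience from multiple games in parallel. The sketch below uses a stock environment id as a stand-in for the team's custom Snake environment:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Eight copies of the environment run in parallel; a custom Snake
# environment, once registered with Gymnasium, would slot in by id here.
vec_env = make_vec_env("CartPole-v1", n_envs=8)
model = PPO("MlpPolicy", vec_env)
model.learn(total_timesteps=500_000)
```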

One challenge was finding a reasonable balance between rewards and punishments. The training process also took some time, requiring both patience and hardware resources.
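To make that trade-off concrete, here is a hypothetical reward scheme for a Snake-style environment; the values are illustrative, not the team's actual numbers.

```python
# Hypothetical reward shaping for a Snake environment; the exact values
# are the hard part and usually need tuning through trial and error.
REWARD_FOOD = 1.0    # carrot: the snake ate an apple
REWARD_DEATH = -1.0  # stick: the snake hit a wall or itself
REWARD_STEP = -0.01  # small per-step cost to discourage aimless wandering

def compute_reward(ate_food: bool, died: bool) -> float:
    """Combine the shaping terms into the scalar reward the agent sees."""
    if died:
        return REWARD_DEATH
    return REWARD_FOOD if ate_food else REWARD_STEP
```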

Result & Key takeaways

Despite having no prior experience with reinforcement learning, the team found it easy to get started with the help of the libraries available. It was also possible to train the models on the team’s own equipment. While the tests overall were time-consuming, the feedback loop for iterations was relatively quick.

Even though RL is a broad field, another important lesson was that it's possible to explore the area step by step and still make real progress.

Future Actions

The team would like to build on the project by creating several parallel, simultaneous training environments to explore other scenarios, which would call for more adaptive and independent agents.
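PettingZoo, already in the team's toolbox, is built for exactly this multi-agent case. A minimal sketch of its agent-iteration API, using one of its stock demo environments and random actions in place of trained policies:

```python
from pettingzoo.butterfly import pistonball_v6

# PettingZoo's AEC API steps one agent at a time; each agent could be
# backed by its own learned policy instead of random sampling.
env = pistonball_v6.env()
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    action = None if termination or truncation else env.action_space(agent).sample()
    env.step(action)
env.close()
```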

In the future, this could lead to increasingly complex simulations with additional parameters in various contexts, truly challenging the full potential of both the models and the team.