Random Bingo Game

This random bingo solver is a project I built to explore reinforcement learning principles. I came across a minigame played on a 5x5 grid in which the player tries to complete as many "bingo" lines as possible with a limited number of moves. Each turn, the player picks a tile to flip, and then a random unflipped tile on the board is flipped as well. After 8 player moves (16 flipped tiles in total), the game ends and the number of bingos determines the score.
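The rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual game code: it assumes that rows, columns, and both diagonals count as bingo lines and that the random flip is uniform over the unflipped tiles. The names count_bingos, play, and choose_tile are hypothetical.

```python
import random

SIZE = 5          # 5x5 board, as described above
PLAYER_MOVES = 8  # 8 player picks, each followed by one random flip


def count_bingos(flipped):
    """Count completed rows, columns, and diagonals (diagonals are an assumption)."""
    lines = []
    for i in range(SIZE):
        lines.append([(i, c) for c in range(SIZE)])  # row i
        lines.append([(r, i) for r in range(SIZE)])  # column i
    lines.append([(i, i) for i in range(SIZE)])             # main diagonal
    lines.append([(i, SIZE - 1 - i) for i in range(SIZE)])  # anti-diagonal
    return sum(all(cell in flipped for cell in line) for line in lines)


def play(choose_tile, rng=None):
    """Play one game; choose_tile(flipped) must return an unflipped (row, col)."""
    rng = rng or random.Random()
    all_tiles = [(r, c) for r in range(SIZE) for c in range(SIZE)]
    flipped = set()
    for _ in range(PLAYER_MOVES):
        pick = choose_tile(flipped)
        if pick in flipped:
            raise ValueError("tile already flipped")
        flipped.add(pick)
        remaining = [t for t in all_tiles if t not in flipped]
        flipped.add(rng.choice(remaining))  # the game's random flip (assumed uniform)
    return count_bingos(flipped), flipped
```

Any strategy, whether a trained model or a hand-written heuristic, can be passed in as choose_tile, which makes it straightforward to compare average scores over many simulated games.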

To maximize my score in the minigame, I thought it would be fun to implement a reinforcement learning algorithm. I had not had good results with reinforcement learning in the past, and this task seemed simple enough that I could design a reward structure that would be easy to optimize. Although I knew the game was small enough to solve with dynamic programming, I wanted to see whether reinforcement learning could find a similar solution.
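To show why the game is small enough for dynamic programming, here is a hedged sketch of an exact solver. It is not the project's implementation: it assumes the board is stored as a bitmask, that diagonals count as lines, and that the random flip is uniform. The functions line_masks and solve are hypothetical names. The value of a state is the best expected final score: a maximum over the player's picks and an average over the random flip that follows.

```python
from functools import lru_cache


def line_masks(size):
    """Bitmasks for each row, column, and diagonal (counting diagonals is an assumption)."""
    masks = []
    for i in range(size):
        masks.append(sum(1 << (i * size + c) for c in range(size)))  # row i
        masks.append(sum(1 << (r * size + i) for r in range(size)))  # column i
    masks.append(sum(1 << (i * size + i) for i in range(size)))             # main diagonal
    masks.append(sum(1 << (i * size + size - 1 - i) for i in range(size)))  # anti-diagonal
    return masks


def solve(size):
    """Build value(board, moves_left) and best_move(board, moves_left) for one board size."""
    masks = line_masks(size)
    cells = size * size

    def score(board):
        return float(sum((board & m) == m for m in masks))

    def free_tiles(board):
        return [t for t in range(cells) if not (board >> t) & 1]

    @lru_cache(maxsize=None)
    def value(board, moves_left):
        free = free_tiles(board)
        if moves_left == 0 or not free:
            return score(board)
        return max(move_value(board, t, moves_left) for t in free)

    def move_value(board, tile, moves_left):
        after_pick = board | (1 << tile)
        free = free_tiles(after_pick)
        if not free:
            return score(after_pick)
        # Average over the random flip that follows the player's pick.
        total = sum(value(after_pick | (1 << t), moves_left - 1) for t in free)
        return total / len(free)

    def best_move(board, moves_left):
        return max(free_tiles(board), key=lambda t: move_value(board, t, moves_left))

    return value, best_move
```

Calling solve(5) and then best_move(0, 8) would solve the full game, but that visits on the order of millions of board states and is slow in pure Python, so the sketch is easiest to try on a 2x2 or 3x3 board first.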

Project Goals

Clone the random bingo game.
Implement reinforcement learning to improve game strategies.
Implement a dynamic programming solution to compare with reinforcement learning.
Implement a supervised learning algorithm on the same decision network to compare with reinforcement learning.

Tech Stack

Frontend: React, Chakra UI
Backend: Python with FastAPI for game logic API
Machine Learning: DDQN reinforcement learning model using PyTorch
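Assuming DDQN here stands for Double DQN, the core idea can be sketched without any framework: the online network chooses the next action and the target network scores it. The function name double_dqn_target and the default gamma are hypothetical, and plain Python lists stand in for the PyTorch tensors the project would use.

```python
def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double DQN target: the online network picks the action, the target network scores it."""
    if done:
        return reward  # no bootstrapping past the end of the game
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]
```

Separating action selection from action evaluation is what distinguishes Double DQN from plain DQN, where the target network does both and tends to overestimate values.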

Key Features

Reinforcement Learning Integration: The model learns optimal moves to maximize player scores.
Optimal Strategy Comparison: The RL model can be compared against the dynamic programming and supervised learning solutions.
Bayesian Optimization of Hyperparameters: The model's hyperparameters are tuned using Pareto-frontier (multi-objective) Bayesian optimization.
Game Example: Front end implementation of the game with a simple UI.
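To illustrate what a Pareto frontier means for hyperparameter tuning, the sketch below filters a set of trial results down to the non-dominated ones. It only shows the frontier step, not the Bayesian optimization loop itself, and it assumes every objective is to be maximized. The name pareto_front and the example objectives are hypothetical.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (all objectives maximized)."""
    def dominates(a, b):
        # a dominates b if it is at least as good everywhere and strictly better somewhere.
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical trials scored on two objectives, for example
# (average game score, negative training time).
trials = [(1, 5), (2, 4), (1, 4), (3, 1), (2, 2)]
front = pareto_front(trials)
```

Each point on the frontier represents a trade-off that no other trial beats on every objective at once, so the final choice among them is a judgment call.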

Playable Example

[Interactive bingo game. The widget shows the flips remaining (8 at the start), the current score, and a suggestion from both the RL model and the DP model.]

Challenges

Initially, I struggled to get the agent to select unflipped tiles. I had applied a mild penalty for selecting already-flipped tiles, but it was not enough to stop the agent from choosing them. Increasing the penalty and adding a reward for selecting unflipped tiles fixed this, and the agent was then able to learn the optimal strategy.
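The reward shaping described above might look roughly like the following sketch. The penalty and bonus values are placeholders and not the ones used in the project, and the function name shaped_reward and its parameters are hypothetical.

```python
def shaped_reward(flipped, action, bingos_before, bingos_after,
                  invalid_penalty=-5.0, valid_bonus=0.1):
    """Reward for one step; `flipped` holds the tiles flipped before the action."""
    if action in flipped:
        # Strong penalty for picking a tile that is already flipped.
        return invalid_penalty
    # Small bonus for a legal pick, plus any newly completed bingo lines.
    return valid_bonus + (bingos_after - bingos_before)
```

The penalty needs to be large relative to the per-step rewards; otherwise an invalid pick costs the agent too little to matter, which matches the behavior I saw with the milder penalty.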

My initial Python implementation of the game was not as efficient as it could have been, which led to the reinforcement learning model taking a long time to train. I didn't realize this until I started implementing the dynamic programming solution, which also ran extremely slowly. I refactored the game logic to be more efficient, which made the dynamic programming solution roughly 10x faster. I have not retrained the reinforcement learning model yet, but I expect its training to be much faster as well.

Lessons Learned

Through this project, I learned how to integrate machine learning concepts, such as reinforcement learning and multi-objective Bayesian optimization, into a real-world application. The project also improved my frontend skills with React and Chakra UI.

Future Improvements

Add multiplayer functionality for competitive bingo games.