Work was based on “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm” (the AlphaZero paper).
- Used a matrix to encode the state of the game, and a CNN to predict the expected utility of moves.
- To address the sparse-reward problem, I trained the RL agent in stages: first to capture as many opponent pieces as possible, then to capture as many as possible without losing its own pieces, and finally to win the game.
- Algorithm performed decently, but had clear strategic flaws. I was limited in compute for this project.
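
The staged objectives described above could be sketched as a reward function that switches with the training stage. This is a minimal illustration, not the project's actual code: the names `Stage` and `shaped_reward`, the reward of one unit per piece, and the `+1`/`-1`/`0` outcome encoding are all assumptions made for the example.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical curriculum stages, in training order."""
    CAPTURE = 1         # reward captures only
    CAPTURE_SAFELY = 2  # reward captures, penalize own losses
    WIN = 3             # reward only the final game outcome


def shaped_reward(stage: Stage, captured: int, lost: int, outcome: int) -> float:
    """Return the reward for one step under the given curriculum stage.

    captured: opponent pieces captured on this step (assumed count)
    lost:     own pieces lost on the opponent's reply (assumed count)
    outcome:  +1 for a win, -1 for a loss, 0 for a draw or unfinished game
    """
    if stage is Stage.CAPTURE:
        # Dense signal: every capture is rewarded, losses are ignored.
        return float(captured)
    if stage is Stage.CAPTURE_SAFELY:
        # Still dense, but trading pieces away now costs reward.
        return float(captured - lost)
    # Final stage: the sparse signal, only the game result counts.
    return float(outcome)
```

For example, a step that captures one piece and then loses one would earn `shaped_reward(Stage.CAPTURE, 1, 1, 0)` in the first stage but a net zero in the second. The early stages give the agent frequent feedback before it is switched to the sparse win/loss signal.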