In 1992, Gerald Tesauro at IBM’s Thomas J. Watson Research Center developed TD-Gammon, a groundbreaking computer backgammon program that combined an artificial neural network with temporal-difference learning. Trained entirely through self-play, TD-Gammon reached world-class strength and introduced novel strategies that human experts later adopted. By 1993, version 2.1 had played 1.5 million training games and was nearly on par with the top human players. Its success demonstrated the potential of reinforcement learning combined with neural networks, influencing later systems such as AlphaGo.
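The temporal-difference idea behind TD-Gammon can be shown in miniature. The sketch below is not TD-Gammon (which trained a neural network on backgammon positions); it is a minimal tabular TD(0) learner on the classic five-state random-walk task, where the true state values are 1/6 through 5/6. The function name and parameters are illustrative, not from any TD-Gammon source.

```python
import random

# TD(0) on a toy random walk: states 0..6, with 0 and 6 terminal.
# Reward is 1 only on reaching state 6. True values of states 1..5
# are 1/6, 2/6, ..., 5/6. The update rule is:
#   V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
def td0_random_walk(episodes=5000, alpha=0.1, gamma=1.0, seed=0):
    rng = random.Random(seed)
    V = [0.0] * 7  # value estimates; terminal states stay at 0
    for _ in range(episodes):
        s = 3  # every episode starts in the middle state
        while s not in (0, 6):
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == 6 else 0.0
            # Move V(s) toward the one-step bootstrapped target
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V

values = td0_random_walk()
print([round(v, 2) for v in values[1:6]])  # approaches 1/6 .. 5/6
```

TD-Gammon applied the same bootstrapped update, but used a neural network instead of a table so it could generalize across backgammon's enormous state space.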

Why it matters: TD-Gammon’s achievement marked a significant milestone in AI, showing that complex games could be mastered through self-play and reinforcement learning. It not only improved backgammon strategy but also paved the way for advancements in machine learning and game AI.

TD-Gammon’s legacy is evident in the continued use of reinforcement learning in modern AI systems.