AlphaZero
E40166
AlphaZero is an artificial intelligence system developed by DeepMind that mastered chess, shogi, and Go through self-play reinforcement learning, without human game examples or human-crafted strategies.
Aliases (3)
Statements (53)
| Predicate | Object |
|---|---|
| instanceOf | artificial intelligence system; game‑playing program |
| architectureType | deep neural network with Monte Carlo tree search |
| basedOn | Monte Carlo tree search; deep learning; reinforcement learning |
| contrastWith | programs relying on human expert knowledge; traditional chess engines using alpha‑beta search |
| countryOfOrigin | United Kingdom |
| creatorOrganizationType | AI research lab |
| defeated | Elmo shogi engine; Stockfish 8; previous Go programs based on AlphaGo Zero |
| designedFor | Go; chess; shogi |
| developer | DeepMind; Google DeepMind |
| doesNotUse | endgame tablebases for search guidance; human‑crafted opening books |
| evaluationFunction | learned value function |
| field | artificial intelligence; computer Go; computer chess; computer shogi; machine learning |
| firstPublicAnnouncementDate | 2017-12-06 |
| firstPublicAnnouncementYear | 2017 |
| gameRepresentation | board positions encoded for neural networks |
| generalizationProperty | single algorithm applied to multiple games |
| hardwareUsed | TPUs |
| learningObjective | maximize expected game outcome |
| learningParadigm | tabula rasa learning |
| notableFor | mastering Go through self‑play; mastering chess through self‑play; mastering shogi through self‑play |
| outperforms | AlphaGo Zero; Elmo; Stockfish |
| parentProject | AlphaGo project |
| policyRepresentation | probability distribution over moves |
| publicationTitle | A general reinforcement learning algorithm that masters chess, shogi, and Go through self‑play |
| publishedIn | Science |
| rewardSignal | game result win‑draw‑loss |
| searchGuidance | policy network priors; value network evaluations |
| searchTechnique | Monte Carlo tree search guided by neural networks |
| trainingDataSource | self‑generated game data |
| trainingMethod | self‑play |
| trainingRegime | self‑play reinforcement learning without human examples |
| uses | neural networks; policy network; value network |
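
The searchGuidance and searchTechnique statements above can be illustrated with a minimal sketch of PUCT-style move selection, in which a policy prior and a running mean of value estimates jointly decide which move the tree search explores next. This is a sketch under stated assumptions, not DeepMind's implementation: the names `Edge`, `select_move`, and `backup`, and the exploration constant `c_puct = 1.5`, are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Search statistics for one move from a position (hypothetical structure)."""
    prior: float             # P(s, a): probability assigned by the policy network
    visits: int = 0          # N(s, a): number of simulations through this move
    value_sum: float = 0.0   # sum of value estimates backed up through this move

    @property
    def q(self) -> float:
        # Mean backed-up value; 0.0 for an unvisited move.
        return self.value_sum / self.visits if self.visits else 0.0


def select_move(edges: dict[str, Edge], c_puct: float = 1.5) -> str:
    """Pick the move maximizing Q + U, where U grows with the prior
    and shrinks as the move accumulates visits."""
    total_visits = sum(e.visits for e in edges.values())

    def score(e: Edge) -> float:
        u = c_puct * e.prior * math.sqrt(total_visits) / (1 + e.visits)
        return e.q + u

    return max(edges, key=lambda move: score(edges[move]))


def backup(edge: Edge, value: float) -> None:
    """Record one simulation result (a value estimate in [-1, 1]) on an edge."""
    edge.visits += 1
    edge.value_sum += value
```

In a full search, `select_move` would be applied repeatedly from the root to a leaf, the leaf would be evaluated by the value network, and `backup` would be called on every edge along the path (with the sign flipped for the opposing player). The sketch omits those steps as well as the network itself.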
Referenced by (15)
| Subject (surface form when different) | Predicate |
|---|---|
| David Silver; DeepMind; Demis Hassabis | knownFor |
| DeepMind ("AlphaGo Zero"); DeepMind | developed |
| AlphaGo ("AlphaGo Zero"); AlphaGo | inspired |
| AlphaGo ("AlphaGo Zero"); AlphaGo | successor |
| MuZero | comparedTo |
| MuZero | inspiredBy |
| David Silver ("Mastering chess and shogi by self-play with a general reinforcement learning algorithm") | notablePaper |
| David Silver | notableWork |
| AlphaZero ("A general reinforcement learning algorithm that masters chess, shogi, and Go through self‑play") | publicationTitle |
| AlphaStar | relatedTo |