MuZero
E42386
MuZero is a DeepMind reinforcement learning algorithm that learns to plan and master complex games such as Go, chess, and Atari games without being given the rules in advance.
Observed surface forms (3)
Statements (48)
| Predicate | Object |
|---|---|
| instanceOf | DeepMind algorithm; model-based reinforcement learning algorithm; reinforcement learning algorithm |
| achieves | superhuman performance in Go; superhuman performance in chess; superhuman performance in shogi |
| architectureComponent | dynamics function; prediction function; representation function |
| basedOn | Monte Carlo tree search (surface form: Monte Carlo Tree Search); deep neural networks; model-based planning |
| canPlay | Atari 2600 games; Go; chess; shogi |
| category | game-playing AI system; planning algorithm |
| comparedTo | AlphaZero |
| countryOfOrigin | United Kingdom |
| developer | DeepMind |
| differenceFromAlphaZero | does not require known game rules for planning |
| field | artificial intelligence; machine learning; reinforcement learning |
| handles | discrete action spaces |
| inputType | raw observations such as images |
| inspiredBy | AlphaGo; AlphaGo Zero; AlphaZero |
| keyFeature | does not require prior knowledge of game rules; learns environment dynamics from data; plans using a learned model; searches in latent state space; uses value, policy, and reward prediction |
| learningSignal | game outcomes |
| notableFor | planning with a learned model without access to true environment dynamics; state-of-the-art performance on the Atari benchmark at time of publication |
| optimizationObjective | maximize expected cumulative reward |
| organization | DeepMind (surface form: Google DeepMind) |
| outperforms | prior model-free algorithms on Atari |
| publicationYear | 2019 |
| publishedIn | Nature |
| titleOfPaper | MuZero (surface form: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model) |
| trainingMethod | reinforcement learning; self-play |
| uses | gradient-based optimization |
| usesAlgorithm | Monte Carlo tree search (surface form: Monte Carlo Tree Search) |
Referenced by (8)
Full triples — surface form annotated when it differs from this entity's canonical label.
- this entity's surface form: Mastering the game of Go without human knowledge
- this entity's surface form: Mastering Atari, Go, chess and shogi by planning with a learned model
- this entity's surface form: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model