value-based reinforcement learning method
C9067
concept
A value-based reinforcement learning method is an approach that learns a value function estimating expected future rewards for states or state-action pairs and derives a policy by selecting actions that maximize these estimated values.
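The definition above can be illustrated with a minimal sketch of tabular Q-learning, one of the instances listed on this page. The environment is a hypothetical five-state chain invented for this example (action 0 moves left, action 1 moves right, reward 1 on reaching the rightmost state). The function names `q_learning` and `greedy_policy` and all hyperparameter values are assumptions, not taken from any source.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Learn Q(s, a) on a hypothetical 1-D chain (illustrative only)."""
    rng = random.Random(seed)
    goal = n_states - 1
    # One row per state, one entry per action (0 = left, 1 = right).
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy behaviour; ties are broken at random so the
            # untrained agent still explores.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            done = s_next == goal
            r = 1.0 if done else 0.0
            # Temporal-difference target: reward plus the discounted value
            # of the best next action (zero beyond a terminal state).
            target = r + (0.0 if done else gamma * max(q[s_next]))
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Derive the policy by picking the highest-valued action per state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


q_table = q_learning()
# The entry for the terminal state is arbitrary: its values are never updated.
print(greedy_policy(q_table))
```

The policy is never stored or trained directly. It is read off the learned value table, which is the property that distinguishes value-based methods from approaches that parameterize the policy itself.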
All labels observed (15)
| Label | Occurrences |
|---|---|
| reinforcement learning technique | 7 |
| value-based reinforcement learning method (canonical) | 5 |
| Deep Q-Network variant | 2 |
| DQN extension | 1 |
| Deep Q-Learning variant | 1 |
| Q-learning variant | 1 |
| batch reinforcement learning method | 1 |
| goal-conditioned value function model | 1 |
| model-free algorithm | 1 |
| off-policy algorithm | 1 |
| off-policy learning algorithm | 1 |
| off-policy value-based method | 1 |
| temporal-difference learning algorithm | 1 |
| temporal-difference learning method | 1 |
| value function learning method | 1 |
Instances (16)
| Instance | Via concept surface |
|---|---|
| Double DQN | — |
| Generalized Advantage Estimation | reinforcement learning technique |
| Rainbow DQN | — |
| Atari deep Q-network | — |
| HER (surface form: Hindsight Experience Replay) | reinforcement learning technique |
| Rachel Fong (surface form: Hindsight Experience Replay) | reinforcement learning technique |
| Universal Value Function Approximators | goal-conditioned value function model |
| Jonas Schneider (surface form: Hindsight Experience Replay) | reinforcement learning technique |
| Deep Q-Learning | — |
| Q-learning | temporal-difference learning method |
| Josh Tobin (surface form: Hindsight Experience Replay) | reinforcement learning technique |
| TD(lambda) (surface form: TD(λ)) | temporal-difference learning algorithm |
| neural fitted Q-iteration (NFQ) (surface form: Neural Fitted Q-Iteration) | off-policy value-based method |
| Dueling DQN | — |
| Prioritized Experience Replay DQN | Deep Q-Network variant |
| Hindsight Experience Replay | reinforcement learning technique |