Q-learning

Value iteration does not solve every problem. In particular, when the cost and transition probability functions are unknown a priori, it cannot be used to compute the optimal value function. Instead...
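As a rough illustration of how Q-learning sidesteps the unknown cost and transition functions, here is a minimal tabular sketch that learns only from sampled transitions. It is written under several assumptions that are not in the excerpt: a discounted-cost criterion with factor `gamma`, an epsilon-greedy exploration rule, a fixed learning rate `alpha`, and a hypothetical two-state environment `toy_step` invented purely for the demonstration.

```python
import random

def q_learning(step, n_states, n_actions, episodes=500, horizon=50,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning for cost minimization.

    `step(i, a, rng)` is a hypothetical sampler returning (next_state, cost);
    the learner never sees the transition probabilities or the cost function.
    `gamma` (discount factor) and `epsilon` (exploration rate) are assumptions.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        i = rng.randrange(n_states)  # random start state
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)  # explore
            else:
                a = min(range(n_actions), key=lambda b: Q[i][b])  # exploit
            j, cost = step(i, a, rng)
            # Sampled one-step target: observed cost plus discounted
            # best (lowest) estimated cost-to-go from the next state.
            target = cost + gamma * min(Q[j])
            Q[i][a] += alpha * (target - Q[i][a])
            i = j
    return Q

def toy_step(i, a, rng):
    """Hypothetical environment: action 0 stays, action 1 switches state.
    Landing in state 1 costs 1.0, landing in state 0 costs nothing."""
    j = i if a == 0 else 1 - i
    cost = 1.0 if j == 1 else 0.0
    return j, cost

Q = q_learning(toy_step, n_states=2, n_actions=2)
greedy = [min(range(2), key=lambda b: Q[i][b]) for i in range(2)]
print(greedy)
```

In this toy problem the cheapest behaviour is to stay in state 0 and to leave state 1, so the greedy policy read off the learned table should pick action 0 in state 0 and action 1 in state 1.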

Bellman Equation

When the agent is in state i at time slot n and takes action a, it transitions to the next state j at time slot n+1 with a given transition probability and incurs expected cost c(i, a). However, given the available actions, it is not enough to select the action that...
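For orientation, one common form of the Bellman optimality equation for a cost-minimizing MDP is sketched below. The notation is an assumption, since the excerpt's own symbols did not survive: p(j | i, a) stands for the transition probability, A(i) for the actions available in state i, and \gamma for a discount factor in (0, 1). The full article may use a different criterion or notation.

```latex
V^{*}(i) \;=\; \min_{a \in A(i)} \Big[\, c(i,a) \;+\; \gamma \sum_{j \in S} p(j \mid i, a)\, V^{*}(j) \Big]
```

The bracketed term adds the expected immediate cost c(i, a) to the discounted expected cost-to-go from the next state j, so the equation weighs the long-run consequences of an action as well as its immediate cost.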

Markov Decision Process (MDP)

Here, we consider a discrete-time Markov Decision Process where time is slotted into intervals of equal duration indexed by n. Definition 1. A Markov Decision Process M is a 5-tuple (S; A; p; c; ) where S is the set of states i, A is the set...
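The tuple in Definition 1 could be held in a small container such as the one sketched below. This is only an illustration: the field names and the dictionary layout are hypothetical, and because the fifth element of the tuple is missing from the excerpt, the `discount` field is an assumption rather than something the source states.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class MDP:
    states: List[int]                            # S: set of states i
    actions: List[int]                           # A: set of actions a
    p: Dict[Tuple[int, int], Dict[int, float]]   # (i, a) -> {j: probability}
    c: Dict[Tuple[int, int], float]              # (i, a) -> expected cost c(i, a)
    discount: float                              # ASSUMPTION: the elided fifth element

    def validate(self, tol: float = 1e-9) -> bool:
        """Check that every (state, action) pair has a cost and a
        transition distribution that is non-negative and sums to 1."""
        for i in self.states:
            for a in self.actions:
                row = self.p.get((i, a))
                if row is None or (i, a) not in self.c:
                    return False
                if any(q < 0 for q in row.values()):
                    return False
                if abs(sum(row.values()) - 1.0) > tol:
                    return False
        return True

# Hypothetical two-state, two-action instance for demonstration only.
toy = MDP(
    states=[0, 1],
    actions=[0, 1],
    p={(0, 0): {0: 1.0}, (0, 1): {0: 0.2, 1: 0.8},
       (1, 0): {1: 1.0}, (1, 1): {0: 0.8, 1: 0.2}},
    c={(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.5},
    discount=0.9,
)
print(toy.validate())
```

Storing p as a sparse mapping keeps the sketch short; a dense array indexed by (i, a, j) would work equally well for small state spaces.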