From Sutton and Barto's book Reinforcement Learning (Adaptive Computation and Machine Learning series), the following definition is given for Q-learning:
I'm planning to ask a question about combining the above algorithm with policy gradient learning, but I'm struggling to format it correctly with MathJax. Here is what I have so far, which looks awful in comparison to the above algorithm:
$$
Algorithm \hspace{1mm} parameters: step size \hspace{1mm} \alpha \in (0 , 1] , \epsilon > 0 \\
Initialize \hspace{1mm} Q \hspace{1mm} ( s, a ), \ \forall s \in S^+ , a \in A ( s ), arbitrarily \hspace{1mm} except \hspace{1mm} that \hspace{1mm} Q ( terminal , . ) = 0 \\
Loop \hspace{1mm} for \hspace{1mm} each \hspace{1mm} step \hspace{1mm} of \hspace{1mm} episode: \\
Choose \hspace{1mm} A \hspace{1mm} from \hspace{1mm} S \hspace{1mm} using \hspace{1mm} some \hspace{1mm} policy \hspace{1mm} derived \hspace{1mm} from \hspace{1mm} Q (eg \hspace{1mm} \epsilon \hspace{1mm} greedy)
$$
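I suspect something using an `aligned` environment with `\text{...}` for the prose parts (rather than `\hspace`) would be closer to the book's layout, but I'm not sure it is the idiomatic way to do it. Here is a rough sketch of that idea, covering only the same four lines as my attempt above, and assuming the site's MathJax supports the amsmath-style `aligned` environment, `\text`, `\textbf` and `\textit`:

$$
\begin{aligned}
&\textbf{Algorithm parameters:} \text{ step size } \alpha \in (0, 1],\ \epsilon > 0 \\
&\text{Initialize } Q(s, a),\ \forall s \in \mathcal{S}^+,\ a \in \mathcal{A}(s), \text{ arbitrarily except that } Q(\textit{terminal}, \cdot) = 0 \\
&\text{Loop for each step of episode:} \\
&\quad \text{Choose } A \text{ from } S \text{ using some policy derived from } Q \text{ (e.g. } \epsilon\text{-greedy)}
\end{aligned}
$$

This keeps the words upright and uses `\quad` for indentation, but it still doesn't match the pseudocode in the image.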
Can some pointers on writing out RL algorithms with MathJax be shared? Ideally, can my MathJax code be amended so that it renders the same output as the Q-learning algorithm above (in the image)?
Is MathJax being used on this site, or some other math-notation rendering library?
