Tag Archive: Time Difference

[RL Notes] Temporal-Difference Learning

Author: nex3z 2019-10-26

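In the prediction problem, the goal is to estimate the value function \begin{equation} v_\pi(s) \doteq \mathbb{E}[G_t|S_t = s] \tag{1} \end{equation} that is, the expected return obtainable from a given state when following policy $\pi$. When Monte Carlo methods are used for policy evaluation, the estimate can be updated incrementally via \begin{equation} V(S_t) \leftarrow V(S…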
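The update rule in the excerpt is cut off, so the sketch below does not reproduce the post's own formula. It assumes the standard constant-step-size, every-visit Monte Carlo update, in which each visited state's estimate moves a fraction `alpha` toward the observed return `G`. The function name `mc_update`, the `(state, reward)` episode layout, and the parameters `alpha` and `gamma` are all illustrative choices, not names taken from the post.

```python
from collections import defaultdict


def mc_update(V, episode, alpha=0.1, gamma=1.0):
    """Every-visit constant-alpha Monte Carlo update (illustrative sketch).

    V       -- mapping from state to current value estimate
    episode -- list of (state, reward) pairs, where reward is the reward
               received after leaving that state (assumed layout)
    alpha   -- step size (assumed constant)
    gamma   -- discount factor
    """
    G = 0.0
    # Walk the episode backwards so the return G can be accumulated in one pass.
    for state, reward in reversed(episode):
        G = reward + gamma * G
        # Move the estimate a fraction alpha toward the observed return.
        V[state] += alpha * (G - V[state])
    return V


V = defaultdict(float)               # all estimates start at 0
episode = [("A", 0.0), ("B", 1.0)]   # hypothetical two-step episode
mc_update(V, episode, alpha=0.5)
print(dict(V))
```

Because the return `G` is only known once the episode terminates, this style of update has to wait until the end of each episode before any estimate changes.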

Reinforcement Learning

Prediction, Reinforcement Learning, Time Difference
