[RL Notes] The Temporal-Difference Target
1. Semi-gradient descent. For the general SGD method \begin{equation} \boldsymbol{\mathrm{w}}_{t+1} \doteq \boldsymbol{\mathrm{w}}_t + \alpha \big[U_t - \hat{v}(s, \boldsymbol{\mathrm{w}}_t)\big] \nabla \hat{v}(s, …