Average-reward reinforcement learning is an important class of undiscounted optimality frameworks in reinforcement learning, and most existing work has been confined to discrete domains. This paper attempts to solve continuous-state-space problems by combining average-reward reinforcement learning algorithms with function approximation, modifying the parameter-update conditions of R-learning and G-learning to suit the change of state domain. In addition, the performance of G-learning combined with function approximation, and its sensitivity to various parameters, are studied in detail. Finally, experimental results and their analysis are given. The results show that the solutions of R-learning and G-learning diverge easily when ε is small; they also demonstrate the effectiveness of the Tile coding feature-extraction method, which can serve as a reference baseline for other feature-extraction methods.
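To make the two techniques the abstract names concrete, the following is a minimal sketch of tile coding over a 1-D continuous state, driving an R-learning-style update with linear function approximation. The tiling sizes, learning rates, and the greedy-only update condition on the average-reward estimate ρ are illustrative assumptions here, not the paper's exact settings.

```python
import numpy as np

# Illustrative tiling configuration (assumed, not from the paper).
N_TILINGS, N_TILES = 4, 8
N_FEATURES = N_TILINGS * (N_TILES + 1)   # +1 tile per tiling absorbs the offset

def tile_features(state, lo=0.0, hi=1.0):
    """Tile coding sketch: the index of the one active tile in each
    of N_TILINGS slightly offset tilings of the interval [lo, hi)."""
    width = (hi - lo) / N_TILES
    active = []
    for t in range(N_TILINGS):
        offset = t * width / N_TILINGS          # each tiling shifted by a fraction of a tile
        idx = min(int((state - lo + offset) / width), N_TILES)
        active.append(t * (N_TILES + 1) + idx)  # global feature index for this tiling
    return active

def q_value(w, a, phi):
    """Linear Q: sum of the weights of the active features for action a."""
    return sum(w[a, i] for i in phi)

def r_learning_step(w, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """One R-learning-style update with tile-coded linear approximation:
    delta = r - rho + max_b Q(s', b) - Q(s, a); rho is adjusted only
    when the taken action was greedy (a common R-learning condition)."""
    phi, phi_next = tile_features(s), tile_features(s_next)
    n_actions = w.shape[0]
    greedy = q_value(w, a, phi) >= max(q_value(w, b, phi)
                                       for b in range(n_actions)) - 1e-9
    q_next = max(q_value(w, b, phi_next) for b in range(n_actions))
    delta = r - rho + q_next - q_value(w, a, phi)
    w[a, phi] += alpha * delta   # update only the active features
    if greedy:                   # rho tracks the average reward on greedy steps
        rho += beta * delta
    return rho
```

With all weights initialized to zero, a single step with reward 1 yields delta = 1, so each of the four active features of the taken action gains 0.1 and ρ moves to 0.01 — the value estimate and the average-reward estimate adapt on different time scales (α vs. β).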