Convergent learning in unknown hypergraphical games
Abstract:
Combining classical game-theoretic learning algorithms with reinforcement learning of joint action values yields provably convergent learning algorithms even when rewards are unknown and observed only under noise. In particular, we prove the convergence of Q-learned fictitious play in potential games and of Q-learned adaptive play in weakly acyclic games. However, learning joint action values and observing opponent strategies are likely to be prohibitive in any game relevant to distributed control. We therefore demonstrate how the hypergraphical structure inherent in distributed optimisation frameworks can be exploited to alleviate the problem.
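
To illustrate the idea of Q-learned fictitious play at a high level (this is a minimal sketch, not the paper's algorithm or notation), the following Python fragment shows one player of a stateless repeated game maintaining fictitious-play counts of opponent actions alongside Q-learned joint-action values from noisy reward samples; the action counts, step size, opponent model, and reward table are all assumptions made for the example.

    import numpy as np

    # Hypothetical sizes and parameters; none of these come from the paper.
    N_SELF, N_OPP = 3, 3
    ALPHA = 0.1                       # Q-learning step size

    Q = np.zeros((N_SELF, N_OPP))     # estimated joint-action values
    counts = np.ones(N_OPP)           # fictitious-play counts of opponent actions

    def best_response():
        """Best reply to the empirical opponent strategy under current Q."""
        belief = counts / counts.sum()
        return int(np.argmax(Q @ belief))

    def update(own_a, opp_a, noisy_reward):
        """Update the belief from the observed opponent action and the
        joint-action value estimate from the noisy reward sample."""
        counts[opp_a] += 1
        Q[own_a, opp_a] += ALPHA * (noisy_reward - Q[own_a, opp_a])

    # Illustrative play loop against a fixed random opponent; the true
    # reward table is unknown to the learner and only sampled under noise.
    rng = np.random.default_rng(0)
    true_reward = rng.standard_normal((N_SELF, N_OPP))
    for _ in range(1000):
        a = best_response()
        b = int(rng.integers(N_OPP))
        r = true_reward[a, b] + 0.1 * rng.standard_normal()
        update(a, b, r)

In the hypergraphical setting the paper addresses, each player would maintain such beliefs and value estimates only over the neighbours in its local games rather than over the full joint action space, which is what makes the approach tractable for distributed control.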