Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy Logarithmic Regrets

Author：

Wan, Zongqi | Zhang, Zhijie | Li, Tongyang | Zhang, Jialin | Sun, Xiaoming

Abstract：

Multi-arm bandit (MAB) and stochastic linear bandit (SLB) are important models in reinforcement learning, and it is well-known that classical algorithms for bandits with time horizon T suffer Ω(√T) regret. In this paper, we study MAB and SLB with quantum reward oracles and propose quantum algorithms for both models with O(poly(log T)) regrets, exponentially improving the dependence in terms of T. To the best of our knowledge, this is the first provable quantum speedup for regrets of bandit problems and in general exploitation in reinforcement learning. Compared to previous literature on quantum exploration algorithms for MAB and reinforcement learning, our quantum input model is simpler and only assumes quantum oracles for each individual arm. Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Keyword：

Learning algorithms Learning systems Quantum theory Reinforcement learning Stochastic models Stochastic systems

Affiliation：

[ 1 ] [Wan, Zongqi]Institute of Computing Technology, Chinese Academy of Sciences, China
[ 2 ] [Wan, Zongqi]University of Chinese Academy of Sciences, China
[ 3 ] [Zhang, Zhijie]Center for Applied Mathematics of Fujian Province, School of Mathematics and Statistics, Fuzhou University, China
[ 4 ] [Li, Tongyang]Center on Frontiers of Computing Studies, Peking University, China
[ 5 ] [Li, Tongyang]School of Computer Science, Peking University, China
[ 6 ] [Zhang, Jialin]Institute of Computing Technology, Chinese Academy of Sciences, China
[ 7 ] [Zhang, Jialin]University of Chinese Academy of Sciences, China
[ 8 ] [Sun, Xiaoming]Institute of Computing Technology, Chinese Academy of Sciences, China
[ 9 ] [Sun, Xiaoming]University of Chinese Academy of Sciences, China

Source ：

Year： 2023

Volume： 37

Page： 10087-10094

Language： English

ESI Highly Cited Papers on the List： 0

Affiliated Colleges：

School of Mathematics and Statistics (data not explicitly attributed to a unit within this school/department)
