How Good is Forced-Exploration in Linear Stochastic Bandits?


Invited Talk
Csaba Szepesvari (University of Alberta)

Abstract:

Assume that in a stochastic bandit problem the payoff is a linear function of the actions. What is a good algorithm for this class of problems? In this talk we investigate an algorithm that randomly explores the action space for some number of time steps and otherwise plays the greedy action. It turns out that for this class of problems the algorithm is not as bad as one might first think. The theory will be illustrated by numerical experiments on an ad allocation problem.
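
The forced-exploration scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the speaker's exact algorithm: it assumes a finite action set, rewards of the form r = ⟨θ*, a⟩ + noise, a least-squares estimate of θ*, and a fixed budget of `m` purely random rounds followed by greedy play. All names (`actions`, `theta_star`, `m`, etc.) are illustrative choices, not from the talk.

```python
import numpy as np

# Hypothetical sketch of forced exploration in a linear stochastic bandit.
# Assumptions (not stated in the abstract): finite action set, Gaussian
# noise, ridge-regularized least squares, and exploration confined to the
# first m rounds. Parameter values are arbitrary for illustration.

rng = np.random.default_rng(0)

d = 3                                 # dimension of the action vectors
actions = rng.normal(size=(20, d))    # finite action set (assumption)
theta_star = rng.normal(size=d)       # unknown payoff parameter
T, m = 500, 100                       # horizon and forced-exploration rounds

A = np.eye(d) * 1e-3                  # regularized Gram matrix
b = np.zeros(d)                       # accumulated reward-weighted actions
total_reward = 0.0

for t in range(T):
    if t < m:
        # Forced exploration: play a uniformly random action.
        a = actions[rng.integers(len(actions))]
    else:
        # Greedy phase: act on the least-squares estimate of theta_star.
        theta_hat = np.linalg.solve(A, b)
        a = actions[np.argmax(actions @ theta_hat)]
    r = a @ theta_star + rng.normal(scale=0.1)  # noisy linear payoff
    A += np.outer(a, a)
    b += r * a
    total_reward += r
```

The exploration budget `m` trades off estimation accuracy against the reward lost to random play; how good this trade-off can be made is exactly the question the talk addresses.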