Preselection Bandits under the Plackett-Luce Model
V. Bengs, E. Hüllermeier, ArXiv:1907.06123 (n.d.).
Preprint | Accepted | English
Abstract
In this paper, we introduce the Preselection Bandit problem, in which the
learner preselects a subset of arms (choice alternatives) for a user, who
then chooses the final arm from this subset. The learner is not aware of the
user's preferences, but can learn them from observed choices. In our concrete
setting, we allow these choices to be stochastic and model the user's actions
by means of the Plackett-Luce model. The learner's main task is to preselect
subsets that eventually lead to highly preferred choices. To formalize this
goal, we introduce a reasonable notion of regret and derive lower bounds on the
expected regret. Moreover, we propose algorithms for which the upper bound on
expected regret matches the lower bound up to a logarithmic term of the time
horizon.
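
For readers unfamiliar with the choice model referenced in the abstract: under the Plackett-Luce model, each arm i carries a latent utility v_i > 0, and a user presented with a preselected subset S picks arm i with probability v_i divided by the sum of v_j over all j in S. The short Python sketch below is purely illustrative (the utilities, subset, and function name are hypothetical, not taken from the paper) and simulates one such stochastic choice.

import random

# Hypothetical latent utilities of 5 arms; in the bandit setting these are unknown to the learner.
v = [1.0, 0.5, 2.0, 0.2, 1.5]

def pl_choice(utilities, subset, rng=random):
    # Under the Plackett-Luce model, arm i in `subset` is chosen with probability
    # utilities[i] / sum(utilities[j] for j in subset).
    weights = [utilities[i] for i in subset]
    return rng.choices(subset, weights=weights, k=1)[0]

# The learner preselects arms {0, 2, 4}; the (simulated) user picks one of them.
print(pl_choice(v, [0, 2, 4]))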
Journal Title: arXiv:1907.06123
Cite this
Bengs V, Hüllermeier E. Preselection Bandits under the Plackett-Luce Model. arXiv:1907.06123.
Bengs, V., & Hüllermeier, E. (n.d.). Preselection Bandits under the Plackett-Luce Model. ArXiv:1907.06123.
@article{Bengs_Hüllermeier, title={Preselection Bandits under the Plackett-Luce Model}, journal={arXiv:1907.06123}, author={Bengs, Viktor and Hüllermeier, Eyke} }
Bengs, Viktor, and Eyke Hüllermeier. “Preselection Bandits under the Plackett-Luce Model.” ArXiv:1907.06123, n.d.
V. Bengs and E. Hüllermeier, “Preselection Bandits under the Plackett-Luce Model,” arXiv:1907.06123.
Bengs, Viktor, and Eyke Hüllermeier. “Preselection Bandits under the Plackett-Luce Model.” ArXiv:1907.06123.