{"user_id":"15278","department":[{"_id":"27"}],"title":"Multi-Armed Bandits with Censored Consumption of Resources","author":[{"first_name":"Viktor","full_name":"Bengs, Viktor","last_name":"Bengs"},{"first_name":"Eyke","last_name":"Hüllermeier","full_name":"Hüllermeier, Eyke"}],"year":"2020","language":[{"iso":"eng"}],"publication":"arXiv:2011.00813","_id":"32242","citation":{"chicago":"Bengs, Viktor, and Eyke Hüllermeier. “Multi-Armed Bandits with Censored Consumption of Resources.” ArXiv:2011.00813, 2020.","short":"V. Bengs, E. Hüllermeier, ArXiv:2011.00813 (2020).","ieee":"V. Bengs and E. Hüllermeier, “Multi-Armed Bandits with Censored Consumption of Resources,” arXiv:2011.00813. 2020.","ama":"Bengs V, Hüllermeier E. Multi-Armed Bandits with Censored Consumption of Resources. arXiv:201100813. Published online 2020.","bibtex":"@article{Bengs_Hüllermeier_2020, title={Multi-Armed Bandits with Censored Consumption of Resources}, journal={arXiv:2011.00813}, author={Bengs, Viktor and Hüllermeier, Eyke}, year={2020} }","apa":"Bengs, V., & Hüllermeier, E. (2020). Multi-Armed Bandits with Censored Consumption of Resources. In arXiv:2011.00813.","mla":"Bengs, Viktor, and Eyke Hüllermeier. “Multi-Armed Bandits with Censored Consumption of Resources.” ArXiv:2011.00813, 2020."},"date_updated":"2022-06-28T07:27:19Z","external_id":{"arxiv":["2011.00813"]},"date_created":"2022-06-28T07:26:54Z","project":[{"_id":"52","name":"PC2: Computing Resources Provided by the Paderborn Center for Parallel Computing"}],"status":"public","type":"preprint","abstract":[{"lang":"eng","text":"We consider a resource-aware variant of the classical multi-armed bandit\r\nproblem: In each round, the learner selects an arm and determines a resource\r\nlimit. It then observes a corresponding (random) reward, provided the (random)\r\namount of consumed resources remains below the limit. Otherwise, the\r\nobservation is censored, i.e., no reward is obtained. For this problem setting,\r\nwe introduce a measure of regret, which incorporates the actual amount of\r\nallocated resources of each learning round as well as the optimality of\r\nrealizable rewards. Thus, to minimize regret, the learner needs to set a\r\nresource limit and choose an arm in such a way that the chance to realize a\r\nhigh reward within the predefined resource limit is high, while the resource\r\nlimit itself should be kept as low as possible. We derive the theoretical lower\r\nbound on the cumulative regret and propose a learning algorithm having a regret\r\nupper bound that matches the lower bound. In a simulation study, we show that\r\nour learning algorithm outperforms straightforward extensions of standard\r\nmulti-armed bandit algorithms."}]}