---
res:
bibo_abstract:
- "We consider a resource-aware variant of the classical multi-armed bandit problem: In each round, the learner selects an arm and determines a resource limit. It then observes a corresponding (random) reward, provided the (random) amount of consumed resources remains below the limit. Otherwise, the observation is censored, i.e., no reward is obtained. For this problem setting, we introduce a measure of regret, which incorporates the actual amount of allocated resources of each learning round as well as the optimality of realizable rewards. Thus, to minimize regret, the learner needs to set a resource limit and choose an arm in such a way that the chance to realize a high reward within the predefined resource limit is high, while the resource limit itself should be kept as low as possible. We derive the theoretical lower bound on the cumulative regret and propose a learning algorithm having a regret upper bound that matches the lower bound. In a simulation study, we show that our learning algorithm outperforms straightforward extensions of standard multi-armed bandit algorithms.@eng"
bibo_authorlist:
- foaf_Person:
foaf_givenName: Viktor
foaf_name: Bengs, Viktor
foaf_surname: Bengs
- foaf_Person:
foaf_givenName: Eyke
foaf_name: Hüllermeier, Eyke
    foaf_surname: Hüllermeier
dct_date: 2020^xs_gYear
dct_language: eng
dct_title: Multi-Armed Bandits with Censored Consumption of Resources@eng
...