---
res:
bibo_abstract:
- "We consider a resource-aware variant of the classical multi-armed bandit problem: In each round, the learner selects an arm and determines a resource limit. It then observes a corresponding (random) reward, provided the (random) amount of consumed resources remains below the limit. Otherwise, the observation is censored, i.e., no reward is obtained. For this problem setting, we introduce a measure of regret, which incorporates the actual amount of allocated resources of each learning round as well as the optimality of realizable rewards. Thus, to minimize regret, the learner needs to set a resource limit and choose an arm in such a way that the chance to realize a high reward within the predefined resource limit is high, while the resource limit itself should be kept as low as possible. We derive the theoretical lower bound on the cumulative regret and propose a learning algorithm having a regret upper bound that matches the lower bound. In a simulation study, we show that our learning algorithm outperforms straightforward extensions of standard multi-armed bandit algorithms.@eng"
bibo_authorlist:
- foaf_Person:
foaf_givenName: Viktor
foaf_name: Bengs, Viktor
foaf_surname: Bengs
- foaf_Person:
foaf_givenName: Eyke
foaf_name: Hüllermeier, Eyke
    foaf_surname: Hüllermeier
dct_date: 2020^xs_gYear
dct_language: eng
dct_title: Multi-Armed Bandits with Censored Consumption of Resources@eng
...