---
res:
bibo_abstract:
- "We consider a resource-aware variant of the classical multi-armed bandit problem: In each round, the learner selects an arm and determines a resource limit. It then observes a corresponding (random) reward, provided the (random) amount of consumed resources remains below the limit. Otherwise, the observation is censored, i.e., no reward is obtained. For this problem setting, we introduce a measure of regret, which incorporates the actual amount of allocated resources of each learning round as well as the optimality of realizable rewards. Thus, to minimize regret, the learner needs to set a resource limit and choose an arm in such a way that the chance to realize a high reward within the predefined resource limit is high, while the resource limit itself should be kept as low as possible. We derive the theoretical lower bound on the cumulative regret and propose a learning algorithm having a regret upper bound that matches the lower bound. In a simulation study, we show that our learning algorithm outperforms straightforward extensions of standard multi-armed bandit algorithms.@eng"
bibo_authorlist:
- foaf_Person:
foaf_givenName: Viktor
foaf_name: Bengs, Viktor
foaf_surname: Bengs
- foaf_Person:
foaf_givenName: Eyke
foaf_name: Hüllermeier, Eyke
    foaf_surname: Hüllermeier
dct_date: 2020^xs_gYear
dct_language: eng
dct_title: Multi-Armed Bandits with Censored Consumption of Resources@eng
...