<?xml version="1.0" encoding="UTF-8"?>

<modsCollection xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.loc.gov/mods/v3" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">
<mods version="3.3">

<genre>preprint</genre>

<titleInfo><title>Multi-Armed Bandits with Censored Consumption of Resources</title></titleInfo>
<name type="personal">
  <namePart type="given">Viktor</namePart>
  <namePart type="family">Bengs</namePart>
  <role><roleTerm type="text">author</roleTerm> </role></name>
<name type="personal">
  <namePart type="given">Eyke</namePart>
  <namePart type="family">Hüllermeier</namePart>
  <role><roleTerm type="text">author</roleTerm> </role></name>
<name type="corporate">
  <namePart></namePart>
  <identifier type="local">34</identifier>
  <role>
    <roleTerm type="text">department</roleTerm>
  </role>
</name>

<name type="corporate">
  <namePart></namePart>
  <identifier type="local">7</identifier>
  <role>
    <roleTerm type="text">department</roleTerm>
  </role>
</name>

<name type="corporate">
  <namePart></namePart>
  <identifier type="local">355</identifier>
  <role>
    <roleTerm type="text">department</roleTerm>
  </role>
</name>
<name type="corporate">
  <namePart>Computing Resources Provided by the Paderborn Center for Parallel Computing</namePart>
  <role><roleTerm type="text">project</roleTerm></role>
</name>
<abstract lang="eng">We consider a resource-aware variant of the classical multi-armed bandit
problem: In each round, the learner selects an arm and determines a resource
limit. It then observes a corresponding (random) reward, provided the (random)
amount of consumed resources remains below the limit. Otherwise, the
observation is censored, i.e., no reward is obtained. For this problem setting,
we introduce a measure of regret, which incorporates the actual amount of
allocated resources of each learning round as well as the optimality of
realizable rewards. Thus, to minimize regret, the learner needs to set a
resource limit and choose an arm in such a way that the chance to realize a
high reward within the predefined resource limit is high, while the resource
limit itself should be kept as low as possible. We derive the theoretical lower
bound on the cumulative regret and propose a learning algorithm having a regret
upper bound that matches the lower bound. In a simulation study, we show that
our learning algorithm outperforms straightforward extensions of standard
multi-armed bandit algorithms.</abstract>

<originInfo><dateIssued encoding="w3cdtf">2020</dateIssued>
</originInfo>
<language><languageTerm authority="iso639-2b" type="code">eng</languageTerm>
</language>
<relatedItem type="host"><titleInfo><title>arXiv:2011.00813</title></titleInfo>
<part>
</part>
</relatedItem>
<extension>
<bibliographicCitation>
<bibtex>@article{Bengs_Hüllermeier_2020, title={Multi-Armed Bandits with Censored Consumption of Resources}, journal={arXiv:2011.00813}, author={Bengs, Viktor and Hüllermeier, Eyke}, year={2020} }</bibtex>
<mla>Bengs, Viktor, and Eyke Hüllermeier. “Multi-Armed Bandits with Censored Consumption of Resources.” &lt;i&gt;ArXiv:2011.00813&lt;/i&gt;, 2020.</mla>
<short>V. Bengs, E. Hüllermeier, ArXiv:2011.00813 (2020).</short>
<apa>Bengs, V., &amp;#38; Hüllermeier, E. (2020). Multi-Armed Bandits with Censored Consumption of Resources. &lt;i&gt;ArXiv:2011.00813&lt;/i&gt;.</apa>
<ama>Bengs V, Hüllermeier E. Multi-Armed Bandits with Censored Consumption of Resources. &lt;i&gt;arXiv:2011.00813&lt;/i&gt;. 2020.</ama>
<chicago>Bengs, Viktor, and Eyke Hüllermeier. “Multi-Armed Bandits with Censored Consumption of Resources.” &lt;i&gt;ArXiv:2011.00813&lt;/i&gt;, 2020.</chicago>
<ieee>V. Bengs and E. Hüllermeier, “Multi-Armed Bandits with Censored Consumption of Resources,” &lt;i&gt;arXiv:2011.00813&lt;/i&gt;. 2020.</ieee>
</bibliographicCitation>
</extension>
<recordInfo><recordIdentifier>21536</recordIdentifier><recordCreationDate encoding="w3cdtf">2021-03-18T11:27:37Z</recordCreationDate><recordChangeDate encoding="w3cdtf">2022-01-06T06:55:03Z</recordChangeDate>
</recordInfo>
</mods>
</modsCollection>
