TY - JOUR
AB - AbstractThe efficiency of state-of-the-art algorithms for the dueling bandits problem is essentially due to a clever exploitation of (stochastic) transitivity properties of pairwise comparisons: If one arm is likely to beat a second one, which in turn is likely to beat a third one, then the first is also likely to beat the third one. By now, however, there is no way to test the validity of corresponding assumptions, although this would be a key prerequisite to guarantee the meaningfulness of the results produced by an algorithm. In this paper, we investigate the problem of testing different forms of stochastic transitivity in an online manner. We derive lower bounds on the expected sample complexity of any sequential hypothesis testing algorithm for various forms of stochastic transitivity, thereby providing additional motivation to focus on weak stochastic transitivity. To this end, we introduce an algorithmic framework for the dueling bandits problem, in which the statistical validity of weak stochastic transitivity can be tested, either actively or passively, based on a multiple binomial hypothesis test. Moreover, by exploiting a connection between weak stochastic transitivity and graph theory, we suggest an enhancement to further improve the efficiency of the testing algorithm. In the active setting, both variants achieve an expected sample complexity that is optimal up to a logarithmic factor.
AU - Haddenhorst, Björn
AU - Bengs, Viktor
AU - Hüllermeier, Eyke
ID - 25035
JF - Machine Learning
SN - 0885-6125
TI - On testing transitivity in online preference learning
ER -