Using Reinforcement Learning for Per-Instance Algorithm Configuration on the TSP
Automated Algorithm Configuration (AAC) usually takes a global perspective: it identifies a parameter configuration for an (optimization) algorithm that maximizes a performance metric over a set of instances. However, the optimal choice of parameters strongly depends on the instance at hand and should thus be calculated on a per-instance basis. We explore the potential of Per-Instance Algorithm Configuration (PIAC) by using Reinforcement Learning (RL). To this end, we propose a novel PIAC approach that is based on deep neural networks. We apply it to predict configurations for the Lin–Kernighan heuristic (LKH) for the Traveling Salesperson Problem (TSP) individually for every single instance. To train our PIAC approach, we create a large set of 100,000 TSP instances with 2,000 nodes each, currently the largest benchmark set to the best of our knowledge. We compare our approach to the state-of-the-art AAC method Sequential Model-based Algorithm Configuration (SMAC). The results show that our PIAC approach outperforms this baseline on both the newly created instance set and established instance sets.
361 - 368