"Momentum Annealing" (MA), a new algorithm for CMOS annealing, is an optimization algorithm for fully-connected models that quickly finds practical solutions to large-scale scheduling optimization and portfolio optimization problems. We implemented this algorithm on the GPU^{*1} and verified it using a 100,000-variable combinatorial optimization problem (fully-connected problem), and confirmed that it is 250 times faster than the conventional algorithm.
MA is characterized by the transformation of QUBO with connections between all variables (Figure 1a) into a fully bipartite graphically connected QUBO (Figure 1b) while preserving the global optimal solution.
Simulated Annealing (SA), which is employed by conventional CMOS annealing machines, is an optimal solution search algorithm based on the Markov chain Monte Carlo method. In this case, we expect to reach a global optimal solution or its approximate solution by sequentially and stochastically updating the variables. Due to the theoretical background of the Markov chain Monte Carlo method, linked variables cannot be updated simultaneously. Therefore, when solving an fully-connected problem with connections between all variables, only one variable can be updated at a time.
However, in MA, the connections between variables are fully bipartite, and the variables on the left side of Figure 1b are not connected to each other, so they can be updated simultaneously, reducing computation time through parallel computing. The same applies to the variables on the right. This kind of parallel computing can be used to speed up the solution search.
The MA searches for solutions using a model with new connections shown by the red line in Figure 1b. If the effect of this connection is excessively large, it becomes easy to get trapped in local solutions, and difficult to search for good quality solutions in various states. On the other hand, if the coupling is too small, the QUBO to be solved originally (Fig. 1a) is not appropriate because it is different from the global optimal solution of the QUBO dealt with in MA (Fig. 1b). The optimal solution to each problem should be consistent, while making the new connections that are created as small as possible. Therefore, we derived an inequality using the maximum eigenvalue of the matrix that represents the connection between variables, and made it possible to search for good quality solutions by appropriately setting the size of the connection of the red lines.
Based on these, we have created a program that utilizes GPU, which are widely used in scientific and technical computing, such as deep learning, as a fast way to execute a QUBO solution search with connections between all variables. We ran MA on four NVIDIA Tesla P100^{*2} to deal with 100,000 variables of QUBO with connections between variables represented by single-precision floats. We reported that the computation time to reach a solution accuracy equivalent to the approximate solution obtained by one of the optimization algorithms, Sahni-Gonzales (SG), was 1/250th of the computation time compared to SA run on IBM Power 8 (Figure 2).