Fine-grained analysis of mode connectivity and beyond for overparameterized neural networks


One of the most intriguing findings about the neural network loss landscape is mode connectivity: for any two global minima, there exists a path connecting them without a loss barrier. In this paper, we make a fine-grained analysis of mode connectivity. Specifically, we show that in the overparameterized case, the connecting path can be as simple as a two-piece linear path, and the path length can be made nearly equal to the Euclidean distance between the two minima. This suggests that the landscape is still nearly convex in some sense. In addition, we identify a star-shaped structure on the minima manifold: for any finite number of minima, there exists a center on the minima manifold that connects to all of them simultaneously via linear paths. These findings provably hold for (deep) linear networks and two-layer ReLU networks under a teacher-student setup, and are empirically justified for models trained on MNIST and CIFAR-10. The key ingredient is to exploit the two-piece simplicity of the connecting path, which allows us to consider a much smaller path space when estimating the path length and proving the star-shaped structure.
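To make the notion of a two-piece linear path concrete, the following is a minimal sketch on a hand-built toy problem, not the construction used in the paper. It assumes a two-neuron linear network f(x) = (a_1 w_1 + a_2 w_2) x with teacher f*(x) = x, so the global minima are the points with a · w = 1. The names `loss`, `segment`, `barrier`, and the three parameter vectors are hypothetical and chosen only for illustration.

```python
import numpy as np

def loss(theta):
    """Squared error of the toy net f(x) = (a . w) x against the teacher f*(x) = x.

    theta is laid out as [a_1, a_2, w_1, w_2] (an assumption of this sketch).
    """
    a, w = np.split(np.asarray(theta, dtype=float), 2)
    return float((a @ w - 1.0) ** 2)

def segment(start, end, n=101):
    """Return n evenly spaced points on the straight line from start to end."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return (1.0 - t) * start + t * end

def barrier(points):
    """Largest loss over the sampled points; both endpoints here have loss 0."""
    return max(loss(p) for p in points)

# Two global minima and a hand-picked intermediate minimum (the "corner").
theta_a = np.array([1.0, 0.0, 1.0, 0.0])  # a = (1, 0), w = (1, 0)
theta_b = np.array([0.0, 1.0, 0.0, 1.0])  # a = (0, 1), w = (0, 1)
theta_c = np.array([1.0, 0.0, 1.0, 2.0])  # a = (1, 0), w = (1, 2)

# Straight line from theta_a to theta_b: leaves the set of minima.
direct_barrier = barrier(segment(theta_a, theta_b))

# Two-piece linear path theta_a -> theta_c -> theta_b: stays on the minima.
two_piece = np.vstack([segment(theta_a, theta_c), segment(theta_c, theta_b)])
two_piece_barrier = barrier(two_piece)

# Length of the two-piece path relative to the Euclidean distance.
length_ratio = (
    np.linalg.norm(theta_c - theta_a) + np.linalg.norm(theta_b - theta_c)
) / np.linalg.norm(theta_b - theta_a)

print(direct_barrier, two_piece_barrier, length_ratio)
```

In this toy case the straight line between the two minima has a barrier of 0.25 at its midpoint, while the two-piece path keeps a · w = 1 throughout and so has no barrier up to floating-point error. The two-piece path is twice as long as the straight line here, so the example illustrates only barrier-free connectivity; the near-Euclidean path length stated above concerns the overparameterized regime and is not reproduced by a two-neuron toy.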

Working paper. (Will be on arXiv soon.)