A Survey of Linear Regions

文怡 杨
Jun 19, 2021

Abstract

The maximum number of linear regions is often regarded as a measure of the representational power of a network. This paper reviews recent work on the linear regions of DNNs with piecewise linear (PWL) activation functions, which I categorize into two lines: studies that bound the number of linear regions theoretically, and studies that bound or count it empirically. These ideas provoke thought and discussion on understanding the interaction between neurons, the role of network architecture, and how the interaction of parameters affects linear regions.

1. Why is this survey important?

Over the past few years, the questions of how to design the weights and biases of a neural network with a given architecture, or how to design an architecture with fewer parameters, so as to obtain the maximum possible number of activation patterns, have attracted a wide range of thinking. The concept of linear regions was introduced as a mathematical model of the expressive power of DNNs with piecewise linear activations; in other words, a greater number of linear regions means that more activation patterns may exist. Recent works mainly discuss the factors that affect the number of linear regions and bound that number both theoretically and empirically. These works help us rethink network structure and how each unit affects the representational power of the whole network. Future work may focus more on how to set weights and biases, guided by the study of linear regions, so that the network generalizes better and is more precise.

2. Summary and Insight

2.1 Study Shallow Neural Networks

2.2 Study Deep Neural Networks theoretically

The ideas used for shallow neural networks can be generalized to multiple layers. This can be intuitively interpreted as space folding: each hidden layer acts as a folding operator on the space of activations of the previous layer. In turn, a deep neural network effectively folds its input space recursively, starting from the first layer. The key idea is that the l-th layer's linear regions are obtained by operating on the space produced by layers 1 through l−1. Prior work has done a great deal to bound and approximate the number of linear regions.
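To make the folding picture concrete, here is a minimal sketch (my own illustration, not taken from the surveyed papers): it counts how many distinct ReLU activation patterns a small, randomly initialized network realizes over a grid of 2-D inputs. Every distinct pattern corresponds to one linear region touched by the grid, so the count is a sampled lower bound on the true number of regions. The layer sizes and input range are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def count_patterns(layer_sizes, n_grid=200):
    """Count distinct ReLU activation patterns of a random net on a 2-D input grid."""
    Ws = [rng.standard_normal((layer_sizes[i + 1], layer_sizes[i]))
          for i in range(len(layer_sizes) - 1)]
    bs = [rng.standard_normal(layer_sizes[i + 1]) for i in range(len(layer_sizes) - 1)]
    xs = np.linspace(-3, 3, n_grid)
    patterns = set()
    for x1 in xs:
        for x2 in xs:
            h, pat = np.array([x1, x2]), []
            for W, b in zip(Ws, bs):
                pre = W @ h + b
                pat.append(tuple(pre > 0))   # which units fire at this input
                h = np.maximum(pre, 0)
            patterns.add(tuple(pat))
    return len(patterns)

# Same total number of hidden units (8), arranged shallow vs. deeper.
print("1 hidden layer of 8 units :", count_patterns([2, 8]))
print("2 hidden layers of 4 units:", count_patterns([2, 4, 4]))
```

Comparing the counts for different depths at a fixed total number of hidden units gives a quick empirical feel for the theoretical bounds discussed above.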

2.3 Study Deep Neural Networks empirically

In this section, I summarize two views for studying the expressiveness of networks by exploring linear regions. The first is a method that performs exact enumeration, or counting, of the number of regions by modeling the DNN as a mixed-integer linear program (Serra et al. 2018); the interaction between neurons can be reconsidered in this way. The second provides a new way to think about linear regions by exploring the weights and biases of arbitrary network architectures (Hanin and Rolnick 2019a; 2019b).
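As a rough illustration of the second view (a sketch of the sampling idea only, not Hanin and Rolnick's actual estimator), one can walk along a one-dimensional line through input space and count how often the activation pattern changes; each change means the line has crossed into a new linear region. The network sizes and the endpoints of the line below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Small random ReLU net; the sizes are arbitrary, for illustration only.
sizes = [2, 16, 16, 1]
Ws = [rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(len(sizes) - 1)]
bs = [rng.standard_normal(sizes[i + 1]) for i in range(len(sizes) - 1)]

def pattern(x):
    """On/off states of every hidden ReLU for input x."""
    h, bits = x, []
    for W, b in zip(Ws[:-1], bs[:-1]):     # hidden layers only
        pre = W @ h + b
        bits.extend(pre > 0)
        h = np.maximum(pre, 0)
    return tuple(bits)

# Walk along a line through input space; each change of pattern means the
# line has crossed a boundary between two linear regions.
start, end = np.array([-5.0, -5.0]), np.array([5.0, 5.0])
ts = np.linspace(0.0, 1.0, 20000)
pats = [pattern(start + t * (end - start)) for t in ts]
crossings = sum(p1 != p2 for p1, p2 in zip(pats, pats[1:]))
print("linear regions crossed along the line:", crossings + 1)
```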

The solutions of the mixed-integer formulation can be enumerated using the one-tree algorithm (Danna et al. 2007). Finding a feasible solution to a MILP is NP-complete (Cook 1971), and optimization is therefore NP-hard. However, a feasible solution in this case can be obtained from any valid input (Fischetti and Jo 2018). Based on MBound (Gomes, Sabharwal, and Selman 2006a), Serra (2020) proposes the MIPBound algorithm to solve it. The basic idea is to divide the solution space by adding constraints until the problem becomes unsolvable.
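To convey the shape of the mixed-integer formulation, here is a heavily simplified sketch for a single hidden layer, using the standard big-M encoding of a ReLU and "no-good" cuts to enumerate feasible activation patterns one by one. It is only a stand-in for the one-tree enumeration and MIPBound described above; the bounds, the value of M, and the solver (PuLP with CBC) are my own assumptions.

```python
import numpy as np
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpStatus, PULP_CBC_CMD

rng = np.random.default_rng(1)

# Toy layer: 2 inputs -> 3 ReLU units, so enumeration stays tiny.
W = rng.standard_normal((3, 2))
b = rng.standard_normal(3)
M = 100.0             # crude big-M; assumes |preactivation| < M on the input box
LB, UB = -10.0, 10.0  # bounded input domain

def build_model(forbidden_patterns):
    prob = LpProblem("region_enumeration", LpMinimize)
    x = [LpVariable(f"x{i}", LB, UB) for i in range(2)]
    h = [LpVariable(f"h{j}", 0) for j in range(3)]              # ReLU outputs, h >= 0
    z = [LpVariable(f"z{j}", cat="Binary") for j in range(3)]   # 1 if unit j is active
    prob += lpSum(z)                                            # dummy objective
    for j in range(3):
        pre = lpSum(float(W[j, i]) * x[i] for i in range(2)) + float(b[j])
        prob += h[j] >= pre                   # h = max(pre, 0) via big-M
        prob += h[j] <= pre + M * (1 - z[j])
        prob += h[j] <= M * z[j]
    # "No-good" cuts: forbid activation patterns that were already found.
    for pat in forbidden_patterns:
        prob += lpSum(z[j] if pat[j] == 0 else 1 - z[j] for j in range(3)) >= 1
    return prob, z

found = []
while True:
    prob, z = build_model(found)
    prob.solve(PULP_CBC_CMD(msg=False))
    if LpStatus[prob.status] != "Optimal":    # no new pattern is feasible
        break
    found.append(tuple(int(round(v.value())) for v in z))

print("feasible activation patterns (= regions of the box):", len(found))
```

Serra et al.'s formulation handles deeper networks, tighter bounds in place of a crude big-M, and far more efficient counting; this sketch only shows how the binary variables tie activation patterns to feasible inputs.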

Insights on the discussion: In my opinion, the key ideas from the first view can be summarized in two points. The first is that the authors use mathematical formulas to describe the entire DNN with the ReLU activation function. This means that restrictions or combinations may exist among different units; in other words, the units are dependent. This is supported by feature visualization (Olah et al. 2017), which treats individual neuron activations as the basis vectors of an activation space and a combination of neuron activations as a vector in this space. Another intuitive way to understand it is that if the units were independent, the network could overfit, whereas dependence enables the network to generalize. I infer that this may be connected to why, during training, the maximal number of linear regions goes down before it increases. The second point is the introduction of the concepts of 'stable units' (units that, for some or all possible inputs, are always active or always inactive) and 'unstable units'. Based on these concepts and the MILP formulation, Serra (2020) proposed a new perspective on compressing the network architecture without loss: parts of the neural network are removed without changing the output it produces. This hints at a different way to consider the role played by a single neuron in the network.
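As a purely empirical proxy for the stable-unit idea (the papers certify stability exactly with the MILP, whereas this only samples inputs), one can check which units never change state over a large sample of inputs: a stably inactive unit can be removed outright, and a stably active unit's ReLU behaves as the identity, which is the intuition behind lossless compression. The network sizes and input distribution below are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical random net; in practice this check would run on a trained model.
sizes = [4, 32, 32, 1]
Ws = [rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(len(sizes) - 1)]
bs = [rng.standard_normal(sizes[i + 1]) for i in range(len(sizes) - 1)]

# Stand-in for the input domain (or training set) over which stability is judged.
X = rng.uniform(-1.0, 1.0, size=(10000, 4))

h = X
for layer, (W, b) in enumerate(zip(Ws[:-1], bs[:-1])):
    pre = h @ W.T + b
    always_active = np.all(pre > 0, axis=0)     # ReLU acts as the identity here
    always_inactive = np.all(pre <= 0, axis=0)  # unit can simply be removed
    print(f"layer {layer}: {always_active.sum()} stably active, "
          f"{always_inactive.sum()} stably inactive (of {pre.shape[1]} units)")
    h = np.maximum(pre, 0)
```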

3. Related Work and Future Work

Besides studying the maximal number of linear regions, Dongrui Wu (2020) studies their local properties and the relevance of the surrounding regions.

To better understand the behavior of DNNs, future work can concentrate on three directions: each unit's contribution to learning; the interaction between neurons; and how to design weights and biases to obtain more activation patterns.

4. References

Olah, C., Mordvintsev, A., & Schubert, L. (2017). Feature Visualization. Distill, 2(11), e7. https://doi.org/10.23915/distill.00007


Hanin, B., & Rolnick, D. (2019). Complexity of linear regions in deep networks. 36th International Conference on Machine Learning, ICML 2019, 2019-June, 4585–4600.

Serra, T., Kumar, A., & Ramalingam, S. (2020). Lossless Compression of Deep Neural Networks. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 12296 LNCS, 417–430. https://doi.org/10.1007/978-3-030-58942-4_27

Serra, T., & Ramalingam, S. (2018). Empirical bounds on linear regions of deep rectifier networks. ArXiv. https://doi.org/10.1609/aaai.v34i04.6016

Serra, T., Tjandraatmadja, C., & Ramalingam, S. (2017). Bounding and counting linear regions of deep neural networks. ArXiv.


Montúfar, G., Pascanu, R., Cho, K., & Bengio, Y. (2014). On the number of linear regions of deep neural networks. Advances in Neural Information Processing Systems, 4(January), 2924–2932.

Pascanu, R., Montúfar, G., & Bengio, Y. (2014). On the number of response regions of deep feedforward networks with piecewise linear activations. 2nd International Conference on Learning Representations, ICLR 2014 — Conference Track Proceedings, 1–17.

Danna, E., Fenelon, M., Gu, Z., & Wunderling, R. (2007). Generating multiple solutions for mixed integer programming problems. In IPCO, pp. 280–294.

Cook, S. A. 1971. The complexity of theorem-proving procedures. In STOC.

Fischetti, M., and Jo, J. 2018. Deep neural networks as 0–1 mixed integer linear programs: A feasibility study. In CPAIOR.

Gomes, C. P., Sabharwal, A., & Selman, B. (2006a). Model counting: A new strategy for obtaining good bounds. In AAAI.
