Spread Maximization - A Novel Unsupervised Learning Paradigm Applied to Convolutional Neural Networks
Summary
Unsupervised learning provides a way to extract features from data that can be used to pre-train Artificial Neural Networks (ANNs), improving the performance of such networks. Convolutional Neural Networks (CNNs) are a kind of ANN designed for image processing. Existing unsupervised learning techniques for CNNs are frequently based on reconstructing the input from the output of a network layer.
We seek to establish a new paradigm for unsupervised learning in CNNs: can CNNs extract helpful features by training on the output of the network rather than on the input? We formulate an objective defined by a few basic properties we would like an ANN to have. Our objective drives the outputs of a layer in a neural network to be spread out through the output space, while keeping the weights small.
A first interpretation of spread maximization is dichotomization, which leads to an approach that maximizes the determinant of the covariance matrix of the layer outputs; a second interpretation is uniformization, which leads to an approach that minimizes the distance between the current output distribution and the uniform distribution.
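For concreteness, the dichotomization objective can be sketched as a mini-batch loss that rewards a large log-determinant of the output covariance while penalizing large weights (the penalty corresponding to the "keep the weights small" property above). The following is a minimal PyTorch sketch under those assumptions, not the exact formulation used in this work; the function name and the eps and weight_decay constants are illustrative.

```python
import torch

def dichotomization_loss(z: torch.Tensor, weights, weight_decay: float = 1e-4,
                         eps: float = 1e-5) -> torch.Tensor:
    """Spread the layer outputs by maximizing the log-determinant of their
    covariance matrix while keeping the weights small.

    z: (batch, features) outputs of the layer for one mini-batch.
    weights: iterable of the layer's weight tensors.
    """
    centered = z - z.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (z.shape[0] - 1)               # sample covariance
    cov = cov + eps * torch.eye(cov.shape[0], device=z.device)   # ridge term keeps it invertible
    _, logdet = torch.linalg.slogdet(cov)                        # stable log-determinant
    penalty = sum((w ** 2).sum() for w in weights)               # weight magnitude penalty
    return -logdet + weight_decay * penalty                      # minimize: maximizes spread
```

Working with the log-determinant (via a stabilized slogdet) rather than the raw determinant avoids numerical overflow and keeps the objective well scaled across layer widths.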
We also provide a third, overarching approach that can be used for either kind of spread maximization. It is based on a generalization of Hebbian learning, which reduces to the Generalized Hebbian Algorithm (GHA) for a specific, simple kind of ANN. Since GHA causes the weight vectors of the network to converge to the first n principal loading vectors, which is the objective of Principal Component Analysis (PCA), we say that applying generalized Hebbian learning to CNNs performs Pooled Convolutional Component Analysis.
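As an illustration of the simple case mentioned above, a single GHA (Sanger's rule) update for a linear layer can be sketched in NumPy as follows; the function name and learning rate are illustrative assumptions, and the generalization to pooled convolutional layers used in this work is not shown here.

```python
import numpy as np

def gha_update(W: np.ndarray, x: np.ndarray, lr: float = 1e-3) -> np.ndarray:
    """One Generalized Hebbian Algorithm (Sanger's rule) step.

    W: (n_components, n_inputs) weight matrix of a linear layer.
    x: (n_inputs,) input sample.
    Over many samples the rows of W converge to the first n principal
    loading vectors of the input distribution, i.e. the PCA solution.
    """
    y = W @ x                                               # layer output
    # Hebbian term minus a lower-triangular decorrelation term (Sanger's rule).
    dW = lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W + dW
```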
Experimental results show that spread maximization techniques can outperform conventional techniques such as Pooling Convolutional Auto-Encoders. One method for dichotomization achieves an error rate of 1.3% on the MNIST dataset with the LeNet-1 network structure, beating the 1.7% that LeNet-1 has been reported to achieve without unsupervised learning.
To maximize the potential of our unsupervised learning techniques, we introduce a new kind of pooling function, which reduces the error of purely supervised learning on LeNet-1 to 1.44%, and a supervised pre-training stage that appears to be beneficial to any type of unsupervised pre-training technique.