Online Learning of Sparse Network Architectures
Summary
Modern neural network architectures can have hundreds of millions of parameters. This makes them power- and memory-hungry and impedes their deployment on resource-constrained devices such as phones. Sparse networks can match the performance of dense networks with a fraction of the parameters. However, sparsification is usually applied as an afterthought, after training has finished, so its benefits are not realised during the learning phase itself. In this thesis, we propose to optimise the network architecture and its parameters simultaneously. Beyond the benefits of sparsity, this eliminates the need to choose a particular network architecture in advance.
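To make the idea concrete, the sketch below shows one simple way joint optimisation of parameters and architecture could look: an L1 penalty pushes weights toward zero while training, and near-zero weights are pruned online, so the effective architecture shrinks during learning rather than afterwards. This is an illustrative assumption, not the method developed in the thesis; the hyperparameter values and the PyTorch model are placeholders.

import torch
import torch.nn as nn

# Illustrative dense starting architecture (assumed, not the thesis's model)
model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

l1_strength = 1e-4      # sparsity pressure (assumed value)
prune_threshold = 1e-3  # weights below this magnitude are zeroed (assumed value)

def train_step(x, y):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    # L1 penalty drives weights toward zero, shrinking the effective architecture
    loss = loss + l1_strength * sum(p.abs().sum() for p in model.parameters())
    loss.backward()
    optimizer.step()
    # Online pruning: zero out near-dead weights so sparsity is exploited
    # during learning, not only after it
    with torch.no_grad():
        for p in model.parameters():
            p.mul_((p.abs() > prune_threshold).float())
    return loss.item()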