Improving Sound Event Detection Using Neural Network Trees
Sound Event Detection (SED), also known as Audio Event Detection, is the task of labelling audio recordings with which sound events they contain and when those events occur. Most current state-of-the-art SED systems use neural networks. To improve on these methods, this work introduces a new method called Branching Neural Networks, which arranges multiple neural networks in a tree structure to form an SED system. The properties of the branching neural network are studied by testing the method on multiple subsets of the DCASE 2017 task 4 dataset, a dataset created for evaluating SED systems in the 2017 Detection and Classification of Acoustic Scenes and Events (DCASE) competition. The branching neural network system appears to be less negatively affected by a non-optimal threshold, where the threshold is the minimum confidence value at which the SED system accepts that a sound event has occurred. On some subsets, the branching neural network system achieved significantly better results at all threshold values. The system takes the same input as a standardised SED system, except that the classes to train on are presented as a tree structure rather than as a list. When used by the branching neural network, the tree structure has the added benefit of providing intermediate results that are considerably more accurate than the end results. The disadvantage of the tree structure is that it requires extra tuning: a non-optimal tree can significantly decrease performance compared with the same system using an optimal tree.
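The idea of a class tree combined with a confidence threshold can be illustrated with a minimal sketch. This is not the implementation evaluated in this work: the `Node` class, the `detect` function, the fixed confidence values, and the class names are all hypothetical, and the trained neural network at each tree node is replaced by a stand-in function that returns one confidence value per child class.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a trained network: maps audio features to one
# confidence value per child class of a tree node.
Scorer = Callable[[List[float]], Dict[str, float]]


@dataclass
class Node:
    name: str
    scorer: Optional[Scorer] = None  # leaves have no scorer
    children: List["Node"] = field(default_factory=list)


def detect(node: Node, features: List[float], threshold: float) -> List[str]:
    """Return every class below `node` whose confidence reaches `threshold`.

    A child branch is only explored if the child itself was accepted, so
    the result lists intermediate classes before the leaf classes under them.
    """
    if not node.children or node.scorer is None:
        return []
    scores = node.scorer(features)
    accepted: List[str] = []
    for child in node.children:
        if scores.get(child.name, 0.0) >= threshold:
            accepted.append(child.name)
            accepted.extend(detect(child, features, threshold))
    return accepted


# Illustrative tree with made-up, constant confidences (assumption: a real
# system would compute these from the audio features).
tree = Node(
    "root",
    scorer=lambda f: {"vehicle": 0.9, "warning": 0.2},
    children=[
        Node(
            "vehicle",
            scorer=lambda f: {"car": 0.6, "train": 0.3},
            children=[Node("car"), Node("train")],
        ),
        Node(
            "warning",
            scorer=lambda f: {"siren": 0.8},
            children=[Node("siren")],
        ),
    ],
)

print(detect(tree, [0.0], 0.5))  # ['vehicle', 'car']
```

In this sketch, "vehicle" is an intermediate result and "car" is an end result. The "siren" leaf is never reached at a threshold of 0.5, despite its own confidence of 0.8, because its parent "warning" was rejected; this is one way in which the shape of the tree can influence the outcome.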