dc.description.abstract | Traditional Requirements Engineering (RE) has relied heavily on communication between stakeholders. Today, the increasing size and usage of software products has complicated this process. This has given rise to a trend toward a more data-centered approach to RE, in which user data is collected and elicited on a larger scale. This research uses Natural Language Processing (NLP) techniques to automatically organize a large collection of textual user feedback in order to help requirements analysts cope with vast amounts of information. We use several combinations of the Part-of-Speech (POS) tags Nouns, Verbs, Named Entities and Adjectives and combine these with a supervised and an unsupervised clustering algorithm. We are interested in the difference in effect between algorithms that generate a fixed number of clusters and those that determine an optimal number of clusters; we use the K-Means and Meanshift clustering algorithms for this purpose. We test the use of NLP and clustering by conducting an experiment with requirements analysts in a large software development company. We analyze the results using two cluster-evaluation metrics and several non-parametric tests. The obtained results, although preliminary, seem to indicate that a combination of Nouns and Named Entities is most informative for requirements analysts, yet no statistical evidence is found. We designed and tested an early prototype of a dashboard that enables requirements analysts to navigate a large collection of user feedback that has been automatically organized using POS tags and clustering. A preliminary evaluation with experienced analysts shows that the dashboard can effectively be used to explore a corpus of user feedback and group textually related items, and indicates how the dashboard could benefit from additional interactive features. | |