Data-Driven Requirements Engineering
Summary
Traditional Requirements Engineering (RE) has relied heavily on communication between stakeholders. Today, the increasing size and usage of software products have complicated this process. This has given rise to a more data-centered approach to RE, in which user data is collected and elicited on a larger scale. This research uses Natural Language Processing (NLP) techniques to automatically organize a large collection of textual user feedback in order to help requirements analysts cope with vast amounts of information. We use several combinations of the Part-of-Speech (POS) tags for nouns, verbs, named entities and adjectives, and combine these with two clustering algorithms. We are interested in the difference in effect between algorithms that generate a fixed number of clusters and those that determine the number of clusters themselves; for this purpose we use the K-Means and Mean Shift clustering algorithms, respectively. We test the use of NLP and clustering by conducting an experiment with requirements analysts in a large software development company, and we analyze the outcome with two cluster-evaluation metrics and several non-parametric tests. The obtained results, although preliminary, seem to indicate that a combination of nouns and named entities is most informative for requirements analysts, yet no statistically significant evidence for this was found. We also designed and tested an early prototype of a dashboard that enables requirements analysts to navigate a large collection of user feedback that has been automatically organized using POS tags and clustering. A preliminary evaluation with experienced analysts shows that the dashboard can effectively be used to explore a corpus of user feedback and to group textually related items. We also describe how the dashboard could benefit from additional interactive features.
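The pipeline summarized above (filter feedback by POS tag, vectorize, then cluster) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the implementation used in this research: the toy corpus and its tags are hand-written stand-ins for the output of a real POS tagger, the function names `vectorize`, `k_means` and `mean_shift` are hypothetical, the vectors are plain term counts, and both algorithms are simplified textbook versions (Mean Shift with a flat kernel).

```python
import math
from collections import Counter

# Hypothetical, hand-tagged feedback items; a real POS tagger would produce these pairs.
FEEDBACK = [
    [("app", "NOUN"), ("crashes", "VERB"), ("on", "ADP"), ("login", "NOUN")],
    [("login", "NOUN"), ("screen", "NOUN"), ("crashes", "VERB")],
    [("export", "VERB"), ("report", "NOUN"), ("to", "ADP"), ("pdf", "NOUN")],
    [("pdf", "NOUN"), ("report", "NOUN"), ("is", "VERB"), ("slow", "ADJ")],
]


def vectorize(docs, keep_tags):
    """Keep only tokens whose tag is in keep_tags and return term-count vectors."""
    filtered = [[tok for tok, tag in doc if tag in keep_tags] for doc in docs]
    vocab = sorted({tok for doc in filtered for tok in doc})
    vectors = []
    for doc in filtered:
        counts = Counter(doc)
        vectors.append([float(counts[tok]) for tok in vocab])
    return vocab, vectors


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def k_means(vectors, k, iterations=10):
    """Simplified K-Means: the number of clusters k is fixed by the caller."""
    # Deterministic farthest-first initialization (an illustrative choice).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid, then recompute the centroids.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels


def mean_shift(vectors, bandwidth, iterations=20):
    """Simplified flat-kernel Mean Shift: the number of clusters is not given."""
    modes = []
    for v in vectors:
        point = list(v)
        for _ in range(iterations):
            # Move the point to the mean of all items within the bandwidth.
            neighbours = [u for u in vectors if math.dist(point, u) <= bandwidth]
            shifted = mean(neighbours)
            if math.dist(shifted, point) < 1e-9:
                break
            point = shifted
        modes.append(point)
    # Items whose modes lie close together share a cluster label.
    centres, labels = [], []
    for mode in modes:
        for i, centre in enumerate(centres):
            if math.dist(mode, centre) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centres.append(mode)
            labels.append(len(centres) - 1)
    return labels


vocab, vectors = vectorize(FEEDBACK, keep_tags={"NOUN"})
print(vocab)
print("K-Means (k=2):", k_means(vectors, k=2))
print("Mean Shift (bandwidth=1.5):", mean_shift(vectors, bandwidth=1.5))
```

In this toy corpus both algorithms separate the two login-related items from the two report-related items. The difference lies in the input: `k_means` is told to produce two clusters, whereas `mean_shift` arrives at two clusters from the chosen bandwidth alone. Passing a different `keep_tags` set (for example nouns plus verbs) changes the vocabulary and therefore the vectors being clustered.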