A thorough comparison of NLP tools for requirements quality improvement

Arendse, B.

Publication date

2016

Summary

One of the main reasons software projects fail is the poor quality of requirements, or the lack of any documented requirements. In the last two decades there has been increasing interest in developing tools that use Natural Language Processing (NLP) to support the formulation, documentation and verification of natural language (NL) requirements. Over the years, numerous NLP tools have been developed to improve the quality of requirements. Because so many approaches are available, it is not clear how they relate to each other. The goal of this thesis is therefore to obtain a clear overview of the performance of the main approaches taken by NLP tools in the requirements engineering (RE) landscape, and to design a theoretical tool that integrates the best of these approaches. The scope is limited to finding defects and deviations in natural language requirements. A literature study is performed to identify the 50 main NLP tools in the RE landscape. After an initial analysis, 3 tools are selected for further analysis. Derived from the features of these 3 tools, a requirement standard is created that specifies what counts as a quality defect for each feature. Using this requirement standard, 4 datasets are tagged for quality defects. The tagged datasets are compared against the output of the tools, using precision and recall to measure the performance of each feature of the 3 tools. Based on the performance of the features and a qualitative analysis of the approaches behind them, a set of good and bad practices is derived:

1. Different tokenizers: The choice of tokenizer can affect the performance of a tool, both positively and negatively.
2. Dictionary vs. parsing: Using a dictionary is a safe and simple method to detect defects. Parsing is a more complicated approach, and when it is not performed correctly it can have a negative effect on the performance of a tool.
3. What is in the dictionary: The size and content of a dictionary can affect the performance (both recall and precision) of a tool; the bigger the dictionary, the better.

The performance of the features and the set of good and bad practices lead to the design of a next-generation tool. This tool incorporates the best-performing approach (with regard to recall) for each feature specified in the requirement standard. NLP tool developers can use the set of good and bad practices and the design of the next-generation tool when developing their own NLP tools.
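The evaluation described above (tool output compared against a manually tagged dataset, scored with precision and recall) and the dictionary-based detection approach can be illustrated with a small sketch. This is not the method of any of the 3 studied tools: the vague-term dictionary, the four example requirements, the gold tagging, and the helper names (`regex_tokenize`, `whitespace_tokenize`, `detect_vague`, `precision_recall`) are all hypothetical. It also assumes precision and recall are counted per requirement; the thesis may count at a different granularity. The sketch shows how swapping the tokenizer (practice 1) or enlarging the dictionary (practice 3) changes the scores.

```python
import re
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical dictionary of vague terms (not taken from any studied tool).
VAGUE_TERMS: Set[str] = {"fast", "user-friendly", "appropriate", "easy", "flexible", "adequate"}

# Hypothetical requirements and a hypothetical manual ("gold") tagging of vague ones.
REQUIREMENTS: Dict[str, str] = {
    "R1": "The system shall respond fast to user queries.",
    "R2": "The system shall log every failed login attempt.",
    "R3": "The interface shall be user-friendly.",
    "R4": "Reports shall be generated in an acceptable time.",
}
GOLD: Set[str] = {"R1", "R3", "R4"}


def regex_tokenize(text: str) -> List[str]:
    """Lowercased word tokens; drops punctuation but keeps internal hyphens."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def whitespace_tokenize(text: str) -> List[str]:
    """Naive tokenizer: splits on whitespace, so trailing punctuation stays attached."""
    return text.lower().split()


def detect_vague(
    requirements: Dict[str, str],
    dictionary: Set[str],
    tokenize: Callable[[str], List[str]],
) -> Set[str]:
    """Flag every requirement that contains at least one dictionary term."""
    return {
        req_id
        for req_id, text in requirements.items()
        if any(token in dictionary for token in tokenize(text))
    }


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Precision = TP / flagged, recall = TP / tagged defects (0.0 if a denominator is empty)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Regex tokenizer: R1 and R3 are flagged, R4 is missed ("acceptable" is not in the dictionary).
    print(precision_recall(detect_vague(REQUIREMENTS, VAGUE_TERMS, regex_tokenize), GOLD))
    # Whitespace tokenizer: "user-friendly." keeps its period, so R3 is also missed.
    print(precision_recall(detect_vague(REQUIREMENTS, VAGUE_TERMS, whitespace_tokenize), GOLD))
    # A larger dictionary recovers R4 as well.
    bigger = VAGUE_TERMS | {"acceptable"}
    print(precision_recall(detect_vague(REQUIREMENTS, bigger, regex_tokenize), GOLD))
```

In this toy setting, precision stays at 1.0 in all three runs, while recall moves from 2/3 (regex tokenizer) down to 1/3 (whitespace tokenizer) and up to 1.0 (larger dictionary). A larger dictionary can just as easily add false positives and lower precision, so the trade-off depends on its content, not only its size.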

URI

https://studenttheses.uu.nl/handle/20.500.12932/23654

Collections

Theses