The dual-use of BERT for regulatory compliance
Summary
Dual-use goods are items that have both commercial and military or proliferation applications. This thesis investigates whether BERT, a contextual language model, can identify dual-use goods from a short description of the good, and whether its performance improves when it is augmented with relational dual-use knowledge. Two methods for augmenting BERT are explored: further pre-training BERT on relevant synthetic sentences from the KELM corpus, and augmenting it with knowledge graph embeddings (KGEs) created from Wikidata. The use of KGEs can improve the performance of a logistic regression model on the dual-use identification task. All implementations of BERT perform well on the task and outperform the logistic regression models. However, none of the BERT implementations augmented with relational dual-use knowledge outperformed the plain implementation of BERT.