You Want the Truth – Adapt Training Domain to Improve Q&A on Technical Text

Recent advances in Natural Language Processing (NLP) have been driven by the confluence of large, pre-trained Transformer-based language models and the growing availability of, and attention to, unstructured text data. Unlike numerical “big data,” text data lacks the structure that typical business analytics datastores, such as SQL databases, rely on to extract valuable insights efficiently.

In this whitepaper, we experiment with several recent methods from the NLP literature for adapting state-of-the-art, out-of-the-box, open-domain question answering (QA) systems to a large, highly technical text corpus. These methods include Domain Adaptive Pretraining and Synthetic QA Fine-Tuning for adapted Machine Reading Comprehension, as well as adapted Dense Passage Retrieval for domain-specific Information Retrieval. We highlight some of the technical challenges we encountered in improving performance through domain adaptation and recommend how best to use these systems in practical settings.
