By many estimates, more than 80% of the world’s data is stored as unstructured text. Whatever the exact proportion, a significant amount of valuable information clearly resides in free-text documents such as reports, memorandums, and correspondence.
In this research brief, we share lessons learned from use cases across multiple industries and describe a successful case study using call center text to improve a model of churn propensity for a mobile phone provider.
- Natural language in unstructured text presents a significant challenge for automated analytical approaches because of the wide variety of forms of expression, including technical language, sarcasm, and colloquialisms.
- The information contained in unstructured text is highly valuable when considered alone, and the value multiplies when the text can be associated with available structured data such as case outcomes or numerical measures of performance.
- Successful strategies for extracting value from text always include processing at two levels: first, at the word level, to identify key concepts; and second, at the document level, to associate the collection of concepts contained within a document with a specific business outcome.
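
The two-level strategy in the last point can be illustrated with a minimal sketch. It is not the method used in the case study: the concept lexicon, the `extract_concepts` and `churn_rate_by_concept` helpers, and the call notes are all hypothetical and invented for illustration. The word-level step maps tokens to concepts, and the document-level step relates each document's concepts to a structured outcome (here, whether the customer churned).

```python
import re
from collections import defaultdict

# Hypothetical concept lexicon: concept name -> trigger words (illustrative only).
CONCEPTS = {
    "billing": {"bill", "charge", "overcharged", "refund"},
    "coverage": {"signal", "coverage", "dropped"},
    "cancel": {"cancel", "switch", "leave"},
}


def extract_concepts(text):
    """Word level: map a document's tokens to the set of concepts they trigger."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return {name for name, words in CONCEPTS.items() if tokens & words}


def churn_rate_by_concept(docs):
    """Document level: churn rate among documents that mention each concept."""
    counts = defaultdict(lambda: [0, 0])  # concept -> [churned, total]
    for text, churned in docs:
        for concept in extract_concepts(text):
            counts[concept][0] += int(churned)
            counts[concept][1] += 1
    return {c: churned / total for c, (churned, total) in counts.items()}


# Made-up call notes paired with a structured outcome (True = customer churned).
CALLS = [
    ("Customer was overcharged and wants a refund", True),
    ("Asked about a charge on the last bill", False),
    ("Calls keep getting dropped, poor signal at home", True),
    ("Wants to cancel and switch provider over the bill", True),
]

rates = churn_rate_by_concept(CALLS)
for concept, rate in sorted(rates.items()):
    print(f"{concept}: {rate:.2f}")
```

In a real churn-propensity setting, concept indicators like these would more plausibly be added as features alongside the structured customer data than summarized as raw rates; the sketch only shows how the word-level and document-level steps fit together.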