Text Mining Overview
Text Mining, also called Text Analytics, is the science of leveraging textual data, like whole web pages or narrative fields within a database, for data mining. Text, a type of unstructured data, is challenging due to the richness and complexity of language, but holds enormous potential for forward-thinking firms, due to the sheer volume and depth of available textual data. These opportunities often fall into the categories of Learning from Text or Process Automation.
Learning from Text
Some projects can leverage text analytics to identify important patterns or construct forecasting systems. Projects that require identifying interesting information in text are normally difficult to automate, because they take a great deal of tweaking and trial and error by humans to find something useful. Some examples of learning from information gained through text mining are:
- Consumer Products: A manufacturer could leverage text mining of warranty claims to identify linkages between certain recurring complaints and particular facilities that manufactured the products in question.
- Military Intelligence: Analysts learn, based on text mining of web pages, that a monitored person (Bob) does in fact know another person (Sally).
- Sentiment Analysis: Learning that more and more people are talking about a specific product or brand online, and whether they do so in a positive or negative way. This emerging application of text mining is highly useful to brand managers and marketing executives.
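As a rough illustration of the warranty claims example above, the sketch below counts how often a complaint keyword appears in claims from each facility. The claims, facility names, and keyword list are all invented for illustration, and simple keyword counting is only a stand-in for the heavier analysis a real project would need.

```python
from collections import Counter

# Hypothetical warranty claims: (facility, free-text complaint).
# All data here is invented for illustration.
CLAIMS = [
    ("Plant A", "Handle cracked after two weeks"),
    ("Plant A", "Cracked handle and loose screws"),
    ("Plant B", "Motor overheats during use"),
    ("Plant A", "The handle cracked on first use"),
    ("Plant B", "Loose screws on the base"),
]

# Hypothetical complaint keywords an analyst might choose to track.
KEYWORDS = ("cracked", "overheats", "loose")


def complaint_counts(claims, keywords):
    """Count the claims from each facility that mention each keyword."""
    counts = Counter()
    for facility, text in claims:
        words = set(text.lower().split())
        for keyword in keywords:
            if keyword in words:
                counts[(facility, keyword)] += 1
    return counts


counts = complaint_counts(CLAIMS, KEYWORDS)
for (facility, keyword), n in counts.most_common():
    print(f"{facility}: '{keyword}' appears in {n} claim(s)")
```

A pairing that stands out, such as one keyword concentrated at one facility, is only a lead for a human analyst to investigate, not a conclusion.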
Process Automation
Computers are much faster than humans at analyzing large amounts of data, but have a more difficult time understanding the context of a situation or applying common sense to make judgments. Thus, computers can be used to mine textual data in order to focus the attention of analysts on the "haystack" with the most "needles." Some examples of using text mining to automate processes are:
- Insurance Claims Approval Automation: Developing a process to automatically allow low-risk claims to pass through the approval process for a quick, automated decision.
- Email Sorting: Sorting email automatically into subfolders by mining the text within each message (junk mail filters are one of the earliest examples of a text mining application).
- Topic Discovery: Automatically finding the most relevant web pages for a human analyst to inspect, which greatly increases productivity by bringing the documents most likely to contain interesting cases to the top of the queue.
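The idea of bringing the most promising documents to the top of an analyst's queue can be sketched as a simple relevance ranking. The example below is a minimal sketch, assuming a hand-picked set of weighted terms; the terms, weights, and sample pages are hypothetical, and a production system would likely use a trained model instead of fixed weights.

```python
import re

# Hypothetical terms of interest with analyst-chosen weights.
TERM_WEIGHTS = {"outbreak": 3.0, "virus": 2.0, "livestock": 1.0}

# Hypothetical pages, invented for illustration.
DOCS = {
    "page-1": "Local bakery wins award for best bread",
    "page-2": "Officials report virus outbreak among livestock in the region",
    "page-3": "New virus scanner released for home computers",
}


def relevance(text, term_weights):
    """Sum the weights of matching terms, normalized by document length."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    score = sum(term_weights.get(token, 0.0) for token in tokens)
    return score / len(tokens)


def rank(documents, term_weights):
    """Return (doc_id, score) pairs, most relevant first."""
    scored = [
        (doc_id, relevance(text, term_weights))
        for doc_id, text in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank(DOCS, TERM_WEIGHTS)
for doc_id, score in ranked:
    print(f"{doc_id}: {score:.3f}")
```

Note that "page-3" still receives a nonzero score even though it concerns computer viruses. This is the kind of context problem described above, and it is why the ranked queue goes to a human analyst for review.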
The Promise and the Challenge
Text mining is interesting because a vast amount of text is available, and the amount of textual and web data is growing at an ever-increasing rate. But the sheer size of this source of Big Data is also the problem, as it presents new challenges to text miners. The figures are staggering, but a more meaningful way to talk about text size is probably in terms of typical projects.
- Some text mining projects have 60 to 100 megabytes of data, while some have several gigabytes, and web-scale projects run to terabytes.
- To put this in perspective, the Bible (Old and New Testaments) is a little over four megabytes of data.
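To make the comparison concrete, the short calculation below expresses each project size as a number of Bible-sized texts. It assumes the Bible is rounded down to 4 MB, uses decimal units (1 GB = 1,000 MB), and picks 5 GB and 1 TB as example values for "several gigabytes" and "terabytes"; those two figures are assumptions, not numbers from the projects themselves.

```python
# Assumption: round "a little over four megabytes" down to 4 MB,
# so the results slightly overstate the true ratios.
BIBLE_MB = 4
MB_PER_GB = 1_000
MB_PER_TB = 1_000_000

# 60 MB and 100 MB come from the text; 5 GB and 1 TB are assumed examples.
PROJECT_SIZES_MB = {
    "60 MB project": 60,
    "100 MB project": 100,
    "5 GB project (assumed)": 5 * MB_PER_GB,
    "1 TB web-scale project (assumed)": 1 * MB_PER_TB,
}


def bible_equivalents(size_mb, bible_mb=BIBLE_MB):
    """Return how many Bible-sized texts fit in size_mb megabytes."""
    return size_mb / bible_mb


for label, size_mb in PROJECT_SIZES_MB.items():
    print(f"{label}: about {bible_equivalents(size_mb):,.0f} Bibles")
```

Under these assumptions, even the smallest projects hold roughly 15 to 25 Bibles' worth of text, and a one-terabyte project holds about 250,000.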
Improving Claims Approval Speed and Accuracy
Elder Research combined state-of-the-art text mining algorithms with traditional statistical techniques to create a solution for the Social Security Administration to rank disability claims for approval. The results were more accurate and more consistent than any single doctor’s decision and allowed 20% of the claims to be approved immediately.
Automating Textual Data Discovery And Analysis
In a project for a Department of Defense agency, Elder Research built text mining and web scraping technology to monitor the web for information about animal infectious diseases. The system Elder Research designed has a focused crawler that locates and compiles information from blogs, online news sources, and other content on the internet.
Text Mining to Detect Unusual Behavior Patterns for Homeland Security
For a Department of Homeland Security agency, Elder Research applied text mining to detect unusual patterns of activity in land border crossings in order to identify criminal activity. The project also involved mining free-form text to accurately identify commodities in sea-going transport. By mapping free-form text descriptions to established agency codes, Elder Research was able to discover unusual patterns of behavior.
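One simple way to picture mapping free-form descriptions to established codes is token overlap between the description and each code's label. The sketch below is not the method Elder Research used, which the text does not describe; the code table, code identifiers, and the 0.2 threshold are all hypothetical.

```python
import re

# Hypothetical code table; real agency codes are not given in the text.
CODE_TABLE = {
    "C-100": "fresh fruit and vegetables",
    "C-200": "automobile parts and accessories",
    "C-300": "cotton textiles and clothing",
}


def tokens(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def best_code(description, code_table, min_overlap=0.2):
    """Return the code whose label has the highest Jaccard overlap
    with the description, or None if no label reaches min_overlap."""
    desc_tokens = tokens(description)
    best, best_score = None, 0.0
    for code, label in code_table.items():
        label_tokens = tokens(label)
        union = desc_tokens | label_tokens
        score = len(desc_tokens & label_tokens) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = code, score
    return best if best_score >= min_overlap else None


for text in ("Boxes of FRESH vegetables", "used automobile parts", "steel pipes"):
    print(f"{text!r} -> {best_code(text, CODE_TABLE)}")
```

Descriptions that match no code (a result of None here) are worth as much attention as the matches, since once descriptions are coded, shipments that do not fit the usual patterns become easier to spot.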
Request a consultation to learn more about how text mining can benefit your organization. In a typical engagement, Elder Research analytics consultants would:
- Brainstorm possible solutions with key decision makers from your organization.
- Assess the current state of your data to identify the amount of time and resources needed to begin a text mining project.
- Propose a pilot project, once the brainstorming and data assessment phases are completed, to show the power and effectiveness of text mining.