Automating Network Entity Detection

Ryan McGibony

April 5, 2019


Elder Research developed an automated data pipeline to cleanse data and feed a data visualization tool used to identify and explore document preparer network relationships. The solution enabled the client to automate significant portions of work, make data-driven decisions, prioritize resources, and gain new business value from the data.

The Challenge

Elder Research was tasked with identifying networks of document preparers who worked together in a given year. Documents with particular information in common indicated a possible network connection. The goals were to enhance the client’s capabilities in two ways:

  1. Improve the efficiency of finding documents with incorrect preparer identification and reassigning them to the correct preparer identification when possible.
  2. Enable the analysts to consider preparer networks instead of only individual preparers to better deploy investigative resources.

The Solution

The project required several interrelated stages of data analysis using multiple data sources and formats. Because the documents commonly contained typographical and other errors, preparers could appear to be linked when they did not actually work together. Elder Research used extensive data validation procedures to account for missing data and ensure that each document identified the proper preparer.
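One kind of validation check mentioned above, reassigning a mistyped preparer identification to the correct one, can be sketched with approximate string matching. This is a minimal illustration and not the procedure Elder Research used: the function name reassign_preparer_id, the ID format, and the 0.85 similarity cutoff are all assumptions made for the example.

```python
from difflib import get_close_matches


def reassign_preparer_id(raw_id, known_ids, cutoff=0.85):
    """Return the known ID that raw_id most plausibly refers to, or None.

    Hypothetical helper: `known_ids` is a list of valid preparer IDs and
    `cutoff` is an assumed similarity threshold between 0 and 1.
    """
    if raw_id in known_ids:
        return raw_id  # already valid, nothing to fix

    # Ask for up to two candidates so ambiguous cases can be detected.
    candidates = get_close_matches(raw_id, known_ids, n=2, cutoff=cutoff)

    # Reassign only when exactly one known ID is close enough;
    # anything ambiguous or unmatched is left for an analyst to review.
    if len(candidates) == 1:
        return candidates[0]
    return None


known = ["P00123456", "P00987654"]
print(reassign_preparer_id("P00123465", known))  # transposed digits
print(reassign_preparer_id("ZZZZZZZZZ", known))  # no plausible match
```

Returning None for ambiguous matches keeps the automated step conservative: only clear-cut corrections are applied without review.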

For the preliminary phase of network identification, Elder Research quantified the strength of the relationship between each pair of document preparers. The idea behind this technique came from Elder Research’s extensive experience with text mining, as exemplified by the award-winning book, Practical Text Mining, co-authored by Dr. John Elder and five others (Elsevier, 2012).
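The article does not describe the actual scoring method, so the following is only a sketch of how a text-mining idea could carry over to pairwise scoring. It assumes each document contributes a (preparer, shared value) pair, and it borrows inverse document frequency weighting so that a value shared by few preparers counts for more than one shared by many. The function name pair_strengths and the sample data are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def pair_strengths(documents):
    """Score each pair of preparers by the values their documents share.

    `documents` is an iterable of (preparer_id, value) pairs, where `value`
    is some piece of information found on a document (an assumption here).
    Returns {(a, b): score} with a < b for every pair sharing a value.
    """
    documents = list(documents)
    preparers_by_value = defaultdict(set)
    for preparer, value in documents:
        preparers_by_value[value].add(preparer)

    n_preparers = len({preparer for preparer, _ in documents})

    scores = defaultdict(float)
    for value, preparers in preparers_by_value.items():
        # IDF-style weight: a value shared by every preparer carries no
        # signal (log 1 = 0), while a rare value carries a lot.
        weight = math.log(n_preparers / len(preparers))
        for a, b in combinations(sorted(preparers), 2):
            scores[(a, b)] += weight
    return dict(scores)


sample = [
    ("A", "555-0100"), ("B", "555-0100"), ("C", "555-0199"),
    ("A", "common-form"), ("B", "common-form"), ("C", "common-form"),
]
for pair, score in sorted(pair_strengths(sample).items()):
    print(pair, round(score, 3))
```

In this sample, A and B share a rare value and so score higher than pairs linked only by the value that every preparer has in common.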

Further phases of network analysis sorted through the web of connections and reduced the links to the most likely networks of preparers who worked together, as shown in the example below.

[Figure: Network of preparers]
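As a rough sketch of how pairwise links might be reduced to candidate networks, the example below drops links whose score falls under a threshold and groups the remaining preparers into connected components with a union-find structure. The article does not name the network analysis techniques actually used, so this is a simple stand-in; the function name preparer_networks, the scores, and the 0.5 threshold are assumptions.

```python
def preparer_networks(pair_scores, min_score):
    """Group preparers into networks using only sufficiently strong links.

    `pair_scores` maps (preparer_a, preparer_b) to a link strength.
    Returns a list of networks, each a sorted list of preparer IDs.
    """
    parent = {}

    def find(node):
        # Follow parent pointers to the root, shortening the path as we go.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for (a, b), score in pair_scores.items():
        if score >= min_score:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted((sorted(group) for group in groups.values()),
                  key=lambda group: group[0])


scores = {("A", "B"): 0.9, ("B", "C"): 0.8, ("C", "D"): 0.1, ("E", "F"): 0.7}
print(preparer_networks(scores, min_score=0.5))
```

Preparers whose only links fall below the threshold, such as D in the sample, do not appear in any network. Raising or lowering min_score trades off missed connections against spurious ones.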

To make it easy for end users to search and explore preparer relationships, Elder Research deployed its proprietary browser-based network visualization tool. This enabled the client to interactively explore and visualize relationships among preparers, and to explore preparer connections based on the raw document data (without the pre-processing analytics).


Advanced analytics and data visualization automated 40% of the cases being investigated for improper preparer identification, reducing investigation time from 20 minutes to less than a minute per case and significantly improving investigative asset utilization.


About the Author

Ryan McGibony, Senior Data Scientist

Ryan McGibony came to Elder Research after earning a Master of Science in Analytics from North Carolina State University. In his previous work, he conducted and analyzed custom marketing research for corporate and non-profit clients at a full-service research firm. While there, he gained experience in customer segmentation and predictive modeling techniques. Ryan also spent two years in Mongolia as a Peace Corps volunteer, supporting the staff of a local chamber of commerce in improving their service delivery to member businesses. Throughout his career, Ryan has enjoyed working with a wide variety of clients, taking care to understand their needs, and finding solutions to their problems.