The project required several interrelated stages of data analysis using multiple data sources and formats. Because the documents commonly contained typographical and other errors, preparers could appear to be linked when they did not actually work together. Elder Research used extensive data validation procedures to account for missing data and to ensure that each document was attributed to the correct preparer.

For the preliminary phase of network identification, Elder Research quantified the strength of the relationship between each pair of document preparers. The idea behind this technique came from Elder Research’s extensive experience with text mining, as exemplified by the award-winning book Practical Text Mining, co-authored by Dr. John Elder and five others (Elsevier, 2012).

Additional network analysis techniques were then applied to sort through the web of connections and reduce the links to the most likely networks of preparers who worked together, as shown in the example below.
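To make the idea of pairwise relationship strength more concrete, here is a minimal sketch. It is not Elder Research's actual method, and the source does not say which measure was used. The sketch makes three assumptions:

- Each preparer is described by a set of attribute tokens pulled from their documents. The IDs and tokens below are hypothetical.
- Pair strength is measured with Jaccard similarity, a common overlap measure from text mining.
- Candidate networks are the connected components that remain after dropping links below a hypothetical `threshold`.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of attribute tokens two preparers have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pair_strengths(profiles: dict[str, set]) -> dict[tuple[str, str], float]:
    """Score every pair of preparers; pairs with no overlap are omitted."""
    scores = {}
    for p, q in combinations(sorted(profiles), 2):
        s = jaccard(profiles[p], profiles[q])
        if s > 0:
            scores[(p, q)] = s
    return scores


def networks(profiles: dict[str, set], threshold: float = 0.3) -> list[set]:
    """Keep links at or above the (assumed) threshold, then group preparers
    into connected components using a small union-find structure."""
    parent = {p: p for p in profiles}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (p, q), s in pair_strengths(profiles).items():
        if s >= threshold:
            parent[find(p)] = find(q)  # merge the two components

    groups: dict[str, set] = {}
    for p in profiles:
        groups.setdefault(find(p), set()).add(p)
    # Only groups with two or more preparers are candidate networks
    return sorted((g for g in groups.values() if len(g) > 1), key=sorted)


# Toy data: hypothetical preparer IDs and attribute tokens
profiles = {
    "P1": {"addr:12 Oak St", "phone:555-0101", "bank:A"},
    "P2": {"addr:12 Oak St", "phone:555-0101", "bank:B"},
    "P3": {"phone:555-0101", "bank:B", "ein:99"},
    "P4": {"addr:9 Elm Rd", "bank:C"},
}
print(pair_strengths(profiles))
print(networks(profiles))
```

In this toy data, P1 and P3 are only weakly linked (0.2). They still end up in the same network because each is strongly linked to P2 (0.5). That transitive grouping is one reason later phases would need to prune or weigh links carefully, rather than rely on a single cutoff.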

To make it easy for end users to search and explore preparer relationships, Elder Research deployed its proprietary browser-based network visualization tool. This enabled the client to interactively explore and visualize relationships among preparers, and also to examine preparer connections directly in the raw document data (without the pre-processing analytics).