Elder Research creates a sustainable competitive advantage for our clients by delivering effective and actionable solutions to challenging real-world analytical problems. There is much more to delivering effective analytics solutions than just “looking at the data.” Our disciplined, time-tested approach starts by framing the client's business objective, then moves on to exploring the data, building usable models, and validating them against real-world criteria. We then develop and deploy powerful, user-friendly analytics solutions that our customers can use.
This rigorous experiment-driven design and analysis framework is Elder Research’s Agile Data Science methodology.
AGILE DATA SCIENCE
Predictive modeling is a research task. It is difficult to know in advance which algorithms and variables, when combined, will reveal the secrets a data set may be concealing. To mitigate risk, Elder Research conducts a rigorous experimental research design and analysis process in which all aspects of the model and business hypotheses are carefully detailed and tested throughout the project. We employ a methodology that combines best practices for data mining, embodied by the Cross Industry Standard Process for Data Mining (CRISP-DM), with best practices for Agile software development. Following this Agile Data Science methodology creates an efficient process of exploration in which improvements to the model are discovered, carefully considered, and evaluated. This combination allows Elder Research to deliver higher-performing analytical models in less time than the traditional modeling approach.
Using a rapid-prototyping framework, a baseline solution is developed quickly to surface any technical and interface issues. Then, the complete system is strengthened by iteratively improving components — often possible in parallel. This allows the path-critical components to be discovered and improved as time and budget allow, and provides decision makers with better estimates of the tradeoffs involved. Included in this framework are regular administrative and technical updates and on-site meetings at key milestones.
Agile Data Science integrates the following steps:
Frame the Business Objective
Having clearly defined business objectives is critical to the success of an analytics project. We have consistently achieved breakthrough solutions by combining the business domain expertise of our clients with our consulting expertise in business analysis, systems engineering, and predictive modeling. Through this collaboration, we can gain an understanding of the data, business goals, and constraints necessary to build a reliable model and then effectively match our services to the clients’ needs.
Explore and Transform Data
The quality and reliability of the data have a huge impact on the efficacy of an analytic solution, and the data preparation phase often consumes as much as 80 percent of a project. Elder Research has developed rigorous data preprocessing, cleansing, de-normalization, and extraction techniques and tools to ensure and improve data quality. With our Agile Data Science methodology and custom data analysis tools, the work of structuring and transforming the data is spread over a number of iterations to distribute the burden of the data preparation phase. Careful attention to fatal data flaws, such as “leaks from the future” and survivor bias, ensures the data properly reflects what will be seen during operation.
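To make the idea of a “leak from the future” concrete, here is a minimal Python sketch of two common safeguards: flagging records whose feature dates fall after the prediction date, and splitting data by time rather than at random. The field names (`as_of`, `last_payment_date`) and the helper functions are hypothetical, chosen only for illustration; they do not describe Elder Research's actual tools.

```python
from datetime import date


def find_future_leaks(records, as_of_key="as_of", feature_date_keys=("last_payment_date",)):
    """Return indices of records where any feature date is later than the prediction date."""
    leaks = []
    for i, rec in enumerate(records):
        for key in feature_date_keys:
            value = rec.get(key)
            if value is not None and value > rec[as_of_key]:
                leaks.append(i)
                break  # one leaking field is enough to flag the record
    return leaks


def time_ordered_split(records, cutoff, as_of_key="as_of"):
    """Train on records before the cutoff and test on the rest, mimicking deployment."""
    train = [r for r in records if r[as_of_key] < cutoff]
    test = [r for r in records if r[as_of_key] >= cutoff]
    return train, test


# Hypothetical example records (illustrative data only).
records = [
    {"as_of": date(2020, 1, 31), "last_payment_date": date(2020, 1, 15)},
    {"as_of": date(2020, 2, 29), "last_payment_date": date(2020, 3, 5)},  # known only later
    {"as_of": date(2020, 3, 31), "last_payment_date": date(2020, 3, 1)},
]

leaky = find_future_leaks(records)
train, test = time_ordered_split(records, cutoff=date(2020, 3, 1))
print("records with future information:", leaky)
print("train size:", len(train), "test size:", len(test))
```

A check like this catches only leaks that are visible through timestamps. Leaks hidden in derived or aggregated fields usually need domain knowledge to find.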
Ensemble Modeling Techniques
Our goal in every analytical modeling project is to produce the implementable model with the lowest out-of-sample error, that is, the one that performs best when given new, unseen data. To best achieve this goal, we have mastered many modeling techniques, and we study many model options because it is often surprising which algorithms perform best in new situations. Dr. Elder was one of the first to discover that combinations of different methods or algorithms, called ensembles, usually outperform a single algorithm. Our general approach of using a collection of modeling techniques has become a best practice for predictive analytics.
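As a small illustration of the ensemble idea, the Python sketch below fits two deliberately different models (a straight-line fit and a nearest-neighbor average), combines them by simple averaging, and compares out-of-sample error on held-out data. The synthetic data, the choice of models, and every function name are assumptions made for this example. It shows one simple way to build an ensemble, not the specific method used on any project.

```python
import math
import random
from statistics import fmean


def fit_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns a predict function."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x

    def predict(x):
        return intercept + slope * x

    return predict


def fit_knn(xs, ys, k=5):
    """k-nearest-neighbor regressor: average the targets of the k closest training points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return fmean([y for _, y in nearest])

    return predict


def make_ensemble(models):
    """Combine models by averaging their predictions with equal weights."""
    def predict(x):
        return fmean([m(x) for m in models])

    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on the given data."""
    return fmean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


# Synthetic data (an assumption for illustration): a trend plus a wave plus noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 6.0) for _ in range(200)]
ys = [0.5 * x + math.sin(x) + rng.gauss(0.0, 0.3) for x in xs]
train_x, train_y = xs[:150], ys[:150]
test_x, test_y = xs[150:], ys[150:]  # held out, never used for fitting

linear = fit_linear(train_x, train_y)
knn = fit_knn(train_x, train_y, k=5)
ensemble = make_ensemble([linear, knn])

for name, model in [("linear", linear), ("knn", knn), ("ensemble", ensemble)]:
    print(f"{name:8s} out-of-sample MSE: {mse(model, test_x, test_y):.4f}")
```

Under squared error, an equal-weight average is never worse than the average of its components' errors. Whether it also beats the best single component depends on the data and on how different the component models are.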
We have experience with a great variety of software tools, and work closely with the client — and their particular existing tool configuration — to find the optimal combination of tools to solve the problem and meet project requirements and expectations.
The final solution for any of our projects is selected using a rigorous cross-validation process that ensures the model is robust to changes in the data and underlying assumptions. We employ out-of-sample testing and validation to limit over-fit and over-search during model development. This continuous testing within our Agile Data Science methodology guides the next sprint and ensures that performance improves over time. Elder Research judges each model iteration on its ability to accurately predict or classify the target variable in a hold-out sample of data. This helps to ensure the stability of the algorithm when it is deployed in operation.
Target Shuffling is one technique we use for testing the statistical accuracy of our data mining results. It is particularly useful for identifying false positives, that is, cases where two events or variables that occur together are perceived to have a cause-and-effect relationship when the association is merely coincidental. The more variables you have, the easier it becomes to over-search and identify (false) patterns among them, a problem called the ‘vast search effect’.
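The target shuffling idea can be sketched in a few lines of Python. The example searches many candidate variables for the one most correlated with the target, then repeats the same search many times with the target values randomly shuffled, which breaks any real relationship. The share of shuffled runs that match or beat the original result estimates how often a pattern that strong would arise by chance. The scoring function, the number of shuffles, and the synthetic data are assumptions made for illustration, and this is one simple formulation rather than a definitive implementation.

```python
import random
from statistics import fmean


def abs_corr(xs, ys):
    """Absolute Pearson correlation between two equal-length lists."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def best_score(features, target):
    """The 'search': keep the strongest relationship among all candidate variables."""
    return max(abs_corr(column, target) for column in features)


def target_shuffle(features, target, n_shuffles=200, seed=0):
    """Return the observed best score and the fraction of shuffles that match or beat it."""
    rng = random.Random(seed)
    observed = best_score(features, target)
    shuffled = list(target)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real link between inputs and target
        if best_score(features, shuffled) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_shuffles + 1)


# Synthetic data (illustrative only): 30 pure-noise variables and a random target.
rng = random.Random(1)
n_rows = 50
target = [rng.gauss(0.0, 1.0) for _ in range(n_rows)]
noise_features = [[rng.gauss(0.0, 1.0) for _ in range(n_rows)] for _ in range(30)]

# Same variables plus one that is genuinely related to the target.
signal_features = noise_features + [[t + rng.gauss(0.0, 0.1) for t in target]]

noise_score, noise_p = target_shuffle(noise_features, target)
signal_score, signal_p = target_shuffle(signal_features, target)
print(f"noise only : best |corr| = {noise_score:.2f}, chance rate = {noise_p:.3f}")
print(f"with signal: best |corr| = {signal_score:.2f}, chance rate = {signal_p:.3f}")
```

Because the shuffled runs repeat the entire search rather than retest only the winning variable, the estimate accounts for the number of variables examined, which is the point of guarding against the vast search effect.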
Keys To Success
Three philosophies have contributed to our ability to deliver successful analytics solutions:
Maintaining a Diversity of Projects
Whether a project is military, medical, or monetary, success hinges on whether or not the client and our team can successfully learn useful patterns from historical data. Our mix of applications across a broad spectrum of commercial industries and government agencies has exposed us to the collected wisdom of expert practitioners in each domain, teaching us much. For instance, a data feature useful in drug efficacy studies has contributed to fraud risk estimation, and techniques originally applied to high-performance aircraft have improved investment and credit risk scoring projects.
Employing a Diversity of Algorithms
Unlike many data science consultants, our team is not tied to, or limited by, a single data science technology. Though some of us have spent years creating competitive algorithms, we have learned that each problem presents its own challenges and requirements, and that a toolbox of powerful techniques is needed. We have published innovations in six fields:
- Polynomial Networks
- Decision Trees
- Global Optimization
- Bayesian Estimation
- Adaptive Kernels
- High-dimensional Visualization
We have made improvements to several others — including Nearest Neighbors, Neural Networks, Radial Basis Functions, and Nonlinear Regression.
The Elder Research team has pioneered the breakthrough technique of combining multiple models to achieve estimates that are more robust and accurate than is possible with any single model, sometimes better than even the retrospective best of the individual models. This simple idea contains subtleties in its execution, yet years of practice at the forefront of this approach have made it a powerful tool in our arsenal. An ensemble can even be less complex than any of its components, which helps to explain its superior performance.