Target shuffling is a process for testing the statistical accuracy of data mining results. It is particularly useful for identifying false positives, where two events or variables that occur together are perceived to have a cause-and-effect relationship when the link is only coincidental. The more variables you have, the easier it becomes to ‘oversearch’ and find (false) patterns among them, a problem called the ‘vast search effect’.
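To make the vast search effect concrete, here is a minimal sketch using only synthetic noise. Everything in it is an illustrative assumption rather than part of the original account: the function names, the 50-row sample size, and the choice of Pearson correlation as the "pattern" being searched for. Because every feature is pure noise, any correlation it finds is a false positive by construction.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def best_spurious_correlation(n_rows=50, n_features=1000, seed=0):
    """Track the strongest |correlation| seen after searching k noise features.

    The target and every feature are independent random noise (an assumption
    of this sketch), so none of the correlations reflects a real relationship.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    best_by_k = {}
    for k in range(1, n_features + 1):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(feature, target)))
        if k in (1, 10, 100, 1000):
            best_by_k[k] = best  # running maximum after k features
    return best_by_k

if __name__ == "__main__":
    for k, r in best_spurious_correlation().items():
        print(f"{k:>5} features searched: best |r| = {r:.2f}")
```

The running maximum can only stay the same or grow as more features are searched, which is the point: the more candidates you examine, the more impressive the best coincidence looks.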
John Elder first came up with target shuffling 20 years ago, when Elder Research was working with a client who wasn’t sure whether to invest more money in a new hedge fund. While the fund had done well in its first year, it had been a volatile ride, and the client was unsure whether the success was real. A statistical test showed that the probability of the fund being that successful by chance was very low, but the client wasn’t convinced. So John ran 1,000 simulations. In each one, he randomly shuffled the target variable, which was the buy or hold signal for each day, so that the signals no longer lined up with the days they were issued for. He then compared the random results to how the hedge fund had actually performed. Out of 1,000 simulations, the shuffled signals returned better results in just 15 instances. In other words, luck alone beat the fund only 1.5 percent of the time. This new way of presenting the data made sense to the client, and as a result he invested 10 times as much in the fund.
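The shuffling procedure in this story can be sketched in a few lines of Python. This is a hedged illustration, not the analysis John actually ran. The function names are hypothetical, and the scoring rule is an assumption: a signal of 1 earns that day's return and a signal of 0 earns nothing, which may not match how the fund treated "buy" and "hold". Any data you pass in is your own; the sketch contains none of the fund's figures.

```python
import random

def strategy_return(signals, daily_returns):
    """Sum the returns of the days whose signal is 1 (assumed scoring rule)."""
    return sum(r for s, r in zip(signals, daily_returns) if s == 1)

def target_shuffle(signals, daily_returns, n_shuffles=1000, seed=0):
    """Estimate how often randomly reordered signals do at least as well.

    Returns (actual_return, fraction), where fraction is the share of
    shuffles whose return matched or beat the actual one.
    """
    rng = random.Random(seed)
    actual = strategy_return(signals, daily_returns)
    shuffled = list(signals)  # copy so the caller's list is left untouched
    at_least_as_good = 0
    for _ in range(n_shuffles):
        # Break the link between each signal and its day, while keeping
        # the same number of 1s and 0s and the same daily returns.
        rng.shuffle(shuffled)
        if strategy_return(shuffled, daily_returns) >= actual:
            at_least_as_good += 1
    return actual, at_least_as_good / n_shuffles

if __name__ == "__main__":
    # Synthetic demo data only: noise returns and coin-flip signals.
    demo_rng = random.Random(42)
    returns = [demo_rng.gauss(0, 0.01) for _ in range(250)]
    signals = [demo_rng.randint(0, 1) for _ in range(250)]
    actual, fraction = target_shuffle(signals, returns)
    print(f"actual return: {actual:.4f}")
    print(f"share of shuffles at least as good: {fraction:.3f}")
```

Read against the story, 15 winning shuffles out of 1,000 would correspond to a fraction of 0.015. Shuffling keeps the number of buy days and the market's returns fixed, so the only thing being tested is whether the timing of the signals mattered.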
Two important lessons came out of that experience. One is that target shuffling is a very good way to test non-traditional statistical problems. But more importantly, it’s a process that makes sense to a decision maker. Statistics is not persuasive to most people because it’s just too complex.