Are Orange Cars Really Not Lemons?

White Paper


An article in The Seattle Times reported that “an orange used car is least likely to be a lemon.” This discovery surfaced in a competition hosted by Kaggle to predict bad buys among used cars using a labeled dataset.

Of the 72,983 used cars, 8,976 were bad buys (12.3%). Yet of the 415 orange cars in the dataset, only 34 were bad buys (8.2%). The visualization behind this finding was appropriate and accurate, but it was susceptible to the small-sample effect, so it led to incorrect conclusions.

This white paper dives into the details and explores techniques, particularly Target Shuffling, to avoid making the same mistake.
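
As a rough illustration of the idea, the sketch below applies target shuffling to the counts quoted above. It repeatedly shuffles the good/bad labels across all cars and records how many bad buys land in a group the size of the orange cars. It is a minimal sketch and not the white paper's implementation: the function names, the number of shuffles, and the seed are assumptions made for illustration, and it tests orange in isolation because per-color counts for the other colors are not given here. A fuller version would presumably record the lowest bad-buy rate across all colors on each shuffle.

```python
import random

# Counts quoted in the text above.
TOTAL_CARS = 72_983
BAD_BUYS = 8_976
ORANGE_CARS = 415
ORANGE_BAD = 34


def shuffled_orange_bad_counts(n_shuffles=2000, seed=0):
    """Return the bad-buy count in an orange-sized group for each shuffle.

    n_shuffles and seed are illustrative defaults, not values from the source.
    """
    rng = random.Random(seed)
    # 1 = bad buy, 0 = good buy. The starting order does not matter.
    labels = [1] * BAD_BUYS + [0] * (TOTAL_CARS - BAD_BUYS)
    counts = []
    for _ in range(n_shuffles):
        # Shuffling every label and reading off the 415 orange rows is
        # equivalent to drawing 415 labels without replacement.
        counts.append(sum(rng.sample(labels, ORANGE_CARS)))
    return counts


def fraction_at_or_below(counts, observed):
    """Share of shuffles whose bad-buy count is at most the observed count."""
    return sum(c <= observed for c in counts) / len(counts)


if __name__ == "__main__":
    counts = shuffled_orange_bad_counts()
    share = fraction_at_or_below(counts, ORANGE_BAD)
    print(f"Mean bad buys per shuffled group: {sum(counts) / len(counts):.1f}")
    print(f"Share of shuffles with <= {ORANGE_BAD} bad buys: {share:.4f}")
```

The printed share estimates how often chance alone gives a group of 415 cars a bad-buy count as low as the one observed for orange. Because it looks at one color only, it does not account for the fact that orange was picked out from many colors after seeing the data.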
