Characterizing and Improving a Large-Scale Recommender System

Picture This

Our client is one of the world’s largest food and beverage companies, with tens of billions of dollars in annual sales in North America alone. The client strives to use data to inform their decision making, and at this scale—selling through thousands of outlets, ranging from small shops to the very largest grocery chains—even small tactical decisions can have big payoffs.

Like other food and beverage companies, our client maintains a support staff who regularly visit individual retailers to rectify distribution issues and sell new products. For several years, this staffing effort has been backed by an end-to-end prioritization system intended to identify the highest-value activities and surface them to the field staff. It is an impressive system, and, internally, the client attributed millions of dollars in new and recovered sales per year to this tool.

Big Challenge

After several years of running this system in production, the client’s internal analytics organization asked Elder Research for a deep evaluation of the tool’s value. We were tasked with linking the system’s recommendations to actual realized dollars and contrasting this analysis with value estimates that had been hypothesized during the system’s original construction—prior to the impacts of the COVID-19 pandemic. The client’s own understanding of the tool’s profitability was founded on these hypothetical valuations, so discrepancies in real-world value could correspond to millions of dollars in mistaken attribution.

In addition to the business challenge, this task presented a big analytical challenge because of the client’s need for causal, “this-causes-that” answers. These types of analyses must be treated with special care in situations like this, where it isn’t possible to carry out designed experiments.

The system’s sheer scale also presented computational challenges, and we worked to identify and reconcile insights across the hierarchies and units involved (each vs. pallet, item vs. brand, store vs. chain, etc.). Our client was interested in both the tactical picture, with store-level and product-level insights, and the strategic view, looking across large swaths of the program. We provided detailed analyses that could answer questions at both scales consistently.
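As a rough illustration of what reconciling across units and hierarchies can involve, the sketch below converts quantities to a common base unit and then rolls the same records up at two different levels. The conversion factors, record layout, and names (UNITS_PER, RECORDS, roll_up) are hypothetical and are not taken from the client's system; the point is only that totals computed at the store-and-item level and at the chain-and-brand level should agree.

```python
from collections import defaultdict

# Hypothetical conversion factors: number of "each" units per selling unit.
UNITS_PER = {"each": 1, "case": 24, "pallet": 1440}

# Hypothetical sales records: (chain, store, brand, item, quantity, unit).
RECORDS = [
    ("ChainA", "Store1", "BrandX", "Item1", 2, "pallet"),
    ("ChainA", "Store1", "BrandX", "Item2", 10, "case"),
    ("ChainA", "Store2", "BrandX", "Item1", 300, "each"),
    ("ChainB", "Store3", "BrandY", "Item3", 1, "pallet"),
]


def to_eaches(quantity, unit):
    """Convert a quantity in any known unit to the base unit ("each")."""
    return quantity * UNITS_PER[unit]


def roll_up(records, key_fn):
    """Sum base-unit quantities, grouped by whatever key key_fn returns."""
    totals = defaultdict(int)
    for chain, store, brand, item, quantity, unit in records:
        totals[key_fn(chain, store, brand, item)] += to_eaches(quantity, unit)
    return dict(totals)


# Tactical view: one total per (store, item).
store_item = roll_up(RECORDS, lambda chain, store, brand, item: (store, item))

# Strategic view: one total per (chain, brand).
chain_brand = roll_up(RECORDS, lambda chain, store, brand, item: (chain, brand))

# Both views describe the same sales, so their grand totals must match.
grand_total_tactical = sum(store_item.values())
grand_total_strategic = sum(chain_brand.values())
```

Converting to a base unit before aggregating is what keeps the two views consistent; aggregating first and converting afterward would mix pallets, cases, and eaches in the same sum.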

The Solution

We worked collaboratively and transparently with our client’s internal analytics team, and we found answers to their questions about the real value being produced by their existing recommender system. Getting to these answers required diving into and understanding the many different parts of a large organization’s production systems:

reviewing requirements and technical documentation,
getting into the data pipelines,
stepping through business logic and value calculations, and
reconciling these with the help of client stakeholders.

In the past, the system’s complexity and a lack of transparency from previous vendors had made it difficult for the client to collect and evaluate this information.

Working iteratively in an agile process, we provided holistic, program-level insights and individual time-series analyses. We successfully applied causal modeling methods to pin down the actual realized value of the system’s recommendations while clearly articulating both the variability in the system’s success rate and the level of uncertainty present in our findings.
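This case study does not name the specific causal methods used. Purely as an illustration of how an effect can be estimated from observational data when a designed experiment is not possible, the sketch below computes a simple difference-in-differences estimate. The function name, the grouping into "treated" and "control" stores, and all sales figures are hypothetical, and the estimate is only meaningful under the assumption that both groups would have followed parallel trends without the recommendation.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences estimate of an effect on mean sales.

    Subtracts the change seen in the control group from the change seen
    in the treated group, so that shifts affecting both groups cancel out.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical weekly sales for stores that acted on a recommendation.
treated_pre = [100.0, 102.0, 98.0]
treated_post = [115.0, 117.0, 113.0]

# Hypothetical weekly sales for comparable stores that did not.
control_pre = [90.0, 92.0, 88.0]
control_post = [97.0, 99.0, 95.0]

estimate = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
print(f"Estimated lift attributable to the recommendation: {estimate:.1f}")
```

In this toy data the treated stores rise by 15 and the control stores by 7, so only the remaining 8 is attributed to the recommendation. A real analysis would also need to quantify uncertainty around that figure, as the findings described here did.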

Providing our clients with the results they need and a better understanding of their tools requires a breadth of analytics, computational data science, and communications skills. In this case, we took on a broad, hard-to-pin-down problem and worked closely with the client to take the problem apart, develop compelling answers, and convey these findings clearly and accurately. We were able to give an unbiased, outside perspective to the internal team responsible for this system.

The Results

Our effort produced a wealth of findings for the client, starting with useful estimates of their system’s actual value, which differed from the client’s initial hypothesis by an order of magnitude, and extending into how this value has evolved over time. We also uncovered key areas in which fundamental system assumptions do not hold in the post-pandemic environment. And we found instances in which, as constructed, the system did not match end-user requirements or assumptions.

Our work provided a compelling analysis that is driving major changes to our client’s recommender system as the client adapts to current market conditions and a better understanding of the system’s value. This project is likely to have further downstream effects as well, providing new views into the system and helping future improvements be more data-informed.