Among software companies, regular version releases are a proven practice for improving the customer experience and preventing churn. Yet now more than ever, determining software reliability between releases remains a major challenge for these companies. Is there a way to quantify the software performance that influences the customer experience? The answer is yes. Elder Research has developed a powerful approach to measuring software reliability by leveraging log data and Product Usage Analytics.
Treating Software like Hardware
One approach we’ve observed in the software industry is to treat software like hardware products such as televisions or appliances. The thinking is that, because there is already a well-established theoretical model to forecast hardware product reliability, it can also be applied to software reliability. In a representative sample of hardware products, time to failure for failed goods is used in conjunction with the data from products that have not yet failed—this data is referred to as censored because the true time to failure is unknown. This method generates a prediction of what to expect for the entire lot of products produced under the same conditions. The predicted values calculated for the lot include:
- Estimated Number of Failures
- Failure Rate
- Mean-Time-Between-Failure (MTBF)
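Under a simple exponential-lifetime assumption, these predicted values can be estimated from a mix of observed failure times and censored run times. A minimal sketch in Python (the sample data and variable names are illustrative, not real product data):

```python
# Hours of operation observed for a sample of hardware units.
# Failed units contribute their time to failure; censored units
# contribute their run time so far (true time to failure unknown).
failure_times = [1200.0, 3400.0, 2100.0, 4800.0]   # units that failed
censored_times = [5000.0, 5000.0, 5000.0]          # still running at 5,000 h

total_time = sum(failure_times) + sum(censored_times)
num_failures = len(failure_times)

# Standard estimators under an exponential lifetime model:
mtbf = total_time / num_failures   # Mean-Time-Between-Failure
failure_rate = 1.0 / mtbf          # failures per hour

print(f"MTBF: {mtbf:.0f} h, failure rate: {failure_rate:.6f}/h")
```

Note how the censored units still contribute operating hours to the numerator even though they add nothing to the failure count; that is the sense in which the method "uses" products that have not yet failed.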
At first glance, it seems like the above approach could be used to determine software reliability; however, it doesn’t take into account the fundamental differences between hardware and software products.
What is Unique about Software Products?
A hardware product has only one outcome—the product has an anticipated lifetime and eventually it will fail. For example, a consumer will use their out-of-the-box blender until it breaks and is no longer usable.
In contrast, a software user interacts with the product on a per-session basis where each user session generally has one of two outcomes:
- The user opens the software, uses it, and closes the software without an issue
- The user opens the software, uses it, and it crashes or terminates abruptly
When considering the hardware product reliability method discussed above, it is expected that a product will eventually fail. For any snapshot of the respective product sample studied, a product is classified as either “not yet failed” or “failed at x time”. In the case of software products, the session-by-session user experience creates both successes and failures. The hardware product reliability approach does not account for successes—it censors products that have not failed yet. For this reason, the successes recorded in log data need to be accounted for in the metrics used to assess the performance of the software release.
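The distinction can be made concrete with a small sketch (the session log format here is hypothetical): counting successful sessions alongside crashes yields an observed per-session crash rate, whereas the hardware approach would treat the successes as merely "not yet failed."

```python
# Hypothetical session log: one record per user session, with the
# outcome recorded when the session ends.
sessions = [
    {"user": "a", "outcome": "clean_exit"},
    {"user": "a", "outcome": "crash"},
    {"user": "b", "outcome": "clean_exit"},
    {"user": "c", "outcome": "clean_exit"},
    {"user": "c", "outcome": "clean_exit"},
]

crashes = sum(1 for s in sessions if s["outcome"] == "crash")
# Successes appear in the denominator, so they count toward reliability
# rather than being censored away.
crash_rate = crashes / len(sessions)

print(f"Crash rate: {crash_rate:.0%} over {len(sessions)} sessions")
```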
A Powerful Approach Using Product Usage Analytics
It is expensive and time-consuming to collect actual failure data, so generating predictive failure metrics is an essential cost-saving alternative for hardware goods manufacturers. In the software product industry it isn’t necessary to generate predictive failure metrics. Because software session log data is plentiful, easily recorded, and generated quickly by many users, an observed value for reliability can be calculated from this data using Product Usage Analytics. An observed value is powerful because it describes the current status of the software release and is directly actionable. An added benefit is that observed values do not rest on modeling assumptions the way predicted values do.
The key to comparing software reliability between major releases, minor releases, and service packs is to determine a suite of performance metrics that are:
- indicative of stability
- captured in session logs
- actionable by the software team
Crash rate and mean-time-to-failure (MTTF) are good examples of metrics that can be used to measure software reliability. For example, a software company would like to see crash rates decline and MTTF increase between releases. Other industry-standard metrics are available for software reliability and Product Usage Analytics, but any suite of performance metrics must be supported by log data and calculable across releases and over time. Finally, the software team must be able to derive insights from the performance indicators so that they can easily take action and drive an observable improvement in the reliability score.
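As a sketch, both metrics can be computed per release from the same session log and then compared across releases (the field names, durations, and release labels below are assumed for illustration); MTTF here is total session time divided by the number of crashes:

```python
from collections import defaultdict

# Hypothetical session log: (release, session duration in minutes, crashed?)
sessions = [
    ("v2.0", 30, True), ("v2.0", 45, False), ("v2.0", 60, False),
    ("v2.1", 50, False), ("v2.1", 40, False), ("v2.1", 35, True),
    ("v2.1", 55, False), ("v2.1", 20, False),
]

by_release = defaultdict(lambda: {"sessions": 0, "minutes": 0, "crashes": 0})
for release, minutes, crashed in sessions:
    stats = by_release[release]
    stats["sessions"] += 1
    stats["minutes"] += minutes
    stats["crashes"] += crashed

# Report both metrics per release; between releases we want the crash
# rate to decline and the MTTF to increase.
for release, s in sorted(by_release.items()):
    crash_rate = s["crashes"] / s["sessions"]
    mttf = s["minutes"] / s["crashes"] if s["crashes"] else float("inf")
    print(f"{release}: crash rate {crash_rate:.0%}, MTTF {mttf:.0f} min")
```

In this toy data the crash rate falls from 33% to 20% and the MTTF rises from 135 to 200 minutes between v2.0 and v2.1, which is the direction of improvement the text describes.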