A medical device manufacturer designed a sensor that is implanted within a major organ to detect disease events. The sensor is being tested on animal subjects with the intent to move to human trials and secure FDA approval for monitoring patients with previous cases of the disease. Elder Research was engaged to build machine learning models that use the sensor data to identify abnormal activity that is predictive of the disease. The goal was to classify sensor traces as either normal or high risk.
The sensor produces a time series signal of the organ's behavior. The team was given 700+ traces taken from multiple animals over a 30-day period. After feature engineering, cluster analysis, and dimensionality reduction, the team settled on two features, which captured the most signal from the smallest number of inputs and thus simplified deployment. Synthetic labeling (semi-supervised learning) was used to relabel some of the noisier lab results. Two binary classifier models (high/normal risk) were delivered: an Ensemble model on the reduced features, and a Convolutional Neural Network (CNN) with Transfer Learning on the entire raw time series.
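To make the relabeling idea concrete, here is a minimal Python sketch using scikit-learn. The source does not say which semi-supervised method was used, so `LabelSpreading` is an assumption chosen for illustration, and the data, the 30% noise rate, and the neighbor count are all hypothetical. Labels considered untrustworthy are marked as unlabeled (`-1`) and then inferred from their neighbors in feature space.

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

# Synthetic stand-in data (NOT the study's data): two well-separated classes
# described by two features, mirroring the reduced feature set.
rng = np.random.default_rng(1)
n = 100
y_true = np.repeat([0, 1], n)                      # 0 = normal, 1 = high risk
X = rng.normal(size=(2 * n, 2)) + 6.0 * y_true[:, None]

# Hypothetical assumption: about 30% of lab labels are too noisy to trust.
# scikit-learn's semi-supervised estimators treat -1 as "unlabeled".
noisy = rng.random(2 * n) < 0.3
y_partial = np.where(noisy, -1, y_true)

# Spread the trusted labels to the untrusted points via a k-NN graph.
model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y_partial)
y_relabelled = model.transduction_                 # one label per trace

# How often the inferred labels match the (known, synthetic) ground truth.
agreement = (y_relabelled[noisy] == y_true[noisy]).mean()
print(f"agreement on relabelled traces: {agreement:.2f}")
```

On real data the ground truth for the noisy traces is not available, so a check like `agreement` would have to be replaced by expert review of a sample of the relabeled traces.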
The Ensemble model provides accuracy and ease of deployment at the expense of robustness. A voting method was used to weight the results from several binary classifiers, with Logistic Regression and K-Nearest Neighbors chosen for the final decision boundary. The CNN model, by contrast, provides robustness at the expense of accuracy and ease of deployment. Transfer learning offered significant performance gains by using a calibration step to learn the specific patient profile. Leave-one-group-out model validation was used to avoid overfitting, given the small number of test subjects.
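The voting ensemble and leave-one-group-out validation can be sketched in Python with scikit-learn as follows. This is an illustration under stated assumptions, not the delivered model: the data is synthetic, the four animals and all hyperparameters are hypothetical, and soft (probability-averaging) voting is assumed because the source does not specify how the votes were weighted.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the study's data): two features per trace,
# 50 traces from each of 4 hypothetical animals.
rng = np.random.default_rng(0)
animal_ids = np.repeat(np.arange(4), 50)
y = rng.integers(0, 2, size=animal_ids.size)       # 0 = normal, 1 = high risk
X = rng.normal(size=(animal_ids.size, 2)) + 2.0 * y[:, None]

# Assumed design: average the predicted probabilities of the two classifiers.
ensemble = make_pipeline(
    StandardScaler(),                              # k-NN is scale sensitive
    VotingClassifier(
        estimators=[
            ("logreg", LogisticRegression()),
            ("knn", KNeighborsClassifier(n_neighbors=5)),
        ],
        voting="soft",
    ),
)

# Each fold holds out every trace from one animal, so the score reflects
# performance on a subject the model has never seen.
scores = cross_val_score(
    ensemble, X, y, groups=animal_ids, cv=LeaveOneGroupOut()
)
print("per-animal holdout accuracy:", np.round(scores, 3))

# After fitting, the ensemble returns a probability of high risk per trace.
ensemble.fit(X, y)
p_high_risk = ensemble.predict_proba(X[:1])[0, 1]
```

Grouping by animal matters because traces from the same subject are correlated; a random split would let the model see the held-out animal's profile during training and overstate accuracy.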
The client opted to use the Ensemble model because of its simplicity of deployment. The model provides a probability of high risk for any given subject. On holdout samples, the final Ensemble model delivered a mean accuracy of 96.93%, a mean sensitivity of 93.84%, and a mean specificity of 98.94%, with per-animal accuracy ranging from 93.9% to 99.4%. Next steps include analyzing data from additional animals and human subjects, then incorporating other biological data to increase robustness.
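For readers less familiar with the reported metrics, the short Python sketch below shows how accuracy, sensitivity, and specificity follow from the counts of correct and incorrect predictions on a holdout set. The function name `binary_metrics` and the example labels are hypothetical and are not the study's data.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for labels 1 (high risk) / 0 (normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # high risk, caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # high risk, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal, false alarm
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of truly high-risk traces that were flagged.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of truly normal traces that were cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Made-up holdout labels for one hypothetical animal.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

A mean figure such as those quoted above would then be obtained by computing these metrics once per held-out animal and averaging the results.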