Challenge: Recording complex experimental data accurately is critical if it is to be trusted in the future. Companies that sell curated databases must efficiently identify and fix erroneous entries: errors in a central database misguide research programs and lead to inaccurate machine learning models.
Solution: Alchemite was used to generate machine learning models from the entire data corpus, skipping over any missing data and taking into account the expected noise in each measurement. The resulting model was then used to re-predict several million data points in the corpus. Comparing these predicted values with the recorded values, weighted by the confidence in each prediction, enabled us to identify outliers that warranted manual inspection.
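The re-prediction approach described above can be illustrated in general terms. The sketch below is not Alchemite's actual algorithm or API: it uses a simple least-squares model on a synthetic corpus with hypothetical injected typos, and a robust residual score as a crude stand-in for Alchemite's per-prediction confidence.

```python
import numpy as np

# Toy numeric corpus: a target that depends linearly on three inputs, plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=500)

# Inject three hypothetical "typos" (e.g. a misplaced decimal point).
corrupted = [10, 200, 333]
y[corrupted] += 5.0

# Fit a simple model on the whole corpus, then re-predict every entry.
A = np.column_stack([X, np.ones(len(X))])   # design matrix with intercept
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_pred = A @ coef

# Flag entries whose residual is large relative to a robust spread estimate
# (median absolute deviation), a crude proxy for prediction confidence.
residuals = y - y_pred
med = np.median(residuals)
mad = np.median(np.abs(residuals - med))
z = np.abs(residuals - med) / (1.4826 * mad)
flagged = sorted(np.flatnonzero(z > 5.0))
print(flagged)   # candidates for manual inspection
```

In this toy setup the flagged indices recover the injected typos; in a real corpus the flagged entries would go to a domain expert to classify as typos, experimental errors, or genuine outliers.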
Outcome: Alchemite’s deep learning algorithms automatically validated an entire experimental corpus with no explicit domain knowledge, learning the expected correlations between all available data and identifying data points that were inconsistent with the model. We identified a range of outliers that were typos, experimental errors, or genuine outliers, giving the client a cheap, fast way to automatically validate complex numerical data. Our technology dramatically increased the confidence in, and breadth of, the results generated.