If we want to make our AI systems smarter, we need to let them ask questions. But should we?
Duke University’s biomedical engineers have unveiled a novel method to enhance the efficiency of machine learning models, focusing on the discovery of new molecular therapeutics. The method uses only a fraction of the data typically required, relying on an algorithm that actively identifies gaps in its data and, in some instances, more than doubles accuracy. In essence, the researchers suggest that allowing AI to ask questions could make it smarter.
This innovative approach could simplify the process for scientists to identify and categorize molecules with potential benefits in the development of new drug candidates and other materials. The research was published in the Royal Society of Chemistry’s journal Digital Discovery on June 23.
Machine Learning Algorithms in Molecular Discovery
Machine learning algorithms are increasingly used to identify and predict the properties of small molecules, including drug candidates and other compounds. Despite significant advances in computational power and in the algorithms themselves, their abilities remain limited by the datasets used to train them, which are often far from perfect.
Data bias, characterized by the overrepresentation of one property over another in datasets, is a central issue. This bias is akin to an algorithm learning to distinguish dogs from cats based on one billion photos of dogs and just a hundred of cats, as explained by Daniel Reker, an assistant professor of biomedical engineering at Duke University.
This issue is particularly acute in drug discovery, where scientists work with datasets in which more than 99% of tested compounds are deemed “ineffective” and only a handful of molecules are tagged as potentially useful.
The Solution: Active Machine Learning
Researchers usually employ data subsampling, where their algorithm learns from a small but representative data subset. However, this approach may eliminate crucial data points, impacting the algorithm’s accuracy. To overcome this, Reker and his collaborators employed a technique known as active machine learning.
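To see why plain subsampling can discard the data points that matter most, here is a small back-of-the-envelope sketch. The numbers are hypothetical and chosen only to mirror the roughly 99% "ineffective" ratio mentioned above: a library of 1,000 compounds with 10 actives, from which 100 compounds are drawn uniformly at random. The function name is illustrative and does not come from the study.

```python
from math import comb

def prob_no_actives(total, actives, sample_size):
    """Chance that a uniform random subsample contains zero active compounds.

    This is the hypergeometric probability of drawing only inactive
    compounds: C(inactive, n) / C(total, n).
    """
    inactive = total - actives
    return comb(inactive, sample_size) / comb(total, sample_size)

# Hypothetical library: 1,000 compounds, 10 of them active (1%).
# We keep a 10% random subsample for training.
p_miss = prob_no_actives(total=1000, actives=10, sample_size=100)
print(f"Chance the subsample has no actives at all: {p_miss:.3f}")
```

Under these assumed numbers, roughly one random subsample in three would contain no active compound at all, leaving a model with nothing to learn the rare class from.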
Active machine learning enables the algorithm to request additional information when it encounters confusion or gaps in its data. In the team’s tests, it showed promising results in improving both the efficiency and the predictive performance of the models.
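The general idea can be illustrated with a toy example. This is a minimal sketch of one common active learning strategy, uncertainty sampling, and not the Duke team's actual algorithm or data. It assumes a made-up pool of 100 compounds, each described by a single integer score, where compounds scoring 90 or above are "active". All names (`is_active`, `fit_threshold`, `active_subsample`) are hypothetical.

```python
def is_active(score):
    """Stand-in for an experiment: returns the true label (assumed rule)."""
    return 1 if score >= 90 else 0

def fit_threshold(labeled):
    """Toy model: boundary halfway between the highest known inactive
    score and the lowest known active score."""
    highest_inactive = max(x for x, y in labeled if y == 0)
    lowest_active = min(x for x, y in labeled if y == 1)
    return (highest_inactive + lowest_active) / 2

def active_subsample(pool, oracle, seed_points, budget):
    """Repeatedly ask for the label of the compound the model is least
    sure about, i.e. the one closest to its current decision boundary."""
    labeled = [(x, oracle(x)) for x in seed_points]
    unlabeled = [x for x in pool if x not in seed_points]
    for _ in range(budget):
        boundary = fit_threshold(labeled)
        query = min(unlabeled, key=lambda x: abs(x - boundary))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))  # the "question" being asked
    return labeled, fit_threshold(labeled)

pool = list(range(100))  # 90 inactive and 10 active compounds
# Seed with one known inactive and one known active compound.
labeled, threshold = active_subsample(pool, is_active, seed_points=[0, 99], budget=6)
correct = sum((x > threshold) == bool(is_active(x)) for x in pool)
print(f"Labels used: {len(labeled)}, boundary: {threshold}, correct: {correct}/{len(pool)}")
```

In this toy setup the model classifies the whole pool correctly after seeing only 8 of the 100 labels, because each question is aimed at the region where it is most uncertain. Real molecular data is far messier, so this only conveys the intuition.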
Active Machine Learning in Drug Development
The Duke team was the first to apply active machine learning to existing datasets, particularly in the context of molecular biology and drug development.
The team demonstrated that active subsampling identified and predicted molecular characteristics more accurately than standard subsampling strategies. In some cases, a model trained this way was even 139% more effective than an algorithm trained on the entire dataset. Surprisingly, the ideal amount of data turned out to be much less than expected, sometimes just 10% of what was available.
The team plans to continue examining this data inflection point and use this novel approach to identify new molecules for potential therapeutic targets. The new approach could reduce data storage needs and costs, making machine learning more accessible, reproducible, and powerful.