RESEARCH & DISCOVERIES
Current Areas of Study
(Left) Predicted vs. experimental flash point values on the full integrated-distribution test set for a deep learning method. (Middle) Xiaoyu Sun and (Right) Nathaniel J. Krakauer of Skunkworks, co-lead authors of the paper we wrote on this work.
Machine Learning for Flash Points of Organic Molecules
The goal of this work was to explore the efficacy of deep learning compared to more traditional human-designed featurization for predicting organic molecule flash points.
Flash points of organic molecules play an important role in preventing flammability hazards. Large databases of measured values exist, but millions of compounds remain unmeasured. To extend existing data to new compounds rapidly, many researchers have used quantitative structure-property relationship (QSPR) analysis to predict flash points. In recent years, graph-based deep learning (GBDL) has emerged as a powerful alternative to traditional QSPR. In this paper we assess GBDL models by comparing them against 12 previous QSPR studies that used more traditional methods. Our results show that GBDL performs slightly worse than, but comparably to, the previous QSPR studies. To explore GBDL models further, we collected the largest flash point dataset to date, which contains 10575 unique molecules. Overall, our results showed that deep learning was not clearly advantageous for this problem. This project involved 9 Skunkworks researchers over a few years.
The research has been published in: Sun, Xiaoyu, Nathaniel J. Krakauer, Alexander Politowicz, Wei Ting Chen, Qiying Li, Zuoyi Li, Xianjia Shao, et al. 2020. “Assessing Graph-Based Deep Learning Models for Predicting Flash Point.” Molecular Informatics 39 (6): 1–14. https://doi.org/10.1002/minf.201900101.