J. Chem. Inf. Model. 2025, 65 (14), 7415–7425
DOI: 10.1021/acs.jcim.5c00514
Accurate retention time (RT) prediction models can significantly improve the analysis of liquid chromatography–mass spectrometry (LC-MS) data, which is widely used in chemical synthesis. Because hundreds of thousands of syntheses are performed annually at Enamine, a large body of internal experimental data has accumulated. In this paper, we present the development of an RT prediction model based on the GATv2Conv + DL graph neural network (NN) architecture, trained on the internal data and further evaluated on the METLIN SMRT data set. The final model achieved a mean absolute error (MAE) of 2.48 s for the 120 s LC-MS method. We also conducted a detailed analysis of RT prediction errors and found that the interval from RT – 7.12 s to RT + 9.58 s contains over 95% of the data points. The developed model has been successfully integrated into the existing in-house LC-MS analysis toolkit, extending its predictive and analytical capabilities. Additionally, we have published a curated subset of 20,000 data points from our internal data set to support community benchmarking and further research.
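The abstract reports two error statistics: an MAE and an asymmetric interval around RT that covers over 95% of the data. It does not say how the interval bounds were chosen. The following is a minimal Python sketch of how such statistics could be computed. It assumes the signed error is defined as observed minus predicted RT, and that the interval is the central empirical percentile range of those errors with linear interpolation between ranks. The function names `mae`, `error_interval`, and `fraction_within` are hypothetical illustrations, not the authors' code.

```python
import math


def mae(observed, predicted):
    """Mean absolute error between observed and predicted RTs (same units, e.g. seconds)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def _percentile(sorted_vals, q):
    """Linear-interpolation percentile for q in [0, 1] on pre-sorted values."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def error_interval(observed, predicted, coverage=0.95):
    """Central empirical interval (lo, hi) of signed errors observed - predicted.

    Assumption (hypothetical): bounds are the (1 - coverage)/2 and
    (1 + coverage)/2 percentiles, so the interval may be asymmetric around 0.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = sorted(o - p for o, p in zip(observed, predicted))
    alpha = (1 - coverage) / 2
    return _percentile(errors, alpha), _percentile(errors, 1 - alpha)


def fraction_within(observed, predicted, lo, hi):
    """Share of points whose signed error (observed - predicted) lies in [lo, hi]."""
    errors = [o - p for o, p in zip(observed, predicted)]
    return sum(lo <= e <= hi for e in errors) / len(errors)
```

Applied to a held-out set, `error_interval(observed, predicted)` returns lower and upper offsets whose signs and magnitudes need not match, as with the −7.12 s / +9.58 s bounds above. `fraction_within` then checks the coverage actually achieved. A percentile interval is used here instead of a symmetric ±k·σ band because it makes no normality assumption about the error distribution.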