Accurately predicting solubility curves via a thermodynamic cycle, machine learning, and solvent ensembles

Authors: Al Ibrahim, Emad; Morgan, Nathan; Müller, Simon; Motati, Saikiran; Green, William
Type: Journal Article
Journal: Journal of the American Chemical Society 147 (49): 45057-45069 (2025)
Publisher: American Chemical Society (ACS)
ISSN: 1520-5126
DOI: 10.1021/jacs.5c13746
Handle: https://hdl.handle.net/11420/60267
Issued: 2025-11-21
Available in repository: 2025-12-16
Language: English
Subject: Natural Sciences and Mathematics::540: Chemistry
Keywords: Activity coefficient; Solubility; Solution chemistry; Solvents; Thermodynamic properties

Abstract:
Determining solubilities of organic molecules is critical in various fields such as pharmaceuticals, agrochemicals, and environmental science. Knowing how a solute will dissolve in different solvents and at different temperatures is essential for drug formulation, synthesis, purification, and crystallization. Hard-to-estimate solubility limits currently hinder the design of new processes, making innovation more expensive. We propose a fast and general method for predicting the solubilities of neutral organic molecules in a wide range of solvents and temperatures. Our method uses a thermodynamic fusion cycle to combine machine learning predictions of the activity coefficient, fusion enthalpy, and melting point temperature. This method was tested on a combined data set with more than 100,000 experimental solubility values, showing better or comparable performance to competing methods on many solubility benchmarks even at elevated temperatures. We also introduce reference ensembling to leverage all available experimental solubilities for a given solute in estimating its solubility in a different solvent. Reference ensembling is also shown to enhance the robustness of models trained directly on solubility data.
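The fusion cycle and reference ensembling named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the textbook form of the cycle without heat-capacity corrections, ln x = -ln γ - (ΔH_fus/R)(1/T - 1/T_m), and takes ln γ, ΔH_fus and T_m as plain numbers where the paper uses machine-learning predictions. The helper `ln_solubility_from_references` is a hypothetical reading of reference ensembling: for one solute at one temperature the fusion term does not depend on the solvent, so each measured reference solubility gives an estimate ln x_target = ln x_ref + ln γ_ref - ln γ_target, and these estimates are averaged in log space. All function names and numbers here are illustrative assumptions.

```python
import math
from statistics import fmean

R = 8.314462618  # molar gas constant, J/(mol K)


def ln_solubility(ln_gamma: float, dh_fus: float, t_melt: float, temp: float) -> float:
    """ln of the solute mole-fraction solubility from a simplified fusion cycle.

    Assumed textbook form, heat-capacity corrections neglected:
        ln x = -ln(gamma) - (dH_fus / R) * (1/T - 1/T_m)
    dh_fus in J/mol; t_melt and temp in K; ln_gamma is dimensionless.
    """
    ln_x_ideal = -(dh_fus / R) * (1.0 / temp - 1.0 / t_melt)  # ideal-solution term
    return ln_x_ideal - ln_gamma  # non-ideality shifts the ideal value by -ln(gamma)


def ln_solubility_from_references(ln_gamma_target: float,
                                  references: list[tuple[float, float]]) -> float:
    """Hypothetical reference-ensembling sketch (same solute, same temperature).

    Each reference is (measured ln x in a reference solvent, predicted ln gamma
    in that solvent). The fusion term depends only on the solute and T, so it
    cancels: ln x_target = ln x_ref + ln gamma_ref - ln gamma_target.
    The per-reference estimates are averaged in log space (an assumption).
    """
    if not references:
        raise ValueError("need at least one reference measurement")
    estimates = [ln_x_ref + ln_g_ref - ln_gamma_target
                 for ln_x_ref, ln_g_ref in references]
    return fmean(estimates)


# Illustrative numbers only (not from the paper): a solute with
# dH_fus = 25 kJ/mol and T_m = 400 K, evaluated at 298.15 K.
x_pred = math.exp(ln_solubility(ln_gamma=0.5, dh_fus=25_000.0, t_melt=400.0, temp=298.15))
x_ens = math.exp(ln_solubility_from_references(0.5, [(-3.2, 0.7), (-2.9, 0.4)]))
print(f"fusion cycle: x = {x_pred:.4f}; reference ensemble: x = {x_ens:.4f}")
```

Averaging in log space is only one simple way to combine references; a weighted mean or a median would fit the same assumptions.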