DeepRetro discovers retrosynthetic pathways through iterative large language model reasoning
S. V. Sathyanarayana, S. D. Hiremath, R. K. Shah, and 6 more authors
Scientific Reports, 2026
Synthesizing complex natural products is a grand challenge in organic chemistry. We present DeepRetro, a significant advancement in computational retrosynthesis that discovers viable synthetic routes for molecules previously considered too complex for automated methods. DeepRetro is a novel, open-source framework that tightly integrates large language models (LLMs), traditional retrosynthetic engines, and expert human feedback into an iterative design loop. Unlike prior approaches that rely on either template-based methods or unconstrained LLMs, our hybrid system combines the precision of templates with the generative flexibility of LLMs, governed by rigorous chemical validity checks and recursive refinement. This system dynamically explores and revises synthetic pathways, guided by algorithmic checks and expert input through an interactive interface. While DeepRetro shows strong performance on standard benchmarks, its main strength is its ability to propose novel, viable pathways for highly complex natural products. Through case studies, we demonstrate how this approach facilitates new total synthesis routes and enhances human-machine collaboration. DeepRetro serves as a working model for applying LLMs to scientific discovery, and we release it as an open-source tool to accelerate progress in drug discovery and materials design.
@article{deepretro2026,title={{DeepRetro} discovers retrosynthetic pathways through iterative large language model reasoning},volume={16},issn={2045-2322},url={https://doi.org/10.1038/s41598-026-38821-z},doi={10.1038/s41598-026-38821-z},number={1},journal={Scientific Reports},author={Sathyanarayana, Shreyas Vinaya and Hiremath, Sharanabasava D. and Shah, Rahil Kirankumar and Panda, Rishikesh and Jana, Rahul and Singh, Riya and Irfan, Rida and Murali, Ashwin and Ramsundar, Bharath},month=feb,year={2026},pages={8448},}
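The abstract describes a propose-validate-recurse control flow: a generator (LLM or template engine) suggests precursors, chemical validity checks filter out hallucinated structures, and valid precursors are expanded recursively until purchasable building blocks are reached. The sketch below illustrates only that control flow, not DeepRetro's actual API; the stock catalog and the proposal step (`STOCK`, `PROPOSALS`) are toy stand-ins hardcoded for a one-step aspirin disconnection so the example runs end to end.

```python
# Minimal sketch of a propose -> validate -> recurse retrosynthesis loop.
# NOT DeepRetro's API: the LLM call and stock lookup are toy stand-ins.
from rdkit import Chem  # used only for the chemical validity check

# Toy stand-in for a building-block catalog: salicylic acid, acetic anhydride.
STOCK = {"O=C(O)c1ccccc1O", "CC(=O)OC(C)=O"}

# Toy stand-in for the LLM/template proposal step: one known aspirin disconnection.
PROPOSALS = {
    "CC(=O)Oc1ccccc1C(=O)O": [["O=C(O)c1ccccc1O", "CC(=O)OC(C)=O"]],
}

def is_valid(smiles: str) -> bool:
    """Validity gate: reject SMILES strings RDKit cannot parse."""
    return Chem.MolFromSmiles(smiles) is not None

def retrosynthesize(target: str, depth: int = 0, max_depth: int = 5):
    """Recursively expand `target` until every leaf is in stock,
    discarding chemically invalid proposals along the way."""
    if target in STOCK or depth >= max_depth:
        return target  # purchasable material (or depth limit reached)
    for precursors in PROPOSALS.get(target, []):
        if all(is_valid(p) for p in precursors):  # validity check
            return {target: [retrosynthesize(p, depth + 1, max_depth)
                             for p in precursors]}
    return target  # no valid disconnection found; leave as a leaf

print(retrosynthesize("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin -> two stock precursors
```

In the full system this loop is additionally steered by expert feedback through the interactive interface; here the recursion terminates purely on the stock check and depth bound.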
RSC DD
Selected work
ChemBERTa-3: An Open Source Training Framework for Chemical Foundation Models
R. Singh, A. A. Barsainyan, R. Irfan, and 15 more authors
Royal Society of Chemistry’s Digital Discovery, 2026
The rapid advancement of machine learning in computational chemistry has opened new doors for designing molecules, predicting molecular properties, and discovering novel materials. However, building scalable and robust models for molecular machine learning remains a significant challenge due to the vast size and complexity of chemical space. Recent advances in chemical foundation models hold considerable promise for addressing these challenges, but such models remain difficult to train and are often fully or partially proprietary. For this reason, we introduce ChemBERTa-3, an open source training and benchmarking framework designed to train and fine-tune large-scale chemical foundation models. ChemBERTa-3 provides: (i) unified, reproducible infrastructure for model pretraining and fine-tuning, (ii) systematic benchmarking tooling to evaluate proposed chemical foundation model architectures on tasks from the MoleculeNet suite, and (iii) fully open release of model weights, training configurations, and deployment workflows. Our experiments demonstrate that although both graph-based and transformer-based architectures perform well at small scale, transformer-based models are considerably easier to scale. We also discuss how to overcome the numerous challenges that arise when attempting to reproducibly construct large chemical foundation models, ranging from subtle benchmarking issues to training instabilities. We test the ChemBERTa-3 infrastructure in both an AWS-based Ray deployment and an on-premise high-performance computing cluster to verify the reproducibility of the framework and results. We anticipate that ChemBERTa-3 will serve as a foundational building block for next-generation chemical foundation models and for the broader project of creating open source LLMs for scientific applications. In support of reproducible and extensible science, we have open sourced all ChemBERTa-3 models and our Ray cluster configurations.
@article{chemberta3_2026,doi={10.1039/D5DD00348B},title={ChemBERTa-3: An Open Source Training Framework for Chemical Foundation Models},author={Singh, R. and Barsainyan, A. A. and Irfan, R. and Amorin, C. J. and He, S. and Davis, T. and Thiagarajan, A. and Sankaran, S. and Chithrananda, S. and Ahmad, W. and Jones, D. and McLoughlin, K. and Kim, H. and Bhutani, A. and Sathyanarayana, S. V. and Viswanathan, V. and Allen, J. E. and Ramsundar, B.},journal={Royal Society of Chemistry's Digital Discovery},year={2026},}
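The framework's basic recipe (pretrain or fine-tune a transformer on SMILES, then benchmark on MoleculeNet tasks) can be sketched with standard tooling. The sketch below fine-tunes on the MoleculeNet ESOL/Delaney regression task with a scaffold split; the Hugging Face checkpoint name is an assumption (an earlier ChemBERTa-2 release standing in for whichever ChemBERTa-3 weights are used), and this is the general recipe, not the ChemBERTa-3 codebase itself.

```python
# Minimal sketch: fine-tune a ChemBERTa-style checkpoint on MoleculeNet ESOL.
# The checkpoint name below is an assumed stand-in; swap in ChemBERTa-3 weights.
import deepchem as dc
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# MoleculeNet task with the scaffold split used for benchmarking.
# Dataset `ids` hold the SMILES strings; `y` holds solubility labels.
_, (train, valid, test), _ = dc.molnet.load_delaney(splitter="scaffold")
smiles = list(train.ids)
y = torch.tensor(train.y, dtype=torch.float32)

ckpt = "DeepChem/ChemBERTa-77M-MLM"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(
    ckpt, num_labels=1, problem_type="regression")

# Tokenize one mini-batch of SMILES and take a few optimizer steps on it,
# purely to illustrate the fine-tuning loop.
batch = tokenizer(smiles[:32], padding=True, truncation=True,
                  return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

model.train()
for _ in range(3):
    out = model(**batch, labels=y[:32])  # MSE loss for regression
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"MSE loss: {out.loss.item():.4f}")
```

A full benchmarking run would iterate over all MoleculeNet tasks and evaluate on the held-out scaffold-split test set rather than a single mini-batch.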
Conference papers
Full conference publications in applied machine learning and scientific computing.
2022
IEEE I2CT
Selected work
Speech to Equation Conversion using a PoE Tagger
Peeta Basa Pati and V Shreyas
In 2022 IEEE 7th International Conference for Convergence in Technology (I2CT), 2022
@inproceedings{PoETagger,author={Basa Pati, Peeta and Shreyas, V},booktitle={2022 IEEE 7th International Conference for Convergence in Technology (I2CT)},title={Speech to Equation Conversion using a PoE Tagger},year={2022},pages={1-4},doi={10.1109/I2CT54291.2022.9824252},url={https://ieeexplore.ieee.org/document/9824252}}
Workshop papers
Research prototypes, open-source systems, and workshop contributions around chemistry and scientific AI.
2025
ICML WS
DeepChem-Variant: A Modular Open Source Framework for Genomic Variant Calling
A. V. Bisoi, V Shreyas, J. Siguenza, and 1 more author
In Championing Open-source DEvelopment in ML Workshop at ICML 2025, 2025
@inproceedings{deepchem_variant2025,title={DeepChem-Variant: A Modular Open Source Framework for Genomic Variant Calling},author={Bisoi, A. V. and Shreyas, V and Siguenza, J. and Ramsundar, B.},booktitle={Championing Open-source DEvelopment in ML Workshop at ICML 2025},year={2025},url={https://openreview.net/forum?id=jWZV6hWCeX}}
AAAI WS
Open-source Polymer Generative Pipeline
D. Mohanty, V Shreyas, A. Palai, and 1 more author
In 4th Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE), 2025
@inproceedings{polymer_pipeline2025,title={Open-source Polymer Generative Pipeline},author={Mohanty, D. and Shreyas, V and Palai, A. and Ramsundar, B.},booktitle={4th Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE)},year={2025},}
NeurIPS WS
Multi-Turn Human-LLM Interaction Through the Lens of a Two-Way Intelligibility Protocol
H. Mestha, K. Bania, V Shreyas, and 2 more authors
In MTI-LLM Workshop at NeurIPS 2025, 2025
@inproceedings{mti_llm2025,title={Multi-Turn Human-LLM Interaction Through the Lens of a Two-Way Intelligibility Protocol},author={Mestha, H. and Bania, K. and Shreyas, V and Liu, S. and Srinivasan, A.},booktitle={MTI-LLM Workshop at NeurIPS 2025},year={2025},}
2024
NeurIPS WS
Selected work
Open Source Molecular Processing Pipeline for Generating Molecules
V Shreyas, J. Siguenza, K. Bania, and 1 more author
In Machine Learning and the Physical Sciences Workshop at NeurIPS 2024, 2024
@inproceedings{openMolGen2024,title={Open Source Molecular Processing Pipeline for Generating Molecules},author={Shreyas, V and Siguenza, Jose and Bania, Karan and Ramsundar, Bharath},booktitle={Machine Learning and the Physical Sciences Workshop at NeurIPS 2024},year={2024},archiveprefix={arXiv},primaryclass={cs.LG},doi={10.48550/arXiv.2408.06261},url={https://arxiv.org/abs/2408.06261},note={Also presented at MoML 2024 and Baylearn 2024}}
AAAI WS
Selected work
Predicting ATP binding sites in protein sequences using Deep Learning and Natural Language Processing
V Shreyas and Swati Agarwal
In 3rd Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE), 2024
@inproceedings{ATP-binding2024,title={Predicting ATP binding sites in protein sequences using Deep Learning and Natural Language Processing},author={Shreyas, V and Agarwal, Swati},booktitle={3rd Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE)},year={2024},archiveprefix={arXiv},primaryclass={q-bio.BM},doi={10.48550/arXiv.2402.01829},url={https://arxiv.org/abs/2402.01829}}
Preprints
Early-stage work that is publicly available and still evolving.
CountCLIP – [Re] Teaching CLIP to Count to Ten
H. Mestha, T. Agrawal, K. Bania, and 2 more authors
arXiv preprint, 2024
@misc{CountCLIP2024,title={CountCLIP -- [Re] Teaching CLIP to Count to Ten},author={Mestha, Harshvardhan and Agrawal, Tejas and Bania, Karan and Shreyas, V and Bhisikar, Yash},year={2024},archiveprefix={arXiv},primaryclass={cs.CV},doi={10.48550/arXiv.2406.03586},url={https://arxiv.org/abs/2406.03586}}