DeepRetro discovers retrosynthetic pathways through iterative large language model reasoning
S. V. Sathyanarayana, S. D. Hiremath, R. K. Shah, and 6 more authors
Scientific Reports, 2026
Synthesizing complex natural products is a grand challenge in organic chemistry. We present DeepRetro, a significant advancement in computational retrosynthesis that discovers viable synthetic routes for molecules previously considered too complex for automated methods. DeepRetro is a novel, open-source framework that tightly integrates large language models (LLMs), traditional retrosynthetic engines, and expert human feedback into an iterative design loop. Unlike prior approaches that rely on either template-based methods or unconstrained LLMs, our hybrid system combines the precision of templates with the generative flexibility of LLMs, governed by rigorous chemical validity checks and recursive refinement. This system dynamically explores and revises synthetic pathways, guided by algorithmic checks and expert input through an interactive interface. While DeepRetro shows strong performance on standard benchmarks, its main strength is its ability to propose novel, viable pathways for highly complex natural products. Through case studies, we demonstrate how this approach facilitates new total synthesis routes and enhances human-machine collaboration. DeepRetro serves as a working model for applying LLMs to scientific discovery, and we release it as an open-source tool to accelerate progress in drug discovery and materials design.
@article{deepretro2026,title={{DeepRetro} discovers retrosynthetic pathways through iterative large language model reasoning},volume={16},issn={2045-2322},url={https://doi.org/10.1038/s41598-026-38821-z},doi={10.1038/s41598-026-38821-z},number={1},journal={Scientific Reports},author={Sathyanarayana, Shreyas Vinaya and Hiremath, Sharanabasava D. and Shah, Rahil Kirankumar and Panda, Rishikesh and Jana, Rahul and Singh, Riya and Irfan, Rida and Murali, Ashwin and Ramsundar, Bharath},month=feb,year={2026},pages={8448},}
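The abstract describes a propose-validate-recurse control flow: a generator (LLM or template engine) suggests precursors, chemical validity checks filter out hallucinated structures, and valid precursors are expanded recursively until purchasable building blocks are reached. The sketch below illustrates only that control flow, not DeepRetro's actual API; the stock catalog and the proposal step (`STOCK`, `PROPOSALS`) are toy stand-ins hardcoded for a one-step aspirin disconnection so the example runs end to end.

```python
# Minimal sketch of a propose -> validate -> recurse retrosynthesis loop.
# NOT DeepRetro's API: the LLM call and stock lookup are toy stand-ins.
from rdkit import Chem  # used only for the chemical validity check

# Toy stand-in for a building-block catalog: salicylic acid, acetic anhydride.
STOCK = {"O=C(O)c1ccccc1O", "CC(=O)OC(C)=O"}

# Toy stand-in for the LLM/template proposal step: one known aspirin disconnection.
PROPOSALS = {
    "CC(=O)Oc1ccccc1C(=O)O": [["O=C(O)c1ccccc1O", "CC(=O)OC(C)=O"]],
}

def is_valid(smiles: str) -> bool:
    """Validity gate: reject SMILES strings RDKit cannot parse."""
    return Chem.MolFromSmiles(smiles) is not None

def retrosynthesize(target: str, depth: int = 0, max_depth: int = 5):
    """Recursively expand `target` until every leaf is in stock,
    discarding chemically invalid proposals along the way."""
    if target in STOCK or depth >= max_depth:
        return target  # purchasable material (or depth limit reached)
    for precursors in PROPOSALS.get(target, []):
        if all(is_valid(p) for p in precursors):  # validity check
            return {target: [retrosynthesize(p, depth + 1, max_depth)
                             for p in precursors]}
    return target  # no valid disconnection found; leave as a leaf

print(retrosynthesize("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin -> two stock precursors
```

In the full system this loop is additionally steered by expert feedback through the interactive interface; here the recursion terminates purely on the stock check and depth bound.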
RSC DD
Selected work
ChemBERTa-3: An Open Source Training Framework for Chemical Foundation Models
R. Singh, A. A. Barsainyan, R. Irfan, and 15 more authors
Royal Society of Chemistry’s Digital Discovery, 2026
The rapid advancement of machine learning in computational chemistry has opened new doors for designing molecules, predicting molecular properties, and discovering novel materials. However, building scalable and robust models for molecular machine learning remains a significant challenge due to the vast size and complexity of chemical space. Recent advances in chemical foundation models hold considerable promise for addressing these challenges, but such models remain difficult to train and are often fully or partially proprietary. For this reason, we introduce ChemBERTa-3, an open source training and benchmarking framework designed to train and fine-tune large-scale chemical foundation models. ChemBERTa-3 provides: (i) unified, reproducible infrastructure for model pretraining and fine-tuning, (ii) systematic benchmarking tooling to evaluate proposed chemical foundation model architectures on tasks from the MoleculeNet suite, and (iii) fully open release of model weights, training configurations, and deployment workflows. Our experiments demonstrate that although both graph-based and transformer-based architectures perform well at small scale, transformer-based models are considerably easier to scale. We also discuss how to overcome the numerous challenges that arise when attempting to reproducibly construct large chemical foundation models, ranging from subtle benchmarking issues to training instabilities. We test the ChemBERTa-3 infrastructure in both an AWS-based Ray deployment and an on-premise high-performance computing cluster to verify the reproducibility of the framework and results. We anticipate that ChemBERTa-3 will serve as a foundational building block for next-generation chemical foundation models and for the broader project of creating open source LLMs for scientific applications. In support of reproducible and extensible science, we have open sourced all ChemBERTa-3 models and our Ray cluster configurations.
@article{chemberta3_2026,doi={10.1039/D5DD00348B},title={ChemBERTa-3: An Open Source Training Framework for Chemical Foundation Models},author={Singh, R. and Barsainyan, A. A. and Irfan, R. and Amorin, C. J. and He, S. and Davis, T. and Thiagarajan, A. and Sankaran, S. and Chithrananda, S. and Ahmad, W. and Jones, D. and McLoughlin, K. and Kim, H. and Bhutani, A. and Sathyanarayana, S. V. and Viswanathan, V. and Allen, J. E. and Ramsundar, B.},journal={Royal Society of Chemistry's Digital Discovery},year={2026},}
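The framework's basic recipe (pretrain or fine-tune a transformer on SMILES, then benchmark on MoleculeNet tasks) can be sketched with standard tooling. The sketch below fine-tunes on the MoleculeNet ESOL/Delaney regression task with a scaffold split; the Hugging Face checkpoint name is an assumption (an earlier ChemBERTa-2 release standing in for whichever ChemBERTa-3 weights are used), and this is the general recipe, not the ChemBERTa-3 codebase itself.

```python
# Minimal sketch: fine-tune a ChemBERTa-style checkpoint on MoleculeNet ESOL.
# The checkpoint name below is an assumed stand-in; swap in ChemBERTa-3 weights.
import deepchem as dc
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# MoleculeNet task with the scaffold split used for benchmarking.
# Dataset `ids` hold the SMILES strings; `y` holds solubility labels.
_, (train, valid, test), _ = dc.molnet.load_delaney(splitter="scaffold")
smiles = list(train.ids)
y = torch.tensor(train.y, dtype=torch.float32)

ckpt = "DeepChem/ChemBERTa-77M-MLM"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(
    ckpt, num_labels=1, problem_type="regression")

# Tokenize one mini-batch of SMILES and take a few optimizer steps on it,
# purely to illustrate the fine-tuning loop.
batch = tokenizer(smiles[:32], padding=True, truncation=True,
                  return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

model.train()
for _ in range(3):
    out = model(**batch, labels=y[:32])  # MSE loss for regression
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"MSE loss: {out.loss.item():.4f}")
```

A full benchmarking run would iterate over all MoleculeNet tasks and evaluate on the held-out scaffold-split test set rather than a single mini-batch.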
Conference papers
Full conference publications in applied machine learning and scientific computing.
2022
IEEE I2CT
Selected work
Speech to Equation Conversion using a PoE Tagger
Peeta Basa Pati and V Shreyas
In 2022 IEEE 7th International Conference for Convergence in Technology (I2CT), 2022
@inproceedings{PoETagger,author={Basa Pati, Peeta and Shreyas, V},booktitle={2022 IEEE 7th International Conference for Convergence in Technology (I2CT)},title={Speech to Equation Conversion using a PoE Tagger},year={2022},pages={1-4},doi={10.1109/I2CT54291.2022.9824252},url={https://ieeexplore.ieee.org/document/9824252}}
Workshop papers
Research prototypes, open-source systems, and workshop contributions around chemistry and scientific AI.
2025
ICML WS
DeepChem-Variant: A Modular Open Source Framework for Genomic Variant Calling
A. V. Bisoi, V Shreyas, J. Siguenza, and 1 more author
In Championing Open-source DEvelopment in ML Workshop at ICML 2025, 2025
@inproceedings{deepchem_variant2025,title={DeepChem-Variant: A Modular Open Source Framework for Genomic Variant Calling},author={Bisoi, A. V. and Shreyas, V and Siguenza, J. and Ramsundar, B.},booktitle={Championing Open-source DEvelopment in ML Workshop at ICML 2025},year={2025},url={https://openreview.net/forum?id=jWZV6hWCeX}}
AAAI WS
Open-source Polymer Generative Pipeline
D. Mohanty, V Shreyas, A. Palai, and 1 more author
In 4th Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE), 2025
@inproceedings{polymer_pipeline2025,title={Open-source Polymer Generative Pipeline},author={Mohanty, D. and Shreyas, V and Palai, A. and Ramsundar, B.},booktitle={4th Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE)},year={2025},}
NeurIPS WS
Multi-Turn Human-LLM Interaction Through the Lens of a Two-Way Intelligibility Protocol
H. Mestha, K. Bania, V Shreyas, and 2 more authors
In MTI-LLM Workshop at NeurIPS 2025, 2025
@inproceedings{mti_llm2025,title={Multi-Turn Human-LLM Interaction Through the Lens of a Two-Way Intelligibility Protocol},author={Mestha, H. and Bania, K. and Shreyas, V and Liu, S. and Srinivasan, A.},booktitle={MTI-LLM Workshop at NeurIPS 2025},year={2025},}
2024
NeurIPS WS
Selected work
Open Source Molecular Processing Pipeline for Generating Molecules
V Shreyas, J. Siguenza, K. Bania, and 1 more author
In Machine Learning and the Physical Sciences Workshop at NeurIPS 2024, 2024
@inproceedings{openMolGen2024,title={Open Source Molecular Processing Pipeline for Generating Molecules},author={Shreyas, V and Siguenza, Jose and Bania, Karan and Ramsundar, Bharath},booktitle={Machine Learning and the Physical Sciences Workshop at NeurIPS 2024},year={2024},archiveprefix={arXiv},primaryclass={cs.LG},doi={10.48550/arXiv.2408.06261},url={https://arxiv.org/abs/2408.06261},note={Also presented at MoML 2024 and Baylearn 2024}}
AAAI WS
Selected work
Predicting ATP binding sites in protein sequences using Deep Learning and Natural Language Processing
V Shreyas and Swati Agarwal
In 3rd Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE), 2024
@inproceedings{ATP-binding2024,title={Predicting ATP binding sites in protein sequences using Deep Learning and Natural Language Processing},author={Shreyas, V and Agarwal, Swati},booktitle={3rd Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE)},year={2024},archiveprefix={arXiv},primaryclass={q-bio.BM},doi={10.48550/arXiv.2402.01829},url={https://arxiv.org/abs/2402.01829}}
Preprints
Early-stage work that is publicly available and still evolving.
CountCLIP – [Re] Teaching CLIP to Count to Ten
H. Mestha, T. Agrawal, K. Bania, and 2 more authors
arXiv preprint, 2024
@misc{CountCLIP2024,title={CountCLIP -- [Re] Teaching CLIP to Count to Ten},author={Mestha, Harshvardhan and Agrawal, Tejas and Bania, Karan and Shreyas, V and Bhisikar, Yash},year={2024},archiveprefix={arXiv},primaryclass={cs.CV},doi={10.48550/arXiv.2406.03586},url={https://arxiv.org/abs/2406.03586}}