AI-MI Seminar Series: Literature-Informed Agents for Drug Discovery

March 26 @ 3:00 pm - 4:00 pm

Join us to explore how researchers are using AI to unlock chemical and biological knowledge hidden in scientific literature. The talk will highlight new approaches for automatically extracting data from papers and patents to build large-scale datasets that power molecule and protein design—advancing drug discovery, molecular property prediction, and AI-driven therapeutic development.

Watch live at youtube.com/@AIMaterialsInstitute

Topic: AI-driven therapeutic design within chemistry and biochemistry has become one of the most exciting areas of growth for the field, with promising successes in protein folding, antibody design, de novo molecule and protein design, antibiotic discovery, and many others. Much of this success relies on a myriad of manually curated repositories that provide data on the structure, function, and biological activity of proteins, small molecules, genetic variants, and other biological entities of interest. Resources like ChEMBL are laboriously constructed by human scientists combing through tens of thousands of papers. However, despite this wealth of curated data, the majority of our knowledge of chemistry, biology, and medicine remains “locked” in natural-language text found in publications, patents, and other articles. In this talk, I’ll discuss an effort we are undertaking to automatically construct large-scale resources like these using AI, in a handful of days and for under $100. We are working on extracting rich data for multimodal supervision in a systematic and automated way. Our results so far achieve excellent performance, both supervised and zero-shot, on an enormous variety of molecule property prediction tasks: even small 15-million-parameter models pretrained on our large dataset outperform much larger 2B+ parameter models released by Google. We’ll also show how these literature tools can be incorporated into a very simple agentic molecular design pipeline that significantly outperforms existing molecular design tools, including Bayesian optimization, AlphaEvolve, guided diffusion, and other LLM-based and agentic design baselines.

Speaker: Jake Gardner is an Assistant Professor at the University of Pennsylvania in the Computer and Information Science department. His lab does research spanning both the practice and theory of probabilistic machine learning. He’s particularly interested in how techniques like generative modelling and Bayesian optimization can be used to solve challenging design and optimization problems in the natural sciences, like discovering new and more efficient antibiotics, vaccines, antibodies, materials, and more. Before he joined Penn, he was a Research Scientist at Uber AI Labs. Before that, he was a Postdoctoral Associate in Operations Research and Information Engineering at Cornell University. He received his Ph.D. in Computer Science from Cornell University, where he was advised by Kilian Weinberger.