The first study out of the NSF Artificial Intelligence Materials Institute — led by AI-MI Director Eun-Ah Kim, the Hans A. Bethe Professor of Physics at Cornell, with collaborators at Google — puts large language models to the test as scientific “world models.”
A panel of 12 human experts evaluated six leading LLM systems on their grasp of the high-temperature superconductivity literature. The models proved strong at extracting text but “totally incapable” of engaging with data visualizations, and the study lays out a concrete wish list for improving scientific AI.
The work was supported by the NSF Artificial Intelligence Materials Institute (award no. 2433348). The study appears in PNAS, and the Cornell Chronicle has covered it.

