Researchers from Nanyang Technological University and their collaborators have used Chat-GPT to streamline text parsing for solid-state synthesis, focusing on ternary chalcogenides. The approach aims to optimize the synthesis of high-quality crystalline materials, which are pivotal for advancing thermoelectric devices. The study was led by Dr. Kedar Hippalgaonkar of Nanyang Technological University, with contributions from Dr. Maung Thway, Mr. Andre Low, Dr. Haiwen Dai, Dr. Jose Recatala-Gomez, and Dr. Andy Chen, also of Nanyang Technological University, and Mr. Samyak Khetan of the Indian Institute of Technology Bombay. It was published in the journal Digital Discovery.

Solid-state synthesis is a critical method for discovering new inorganic materials, particularly those used in thermoelectric applications, which convert heat into electricity. Traditional approaches to data-driven synthesis require meticulous manual extraction and cleaning of synthesis recipes from vast bodies of text. This process is not only time-consuming but also presents a high barrier to entry, especially for materials with sparse literature.

To address these challenges, the team proposed using large language models (LLMs) such as GPT-3.5, available within Chat-GPT, to parse synthesis recipes and capture the essential synthesis information intuitively, in terms of primary and secondary heating peaks. After building a dataset curated by domain experts (the Gold Standard), they engineered a prompt set with which Chat-GPT closely replicated this dataset (the Silver Standard).
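One simple way to quantify how closely a parsed dataset matches an expert-curated one is a field-by-field agreement score. The sketch below is illustrative only: the function name, the field names, and the numbers are hypothetical and are not taken from the paper, which may have scored agreement differently.

```python
def field_accuracy(gold, silver, tolerance=0.0):
    """Fraction of gold-standard fields reproduced by the parsed records.

    gold and silver are parallel lists of dicts (one dict per recipe).
    Numeric fields match when they differ by at most `tolerance`;
    all other fields must be equal.
    """
    matched = 0
    total = 0
    for gold_record, silver_record in zip(gold, silver):
        for key, expected in gold_record.items():
            total += 1
            got = silver_record.get(key)
            both_numeric = isinstance(expected, (int, float)) and isinstance(got, (int, float))
            if both_numeric:
                matched += abs(expected - got) <= tolerance
            else:
                matched += expected == got
    return matched / total if total else 0.0


# Made-up placeholder records, not data from the study.
gold = [
    {"primary_temp_c": 900, "secondary_temp_c": 600},
    {"primary_temp_c": 1000, "secondary_temp_c": None},
]
silver = [
    {"primary_temp_c": 900, "secondary_temp_c": 650},
    {"primary_temp_c": 1000, "secondary_temp_c": None},
]

exact = field_accuracy(gold, silver)
lenient = field_accuracy(gold, silver, tolerance=50)
```

With these placeholder records, three of the four fields agree exactly, and all four agree once a 50-degree tolerance is allowed.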

The research focused on the synthesis of ternary chalcogenides, such as CuInTe/Se, known for their thermoelectric properties at intermediate temperatures. From a database of research papers, Chat-GPT successfully parsed a significant portion, and the parsed recipes were then used to develop a classifier that predicts phase purity. The methodology demonstrates the generalizability of LLMs for text parsing.

Dr. Hippalgaonkar emphasized the significance of their work, stating, “Our methodology provides a roadmap for future endeavors seeking to amalgamate LLMs with materials science research, heralding a potentially transformative paradigm in the synthesis and characterization of novel materials.”

The researchers meticulously extracted data from papers published between 2000 and 2023, focusing on CuInTe/Se and excluding methods such as solution synthesis and the Bridgman method. They identified four aspects crucial for attaining pure compounds: primary heating, secondary heating, annealing, and densification. The prompts were optimized iteratively so that the relevant synthesis details were extracted in a structured format.
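To make "structured format" concrete, here is a minimal sketch of what a parsed recipe record could look like, built around the four steps named above. The prompt wording, the JSON keys, the class names, and the sample values are all assumptions for illustration; the article does not give the authors' actual prompts or schema, and no LLM is called here.

```python
import json
from dataclasses import dataclass
from typing import Optional

# The four synthesis steps named in the article.
STEP_NAMES = ("primary_heating", "secondary_heating", "annealing", "densification")


@dataclass
class HeatingStep:
    temperature_c: Optional[float]  # peak temperature in Celsius, None if not reported
    duration_h: Optional[float]     # hold time in hours, None if not reported


@dataclass
class SynthesisRecipe:
    compound: str
    primary_heating: HeatingStep
    secondary_heating: HeatingStep
    annealing: HeatingStep
    densification: HeatingStep
    phase_pure: Optional[bool]      # None if the paper does not say


def build_prompt(paragraph: str) -> str:
    """Assemble an illustrative extraction prompt (hypothetical wording)."""
    fields = ", ".join(STEP_NAMES)
    return (
        "Extract the solid-state synthesis conditions from the text below. "
        f"Reply with JSON containing the keys: compound, {fields}, phase_pure. "
        "Each step is an object with temperature_c and duration_h; "
        "use null for anything the text does not state.\n\n"
        f"Text: {paragraph}"
    )


def parse_response(raw: str) -> SynthesisRecipe:
    """Turn a JSON reply into a typed record, tolerating missing steps."""
    data = json.loads(raw)
    steps = {}
    for name in STEP_NAMES:
        step = data.get(name) or {}
        steps[name] = HeatingStep(step.get("temperature_c"), step.get("duration_h"))
    return SynthesisRecipe(
        compound=data["compound"],
        phase_pure=data.get("phase_pure"),
        **steps,
    )


# A made-up reply with placeholder values, standing in for model output.
sample_reply = json.dumps({
    "compound": "ABX2",
    "primary_heating": {"temperature_c": 900, "duration_h": 24},
    "secondary_heating": None,
    "annealing": {"temperature_c": 600, "duration_h": 48},
    "phase_pure": True,
})
recipe = parse_response(sample_reply)
```

Keeping every step in the record, with empty values where a paper is silent, means each parsed recipe has the same shape and can be fed directly into later analysis.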

The extracted data allowed for a comprehensive analysis of synthesis conditions, revealing that secondary heating, annealing, and primary heating significantly impact phase purity. Their decision tree classifier demonstrated the potential of using machine learning to predict synthesis outcomes based on text-parsed data.
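For readers unfamiliar with decision trees, the sketch below shows the general idea on a toy dataset: the tree repeatedly picks the feature and threshold that best separate phase-pure from impure recipes. It is a small pure-Python illustration using Gini impurity, not the authors' classifier; the feature layout and every number are invented for the example and imply nothing about real synthesis conditions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a tree; leaves are class labels, inner nodes are 4-tuples."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(row, y) for row, y in zip(rows, labels) if row[f] <= t]
    right = [(row, y) for row, y in zip(rows, labels) if row[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Invented toy data. Columns (hypothetical):
# [primary heating temp in C, secondary heating temp in C, annealing time in h]
rows = [
    [900, 0, 0],
    [900, 600, 24],
    [1000, 0, 0],
    [1000, 650, 48],
    [850, 600, 0],
    [950, 0, 12],
]
labels = [False, True, False, True, True, False]  # True = phase pure (made up)

tree = build_tree(rows, labels)
prediction = predict(tree, [920, 620, 10])
```

In this toy set the labels were constructed so that the secondary heating column alone separates the two classes, which is why the tree needs only a single split. Real parsed data would be noisier and would call for held-out validation.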

“Data in solid-state synthesis can be biased towards positive recipes and balanced datasets are necessary to move the field forward,” said Dr. Hippalgaonkar. Dr. Thway agreed, saying, “Our methodology demonstrates the generalizability of Large Language Models (LLMs) for text parsing, specifically for materials with sparse literature.” Their work also demonstrated the potential for Chat-GPT to interpolate and extrapolate synthesis conditions for similar materials, suggesting a practical approach for synthesizing new compounds.

This research underscores the importance of integrating advanced AI tools with traditional materials science methodologies, paving the way for more efficient and accurate synthesis processes. Dr. Hippalgaonkar and his team’s success with Chat-GPT opens new avenues for leveraging LLMs in scientific research, particularly in fields with limited literature and complex data extraction needs.

Journal Reference

Maung Thway, Andre K. Y. Low, Samyak Khetan, Haiwen Dai, Jose Recatala-Gomez, Andy Paul Chen, and Kedar Hippalgaonkar. “Harnessing GPT-3.5 for text parsing in solid-state synthesis – case study of ternary chalcogenides.” Digital Discovery, 2024. DOI: https://doi.org/10.1039/D3DD00202K

About the Authors

Associate Professor Kedar Hippalgaonkar is an NRF Fellow (Class of 2021) who holds a joint appointment with the Materials Science and Engineering Department at Nanyang Technological University (NTU) and as a Senior Scientist at the Institute of Materials Research and Engineering (IMRE) at the Agency for Science, Technology and Research (A*STAR). He led the Accelerated Materials Development for Manufacturing (AMDM) program from 2018 to 2023, focusing on the development of new materials, processes and optimization using machine learning, AI and high-throughput computations and experiments in electronic and plasmonic materials and polymers. He also led the Pharos Program on Hybrid (inorganic-organic) thermoelectrics for ambient applications from 2016 to 2020. He has published over 70 research papers, co-founded a startup (Xinterra, Inc.), won the Ministry of Education START Award in 2021 and was nominated as a Journal of Materials Chemistry Emerging Investigator in 2019. He was recognized as a Science and Technology for Society Young Leader in Kyoto in 2015. For his outstanding graduate research, he was awarded the Materials Research Society Silver Medal in 2014. Funded through the A*STAR National Science Scholarships, he graduated with a Bachelor of Science (Distinction) from the Department of Mechanical Engineering at Purdue University in 2003 and obtained his Doctor of Philosophy from the Department of Mechanical Engineering at UC Berkeley in 2014. During his doctoral studies, he conducted research on the fundamentals of heat, charge, and light in solid-state materials.

Dr. Maung Thway is a research fellow at the Applications of Teaching & Learning Analytics for Students (ATLAS) of Nanyang Technological University. His research involves studying the impact of Gen-AI applications on learning at the university level. Previously, he was a research fellow at the School of Materials Science and Engineering under Associate Professor Kedar Hippalgaonkar, where he developed methodologies to accelerate materials discovery. He received his PhD degree in Electrical Engineering from the National University of Singapore, Singapore, in 2020. His PhD research included the fabrication, characterization, and integration of perovskite/Si and III-V/Si tandem solar cells.

Andre KY Low is a postgraduate student in the Materials Science and Engineering Department at Nanyang Technological University in Singapore, supervised by Associate Professor Kedar Hippalgaonkar. His thesis is on the development and application of constrained multi-objective optimization algorithms for accelerating materials discovery. Andre is a recipient of the A*STAR Graduate Scholarship, affiliated with the Institute of Materials Research and Engineering. He previously earned his Bachelor’s in Materials Science and Engineering from Nanyang Technological University as the Valedictorian of the graduating class of 2021.

Jose Recatalà Gómez is a research fellow in the Materials Science and Engineering Department at Nanyang Technological University in Singapore, working in Associate Professor Kedar Hippalgaonkar’s team. He specializes in integrating Generative AI and machine learning with high-throughput solid-state synthesis to discover inorganic materials for energy and environmental applications. Jose earned his Bachelor’s in Chemistry from Universitat Jaume I, Spain, in 2015, a Master’s in Advanced Materials from Universidad Autónoma de Madrid, Spain, in 2016, and a PhD from the University of Southampton, England, in 2021. He was awarded an A*STAR Research Attachment Programme (ARAP) scholarship and spent two years at the Institute of Materials Research and Engineering (IMRE) in Singapore.