🔬 Why There Is No "AlphaFold for Materials" - AI for Materials Discovery with Heather Kulik
Latent Space: The AI Engineer Podcast
The podcast explores how AI and machine learning can accelerate the discovery of new materials, particularly in chemistry. Heather Kulik, a professor of chemical engineering at MIT, shares her work on using AI to predict and optimize material properties, such as making tougher plastics by uncovering unexpected chemical phenomena in polymer networks. She highlights the use of active learning to solve multidimensional challenges, like optimizing metal-organic frameworks for CO2 capture while weighing factors such as stability and CO2 absorption. Kulik also addresses the limitations of current machine learning models, calling for more diverse chemical bonding data and rigorous validation before such models are trusted to replace conventional physics-based modeling.
Part 1: AI Discovery and Quantum Mechanics
00:00 AI-Driven Discovery of Tougher Polymers: A Surprising Chemical Phenomenon
The discussion begins with the question of why one should learn chemistry when AI like ChatGPT has a PhD-level understanding. Heather Kulik shares her experience using AI in accelerated discovery of new materials. Her group uses AI to make predictions faster than traditional computational models. She highlights a project where AI screened thousands of materials, uncovering an unexpected chemical phenomenon that made a polymer four times tougher. The AI design surprised experimentalists, who then made it in the lab and confirmed the results. This tougher plastic has applications in improving overall durability and use of plastics.
03:30 Unveiling Quantum Mechanics: AI's Role in Discovering Material Properties
Heather Kulik explains the surprising chemical discovery, noting that the molecules break apart to make the overall structure tougher. The discovery involved a fully quantum mechanical phenomenon where electrons move around differently, stabilizing the molecule at the breaking point. This concept is similar to how catalysts and enzymes work, but had not been shown in polymer materials before. Kulik was drawn to data-driven discovery early on, excited by learning from patterns in data. She wanted to unearth broader trends in material behavior, rather than studying one molecule at a time.
Part 2: Methods, Frameworks, and ML Evolution
06:10 From Cheminformatics to Neural Networks: Evolving Materials Design with AI
Around 2015-2016, Heather Kulik shifted from calling her work "cheminformatics" to embracing machine learning. A student, Jon Paul Janet, adapted materials design concepts into training neural networks. Kulik sees promise in machine learning for solving multidimensional challenges, such as optimizing metal-organic frameworks for CO2 capture. This involves balancing multiple objectives like CO2 absorption, mechanical stability, and thermal stability. Machine learning offers significant speedups in optimizing across multiple dimensions, enabling the search for materials with multiple desirable properties.
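As a rough illustration of what "balancing multiple objectives" can mean in a screening setting, the sketch below filters a handful of candidates down to the Pareto front: the set of candidates that no other candidate beats on every objective at once. This is a generic technique, not a description of the Kulik group's workflow. The candidate names, property values, and units are all made up for illustration; in an ML-accelerated screen the numbers would be model predictions.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical framework with three objectives; higher is better for each."""
    name: str
    co2_uptake: float            # illustrative value, arbitrary units
    thermal_stability: float     # illustrative value, arbitrary units
    mechanical_stability: float  # illustrative value, arbitrary units


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    a_vals, b_vals = a[1:], b[1:]  # skip the name field
    at_least_as_good = all(x >= y for x, y in zip(a_vals, b_vals))
    strictly_better = any(x > y for x, y in zip(a_vals, b_vals))
    return at_least_as_good and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up example data, not real measurements or predictions.
candidates = [
    Candidate("mof_a", 3.0, 400.0, 0.8),
    Candidate("mof_b", 2.0, 450.0, 0.9),
    Candidate("mof_c", 1.5, 380.0, 0.7),  # worse than mof_a on every objective
    Candidate("mof_d", 3.5, 300.0, 0.5),
]

front = pareto_front(candidates)
print([c.name for c in front])
```

The pairwise comparison is quadratic in the number of candidates, which is fine for a sketch; the point is that the result is a set of trade-offs rather than a single "best" material.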
08:48 Metal-Organic Frameworks: Applications in Gas Storage, Sensing, and Catalysis
Metal-organic frameworks are useful for gas storage, sensing, separations, and CO2 capture. Their use in catalysis is limited by their stability. These frameworks can place precise chemical groups in specific orientations, allowing targeted interactions with guest molecules. Heather Kulik explains that metal-organic frameworks are like LEGOs for chemists, with building blocks that can be combined in infinite ways to create precise chemistry. Before AI, Kulik used quantum mechanical modeling to understand transition metal catalysis, which is computationally costly, taking hours to weeks for a single prediction.
12:25 Accelerating Quantum Mechanics: Using ML to Predict the Best Approximation
Heather Kulik's work involves accelerating quantum mechanical predictions and using machine learning models to predict the best approximation to use based on the material studied. The quantum mechanical wave function is used as input to neural networks to predict the right method. Kulik addresses the question of whether it's necessary to learn chemistry when AI like ChatGPT has a PhD-level understanding. She finds that ChatGPT is good at Wikipedia-level chemistry knowledge, but struggles with more complex tasks like designing a ligand with specific properties. She emphasizes the importance of learning chemistry well enough to know when these models are right or wrong.
Part 3: Data Challenges and Model Rigor
15:57 Overcoming Data Limitations: Machine Learning Gaps in Chemistry
Heather Kulik identifies gaps in machine learning for chemistry, particularly in reactivity predictions and diverse chemical bonding. There is a lack of data on complex phenomena involving multiple elements. Existing datasets are often focused on "boring chemistry" like organic molecules binding to proteins. She suggests the need for more datasets and leaderboards for areas like how matter behaves when light shines on it. She notes that while there are repositories of density functional theory (DFT) data on crystalline materials, the data often comes from low-fidelity DFT and lacks experimental ground truth.
18:51 Transparency Needed: Can ML Models Replace Physics-Based Modeling?
Heather Kulik discusses the need for a community challenge to crack open problems in materials science, similar to CASP in the protein world. She questions whether machine learning potentials can really replace conventional physics-based modeling. While new foundation models look good initially, they can exhibit "wacky things" like molecules falling apart. She calls for more rigor in evaluating whether these models can truly replace physics-based modeling, rather than just fitting data that may itself be of poor quality.
21:30 Bridging the Gap: Automation and Experimental Chemistry
The discussion shifts to bridging the gap between computational design and experimental chemistry. Heather Kulik notes that high-throughput synthesis and experimentation are being explored, but some experiments are easier for humans than for autonomous systems, and vice versa. She also highlights the importance of the process in getting materials to the device scale, which is an area where machine learning is still at "ground zero."
23:39 Materials Science Data: Challenges in Bonding and Experimental Validation
Heather Kulik explains that while experimental structures are available, the challenge is that there are many more building blocks in materials than in proteins, leading to more ways to think about chemical bonding. She notes that no potentials robustly encode all of this bonding, especially with respect to metal-organic bonding. Even for ground-state properties, there are too many parameters and no clear set of interactions limited to a small number of building blocks. There is no real way to know whether the models are right or wrong at bigger length scales and time scales, because the experimental data is not there.
Part 4: Future Initiatives and Academic Role
26:42 Integrating Textual Information: Enhancing Models with Literature Data
Heather Kulik discusses integrating textual information from papers into AI models. Her group uses natural language processing and graph digitization to extract datasets of properties from the literature. One challenge is that people interpret their results in different ways, which can bias the models. LLMs are prone to false positives, so checking the accuracy of ingested data takes significant time. To address the bias towards previously reported results, they train models on literature but apply them to new structures.
29:59 Data Initiatives: High-Throughput Automation and Public Data Sharing
Heather Kulik suggests creating an initiative or multi-institutional funding resource to get data in a high-throughput automated way. She envisions user facilities where computational researchers can design experiments and have them executed, with data collected publicly. She emphasizes the need for systematization of how results get reported so that they can be machine learning ready from day one. She also notes the importance of materials for climate change.
32:29 The Role of Academia: Creativity and Niche Problems
Heather Kulik reflects on the role of academia in light of recent investment in private materials companies. She notes that companies have access to more compute, so academics need to focus on more creative problems that don't require brute-force compute. She highlights her group's code, molSimplify, for transition metal complex structure generation and metal-organic framework screening, and encourages listeners to try it out and provide feedback.