Navigating the Maze of Material Science: The Pitfalls of Brute-Force Machine Learning Approaches

Shamit Shrivastava
Jul 9, 2023

Latent space of materials and the laws of thermodynamics (image courtesy of DALL·E)

TL;DR: Thermodynamics needs to be a prior for any foundational model of atoms or materials.

In our modern scientific landscape, machine learning stands as a beacon of hope for decoding complex patterns and relationships. This is wonderfully illustrated by DeepMind’s success in protein folding prediction, a triumph attributed to machine learning models trained on sequence and structure data. This achievement has sparked attempts to adopt similar brute-force learning methods within the wider domain of material science. However, a deeper dive reveals that this leap is not as simple as it might seem and comes with its own unique challenges.

Structure to Properties: A Single Point in a Vast Landscape

Brute-force approaches rely on training machine learning models on enormous volumes of data regarding the structure and properties of molecules and materials, with the goal of uncovering underlying patterns and relationships. However, this leap from structure to properties is a complex one and overlooks a critical detail.

Consider a molecule’s structure — the arrangement of its atoms and bonds — as a single point on an intricate, high-dimensional map. This map represents all the possible states or configurations the molecule can adopt under different conditions, with each dimension being a variable like temperature or pressure. The structure of a molecule is just one such point, a snapshot of the molecule’s configuration at a specific moment under a set of conditions. To accurately predict material behavior, we must consider not only this single point but the entire phase space — the high-dimensional map of all possible configurations.

Missing Information: Behavior and State Diagrams

A foundational model in material science that aims to predict the “equation of state” of any material is attempting to provide a mathematical model of that material’s properties under different conditions. However, this equation is not solely a product of a material’s structure; it also encompasses its behavior.

Here lies the shortcoming of the brute-force approach. Behavior is represented by state diagrams, which show how a system transitions between states under different conditions, and it generally cannot be inferred from structure alone. Yet these diagrams are often absent from the structure and property data used in brute-force learning approaches.

A state diagram, also known as a phase diagram, is a type of graphical representation that illustrates the different states or phases a system can adopt under varying conditions. These diagrams typically plot variables such as pressure, temperature, volume, or concentration against each other to map out the conditions that cause the system to transition from one state to another.

One of the simplest examples of a state diagram is the phase diagram for water. This diagram depicts the states of water (solid, liquid, and gas) and the conditions that instigate transitions between these states. For instance, at sea level (a pressure of 1 atmosphere), water transitions from a solid to a liquid at 0 degrees Celsius and from a liquid to a gas at 100 degrees Celsius.
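
As a minimal sketch of the idea that a state diagram maps conditions to a phase, the function below encodes only the two transition temperatures quoted above for a pressure of 1 atmosphere. The function name and the choice to report the higher-temperature phase exactly at a transition are assumptions made for illustration; a fuller state diagram would also take pressure as an input.

```python
# Transition temperatures of water at 1 atm, as quoted in the text (degrees Celsius).
MELTING_POINT_C = 0.0
BOILING_POINT_C = 100.0


def water_phase_at_1_atm(temperature_c: float) -> str:
    """Return the equilibrium phase of water at 1 atm for a given temperature.

    Hypothetical helper: exactly at a transition temperature two phases
    coexist; this sketch arbitrarily reports the higher-temperature phase.
    """
    if temperature_c < MELTING_POINT_C:
        return "solid"
    if temperature_c < BOILING_POINT_C:
        return "liquid"
    return "gas"


# Walk along the 1 atm line of the diagram at three sample temperatures.
for temperature_c in (-10.0, 25.0, 120.0):
    print(temperature_c, water_phase_at_1_atm(temperature_c))
```

The point of the sketch is that the output depends on the conditions supplied, not on the structure of the water molecule, which is the same in all three cases.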

The same diagram underlies more complex systems such as steam engines, in which water is heated under pressure to produce steam that expands and drives a piston. The state diagram for water shows that raising the temperature at constant pressure takes water from the liquid to the gas state, which is the basic principle behind steam engine operation.

For intricate materials like proteins, polymers, or their mixtures, state diagrams become significantly more complex. These substances can adopt different configurations depending on many variables: temperature and pressure, but also molecular size, solvent properties, and concentration.

For instance, a protein or polymer might exhibit a hard and brittle solid phase at low temperatures. As the temperature increases, it could transition into a flexible and ductile solid and, at sufficiently high temperatures, transform into a viscous liquid. The state diagram for these materials maps out these various states and the conditions that initiate transitions between them.

The high dimensionality of these state diagrams for complex materials stems from the multitude of factors influencing their behavior. Despite their complexity, these diagrams provide invaluable insights into how these materials will behave under different conditions. This understanding is pivotal for the utilization of such complex materials in diverse applications, from drug delivery and tissue engineering to the creation of novel plastics and paints.

Guidance: Laws of Thermodynamics

On top of the challenge posed by incomplete data, the brute-force approach neglects an invaluable compass in the world of material science: the laws of thermodynamics. These fundamental laws govern how matter and energy behave and interact. Any equation of state for a material must adhere to these laws.
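
For concreteness, a few of the standard constraints that thermodynamics places on an equation of state for a simple compressible system can be written as follows. The notation is an assumption of this sketch and does not come from the article: $U$ is internal energy, $S$ entropy, $T$ temperature, $P$ pressure, $V$ volume, and $C_V$ the heat capacity at constant volume.

```latex
\begin{align}
  dU &= T\,dS - P\,dV
     && \text{(first and second laws combined, reversible changes)} \\
  \left(\frac{\partial S}{\partial V}\right)_T
     &= \left(\frac{\partial P}{\partial T}\right)_V
     && \text{(Maxwell relation from the Helmholtz free energy)} \\
  \left(\frac{\partial P}{\partial V}\right)_T &\le 0, \qquad C_V \ge 0
     && \text{(mechanical and thermal stability)}
\end{align}
```

A model that predicts $P$, $S$, and $U$ independently from data has no built-in reason to satisfy any of these relations.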

Training machine learning models to generate equations of state without considering the laws of thermodynamics as priors is like attempting to navigate an intricate maze without a map. It not only requires vast amounts of data and computational resources, but it also runs the risk of producing models that fit the training data but contradict these fundamental laws.
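
One way to make this risk concrete is to test a candidate equation of state against a known thermodynamic requirement. The sketch below checks mechanical stability, meaning that pressure must not rise with volume at fixed temperature, using central finite differences. Everything in it is hypothetical: `find_stability_violations`, the ideal-gas reference, and the function `fitted_pressure` that stands in for a purely data-driven model are illustrative assumptions, not a method from the article.

```python
from typing import Callable, List, Sequence

R = 8.314  # molar gas constant, J/(mol K)

PressureModel = Callable[[float, float], float]  # P(volume, temperature) in Pa


def ideal_gas_pressure(volume: float, temperature: float) -> float:
    """Pressure of one mole of ideal gas; dP/dV is negative at every volume."""
    return R * temperature / volume


def fitted_pressure(volume: float, temperature: float) -> float:
    """Hypothetical stand-in for an unconstrained data-driven fit.

    The added quadratic term is an invented artifact that makes the pressure
    rise with volume at larger volumes, which violates mechanical stability.
    """
    return ideal_gas_pressure(volume, temperature) + 4.0e9 * (volume - 0.02) ** 2


def find_stability_violations(
    model: PressureModel,
    volumes: Sequence[float],
    temperature: float,
    step: float = 1e-6,
) -> List[float]:
    """Return the volumes where (dP/dV)_T > 0, estimated by central differences."""
    violations = []
    for volume in volumes:
        dp_dv = (
            model(volume + step, temperature) - model(volume - step, temperature)
        ) / (2 * step)
        if dp_dv > 0:
            violations.append(volume)
    return violations


volumes = [0.01, 0.02, 0.03, 0.04]  # molar volumes in m^3/mol
print(find_stability_violations(ideal_gas_pressure, volumes, 300.0))
print(find_stability_violations(fitted_pressure, volumes, 300.0))
```

In this toy case the ideal gas passes at every volume, while the stand-in fit is flagged at the two largest volumes even though it could match measured pressures near 0.02 m^3/mol closely.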

Therefore, we must seek clarity when discussing the development of foundational machine-learning models of atoms. If the goal is to predict the behavior of materials and derive their equations of state, it is scientifically unsound, and impractical, to disregard the laws of thermodynamics.

So, while the brute-force approach may shine in certain domains, it presents significant challenges in the realm of material science. A more practical and scientifically sound approach would involve using our existing knowledge, especially the laws of thermodynamics, to guide the training of machine learning models. This way, we can create models that accurately predict material properties and align with the fundamental principles of the universe.
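
One possible way to use thermodynamics as a prior, sketched here under stated assumptions, is to have a model represent a single thermodynamic potential and derive the observable quantities from it, so that the relations between them hold by construction. The example uses the Helmholtz free energy of a monatomic ideal gas as a stand-in for a learned function; the function names, the finite-difference step sizes, and the choice of potential are all illustrative and not taken from the article.

```python
import math
from typing import Callable

R = 8.314  # molar gas constant, J/(mol K)

FreeEnergy = Callable[[float, float], float]  # F(volume, temperature) in J/mol


def ideal_gas_free_energy(volume: float, temperature: float) -> float:
    """Helmholtz free energy of a monatomic ideal gas, up to terms linear in T.

    Stand-in for a learned potential; constants that fix the units inside
    the logarithms are dropped because they do not affect the pressure.
    """
    return -R * temperature * (math.log(volume) + 1.5 * math.log(temperature))


def pressure(f: FreeEnergy, volume: float, temperature: float, step: float = 1e-6) -> float:
    """P = -(dF/dV)_T, estimated by a central difference in volume."""
    return -(f(volume + step, temperature) - f(volume - step, temperature)) / (2 * step)


def entropy(f: FreeEnergy, volume: float, temperature: float, step: float = 1e-3) -> float:
    """S = -(dF/dT)_V, estimated by a central difference in temperature."""
    return -(f(volume, temperature + step) - f(volume, temperature - step)) / (2 * step)


def maxwell_residual(f: FreeEnergy, volume: float, temperature: float) -> float:
    """(dS/dV)_T - (dP/dT)_V, which should vanish up to numerical error."""
    dv, dt = 1e-4, 1e-3
    ds_dv = (entropy(f, volume + dv, temperature) - entropy(f, volume - dv, temperature)) / (2 * dv)
    dp_dt = (pressure(f, volume, temperature + dt) - pressure(f, volume, temperature - dt)) / (2 * dt)
    return ds_dv - dp_dt


volume, temperature = 0.02, 300.0  # m^3/mol, K
print(pressure(ideal_gas_free_energy, volume, temperature))
print(entropy(ideal_gas_free_energy, volume, temperature))
print(maxwell_residual(ideal_gas_free_energy, volume, temperature))
```

Because pressure and entropy are both derivatives of the same function, the Maxwell relation between them is satisfied for any smooth choice of `FreeEnergy`, not only for the ideal gas. A practical model would more likely use automatic differentiation than finite differences.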

Written by Shamit Shrivastava

Biophysics of sound in membranes and its applications. Post Doctoral Researcher, Engineering Sciences, University of Oxford, UK www.shamits.org
