Sifting Through Multisource Data for Safer Battery Materials with Machine Learning

By Austin D. Sendek, Stanford University

On June 14, 2016, RoboSimian, an ape-like robot built by Jet Propulsion Laboratory researchers to rescue people from disaster areas, exploded in the lab and caught fire. The following year, a major cell phone manufacturer issued a global recall of its new tablets after reports of fires and explosions. Since then, there have been numerous accounts of similar incidents. In each case, lithium-ion batteries were identified as the root cause.

The problem with these batteries is their liquid electrolytes, which tend to vaporize or catch fire if a battery-powered device can't cool off quickly enough. Researchers are searching for solid electrolyte materials with good ionic conductivity and electrochemical stability to replace these potentially dangerous liquid electrolytes, but the search has been slow-going. It can take weeks to evaluate a single candidate material through experimentation or simulation, and there are more than 12,000 lithium-containing crystalline solids in the Materials Project database that could be promising candidates—not to mention the many thousands or millions of materials not yet cataloged.

Using a machine learning model developed in MATLAB®, my colleagues and I found the needle in the haystack: a handful of exceptional solid electrolytes out of the more than 12,000 that we analyzed. Trained on a set of known good electrolytes and their atomistic structures, our MATLAB model appears to be over three times more likely to identify promising new materials than random guessing, and two times more likely than Stanford graduate students working in the field.

Lithium-Ion Battery Basics: The Problem with Liquid Electrolytes

In lithium-ion batteries, lithium ions migrate through the electrolyte as the battery is charged and discharged. Because water reacts with lithium, battery manufacturers use organic solvents rather than water-based solvents for the electrolyte. This is where the problem comes in: Unlike water, organic liquids such as gasoline, hair spray, and nail polish remover are typically flammable and unstable. 

In addition to safety issues, liquid electrolytes have at least two other drawbacks. First, using them to create higher voltage batteries is difficult because they tend to break down as the voltage driven across them increases. Second, they do little to prevent a phenomenon known as dendrite growth, a leading cause of early battery death. Taken together, these disadvantages provide compelling motivation to find a suitable solid-state electrolyte.

Assembling Data from Multiple Sources

Under the supervision of Professor Evan Reed, we began by aggregating data from three sources: the Materials Project database, published papers, and the Inorganic Crystal Structure Database (ICSD), an online database of experimentally validated atomistic structures.

First, we identified all 12,831 lithium-containing solids in the Materials Project database. We eliminated more than 92% of this initial set after screening for structural stability, chemical stability, and low electronic conductivity. In addition, we compiled information on the earth abundance of the materials and their predicted costs. This initial screening left us with over 300 stable candidates that might make promising solid electrolytes if only their lithium conductivity were high enough. To predict which of them conduct lithium well enough, we turned to machine learning. [1]

We began by combing through the scientific literature to find 40 solid crystalline materials for which researchers had characterized the crystal structure and measured the ionic conductivity at room temperature. About one-third of these 40 materials had sufficient ionic conductivity to be useful battery electrolytes, although these materials all have stability issues that prevent them from being adopted in solid-state batteries. This mix of 40 fast and slow lithium-conducting materials would serve as a training set for a machine learning algorithm to rapidly predict the lithium conduction behavior in new materials.

We then downloaded the atomistic structures for these 40 materials from the ICSD. Using this data, we computed 20 features that characterize the local atomic arrangements and chemistry in each of the crystals based on the positions, masses, electronegativities, and atomic radii of the atoms in the structure. These computations were all performed in MATLAB. The 20 features that we selected included atomistic metrics such as volume per atom, lithium bond ionicity, number of lithium neighbors, and minimum anion-anion separation distance. We believed that these 20 features might be correlated with ionic conductivity, based either on our intuition or on previous reports in the literature. We found that the use of such “smart” features—that is, features based on preexisting knowledge of materials physics—is essential when applying machine learning to such a small dataset.
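As an illustration of what such structural features look like in code, here is a minimal Python sketch that computes two of the features named above, volume per atom and minimum anion-anion separation distance. It is not the code used in this work, which was written in MATLAB. The toy cell, the atom coordinates, the restriction to an orthorhombic cell, and the function names are all assumptions made for the example.

```python
import itertools
import math


def volume_per_atom(cell_lengths, n_atoms):
    """Cell volume divided by atom count (orthorhombic cell, cubic angstroms)."""
    a, b, c = cell_lengths
    return a * b * c / n_atoms


def min_separation(positions, cell_lengths):
    """Smallest distance between distinct atoms, using the minimum-image
    convention so that atoms near opposite cell faces count as close."""
    best = math.inf
    for p, q in itertools.combinations(positions, 2):
        d2 = 0.0
        for x1, x2, length in zip(p, q, cell_lengths):
            delta = abs(x1 - x2) % length
            delta = min(delta, length - delta)  # wrap across the cell boundary
            d2 += delta * delta
        best = min(best, math.sqrt(d2))
    return best


# Hypothetical toy structure (not a real crystal): 4 x 4 x 4 angstrom cell.
cell = (4.0, 4.0, 4.0)
atoms = [
    ("Li", (0.0, 0.0, 0.0)),
    ("Li", (2.0, 2.0, 0.0)),
    ("O", (2.0, 0.0, 0.0)),
    ("O", (0.0, 2.0, 2.0)),
    ("O", (0.5, 0.5, 3.5)),
]
anions = [pos for element, pos in atoms if element == "O"]

features = {
    "volume_per_atom": volume_per_atom(cell, len(atoms)),
    "min_anion_anion_distance": min_separation(anions, cell),
}
print(features)
```

A real pipeline would read the cell and atomic positions from a structure file and handle non-orthogonal cells. The point here is only that each feature reduces a full crystal structure to a single number that a model can use.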

Selecting a Machine Learning Model

The next question was: Which combination of these 20 features would best predict the training data? Given our relatively small training set of 40 materials and just 20 features, and the ease and flexibility in modeling offered by MATLAB, we were able to consider more than 10,000,000 possible combinations of features and models. 
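The size of this search space can be checked with a few lines of arithmetic. The Python sketch below counts the non-empty subsets of 20 features and estimates how many model variants per subset would push the total past 10 million. The article does not state the actual number of variants, so that figure is an inference from the stated totals, and the variable names are hypothetical.

```python
from itertools import combinations, islice
from math import comb

N_FEATURES = 20  # number of candidate features, as stated in the article

# Count the non-empty feature subsets of each size k = 1..20.
subsets_by_size = {k: comb(N_FEATURES, k) for k in range(1, N_FEATURES + 1)}
total_subsets = sum(subsets_by_size.values())  # equals 2**20 - 1

# Subsets with exactly five features, the size of the final model.
five_feature_subsets = subsets_by_size[5]

# Model variants per subset needed to pass 10 million fits (ceiling division).
variants_needed = -(-10_000_000 // total_subsets)

# Subsets can be generated lazily instead of being stored in memory.
first_three = list(islice(combinations(range(N_FEATURES), 5), 3))

print(total_subsets, five_feature_subsets, variants_needed)
print(first_three)
```

With just over a million feature subsets, roughly ten model variants per subset are enough to exceed 10 million candidate models. An exhaustive search of this kind is feasible only because the training set is small and each fit is cheap.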

Statistics and Machine Learning Toolbox™ made it easy to explore these numerous models, including least squares regression, robust regression, locally weighted least squares, SVMs, logistic regression, and multiclass classification. We trained a model for each machine learning algorithm we wanted to test, and then validated the accuracy of the algorithm against our training data. 

None of the models trained on a single feature provided enough predictive power for ionic conductivity, but multifeature models did. Ultimately, we identified an optimal logistic regression model with five features that was capable of classifying the training set materials with as little as 10% cross-validated error. This made sense to us, since logistic regression classifiers tend to perform well with small training sets like ours. This logistic regression classifier would give a binary prediction: Does this material exhibit sufficient lithium conductivity to be useful as a solid electrolyte material, or not? Our trained model made this prediction accurately 9 times out of 10.
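To make the idea of a cross-validated logistic regression classifier concrete, here is a minimal, self-contained Python sketch. It is not the MATLAB model described in this article. The data are synthetic stand-ins (40 samples, 5 features, about one-third labeled as good conductors), the choice of leave-one-out cross-validation is an assumption because the article does not name the scheme, and the learning rate, epoch count, and regularization strength are arbitrary illustrative values.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=200, l2=0.01):
    """Batch gradient descent on the L2-regularized logistic loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, x):
    """Binary prediction: 1 if the modeled probability is at least 0.5."""
    w, b = model
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def leave_one_out_error(X, y):
    """Train on all samples but one, test on the held-out sample, repeat."""
    wrong = 0
    for i in range(len(X)):
        model = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        wrong += predict(model, X[i]) != y[i]
    return wrong / len(X)


# Synthetic stand-in data, NOT the real training set: 13 "fast" conductors
# (label 1) and 27 "slow" ones (label 0), each with 5 made-up features.
rng = random.Random(0)
y = [1] * 13 + [0] * 27
X = [[rng.gauss(1.0 if label else -1.0, 1.0) for _ in range(5)] for label in y]

loo_error = leave_one_out_error(X, y)
print(f"leave-one-out error: {loo_error:.1%}")
```

Holding out each material in turn gives an error estimate on data the model has not seen during fitting, which matters when only 40 labeled examples are available.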

We then turned this trained model loose on our 300+ remaining candidate materials (Figure 1).

Figure 1. Candidates identified by the machine learning model.

The classifier enabled us to eliminate 93.3% of these candidate materials, leaving just 21 potential candidates from the original 12,831. Once the model was trained, this screening step took only seconds to complete. All in all, we eliminated 99.8% of candidate materials through our screening process.
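The percentages in this screening funnel can be cross-checked with a short calculation. The Python sketch below uses only the numbers quoted in the text. The size of the intermediate pool is inferred from them, not reported, and the variable names are hypothetical.

```python
initial_candidates = 12_831   # lithium-containing solids in the database
final_candidates = 21         # materials left after the classifier
classifier_rejection = 0.933  # fraction the classifier eliminated

# Overall fraction eliminated across both screening stages.
overall_eliminated = 1 - final_candidates / initial_candidates

# Pool size entering the classifier, implied by the 93.3% rejection rate.
implied_pool = final_candidates / (1 - classifier_rejection)

# Fraction removed by the first (stability and conductivity) screen.
first_stage_eliminated = 1 - implied_pool / initial_candidates

print(f"overall eliminated: {overall_eliminated:.1%}")
print(f"implied pool before classifier: about {implied_pool:.0f}")
print(f"first-stage eliminated: about {first_stage_eliminated:.1%}")
```

The implied pool of a little over 300 materials is consistent with the "300+" candidates mentioned earlier, and the overall figure rounds to the 99.8% quoted above.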

Results and Next Steps

To test the validity of the predictions, we simulated lithium conduction in these materials using accurate but slow quantum physics-based simulations. [2] So far, we’ve found that when we follow the recommendations of the machine learning–based model, we discover new lithium-ion-conducting materials three times faster than if we use simple trial and error. We even tested the model against human intuition by giving the same list of randomly drawn materials to both the model and a group of Stanford Ph.D. students in materials science. The model was twice as accurate as the students in identifying good lithium conductors, while making its predictions in less than one-thousandth of the time.

Some of the candidate materials identified by our model were completely unexpected. The atomistic structures for these materials were so complicated that we had no scientific intuition to help us determine whether the materials would have sufficient ionic conductivity. When it turned out that they did conduct, as the model predicted, it helped us build new intuition. We can now incorporate what we have learned into future versions of our MATLAB machine learning model, which we expect will improve as more experimental data is reported. One of our discovered materials was so exciting that we patented it and immediately found an interested corporate partner to license the patent and continue researching the material.

We continue to examine these candidates, both here at Stanford and in collaboration with outside groups that are conducting studies on individual candidate materials. In the near future, one of these candidate materials may prove to be the solid electrolyte that replaces liquid electrolytes in lithium-ion batteries and makes exploding battery packs a thing of the past.

About the Author

Austin D. Sendek is a Ph.D. candidate at the Department of Applied Physics at Stanford University, working with Prof. Evan Reed of the Department of Materials Science and Engineering. His research interests include the development and deployment of new computational approaches, grounded in concepts from machine learning and artificial intelligence, to accelerate the design of materials for energy storage applications.

Published 2018


  1. Sendek, A.D. et al. "Holistic Computational Structure Screening of more than 12,000 Candidates for Solid Lithium-ion Conductor Materials." Energy Environ. Sci. (2016). doi:10.1039/C6EE02697D.

  2. Sendek, A.D. et al. "Machine learning-assisted discovery of many new solid Li-ion electrolyte materials." arXiv:1808.02470 (2018).