Opportunities in Big Data in Materials Science

Materials research has made great strides in recent years, moving from an essentially applied, engineering-oriented approach to a position of considerable impact on other areas, including physics, chemistry, and biology. In this context, theoretical and computational research has gained a great deal of ground, mainly because of the strong predictive capacity of these methodologies. Among the most widely used are quantum mechanical methods, chiefly those based on Density Functional Theory, for which Walter Kohn received the Nobel Prize in Chemistry in 1998. [1] Today this type of methodology has reached maturity: the reproducibility and reliability of several computational codes have been verified, as recently reported in Science. [2]

Brazil has several important scientific groups working in this area. Today, in the state of São Paulo, there are about 20 research groups working with these methodologies. Work from some of these groups has already appeared on the covers of leading international scientific journals, as can be seen in the image, and many of their papers have become worldwide references. Now that the area has reached maturity, we must pay attention to continuing innovation in the field, so that our community stays up to date and at the forefront of knowledge. A major step in this direction came when the US government launched the Materials Genome Initiative to open a new era in the discovery and manufacture of materials, significantly increasing the speed of materials discovery while reducing its cost to a fraction of what it is today. The diagram on the left indicates the main concepts of this initiative.

One of the main actions in this direction has been the creation of large repositories of data on numerous materials, obtained through first-principles simulations. To build them, automated workflows ("robots") sweep the periodic table, combining elements in different crystal structures and calculating basic properties of the resulting materials. Most of the simulated materials have not yet been studied, and some are not even stable. The amount of data available is huge: some of the initiatives hold more than 1 million simulated materials. This type of initiative was also the cover story of an issue of the journal Nature in May 2016. [3] With the advent of these repositories, the computational simulation of materials is set to take a new direction, in which the focus of study is no longer to discover the properties of a particular material, but rather to find out which materials have the property required for a specific application.
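The sweep described above can be pictured with a minimal sketch. It assumes a tiny illustrative subset of elements and three prototype labels, and it replaces the first-principles calculation with a placeholder; the names `ELEMENTS`, `PROTOTYPES`, `enumerate_binary_candidates` and `compute_properties` are hypothetical and do not come from any of the repositories mentioned here.

```python
from itertools import combinations, product

# Illustrative subset only; real workflows cover far more of the periodic table.
ELEMENTS = ["Li", "Na", "Mg", "Al", "Si", "O", "S"]
# Hypothetical labels standing in for crystal-structure templates.
PROTOTYPES = ["rocksalt", "zincblende", "wurtzite"]


def enumerate_binary_candidates(elements, prototypes):
    """Yield (element_a, element_b, prototype) for every unordered element pair."""
    for (a, b), proto in product(combinations(elements, 2), prototypes):
        yield a, b, proto


def compute_properties(candidate):
    """Placeholder for a first-principles run; it only builds a record to be filled in."""
    a, b, proto = candidate
    return {"formula": f"{a}{b}", "prototype": proto, "status": "queued"}


candidates = list(enumerate_binary_candidates(ELEMENTS, PROTOTYPES))
records = [compute_properties(c) for c in candidates]
print(len(records), "candidate structures queued")
```

Even this toy sweep shows why the repositories grow so quickly: the number of candidates is the number of element combinations multiplied by the number of structure templates.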

There are about a dozen such initiatives in the world, and they are all still at an early stage. The area is still being structured and disseminated, which is a great opportunity to make São Paulo an important partner in these initiatives. Among the main ones are NOMAD, [4] led by the German physicist Matthias Scheffler; AFLOW, [5] from Duke University in the United States; and the Materials Project, [6] a joint initiative between Berkeley and MIT. The possible applications are very broad and include the search for materials for solar cells, thermoelectrics, catalysis, batteries, magnetic materials, and nanoelectronics, among others.

One recent application concerns the materials known as TCOs (transparent conductive oxides), which are used, for example, in smart windows. Today the most widely used and most studied material for this purpose is ITO (Indium Tin Oxide). The question to ask is: “What other materials can be used for this type of application?” Instead of going to the laboratory and growing every material by ‘trial and error’, we can scan the databases for materials whose energy gap and effective masses fall within the values required by the application. Once these materials are found, and they may lie outside the ITO family, the information is sent to the laboratory so that the materials can be produced and tested. [7]

Another application concerns thermoelectric materials. The search for better thermoelectric materials has been one of the main focuses of this community. The strategy has always been the same: grow a material, measure its properties, and see whether they are appropriate. This is costly and inefficient. With Big Data, one can scan the databases for materials with a specific figure of merit and suggest that these materials be grown and investigated. This approach is much faster and cheaper than the traditional one. [8]
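The two database searches described above can be sketched in a few lines. This is only an illustration under stated assumptions: the entries are made-up placeholders rather than values from any repository, the thresholds are arbitrary, and the names `Entry`, `screen` and `figure_of_merit` are hypothetical. The thermoelectric figure of merit uses the standard definition ZT = S²σT/κ, with Seebeck coefficient S, electrical conductivity σ, thermal conductivity κ and absolute temperature T.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    band_gap_ev: float      # energy gap, in eV
    effective_mass: float   # carrier effective mass, in units of the electron mass


def screen(entries, min_gap_ev, max_effective_mass):
    """Keep entries with a wide enough gap and light enough carriers, lightest first."""
    selected = [
        e for e in entries
        if e.band_gap_ev >= min_gap_ev and e.effective_mass <= max_effective_mass
    ]
    return sorted(selected, key=lambda e: e.effective_mass)


def figure_of_merit(seebeck_v_per_k, conductivity_s_per_m, thermal_w_per_mk, temperature_k):
    """Dimensionless thermoelectric figure of merit, ZT = S^2 * sigma * T / kappa."""
    return seebeck_v_per_k ** 2 * conductivity_s_per_m * temperature_k / thermal_w_per_mk


# Placeholder records for illustration only (not real materials data).
entries = [
    Entry("candidate_A", 3.4, 0.3),
    Entry("candidate_B", 1.1, 0.2),
    Entry("candidate_C", 3.8, 1.9),
    Entry("candidate_D", 3.1, 0.5),
]

# Assumed thresholds: a gap wide enough for transparency and a low effective mass.
shortlist = screen(entries, min_gap_ev=3.0, max_effective_mass=1.0)
print([e.name for e in shortlist])

# Illustrative inputs: S = 200 microvolts/K, sigma = 1e5 S/m, kappa = 1.5 W/(m K), T = 300 K.
print(figure_of_merit(200e-6, 1e5, 1.5, 300.0))
```

In practice the filter would run over records retrieved from a repository, and only the shortlisted materials would be passed on to the laboratory.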

Now that this new route to the discovery and design of materials is established on the international scene, it is imperative that the Brazilian community also take action on this front. Strategically, researchers from the State of São Paulo should take part in the world initiatives already under way, acting as partners in building the databases, specializing in the search for materials, and mastering the methodologies inherent in this area. In addition to knowledge from Physics and Materials Science, there is much to be developed in Computer Science, including optimized search algorithms, machine learning, and data mining.

A broader initiative in this area will bring direct benefits to Brazilian researchers, and it also has the potential to boost the competitiveness of innovative companies, since it offers unconventional routes to the search for new materials. The Graduate Program in Nanosciences and Advanced Materials at UFABC is advancing in this direction, and its researchers are already developing projects in the area.

[1] W. Kohn, Nobel Lecture: Electronic structure of matter—wave functions and density functionals, Reviews of Modern Physics 71, 1253 (1999).
[2] K. Lejaeghere, et al., Reproducibility in density functional theory calculations of solids, Science 351, 1394 (2016).
[3] N. Nosengo, The material code, Nature 533, 23 (2016).
[4] http://nomad-repository.eu/cms/
[5] http://www.aflowlib.org/
[6] https://materialsproject.org/
[7] G. Hautier, et al., Identification and design principles of low hole effective mass p-type transparent conducting oxides, Nature Comm. 4, 2291 (2013).
[8] S. Curtarolo, et al., The high-throughput highway to computational materials design, Nature Mat. 12, 191 (2013).