A team from the Département d'Astrophysique, in collaboration with the start-up Iris.AI, has shown that certain biology studies contain information relevant to understanding the interstellar medium. These results will soon appear in the Journal of Interdisciplinary Methodologies and Issues in Science.
The goal of this study was to explore the enigma of the Diffuse Interstellar Bands (DIBs), a forest of hundreds of absorption bands in the visible range whose origin is still unknown, exactly one century after their discovery in 1922. These bands show up whenever a stellar spectrum is observed. A few years after their discovery, astrophysicists understood that their origin was interstellar, because their intensity is correlated with the extinction by dust along the line of sight and is independent of stellar characteristics. It is now known that DIBs must be carried by organic molecules of about 100 atoms, but it is not known which ones. The multidisciplinary team, made up of astrophysicists and developers of computational linguistics tools from the company Iris.AI, applied Natural Language Processing (NLP), a branch of machine learning dedicated to the analysis of written texts. They developed an artificial intelligence that read 1.5 million scientific articles from all fields, and then specialized by reading the thousand or so papers about DIBs.
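To give a concrete idea of how a text-analysis tool can rank articles from other fields against a reference topic, here is a minimal sketch in Python. It is not the method used by the team, whose model is not described here: it swaps in a much simpler, classic technique (TF-IDF weighting with cosine similarity), and the function names and the toy texts are hypothetical illustrations.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: word -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Number of documents in which each word appears at least once.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            word: (count / len(tokens))
            * math.log((1 + n_docs) / (1 + doc_freq[word]))
            for word, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, candidates):
    """Return candidate indices sorted from most to least similar to query."""
    vectors = tfidf_vectors([query] + candidates)
    query_vec, cand_vecs = vectors[0], vectors[1:]
    scores = [(cosine(query_vec, vec), i) for i, vec in enumerate(cand_vecs)]
    # Highest score first; ties keep the original order.
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]


# Hypothetical toy texts, invented for illustration only.
query = "diffuse interstellar bands absorption spectrum visible wavelength"
candidates = [
    "protein folding dynamics in yeast cells",
    "absorption spectrum of a visual pigment at visible wavelength",
    "traffic flow model for urban roads",
]
ranking = rank_by_similarity(query, candidates)
print(ranking)
```

In this toy case the pigment text ranks first because it shares spectroscopic vocabulary with the query, even though it comes from a different field. A real system would rely on far richer language models and on a corpus of millions of articles.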
This artificial intelligence pointed the researchers toward a dozen publications, all in biology, reporting measurements of molecular transitions that match certain DIBs. These studies fall into two categories. The first consists of studies of the visual pigments of diverse animals, including the elephant shark, the dolphin and some butterflies. The observed transitions of these pigments all originate from chromophores, a family of organic molecular groups. The second category deals with the heme molecule (which is at the basis of hemoglobin) or some of its derivatives. It happens that the existence of these two classes of molecules in the interstellar medium has been independently proposed in the literature, but the connection with DIBs had not been made until now.
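As an illustration of what "matching" a measured transition to a DIB can mean in practice, the sketch below compares a list of laboratory wavelengths with a list of DIB positions and keeps the pairs that agree within a tolerance. This is an assumed, simplified criterion, not the procedure used in the study. The DIB values are approximate positions of a few well-known strong bands, in angstroms; the laboratory values and the tolerance are invented for the example, and the function name is hypothetical.

```python
from bisect import bisect_left


def match_transitions(transitions, dib_wavelengths, tolerance):
    """Pair each transition with its nearest DIB if within the tolerance.

    All wavelengths and the tolerance are expressed in the same unit
    (angstroms here). Returns a list of (transition, dib) tuples.
    """
    dibs = sorted(dib_wavelengths)
    matches = []
    for wavelength in transitions:
        # The nearest DIB is one of the two neighbours of the insertion point.
        i = bisect_left(dibs, wavelength)
        neighbours = dibs[max(i - 1, 0):i + 1]
        if not neighbours:
            continue
        nearest = min(neighbours, key=lambda d: abs(d - wavelength))
        if abs(nearest - wavelength) <= tolerance:
            matches.append((wavelength, nearest))
    return matches


# Approximate positions of a few well-known strong DIBs, in angstroms.
DIBS = [4430.0, 5780.0, 5797.0, 6284.0, 6614.0]

# Hypothetical laboratory transitions, invented for illustration only.
lab_transitions = [5781.5, 6000.0, 6612.0]

matches = match_transitions(lab_transitions, DIBS, tolerance=2.0)
print(matches)  # [(5781.5, 5780.0), (6612.0, 6614.0)]
```

A coincidence in wavelength is only a starting point: with hundreds of DIBs across the visible range, chance matches are expected, which is why laboratory follow-up of the candidate molecules remains necessary.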
These results thus point to new research directions for isolating and studying these families of molecules in the laboratory. Building on this proof of concept, the collaboration will now refine the code and explore a larger bibliographic database. This study also demonstrates the relevance of NLP for making sense of interdisciplinary information in order to propose new ideas, a method that will likely become more important with the development of artificial intelligence.