4 PhD topics /DAp/LCS

The galaxy clusters in the XMM-Euclid FornaX deep field

SL-DRF-25-0502

Research field : Astrophysics
Location :

Direction d’Astrophysique (DAP)

Laboratoire CosmoStat (LCS)

Saclay

Starting date : 01-10-2025

Contact :

Marguerite PIERRE
CEA - DRF/IRFU/DAP/LCS

0169083492

Thesis supervisor :

Marguerite PIERRE
CEA - DRF/IRFU/DAP/LCS

0169083492

Laboratory link : https://www.cosmostat.org/

More : https://fornax.cosmostat.org/

The XMM Heritage project on the DEEP Euclid Fornax field aims to characterize distant galaxy clusters by comparing X-ray and optical/IR detections. The two methods call on very different cluster properties; ultimately, their combination will make it possible to set the free parameters of the Euclid cluster selection function over the entire WIDE survey, and thus constitute a fundamental ingredient for Euclid cosmological analysis.

The targeted redshift range (1 < z < 2) has never been systematically explored, despite being a critical regime for the use of clusters in cosmology.
With FornaX, we will for the first time have access to a large volume at these redshifts, enabling us to quantify cluster evolution statistically: What role do AGNs play in the properties of the intracluster gas? Are there massive gas-deficient clusters? What are the respective biases of X-ray and optical detection?
The thesis work will involve (1) building and validating the X-ray cluster catalog; (2) correlating it with the optical/IR catalogs obtained by Euclid; and (3) studying the combined X-ray and optical evolution of the clusters.


All the algorithms for detecting and characterizing clusters in XMM images already exist, but we will push detection even further using artificial-intelligence techniques that combine spatial and spectral information on sources.
The complex problem of spatially correlating the XMM and Euclid cluster catalogs will also involve AI.
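Before any AI refinement, the cross-matching step of objective (2) can be illustrated with a simple positional match between two cluster catalogs. The sketch below is purely illustrative: the coordinates, the flat-sky approximation and the 1-arcminute matching radius are all assumptions, not the project's actual pipeline.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_catalogs(ra1, dec1, ra2, dec2, radius_deg=1.0 / 60.0):
    """For each source in catalog 1, return the index of the nearest source
    in catalog 2 within `radius_deg`, or -1 if there is none."""
    # Small-field flat-sky approximation: (ra * cos(dec), dec) coordinates.
    xy1 = np.column_stack([ra1 * np.cos(np.radians(dec1)), dec1])
    xy2 = np.column_stack([ra2 * np.cos(np.radians(dec2)), dec2])
    dist, idx = cKDTree(xy2).query(xy1, distance_upper_bound=radius_deg)
    # Unmatched sources come back with dist = inf; map them to -1.
    return np.where(np.isfinite(dist), idx, -1)

# Toy data: three X-ray detections, two optical/IR counterparts.
ra_x, dec_x = np.array([52.10, 52.50, 53.00]), np.array([-28.10, -28.30, -28.50])
ra_o, dec_o = np.array([52.101, 53.20]), np.array([-28.099, -28.70])
print(match_catalogs(ra_x, dec_x, ra_o, dec_o))  # only the first source matches
```

The real problem is harder than a nearest-neighbour query, since X-ray and optical cluster centres can be genuinely offset and detection probabilities differ between the two bands, which is where the AI-based correlation comes in.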

Project website: https://fornax.cosmostat.org/

Source clustering impact on Euclid weak lensing high-order statistics

SL-DRF-25-0341

Research field : Astrophysics
Location :

Direction d’Astrophysique (DAP)

Laboratoire CosmoStat (LCS)

Saclay

Starting date : 01-10-2025

Contact :

Natalia Porqueres
CEA - DRF

+33169085764

Thesis supervisor :

Jean-Luc STARCK
CEA - DRF/IRFU/DAP/LCS

01 69 08 57 64

Personal web page : http://jstarck.cosmostat.org

Laboratory link : http://www.cosmostat.org

More : https://www.physics.ox.ac.uk/our-people/porqueres

In the coming years, the Euclid mission will provide measurements of the shapes and positions of billions of galaxies with unprecedented precision. As the light from the background galaxies travels through the Universe, it is deflected by the gravity of cosmic structures, distorting the apparent shapes of galaxies. This effect, known as weak lensing, is the most powerful cosmological probe of the next decade, and it can answer some of the biggest questions in cosmology: What are dark matter and dark energy, and how do cosmic structures form?
The standard approach to weak lensing analysis is to fit the two-point statistics of the data, such as the correlation function of the observed galaxy shapes. However, this data compression is sub-optimal and discards large amounts of information. This has led to the development of several approaches based on high-order statistics, such as third moments, wavelet phase harmonics and field-level analyses. These techniques provide more precise constraints on the parameters of the cosmological model (Ajani et al. 2023). However, with their increasing precision, these methods become sensitive to systematic effects that were negligible in standard two-point analyses.
One of these systematics is source clustering, which refers to the non-uniform distribution of the galaxies observed in weak lensing surveys. Rather than being uniformly distributed, the observed galaxies trace the underlying matter density. This clustering causes a correlation between the lensing signal and the galaxy number density, leading to two effects: (1) it modulates the effective redshift distribution of the galaxies, and (2) it correlates the galaxy shape noise with the lensing signal. Although this effect is negligible for two-point statistics (Krause et al. 2021, Linke et al. 2024), it significantly impacts the results of high-order statistics (Gatti et al. 2023). Therefore, accurate modelling of source clustering is critical to applying these new techniques to Euclid’s weak lensing data.
In this project, we will develop an inference framework to model source clustering and assess its impact on cosmological constraints from high-order statistics. The objectives of the project are:
1. Develop an inference framework that populates dark matter fields with galaxies, accurately modelling the non-uniform distribution of background galaxies in weak lensing surveys.
2. Quantify the source clustering impact on the cosmological parameters from wavelet transforms and field-level analyses.
3. Incorporate source clustering in emulators of the matter distribution to enable accurate data modelling in the high-order statistics analyses.
With these developments, this project will improve the accuracy of cosmological analyses and the realism of the data modelling, making high-order statistics analyses possible for Euclid data.
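A back-of-the-envelope sketch of effect (1), clustering modulating the effective redshift distribution, may help fix ideas. Everything here is an illustrative assumption: the fiducial n(z), the linear bias value, and the amplitude of the line-of-sight overdensities.

```python
import numpy as np

rng = np.random.default_rng(0)
z = np.linspace(0.1, 2.0, 20)                        # redshift slices
n_uniform = np.exp(-0.5 * ((z - 0.9) / 0.4) ** 2)    # fiducial n(z)
n_uniform /= n_uniform.sum()

bias = 1.5                                  # assumed linear galaxy bias
delta = 0.2 * rng.standard_normal(z.size)   # toy line-of-sight overdensities
n_eff = np.clip(n_uniform * (1.0 + bias * delta), 0.0, None)
n_eff /= n_eff.sum()                        # clustered (effective) n(z)

# Source clustering shifts the effective mean redshift of the sample:
print(f"mean z, uniform:   {np.sum(z * n_uniform):.3f}")
print(f"mean z, clustered: {np.sum(z * n_eff):.3f}")
```

In the project, the modulation would of course come from a physically motivated galaxy-matter connection inside the inference framework, not from an ad-hoc random field as above.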

Machine-learning methods for the cosmological analysis of weak gravitational lensing images from the Euclid satellite

SL-DRF-25-0367

Research field : Astrophysics
Location :

Direction d’Astrophysique (DAP)

Laboratoire CosmoStat (LCS)

Saclay

Starting date : 01-10-2025

Contact :

Martin Kilbinger
CEA - DRF/IRFU/DAp/LCS

21753

Thesis supervisor :

Samuel Farrens
CEA - DRF/IRFU/DAP/LCS

28377

Personal web page : http://www.cosmostat.org/people/kilbinger

Laboratory link : http://www.cosmostat.org

Weak gravitational lensing, the distortion of the images of high-redshift galaxies by foreground large-scale matter structures, is one of the most promising tools of cosmology to probe the dark sector of the Universe. The statistical analysis of lensing distortions can reveal the dark-matter distribution on large scales. The European space satellite Euclid will measure cosmological parameters to unprecedented accuracy. To achieve this ambitious goal, a number of sources of systematic errors have to be quantified and understood. One of the main origins of bias is related to the detection of galaxies: there is a strong dependence on the local number density and on whether a galaxy's light emission overlaps with nearby objects. If not handled correctly, such "blended" galaxies will strongly bias any subsequent measurement of weak-lensing image distortions.
The goal of this PhD is to quantify and correct weak-lensing detection biases, in particular those due to blending. To that end, modern machine- and deep-learning algorithms, including auto-differentiation techniques, will be used. These techniques allow for a very efficient estimation of the sensitivity of biases to galaxy and survey properties without the need to create a vast number of simulations. The student will carry out cosmological parameter inference on Euclid weak-lensing data. Bias corrections developed during this thesis will be included either a priori, in the galaxy shape measurements, or a posteriori, as nuisance parameters. This will lead to measurements of cosmological parameters with the reliability and robustness required for precision cosmology.
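To make the sensitivity idea concrete, here is a minimal numerical sketch with a purely illustrative quadratic bias model. In the thesis the derivative would come from auto-differentiating the actual measurement pipeline (e.g. with JAX); here a central finite difference stands in for it, and the model and its parameters are assumptions.

```python
import numpy as np

def blending_bias(density, m0=-0.01, alpha=0.05):
    """Illustrative multiplicative shear bias growing with local number density."""
    return m0 + alpha * density**2

def sensitivity(density, eps=1e-5):
    """Finite-difference stand-in for an auto-differentiated dm/d(density)."""
    return (blending_bias(density + eps) - blending_bias(density - eps)) / (2 * eps)

densities = np.array([0.5, 1.0, 2.0])  # local number densities (arbitrary units)
print("bias m(d):", blending_bias(densities))
print("dm/dd:    ", sensitivity(densities))  # analytically 2 * alpha * d
```

The appeal of auto-differentiation is precisely that such derivatives come for free through an arbitrarily complex pipeline, instead of requiring a grid of image simulations at perturbed galaxy and survey properties.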

Generative AI for Robust Uncertainty Quantification in Astrophysical Inverse Problems

SL-DRF-25-0514

Research field : Astrophysics
Location :

Direction d’Astrophysique (DAP)

Laboratoire CosmoStat (LCS)

Saclay

Starting date : 01-10-2025

Contact :

Tobias LIAUDAT
CEA - DRF/IRFU/DEDIP

07 83 88 91 52

Thesis supervisor :

François LANUSSE
CEA - DRF/IRFU/DAp

+33 6 70 76 38 33

Personal web page : https://flanusse.net

Laboratory link : https://www.cosmostat.org

More : https://tobias-liaudat.github.io

Context
Inverse problems, i.e. estimating underlying signals from corrupted observations, are ubiquitous in astrophysics, and our ability to solve them accurately is critical to the scientific interpretation of the data. Examples of such problems include inferring the distribution of dark matter in the Universe from gravitational lensing effects [1], or component separation in radio interferometric imaging [2].

Thanks to recent advances in deep learning, and in particular in deep generative modeling (e.g. diffusion models), it is now possible not only to estimate a solution to these inverse problems, but also to perform uncertainty quantification by estimating the full Bayesian posterior of the problem, i.e. to access all solutions that are both allowed by the data and plausible under prior knowledge.

Our team has in particular been pioneering such Bayesian methods to combine our knowledge of the physics of the problem, in the form of an explicit likelihood term, with data-driven priors implemented as generative models. This physics-constrained approach ensures that solutions remain compatible with the data and prevents “hallucinations” that typically plague most generative AI applications.

However, despite remarkable progress over the last years, several challenges still remain in the aforementioned framework, and most notably:

[Imperfect or distributionally shifted prior data] Building data-driven priors typically requires access to examples of non-corrupted data, which in many cases do not exist (e.g. all astronomical images are observed with noise and some amount of blurring), or which may exhibit distribution shifts with respect to the problems the prior is applied to.
This mismatch can bias estimations and lead to incorrect scientific conclusions. The adaptation, or calibration, of data-driven priors from incomplete and noisy observations is therefore crucial for working with real data in astrophysical applications.

[Efficient sampling of high-dimensional posteriors] Even when the likelihood and the data-driven prior are available, efficiently sampling from non-convex, multimodal probability distributions in such high dimensions remains a challenging problem. The most effective methods to date are based on diffusion models, but they rely on approximations and can be expensive at inference time to reach accurate estimates of the desired posteriors.
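As a one-dimensional toy of the sampling problem, annealed Langevin dynamics with an explicit Gaussian likelihood and Gaussian prior can be sketched as follows. In the thesis the prior score would come from a diffusion model; here both scores are analytic, and the temperature schedule and step size are illustrative choices.

```python
import numpy as np

def grad_log_posterior(x, y, sigma_lik=0.5, sigma_prior=2.0):
    # d/dx [ log N(y | x, sigma_lik^2) + log N(x | 0, sigma_prior^2) ]
    return (y - x) / sigma_lik**2 - x / sigma_prior**2

def annealed_langevin(y, n_temps=10, n_steps=100, step=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal()
    for T in np.linspace(5.0, 1.0, n_temps):  # cool the tempered posterior p^(1/T)
        for _ in range(n_steps):
            x += step * grad_log_posterior(x, y) / T
            x += np.sqrt(2.0 * step) * rng.standard_normal()
    return x

# For y = 1, the exact posterior mean is y * sigma_prior^2 / (sigma_prior^2 + sigma_lik^2) ~ 0.94.
samples = np.array([annealed_langevin(1.0, seed=s) for s in range(100)])
print(f"posterior mean estimate: {samples.mean():.2f}")
```

Starting at a high temperature flattens the target so the chain can traverse between modes, and the schedule then anneals down to the true posterior; the difficulty addressed in the project is doing this reliably when the posterior is high-dimensional and the prior score is only available through a learned model.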

The stringent requirements of scientific applications are a powerful driver for improved methodologies, but beyond the astrophysical scientific context motivating this research, these tools also find broad applicability in many other domains, including medical images [3].


PhD project
The candidate will address these limitations of current methodologies, with the overall goal of making uncertainty quantification for large-scale inverse problems faster and more accurate.
As a first direction of research, we will extend recent methodology concurrently developed by our team and our Ciela collaborators [4,5], based on Expectation-Maximization, to iteratively learn (or adapt) diffusion-based priors to data observed under some amount of corruption. This strategy has been shown to be effective at correcting for distribution shifts in the prior (and therefore leading to well calibrated posteriors). However, this approach is still expensive as it requires iteratively solving inverse problems and retraining the diffusion models, and is critically dependent on the quality of the inverse problem solver. We will explore several strategies including variational inference and improved inverse problem sampling strategies to address these issues.
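In the fully Gaussian one-dimensional case, the Expectation-Maximization idea of [4,5] collapses to a few lines, which may help fix ideas: the Gaussian "prior" below stands in for the diffusion model, the E-step posterior is analytic, and all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true, tau_true, sigma = 3.0, 1.0, 0.8      # true prior mean/std, known noise std
x = rng.normal(mu_true, tau_true, size=2000)  # latent clean signals
y = x + rng.normal(0.0, sigma, size=x.size)   # corrupted observations

mu, tau2 = 0.0, 4.0                           # initial (misspecified) prior
for _ in range(50):
    # E-step: analytic Gaussian posterior of each latent x_i given y_i.
    w = tau2 / (tau2 + sigma**2)
    post_mean = mu + w * (y - mu)
    post_var = w * sigma**2
    # M-step: refit the prior parameters to the posterior moments.
    mu = post_mean.mean()
    tau2 = post_var + post_mean.var()

print(f"learned prior: mu = {mu:.2f}, tau = {np.sqrt(tau2):.2f}")
```

With a diffusion-model prior, the E-step becomes an expensive inverse-problem sampling step and the M-step a retraining of the network, which is exactly why the quality and cost of the inverse-problem solver dominate the practicality of the approach.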
As a second (but connected) direction we will focus on the development of general methodologies for sampling complex posteriors (multimodal/complex geometries) of non-linear inverse problems. Specifically we will investigate strategies based on posterior annealing, inspired from diffusion model sampling, applicable in situations with explicit likelihoods and priors.
Finally, we will apply these methodologies to some challenging and high impact inverse problems in astrophysics, in particular in collaboration with our colleagues from the Ciela institute, we will aim to improve source and lens reconstruction of strong gravitational lensing systems.
Publications in top machine learning conferences are expected (NeurIPS, ICML), as well as publications of the applications of these methodologies in astrophysical journals.

References
[1] Benjamin Remy, Francois Lanusse, Niall Jeffrey, Jia Liu, Jean-Luc Starck, Ken Osato, Tim Schrabback, Probabilistic Mass Mapping with Neural Score Estimation, https://www.aanda.org/articles/aa/abs/2023/04/aa43054-22/aa43054-22.html

[2] Tobías I Liaudat, Matthijs Mars, Matthew A Price, Marcelo Pereyra, Marta M Betcke, Jason D McEwen, Scalable Bayesian uncertainty quantification with data-driven priors for radio interferometric imaging, RAS Techniques and Instruments, Volume 3, Issue 1, January 2024, Pages 505–534, https://doi.org/10.1093/rasti/rzae030

[3] Zaccharie Ramzi, Benjamin Remy, Francois Lanusse, Jean-Luc Starck, Philippe Ciuciu, Denoising Score-Matching for Uncertainty Quantification in Inverse Problems, https://arxiv.org/abs/2011.08698

[4] François Rozet, Gérôme Andry, François Lanusse, Gilles Louppe, Learning Diffusion Priors from Observations by Expectation Maximization, NeurIPS 2024, https://arxiv.org/abs/2405.13712

[5] Gabriel Missael Barco, Alexandre Adam, Connor Stone, Yashar Hezaveh, Laurence Perreault-Levasseur, Tackling the Problem of Distributional Shifts: Correcting Misspecified, High-Dimensional Data-Driven Priors for Inverse Problems, https://arxiv.org/abs/2407.17667
