*******************************************************
Welcome to the Photometric Redshift (PZ) Data Challenge
*******************************************************

The Dark Energy Science Collaboration (DESC) invites researchers, data scientists, and astronomers to participate in the Photometric Redshift (PZ) Data Challenge, a collaborative effort to advance methods for estimating the distances to distant galaxies. Photometric redshifts, derived from multi-band brightness measurements, are essential for cosmological surveys like the Legacy Survey of Space and Time (LSST), enabling us to map the universe's structure and probe the nature of dark energy.

This challenge provides a unique opportunity to test and benchmark algorithms on realistic simulated data, compare approaches across diverse methodologies (from template fitting to machine learning), and help shape the tools that will unlock discoveries from next-generation sky surveys.

The challenge is framed as a series of sets of PZ estimation tasks using increasingly realistic data.

`Set up the pz_data_challenge package and download challenge data `_

`Information about the input data `_

`How to submit an entry to the challenge `_

`Description of challenge tasks `_

`Assessment Metrics `_

`Details about challenge data preparation `_

`Write-up of the PZ Data Challenge documentation `_

**********************
Background Information
**********************

.. _intro:

Introduction
============

Redshift inference is a key element of many DESC science goals, and redshift uncertainty is one of the leading contributors to overall uncertainty on cosmological models from imaging survey data. Precursor surveys took a variety of approaches to this problem, accounting for differences in underlying data as well as modeling approaches. In all cases, redshift uncertainty was significantly larger than the requirements listed in the LSST DESC Science Requirements Document.
This state of the art motivates a data challenge to characterize and improve existing methods, as well as to provide infrastructure for the development of improved methods. Overall, this requires generating uniform input catalogs and building infrastructure for comparing output redshift posteriors to each other and to simulated truth catalogs.

.. _redshift_basics:

Photometric redshift basics
===========================

Photometric redshift estimation involves taking a catalog of galaxies that have been observed in several different filters, with a measured brightness in each band, and using that information to estimate the redshift of each galaxy. For LSST we expect to have measurements in 6 bands (u, g, r, i, z, and y), covering a wavelength range from approximately 320 to 1050 nanometers. For the Roman space telescope, the coverage will extend from about 500 to 2300 nanometers.

Much of the information used to estimate photometric redshifts derives from the "Balmer break" present in the rest frame of many spectra at 400 nm. As the break crosses into different optical filters with increasing redshift, the differences in magnitudes between filters carry information about the redshift:

.. figure:: figures/static_balmer.png
   :alt: image
   :width: 80.0%

   A passive galaxy at different redshifts and how it will show up in various optical filters, giving us the ability to estimate its redshift and therefore distance. For many galaxies, the so-called 'Balmer break' at 400 nm is a reliable feature that causes the flux to drop severely in bluer filters. Figure and caption by Jamie McCullough.

This can also be seen when plotting redshifts as a function of derived colors, i.e., differences in magnitudes between filters:

.. container:: figure*

   |image| |image1|

   Redshifts plotted as a function of r-i versus g-r colors for a sample of objects in the cardinal (left) and flagship (right) simulations. These are plotted for the data for task set 1, i.e., for a sample of objects with i < 23.

This overly simple picture is complicated somewhat by the fact that different galaxies have different intrinsic spectra and colors:

.. figure:: figures/gr_vs_sz_sidebyside.jpg
   :alt: image
   :width: 80.0%

   Color (g-r) plotted as a function of redshift for a sample of objects in the cardinal (top) and flagship (bottom) simulations. These are plotted for the data for task set 1, i.e., for a sample of objects with i < 23. The overlaid lines show the templates for several different types of galaxies.

This is further complicated by the fact that reference redshifts, typically obtained by spectroscopy, slitless spectroscopy (i.e., grism measurements), or narrowband photometric measurements, are not a representative sample, as they are much easier to obtain for brighter objects. Depending on the method used to obtain the reference redshifts, they are also susceptible to errors such as confusing different spectral lines or confusion of blended objects. Some of the tasks in this data challenge encourage participants to try to address these complications.

.. _challenge_format:

***************************************
Information about the PZ Data Challenge
***************************************

Challenge Format
================

The PZ data challenge comprises a series of sets of tasks for participants. Submissions will be evaluated to determine how ready various algorithms are to be used for cutting-edge analysis based on how well they perform on the various tasks. Readiness will be evaluated on a few different fronts:

1) Does the algorithm meet performance requirements?
2) Is it robust, flexible, and relatively easy to use on different datasets?
3) Can it scale to the data volumes at which we will need to run it?

This document and the associated web pages describe the data being provided to participants, the tasks they will be asked to perform, the expected format for submissions, and the metrics by which algorithm readiness will be evaluated.

Scope and Timeline
------------------

The data challenge will include two major parts, with a set of tasks emulating increasingly realistic scenarios in each part. The first part, :math:`p(z)` estimation, will focus on estimating the redshift of individual objects. The second part, tomography and :math:`n(z)` estimation, will focus on assigning objects to tomographic bins and estimating the distribution of redshifts in each bin.

The data challenge will run from April 13, 2026 to September 2026. The first set of data and tasks related to :math:`p(z)` estimation will be released on April 13 and will close on July 17, 2026. A second set of data and tasks related to more realistic :math:`p(z)` estimation scenarios and :math:`n(z)` estimation will be released on June 1 and will close in September 2026. Preliminary results will be released in August 2026, with a technical note summarizing those results to follow shortly thereafter and a comprehensive journal publication to follow later.

.. _installation_and_setup:

Installing and setting up the ``pz_data_challenge`` package
-----------------------------------------------------------

The ``pz_data_challenge`` package will provide participants with tools to access data, set up submissions, estimate performance metrics, and format submissions. This can be set up with a few small variants on the standard ``GitHub`` package setup procedure. Before starting you should pick a name for your submission, e.g., ``example``.
::

   # Create a conda environment and activate it
   conda create --name pzdc python=3.13
   conda activate pzdc

   # Clone the pz_data_challenge repository (or your fork of the repository)
   git clone git@github.com:LSSTDESC/pz_data_challenge.git
   # or
   git clone https://github.com/LSSTDESC/pz_data_challenge.git

   # Go into the directory
   cd pz_data_challenge

   # Install the code in "editable" mode
   pip install -e ".[dev]"

   # Use the provided script to set up your submission.
   # Here you should provide the name of your submission
   python scripts/prepare_submission.py

This final step will copy the input data files to ``pz_data_challenge/public``, and set up the three files you will need to submit your entry.

The notebooks in the ``pz_data_challenge/nb`` area give examples of how to access the data and create some of the diagnostic plots that were used to validate the data.

Submission mechanism
--------------------

Submissions take the form of a pull request on the ``pz_data_challenge`` repository. Detailed instructions on how to submit an entry are provided in Sec. :ref:`5 ` of this document.

.. _challenge_input_data:

Challenge Input Data
====================

The preparation of the challenge data is described in the appendices. The data are available as a ``tar`` archive that is downloaded and unpacked as part of the ``pz_data_challenge`` setup procedure.

Each task set in the data challenge has an associated set of files. Typically these will be a collection of training files that contain photometric data and reference redshifts, and a second set of test files that contain photometric data but do not include redshifts. Each task set will involve estimating something about the redshifts or redshift distributions in the test files. Typically there will be several training and test files for a particular task set, covering different scenarios and using different input simulations.

Input data format
-----------------

The input data for the challenge are presented in HDF5 files.
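
As a quick orientation, the following is a minimal sketch of how one might list the unpacked challenge files and look at what they contain. It assumes the files sit in the ``public`` directory created by the setup script and that the ``h5py`` package is installed. The helper names ``find_hdf5_files`` and ``describe_hdf5`` are hypothetical and are not part of ``pz_data_challenge``. Nothing is assumed about the internal layout of the files: the sketch simply walks whatever groups and datasets it finds.

::

   from pathlib import Path


   def find_hdf5_files(data_dir="public"):
       """Return the sorted list of ``*.hdf5`` files found in ``data_dir``."""
       return sorted(Path(data_dir).glob("*.hdf5"))


   def describe_hdf5(path):
       """Print the name, shape and dtype of every dataset in one HDF5 file."""
       # Imported here so that the file listing above works without h5py.
       import h5py

       path = Path(path)

       def show(name, obj):
           # visititems calls this for every group and dataset in the file;
           # only datasets carry a shape and a dtype.
           if isinstance(obj, h5py.Dataset):
               print(f"  {name}: shape={obj.shape}, dtype={obj.dtype}")

       with h5py.File(path, "r") as handle:
           print(path.name)
           handle.visititems(show)


   if __name__ == "__main__":
       # Assumed location of the unpacked data; adjust to your checkout.
       for hdf5_path in find_hdf5_files("public"):
           describe_hdf5(hdf5_path)

Run from the top of the repository, this prints one line per dataset, which is a convenient way to check the column names before writing any estimation code.
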

The naming convention for the files is ``{challenge}_{taskset}_{simulation}_{label}_{scenario}.hdf5``. The meanings of the various fields are:

.. container::
   :name: file_fields

   .. table:: Fields in the input file names.

      ========== ==========================================================
      Field      Description
      ========== ==========================================================
      challenge  Challenge associated with file
      taskset    Task set associated with file
      simulation Simulation used to produce file ("cardinal" or "flagship")
      label      File label (e.g., "test", "training")
      scenario   Data scenario (e.g., "1yr", "10yr")
      ========== ==========================================================

The columns in the files are:

.. container::
   :name: columns

   .. table:: Contents of input files.

      ==================== =====================================
      Column               Description
      ==================== =====================================
      redshift             True redshift (training files only)
      ra                   Right ascension (training files only)
      dec                  Declination (training files only)
      object_id            Unique object ID
      mag_{band}_lsst      Magnitude in LSST {band}
      mag_{band}_lsst_err  Magnitude uncertainty in LSST {band}
      mag_{band}_roman     Magnitude in Roman {band}
      mag_{band}_roman_err Magnitude uncertainty in Roman {band}
      ==================== =====================================

Non-detections are marked with ``np.nan`` in the magnitude columns.

The ``tables-io`` package installed with ``pz_data_challenge`` provides a command line interface to convert files from ``hdf5`` format to other formats such as ``parquet`` tables or ``pandas`` data frames.

::

   # convert a hdf5 file to pandas dataframe in a parquet file
   tables-io convert --input public/pz_challenge_taskset_1_cardinal_test_10yr.hdf5 --output public/pz_challenge_taskset_1_cardinal_test_10yr.pq

.. _challenge_submissions:

Challenge Submissions
=====================

Challenge subtask types
-----------------------

The challenge is organized as a series of sets of tasks using increasingly realistic representations of the data. In general, each set of tasks includes 3 subtasks.

#. Estimate either per-object :math:`p(z)` or ensemble :math:`n(z)` distributions for a set of different scenarios and provide the estimates in a specified format.
#. Provide trained models for the different scenarios and a Python function that can be used to generate the estimates from subtask 1 on an arbitrary dataset.
#. Provide a Python function that can be used to generate the models and estimates from subtasks 1 and 2 on arbitrary datasets.

The :math:`p(z)` estimates in subtask 1 and the trained models in subtask 2 should be provided in a compressed ``tar`` file, whose contents are described below. Templates and instructions for the Python functions needed for subtasks 2 and 3 will be provided and are described below.

Data format for per-object :math:`p(z)` estimates
-------------------------------------------------

The :math:`p(z)` estimates should be submitted in ``qp`` format, which allows users to specify a complete :math:`p(z)` distribution for each object, as well as summary statistics for each object. The ``qp`` package supports several different representations of :math:`p(z)`, such as different functional forms as well as interpolated grids, histograms, and others. For users unfamiliar with ``qp``, we highly recommend representing the :math:`p(z)` as either an interpolated grid or a Gaussian mixture model.

::

   # Interpolated grid
   import qp
   import numpy as np

   # Define the x-grid. Note that we put all the
   # p(z) on the same x-grid
   xvals = np.array([0, 0.5, 1, 1.5, 2])

   # Define the y-values. Note we provide n_grid_points x n_objects
   # values, as we need to provide a y-value at each grid point
   # for each object. The array has one row per object, i.e.,
   # shape (n_objects, n_grid_points).
   yvals = np.array(
       [
           [0.01, 0.2, 0.3, 0.2, 0.01],
           [0.1, 0.3, 0.5, 0.2, 0.05],
       ]
   )
   ensemble = qp.interp.create_ensemble(xvals, yvals)
   # Replace this with the output file name for your submission
   ensemble.write_to("pz_challenge_taskset_1_cardinal_pz_estimate_1yr.hdf5")

::

   # Mixture model
   import qp
   import numpy as np

   # Define the means, standard deviations, and weights.
   # These should each have shape n_objects, n_components.
   # In this case we are defining 3 objects with 2-Gaussian
   # representations.
   # For each object the weights should sum to 1, or they
   # will be normalized.
   means = [[0.3, 0.4], [0.5, 0.5], [0.6, 0.8]]
   stds = [[0.2, 0.4], [0.1, 0.3], [0.05, 0.3]]
   weights = [[0.8, 0.2], [0.7, 0.3], [0.8, 0.2]]
   ensemble = qp.mixmod.create_ensemble(means=means, stds=stds, weights=weights)
   # Replace this with the output file name for your submission
   ensemble.write_to("pz_challenge_taskset_1_cardinal_pz_estimate_1yr.hdf5")

The submission files should use the same file name conventions defined in Tab. `1 `__. The labels will typically be ``pz_estimate`` or ``pz_model`` and will be specified in the descriptions of the various tasks, e.g., ``pz_challenge_taskset_1_cardinal_pz_estimate_1yr.hdf5`` or ``pz_challenge_taskset_1_cardinal_pz_model_1yr.pkl``.

All of these files should then be joined into a ``tar`` file, which should then be placed somewhere it can be downloaded. The URL for the ``tar`` file should be specified in ``tests/test_{submission}.py``:

::

   SUBMISSION_NAME = "example"
   SUBMISSION_URL = "https://your.institution.edu/submit_example.tgz"

Format for estimation-only Python functions and trained models
--------------------------------------------------------------

For the second subtask, submissions should provide trained models and implement a function to run estimation using those trained models on the test files provided for each task set.
The function will look something like this:

::

   from pathlib import Path

   def run_taskset_1_estimation_only(
       model_file: str | Path,
       test_file: str | Path,
       output_file: str | Path,
   ) -> None:
       # do stuff and write p(z) estimates to "output_file"
       ...

or

::

   from pathlib import Path

   def run_taskset_2_estimation_only(
       model_file: str | Path,
       test_file: str | Path,
       output_file: str | Path,
   ) -> None:
       # do stuff and write p(z) estimates to "output_file"
       ...

Templates for these functions are provided in the file ``tests/test_{submission}.py`` created as part of the setup.

Format for training and estimation Python functions
---------------------------------------------------

For the third subtask, submissions should implement a function to train models and run estimation using those trained models on the training and test files provided for each task set. The function will look something like this:

::

   from pathlib import Path

   def run_taskset_1_training_and_estimation(
       train_file: str | Path,
       test_file: str | Path,
       output_file: str | Path,
   ) -> None:
       # train a model using the "train_file" and make p(z) estimates
       # and write them to "output_file"
       ...

or

::

   from pathlib import Path

   def run_taskset_2_training_and_estimation(
       train_file: str | Path,
       test_file: str | Path,
       output_file: str | Path,
   ) -> None:
       # train a model using the "train_file" and make p(z) estimates
       # and write them to "output_file"
       ...

Templates for these functions are provided in the file ``tests/test_{submission}.py`` created as part of the setup.

.. _submission-mechanism-1:

Submission mechanism details
----------------------------

Submissions will take the form of a pull request on the ``pz_data_challenge`` repository and will include:

#. A file ``tests/test_{submission}.py`` that includes the URL from which the compressed ``tar`` file should be downloaded as well as the Python functions for subtasks 2 and 3. When created this will contain empty placeholder functions that will need to be implemented.
#. A file ``requirements_{submission}.txt`` that should be modified to include the ``pip`` package names of any packages that need to be installed in order to run the functions in subtasks 2 and 3.
#. A file ``.github/workflows/submit_{submission}.yaml`` to run the submission validation in a GitHub action. This should not need to be modified unless the prerequisite installation requires more than just ``pip`` installing packages.

All three of these files are created by the ``scripts/prepare_submission.py`` script. You will need to modify ``tests/test_{submission}.py`` to give the location of the ``tar`` file containing the PZ estimates and trained models.

See ``_ for an example of a submission.

Submission validation
---------------------

The wrapping functions provided in the ``tests/test_{submission}.py`` file implement a number of checks on the data. Specifically, for each expected file they check that:

#. the file exists;
#. the file contains a valid ``qp`` ensemble;
#. the ``qp`` ensemble includes ancillary data;
#. the ancillary data includes a ``zmode`` column with redshift estimates;
#. the ancillary data includes an ``object_id`` column;
#. the object IDs in the submission file match those in the associated test file.

If any of these checks fail, the GitHub action triggered by the submission will fail and report the cause of the failure.

The easiest way to test that you have correctly implemented the required functions is simply to run these commands.

::

   # Make sure that you have installed any packages you need
   pip install -r requirements_{submission_name}.txt

   # Run the functions you have provided as unit tests
   py.test tests/test_{submission_name}.py

If this succeeds, you can use a provided script to help you open the pull request for your submission.

::

   # run the submission helper script.
   python scripts/submit.py {submission_name}

Note that the helper script only prints the required commands; it does not run them.
In short, the commands are:

::

   # Check status of your local git clone by running git status, and make
   # sure that you are on the branch submit/{submission_name} and do not
   # have any files added or modified
   git status

   # Add your files to git
   git add .github/workflows/submit_{submission_name}.yaml requirements_{submission_name}.txt tests/test_{submission_name}.py

   # Commit your files to your branch:
   git commit -m "Submitting {submission_name}" .github/workflows/submit_{submission_name}.yaml requirements_{submission_name}.txt tests/test_{submission_name}.py

   # Push your commit
   git push --set-upstream origin submit/{submission_name}

   # Pushing to git should give you a URL that you can visit to create a
   # pull request, for example:
   # https://github.com/LSSTDESC/pz_data_challenge/pull/new/submit/example
   # Visit that URL and create a pull request, then add the 'submission'
   # label to the PR.

   # Finally, make sure that the github action validating your submission
   # succeeds and fix any issues.

Submission aids
---------------

A few scripts are provided to help you.

- ``scripts/download_public.py``: downloads and unpacks the public data.
- ``scripts/prepare_submission.py``: sets up your area for a submission, creates the needed files from templates, and downloads the public data. It also suggests that you create a branch for your submission.
- ``scripts/remove_submission_files.py``: removes the submission files if you need to start over.
- ``scripts/run_metrics.py``: runs performance metrics on files in a submission you have created.
- ``py.test tests/test_{submission_name}.py``: validates all the parts of your submission, checking that you have created all the required files and that they are properly formatted.

Feedback after submission
-------------------------

Approximately every 2 weeks we will merge the various PRs that are passing the GitHub actions, run performance metrics, and update the website with information about their performance.

.. _metrics:

Metrics and Assessment Criteria
===============================

We will use a number of different metrics to assess the performance of the submitted algorithms. Many of these metrics, as well as the motivations behind them, are defined and discussed in `[1] `_.

Metrics for per-object point estimates
--------------------------------------

We assess performance on per-object point estimates, i.e., a single best estimate of the redshift of each object in the test sample. All of our point-estimate metrics first compute the scaled residual, :math:`\Delta_i`, between the point estimate for each object, :math:`z_{p,i}`, and the true redshift for that object, :math:`z_{t,i}`:

.. math::

   \Delta_i = \frac{z_{p,i}-z_{t,i}}{1+z_{t,i}}.

We then use this to construct the following metrics; see `[1] `_ for more details:

- ``Bias`` is simply the median of :math:`\Delta_i`.
- ``OutlierRate`` is the fraction of objects whose :math:`|\Delta_i|` falls outside of :math:`[0, {\rm max}(0.06, 3\sigma_{\rm iqr})]`, i.e., with :math:`|\Delta_i| > {\rm max}(0.06, 3\sigma_{\rm iqr})`, where :math:`\sigma_{\rm iqr}` is the interquartile range of :math:`\Delta` rescaled to the equivalent Gaussian standard deviation.
- ``SigmaMAD`` is a robust estimate of the standard deviation based on the median absolute deviation (MAD), which is computed as

  .. math::

     {\sigma_{\rm MAD}} = 1.4826\,{\rm median}(|\Delta_i - {\rm median}(\Delta)|).

Metrics for per-object :math:`p(z)` distributions
-------------------------------------------------

We will also assess the algorithm's ability to provide a precise and accurate estimate of the posterior distribution, :math:`p(z)`, for each object using the following metrics.

- Conditional Density Estimation loss (``CDELoss``): We implement the method in `[2] `_ to compute this metric. A better estimate should yield a smaller CDE loss.
- Probability Integral Transform (PIT): This is the cumulative distribution function of the photo-:math:`z` PDF evaluated at the galaxy's true redshift for each galaxy in the catalog, i.e., :math:`{\rm PIT} = \int_{0}^{z_t} p(z)\, dz`. Following `[3] `_, we provide the PIT-QQ (quantile-quantile) diagram, where the PIT distribution is directly compared to the ideal uniform distribution. A PIT-QQ curve that follows the diagonal indicates well-calibrated estimates. An example of the PIT-QQ plot is shown below. We will then use three metrics to quantify how well the PIT distribution matches the ideal: the Kolmogorov-Smirnov (KS) test, the Root Mean Square Error (RMSE), and the Kullback-Leibler divergence (KL divergence).

Metrics for computational usability and performance
---------------------------------------------------

We will assess relevant aspects of the computational performance that will affect usability and scaling.

- **Ease of use**: We will assess whether the algorithm is easy to install and can be run on the different task sets without needing excessively complicated additional configuration files.
- **Training time**: How quickly the algorithm trains models, and how this scales with the training sample size. Here we mainly want to ensure that the training time will not dominate the iteration cycle. Taking several minutes to train on 100k objects is fine; taking hours to do so would be problematic.
- **Model size**: How large the trained model files are, and how this scales with the training sample size. Again, we mainly want to ensure that the model size will not tax our resources. If the model files are an order of magnitude larger than the input data files, we might worry.
- **Estimation time**: How quickly the algorithm estimates redshifts per object. This will determine the use cases for which we might use the algorithm. We can run an algorithm that takes a few ms per object on all of the billions of galaxies we will have in the final LSST sample; an algorithm that takes a few seconds per object would probably be restricted to much smaller datasets for specific science cases, such as samples of supernovae or strongly lensed objects.
- **Output data size per object**: How large the output files with the :math:`p(z)` estimates are. For a ``qp`` interpolated grid representation with 300 points, these would be about 2.4 kB per object, which is large but manageable, whereas for a Gaussian mixture model with 5 Gaussians, this would be close to 120 bytes per object.

.. _tasks:

Challenge Tasks related to :math:`p(z)` estimation
==================================================

Task set 1: Estimate redshifts using representative training samples
--------------------------------------------------------------------

The first, simplest task is to estimate redshifts using representative training samples, i.e., training samples drawn from the same distributions as the test samples. For this task set we did not use any of the spectroscopic selection emulation, but simply applied a uniform magnitude cut of :math:`i < 23` in selecting objects for both the training and test samples.

The four ``pz_challenge_taskset_1_{simulation}_training_{scenario}.hdf5`` files are the training sets for the "Flagship" and "Cardinal" simulations, emulating 1 year and 10 years of LSST data under the expected observing strategy and conditions. These files have true redshifts to serve as labels.

The corresponding ``pz_challenge_taskset_1_{simulation}_test_{scenario}.hdf5`` files were drawn from the same distributions. The true redshifts have been removed from these files. The task is to assign :math:`p(z)` estimates for all the objects in these 4 test files.

The subtasks in this task set are:

#. Estimate :math:`p(z)` for each object in each of the test files and provide the estimates in a downloadable ``tar`` file.
#. Provide pre-trained models appropriate to each of the training files and implement a Python function (``run_taskset_1_estimation_only``) to use those pre-trained models to estimate :math:`p(z)` for each object in the associated test files.
#. Implement a Python function (``run_taskset_1_training_and_estimation``) to train a model for each training file and use that model to estimate :math:`p(z)` for each object in the associated test files.

Task set 2: Estimate redshifts on non-representative samples
------------------------------------------------------------

The second, slightly more challenging task is to estimate redshifts using non-representative training samples, i.e., training samples that are not drawn from the same distributions as the test samples. For this task set we applied the spectroscopic selection emulation for the training set, but retained all the objects down to :math:`i < 25.4` in the test set. Accordingly, the training set will not be representative of the fainter objects in the test set. This reflects that spectroscopic redshifts are typically significantly more difficult to obtain than photometry.

The four ``pz_challenge_taskset_2_{simulation}_training_{scenario}.hdf5`` files are the training sets for the "Flagship" and "Cardinal" simulations, emulating 1 year and 10 years of LSST data under the expected observing strategy and conditions and with spectroscopic selections emulated.

The corresponding ``pz_challenge_taskset_2_{simulation}_test_{scenario}.hdf5`` files were drawn from the distributions of all objects down to :math:`i < 25.4`, and the true redshifts have been removed from these files. The task is to assign :math:`p(z)` estimates for all the objects in these 4 test files.

The subtasks in this task set are:

#. Estimate :math:`p(z)` for each object in each of the test files and provide the estimates in a downloadable ``tar`` file.
#. Provide pre-trained models appropriate to each of the training files and implement a Python function (``run_taskset_2_estimation_only``) to use those pre-trained models to estimate :math:`p(z)` for each object in the associated test files.
#. Implement a Python function (``run_taskset_2_training_and_estimation``) to train a model for each training file and use that model to estimate :math:`p(z)` for each object in the associated test files.

Task set 3: Estimate redshifts on non-representative samples, including narrowband photometric redshifts
--------------------------------------------------------------------------------------------------------

The third task is to estimate redshifts using non-representative training samples that more accurately emulate real reference redshift samples. These include some narrowband photometric redshifts from the COSMOS2020 dataset that go deeper than most spectroscopic redshifts, but have more scatter and higher rates of catastrophic outliers.

As before, the four ``pz_challenge_taskset_3_{simulation}_training_{scenario}.hdf5`` files are the training sets for the "Flagship" and "Cardinal" simulations, emulating 1 year and 10 years of LSST data under the expected observing strategy and conditions and with spectroscopic selections emulated. These files include flags showing which spectroscopic survey particular objects would be associated with, and for the COSMOS2020 field, also include a column ``redshift_manyband`` giving the narrowband photometric redshifts in addition to the spectroscopic redshifts. The point of this task set is to find a way to optimally use the additional information from the COSMOS2020 field.

The corresponding ``pz_challenge_taskset_3_{simulation}_test_{scenario}.hdf5`` files were drawn from the distributions of all objects down to :math:`i < 25.4`, and both the true redshifts and the narrowband photometric redshifts have been removed from these files. The task is to assign :math:`p(z)` estimates for all the objects in these 4 test files.

The subtasks in this task set are:

#. Estimate :math:`p(z)` for each object in each of the test files and provide the estimates in a downloadable ``tar`` file.
#. Provide pre-trained models appropriate to each of the training files and implement a Python function (``run_taskset_3_estimation_only``) to use those pre-trained models to estimate :math:`p(z)` for each object in the associated test files.
#. Implement a Python function (``run_taskset_3_training_and_estimation``) to train a model for each training file and use that model to estimate :math:`p(z)` for each object in the associated test files.

Task set 4: Estimate redshifts on non-representative samples, including narrowband photometric redshifts and unrecognized blends of objects
--------------------------------------------------------------------------------------------------------------------------------------------

The fourth and final task in the PZ challenge is to estimate redshifts on a sample of objects that includes "unrecognized blends", i.e., multiple objects that are detected as a single object by the image processing algorithms. As in the previous task sets, the training samples are non-representative, i.e., they are not drawn from the same distributions as the test samples. For this task set we applied the spectroscopic selection emulation for the training set, but retained all the objects down to :math:`i < 25.4` in the test set. Accordingly, the training set will not be representative of the fainter objects in the test set. This reflects that spectroscopic redshifts are typically significantly more difficult to obtain than photometry.

The four ``pz_challenge_taskset_4_{simulation}_training_{scenario}.hdf5`` files are the training sets for the "Flagship" and "Cardinal" simulations, emulating 1 year and 10 years of LSST data. These files are similar to the task set 3 files, except that emulated object blending has been applied.

The corresponding ``pz_challenge_taskset_4_{simulation}_test_{scenario}.hdf5`` files were drawn from the distributions of all objects down to :math:`i < 25.4`, and both the true redshifts and the narrowband photometric redshifts have been removed from these files.
The task is to assign :math:`p(z)` estimates for all the objects in these 4 test files.

The subtasks in this task set are:

#. Estimate :math:`p(z)` for each object in each of the test files and provide the estimates in a downloadable ``tar`` file.
#. Provide pre-trained models appropriate to each of the training files and implement a Python function (``run_taskset_4_estimation_only``) to use those pre-trained models to estimate :math:`p(z)` for each object in the associated test files.
#. Implement a Python function (``run_taskset_4_training_and_estimation``) to train a model for each training file and use that model to estimate :math:`p(z)` for each object in the associated test files.

.. _challenge_data_prep:

********************************************
Information about Challenge Data Preparation
********************************************

.. _input_sims:

Input simulations
=================

The challenge employs simulated galaxy catalogs derived from two complementary N-body cosmological simulations: the Cardinal simulations and the Flagship simulation. These synthetic datasets provide a controlled environment where the true redshifts are known by construction, enabling rigorous validation of photometric redshift algorithms and systematic assessment of their performance characteristics.

The Cardinal simulations comprise a suite of high-resolution N-body simulations specifically designed to explore the sensitivity of cosmological observables to variations in fundamental cosmological parameters. The simulations employ state-of-the-art semi-analytic models to populate dark matter halos with galaxies, incorporating realistic prescriptions for star formation, dust attenuation, and spectral energy distribution modeling.

The Flagship simulation represents a single, ultra-large cosmological simulation run with fiducial cosmological parameters consistent with current observational constraints.
With a volume exceeding several cubic gigaparsecs, the Flagship simulation provides statistical power to probe rare objects and the high-mass end of the galaxy population. Its primary purpose in the photometric redshift challenge is to provide a realistic mock catalog that captures the full complexity of galaxy populations across cosmic time, including correlations between galaxy properties, environmental dependencies, and the intricate relationships between spectral features and redshift.

Together, these complementary simulation suites enable challenge participants to test both the accuracy and the robustness of their photometric redshift estimation methods under realistic observational conditions.

.. _emulating_observations:

Emulating observational effects
===============================

To bridge the gap between the idealized simulation outputs and realistic survey observations, we employ the RAIL (Redshift Assessment Infrastructure Layers) software package to emulate observational effects. RAIL provides a modular framework for injecting realistic photometric uncertainties, applying survey-specific selection functions, and simulating the measurement errors characteristic of modern large-scale imaging surveys. This processing ensures that the simulated galaxy catalogs reflect the complexities of actual observations, including magnitude-dependent photometric scatter, incomplete sky coverage, and the effects of source blending in crowded fields, thereby providing a more stringent and realistic testbed for photometric redshift estimation algorithms.

Photometric Smearing
--------------------

Central to our observational emulation is RAIL’s wrapper around the photometric error module, photErr, which we have extended to account for realistic observing strategies and time-dependent survey conditions.
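
As a rough illustration of what a magnitude-dependent error model looks like, the sketch below implements the commonly used LSST point-source error formula, in which the random error is set by the ratio of the 5-sigma limiting flux to the source flux and a small systematic floor is added in quadrature. It does not call photErr or RAIL. The function names, the values of ``gamma`` and ``sigma_sys``, and the example depths are assumptions made for illustration only, and adding Gaussian scatter directly in magnitudes is a simplification.

.. code-block:: python

   import numpy as np


   def lsst_like_mag_error(mag, m5, gamma=0.039, sigma_sys=0.005):
       """Return the 1-sigma magnitude error for a source of magnitude `mag`.

       m5 is the 5-sigma limiting magnitude (e.g. the local coadded depth).
       gamma and sigma_sys are illustrative defaults, not the challenge settings.
       """
       mag = np.asarray(mag, dtype=float)
       # Ratio of the 5-sigma limiting flux to the source flux.
       x = 10.0 ** (0.4 * (mag - m5))
       sigma_rand_sq = (0.04 - gamma) * x + gamma * x**2
       # Random and systematic terms add in quadrature.
       return np.sqrt(sigma_sys**2 + sigma_rand_sq)


   def smear_magnitudes(mag, m5, rng):
       """Add Gaussian scatter in magnitude space (a simplification)."""
       sigma = lsst_like_mag_error(mag, m5)
       return mag + rng.normal(0.0, sigma), sigma


   rng = np.random.default_rng(0)
   true_mag = np.array([22.0, 24.0, 25.4])
   # Two hypothetical sky positions with different coadded depth.
   for m5 in (25.0, 26.5):
       observed, sigma = smear_magnitudes(true_mag, m5, rng)
       print(f"m5={m5}: sigma={np.round(sigma, 3)}")

With the systematic floor set to zero, a source exactly at the 5-sigma depth has an error of 0.2 mag in this model, and the same source gets a smaller error wherever the coadded depth is greater.
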
The standard photErr module provides basic photometric error modeling based on magnitude-dependent noise characteristics, but our enhanced version incorporates additional complexity including spatially varying depth maps. This wrapper accesses detailed operational simulation outputs that emulate the expected LSST survey strategy.

Our photErr implementation computes photometric uncertainties by combining the intrinsic Poisson noise from source photons with realistic models of sky background, readout noise, and other systematic contributions. For each simulated galaxy, we use the expected coadded depth to derive final photometric error estimates. This approach captures the heterogeneous nature of survey depth across the footprint, where some regions benefit from numerous high-quality exposures while others may be observed only during poor conditions. The resulting photometric uncertainties vary realistically with position on the sky, band-dependent limiting magnitudes, and local observing history, providing challenge participants with mock catalogs whose noise properties more closely match those expected from the actual survey.

Spectroscopic and narrowband photometric redshift selection
-----------------------------------------------------------

RAIL can emulate the selection functions of several different spectroscopic redshift surveys, including VVDSf02, zCOSMOS, DEEP2_LSST, and the DESI BGS, ELG, and LRG samples. We can also use RAIL to emulate narrowband photometric surveys and include small amounts of mislabeled reference redshifts.

.. _prep_process:

Preparing Training, Test, and Reserved Datasets
===============================================

All of the data preparation was performed using the ``rail_projects`` and ``rail_package_config`` packages for bookkeeping and reproducibility.

.. container::
   :name: prep_scripts

   .. table:: Scripts used in data preparation.

      +-----------------+------------------------------------+-----------------------------+
      | Script          | Command Run                        | Purpose                     |
      +=================+====================================+=============================+
      | do_00_reduce    | rail-project reduce                | Reduce input truth catalogs |
      |                 |                                    | (mag. cut and drop columns) |
      +-----------------+------------------------------------+-----------------------------+
      | do_01_build     | rail-project build                 | Build configurations to run |
      |                 |                                    | truth-to-observed pipeline  |
      +-----------------+------------------------------------+-----------------------------+
      | do_02_t2o       | rail-project run truth-to-observed | Run truth-to-observed       |
      |                 |                                    | pipelines to make degraded  |
      |                 |                                    | catalogs                    |
      +-----------------+------------------------------------+-----------------------------+
      | do_03_merge     | rail-project merge                 | Combine spectroscopic       |
      |                 |                                    | selections                  |
      +-----------------+------------------------------------+-----------------------------+
      | do_04_subselect | rail-project subsample             | Make train/test files from  |
      |                 |                                    | catalogs                    |
      +-----------------+------------------------------------+-----------------------------+

.. |image| image:: figures/color_color_redshift_taskset_1_cardinal_10yr.png
   :width: 45.0%
.. |image1| image:: figures/color_color_redshift_taskset_1_flagship_10yr.png
   :width: 45.0%

.. include:: validation.rst

.. LocalWords: pz_data_challenge slitless pzdc nb taskset hdf5 qp
.. LocalWords: _lsst _lsst_err table-io tables-io numpy xvals yvals
.. LocalWords: xvals,yvals stds means,stds stds,weights pz_estimate
.. LocalWords: pz_model github zmode iqr Kullback-Leibler coadded
.. LocalWords: pz_challenge_taskset_1_cardinal_pz_estimate_1yr.hdf5
.. LocalWords: pz_challenge_taskset_1_cardinal_pz_model_1yr.pkl
.. LocalWords: pz_challenge_taskset_1_ pz_challenge_taskset_2_
.. LocalWords: gigaparsecs do_04_subselect validation.rst