*******************************************************
Welcome to the Photometric Redshift (PZ) Data Challenge
*******************************************************

The Dark Energy Science Collaboration (DESC) invites researchers, data
scientists, and astronomers to participate in the Photometric Redshift
(PZ) Data Challenge, a collaborative effort to advance methods for
estimating the distances to distant galaxies. Photometric redshifts,
derived from multi-band brightness measurements, are essential for
cosmological surveys like the Legacy Survey of Space and
Time (LSST), enabling us to map the universe's structure and probe the
nature of dark energy. This challenge provides a unique opportunity to
test and benchmark algorithms on realistic simulated data, compare
approaches across diverse methodologies—from template fitting to
machine learning—and help shape the tools that will unlock discoveries
from next-generation sky surveys.

The challenge is framed as a series of sets of PZ estimation tasks
using increasingly realistic data.

`Set up the pz_data_challenge package and download challenge data <installation_and_setup_>`_

`Information about the input data <challenge_input_data_>`_

`How to submit an entry to the challenge <challenge_submissions_>`_

`Description of challenge tasks <tasks_>`_

`Assessment Metrics <metrics_>`_

`Details about challenge data preparation <challenge_data_prep_>`_

`Write-up of the PZ Data Challenge documentation <https://portal.nersc.gov/cfs/lsst/PZ/data_challenge/pz_challenge.pdf>`_


**********************
Background Information
**********************

.. _intro:

Introduction
============

Redshift inference is a key element of many DESC science goals, and
redshift uncertainty is one of the leading contributors to overall
uncertainty on cosmological models from imaging survey data. Precursor
surveys took a variety of approaches to this problem, accounting for
differences in underlying data as well as modeling approaches. In all
cases, redshift uncertainty was significantly larger than the DESC
Science Requirements listed in the LSST DESC Science Requirements
Document.

This state of the art motivates a data challenge to characterize and
improve existing methods, as well as to provide infrastructure for the
development of improved methods. Overall, this requires generating
uniform input catalogs to use and infrastructure for comparing output
redshift posteriors to each other and to simulated truth catalogs.

.. _redshift_basics:

Photometric redshift basics
===========================

Photometric redshift estimation involves taking a catalog of galaxies
for which we have observations in several different filters and have
measured the brightness of the galaxies in those bands, and using that
information to estimate the redshift of the galaxies. For LSST we expect
to have measurements in 6 bands: ’u’, ’g’, ’r’, ’i’, ’z’, and ’y’,
covering a wavelength range from approximately 320 to 1050 nanometers.
For the Roman space telescope, this will extend from about 500 to 2300
nanometers.

Much of the information used to estimate photometric redshifts derives
from the ’Balmer break’ present in the rest frame of many spectra at 400
nm. As the break crosses into different optical filters with increasing
redshift, the differences in magnitudes between filters carry
information about the redshift:

.. figure:: figures/static_balmer.png
   :alt: image
   :width: 80.0%

   A passive galaxy at different redshifts and how it will show up in various optical      
   filters, giving us the ability to estimate its redshift and
   therefore distance. For many galaxies, the so-called
   'Balmer break' at 400 nm is a reliable feature that causes the
   flux to drop severely in bluer filters. Figure and caption
   by Jamie McCullough.
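
As a rough illustration of how the break moves through the filters, its
observed wavelength is :math:`\lambda_{\rm obs} = (1+z) \times 400` nm.
The sketch below looks up which band contains the break at a few
redshifts. The band edges are approximate values assumed here purely for
illustration; they are not taken from the challenge data.

```python
# Approximate LSST band edges in nm (illustrative assumptions, not official values).
APPROX_BAND_EDGES_NM = {
    "u": (320.0, 400.0),
    "g": (400.0, 552.0),
    "r": (552.0, 691.0),
    "i": (691.0, 818.0),
    "z": (818.0, 922.0),
    "y": (922.0, 1050.0),
}
BALMER_BREAK_REST_NM = 400.0


def observed_break_nm(redshift):
    """Observed-frame wavelength of the break for a galaxy at `redshift`."""
    return BALMER_BREAK_REST_NM * (1.0 + redshift)


def band_containing_break(redshift):
    """Return the band whose (approximate) range contains the break, or None."""
    wavelength = observed_break_nm(redshift)
    for band, (lower, upper) in APPROX_BAND_EDGES_NM.items():
        if lower <= wavelength < upper:
            return band
    return None


for z in (0.2, 0.5, 1.0, 1.5):
    print(f"z={z}: break at {observed_break_nm(z):.0f} nm, band {band_containing_break(z)}")
```

With these assumed edges, the break leaves the optical bands entirely at
redshifts somewhat above 1.5, at which point the colors carry less
information about it.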
	      
This can also be seen when plotting redshifts as a function of derived colors,
i.e., differences in magnitudes between filters:

.. container:: figure*

   |image| |image1|

   Redshifts plotted as a function of r-i versus g-r colors for a sample of objects
   in the cardinal (left) and flagship (right) simulations. These
   are plotted for the data for task set 1, i.e., for a sample of
   objects with i < 23.

   
This overly simple picture is complicated somewhat by the fact that
different galaxies have different intrinsic spectra and colors:

.. figure:: figures/gr_vs_sz_sidebyside.jpg
   :alt: image
   :width: 80.0%

   Color (g-r) plotted as a function of
   redshift for a sample of objects in the cardinal (top) and
   flagship (bottom) simulations. These are plotted for the data
   for task set 1, i.e., for a sample of objects with i < 23. The
   overlaid lines show the templates for several different types of galaxies.
	      
This is further complicated by the fact that reference redshifts,
typically obtained by spectroscopy, slitless spectroscopy (i.e., GRISM
measurements), or narrowband photometric measurements, are not a
representative sample, as they are much easier to obtain for brighter
objects. Depending on the method used to obtain the reference redshifts,
they are also susceptible to errors such as confusing different spectral
lines or confusion of blended objects. Some of the tasks in this data
challenge encourage participants to try to address these complications.

.. _challenge_format:


***************************************
Information about the PZ Data Challenge
***************************************

Challenge Format
================

The PZ data challenge comprises a series of sets of tasks for
participants. Submissions will be evaluated to determine how ready
various algorithms are to be used for cutting-edge analysis based on how
well they perform on the various tasks. Readiness will be evaluated on a
few different fronts: 1) Does the algorithm meet performance
requirements? 2) Is it robust, flexible, and relatively easy to use on
different datasets? 3) Does it scale to the data volumes at which we
will need to run it?

This document and the associated web pages describe the data being
provided to participants, the tasks they will be asked to perform, the
expected format for submission and the metrics by which the algorithm
readiness will be evaluated.

Scope and Timeline
------------------

The data challenge will include two major parts, with a set of tasks
emulating increasingly realistic scenarios in each part. The first part,
:math:`p(z)` estimation, will focus on estimating the redshift of
individual objects. The second part, tomography and :math:`n(z)`
estimation, will focus on assigning objects to tomographic bins and
estimating the distribution of redshifts in each bin.

The data challenge will run from April 13, 2026 to September 2026.
The first set of data and tasks related to :math:`p(z)` estimation will be
released on April 13 and will close on July 17, 2026. A second set of
data and tasks related to more realistic :math:`p(z)` estimation
scenarios and :math:`n(z)` estimation will be released on June 1 and will
close in September 2026.

Preliminary results will be released in August 2026, with a
technical note summarizing those results to follow shortly thereafter
and a comprehensive journal publication to follow later.


.. _installation_and_setup:

Installing and setting up the ``pz_data_challenge`` package
-----------------------------------------------------------

The ``pz_data_challenge`` package will provide participants with tools
to access data, set up submissions, estimate performance metrics and
format submissions. This can be set up with a few small variants on the
standard ``GitHub`` package setup procedure. Before starting you should
pick a name for your submission, e.g., “example”.

::

   # Create and activate a conda environment
   conda create --name pzdc python=3.13
   conda activate pzdc

   # Clone the pz_data_challenge repository (or your fork of the repository)
   git clone git@github.com:LSSTDESC/pz_data_challenge.git
   # or git clone https://github.com/LSSTDESC/pz_data_challenge.git

   # Go into the directory
   cd pz_data_challenge

   # Install the code in "editable" mode
   pip install -e ".[dev]"

   # Use the provided script to set up your submission.
   # Here you should provide the name of your submission
   python scripts/prepare_submission.py <submission_name>

This final step will copy the input data files to
``pz_data_challenge/public``, and set up the three files you will need
to submit your entry.

The notebooks in the ``pz_data_challenge/nb`` area give examples of how
to access the data and create some of the diagnostic plots that were
used to validate the data.

Submission mechanism
--------------------

Submissions will take the form of a pull request in the
``pz_data_challenge`` repository. Detailed instructions on how to submit
an entry are provided in Sec. :ref:`5 <challenge_submissions>` of this
document.


.. _challenge_input_data:

Challenge Input Data
====================

The preparation of the challenge data is described in the appendices.
The data are available as a ``tar`` archive that is downloaded and
unpacked as part of the ``pz_data_challenge`` setup procedure.

Each task set in the data challenge has an associated set of files.
Typically these will be a collection of training files that contain
photometric data and reference redshifts, and a second set of files that
contain photometric data but do not include redshifts. Each task set
will involve estimating something about the redshifts or redshift
distributions in the test files.

Typically there will be several training and test files for a particular
task set, covering different scenarios and using different input
simulations.

Input data format
-----------------

The input data for the challenge are presented in HDF5 files. The naming
convention for the files is
``{challenge}_{taskset}_{simulation}_{label}_{scenario}.hdf5``. The
meanings of the various fields are:

.. container::
   :name: file_fields

   .. table:: Fields in the input file names.

      ========== ==========================================================
      Field      Description
      ========== ==========================================================
      challenge  Challenge associated with file
      taskset    Task set associated with file
      simulation Simulation used to produce file (“cardinal” or “flagship”)
      label      File label (e.g., “test”, “training”)
      scenario   Data scenario (e.g., “1yr”, “10yr”)
      ========== ==========================================================
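
As a minimal sketch of this convention, the helper below assembles file
names from the individual fields. The function name
``challenge_filename`` is hypothetical, and the field values
``pz_challenge`` and ``taskset_1`` are inferred from the example file
names used elsewhere in this document.

```python
from itertools import product

FILENAME_TEMPLATE = "{challenge}_{taskset}_{simulation}_{label}_{scenario}.hdf5"


def challenge_filename(taskset, simulation, label, scenario, challenge="pz_challenge"):
    """Build a file name following the naming convention described above."""
    return FILENAME_TEMPLATE.format(
        challenge=challenge,
        taskset=taskset,
        simulation=simulation,
        label=label,
        scenario=scenario,
    )


# Enumerate the training and test files for one task set.
names = [
    challenge_filename("taskset_1", simulation, label, scenario)
    for simulation, scenario, label in product(
        ("cardinal", "flagship"), ("1yr", "10yr"), ("training", "test")
    )
]
for name in names:
    print(name)
```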


The columns in the files are:
      
.. container::
   :name: columns

   .. table:: Contents of input files.

      ==================== =====================================
      Column               Description
      ==================== =====================================
      redshift             True redshift (training files only)
      ra                   Right ascension (training files only)
      dec                  Declination (training files only)
      object_id            Unique object ID
      mag_{band}_lsst      Magnitude in LSST {band}
      mag_{band}_lsst_err  Magnitude uncertainty in LSST {band}
      mag_{band}_roman     Magnitude in Roman {band}
      mag_{band}_roman_err Magnitude uncertainty in Roman {band}
      ==================== =====================================

We note that we use ``np.nan`` in the magnitude columns to signify non-detections.
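
Because non-detections are stored as NaN, any quantity derived from the
magnitudes needs to account for them. The sketch below computes a color
from two bands and flags objects detected in both. It uses toy values in
place of the ``mag_g_lsst`` and ``mag_r_lsst`` columns and plain Python
lists instead of arrays, so it illustrates the logic only.

```python
import math


def color(mag_blue, mag_red):
    """Magnitude difference, or NaN if either band is a non-detection."""
    if math.isnan(mag_blue) or math.isnan(mag_red):
        return math.nan
    return mag_blue - mag_red


# Toy values standing in for the mag_g_lsst and mag_r_lsst columns.
mag_g_lsst = [24.1, math.nan, 22.7]
mag_r_lsst = [23.5, 23.9, math.nan]

g_minus_r = [color(g, r) for g, r in zip(mag_g_lsst, mag_r_lsst)]
detected_in_both = [not math.isnan(c) for c in g_minus_r]
print(g_minus_r, detected_in_both)
```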

We note that the ``tables-io`` package installed with
``pz_data_challenge`` provides a command line interface
to convert files from ``hdf5`` format to other formats, such as
``parquet`` files holding ``pandas`` data frames.

::

   # Convert an HDF5 file to a pandas data frame stored in a parquet file
   tables-io convert \
     --input public/pz_challenge_taskset_1_cardinal_test_10yr.hdf5 \
     --output public/pz_challenge_taskset_1_cardinal_test_10yr.pq


.. _challenge_submissions:

Challenge Submissions
=====================

Challenge subtask types
-----------------------

The challenge is organized as a series of sets of tasks using
increasingly realistic representations of the data. In general, each set
of tasks includes 3 subtasks.

#. Estimate either per-object :math:`p(z)` or ensemble :math:`n(z)`
   distributions for a set of different scenarios and provide the
   estimates in a specified format.

#. Provide trained models for the different scenarios and a Python
   function that can be used to generate the estimates from subtask 1 on
   an arbitrary dataset.

#. Provide a Python function that can be used to generate the models and
   estimates from subtasks 1 and 2 on arbitrary datasets.

The :math:`p(z)` estimates in subtask 1 and the trained models in
subtask 2 should be provided in a compressed ``tar`` file, which is
described below. Templates and instructions for the Python functions
needed for subtasks 2 and 3 will be provided and are described below.

Data format for per-object :math:`p(z)` estimates
-------------------------------------------------

The :math:`p(z)` estimates should be submitted in ``qp`` format, which
allows users to specify a complete :math:`p(z)` distribution for each
object, as well as summary statistics for each object.

The ``qp`` package supports several different representations of
:math:`p(z)`, such as different functional forms as well as interpolated
grids, histograms, and others.

For users unfamiliar with ``qp``, we highly recommend representing the
:math:`p(z)` as either an interpolated grid or a Gaussian mixture model.

::

   # Interpolated grid
   import qp
   import numpy as np
   # Define the x-grid. Note that we put all the
   # p(z) on the same x-grid
   xvals = np.array([0, 0.5, 1, 1.5, 2])
   # Define the y-values. Note we provide n_objects x n_grid_points
   # values, as we need to provide a y-value at each grid point
   # for each object. Here we define 2 objects on a 5-point grid.
   yvals = np.array(
       [
           [0.01, 0.2, 0.3, 0.2, 0.01],
           [0.1, 0.3, 0.5, 0.2, 0.05],
       ]
   )
   ensemble = qp.interp.create_ensemble(xvals, yvals)
   # Replace the file name with your own output file name
   ensemble.write_to("output_filename.hdf5")

::

   # Mixture model
   import qp
   import numpy as np
   # Define the means, standard deviations, and weights.
   # These should each have shape n_objects, n_components.
   # In this case we are defining 3 objects with 2-Gaussian 
   # representations.
   # For each object the weights should sum to 1, or they
   # will be normalized.
   means = [[0.3, 0.4], [0.5, 0.5], [0.6, 0.8]]
   stds = [[0.2, 0.4], [0.1, 0.3], [0.05, 0.3]]
   weights = [[0.8, 0.2], [0.7, 0.3], [0.8, 0.2]]
   ensemble = qp.mixmod.create_ensemble(means=means, stds=stds, weights=weights)
   # Replace the file name with your own output file name
   ensemble.write_to("output_filename.hdf5")

The submission files should use the same file name conventions defined
in Tab. `1 <file_fields_>`_. The labels will typically be
``pz_estimate`` or ``pz_model`` and will be specified in the
descriptions of the various tasks, e.g.,
``pz_challenge_taskset_1_cardinal_pz_estimate_1yr.hdf5`` or
``pz_challenge_taskset_1_cardinal_pz_model_1yr.pkl``.

All of these files should then be joined into a ``tar`` file, which
should then be placed somewhere it can be downloaded. The URL for the
``tar`` file should be specified in ``tests/test_{submission}.py``:

::

   SUBMISSION_NAME = "example"
   SUBMISSION_URL = "https://your.institution.edu/submit_example.tgz"
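
One possible way to build the archive is with the ``tarfile`` module
from the Python standard library, as sketched below. The helper names
are hypothetical, and storing the files under their base names (a flat
layout) is an assumption rather than a documented requirement.

```python
import tarfile
from pathlib import Path


def make_submission_tarball(files, tarball_path):
    """Write the given files into a gzip-compressed tar archive.

    Files are stored under their base names (a flat layout is an assumption).
    """
    with tarfile.open(tarball_path, "w:gz") as archive:
        for file_path in files:
            file_path = Path(file_path)
            archive.add(file_path, arcname=file_path.name)
    return Path(tarball_path)


def list_tarball(tarball_path):
    """Return the sorted member names stored in the archive."""
    with tarfile.open(tarball_path, "r:gz") as archive:
        return sorted(archive.getnames())
```

Listing the archive contents with ``list_tarball`` before uploading is a
quick way to confirm that every expected file made it in.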


Format for estimation-only Python functions and trained models
--------------------------------------------------------------

For the second subtask, submissions should provide trained models and
implement a function to run estimation using those trained models on the
test files provided for each task set. The function will look something
like this:

::

   from pathlib import Path


   def run_taskset_1_estimation_only(
       model_file: str | Path,
       test_file: str | Path,
       output_file: str | Path,
   ) -> None:
       # do stuff and write p(z) estimates to "output_file"
       ...

or

::

   def run_taskset_2_estimation_only(
       model_file: str | Path,
       test_file: str | Path,
       output_file: str | Path,
   ) -> None:
       # do stuff and write p(z) estimates to "output_file"
       ...

Templates for these functions are provided in the file
``tests/test_{submission}.py`` created as part of the setup.

Format for training and estimation Python functions
---------------------------------------------------

For the third subtask, submissions should implement a function to train
models and run estimation using those trained models on the training and
test files provided for each task set. The function will look something
like this:

::

   from pathlib import Path


   def run_taskset_1_training_and_estimation(
       train_file: str | Path,
       test_file: str | Path,
       output_file: str | Path,
   ) -> None:
       # train a model using the "train_file" and make p(z) estimates
       # and write them to "output_file"
       ...

or

::

   def run_taskset_2_training_and_estimation(
       train_file: str | Path,
       test_file: str | Path,
       output_file: str | Path,
   ) -> None:
       # train a model using the "train_file" and make p(z) estimates
       # and write them to "output_file"
       ...

Templates for these functions are provided in the file
``tests/test_{submission}.py`` created as part of the setup.

.. _submission-mechanism-1:

Submission mechanism details
----------------------------

Submissions will take the form of a pull request on the
``pz_data_challenge`` repository and will include:

#. A file ``tests/test_{submission}.py`` that includes the URL from
   which the compressed ``tar`` file should be downloaded as well as the
   Python functions for subtasks 2 and 3. When created this will contain
   empty placeholder functions that will need to be implemented.

#. A file ``requirements_{submission}.txt`` that should be modified to
   include ``pip`` package names of any packages that need to be
   installed in order to run the functions in subtasks 2 and 3.

#. A file ``.github/workflows/submit_{submission}.yaml`` to run the
   submission validation in a GitHub action. This should not need to be
   modified unless the prerequisite installation requires more than just
   ``pip`` installing packages.

All three of these files are created by the
``scripts/prepare_submission.py`` script.

You will need to modify ``tests/test_{submission}.py`` to give the
location of the ``tar`` file containing the PZ estimates and trained
models.

See `<https://github.com/LSSTDESC/pz_data_challenge/pull/6>`_ for an
example of a submission.

Submission validation
---------------------

The wrapping functions provided in the ``tests/test_{submission}.py``
file implement a number of checks on the data. Specifically, for each
expected file they check that:

#. the file exists;

#. the file contains a valid ``qp`` ensemble;

#. the ``qp`` ensemble includes ancillary data;

#. the ancillary data includes a ``zmode`` column with redshift estimates;

#. the ancillary data includes an ``object_id`` column;

#. the ``object_id`` values in the submission file match the associated test file.

If any of these checks fail, the GitHub action triggered by the
submission will fail and report the cause of the failure.
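
To make the last few checks concrete, the sketch below derives a
``zmode`` value for gridded :math:`p(z)` estimates and compares object
IDs. It uses plain Python lists and hypothetical helper names; attaching
the resulting columns to a ``qp`` ensemble is not shown, and requiring
the IDs to appear in the same order is an assumption.

```python
def zmode_from_grid(z_grid, pdf_rows):
    """Return, for each object, the grid redshift at which its p(z) peaks."""
    modes = []
    for pdf in pdf_rows:
        peak_index = max(range(len(z_grid)), key=lambda k: pdf[k])
        modes.append(z_grid[peak_index])
    return modes


def object_ids_match(submitted_ids, test_file_ids):
    """True if both sequences hold the same IDs in the same order."""
    return list(submitted_ids) == list(test_file_ids)


z_grid = [0.0, 0.5, 1.0, 1.5, 2.0]
pdf_rows = [
    [0.01, 0.2, 0.3, 0.2, 0.01],
    [0.1, 0.5, 0.3, 0.2, 0.05],
]
# Toy ancillary columns for two objects with made-up IDs.
ancillary = {"zmode": zmode_from_grid(z_grid, pdf_rows), "object_id": [101, 102]}
print(ancillary, object_ids_match(ancillary["object_id"], [101, 102]))
```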

The easiest way to test that you have correctly implemented the required
functions is to run these commands:

::

   # Make sure that you have installed any packages you need
   pip install -r requirements_{submission_name}.txt

   # Run the functions you have provided as unit tests
   py.test tests/test_{submission_name}.py

If this succeeds, you can use a provided script to help you open the
pull request for your submission:

::

   # run the submission helper script.
   python scripts/submit.py {submission_name}

Note that the helper script only prints the required commands; it does
not run them. In short, the commands are:

::

   # Check status of your local git clone by running git status, and make
   # sure that you are on the branch submit/{submission_name} and do not
   # have any files added or modified
   git status

   # Add your files to git
   git add .github/workflows/submit_{submission_name}.yaml \
     requirements_{submission_name}.txt \
     tests/test_{submission_name}.py

   # Commit your files to your branch
   git commit -m "Submitting {submission_name}" \
     .github/workflows/submit_{submission_name}.yaml \
     requirements_{submission_name}.txt \
     tests/test_{submission_name}.py

   # Push your commit
   git push --set-upstream origin submit/{submission_name}

   # Pushing to git should give you a URL that you can visit to create a
   # pull request, for example:
   #   https://github.com/LSSTDESC/pz_data_challenge/pull/new/submit/example
   # Visit that URL and create a pull request, then add the 'submission'
   # label to the PR.
   # Finally, make sure that the github action validating your submission
   # succeeds and fix any issues.

Submission aids
---------------

A few scripts are provided to help you.

-  ``scripts/download_public.py``: downloads and unpacks the public
   data.

-  ``scripts/prepare_submission.py``: sets up your area for a
   submission, creates the needed files from templates, and downloads
   the public data. It also suggests that you create a branch for your
   submission.

-  ``scripts/remove_submission_files.py``: removes the submission files
   if you need to start over.

-  ``scripts/run_metrics.py``: runs performance metrics on files in a
   submission you have created.

-  ``py.test tests/test_{submission_name}.py``: validates all the parts
   of your submission, checking that you have created all the required
   files and that they are properly formatted.


Feedback after submission
-------------------------

Approximately every 2 weeks we will merge the PRs that are passing the
GitHub Actions checks, run performance metrics, and update the website with
information about their performance.
   

.. _metrics:

Metrics and Assessment Criteria
===============================

We will use a number of different metrics to assess the performance of
the submitted algorithms. Many of these metrics, as well as the
motivations behind them, are defined and discussed in `[1] <https://arxiv.org/abs/2505>`_.


Metrics for per-object point estimates
--------------------------------------

We first assess performance on per-object point estimates, i.e., a single
best estimate of the redshift of each object in the test sample. All of our
point-estimate metrics first compute the scaled residual,
:math:`\Delta_i`, between the point estimate for each object,
:math:`z_{p,i}`, and the true redshift for that object, :math:`z_{t,i}`:

.. math:: \Delta_i = \frac{z_{p,i}-z_{t,i}}{1+z_{t,i}}.

We then use this to construct the following metrics; see
`[1] <https://arxiv.org/abs/2505>`_ for more details:

-  ``Bias`` is simply the median of :math:`\Delta_i`.

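-  ``OutlierRate`` is the fraction of objects for which
   :math:`|\Delta_i| > {\rm max}(0.06, 3\sigma_{\rm iqr})`, where
   :math:`\sigma_{\rm iqr}` is a width estimate derived from the
   interquartile range of the :math:`\Delta` distribution.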

-  ``SigmaMAD`` is an estimate of the standard deviation derived from the
   median absolute deviation (MAD), which is computed as

   .. math:: {\sigma_{\rm MAD}} = 1.4826\,{\rm median}(|\Delta_i - {\rm median}(\Delta)|).
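
The point-estimate metrics can be sketched in a few lines of standard
library Python, as below. This is an illustration, not the code that
will be used for scoring. It uses the conventional MAD scale factor of
1.4826, treats an outlier as an object whose :math:`|\Delta_i|` exceeds
the threshold, and assumes :math:`\sigma_{\rm iqr}` is the interquartile
range divided by 1.349; the function names are hypothetical.

```python
import statistics

MAD_TO_SIGMA = 1.4826  # conventional factor relating the MAD to a Gaussian sigma
IQR_TO_SIGMA = 1.349   # assumed convention: sigma_iqr = IQR / 1.349


def scaled_residuals(z_phot, z_true):
    """Delta_i = (z_p - z_t) / (1 + z_t) for each object."""
    return [(zp - zt) / (1.0 + zt) for zp, zt in zip(z_phot, z_true)]


def bias(delta):
    return statistics.median(delta)


def sigma_mad(delta):
    center = statistics.median(delta)
    return MAD_TO_SIGMA * statistics.median([abs(d - center) for d in delta])


def sigma_iqr(delta):
    q1, _, q3 = statistics.quantiles(delta, n=4, method="inclusive")
    return (q3 - q1) / IQR_TO_SIGMA


def outlier_rate(delta):
    threshold = max(0.06, 3.0 * sigma_iqr(delta))
    return sum(1 for d in delta if abs(d) > threshold) / len(delta)


z_true = [0.5, 1.0, 1.5, 2.0, 1.0]
z_phot = [0.5, 1.0, 1.5, 2.0, 2.0]  # the last object is a catastrophic outlier
delta = scaled_residuals(z_phot, z_true)
print(bias(delta), sigma_mad(delta), outlier_rate(delta))
```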

Metrics for per-object :math:`p(z)` distributions
-------------------------------------------------

We will also assess the algorithm’s ability to provide a precise and
accurate estimate of the posterior distribution, :math:`p(z)`, for each
object using the following metrics.

-  Conditional Density Loss (``CDELoss``): We implement the method in
   `[2] <https://ui.adsabs.harvard.edu/abs/2017arXiv170408095I>`_ to compute this metric. A
   better estimation should return a smaller CDE loss.

-  Probability Integral Transform (PIT): This is the cumulative
   distribution function of the photo-:math:`z` PDF evaluated at the
   galaxy’s true redshift for each galaxy in the catalog, i.e.,
   :math:`{\rm PIT} = \int_{0}^{z_t} p(z)\, dz`. Following
   `[3] <https://academic.oup.com/mnras/article/499/2/1587/5905416>`_, we provide the
   PIT-QQ (quantile-quantile) diagram, where the PIT distribution is
   directly compared to the ideal uniform distribution. A diagonal
   PIT-QQ diagram indicates a good estimation. An example of the PIT-QQ
   plot is shown below.  We will then use three metrics to quantify how well the PIT distribution
   matches the ideal: the Kolmogorov-Smirnov (KS) test, the Root Mean
   Square Error (RMSE), and the Kullback-Leibler divergence (KL
   divergence).
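
As an illustration of the PIT definition, the sketch below integrates a
gridded :math:`p(z)` up to the true redshift with the trapezoid rule and
computes a KS statistic against the uniform distribution. It assumes the
grid starts at :math:`z = 0` and normalizes by the total integral over
the grid; the function names are hypothetical, and this is not the
evaluation code used for the challenge.

```python
def pit_value(z_grid, pdf, z_true):
    """Integrate a gridded p(z) from z_grid[0] up to z_true (trapezoid rule)."""
    total = 0.0
    partial = 0.0
    for k in range(len(z_grid) - 1):
        z0, z1 = z_grid[k], z_grid[k + 1]
        p0, p1 = pdf[k], pdf[k + 1]
        area = 0.5 * (p0 + p1) * (z1 - z0)
        total += area
        if z_true >= z1:
            # The whole interval lies below the true redshift.
            partial += area
        elif z_true > z0:
            # Partial interval: linearly interpolate p(z) at z_true.
            p_true = p0 + (z_true - z0) / (z1 - z0) * (p1 - p0)
            partial += 0.5 * (p0 + p_true) * (z_true - z0)
    return partial / total


def ks_statistic_uniform(pit_values):
    """Largest distance between the empirical PIT CDF and the uniform CDF."""
    ordered = sorted(pit_values)
    n = len(ordered)
    return max(
        max((i + 1) / n - x, x - i / n) for i, x in enumerate(ordered)
    )


z_grid = [0.0, 0.5, 1.0, 1.5, 2.0]
flat_pdf = [1.0, 1.0, 1.0, 1.0, 1.0]
print(pit_value(z_grid, flat_pdf, 0.75))
print(ks_statistic_uniform([0.25, 0.5, 0.75]))
```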

Metrics for computational usability and performance
---------------------------------------------------

We will assess relevant aspects of the computational performance that
will affect usability and scaling.

-  **Ease of use**: We will assess whether the algorithm is easy to
   install and can be run on the different task sets without needing
   excessively complicated additional configuration files.

-  **Training time**: How quickly the algorithm trains models, and how
   this scales with the training sample size. Here we mainly want to
   ensure that the training time will not dominate the iteration cycle.
   Taking several minutes to train on 100k objects is fine; taking hours
   to do so would be problematic.

-  **Model size**: How large the trained model files are, and how this
   scales with the training sample size. Again, we mainly want to ensure
   that the model size will not tax our resources. If the model files
   are an order of magnitude larger than the input data files, we might
   worry.

-  **Estimation time**: How quickly the algorithm estimates redshifts
   per object. This will determine the use cases for which we might use
   the algorithm. We can run an algorithm that takes a few ms per object
   on all of the billions of galaxies we will have in the final LSST
   sample; for an algorithm that takes a few seconds per object, we
   would probably be constrained to only run it on much smaller
   particular datasets for specific science cases, such as samples of
   supernovae or strongly lensed objects.

-  **Output data size per object**: How large the output files with the
   :math:`p(z)` estimates are. For a ``qp`` interpolated grid
   representation with 300 points, these would be about 2.4 kB per
   object, which is large but manageable, whereas for a Gaussian mixture
   model with 5 Gaussians, this would be close to 120 bytes per object.
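
The figures quoted in this list follow from simple arithmetic, sketched
below under the assumption that every value is stored as a 64-bit float
and ignoring any file overhead. The catalog size and the per-object
timing are illustrative numbers chosen for the example, not challenge
requirements.

```python
BYTES_PER_FLOAT64 = 8  # assumes values are stored as 64-bit floats

grid_points = 300
grid_bytes_per_object = grid_points * BYTES_PER_FLOAT64

n_gaussians = 5
params_per_gaussian = 3  # mean, standard deviation, weight
mixmod_bytes_per_object = n_gaussians * params_per_gaussian * BYTES_PER_FLOAT64

n_objects = 1_000_000_000  # illustrative catalog size
SECONDS_PER_DAY = 86_400


def total_terabytes(bytes_per_object, n):
    """Total output size in TB for n objects."""
    return bytes_per_object * n / 1e12


def total_cpu_days(seconds_per_object, n):
    """Total single-core run time in days for n objects."""
    return seconds_per_object * n / SECONDS_PER_DAY


print(grid_bytes_per_object, mixmod_bytes_per_object)
print(total_terabytes(grid_bytes_per_object, n_objects), "TB for the grid")
print(total_cpu_days(1e-3, n_objects), "CPU days at 1 ms per object")
```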

.. _tasks:

Challenge Tasks related to :math:`p(z)` estimation
==================================================

Task set 1: Estimate redshifts using representative training samples
--------------------------------------------------------------------

The first, simplest task is to estimate redshifts using representative
training samples, i.e., training samples drawn from the same
distributions as the test samples. For this task set we did not use any
of the spectroscopic selection emulation, but simply applied a uniform
magnitude cut of :math:`i < 23` in selecting objects for both the
training and test samples.

The four
``pz_challenge_taskset_1_{simulation}_training_{scenario}.hdf5`` files
are the training sets for the “Flagship” and “Cardinal” simulations,
emulating 1 year and 10 years of LSST data under the expected observing
strategy and conditions. These files have true redshifts to serve as
labels.

The corresponding
``pz_challenge_taskset_1_{simulation}_test_{scenario}.hdf5`` files were
drawn from the same distributions. The true redshifts have been removed
from these files. The task is to assign :math:`p(z)` estimates for all
the objects in these 4 test files.

The subtasks in this task set are:

#. Estimate :math:`p(z)` for each object in each of the test files and
   provide the estimates in a downloadable ``tar`` file.

#. Provide pre-trained models appropriate to each of the training files
   and implement a Python function (``run_taskset_1_estimation_only``)
   to use those pre-trained models to estimate :math:`p(z)` for each
   object in the associated test files.

#. Implement a Python function
   (``run_taskset_1_training_and_estimation``) to train a model for each
   training file and use that model to estimate :math:`p(z)` for each
   object in the associated test files.

Task set 2: Estimate redshifts on non-representative samples
------------------------------------------------------------

The second, slightly more challenging task is to estimate redshifts
using non-representative training samples, i.e., training samples that
are not drawn from the same distributions as the test samples. For this
task set we applied the spectroscopic selection emulation for the
training set, but retained all the objects down to :math:`i < 25.4` in
the test set. Accordingly, the training set will not be representative
of the fainter objects in the test set. This reflects that spectroscopic
redshifts are typically significantly more difficult to obtain than
photometry.
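
One simple way to gauge how non-representative a training sample is, in
the sense described above, is to count the test objects that are fainter
than anything in the training sample. The sketch below does this with
toy values standing in for the ``mag_i_lsst`` column; the function name
is hypothetical and non-detections (NaN) are skipped.

```python
import math


def fraction_fainter_than_training(train_mag_i, test_mag_i):
    """Fraction of detected test objects fainter than the faintest training object."""
    train = [m for m in train_mag_i if not math.isnan(m)]
    test = [m for m in test_mag_i if not math.isnan(m)]
    faintest_training = max(train)  # a larger magnitude means a fainter object
    n_fainter = sum(1 for m in test if m > faintest_training)
    return n_fainter / len(test)


# Toy i-band magnitudes for a bright training sample and a deeper test sample.
train_mag_i = [20.0, 21.0, 22.0]
test_mag_i = [21.0, 23.0, 24.0, math.nan]
print(fraction_fainter_than_training(train_mag_i, test_mag_i))
```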

The four
``pz_challenge_taskset_2_{simulation}_training_{scenario}.hdf5`` files
are the training sets for the “Flagship” and “Cardinal” simulations,
emulating 1 year and 10 years of LSST data under the expected observing
strategy and conditions and with spectroscopic selections emulated.

The corresponding
``pz_challenge_taskset_2_{simulation}_test_{scenario}.hdf5`` files were
drawn from the distributions of all objects down to :math:`i <
25.4`, and the true redshifts have been removed from these files. The
task is to assign :math:`p(z)` estimates for all the objects in these 4
test files.

The subtasks in this task set are:

#. Estimate :math:`p(z)` for each object in each of the test files and
   provide the estimates in a downloadable ``tar`` file.

#. Provide pre-trained models appropriate to each of the training files
   and implement a Python function (``run_taskset_2_estimation_only``)
   to use those pre-trained models to estimate :math:`p(z)` for each
   object in the associated test files.

#. Implement a Python function
   (``run_taskset_2_training_and_estimation``) to train a model for each
   training file and use that model to estimate :math:`p(z)` for each
   object in the associated test files.


Task set 3: Estimate redshifts on non-representative samples, including narrowband photometric redshifts
--------------------------------------------------------------------------------------------------------

The third task is to estimate redshifts using non-representative
training samples that more accurately emulate real reference redshift
samples. These include some narrowband photometric redshifts from the
COSMOS2020 dataset that go deeper than most spectroscopic redshifts, but
have more scatter and more significant levels of catastrophic outliers.

As before, the four
``pz_challenge_taskset_3_{simulation}_training_{scenario}.hdf5`` files
are the training sets for the “Flagship” and “Cardinal” simulations,
emulating 1 year and 10 years of LSST data under the expected observing
strategy and conditions and with spectroscopic selections emulated.
These files include flags showing which spectroscopic survey particular
objects would be associated with, and for the COSMOS2020 field, also
include a column “redshift_manyband” giving the narrow-band photometric
redshifts in addition to the spectroscopic redshifts. The point of this
taskset is to find a way to optimally use the additional information
from the COSMOS2020 field.

The corresponding
``pz_challenge_taskset_3_{simulation}_test_{scenario}.hdf5`` files were
drawn from the distributions of all objects down to :math:`i <
25.4`, and both the true redshifts and the narrow-band photometric
redshifts have been removed from these files. The task is to assign
:math:`p(z)` estimates for all the objects in these 4 test files.

The subtasks in this task set are:

#. Estimate :math:`p(z)` for each object in each of the test files and
   provide the estimates in a downloadable ``tar`` file.

#. Provide pre-trained models appropriate to each of the training files
   and implement a Python function (``run_taskset_3_estimation_only``)
   to use those pre-trained models to estimate :math:`p(z)` for each
   object in the associated test files.

#. Implement a Python function
   (``run_taskset_3_training_and_estimation``) to train a model for each
   training file and use that model to estimate :math:`p(z)` for each
   object in the associated test files.


Task set 4: Estimate redshifts on non-representative samples, including narrowband photometric redshifts and unrecognized blends of objects
--------------------------------------------------------------------------------------------------------------------------------------------

The fourth and final task in the PZ challenge is to estimate redshifts
on a sample of objects that includes “unrecognized blends”, i.e.,
multiple objects that are detected as a single object by the image
processing algorithms.

As in task set 3, the training samples are non-representative, i.e.,
they are not drawn from the same distributions as the test samples. For
this task set we applied the spectroscopic selection emulation to the
training set, but retained all the objects down to :math:`i < 25.4` in
the test set. Accordingly, the training set will not be representative
of the fainter objects in the test set. This reflects the fact that
spectroscopic redshifts are typically significantly more difficult to
obtain than photometry.
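
The mismatch between training and test samples can be partly
compensated by reweighting the training objects so that their magnitude
distribution resembles that of the test set. The sketch below is a
minimal one-dimensional histogram-ratio version of that idea, offered
as an illustration rather than as part of the challenge tooling. The
function name and the binning are hypothetical.

.. code-block:: python

   import numpy as np


   def magnitude_reweighting(train_mag, test_mag, bins):
       """Weight training objects by the test/training density ratio."""
       train_counts, edges = np.histogram(train_mag, bins=bins)
       test_counts, _ = np.histogram(test_mag, bins=edges)

       train_frac = train_counts / train_counts.sum()
       test_frac = test_counts / test_counts.sum()

       # Ratio of normalized densities; bins with no training objects get 0.
       ratio = np.divide(
           test_frac,
           train_frac,
           out=np.zeros_like(test_frac),
           where=train_counts > 0,
       )

       # Look up the bin of each training object (clipped to the end bins).
       idx = np.clip(np.digitize(train_mag, edges) - 1, 0, len(ratio) - 1)
       return ratio[idx]


   # Toy example: the training set is brighter than the test set.
   weights = magnitude_reweighting(
       train_mag=[20.0, 20.0, 20.0, 24.0],
       test_mag=[20.0, 24.0, 24.0, 24.0],
       bins=[19.0, 22.0, 25.0],
   )

Reweighting only helps where the training set has some coverage. It
cannot supply information about faint objects that are entirely absent
from the training sample.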

The four
``pz_challenge_taskset_4_{simulation}_training_{scenario}.hdf5`` files
are the training sets for the “Flagship” and “Cardinal” simulations,
emulating 1 year and 10 years of LSST data. They are similar to the
task set 3 files, except that emulated object blending has been applied.

The corresponding
``pz_challenge_taskset_4_{simulation}_test_{scenario}.hdf5`` files were
drawn from the distributions of all objects down to :math:`i <
25.4`, and both the true redshifts and the narrow-band photometric
redshifts have been removed from these files. The task is to assign
:math:`p(z)` estimates for all the objects in these 4 test files.

The subtasks in this task set are:

#. Estimate :math:`p(z)` for each object in each of the test files and
   provide the estimates in a downloadable ``tar`` file.

#. Provide pre-trained models appropriate to each of the training files
   and implement a Python function (``run_taskset_4_estimation_only``)
   to use those pre-trained models to estimate :math:`p(z)` for each
   object in the associated test files.

#. Implement a Python function
   (``run_taskset_4_training_and_estimation``) to train a model for each
   training file and use that model to estimate :math:`p(z)` for each
   object in the associated test files.

   
.. _challenge_data_prep:
   
********************************************
Information about Challenge Data Preparation
********************************************
   
.. _input_sims:

Input simulations
=================

The challenge employs simulated galaxy catalogs derived from two
complementary N-body cosmological simulations: the Cardinal simulations
and the Flagship simulation. These synthetic datasets provide a
controlled environment where the true redshifts are known by
construction, enabling rigorous validation of photometric redshift
algorithms and systematic assessment of their performance
characteristics.

The Cardinal simulations comprise a suite of high-resolution N-body
simulations specifically designed to explore the sensitivity of
cosmological observables to variations in fundamental cosmological
parameters. The simulations employ state-of-the-art semi-analytic models
to populate dark matter halos with galaxies, incorporating realistic
prescriptions for star formation, dust attenuation, and spectral energy
distribution modeling.

The Flagship simulation represents a single, ultra-large cosmological
simulation run with fiducial cosmological parameters consistent with
current observational constraints. With a volume exceeding several cubic
gigaparsecs, the Flagship provides statistical power to probe rare
objects and the high-mass end of the galaxy population. Its primary
purpose in the photometric redshift challenge is to provide a realistic
mock catalog that captures the full complexity of galaxy populations
across cosmic time, including correlations between galaxy properties,
environmental dependencies, and the intricate relationships between
spectral features and redshift.

Together, these complementary simulation suites enable challenge
participants to test both the accuracy and the robustness of their
photometric redshift estimation methods under realistic observational
conditions.

.. _emulating_observations:

Emulating observational effects
===============================

To bridge the gap between the idealized simulation outputs and realistic
survey observations, we employ the RAIL (Redshift Assessment
Infrastructure Layers) software package to emulate observational
effects. RAIL provides a modular framework for injecting realistic
photometric uncertainties, applying survey-specific selection functions,
and simulating the measurement errors characteristic of modern
large-scale imaging surveys. This processing ensures that the simulated
galaxy catalogs reflect the complexities of actual observations,
including magnitude-dependent photometric scatter, incomplete sky
coverage, and the effects of source blending in crowded fields, thereby
providing a more stringent and realistic testbed for photometric
redshift estimation algorithms.

Photometric Smearing
--------------------

Central to our observational emulation is RAIL’s wrapper around the
photometric error module, photErr, which we have extended to account
for realistic observing strategies and time-dependent survey
conditions. The standard photErr module provides basic photometric error
modeling based on magnitude-dependent noise characteristics; our
enhanced version adds further complexity, including spatially varying
depth maps. The wrapper reads detailed operational simulation outputs
that emulate the expected LSST survey strategy.

Our photErr implementation computes photometric uncertainties by
combining the intrinsic Poisson noise from source photons with realistic
models of sky background, readout noise, and other systematic
contributions. For each simulated galaxy, we use the expected coadded
depth to derive final photometric error estimates. This approach
captures the heterogeneous nature of survey depth across the footprint,
where some regions benefit from numerous high-quality exposures while
others may be observed only during poor conditions. The resulting
photometric uncertainties vary realistically with position on the sky,
band-dependent limiting magnitudes, and local observing history,
providing challenge participants with mock catalogs whose noise
properties more closely match those expected from the actual survey.
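
To make the role of the coadded depth concrete, the sketch below uses
the widely quoted analytic LSST error model, in which the random
magnitude error depends on :math:`x = 10^{0.4 (m - m_5)}` through
:math:`\sigma_{\rm rand}^2 = (0.04 - \gamma) x + \gamma x^2`. This is a
simplified stand-in and not the photErr code used to build the challenge
data. The function names, the single-visit depth, and the visit count
are illustrative assumptions, and adding Gaussian noise directly in
magnitudes is a simplification that breaks down at low signal-to-noise.

.. code-block:: python

   import numpy as np


   def coadded_depth(m5_single, n_visits):
       """5-sigma depth of a coadd of n_visits equal-depth exposures."""
       return m5_single + 1.25 * np.log10(n_visits)


   def magnitude_error(mag, m5, gamma=0.039, sigma_sys=0.005):
       """Magnitude uncertainty for a source of magnitude mag at depth m5."""
       x = 10.0 ** (0.4 * (np.asarray(mag, dtype=float) - m5))
       sigma_rand_sq = (0.04 - gamma) * x + gamma * x**2
       return np.sqrt(sigma_sys**2 + sigma_rand_sq)


   def smear_magnitudes(mag, m5, rng):
       """Add Gaussian noise in magnitude space (a simplification)."""
       sigma = magnitude_error(mag, m5)
       return mag + rng.normal(0.0, sigma), sigma


   rng = np.random.default_rng(42)
   true_mags = np.array([22.0, 24.0, 25.5])

   # Illustrative numbers only: a position-dependent visit count would
   # give a different coadded depth at each location on the sky.
   m5 = coadded_depth(m5_single=24.0, n_visits=80)
   observed_mags, mag_errors = smear_magnitudes(true_mags, m5, rng)

In this model a source exactly at the 5-sigma depth has an error of
about 0.2 mag, and regions with fewer visits have shallower depth and
therefore larger errors at fixed magnitude.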

Spectroscopic and narrowband photometric redshift selection
-----------------------------------------------------------

RAIL can emulate the selection functions of several different
spectroscopic redshift surveys, including VVDSf02, zCOSMOS, DEEP2_LSST,
and the DESI BGS, ELG, and LRG samples.

We can also use RAIL to emulate narrowband photometric surveys and
include small amounts of mislabeled reference redshifts.
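
The toy sketch below illustrates the two ideas in this subsection: a
selection that keeps only part of the parent sample, and a small
fraction of reference redshifts that are wrong. It does not use RAIL and
does not reproduce any of the survey selections listed above, which
depend on colors, sizes, and redshift success rates. The function name,
the magnitude limit, and the mislabel fraction are hypothetical.

.. code-block:: python

   import numpy as np


   def emulate_reference_sample(
       i_mag, z_true, rng, mag_limit=22.5, mislabel_fraction=0.02, z_max=3.0
   ):
       """Apply a toy magnitude cut and corrupt a fraction of redshifts."""
       i_mag = np.asarray(i_mag, dtype=float)
       z_true = np.asarray(z_true, dtype=float)

       # Toy selection: keep only objects brighter than the limit.
       selected = i_mag < mag_limit
       z_ref = z_true[selected].copy()

       # Replace a random subset with redshifts drawn uniformly in [0, z_max).
       mislabeled = rng.random(z_ref.size) < mislabel_fraction
       z_ref[mislabeled] = rng.uniform(0.0, z_max, size=int(mislabeled.sum()))
       return selected, z_ref, mislabeled


   rng = np.random.default_rng(7)
   i_mag = rng.uniform(18.0, 25.4, size=1000)
   z_true = rng.uniform(0.0, 3.0, size=1000)
   selected, z_ref, mislabeled = emulate_reference_sample(i_mag, z_true, rng)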

.. _prep_process:

Preparing Training, Test, and Reserved Datasets
===============================================

All of the data preparation was performed using the ``rail_projects``
and ``rail_package_config`` packages for bookkeeping and
reproducibility.

.. container::
   :name: prep_scripts

   .. table:: Scripts used in data preparation.

      +-----------------+------------------------+------------------------+
      | Script          | Command Run            | Purpose                |
      +=================+========================+========================+
      | do_00_reduce    | rail-project reduce    | Reduce input truth     |
      |                 |                        | catalogs               |
      |                 |                        | (mag. cut and drop     |
      |                 |                        | columns)               |
      +-----------------+------------------------+------------------------+
      | do_01_build     | rail-project build     | Build configurations   |
      |                 |                        | to run                 |
      |                 |                        | truth-to-observed      |
      |                 |                        | pipeline               |
      +-----------------+------------------------+------------------------+
      | do_02_t2o       | rail-project run       | Run truth-to-observed  |
      |                 | truth-to-observed      | pipelines to make      |
      |                 |                        | degraded catalogs      |
      +-----------------+------------------------+------------------------+
      | do_03_merge     | rail-project merge     | Combine spectroscopic  |
      |                 |                        | selections             |
      +-----------------+------------------------+------------------------+
      | do_04_subselect | rail-project subsample | Make train/test files  |
      |                 |                        | from catalogs          |
      +-----------------+------------------------+------------------------+

.. |image| image:: figures/color_color_redshift_taskset_1_cardinal_10yr.png
   :width: 45.0%
.. |image1| image:: figures/color_color_redshift_taskset_1_flagship_10yr.png
   :width: 45.0%


.. include:: validation.rst	   
	   
..  LocalWords:  pz_data_challenge slitless pzdc nb taskset hdf5 qp
..  LocalWords:  _lsst _lsst_err table-io tables-io numpy xvals yvals
..  LocalWords:  xvals,yvals stds means,stds stds,weights pz_estimate
..  LocalWords:  pz_model github zmode iqr Kullback-Leibler coadded
..  LocalWords:  pz_challenge_taskset_1_cardinal_pz_estimate_1yr.hdf5
..  LocalWords:  pz_challenge_taskset_1_cardinal_pz_model_1yr.pkl
..  LocalWords:  pz_challenge_taskset_1_ pz_challenge_taskset_2_
..  LocalWords:  gigaparsecs do_04_subselect validation.rst