mlflow-cnn
Galaxy Redshift Estimation Using Convolutional Neural Networks with MLflow Tracking
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file found
- ✓ codemeta.json file found
- ✓ .zenodo.json file found
- ○ DOI references
- ✓ Academic publication links (links to: zenodo.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.1%) to scientific vocabulary
Repository
Galaxy Redshift Estimation Using Convolutional Neural Networks with MLflow Tracking
Basic Info
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Galaxy Redshift Estimation Using Convolutional Neural Networks with MLflow Tracking
Abstract
Creators:
- Jacob Nowack
- Srinath Saikrishnan
- Vikram Seenivasan
- Tuan Do
- Bernie Boscoe
- Zhuo Chen
- Chandler Campbell
UCLA Astrophysics Datalab
Description
Photometric redshift estimation is a critical component in large-scale galaxy surveys, enabling researchers to estimate distances to billions of galaxies without requiring time-intensive spectroscopic measurements. This project implements a dual-input Convolutional Neural Network (CNN) architecture to estimate redshifts using multi-band imaging data (g, r, i, z, y) from the GalaxiesML dataset alongside photometric magnitudes.
Our approach combines two parallel branches: a CNN branch processing 5-channel (g, r, i, z, y) galaxy images at either 64×64 or 127×127 pixel resolution as provided in the GalaxiesML dataset, and a fully-connected neural network branch handling numerical features derived from CModel magnitudes in the same five bands. This architecture captures both the spatial morphological information visible in multi-band imaging and the spectral energy distribution information contained in magnitudes to enhance redshift prediction capabilities.
The implementation leverages TensorFlow and MLflow in a local development environment for reproducible experimentation and comprehensive tracking. The model employs a custom HSC loss function implemented directly in the code, tailored for redshift estimation that accounts for the non-Gaussian nature of photometric redshift errors. This function uses the formula L = 1 - 1/(1 + (dz/γ)²) with γ = 0.15, creating a more robust loss that better handles outliers compared to standard mean squared error. The model training also tracks root mean squared error (RMSE) as a secondary metric for performance evaluation.
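The stated loss formula can be written down directly. This NumPy sketch mirrors the formula from the text (the repository implements it as a TensorFlow loss; note that whether dz is the raw residual or the (1+z)-normalized residual is an assumption here):

```python
import numpy as np

GAMMA = 0.15  # gamma from the loss formula in the text

def hsc_loss(z_true, z_pred):
    """L = 1 - 1/(1 + (dz/gamma)^2), averaged over the batch.

    dz is taken as the raw residual z_pred - z_true; some photo-z work
    normalizes by (1 + z_true) instead, which the text does not specify.
    """
    dz = np.asarray(z_pred) - np.asarray(z_true)
    return np.mean(1.0 - 1.0 / (1.0 + (dz / GAMMA) ** 2))
```

For dz = 0 the loss is 0, at dz = γ it is exactly 0.5, and it saturates toward 1 for large outliers, which is what makes it more robust than mean squared error, whose penalty grows without bound.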
This notebook is designed specifically for Windows and Linux systems with CUDA-compatible GPUs. macOS is not supported because the implementation uses the 'channels_first' data format, which conflicts with Apple's Metal framework requirements and with TensorFlow's known limitations around NCHW tensor formats on macOS platforms.
This notebook provides a complete, reproducible pipeline for training, evaluating, and visualizing galaxy redshift estimation models designed for local installation and execution. A custom HDF5DataGenerator class efficiently loads multi-band galaxy images and associated photometric features from HDF5 files with support for normalization, batch processing, and optional data augmentation. The implementation includes automated experiment tracking through MLflow, model checkpointing for capturing optimal weights during training, and comprehensive artifact collection including training history, visualization plots, and model metadata. Training histories, prediction scatter plots comparing true vs. predicted redshifts, and model architectures are automatically saved and logged to MLflow for analysis. The modular design allows researchers to easily modify network architectures, hyperparameters, and training configurations while maintaining a complete record of experimental results in a controlled local environment.
Keywords
Galaxy Redshift Estimation, Convolutional Neural Networks, Deep Learning, MLflow, Machine Learning, Astrophysics, Photometry, TensorFlow
Galaxies ML
GalaxiesML is a machine learning-ready dataset of galaxy images, photometry, redshifts, and structural parameters. It is designed for machine learning applications in astrophysics, particularly for tasks such as redshift estimation and galaxy morphology classification. The dataset comprises 286,401 galaxy images from the Hyper-Suprime-Cam (HSC) Survey PDR2 in five filters: g, r, i, z, y, with spectroscopically confirmed redshifts as ground truth.
GalaxiesML dataset link: https://zenodo.org/records/11117528
Note for US-RSE 2025 reviewers:
repo2docker and CI Compatibility Status:
Our reproducible ML pipeline demonstrates the core principles of reproducible research through automated environment setup, data acquisition, and dependency management. However, the full training pipeline currently encounters technical incompatibilities in containerized CI environments:
✅ What works reliably:
- Automated environment setup and dependency installation
- Data download and preprocessing pipeline (prepare_data.sh)
- MLflow experiment tracking initialization
- Model architecture definition and compilation
- Pipeline structure and organization
❌ Current CI execution status: The training notebook does not currently complete successfully in CI environments due to:
- Model-environment incompatibility: The CNN architecture expects GPU-optimized tensor operations that are incompatible with CPU-only CI environments, specifically around convolution and pooling operations.
- Data format mismatches: Tensor shape and format expectations differ between the model design and the CI environment's capabilities.
- Resource constraints: Memory and computational limitations in containerized environments.
Reproducibility framework value:
While the training doesn't complete in CI, we've successfully demonstrated the reproducible research infrastructure:
- Automated environment creation
- Dependency management via environment.yml
- Automated data acquisition and setup
- Clear documentation and structure
- Version control and containerization principles
For successful reproduction: The pipeline is designed to run in GPU-enabled environments with sufficient computational resources. The reproducibility framework ensures consistent setup across different systems, even though the current model architecture has specific hardware requirements that exceed CI capabilities.
This approach reflects real-world reproducible research challenges where full computational workflows may require specific hardware while maintaining automated validation of the reproducible framework itself.
Table of Contents
- System Requirements
- Training Configuration
- Evaluation/Inference Configuration
- Prerequisites
Windows Instructions
Linux Instructions
Disclaimer: These instructions are designed for Windows and Linux; this code is incompatible with macOS.
System Requirements
| Component | Requirement |
|-----------|------------------------------------------------|
| OS | Windows 10/11 64-bit or Linux (Ubuntu 20.04+) |
Training Configuration
Minimum Hardware Specifications
| Component | Requirement |
|-----------|------------------------------------------------|
| GPU | NVIDIA GPU with 8GB VRAM (GTX 1070 or better) |
| CPU | Quad-core processor (Intel i5/AMD Ryzen 5) |
| RAM | 8GB DDR4 (16GB recommended) |
Recommended Hardware Specifications
| Component | Requirement |
|-----------|------------------------------------------------|
| GPU | NVIDIA GPU with 12GB+ VRAM (RTX 3060 Ti+) |
| CPU | 6+ core processor (Intel i7/AMD Ryzen 7) |
| RAM | 16GB DDR4 |
Evaluation/Inference Configuration
| Component | Requirement |
|-----------|------------------------------------------------|
| GPU | NVIDIA GPU with 6GB+ VRAM |
| CPU | Quad-core processor (Intel i5/AMD Ryzen 5) |
| RAM | 8GB DDR4 |
Note: The application uses GPU memory management and data generators to optimize resource utilization. Performance may vary based on specific hardware configurations and concurrent system load.
Dataset Download
You will need to download the datasets from the link above (5x127x127 or 5x64x64):
- 5x64x64_training_with_morphology.hdf5
- 5x64x64_validation_with_morphology.hdf5
- 5x64x64_testing_with_morphology.hdf5
Windows Instructions:
Prerequisites:
- Visual Studio Code installed
- GIT installed
0. Install GIT
You will need to install GIT. You can install it using Windows Package Manager from within any terminal (CMD, Git Bash, Powershell, etc):
winget install Git.Git
Afterward, check that it has been successfully installed using the command:
git --version
1. Clone the Github repository
Clone the Github Repository into Visual Studio Code:
- Copy the HTTPS link from the GitHub

- Launch Visual Studio Code and select "Clone Git Repository"

- Paste the HTTPS link into the bar and press enter

- Select a directory for the cloned repo to go in
- Select "open" when this prompt appears

- Select "Yes, I trust the authors"

2. Install Miniconda
Miniconda is a lightweight distribution of Conda, a package manager, and environment manager designed for Python and other programming languages. You will need to install Miniconda for Python 3.10.
- Visit the download archive: https://repo.anaconda.com/miniconda/
- Find the installer: "Miniconda3-py310_24.9.2-0-Windows-x86_64.exe 83.3M 2024-10-23 02:24:15"
- You can use CTRL+F to search for this exact filename
- Download and run the installer
- Leave everything unchecked except for "create shortcuts"
- Optional: Select "Add Miniconda to my PATH Environment Variable" for convenience (however this may create conflicts with other Python versions)
- Optional: Select "Clear the package cache upon completion" if low on disk space

3. Miniconda Environment Creation
- Open a new CMD terminal in VS Code:

- If it shows "powershell" instead of "cmd":

- Create a new cmd terminal:

- Ensure "cmd" is selected:

Your terminal should look like this:

Not like this (Powershell):

Create the environment by typing:
C:\Users\<username>\miniconda3\Scripts\conda.exe create -n tf210 python=3.10
- Replace "<username>" with your actual username on your system
- Adjust the path if Miniconda is installed elsewhere
Type "y" when prompted:

4. Miniconda Environment Activation/CUDA Installation
- Open Command Palette (CTRL + SHIFT + P)
- Type "Python: Select Interpreter" and select Python 3.10.16 (tf210)

- Open a new CMD terminal - you should see (tf210) in the path:

If (tf210) is not visible, manually activate:
conda activate tf210
or
C:\Users\<username>\miniconda3\Scripts\conda.exe activate tf210
Install CUDA and cuDNN:
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1
Type "y" when prompted:

5. Install Dependencies
- Install requirements:
Make sure pip is installed in your conda environment:
conda install pip
Install requirements:
pip install -r requirements.txt
- Verify CUDA installation:
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
Successful output should look like:

Note: An empty list ([]) indicates TensorFlow is not detecting the GPU.
6. Configure Training File
- Open the MLFLOW.ipynb file
- Update the dataset paths to where you installed them, for example:

- Save the file (CTRL + S)
7. Setup Kernel
- While in your terminal with the activated conda environment (tf210) type:
pip install ipykernel
Next you will need to register the kernel. Type:
python -m ipykernel install --user --name tf210 --display-name "Python (tf210)"
(type "y" when prompted)
Restart VS Code
8. Run the Training Script
Navigate to the MLFLOW.ipynb and run each cell
Troubleshooting: If a popup like this appears when you try to execute the cells:

That means that you do not have the required VSCode extensions installed.
- Select "Browse marketplace for kernel extensions"
- Then install "Jupyter" (By Microsoft)

- Now go back to the notebook and try to run the cells again, and then select "Python Environments" when this pops up

- Finally, select the environment you created earlier (example: tf210)

- Note: If Windows firewall gives you a warning, select "allow all access"
Training Parameters:
- --image_size: Set to 64 or 127 depending on the dataset you downloaded
- --epochs: Number of training epochs (default: 200)
- --batch_size: Number of samples per training batch (default: 256)
- --learning_rate: Learning rate for training (default: 0.0001)
Training progress will be displayed in the notebook as well as loss values and other metrics. Checkpoints will be saved automatically during training. You can run the notebook as many times as you'd like with various parameters.
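Inside the notebook, these parameters typically live in a single configuration cell. A hypothetical sketch using the defaults listed above (the variable name `config` is an assumption, not the notebook's actual code):

```python
# Hypothetical configuration cell; names mirror the parameters listed above.
config = {
    "image_size": 64,        # 64 or 127, matching the downloaded dataset
    "epochs": 200,           # number of training epochs
    "batch_size": 256,       # samples per training batch
    "learning_rate": 1e-4,   # optimizer learning rate
}
```

Editing this one cell and re-running the notebook is how different runs end up side by side in MLflow for comparison.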
9. Exploring Your Model Results with MLflow
After training completes, run the "MLflow UI Startup Helper" cell to generate interactive links to the MLflow dashboard. Simply click the most appropriate link for your environment (typically the localhost option) to launch the visualization interface. The MLflow dashboard provides:
- Comprehensive model performance metrics and comparisons
- Interactive visualizations of training curves
- Access to model artifacts and parameters
- Side-by-side evaluation of different training runs
This powerful interface lets you analyze your model's behavior, identify improvement opportunities, and select the best performing version for redshift estimation.
Linux instructions:
Open Terminal
Press Ctrl + Alt + T to open a terminal window
Or find "Terminal" in your system's application menu
Install Git
First, we'll install Git:
sudo apt update
sudo apt install git
git --version
(verify it has installed successfully)
Check/install GPU drivers
- Next, check if you have the required NVIDIA drivers installed:
nvidia-smi
You should see something like this:
This shows your GPU information. If you see this and the driver version is greater than 450.80.02, you can skip the following steps.
- If you don’t see something similar to the above image you’ll need to install NVIDIA drivers:
sudo apt update
sudo apt install nvidia-driver-535
Note: Depending on your GPU model and Linux distribution, you might need a different driver version. You can check available versions with:
ubuntu-drivers devices
Note: For compatibility with CUDA 11.2 and TensorFlow 2.10, you need an NVIDIA driver version of at least 450.80.02.
Installing Miniconda:
- Create/navigate to a directory for your miniconda installation Example:
mkdir ~/CNNSoftware
cd ~/CNNSoftware
* Install Miniconda:
```
wget https://repo.anaconda.com/miniconda/Miniconda3-py310_24.9.2-0-Linux-x86_64.sh
chmod +x Miniconda3-py310_24.9.2-0-Linux-x86_64.sh
./Miniconda3-py310_24.9.2-0-Linux-x86_64.sh -u
```
* Press enter when prompted (hold the enter key to scroll through the terms and conditions)
- When you get to the end type “yes” and press enter when prompted
It will say something like:
"Confirm the installation location:
Miniconda3 will now be installed into this location:
/home/
- Press ENTER to confirm the location
- Press CTRL-C to abort the installation
- Or specify a different location below
"
As the message indicates, you can press enter to confirm the default installation location or specify a different path. Type the full path to the directory you made earlier (CNNSoftware).
Once it is done it should say:
```
Preparing transaction: done
Executing transaction: done
installation finished.
Do you wish to update your shell profile to automatically initialize conda? This will activate conda on startup and change the command prompt when activated.
If you'd prefer that conda's base environment not be activated on startup, run the following command when conda is activated:
conda config --set auto_activate_base false
You can undo this by running conda init --reverse $SHELL? [yes|no]
```
* When prompted, type “yes” into the terminal.
It should now say “Thank you for installing Miniconda3!”
Close and reopen the terminal
Verify Miniconda has been successfully installed by typing the command:
conda --version
You should see a version number like “conda 24.9.2”, which indicates Miniconda has been successfully installed.
Cloning the Repository
Now we need to clone the GitHub repo
cd out of the CNNSoftware directory:
```
cd ..
```
- Next, type git clone (HTTPS link to repo) for example:
git clone https://github.com/Jacob489/MLFlow-CNN.git
- Then navigate into the newly created folder containing the repo files:
cd MLFlow-CNN
Installing Requirements
- Next we need to create a conda environment with all necessary requirements installed. To do this, type:
```
conda install mamba -n base -c conda-forge
```
(type y when prompted)
Then:
mamba env create --file environment.yml
(type y when prompted)
- Now we must activate the environment
Type:
conda activate galaxies
- You should now see “galaxies” in front of your terminal prompt, indicating the environment has been successfully activated.
- Verify that the CUDA toolkit and cuDNN were installed into the environment:
conda list | grep -E "cudatoolkit|cudnn"
which should return the CUDA and cuDNN versions.
- Now verify that Tensorflow is working correctly
python -c "import tensorflow as tf; print(tf.__version__)"
This should return "2.10.1".
Next we need to verify that CUDA is properly configured
Type:
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
It should return a message that looks something like this:
If you just see empty brackets "[]", it indicates CUDA has not been successfully installed/configured.
Training the CNN
- Now it is time to create a Python kernel for our galaxies environment. Make sure you have ipykernel installed in that environment. Type:
pip install ipykernel
Then register a new kernel for the environment:
python -m ipykernel install --user --name galaxies --display-name "Python (galaxies)"
This will register a new kernel called "Python (galaxies)" that you can select when launching a notebook in Jupyter.
- Shut down all jupyter kernels and restart, and then you should see "galaxies" listed as an option in the kernel list

Now you will need to adjust the paths in the MLFLOW.ipynb notebook to where you have installed the datasets, for example:
- You can also adjust hyperparameters within the notebook if you wish, however this is optional
Training Parameters:
- --image_size: Set to 64 or 127 depending on the dataset you downloaded
- --epochs: Number of training epochs (default: 200)
- --batch_size: Number of samples per training batch (default: 256)
- --learning_rate: Learning rate for training (default: 0.0001)
- Save the file, then run each cell in the notebook.
Training progress will be displayed, and once training is complete you can optionally run the notebook again with different parameters to compare performance across different runs.
Explore Model Performance with MLFlow
When you are ready to evaluate the performance of the model, run the "MLflow UI Startup Helper" cell to generate interactive links to the MLflow dashboard. Simply click the most appropriate link for your environment (typically the localhost option) to launch the visualization interface.
The MLflow dashboard provides:
- Comprehensive model performance metrics and comparisons
- Interactive visualizations of training curves
- Access to model artifacts and parameters
- Side-by-side evaluation of different training runs
This powerful interface lets you analyze your model's behavior, identify improvement opportunities, and select the best performing version for redshift estimation.
Owner
- Login: Jacob489
- Kind: user
- Repositories: 1
- Profile: https://github.com/Jacob489
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software or the accompanying datasets, please cite as below."
title: "UCLA Astrophysics Datalab MLFlow-CNN"
version: 0.1.0
date-released: 2025-05-01
identifiers:
- type: doi
value: 10.5281/zenodo.11117528
repository-code: "https://github.com/Jacob489/MLFlow-CNN/tree/main"
url: "https://datalab.astro.ucla.edu/index.html"
authors:
- family-names: Do
given-names: Tuan
- family-names: Saikrishnan
given-names: Srinath
- family-names: Nowack
given-names: Jacob
- family-names: Vikram
given-names: Seenivasan
- family-names: Boscoe
given-names: Bernie
- family-names: Chen
given-names: Zhuo
- family-names: Campbell
given-names: Chandler
license: MIT
GitHub Events
Total
- Watch event: 2
- Push event: 88
- Create event: 2
Last Year
- Watch event: 2
- Push event: 88
- Create event: 2
Dependencies
From requirements.txt:
- astropy ==5.2.2
- h5py ==3.8.0
- ipykernel *
- keras *
- matplotlib ==3.6.3
- mlflow *
- numpy ==1.23.5
- pandas ==1.5.3
- scikit-learn ==1.2.2
- scipy ==1.10.1
- seaborn ==0.12.2
- tabulate ==0.9.0
- tensorboard ==2.10.1
- tensorflow-gpu ==2.10.1
- tensorflow-probability ==0.17.0
- tqdm ==4.65.0
From environment.yml:
- astropy 5.2.2.*
- cudatoolkit 11.2.*
- cudnn 8.1.*
- h5py 3.8.0.*
- ipykernel
- matplotlib 3.6.3.*
- mlflow
- numpy 1.23.5.*
- pandas 1.5.3.*
- pip
- protobuf 3.19.6.*
- python 3.10.*
- scikit-learn 1.2.2.*
- scipy 1.10.1.*
- seaborn 0.12.2.*
- tabulate 0.9.0.*
- tqdm 4.65.0.*