MARGE-HOMER Data at the Exoplanet Archive


Exoplanet atmospheric retrieval is an inverse modeling technique in which atmospheric properties are inferred from exoplanet observations (primary transit, secondary eclipse, or direct observation). Using a radiative transfer (RT) code, thousands to millions of forward models are evaluated, compared to the observations, and probabilistically accepted or rejected by a Bayesian sampler (see the review by Madhusudhan 2018). The RT calculations dominate the runtime, which is typically on the order of days of compute time.

Himes et al. (2021) demonstrated that the runtime of retrievals can be reduced by using a machine learning model that approximates RT, with minimal loss in accuracy. This webpage hosts the Reproducible Research Compendium (RRC) for the paper.

Extracting the .tar.gz files

The RRC is supplied as .tar.gz files to reduce the data size and to let users download only the parts they are interested in. The RRC's full size once extracted is ~330 GB. Running MARGE requires an additional ~230 GB for the data in TFRecords format plus the predictions on the validation and test sets.
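Before downloading, it may help to check that the target disk can hold the extracted data. The following is a minimal sketch using only the Python standard library. The function name `has_enough_space` is hypothetical, and the sizes are taken to be decimal gigabytes, which is an assumption.

```python
import shutil

# Sizes quoted above, in GB (decimal gigabytes are an assumption).
RRC_EXTRACTED_GB = 330
MARGE_EXTRA_GB = 230


def has_enough_space(path, required_gb):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1e9


if __name__ == "__main__":
    total_gb = RRC_EXTRACTED_GB + MARGE_EXTRA_GB
    enough = has_enough_space(".", total_gb)
    print(f"Need ~{total_gb} GB for the extracted RRC plus MARGE data; available: {enough}")
```

Note that the downloaded .tar.gz files also occupy disk space until they are deleted, so the real requirement during extraction is somewhat higher.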

To reconstruct the RRC, place all .tar.gz files in the same directory, and extract each one. On Unix-based systems, users may enter:

    for foo in RRC-HimesEtal2021*.tar.gz; do
        tar -zxvf "$foo"
    done

into a terminal to accomplish this. This is also available as an executable Bash script in the download table below.

Windows users can use archiving programs like 7-Zip or WinZip to extract the .tar.gz files.
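As a platform-independent alternative to the shell loop above, the archives could also be extracted with Python's standard `tarfile` module. This is only a sketch: the function name `extract_all` is hypothetical, and it assumes all archives sit in one directory and should be extracted in place.

```python
import glob
import os
import tarfile


def extract_all(directory=".", pattern="RRC-HimesEtal2021*.tar.gz"):
    """Extract every archive in `directory` matching `pattern`, in place.

    Returns the sorted list of archive paths that were extracted.
    """
    archives = sorted(glob.glob(os.path.join(directory, pattern)))
    for archive in archives:
        # "r:gz" opens a gzip-compressed tar archive for reading
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(path=directory)
    return archives
```

Calling `extract_all()` from the download directory mirrors the shell loop. As with any archive tool, it should only be run on archives from a trusted source.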


Himes et al. (2021) present two software packages, released under the Reproducible Research Software License. The Machine learning Algorithm for Radiative transfer of Generated Exoplanets (MARGE) is a Python package that trains a user-specified neural network architecture to approximate a deterministic process, based on data generated by a forward model. The Helper Of My Eternal Retrievals (HOMER) is a Python package that performs Bayesian inverse inference using a MARGE-trained model. For more details, see the user manuals at their GitHub pages.

Data Description

The training, validation, and test sets are stored in the NumPy binary (NPY) format. Each file contains a 2D array of 64 data vectors, where each vector is made up of the 12 inputs followed by the 6821 outputs. The training set has 2,446,784 cases, the validation set has 689,536 cases, and the test set has 322,112 cases.
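The layout described above can be read with NumPy as sketched below. The function name `load_cases` and any file path passed to it are hypothetical. The shape check assumes every file holds exactly 64 cases of 12 + 6821 values.

```python
import numpy as np

N_INPUTS = 12        # parameters per case, as described above
N_OUTPUTS = 6821     # spectrum points per case
CASES_PER_FILE = 64  # data vectors per NPY file


def load_cases(npy_path):
    """Load one data file and split it into (inputs, outputs) arrays."""
    data = np.load(npy_path)
    expected = (CASES_PER_FILE, N_INPUTS + N_OUTPUTS)
    if data.shape != expected:
        raise ValueError(f"unexpected shape {data.shape}, expected {expected}")
    # The first 12 columns are the inputs; the remaining 6821 are the spectrum
    return data[:, :N_INPUTS], data[:, N_INPUTS:]
```

If every file holds exactly 64 cases, the quoted set sizes correspond to 38,231 training files, 10,774 validation files, and 5,033 test files.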

The data inputs consist of:

  • the 5 thermal profile parameters of Line et al. (2013)
  • the planetary radius at 1 bar, in R_Jup
  • the planetary gravity at 1 bar, in cm s^-2
  • the semi-major axis, in AU
  • the log abundances of H2O, CO2, CO, and CH4

The atmospheric models have 100 log-uniform layers spanning 10^-8–100 bar. Molecular abundances are assumed to be uniform over the range of pressures. Output spectra are computed using the radiative transfer package of the Bayesian Atmospheric Radiative Transfer code (BART; Harrington et al. 2021, submitted to PSJ; Cubillos et al. 2021, submitted to PSJ; Blecic et al. 2021, submitted to PSJ). Each spectrum is in erg s^-1 cm^-1 and spans 280–7100 cm^-1 at a resolution of 1.0 cm^-1. For more details, refer to Section 2.1 and Table 1 of Himes et al. (2021).
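The pressure and wavenumber grids described above can be reconstructed as in the sketch below. The variable names are illustrative. The ordering of the pressure layers and the inclusion of both spectral endpoints are assumptions, although inclusive endpoints are consistent with the 6821 outputs per case.

```python
import numpy as np

# 100 layers spaced uniformly in log10(pressure) between 1e-8 and 100 bar.
# Ordering (low to high pressure) is an assumption.
pressure_bar = np.logspace(-8, 2, 100)

# 280-7100 cm^-1 with 1.0 cm^-1 spacing, both endpoints included (assumed).
wavenumber_cm1 = np.arange(280.0, 7101.0, 1.0)

# Equivalent wavelength in microns, for plotting convenience.
wavelength_um = 1e4 / wavenumber_cm1

print(pressure_bar.size, wavenumber_cm1.size)
```

With these assumptions the wavenumber grid has the same length as the output portion of each data vector, so it can serve as the x-axis when plotting a spectrum.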


Download Links (Web Link, wget Script)   Approximate Download Size   Description
README   10 KB Readme file for all data; also included in all .tar files
   < 10 KB Bash script to extract all .tar.gz files in the directory
md5sums.txt   4 KB Contains the hash of each .tar.gz file
RRC-HimesEtal2021-HOMER-outputs.tar.gz   552 MB Contains the outputs produced by HOMER for each associated config file
RRC-HimesEtal2021-MARGE-HOMER-configs.tar.gz   8.3 KB Contains the configuration files for MARGE and HOMER used in the paper
RRC-HimesEtal2021-MARGE-HOMER-inputs.tar.gz   2 GB Contains additional inputs for MARGE and HOMER
RRC-HimesEtal2021-MARGE-HOMER-software.tar.gz   121 MB Contains the MARGE and HOMER versions used in the paper
RRC-HimesEtal2021-MARGE-outputs_gridsearch1-1.tar.gz wget_RRC-HimesEtal2021-MARGE-outputs_gridsearch.bat 9 GB Contains the outputs produced by MARGE for the grid searches
RRC-HimesEtal2021-MARGE-outputs_gridsearch1-2.tar.gz 13 GB
RRC-HimesEtal2021-MARGE-outputs_gridsearch1-3.tar.gz 6.3 GB
RRC-HimesEtal2021-MARGE-outputs_gridsearch1-4.tar.gz 5.4 GB
RRC-HimesEtal2021-MARGE-outputs_gridsearch2-1.tar.gz 9.4 GB
RRC-HimesEtal2021-MARGE-outputs_gridsearch2-2.tar.gz 13 GB
RRC-HimesEtal2021-MARGE-outputs_gridsearch2-3.tar.gz 6.4 GB
RRC-HimesEtal2021-MARGE-outputs_gridsearch2-4.tar.gz 5.8 GB
RRC-HimesEtal2021-MARGE-outputs_gridsearch3-1.tar.gz 9.4 GB
RRC-HimesEtal2021-MARGE-outputs_gridsearch3-2.tar.gz 13 GB
RRC-HimesEtal2021-MARGE-outputs_gridsearch3-3.tar.gz 6.3 GB
RRC-HimesEtal2021-MARGE-outputs_gridsearch4-1.tar.gz 9.1 GB
RRC-HimesEtal2021-MARGE-outputs_gridsearch4-2.tar.gz 13 GB
RRC-HimesEtal2021-MARGE-outputs_gridsearch4-3.tar.gz 6.3 GB
RRC-HimesEtal2021-MARGE-outputs.tar.gz 6 GB Contains the main outputs produced by MARGE
RRC-HimesEtal2021-testset-1.tar.gz wget_RRC-HimesEtal2021-testset.bat 7.6 GB Data used in the testing set
RRC-HimesEtal2021-testset-2.tar.gz 7.7 GB
RRC-HimesEtal2021-trainingset-01.tar.gz wget_RRC-HimesEtal2021-trainingset.bat 7.9 GB Data used in the training set
RRC-HimesEtal2021-trainingset-02.tar.gz 7.9 GB
RRC-HimesEtal2021-trainingset-03.tar.gz 7.9 GB
RRC-HimesEtal2021-trainingset-04.tar.gz 7.7 GB
RRC-HimesEtal2021-trainingset-05.tar.gz 6.3 GB
RRC-HimesEtal2021-trainingset-06.tar.gz 6.3 GB
RRC-HimesEtal2021-trainingset-07.tar.gz 6.3 GB
RRC-HimesEtal2021-trainingset-08.tar.gz 6.3 GB
RRC-HimesEtal2021-trainingset-09.tar.gz 5.9 GB
RRC-HimesEtal2021-trainingset-10.tar.gz 6.3 GB
RRC-HimesEtal2021-trainingset-11.tar.gz 6.3 GB
RRC-HimesEtal2021-trainingset-12.tar.gz 6.3 GB
RRC-HimesEtal2021-trainingset-13.tar.gz 5.9 GB
RRC-HimesEtal2021-trainingset-14.tar.gz 6.3 GB
RRC-HimesEtal2021-trainingset-15.tar.gz 6.3 GB
RRC-HimesEtal2021-trainingset-16.tar.gz 6.3 GB
RRC-HimesEtal2021-trainingset-17.tar.gz 6.3 GB
RRC-HimesEtal2021-trainingset-18.tar.gz 3.8 GB
RRC-HimesEtal2021-validationset-1.tar.gz wget_RRC-HimesEtal2021-validationset.bat 8.2 GB Data used in the validation set
RRC-HimesEtal2021-validationset-2.tar.gz 8.2 GB
RRC-HimesEtal2021-validationset-3.tar.gz 8.2 GB
RRC-HimesEtal2021-validationset-4.tar.gz 8.2 GB
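After downloading, the archives can be checked against md5sums.txt. The sketch below assumes that the file uses the standard `md5sum` output format, with one `<hash>  <filename>` pair per line. That format is not confirmed here, and the function names are hypothetical.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(md5_list="md5sums.txt", directory="."):
    """Return {filename: True, False, or None}; None means the file is absent."""
    results = {}
    with open(md5_list) as handle:
        for line in handle:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or unexpected lines
            expected, name = parts
            name = name.lstrip("*")  # md5sum marks binary mode with '*'
            path = os.path.join(directory, name)
            if not os.path.exists(path):
                results[name] = None
            else:
                results[name] = md5_of_file(path) == expected.lower()
    return results
```

Files reported as `None` have not been downloaded, which is expected when only part of the RRC is needed. A `False` entry suggests an incomplete or corrupted download.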


The research team thanks the NASA Exoplanet Archive for hosting this RRC and gratefully acknowledges the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research. This research was supported by the NASA Fellowship Activity under NASA Grant 80NSSC20K0682 and NASA Exoplanets Research Program grant NNX17AB62G.

Acknowledging Use of the Models

If you find this RRC useful for your own work, please cite Himes et al. (2021) and include the NASA Exoplanet Archive's standard acknowledgement.

Last updated: 25 May 2021