Description of Kepler Data Validation One-Page Summary Reports


Quarters 1-16

Version: 2.0
Delivered by the Kepler Project on Nov. 1, 2013


CONTENT

  1. Introduction
    A. Full Time-Series Flux Plot
    B. Phased Full-Orbit Flux Plot
    C. Secondary Eclipse Plot
    D. Phased Transit-Only Flux Plot
    E. Whitened, Phased Transit-Only Plot
    F. Odd-Even Transit Plot
    G. Centroid Offset Plot
    H. DV Analysis Table



Figure 1: Example of a one-page summary that corresponds to Q1-16 TCE 1 of 2 in KIC 005780885. The event is also known as Kepler-7b and KOI 97.01. The large, red letters identify each part of the one-page summary described below.

1. Introduction

Within the Kepler pipeline, stars that have been identified with at least one Threshold Crossing Event (TCE)—which is a series of at least three transit-like signals with a consistent period and sufficient signal-to-noise ratio—are put through a process called Data Validation (DV) (Wu et al. 2010). In DV, diagnostic parameters are computed and plotted for each TCE to help determine if it is an instrumental artifact from the spacecraft, a blended binary or other astrophysical false positive, or a true planetary candidate. Although very comprehensive multi-page reports are generated and archived for each Kepler star with at least one TCE, these simple one-page summaries provide much of the critical information for a quick assessment of candidacy. This document describes these one-page summaries, with an example shown in Figure 1. Large, red letters have been added to the figure for guidance throughout the rest of this document.

At the very top of each report is a single line of text that contains the Kepler Input Catalog (KIC) number, the planet candidate number, and its orbital period. Immediately below this is another line of text that contains the Kepler magnitude (Kp) of the star and its radius (R*) in Solar radii, effective temperature (Teff), surface gravity (log g), and metallicity ([Fe/H]). The remainder of the one-page summary is divided into sections designated by letters A-H. Each is explained in the following sections of this document, along with an explanation of how each plot and parameter can be used to help disposition a TCE. The software revision URL that appears on the bottom of the page identifies the version of the pipeline code used for the DV run. The date of summary generation is also provided.

A. Full Time-Series Flux Plot

Plot A shows the full flux time-series for the TCE with relative flux on the y-axis and time in Barycentric Kepler Julian Date (BKJD) on the x-axis (BJD = BKJD + 2,454,833.0). The light curve has been detrended beyond that accomplished by Presearch Data Conditioning (PDC) (Stumpe et al. 2012; Smith et al. 2012) by using a running median filter to remove any long-period systematics. The start of each new quarter is marked with a vertical dashed red line and labeled with the quarter number (e.g., Q2 for Quarter 2). The module and output number of the CCD that the star falls on each quarter is indicated in brackets next to each quarter number (e.g., [19.1] for output 1 of module 19). Along the bottom of the plot are triangles that mark the expected position of the transits for this particular TCE, corresponding to the best period and epoch identified.
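
The time conversion and the triangle positions described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the epoch and period in the usage note are made-up values rather than the parameters of any real TCE, and the pipeline's own code is not reproduced here.

```python
import math

# Offset stated in the text: BJD = BKJD + 2,454,833.0
BKJD_OFFSET = 2454833.0


def bkjd_to_bjd(bkjd):
    """Convert Barycentric Kepler Julian Date to Barycentric Julian Date."""
    return bkjd + BKJD_OFFSET


def expected_transit_times(epoch_bkjd, period_days, t_start, t_end):
    """Predicted mid-transit times (BKJD) falling inside [t_start, t_end].

    These correspond to the triangle positions along the bottom of Plot A,
    assuming a strictly periodic signal.
    """
    # Index of the first transit at or after t_start (may be negative).
    n = math.ceil((t_start - epoch_bkjd) / period_days)
    times = []
    while epoch_bkjd + n * period_days <= t_end:
        times.append(epoch_bkjd + n * period_days)
        n += 1
    return times
```

For example, with an invented epoch of 134.0 BKJD and a period of 5.0 days, `expected_transit_times(134.0, 5.0, 131.5, 150.0)` lists the transits at 134.0, 139.0, 144.0, and 149.0 BKJD.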

This plot helps identify any potential inter-quarter systematics that may have triggered the TCE. Gaps in the data along the quarter boundaries, and at monthly intervals within each quarter, are expected because the spacecraft is re-oriented to download data. If the TCE is a planet candidate, then a transit should occur at every triangle where data exists. TCEs whose transits occur primarily near quarter boundaries are more suspect because the strongest systematics are at the start of quarters. Additional transits may be visible, especially if DV has identified more than one TCE for the system. The total number of TCEs found for the KIC target is shown at the very top of the DV report.

B. Phased Full-Orbit Flux Plot

Plot B shows the phase-folded light curve for the TCE, folded according to its best-fit period so that the phase in days is plotted on the x-axis. The epoch of the primary transit is indicated by an upward blue triangle on the bottom of the plot at phase 0.0. The location of the strongest secondary eclipse candidate is indicated by a downward blue triangle (see Section C). The phase locations of transits from other TCEs detected on this star are indicated by upward triangles of different colors (e.g., the red triangle on the sample plot in Figure 1). The small cyan-filled blue circles are phase-binned averages of the data. A transit model fit is performed on the whitened data (see Section E), and the resulting (de-whitened) model is shown on this plot via the solid red line.
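
Phase folding itself is a short calculation. The sketch below is a generic illustration, not the DV code: the function name is hypothetical, and the choice to center the phase range on the transit is an assumption, so the x-axis range of the actual plot may differ.

```python
def fold_phase_days(times, period, epoch):
    """Return the phase of each time stamp in days.

    Phases fall in [-period/2, period/2), with the primary transit at 0.0.
    `times` and `epoch` must share the same time system (e.g., BKJD).
    """
    half = 0.5 * period
    return [((t - epoch + half) % period) - half for t in times]
```

With a period of 4.0 days and an epoch of 10.0, the time stamps 10.0, 12.0, and 14.5 fold to phases 0.0, -2.0, and 0.5 days.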

This plot helps assess whether the phased data can be adequately explained by a physical transit model. If the TCE is a viable planet candidate, the transit model should accurately fit the phased transit, although discrepancies may occur since the transit model is actually fit to the whitened data in Plot E. In Plot B, the out-of-transit baseline should generally be flat. A secondary eclipse typically should not be visible, except in cases of hot-Jupiter planets with very short orbital periods. If an eclipse is visible, this suggests the TCE may be an eclipsing binary false positive. It is not unusual to observe additional transits scattered about in this light curve, especially if DV has identified more than one TCE for the system. Generally, these transits should not be in-phase at the period of the current TCE under examination.

C. Secondary Eclipse Plot

Plot C shows the strongest secondary eclipse candidate identified for the TCE under investigation. Starting with the Q1-Q16 search, a new diagnostic test is performed called the Weak Secondary Test. First, the primary transit signal is removed, and the whitening filter is re-applied to the light curve. The Transiting Planet Search (TPS) algorithm is then run on the resulting data with the same duration as the primary TCE. Finally, the resulting single event detection time-series is folded at the same period as the primary TCE. This produces, among many other useful quantities, the Multiple Event Statistic (MES) value and phase of the strongest transit-like signal at the TCE's period, aside from the primary TCE itself.

The phased data centered on the secondary eclipse candidate is shown in Plot C, with black dots representing the raw data and cyan-filled blue circles representing phase-binned averages of the data. The values of the MES and phase of the secondary eclipse candidate (in days relative to the TCE) are displayed at the top of Plot C. If the MES is greater than 7.1 (the formal mission detection threshold) then it is colored red to indicate the secondary eclipse candidate is statistically significant.

This plot helps assess whether the secondary eclipse candidate is real. If so, depending on the strength of the secondary eclipse, the period of the planet, and its other properties, this may indicate the candidate is an eclipsing binary and not a viable planet candidate. Typically, secondary eclipses of validated planets are only observed for hot Jupiters using Kepler data. This plot and the secondary eclipse candidate may also help to highlight transit-like artifacts in the data, which may cast doubt on the uniqueness of the primary TCE and thus validity of the planet candidate.

D. Phased Transit-Only Flux Plot

Plot D shows the phase-folded light curve for the TCE, with the range on the x-axis reduced so that only the primary transit is visible. The x-axis unit is hours, and the cyan-filled blue circles are phase-binned averages of the original data. As explained in Section B, a transit model fit is performed on the whitened data (see Section E), and the resulting (de-whitened) model is shown on this plot via the solid red line.
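
The phase-binned averages (the cyan-filled blue circles) can be approximated with a simple fixed-width binning. This is a minimal sketch under stated assumptions: the function name and the bin width are hypothetical, and DV's actual binning scheme is not specified in this document.

```python
import math


def bin_phase(phase_hours, flux, bin_width_hours):
    """Average flux in fixed-width phase bins.

    Returns (bin_centers, mean_flux), both sorted by phase. Empty bins are
    simply omitted from the output.
    """
    sums = {}
    for p, f in zip(phase_hours, flux):
        k = math.floor(p / bin_width_hours)  # integer bin index
        total, count = sums.get(k, (0.0, 0))
        sums[k] = (total + f, count + 1)

    centers, means = [], []
    for k in sorted(sums):
        total, count = sums[k]
        centers.append((k + 0.5) * bin_width_hours)
        means.append(total / count)
    return centers, means
```

Phases in days, as used in Plot B, convert to the hours used here by multiplying by 24.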

This plot allows a detailed assessment of the primary transit and theoretical model fit. If the TCE is a viable planet candidate, the transit model should accurately fit the phased transit, although discrepancies may occur since the transit model is actually fit to the whitened data in Plot E. The primary transit should also be fairly symmetric around Phase 0.0. Asymmetry in the light curve is an indication that the TCE could be a result of spacecraft systematics or other astrophysical phenomena.

E. Whitened, Phased Transit-Only Plot

Plot E shows the phase-folded, binned light curve for the TCE with a whitening filter applied to remove any correlated noise (e.g., stellar variability, remaining systematics). The y-axis shows the Whitened Flux Values (Tenenbaum et al. 2010). A best-fit transit model is shown via a solid red line, which has also been passed through the whitening filter. Residuals of the best-fit model relative to the binned data are shown by green dots (offset in flux for clarity), while the magenta dots are data centered around phase 0.5 (also offset in flux for clarity), where a secondary eclipse would occur for a circular orbit. The secondary eclipse may occur elsewhere for non-circular orbits (see Section C). Above the plot are values for the Multiple Event Statistic (MES), the total number of individual transits that have been fit, the Signal-to-Noise Ratio (SNR) of the iterative whitened transit model fit, the reduced Chi-Squared value (χ²/DoF), where DoF is the number of Degrees of Freedom in the fit, and the transit depth in parts per million (ppm), with the error on the transit depth shown in brackets.

This plot compares the primary transit and the model fit, to determine how any systematics in the data are affected by the whitening filter. It is not unusual to see an increase in flux in both the binned data and the transit model, immediately before and after the transit, due to the whitening filter. The transit model for a good planet candidate should fit the binned data, with no obvious trends observed in the residuals that would indicate an asymmetric transit. A good fit should have a reduced Chi-Squared near 1.0. Although the signal-to-noise should be somewhat similar to the MES, it will generally be higher than the MES due to fitting a fully detailed transit model. High MES and SNR values indicate a more significant detection of a transit-like signature.
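
For reference, the reduced Chi-Squared quoted above Plot E follows the standard definition, sketched below. The function name and arguments are hypothetical, and the sketch assumes independent Gaussian uncertainties on each point; DV evaluates the fit in the whitened domain, which this generic example does not reproduce.

```python
def reduced_chi_squared(observed, model, sigma, n_free_params):
    """Standard reduced chi-squared: sum(((obs - model) / sigma)^2) / DoF.

    DoF is taken as the number of data points minus the number of free
    parameters in the model.
    """
    dof = len(observed) - n_free_params
    if dof <= 0:
        raise ValueError("need more data points than free parameters")
    chi2 = sum(((o - m) / s) ** 2 for o, m, s in zip(observed, model, sigma))
    return chi2 / dof
```

A value near 1.0 means the residuals are about as large as the quoted uncertainties predict. Values well above 1.0 point to a poor model or underestimated errors, and values well below 1.0 suggest overestimated errors.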

F. Odd-Even Transit Plot

Plot F shows the phase-folded light curve (black dots) separately for the odd and even transits. Binned data are indicated by cyan-filled blue circles. On the left side, only the odd (i.e., the first, third, fifth, etc.) transit signatures are phase-folded and shown, while on the right side only the even (i.e., the second, fourth, sixth, etc.) transit signatures are shown. A transit model has been independently fit to the odd and even sets (in the whitened domain) to determine the transit depth of each set. The red line indicates the transit depth of all the data fitted together, with the red boxes indicating the uncertainty in that measurement. At the top of the plot the significance of the difference in depth for the odd and even numbered transits is shown, both in terms of a percentile and sigma.

This plot exposes any alternating difference in transit depth. If the TCE under investigation is a valid planetary candidate, there should be no statistically significant difference between the depths of the odd and even numbered transits. A significant difference could indicate that the object is an eclipsing binary with a secondary eclipse at phase 0.5 that is slightly less deep than the primary, and the TCE's period is half that of the binary. Note, however, that an eclipsing binary could have equal eclipse depths, and thus a lack of significant transit depth variations does not, by itself, confirm the planetary nature of a TCE. Additionally, for TCEs ≳90 days, where only one transit occurs per quarter, seasonal variations in crowding can induce an apparent odd-even difference when the object is truly planetary. Thus, caution is encouraged when applying this test to long-period TCEs.
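
A simple way to reason about the odd-even comparison is to express the depth difference in units of its combined uncertainty. The sketch below is an approximation, not DV's exact statistic: it assumes independent Gaussian errors on the two depths, and it assumes the percentage follows the same convention as the table entries in Section H (100% at 0.0σ, decreasing as the difference becomes more significant). The function name is hypothetical.

```python
import math


def odd_even_significance(depth_odd, err_odd, depth_even, err_even):
    """Return (sigma, percent) for the odd-even depth difference.

    sigma   : |depth_odd - depth_even| divided by the combined 1-sigma error.
    percent : two-sided Gaussian probability of a difference at least this
              large arising by chance, so 0 sigma maps to 100%.
    """
    sigma = abs(depth_odd - depth_even) / math.hypot(err_odd, err_even)
    percent = 100.0 * math.erfc(sigma / math.sqrt(2.0))
    return sigma, percent
```

For instance, invented depths of 100 ± 6 ppm and 130 ± 8 ppm differ by 3σ, which corresponds to roughly 0.27% under these assumptions.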

G. Centroid Offset Plot

Plot G shows the PRF centroid offset with the RA Offset in arcseconds on the x-axis, and the Dec Offset in arcseconds on the y-axis. For each quarter, two separate pixel-level images of the source are computed, one using the average of only the in-transit data, and the other using the average of data just outside of transit. The difference between the in-transit and out-of-transit images is used to produce a difference image. The difference image produces a star image at the location of the transit signal.

The Kepler Pixel Response Function (PRF) is the Kepler point spread function combined with expected spacecraft pointing jitter and other systematic effects (Bryson et al. 2010). The PRF is fit separately to the difference and out-of-transit images to compute centroid positions. The fit to the difference image gives the location of the transit source, and the fit to the out-of-transit image gives the location of the target star (assuming there are no other bright stars in the aperture). Subtracting the target star location from the transit source location gives the offset of the transit source from the target star. This is performed on a per-quarter basis, and the quarterly offsets are shown as green cross-hairs and labeled with the quarter number, where the length of the arms of each cross-hair represents the 1σ error in RA and Dec. Asterisks in the image show the location of known stars in the aperture, with the red asterisk being the target star. The coordinates of these stars are chosen so that the target star is at (0,0). A robust fit (i.e., an error-weighted fit that iteratively removes extreme outliers) is performed using all the quarterly centroid offsets to compute an average in-transit offset position, and is shown with 1σ error bars as a magenta cross. A dark blue circle is shown, always centered on the magenta cross, that represents the 3σ limit on the magnitude of the robustly-fit, quarter-averaged offset of the transit source from the target star. The numerical value of the quarterly-averaged offset of the transit source from the target star is given by OotOffset-rm in the DV analysis table (H).
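
The offset arithmetic described above can be illustrated as follows. This is a simplified sketch with hypothetical function names: it uses a small-angle approximation for the sky offsets and a plain inverse-variance weighted mean in place of DV's robust fit, so it does not perform the iterative outlier rejection.

```python
import math


def radec_offset_arcsec(ra_src_deg, dec_src_deg, ra_tgt_deg, dec_tgt_deg):
    """Offset of the transit source from the target star, in arcseconds.

    Small-angle approximation: the RA difference is scaled by cos(Dec) so
    that both components are true angular distances on the sky.
    """
    cos_dec = math.cos(math.radians(dec_tgt_deg))
    d_ra = (ra_src_deg - ra_tgt_deg) * cos_dec * 3600.0
    d_dec = (dec_src_deg - dec_tgt_deg) * 3600.0
    return d_ra, d_dec


def weighted_mean(values, errors):
    """Inverse-variance weighted mean and its 1-sigma uncertainty.

    A stand-in for the robust fit: no outliers are removed here.
    """
    weights = [1.0 / e ** 2 for e in errors]
    wsum = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / wsum
    return mean, 1.0 / math.sqrt(wsum)
```

Applying `weighted_mean` separately to the quarterly RA offsets and Dec offsets gives an average offset position analogous to the magenta cross, and the magnitude of that average is `math.hypot(mean_ra, mean_dec)`.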

This plot graphically indicates whether there is a significant centroid offset between the transit source and target star location during transits, and if an associated KIC star is likely to be the true source of the TCE. In general, a significant (i.e., >3σ) centroid offset is seen if the red asterisk lies outside the dark blue circle. In this case it is likely that the observed transit is not due to a transit on the target star. However, there are several ways in which this diagnostic can be misleading: 1) If the offset (distance of the center of the magenta cross-hair from the target star) is less than ∼0.1 arcsec, then the offset is likely due to systematic measurement error and the transit is likely to be on the target star regardless of the offset value in sigma. 2) If there are other stars in the aperture with brightness equal to or greater than the target star, then the offset computation can be very inaccurate. This situation can be detected by comparing OotOffset-rm with KicOffset-rm in the DV analysis table (H). When they differ by more than 2 arcsec and there are bright stars in the aperture, then the plot is likely invalid. In this case OotOffset-rm may be invalid and KicOffset-rm may be used to estimate the offset of the transit source from the target. Finally, these diagnostics are valid only if the TCE is due to a transit or eclipse on a star in the aperture. If the TCE results from a systematic error, such as a spacecraft pointing tweak, pixel sensitivity dropout, or other similar effect, then this method of measuring centroids is invalid.
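
The rules of thumb in the preceding paragraph can be collected into a small decision helper. This is a sketch only: the function name, argument names, return labels, and the order in which the checks are applied are all assumptions made for illustration, and the thresholds (0.1 arcsec, 3σ, 2 arcsec) are the approximate values quoted in the text.

```python
def assess_centroid_offset(oot_offset_arcsec, oot_offset_sigma,
                           kic_offset_arcsec, bright_neighbor_in_aperture):
    """Classify a centroid offset using the guidance in Section G.

    oot_offset_arcsec / oot_offset_sigma : OotOffset-rm value and significance.
    kic_offset_arcsec                    : KicOffset-rm value.
    bright_neighbor_in_aperture          : True if another star at least as
                                           bright as the target is in the
                                           aperture.
    """
    # Bright neighbors can corrupt the out-of-transit centroid.
    if bright_neighbor_in_aperture and abs(oot_offset_arcsec - kic_offset_arcsec) > 2.0:
        return "unreliable"      # consider KicOffset-rm instead
    # Offsets below ~0.1 arcsec are likely systematic measurement error.
    if oot_offset_arcsec < 0.1:
        return "on_target"
    if oot_offset_sigma > 3.0:
        return "offset"          # transit likely not on the target star
    return "no_significant_offset"
```

None of these labels apply if the TCE itself results from a systematic error, since the centroid measurement is then invalid regardless of the values.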

H. DV Analysis Table

Section H shows a table of fit parameters, derived parameters, and vetting statistics generated by the DV analysis. The left column contains best-fit parameters from a Mandel-Agol (2002) transit model to the whitened data, assuming the TCE is a transiting planet. The right column contains various diagnostic parameters, most of which are used to determine the location of the transit signal relative to the target star using a variety of methods, as well as the quality of the centroid measurements.

The parameters for the left column are:

  • Period: The orbital period of the planetary candidate in days. The measurement error is shown in brackets.
  • Epoch: The epoch (i.e., the central time of the first transit) shown in BKJD. The measurement error is shown in brackets.
  • Rp/R*: The ratio of the planetary radius to the stellar radius. The measurement error is shown in brackets.
  • a/R*: The ratio of the planet-star separation at time of transit to the stellar radius. The measurement error is shown in brackets.
  • b: The impact parameter. (A value of b = 0 represents a central transit and b = 1 represents a grazing transit where the center of the planet aligns with the limb of the star at the time of central transit.) The measurement error is shown in brackets.
  • Teq: The calculated equilibrium temperature of the planet's surface in Kelvin.
  • Rp: The calculated planetary radius in units of Earth radii.
  • a: The calculated semi-major axis of the system in au.
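
The calculated quantities in the left column follow from the fitted ratios and the stellar parameters printed at the top of the report. The sketch below shows one common way to derive them. The function names are hypothetical, the albedo of 0.3 and the assumption of full heat redistribution are illustrative choices not stated in this document, and treating a/R* as the semi-major axis ratio is only exact for a circular orbit.

```python
import math

# Nominal constants in meters.
R_SUN_M = 6.957e8
R_EARTH_M = 6.3781e6
AU_M = 1.495978707e11


def planet_radius_earth(rp_over_rstar, rstar_solar):
    """Rp in Earth radii from Rp/R* and the stellar radius in solar radii."""
    return rp_over_rstar * rstar_solar * R_SUN_M / R_EARTH_M


def semi_major_axis_au(a_over_rstar, rstar_solar):
    """a in au from a/R* and the stellar radius (circular orbit assumed)."""
    return a_over_rstar * rstar_solar * R_SUN_M / AU_M


def equilibrium_temperature(teff_k, a_over_rstar, albedo=0.3):
    """Teq in kelvin: Teff * (1 - A)^(1/4) * sqrt(R* / (2a)).

    Assumes heat is evenly redistributed over the planet; the default
    albedo is an illustrative assumption.
    """
    return teff_k * (1.0 - albedo) ** 0.25 * math.sqrt(1.0 / (2.0 * a_over_rstar))
```

As a sanity check, Sun-like inputs with a/R* of about 215 give a semi-major axis close to 1 au and an equilibrium temperature near 255 K.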

The parameters for the right column are:

  • Epoch-sig: A metric for how well the epochs computed separately for the odd-only and even-only transits agree with each other. 100% (0.0σ) indicates a perfect match, while lower percentages (higher σs) indicate more significant odd-even epoch differences. A significant value of Epoch-sig suggests that the TCE is an eclipsing binary with a slightly eccentric orbit (so that the secondary eclipse is slightly offset from phase 0.5), with the TCE period half of the binary's true orbital period. The measurement significance is shown in brackets.
  • ShortPeriod-sig: A comparison of the period of the current TCE to the next shortest period TCE in the system. A value of 100% (0.0σ) indicates no match at all between the two periods, with lower percentages (higher sigmas) indicating increasingly more significant agreements between the TCE periods. A significant value of ShortPeriod-sig may indicate that the system contains an eclipsing binary whose primary and secondary eclipse events have been detected as two different TCEs, thus having very similar periods but different epochs. If ShortPeriod-sig has a value of "NA" it means that there are no additional TCEs detected in the system with a shorter period than the current TCE under examination. The measurement significance is shown in brackets.
  • LongPeriod-sig: A comparison of the period of the current TCE to the next longest period TCE in the system. A value of 100% (0.0σ) indicates no match at all between the two periods, with lower percentages (higher sigmas) indicating increasingly more significant agreements between the TCE periods. A significant value of LongPeriod-sig may indicate that the system contains an eclipsing binary whose primary and secondary eclipse events have been detected as two different TCEs, thus having very similar periods but different epochs. If LongPeriod-sig has a value of "NA" it means that there are no additional TCEs detected in the system with a longer period than the current TCE under examination. The measurement significance is shown in brackets.
  • Bootstrap-pfa: The probability of a false alarm due to statistical fluctuations. When Bootstrap-pfa ≤ 1e-12 we believe we have a credible transit detection. In essence, the test works by searching the transit-removed data for signals with the same period and duration as the TCE, comparing the resulting MES values to the MES of the TCE.
  • Centroid-sig: A measure of whether there is a statistically significant centroid shift correlated with the transit as measured by flux-weighted centroids. A Centroid-sig value near 100% indicates that a flux-weighted centroid shift was not detected.
  • Centroid-so: The measured angular distance between the target star position and the location of the transiting source, determined from the in- and out-of-transit flux-weighted centroid shift. The measurement significance is shown in brackets.
  • OotOffset-rm: The measured angular distance between the quarterly-averaged out-of-transit source location and the quarterly averaged location of the transiting source, both determined via PRF fitting. The measurement significance is shown in brackets.
  • KicOffset-rm: The measured angular distance between the quarterly-averaged transit location determined via PRF fitting and the target star position listed in the KIC. The measurement significance is shown in brackets.
  • OotOffset-bf: The measured angular distance between the out-of-transit source location and the location of the transiting source, both determined via a joint multi-quarter PRF fit. The measurement significance is shown in brackets.
  • KicOffset-bf: The measured angular distance between the transit location determined via a joint multi-quarter PRF fit and the target star position listed in the KIC. The measurement significance is shown in brackets.
  • OotOffset-st: The number of quarters for which offsets of the transit source from the out-of-transit source location, both determined via PRF fitting, were successfully computed. The data is broken down into each season (S1/S2/S3/S4) with the total number shown in brackets. This is useful to determine if the centroid measurements are all in the same season.
  • KicOffset-st: The number of quarters for which offsets of the transit source location, determined via PRF fitting, from the target star position listed in the KIC were successfully computed. The data is broken down into each season (S1/S2/S3/S4) with the total number shown in brackets. This is useful to determine if the centroid measurements are all in the same season.
  • DiffImageQuality-fgm: A measure of the quality of the PRF fit to the difference image for each quarter, computed as the correlation between the fitted PRF and the difference image. When the correlation is > 0.7 the fit is declared to be "high-quality." PRF fits that are not high quality are not necessarily invalid and are used in the centroid offset plot, though examination of the pixel images in the full DV report to determine their validity is recommended. The values in brackets are the number of quarters with high-quality centroids / the number of quarters for which the centroids (and thus associated metrics) were successfully computed.
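
The image-quality criterion in the last entry can be illustrated with a pixel-by-pixel correlation. This is a minimal sketch that assumes a mean-subtracted (Pearson) correlation over all pixels; the document does not specify the exact form DV uses, and the function names are hypothetical.

```python
import math


def image_correlation(image_a, image_b):
    """Pearson correlation between two same-shaped 2-D pixel arrays."""
    a = [v for row in image_a for v in row]
    b = [v for row in image_b for v in row]
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def is_high_quality(fitted_prf, difference_image, threshold=0.7):
    """Apply the > 0.7 correlation criterion quoted in the text."""
    return image_correlation(fitted_prf, difference_image) > threshold
```

Counting the quarters for which `is_high_quality` returns True, out of those with a successful fit, gives a pair of numbers analogous to the bracketed values in the table.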

Last updated: 10 February 2021