Description of Kepler Data Validation One-Page Summary Reports


Quarters 1-17 DR25

Version: 2.0
Delivered by the Kepler Project on April 15, 2016


CONTENT

  1. Introduction
    A. Full Time-Series Flux Plot
    B. Phased Full-Orbit Flux Plot
    C. Secondary Eclipse Plot
    D. Phased Transit-Only Flux Plot
    E. Whitened, Phased Transit-Only Plot
    F. Odd-Even Transit Plot
    G. Centroid Offset Plot
    H. DV Analysis Table



Figure 1: Example of a one-page summary that corresponds to Q1–Q17 DR25 TCE 5 of 5 in KIC 008120608. The event is also known as Kepler-186f and KOI 571.05. The large, red letters identify each part of the one-page summary described below.

1. Introduction

Stars that have been identified by the Transiting Planet Search (TPS) module of the Kepler pipeline (Jenkins et al. 2010) as having at least one Threshold Crossing Event (TCE) — a periodic sequence of flux decrements that may be consistent with a transiting exoplanet — are put through a process called Data Validation (DV) (Wu et al. 2010). In DV, diagnostic parameters are computed and plotted for each TCE to help determine if it is an instrumental artifact, a blended binary or other astrophysical false positive, or a true planetary candidate. Although very comprehensive multi-page reports are generated and archived for each Kepler star with at least one TCE, these simple one-page summaries provide much of the critical information for a quick assessment of candidacy. This document describes these one-page summaries, with an example shown in Figure 1. Large, red letters have been added to the figure for guidance throughout the rest of this document.

At the very top of each report is a line of text that contains the Kepler Input Catalog (KIC) number, the candidate number, and its orbital period. An additional line of text may appear below if the TCE ephemeris matches the ephemeris of a Kepler Object of Interest (KOI) that was known at the time of the Data Validation run. This line will contain the matching KOI number (e.g., KOI 571.05), Kepler name (e.g., Kepler-186f) if it matches a confirmed planet, and correlation coefficient, which is required to be 0.75 or greater. Note that this TCE-KOI matching may differ from that ultimately employed in the KOI catalog. Immediately below this is another line of text that contains the Kepler magnitude (Kp), radius (R*), effective temperature (Teff), surface gravity (log g), and metallicity ([Fe/H]) of the host star. The remainder of the one-page summary is divided into sections designated by letters A–H. Each is explained in the following sections of this document, along with an explanation of how each plot and parameter can be used to help disposition a TCE. The software revision URL that appears on the bottom of the page identifies the version of the pipeline code used for the DV run. The date of summary generation is also provided.

A. Full Time-Series Flux Plot

Plot A shows the full flux time-series for the TCE with relative flux on the y-axis and time in Barycentric Kepler Julian Date (BKJD) on the x-axis (BJD = BKJD + 2,454,833.0). The Presearch Data Conditioning (PDC) (Stumpe et al. 2012; Smith et al. 2012) light curve has been detrended by being first run through a harmonic filter, and then a median filter, to remove any long-duration systematics. The start of each new quarter is marked with a vertical dashed red line and labeled with the quarter number (e.g., Q2 for Quarter 2). The module and output number of the CCD that the star falls on each quarter are indicated in brackets next to the quarter number (e.g., [19.1] for output 1 of module 19). Along the bottom of the plot are triangles that mark the expected position of the transits for this particular TCE, corresponding to the best period and epoch identified. These triangles are colored red to identify transits that are coincident with rolling band image artifacts with a similar transit duration (at severity levels > 0), and are colored blue when this is not the case (or when rolling band severity diagnostics are unavailable).

This plot helps identify any potential inter-quarter systematics that may have triggered the TCE. Gaps in the data at quarter boundaries, and at monthly intervals within each quarter, are expected because the spacecraft is re-oriented to download data. If the TCE is a planet candidate, then a transit should occur at every triangle where data exists. TCEs whose transits occur primarily near quarter boundaries are more suspect because the strongest systematics are at the start of quarters. Additional (unrelated) transits may be visible, especially if DV has identified more than one TCE for the system. The total number of TCEs found for the KIC target is shown at the very top of the DV summary associated with the candidate number.
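The time convention and the transit markers of Plot A can be reproduced with a few lines of Python. This is a minimal sketch, not the pipeline code: the function names are hypothetical, and the epoch, period, and time range in the usage note are illustrative values only.

```python
import math

BKJD_OFFSET = 2454833.0  # BJD = BKJD + 2,454,833.0, as stated for Plot A


def bkjd_to_bjd(bkjd):
    """Convert Barycentric Kepler Julian Date to Barycentric Julian Date."""
    return bkjd + BKJD_OFFSET


def expected_transit_times(epoch_bkjd, period_days, t_start, t_end):
    """Return predicted mid-transit times (BKJD) within [t_start, t_end].

    Hypothetical helper: it marks where the triangles would fall,
    given only the TCE epoch and period.
    """
    # Index of the first transit at or after t_start.
    n = math.ceil((t_start - epoch_bkjd) / period_days)
    times = []
    while epoch_bkjd + n * period_days <= t_end:
        times.append(epoch_bkjd + n * period_days)
        n += 1
    return times
```

For example, an epoch of 100.0 BKJD and a period of 10.0 days give predicted transits at 100.0, 110.0, and 120.0 BKJD within the range 95.0 to 125.0. Whether data exist at each predicted time must still be checked against the light curve.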

B. Phased Full-Orbit Flux Plot

Plot B shows the phase-folded light curve for the TCE, folded according to its best-fit period with the phase in days plotted on the x-axis. The epoch of the primary transit is indicated by an upward triangle on the bottom of the plot at phase 0.0. The color of this triangle is dependent on the candidate number (1 = red, 2 = blue, 3 = green, 4 = black, 5 = magenta, 6 = gold, 7 = red, 8 = blue, 9 = green, and 10 = black). The location of the strongest secondary eclipse candidate is indicated by a downward triangle of the same color (see Section C). In the case of KIC 008120608-05 / Kepler-186f / KOI 571.05 shown above, the magenta downward facing triangle at -30 days shows the position of the strongest secondary eclipse candidate. The phased locations of transits from other TCEs detected on this star are indicated by upward triangles of different colors corresponding to their candidate numbers. The small cyan-filled blue circles are phase-binned averages of the data. A transit model fit is performed in the whitened domain (see Section E), and the resulting (de-whitened) model is shown on this plot via the solid red line.

This plot helps assess whether the phased data can be adequately explained by a physical transit model. If the TCE is a viable planet candidate, the transit model should accurately fit the phased transit, although since the transit model is actually fit in the whitened domain in Plot E, discrepancies may occur, especially in the presence of instrumental artifacts or stellar variability. In plot B, the out-of-transit baseline should generally be flat. A secondary eclipse typically should not be visible, except in cases of hot-Jupiter planets with short orbital periods. If an eclipse is visible, this suggests the TCE may be an eclipsing binary false positive. It is not unusual to observe additional transits scattered about in this light curve, especially if DV has identified more than one TCE for the system. Generally, these transits should not be in-phase at the period of the current TCE under examination.
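The phase folding and phase binning behind Plot B can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the phase is wrapped to half a period on either side of the transit (the plotted range in the DV summary may differ), and the bins are simple unweighted means.

```python
import numpy as np


def phase_fold(time, period, epoch):
    """Return phase in days, wrapped to [-period/2, period/2), transit at 0."""
    time = np.asarray(time, dtype=float)
    return (time - epoch + 0.5 * period) % period - 0.5 * period


def bin_phased(phase, flux, period, n_bins):
    """Average the flux in n_bins equal-width phase bins.

    Returns the bin centers and bin means; empty bins are NaN.
    """
    phase = np.asarray(phase, dtype=float)
    flux = np.asarray(flux, dtype=float)
    edges = np.linspace(-0.5 * period, 0.5 * period, n_bins + 1)
    # Map each point to a bin index in 0 .. n_bins - 1.
    index = np.clip(np.digitize(phase, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(index, weights=flux, minlength=n_bins)
    counts = np.bincount(index, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, means
```

Folding the times of a second TCE on the same star with this TCE's period and epoch gives the phases at which its transits would appear in Plot B.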

C. Secondary Eclipse Plot

Plot C shows the strongest secondary eclipse candidate identified by the Weak Secondary test for the TCE under investigation. In the Weak Secondary test, the primary transit signal is removed, and the whitening filter is re-applied to the light curve. The TPS algorithm is then run on the resulting data with the trial pulse duration of the primary TCE. Finally, the resulting single event detection time-series is folded at the same period as the primary TCE. This produces, among many other useful quantities, the Multiple Event Statistic (MES) value and phase of the strongest transit-like signal at the TCE's period, aside from the primary TCE itself.

The phased data is shown in Plot C with the time axis centered on the secondary eclipse candidate. An upward facing red arrow is always shown at phase 0.0. Black dots represent the raw data and cyan-filled blue circles represent phase-binned averages of the data. Above the plot are values for the depth of the secondary eclipse candidate in parts per million (ppm), with the error on the transit depth shown in brackets, the phase of the secondary eclipse candidate in days, and the MES of the secondary eclipse candidate. If the MES is greater than 7.1 (the formal mission detection threshold), then it is colored red to indicate the secondary eclipse candidate is statistically significant.

This plot helps assess whether the secondary eclipse candidate is real. If so, depending on the strength of the secondary eclipse, the period, and other properties, this may indicate the candidate is an eclipsing binary and not a viable planet candidate. Typically, secondary eclipses of validated planets are only observed in Kepler data for hot Jupiters. This plot may also help to highlight transit-like artifacts in the data, which may cast doubt on the uniqueness of the primary TCE and its validity as a planet candidate.

D. Phased Transit-Only Flux Plot

Plot D shows the phase-folded light curve for the TCE, with the range on the x-axis reduced so that only the primary transit is visible. The x-axis unit is hours, and the cyan-filled blue circles are phase-binned averages of the original data. An upward facing red arrow is always shown at phase 0.0. As explained in Section B, a transit model fit is performed in the whitened domain (see Section E), and the resulting (de-whitened) model is shown on this plot via the solid red line.

This plot allows a detailed assessment of the primary transit and theoretical model fit. If the TCE is a viable planet candidate, the transit model should accurately fit the phased transit, although since the transit model is actually fit in the whitened domain in Plot E, discrepancies may occur, especially in the presence of instrumental artifacts or stellar variability. The primary transit should also be fairly symmetric around Phase 0.0. Asymmetry in the light curve is an indication that the TCE could be a result of instrumental systematics or transit-like astrophysical phenomena.

E. Whitened, Phased Transit-Only Plot

Plot E shows the phase-folded, binned light curve for the TCE via the blue points with a whitening filter (Jenkins et al. 2010) applied to remove any correlated noise (e.g., stellar variability, remaining systematics). A best-fit transit model, which has also been passed through the whitening filter, is shown via a solid red line with red points. Residuals of the best-fit to the binned data are shown by green dots (offset in flux for clarity), while the magenta dots are data centered around phase 0.5 (also offset in flux for clarity). The secondary eclipse may occur elsewhere for non-circular orbits (see Section C). Above the plot are values for the MES, the total number of transits that have been fit, the Signal-to-Noise Ratio (SNR) of the iterative whitened transit model fit, the reduced Chi-Squared value (χ²/DoF), where DoF is the number of Degrees of Freedom in the fit, and the transit depth in parts per million (ppm), with the error on the transit depth shown in brackets.

This plot compares the primary transit and the model fit, to determine the goodness of fit and how any systematics in the data are affected by the whitening filter. It is not unusual to see an increase in flux in both the binned data and the transit model, immediately before and after the transit, due to the whitening filter. The transit model for a good planet candidate should fit the binned data, with no obvious trends observed in the residuals that would indicate an asymmetric transit. A good fit should have a reduced Chi-Squared near 1.0. Although the signal-to-noise should be somewhat similar to the MES, it will generally be larger than the MES due to fitting a fully detailed transit model. High MES and SNR values indicate a more significant detection of a transit-like signature.
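The reduced Chi-Squared quoted above Plot E follows the usual definition, which can be sketched as below. The function name and the argument for the number of free parameters are assumptions for illustration; the number of parameters that DV actually fits is not stated here.

```python
import numpy as np


def reduced_chi_squared(data, model, sigma, n_free_params):
    """Return chi-squared divided by the degrees of freedom (DoF).

    DoF is taken as the number of data points minus the number of
    free model parameters (an assumption of this sketch).
    """
    data, model, sigma = (
        np.asarray(x, dtype=float) for x in (data, model, sigma)
    )
    chi2 = np.sum(((data - model) / sigma) ** 2)
    dof = data.size - n_free_params
    if dof <= 0:
        raise ValueError("need more data points than free parameters")
    return chi2 / dof
```

Values well above 1.0 suggest that the model does not describe the data or that the errors are underestimated; values well below 1.0 suggest overestimated errors.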

F. Odd-Even Transit Plot

Plot F shows the phase-folded light curve (black dots) separately for the odd- and even-numbered transits. Binned data are indicated by cyan-filled blue circles. On the left side, only the odd (i.e., the first, third, fifth, etc.) transit signatures are phase-folded and shown, while on the right side only the even (i.e., the second, fourth, sixth, etc.) transit signatures are shown. A transit model has been independently fit to the odd and even sets (in the whitened domain) to determine the transit depth of each set. The red solid line indicates the transit depth of all the data fitted together, with the red dashed lines indicating the uncertainty in that measurement. At the top of the plot the significance of the difference in depth for the odd and even numbered transits is shown, both in terms of a percentile and sigma.

This plot exposes any alternating difference in transit depth. If the TCE under investigation is a valid planetary candidate, there should be no statistically significant difference between the depths of the odd and even numbered transits. A significant difference could indicate that the object is an eclipsing binary with a secondary eclipse at phase 0.5 that is slightly less deep than the primary, and the TCE's period is half that of the binary. Note, however, that an eclipsing binary could have equal eclipse depths, and thus a lack of significant transit depth variations does not, by itself, confirm the planetary nature of a TCE. Additionally, for TCEs with periods ≳90 days, where only one transit occurs per quarter, seasonal variations in crowding can induce an apparent odd-even difference for an object that is truly planetary. Thus, caution is encouraged when applying this test to long-period TCEs.
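A simple way to express an odd-even depth difference in units of sigma is sketched below. This is not necessarily the statistic that DV reports: it assumes independent Gaussian errors on the two fitted depths, and the function name and the depths in the usage note are hypothetical.

```python
import math


def odd_even_depth_sigma(depth_odd, err_odd, depth_even, err_even):
    """Return |depth_odd - depth_even| in units of the combined 1-sigma error.

    Assumes the two depth errors are independent and Gaussian.
    """
    return abs(depth_odd - depth_even) / math.hypot(err_odd, err_even)
```

For illustrative depths of 500 ± 30 ppm (odd) and 380 ± 40 ppm (even), the difference of 120 ppm against a combined error of 50 ppm gives 2.4 sigma.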

G. Centroid Offset Plot

Plot G shows the PRF centroid offset with the RA Offset in arcseconds on the x-axis, and the Dec Offset in arcseconds on the y-axis. For each quarter, two separate pixel-level images of the source are computed, one using the average of only the in-transit data, and the other using the average of data just outside of transit. The difference between the in- and out-of-transit images forms a difference image, which in principle shows a star-like image at the location of the transit source.

The Kepler Pixel Response Function (PRF) is the Kepler point spread function combined with expected spacecraft pointing jitter and other systematic effects (Bryson et al. 2010). The PRF is fit separately to the difference and out-of-transit images to compute centroid positions. The fit to the difference image gives the location of the transit source, and the fit to the out-of-transit image gives the location of the target star (assuming there are no other bright stars in the aperture). Subtracting the target star location from the transit source location gives the offset of the transit source from the target star. This is performed on a per-quarter basis, and the quarterly offsets are shown as green cross-hairs and labeled with the quarter number, where the length of the arms of each cross-hair represents the 1σ error in RA and Dec. Asterisks in the image show the location of known stars in the aperture, with the red asterisk being the target star. The coordinates of the plot are chosen so that the target star is at (0,0). A robust fit (i.e., an error-weighted fit that iteratively removes extreme outliers) is performed using all the quarterly centroid offsets to compute an average in-transit offset position, and is shown with 1σ error bars as a magenta cross. A dark blue circle is shown, always centered on the magenta cross, that represents the 3σ limit on the magnitude of the robustly-fit, quarter-averaged offset of the transit source from the target star. The numerical value of the quarterly-averaged offset of the transit source from the target star is given by OotOffset-rm in the DV analysis table (H).

This plot graphically indicates whether there is a significant centroid offset between the transit source and target star locations during transits, and if an associated KIC star is likely to be the true source of the TCE. In general, a significant (i.e., >3σ) centroid offset is seen if the red asterisk lies outside the dark blue circle. In this case it is likely that the observed transit is not on the target star. However, there are several ways in which this diagnostic can be misleading. 1) If the offset (distance of the center of the magenta cross-hair from the target star) is less than ∼0.1 arcsec, then the offset is likely due to systematic measurement error and the transit is likely to be on the target star regardless of the offset value in sigma. 2) If there are other stars in the aperture with brightness equal to or greater than the target star, then the offset computation can be very inaccurate. This situation can be detected by comparing OotOffset-rm with KicOffset-rm in the DV analysis table (H). When they differ by more than 2 arcsec and there are bright stars in the aperture, then OotOffset-rm is likely invalid. In this case KicOffset-rm may be used to estimate the offset of the transit source from the target, though this has its own caveats, namely that KicOffset-rm may not be accurate due to proper motion of the target, or an error in catalog position. Finally, these diagnostics are valid only if the TCE is due to a transit or eclipse on a star in the aperture. If the TCE results from a systematic error, such as a spacecraft pointing tweak, pixel sensitivity dropout, or other similar effect, then this method of measuring centroids is invalid.
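The averaging of quarterly offsets and the 3σ comparison can be sketched as follows. This is a simplified illustration, not the DV algorithm: it uses a plain inverse-variance weighted mean and omits the iterative outlier rejection of the robust fit. The function names are hypothetical; the 0.1 arcsec floor and the 3σ threshold are taken from the text above.

```python
import math
import numpy as np


def weighted_mean_offset(d_ra, d_dec, sig_ra, sig_dec):
    """Inverse-variance weighted mean of quarterly offsets (arcsec).

    Unlike the robust fit used in DV, no outliers are rejected here.
    """
    d_ra, d_dec, sig_ra, sig_dec = (
        np.asarray(x, dtype=float) for x in (d_ra, d_dec, sig_ra, sig_dec)
    )
    w_ra, w_dec = 1.0 / sig_ra**2, 1.0 / sig_dec**2
    mean_ra = np.sum(w_ra * d_ra) / np.sum(w_ra)
    mean_dec = np.sum(w_dec * d_dec) / np.sum(w_dec)
    err_ra = math.sqrt(1.0 / np.sum(w_ra))
    err_dec = math.sqrt(1.0 / np.sum(w_dec))
    return mean_ra, mean_dec, err_ra, err_dec


def offset_significance(mean_ra, mean_dec, err_ra, err_dec):
    """Return the offset magnitude (arcsec) and its significance in sigma."""
    offset = math.hypot(mean_ra, mean_dec)
    if offset == 0.0:
        return 0.0, 0.0
    # First-order error propagation onto the offset magnitude.
    err = math.hypot(mean_ra * err_ra, mean_dec * err_dec) / offset
    return offset, offset / err


def interpret_offset(offset_arcsec, n_sigma, floor_arcsec=0.1, threshold=3.0):
    """Apply the sub-0.1 arcsec caveat first, then the 3-sigma test."""
    if offset_arcsec < floor_arcsec:
        return "consistent with target (below systematic floor)"
    if n_sigma > threshold:
        return "significant offset"
    return "no significant offset"
```

The sketch does not cover the second caveat: when bright neighbors are present in the aperture, OotOffset-rm and KicOffset-rm should be compared before either value is trusted.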

H. DV Analysis Table

Section H shows a table of fit parameters, derived parameters, and vetting statistics generated by the DV analysis. The left column contains best-fit parameters from a Mandel-Agol (2002) transit model in whitened domain, assuming the TCE is a transiting planet. The right column contains various diagnostic parameters, most of which are used to determine the location of the transit signal relative to the target star using a variety of methods, as well as the quality of the centroid measurements.

The parameters for the left column are:

  • Period: The orbital period of the planetary candidate in days. The measurement error is shown in brackets.
  • Epoch: The epoch (i.e., the central time of the first transit) shown in BKJD. The measurement error is shown in brackets.
  • Rp/R*: The ratio of the planetary radius to the stellar radius. The measurement error is shown in brackets.
  • a/R*: The ratio of the planet-star separation at time of transit to the stellar radius. The measurement error is shown in brackets.
  • b: The impact parameter, with the measurement error shown in brackets. A value of b = 0 represents a central transit and b = 1 represents a grazing transit where the center of the planet aligns with the limb of the star at the time of central transit. Note that the DV fit does not allow for models with b > 1.0.
  • Seff: The calculated insolation flux relative to the Solar flux received at the top of Earth's atmosphere. The measurement error is shown in brackets.
  • Teq: The calculated equilibrium temperature of the planet's surface in Kelvin. The measurement error is shown in brackets.
  • Rp: The calculated planetary radius in units of Earth radii. The measurement error is shown in brackets.
  • a: The calculated semi-major axis of the system in au. The measurement error is shown in brackets.
  • Ag: The calculated geometric albedo based on the depth of the most significant secondary event identified by the Weak Secondary test (see Section C). The measurement error is shown in the first set of brackets. The difference in standard deviations between the geometric albedo and 1.0 is shown in the second set of brackets. The geometric albedo is displayed in red if the secondary multiple event statistic exceeds the transiting planet detection threshold and the geometric albedo is significantly greater than 1.0.
  • Teffp: The calculated planet effective temperature in Kelvin based on the depth of the most significant secondary event at the period and trial pulse duration of the TCE. The measurement error is shown in the first set of brackets. The difference in standard deviations between the planet effective temperature and equilibrium temperature is shown in the second set of brackets. The planet effective temperature is displayed in red if the secondary multiple event statistic exceeds the transiting planet detection threshold and the planet effective temperature is significantly greater than the equilibrium temperature.
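Several of the calculated quantities in the left column follow from the fitted and stellar parameters through standard relations, sketched below. These are textbook formulas and are not confirmed here to be the exact expressions used by DV. The function names, the physical constants, and the default albedo of 0.3 are assumptions of this sketch.

```python
import math

R_SUN_KM = 695700.0    # nominal solar radius (assumed constant)
R_EARTH_KM = 6378.1    # nominal equatorial Earth radius (assumed constant)
AU_KM = 149597870.7    # astronomical unit
T_SUN_K = 5772.0       # nominal solar effective temperature (assumed)


def planet_radius_earth(rp_over_rstar, rstar_solar):
    """Rp in Earth radii from Rp/R* and R* in solar radii."""
    return rp_over_rstar * rstar_solar * R_SUN_KM / R_EARTH_KM


def insolation_earth_units(rstar_solar, teff_k, a_au):
    """Seff relative to the flux at the top of Earth's atmosphere."""
    return rstar_solar**2 * (teff_k / T_SUN_K) ** 4 / a_au**2


def equilibrium_temperature(rstar_solar, teff_k, a_au, albedo=0.3):
    """Teq in Kelvin, assuming full heat redistribution.

    The albedo default is an assumption, not a value taken from the text.
    """
    rstar_over_a = rstar_solar * R_SUN_KM / (a_au * AU_KM)
    return teff_k * math.sqrt(rstar_over_a / 2.0) * (1.0 - albedo) ** 0.25
```

As a sanity check, a planet at 1 au around a star with solar radius and temperature receives Seff = 1.0 and has an equilibrium temperature of roughly 255 K under these assumptions.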

The parameters for the right column are:

  • ShortPeriod-sig: A comparison of the period of the current TCE to the next shortest period TCE in the system. A value of 0% [0.0σ] indicates a perfect match between the two periods (i.e., no difference). Larger percentages (larger sigmas) indicate an increasing difference between the TCE periods. The text will appear in red if the two periods are considered to be significantly related. A significant value of ShortPeriod-sig may indicate that the system contains an eclipsing binary whose primary and secondary eclipse events have been detected as two different TCEs, thus having very similar periods but different epochs. If ShortPeriod-sig has a value of "NA" it means that there are no TCEs detected in the system with shorter periods than the current TCE under examination.
  • LongPeriod-sig: A comparison of the period of the current TCE to the next longest period TCE in the system. A value of 0% [0.0σ] indicates a perfect match between the two periods (i.e., no difference). Larger percentages (larger sigmas) indicate an increasing difference between the TCE periods. The text will appear in red if the two periods are considered to be significantly related. A significant value of LongPeriod-sig may indicate that the system contains an eclipsing binary whose primary and secondary eclipse events have been detected as two different TCEs, thus having very similar periods but different epochs. If LongPeriod-sig has a value of "NA" it means that there are no TCEs detected in the system with longer periods than the current TCE under examination.
  • ModelChiSquare2-sig: The significance of the chi-square2 discriminator calculated using the transit model (Seader et al. 2015). If this value is close to 100% then it indicates the shape of the transit events is well described by a transit model. If this value is close to 0% then the transit events are not well described by a transit model, and the event is likely a false positive. A value of N/A indicates ModelChiSquare2-sig was not calculated for this TCE.
  • ModelChiSquareGof-sig: The significance of the chi-square goodness of fit discriminator calculated using the transit model (Seader et al. 2015). A value of 100% indicates that the model fits the flux time series data, while smaller percentages indicate lower quality fits. A value of N/A indicates ModelChiSquareGof-sig was not calculated for this TCE.
  • Bootstrap-pfa: The probability of a false alarm due to statistical fluctuations. When Bootstrap-pfa ≤ 1e-12 it is likely that the transit detection is credible. In essence the test works by searching the transit-removed data for signals with the same period and duration as the TCE — if signals of comparable strength are found the validity of the original TCE is called into question. Technically speaking, the bootstrap-pfa compares the distribution of MES values that result when searching the transit-removed data at the same period and duration as the original TCE, to the MES of the original TCE. See appendix A of Seader et al. 2015 for a detailed explanation of the bootstrap test.
  • RollingBand-fgt: The fraction of good transits (fgt) with respect to coincidence with rolling band image artifacts at durations near the transit duration associated with the TCE. Good transits are those that occur on cadences for which the rolling band severity level = 0, i.e., no rolling bands are detected. The values in brackets are the number of transits at severity level = 0 / the total number of transits on cadences with rolling band diagnostics. Rolling band diagnostics are not available for certain quarters, so their transits are not represented in the denominator.
  • GhostDiagnostic-chr: The ratio of the core to halo aperture correlation statistics for the optical ghost diagnostic test. This test calculates the correlation of the TCE signal with two separate light curves — one created using the average of the pixels inside the target's optimal aperture minus the average of the pixels in an annulus surrounding the target aperture (core aperture correlation statistic), and the other using the average of the pixels in the annulus surrounding the target aperture (halo aperture correlation statistic). GhostDiagnostic-chr is displayed in red if the core aperture correlation statistic is less than the halo aperture correlation statistic, which indicates that the source of the transit signature is not likely to be contained in the optimal aperture associated with the target star.
  • Centroid-sig: A measure of whether there is a statistically significant (sig) centroid shift correlated with the transit signature as measured by flux-weighted centroids. A Centroid-sig value near 0% indicates that a flux-weighted centroid shift is detected. However, this method is unreliable if there is any significant amount of light from nearby stars in the target's aperture.
  • Centroid-so: The measured angular distance between the target star position and the location of the transiting source, determined from the in- and out-of-transit flux-weighted centroid shift. This helps determine if there is a significant offset (so) between the target and the source of the transit signal. The measurement significance is shown in brackets.
  • OotOffset-rm: The measured angular distance between the quarterly-averaged out-of-transit source location and the quarterly-averaged in-transit source location, both determined via PRF fitting, using a robust mean (rm). The measurement significance is shown in brackets.
  • KicOffset-rm: The measured angular distance between the quarterly-averaged transit location determined via PRF fitting and the target star position listed in the KIC, using a robust mean (rm). The measurement significance is shown in brackets.
  • OotOffset-st: The number of quarters for which offsets of the transit source from the out-of-transit (OOT) source location were successfully computed, as determined from PRF fitting. The data is broken down into each season (S1/S2/S3/S4) with the season total (st) number shown in brackets. This is useful to determine if the centroid measurements are all in the same season.
  • KicOffset-st: The number of quarters for which offsets of the transit source location from the target star position listed in the KIC were successfully computed, using PRF fitting. The data is broken down into each season (S1/S2/S3/S4) with the season total (st) number shown in brackets. This is useful to determine if the existing centroid measurements are from multiple seasons, or if they are all from the same season, which may indicate a bias.
  • DiffImageQuality-fgm: A measure of the quality of the PRF fit to the difference images. The correlation between the fitted PRF and the difference image is computed for each quarter; when the correlation is > 0.7 the fit is declared to be "high-quality". PRF fits that are not high quality are not necessarily invalid and are used in the centroid offset plot (see Section G), though examination of the pixel images in the full DV report is recommended to determine their validity. The reported value is the fraction of quarters with successful PRF fits that were deemed "high-quality", also known as the fraction of good measurements (fgm). The three numbers in brackets represent the number of quarters with high-quality centroids, the number of quarters with a successful PRF fit, and the total number of quarters with difference images.
  • DiffImageOverlap-fno: The fraction of difference images that are generated from non-overlapping transits only. Transits that overlap transits associated with other TCEs on the same star are excluded from the computation of a difference image, unless doing so would leave no clean transits in the given quarter. The values in brackets are the number of quarters with difference images based on non-overlapping transits / the total number of quarters with difference images. Difference images based on overlapping transits may be very difficult to interpret; caution is advised.
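The fraction-type diagnostics in the right column (RollingBand-fgt and DiffImageQuality-fgm) reduce to simple counting, sketched below. The function names and the use of None to mark unavailable diagnostics or failed PRF fits are conventions of this sketch only; the 0.7 correlation threshold is the one quoted above.

```python
def fraction_good_transits(severity_levels):
    """RollingBand-fgt-style fraction.

    severity_levels holds one entry per transit: the rolling band
    severity level, or None where no diagnostics are available.
    Returns (fraction, n_good, n_with_diagnostics).
    """
    with_diag = [s for s in severity_levels if s is not None]
    if not with_diag:
        return None, 0, 0
    n_good = sum(1 for s in with_diag if s == 0)
    return n_good / len(with_diag), n_good, len(with_diag)


def fraction_good_measurements(correlations, threshold=0.7):
    """DiffImageQuality-fgm-style fraction.

    correlations holds one entry per quarter with a difference image:
    the PRF-to-image correlation, or None where the PRF fit failed.
    Returns (fraction, n_high_quality, n_fitted, n_quarters).
    """
    fitted = [c for c in correlations if c is not None]
    if not fitted:
        return None, 0, 0, len(correlations)
    n_good = sum(1 for c in fitted if c > threshold)
    return n_good / len(fitted), n_good, len(fitted), len(correlations)
```

In both cases the denominator counts only the entries for which the diagnostic could be evaluated, which matches the bracketed counts described in the corresponding table entries.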