Preprocessing and cleaning techniques for astronomical light curves. Use when preparing light curve data for period analysis, including outlier removal, trend removal, flattening, and handling data quality flags. Works with lightkurve and general time series data.
Preprocessing is essential before period analysis. Raw light curves often contain outliers, long-term trends, and instrumental effects that can mask real periodic signals or create false ones.
Common preprocessing steps:
import lightkurve as lk
# `lc` is assumed to be an existing lightkurve LightCurve object
# Remove outliers using sigma clipping
lc_clean, mask = lc.remove_outliers(sigma=3, return_mask=True)
outliers = lc[mask]  # Points that were removed (mask is True for outliers)
# Common sigma values:
# sigma=3: Standard (removes ~0.3% of points if the noise is Gaussian)
# sigma=5: Conservative (removes fewer points)
# sigma=2: Aggressive (removes more points)
import numpy as np
# time, flux, error: NumPy arrays of equal length
# Calculate median and standard deviation (NaN-safe variants)
median = np.nanmedian(flux)
std = np.nanstd(flux)
# Keep points within 3 sigma of the median; NaN fluxes are dropped too
good = np.abs(flux - median) < 3 * std
time_clean = time[good]
flux_clean = flux[good]
error_clean = error[good]
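The plain standard deviation is itself inflated by the outliers you are trying to remove, so a 3-sigma cut can miss them. A minimal sketch of a more robust variant is shown here, assuming the scatter is estimated from the median absolute deviation (MAD) scaled by 1.4826 so that it matches the standard deviation for Gaussian noise. The function name `robust_sigma_clip` and the synthetic data are illustrative only and not part of lightkurve.

```python
import numpy as np

def robust_sigma_clip(time, flux, error, n_sigma=3.0):
    """Keep points within n_sigma robust standard deviations of the median."""
    median = np.nanmedian(flux)
    mad = np.nanmedian(np.abs(flux - median))
    robust_std = 1.4826 * mad  # MAD -> sigma conversion for Gaussian noise
    # Comparisons with NaN are False, so NaN fluxes are dropped as well
    good = np.abs(flux - median) < n_sigma * robust_std
    return time[good], flux[good], error[good], good

# Synthetic example: flat light curve with three injected outliers
rng = np.random.default_rng(42)
time = np.arange(500, dtype=float)
flux = 1.0 + rng.normal(0.0, 0.001, time.size)
error = np.full(time.size, 0.001)
flux[[50, 200, 350]] += 0.05

time_clean, flux_clean, error_clean, good = robust_sigma_clip(time, flux, error)
print(f"kept {good.sum()} of {good.size} points")
```

The returned boolean mask can be reused to inspect the rejected points, in the same spirit as `return_mask=True` above.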
# Flatten to remove low-frequency variability
# window_length: number of cadences to use for smoothing (a positive odd integer)
lc_flat = lc_clean.flatten(window_length=501)
# Common window lengths:
# 100-200: Remove short-term trends
# 300-500: Remove medium-term trends (typical for TESS)
# 500-1000: Remove long-term trends
The flatten() method uses a Savitzky-Golay filter to remove low-frequency trends. Transit signals are preserved only when the window is substantially longer than the transit duration.
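Because window_length is counted in cadences, the same number covers very different time spans depending on the sampling. The sketch below converts a window expressed in days into an odd number of cadences. The helper `window_length_from_days` is hypothetical (not a lightkurve function), and it assumes the time array is in days and roughly evenly sampled.

```python
import numpy as np

def window_length_from_days(time, window_days):
    """Convert a smoothing window in days to an odd number of cadences."""
    cadence = np.nanmedian(np.diff(time))  # typical spacing between points, in days
    n_cadences = int(round(window_days / cadence))
    if n_cadences % 2 == 0:
        n_cadences += 1  # make the window odd so it is centered on a point
    return max(n_cadences, 3)

# Synthetic example: 27 days sampled every 2 minutes
time = np.arange(0.0, 27.0, 2.0 / (60.0 * 24.0))
window = window_length_from_days(time, window_days=0.5)
print(window)
```

The result can then be passed as `window_length` to flatten(); a window several times longer than the expected transit duration is a common starting point.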
For removing high-frequency stellar variability (rotation, pulsation):
def sine_fitting(lc):
    """Remove dominant periodic signal by fitting sine wave."""
    pg = lc.to_periodogram()
    model = pg.model(time=lc.time, frequency=pg.frequency_at_max_power)
    lc_new = lc.copy()
    lc_new.flux = lc_new.flux / model.flux
    return lc_new, model

# Iterate multiple times to remove multiple periodic components
lc_processed = lc_clean.copy()
for i in range(50):  # Number of iterations
    lc_processed, model = sine_fitting(lc_processed)
Warning: This removes periodic signals, so use carefully if you're searching for periodic transits.
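To make the idea concrete without lightkurve, here is a minimal NumPy sketch of a single removal step. It assumes the frequency is already known (in the snippet above it comes from the periodogram peak) and fits an offset plus sine and cosine terms by linear least squares before dividing the model out. The function name `remove_sinusoid` and the synthetic signal are illustrative assumptions.

```python
import numpy as np

def remove_sinusoid(time, flux, frequency):
    """Fit offset + sine + cosine at a fixed frequency and divide it out."""
    phase = 2.0 * np.pi * frequency * time
    design = np.column_stack([np.ones_like(time), np.sin(phase), np.cos(phase)])
    coeffs, *_ = np.linalg.lstsq(design, flux, rcond=None)
    model = design @ coeffs
    return flux / model, model

# Synthetic example: 1% sinusoidal variability plus white noise
rng = np.random.default_rng(0)
time = np.linspace(0.0, 10.0, 2000)
flux = (1.0
        + 0.01 * np.sin(2.0 * np.pi * 0.8 * time + 0.3)
        + rng.normal(0.0, 0.001, time.size))

flat, model = remove_sinusoid(time, flux, frequency=0.8)
print(f"scatter before: {np.std(flux):.4f}, after: {np.std(flat):.4f}")
```

Fitting sine and cosine together avoids a nonlinear fit for the phase, which keeps each iteration cheap when the step is repeated many times.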
IMPORTANT: Quality flag conventions vary by data source!
# For standard TESS files (flag=0 is GOOD):
good = flag == 0
time_clean = time[good]
flux_clean = flux[good]
error_clean = error[good]
# For some exported files (flag=0 is BAD):
good = flag != 0
time_clean = time[good]
flux_clean = flux[good]
error_clean = error[good]
Always verify your data format! Check which approach gives cleaner results.
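One way to check which convention applies is to compare how many points each selection keeps and how noisy they are. The sketch below is an assumption-laden illustration: the helper names `robust_scatter` and `compare_flag_conventions` are hypothetical, the flag value 8 is arbitrary, and the data are synthetic. The selection with most of the points and the lower scatter is usually the good one, but confirm against the documentation of your data source.

```python
import numpy as np

def robust_scatter(values):
    """MAD-based scatter estimate; returns NaN for an empty selection."""
    if values.size == 0:
        return float("nan")
    median = np.nanmedian(values)
    return float(1.4826 * np.nanmedian(np.abs(values - median)))

def compare_flag_conventions(flux, flag):
    """Report point count and scatter for flag == 0 and flag != 0."""
    is_zero = flag == 0
    return {
        "flag == 0": (int(is_zero.sum()), robust_scatter(flux[is_zero])),
        "flag != 0": (int((~is_zero).sum()), robust_scatter(flux[~is_zero])),
    }

# Synthetic example: 50 flagged cadences with much larger scatter
rng = np.random.default_rng(7)
flux = 1.0 + rng.normal(0.0, 0.001, 1000)
flag = np.zeros(1000, dtype=int)
bad = rng.choice(1000, size=50, replace=False)
flag[bad] = 8  # arbitrary nonzero flag value for illustration
flux[bad] += rng.normal(0.0, 0.02, 50)

summary = compare_flag_conventions(flux, flag)
for name, (count, scatter) in summary.items():
    print(f"{name}: {count} points, scatter = {scatter:.5f}")
```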
When building a preprocessing pipeline for exoplanet detection, choose settings for flatten() that preserve short-duration dips.
For transit detection, be careful not to remove the transit signal.
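The effect of the smoothing window on a transit can be checked with an injected signal. The sketch below is a simplified stand-in: it detrends with a centered running median in NumPy rather than the Savitzky-Golay filter that flatten() uses, and the helper names, window sizes, and synthetic transit are all assumptions chosen for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def running_median(flux, window):
    """Centered running median with edge padding; window must be odd."""
    half = window // 2
    padded = np.pad(flux, half, mode="edge")
    return np.median(sliding_window_view(padded, window), axis=1)

def recovered_depth(flux, in_transit):
    """Difference between out-of-transit and in-transit median flux."""
    return np.median(flux[~in_transit]) - np.median(flux[in_transit])

# Synthetic example: flat light curve with one 60-cadence, 0.5% deep transit
rng = np.random.default_rng(1)
n_points = 3000
flux = 1.0 + rng.normal(0.0, 0.0002, n_points)
in_transit = np.zeros(n_points, dtype=bool)
in_transit[1500:1560] = True
flux[in_transit] -= 0.005

depths = {}
for window in (31, 301):  # shorter vs. much longer than the transit
    flat = flux / running_median(flux, window)
    depths[window] = recovered_depth(flat, in_transit)
    print(f"window={window}: recovered depth = {depths[window]:.4f}")
```

A window shorter than the transit follows the dip and divides it away, while a window several times longer leaves most of the depth intact. Running the same injection test with your actual flatten() settings is a reasonable sanity check.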
Always plot your light curve to verify preprocessing quality:
import matplotlib.pyplot as plt
# Use .plot() method on LightCurve objects
lc.plot()
plt.show()
Best practice: Plot before and after each major step to ensure you're improving data quality, not removing real signals.
pip install lightkurve numpy matplotlib