Data conditioning techniques for gravitational wave detector data. Use when preprocessing raw detector strain data before matched filtering, including high-pass filtering, resampling, removing filter wraparound artifacts, and estimating power spectral density (PSD). Works with PyCBC TimeSeries data.
Data conditioning is essential before matched filtering. Raw gravitational wave detector data contains low-frequency noise, instrumental artifacts, and needs proper sampling rates for computational efficiency.
The conditioning pipeline typically involves:
Remove low-frequency noise and instrumental artifacts:
from pycbc.filter import highpass
# High-pass filter at 15 Hz (typical for LIGO/Virgo data)
strain_filtered = highpass(strain, 15.0)
# Common cutoff frequencies:
# 15 Hz: Standard for ground-based detectors
# 20 Hz: Higher cutoff, more aggressive noise removal
# 10 Hz: Lower cutoff, preserves more low-frequency content
Why 15 Hz? Ground-based detectors like LIGO/Virgo have significant low-frequency noise. High-pass filtering removes this noise while preserving the gravitational wave signal (typically >20 Hz for binary mergers).
Downsample the data to reduce computational cost:
from pycbc.filter import resample_to_delta_t
# Resample to 2048 Hz (common for matched filtering)
delta_t = 1.0 / 2048
strain_resampled = resample_to_delta_t(strain_filtered, delta_t)
# Or to 4096 Hz for higher resolution
delta_t = 1.0 / 4096
strain_resampled = resample_to_delta_t(strain_filtered, delta_t)
# Common sampling rates:
# 2048 Hz: Standard, computationally efficient
# 4096 Hz: Higher resolution, better for high-mass systems
Note: Resampling should happen AFTER high-pass filtering to avoid aliasing. The Nyquist frequency (half the sampling rate) must be above the signal frequency of interest.
Remove edge artifacts introduced by filtering:
# Crop 2 seconds from both ends to remove filter wraparound
conditioned = strain_resampled.crop(2, 2)
# The crop() method removes time from start and end:
# crop(start_seconds, end_seconds)
# Common values: 2-4 seconds on each end
# Verify the duration
print(f"Original duration: {strain_resampled.duration} s")
print(f"Cropped duration: {conditioned.duration} s")
Why crop? Digital filters introduce artifacts at the edges of the time series. These artifacts can cause false triggers in matched filtering.
Calculate the PSD needed for matched filtering:
from pycbc.psd import interpolate, inverse_spectrum_truncation
# Estimate PSD using Welch's method
# seg_len: segment length in seconds (typically 4 seconds)
psd = conditioned.psd(4)
# Interpolate PSD to match data frequency resolution
psd = interpolate(psd, conditioned.delta_f)
# Inverse spectrum truncation for numerical stability
# This limits the effective filter length
psd = inverse_spectrum_truncation(
psd,
int(4 * conditioned.sample_rate),
low_frequency_cutoff=15
)
# Check PSD properties
print(f"PSD length: {len(psd)}")
print(f"PSD delta_f: {psd.delta_f}")
print(f"PSD frequency range: {psd.sample_frequencies[0]:.2f} - {psd.sample_frequencies[-1]:.2f} Hz")
pip install pycbc
Problem: PSD estimation fails with "must contain at least one sample" error
Problem: Filter wraparound artifacts in matched filtering
Problem: Poor SNR due to low-frequency noise