Simplify crypto hard filters to essential checks only. Trigger when: (1) crypto symbols fail multiple filters, (2) data_quality/spread/trading_status fail for crypto, (3) yfinance data gaps causing false failures.
| Item | Details |
|---|---|
| Date | 2024-12-28 |
| Goal | Stop crypto symbols from failing irrelevant filters |
| Environment | alpaca_trading/selection/filters/hard_filters.py |
| Status | Success |
The user reported that all 18 crypto symbols failed selection, each on a different filter:
- BTCUSD: failed price (max_price is $10k, but BTC trades at $87k)
- ETHUSD: failed data_quality (51% zero-volume bars, but the maximum allowed is 5%)
- SOLUSD: failed trading_status (50% activity, but 80% is required)
- DOGEUSD: failed volume (28% consistency, but 70% is required)
Each filter had different asset-type assumptions that didn't apply to crypto.
| Filter | Problem for Crypto | Reality |
|---|---|---|
| price | max_price=$10,000 | BTC is $87k+, no upper limit |
| data_quality | max 5% zero-volume | yfinance has 50% gaps |
| trading_status | 80% activity required | yfinance gaps cause 50% |
| spread | 0.5% max spread | Crypto volatility is higher |
| volume | 70% consistency | yfinance has ~30% consistency |
Key Insight: These filters were designed to catch BAD ASSETS, but here they were flagging BAD DATA from yfinance. With quality data from the Alpaca API, the same symbols would be expected to pass these filters.
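
To make the insight concrete, the sketch below computes the zero-volume share for one synthetic asset as seen through two feeds: one that drops every other bar (roughly the gap pattern reported for yfinance) and one that is complete. The helper name `zero_volume_fraction` and the synthetic volumes are illustrative assumptions, not code from `hard_filters.py`; only the 5% threshold comes from the table.

```python
def zero_volume_fraction(volumes):
    """Return the share of bars whose volume is zero or missing (None)."""
    if not volumes:
        return 1.0  # no data at all: treat every bar as missing
    zero_bars = sum(1 for v in volumes if not v)
    return zero_bars / len(volumes)


MAX_ZERO_VOLUME = 0.05  # equity-style data_quality threshold from the table

# Synthetic feed: a liquid asset whose data source drops every other bar.
gappy_feed = [1_500_000, 0] * 50
# The same asset seen through a complete feed.
clean_feed = [1_500_000] * 100

for name, feed in [("gappy", gappy_feed), ("clean", clean_feed)]:
    frac = zero_volume_fraction(feed)
    verdict = "pass" if frac <= MAX_ZERO_VOLUME else "fail"
    print(f"{name}: zero-volume share {frac:.0%} -> data_quality {verdict}")
```

The asset is identical in both cases; only the feed differs, which is why the failure says more about the data source than about the symbol.
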
Instead of patching each filter, we created a separate code path for crypto that only checks essentials:

    def apply_hard_filters(symbol, df, ..., is_crypto=False):
        result = HardFilterResult(symbol=symbol, passed=True)

        # v2.5.0: For crypto, only check essential filters
        # Skip spread/data_quality/trading_status - they just flag yfinance gaps
        if is_crypto:
            # Essential: Has enough data?
            if df is None or len(df) < min_bars:
                result.add_result("min_bars", False, {...})
                return result
            result.add_result("min_bars", True, {"n_bars": len(df)})

            # Essential: Price above minimum? (filter dead coins)
            current_price = df['close'].iloc[-1] if 'close' in df.columns else 0
            passed = current_price >= min_price
            result.add_result("price", passed, {...})

            # Essential: Has reasonable volume? (relaxed for yfinance gaps)
            passed, details = check_volume_filter(df, min_daily_volume_usd, is_crypto=True)
            result.add_result("volume", passed, details)

            return result  # Skip spread, data_quality, trading_status

        # Equities: Apply full filter chain
        # ... existing code ...

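
For readers who want to experiment with the early-return pattern outside the repository, here is a minimal, self-contained sketch of the crypto path. Everything in it is an assumption made for illustration: the `HardFilterResult` stand-in (including the rule that one failed check fails the symbol), the function name `crypto_essential_checks`, the default thresholds, the use of plain lists instead of a DataFrame, and the simplified median dollar-volume check that replaces `check_volume_filter`.

```python
from dataclasses import dataclass, field
from statistics import median


@dataclass
class HardFilterResult:
    """Hypothetical stand-in for the real result class."""
    symbol: str
    passed: bool = True
    checks: dict = field(default_factory=dict)

    def add_result(self, name, passed, details=None):
        # Record the check; assume a single failure fails the whole symbol.
        self.checks[name] = {"passed": passed, "details": details or {}}
        self.passed = self.passed and passed


def crypto_essential_checks(symbol, closes, volumes, min_bars=100,
                            min_price=0.01, min_daily_volume_usd=100_000):
    """Run only min_bars, price, and volume (thresholds are illustrative)."""
    result = HardFilterResult(symbol=symbol)

    n_bars = len(closes) if closes else 0
    if n_bars < min_bars:
        result.add_result("min_bars", False, {"n_bars": n_bars})
        return result  # too little data to evaluate anything else
    result.add_result("min_bars", True, {"n_bars": n_bars})

    # Minimum price only: no upper bound, so high-priced coins are not rejected.
    current_price = closes[-1]
    result.add_result("price", current_price >= min_price, {"price": current_price})

    # Simplified volume check: median dollar volume over bars that report volume.
    dollar_volumes = [c * v for c, v in zip(closes, volumes) if v]
    median_usd = median(dollar_volumes) if dollar_volumes else 0.0
    result.add_result("volume", median_usd >= min_daily_volume_usd,
                      {"median_usd": median_usd})
    return result  # spread, data_quality, trading_status are never run


# A high-priced coin with every other volume bar missing still passes.
btc = crypto_essential_checks("BTCUSD", [87_000.0] * 120, [25.0, 0.0] * 60)
print(btc.symbol, btc.passed, list(btc.checks))

# A symbol with too little history stops at the first check.
short = crypto_essential_checks("NEWCOIN", [1.0] * 10, [1000.0] * 10)
print(short.symbol, short.passed, list(short.checks))
```

The early return keeps the crypto rules in one place, so the equity chain stays untouched and no individual filter needs its own `is_crypto` branch.
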
| Filter | Equities | Crypto | Reason |
|---|---|---|---|
| min_bars | Check | Check | Essential for training |
| price | min/max | min only | No upper limit (BTC $87k+) |
| volume | 70% consistency | 30% consistency | yfinance gaps |
| spread | Check | SKIP | Flags yfinance volatility |
| data_quality | Check | SKIP | Flags yfinance gaps |
| trading_status | Check | SKIP | Flags yfinance gaps |
| Attempt | Why it Failed | Lesson Learned |
|---|---|---|
| Patch max_price check for crypto | Still fails data_quality | Whack-a-mole approach |
| Lower data_quality thresholds | Still fails trading_status | Same problem |
| Add is_crypto to each filter | Complex, hard to maintain | Separate code path is cleaner |
| Remove all filters for crypto | No quality control | Keep essential checks |

    def check_volume_filter(df, min_daily_volume_usd, min_volume_consistency=0.7, is_crypto=False):
        # v2.5.0: Cap consistency requirement for crypto (yfinance has many zero-volume bars)
        if is_crypto:
            min_volume_consistency = min(min_volume_consistency, 0.30)  # Cap at 30%
        # ... rest of calculation ...

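
The effect of the cap can be checked in isolation with a small sketch. The source does not show how consistency is computed, so the definition used here (share of bars with non-zero volume) is an assumption, as are the helper names `effective_consistency_threshold` and `volume_consistency` and the synthetic series.

```python
def effective_consistency_threshold(min_volume_consistency=0.7, is_crypto=False):
    """Apply the crypto cap: never require more than 30% consistency."""
    if is_crypto:
        return min(min_volume_consistency, 0.30)
    return min_volume_consistency


def volume_consistency(volumes):
    """Assumed definition: share of bars that report non-zero volume."""
    if not volumes:
        return 0.0
    return sum(1 for v in volumes if v) / len(volumes)


# Synthetic series: 4 of every 10 bars report volume (40% consistency).
volumes = [1000, 0, 0, 1000, 0, 1000, 0, 0, 1000, 0] * 10
consistency = volume_consistency(volumes)

for is_crypto in (False, True):
    threshold = effective_consistency_threshold(is_crypto=is_crypto)
    verdict = "pass" if consistency >= threshold else "fail"
    print(f"is_crypto={is_crypto}: {consistency:.0%} vs {threshold:.0%} required -> {verdict}")
```

Because the cap uses `min()`, a caller that already passes a threshold below 0.30 keeps that lower value; the cap only ever relaxes the requirement.
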
alpaca_trading/selection/filters/hard_filters.py:
- Lines 362-378: New crypto-specific path in apply_hard_filters()
- Lines 66-68: Volume consistency cap for crypto
Prefer Alpaca API over filter simplification.
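The simplified filters are a fallback for when yfinance must be used. With quality data from the Alpaca API, the gap-driven data_quality, trading_status, and volume failures described above are not expected.
See the data-source-priority skill for ensuring the Alpaca API is used.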
- alpaca_trading/selection/filters/hard_filters.py: Filter implementations
- data-source-priority - Ensure quality data source
- symbol-selection-asset-filters - Asset-type filter patterns