Problems (by symptom)¶
Cross-index from known weak spots in processing.py to candidate methods in the knowledgebase. Each problem lists ≥2 candidates so you can compare approaches.
P1. Narrow seasonal keywords (Halloween, Super Bowl)¶
Symptom: Forecasts that capture the shape (peak at the right month) get rejected by SMAPE because they're off in magnitude.
Current handling: Seasonal alt-gate — if SMAPE fails but Pearson r > 0.5 between forecast and held-out year, accept the forecast anyway (processing.py:2054).
Why this is a stopgap: Pearson r is invariant to scale and offset, so it cannot detect amplitude error at all, and it measures peak alignment only indirectly; the 0.5 threshold is arbitrary rather than calibrated.
Candidates:
- DTW (Dynamic Time Warping)
- Why SMAPE fails on narrow seasonals
- MASE
- Prophet holiday/event regressors
- Fourier feature regression
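To make the alt-gate concrete, the sketch below builds a narrow-seasonal forecast that peaks in the right month at half the true magnitude: it fails a SMAPE check but passes the Pearson fallback. The function names, the 30% SMAPE cutoff, and the toy series are assumptions for illustration; only the r > 0.5 threshold comes from the description above, and the actual logic at processing.py:2054 may differ.

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric MAPE in percent (0-200); months where both values are 0 count as zero error."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    err = np.abs(actual - forecast)
    ratio = np.divide(err, denom, out=np.zeros_like(err), where=denom > 0)
    return 100.0 * ratio.mean()

def seasonal_alt_gate(actual, forecast, smape_max=30.0, r_min=0.5):
    """Accept if SMAPE passes, or if the forecast shape correlates with the held-out year.

    smape_max is a hypothetical cutoff; r_min=0.5 is the threshold quoted above.
    """
    if smape(actual, forecast) <= smape_max:
        return True
    if np.std(actual) == 0 or np.std(forecast) == 0:
        return False  # Pearson r is undefined for a constant series
    r = np.corrcoef(actual, forecast)[0, 1]
    return bool(r > r_min)

# Toy held-out year: flat baseline with an October spike (Halloween-like).
held_out = np.array([10, 10, 10, 10, 10, 10, 10, 10, 40, 400, 20, 10], dtype=float)
forecast = 0.5 * held_out  # right shape, half the magnitude

print(round(smape(held_out, forecast), 1))        # 66.7
print(seasonal_alt_gate(held_out, forecast))      # True
print(seasonal_alt_gate(held_out, 10 * held_out)) # True: r is blind to amplitude
```

The last line shows the weakness noted above: a forecast ten times too large passes the gate just as easily as one that is half the size.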
P2. Short history (<24 months) — Tier 1/2 keywords¶
Symptom: AutoCES and Holt-Winters need ≥24mo for reliable seasonality; we have many keywords with 6-23mo of data.
Current handling: calculate_hybrid_growth falls back to heuristics: month-over-month (MoM) for Tier 1 (<6mo), or rolling-window plus change-point for Tier 2 (6-23mo) (processing.py:1247).
Why this is a stopgap: Pure heuristics; no statistical confidence; can't extrapolate seasonality at all.
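The tiering described above can be sketched roughly as follows. This is an illustration, not the code in calculate_hybrid_growth: the function names, the tier labels, the third "seasonal model" tier for 24 or more months, and the particular MoM formula (mean of month-over-month relative changes) are all assumptions.

```python
from typing import Optional, Sequence

def select_growth_tier(n_months: int) -> str:
    """Map history length in months to a handling tier (labels are hypothetical)."""
    if n_months < 0:
        raise ValueError("n_months must be non-negative")
    if n_months < 6:
        return "tier1_mom"                  # too short for anything but MoM
    if n_months < 24:
        return "tier2_rolling_changepoint"  # rolling window + change-point heuristics
    return "seasonal_model"                 # enough history for AutoCES / Holt-Winters

def mom_growth(volumes: Sequence[float]) -> Optional[float]:
    """Mean month-over-month relative change, skipping months whose predecessor is zero.

    Assumed formula for illustration; returns None when no change can be computed.
    """
    changes = [(cur - prev) / prev
               for prev, cur in zip(volumes, volumes[1:]) if prev > 0]
    return sum(changes) / len(changes) if changes else None

print(select_growth_tier(4))    # tier1_mom
print(select_growth_tier(18))   # tier2_rolling_changepoint
print(select_growth_tier(36))   # seasonal_model
print(round(mom_growth([100, 110, 121]), 3))  # 0.1
```

Neither tier returns an interval or a seasonal profile, which is the gap this problem describes.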
Candidates:
P3. Sparse / bursty keywords (event-driven, episodic)¶
Symptom: Series with many zero-months and a few large peaks (e.g., sports leagues out-of-season, viral events) have unstable growth rates.
Current handling: _is_spiky_series heuristic suppresses growth tags (processing.py:1180, 1310).
Why this is a stopgap: Suppression hides volatility but doesn't forecast it; we still report a (likely wrong) volume.
Candidates:
P4. GT collapse / structural breaks¶
Symptom: A Google Trends (GT) series collapses to zero (signal dropoff, normalization change) or has a sudden permanent shift, and the forecast over-extrapolates.
Current handling: Ad-hoc clipping — when GT peaks then zeros, average is clipped to the last month (processing.py:1775-1781).
Why this is a stopgap: Doesn't distinguish a data artifact from a real change.
Candidates:
- Bayesian online change-point detection
- PELT
- Prophet changepoints
- Level shift vs transient spike classification
P5. Multi-source blending = max(JS, GSC)¶
Symptom: All sources are noisy and partial; max-pick ignores source quality, recency, and the case where two sources disagree.
Current handling: max(JS, GSC) per month (processing.py:1763), with quality gates for GT (processing.py:1679-1735).
Why this is a stopgap: Doesn't propagate uncertainty; doesn't down-weight low-confidence sources.
Candidates:
- Kalman filter / Unobserved Components Model (UCM)
- Bayesian sensor fusion
- Bates-Granger optimal weighting
- MinT hierarchical reconciliation
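As a rough illustration of the difference between max-pick and a quality-aware blend, the sketch below compares a per-month max with inverse-variance weights in the Bates-Granger style (each source weighted by the inverse of its error variance, ignoring correlation between the sources' errors). The function names, the toy volumes, and the error variances are made up for the example; nothing here is taken from processing.py.

```python
import numpy as np

def max_blend(js, gsc):
    """Per-month max of two sources; a NaN (missing month) is ignored if the other source has a value."""
    return np.fmax(np.asarray(js, dtype=float), np.asarray(gsc, dtype=float))

def inverse_variance_blend(js, gsc, var_js, var_gsc):
    """Weight each source by 1 / error variance; weights sum to 1.

    var_js and var_gsc are assumed to be estimated elsewhere (e.g. from past errors).
    Unlike max_blend, a NaN in either source propagates to the result.
    """
    w_js = (1.0 / var_js) / (1.0 / var_js + 1.0 / var_gsc)
    return w_js * np.asarray(js, dtype=float) + (1.0 - w_js) * np.asarray(gsc, dtype=float)

# Toy monthly volumes: the sources agree for two months, then JS spikes.
js = [100, 120, 500]
gsc = [90, 110, 130]

print(max_blend(js, gsc))  # max-pick follows the JS spike: 100, 120, 500
# With JS assumed noisier (variance 400 vs 100), its weight is 0.2,
# so the blend is approximately 92, 112, 204.
print(np.round(inverse_variance_blend(js, gsc, var_js=400.0, var_gsc=100.0), 1))
```

In the third month max-pick reports the outlier unchanged, while the weighted blend pulls it toward the source assumed to be more reliable. Max-pick is also biased upward by construction, since it never selects the lower reading.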
P6. Long-horizon decay = 0.85^h × historical_min¶
Symptom: Forecasts decay to a floor with a hardcoded geometric rate; not adapted to keyword volatility.
Current handling: decay_floor = 0.85^months_ahead × historical_min (processing.py:2100).
Why this is a stopgap: Same dampening for all keywords; ignores actual volatility/drift; no statistical motivation for 0.85.
Candidates:
- Damped trend (Gardner-McKenzie)
- ETS dampening parameter
- Mean-reversion priors
- Regularized long-horizon forecasts
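The decay formula under "Current handling" is easy to tabulate, which shows how quickly the floor shrinks. The sketch below uses the formula as quoted; the function name, signature, and the example historical_min of 1000 are assumptions, and how the floor is then applied to the forecast is not shown because the source does not describe it.

```python
def decay_floor(historical_min: float, months_ahead: int, rate: float = 0.85) -> float:
    """Geometric decay floor: rate ** months_ahead * historical_min."""
    if months_ahead < 0:
        raise ValueError("months_ahead must be non-negative")
    return (rate ** months_ahead) * historical_min

# Floor for a keyword whose historical minimum is 1000 searches/month.
for h in (1, 6, 12, 24):
    print(h, round(decay_floor(1000.0, h), 1))
# 1 850.0
# 6 377.1
# 12 142.2
# 24 20.2
```

With a fixed rate of 0.85 the floor falls to about 14% of the historical minimum after 12 months and about 2% after 24, whatever the keyword's actual volatility or drift.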
P7. Confidence score is a point estimate¶
Symptom: We publish a single confidence value per keyword but no calibrated forecast interval. Downstream tools can't reason about uncertainty.
Current handling: Weighted geometric mean of 4 subscores: coverage, agreement, freshness, forecast reliance (processing.py:1041).
Why this is a stopgap: Score is heuristic, not probabilistic; can't be interpreted as a probability/quantile.
Candidates:
- Conformal prediction
- ETS/ARIMA prediction intervals
- Quantile regression forecasts
- Probabilistic metrics (CRPS, pinball loss)
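For reference, a weighted geometric mean of four subscores looks roughly like the sketch below. The subscore names follow the list above; the equal weights, the function name, and the assumption that subscores lie in [0, 1] are illustrative and not taken from processing.py:1041.

```python
import math

def confidence_score(subscores: dict, weights: dict) -> float:
    """Weighted geometric mean: exp(sum(w_i * ln(s_i)) / sum(w_i)).

    Any subscore of 0 (or below) forces the result to 0.
    """
    if any(s <= 0 for s in subscores.values()):
        return 0.0
    total_weight = sum(weights[name] for name in subscores)
    log_sum = sum(weights[name] * math.log(s) for name, s in subscores.items())
    return math.exp(log_sum / total_weight)

# Hypothetical equal weights; the real weights are not stated in this index.
weights = {"coverage": 1.0, "agreement": 1.0, "freshness": 1.0, "forecast_reliance": 1.0}
subscores = {"coverage": 0.9, "agreement": 0.8, "freshness": 1.0, "forecast_reliance": 0.5}

print(round(confidence_score(subscores, weights), 3))  # 0.775
```

The result is a single number in [0, 1] with no coverage guarantee attached: a score of 0.775 does not mean the true volume falls inside any particular range 77.5% of the time, which is why the candidates above focus on intervals and quantiles.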
P8. Modern ML / foundation models — should we pilot?¶
Symptom: Not strictly a "problem," but the question: do modern ML / foundation models add value over our statistical ensemble?
Current handling: None; pure statistical ensemble.
Candidates: