Seasonality

Methods for detecting, decomposing, and modeling seasonal patterns in monthly keyword volume series — alternatives and complements to the current single-threshold ACF-lag-12 detector in detect_seasonality().

title: STL decomposition
tags: [seasonality, decomposition]
applies_to: [tier_3]
data_needs: "≥2 full seasonal cycles (24+ monthly observations); equally spaced timestamps"
status: candidate

STL decomposition

Source: Cleveland, Cleveland, McRae & Terpenning 1990; Hyndman & Athanasopoulos, Forecasting: Principles and Practice (fpp3) ch. 3.6 Link: https://otexts.com/fpp3/stl.html Retrieved: 2026-05-15

What it is: Seasonal-Trend decomposition using LOESS. Splits a series into trend, seasonal, and remainder components via iterated locally-weighted regression. Unlike classical decomposition, STL handles any seasonal period, allows the seasonal component to drift slowly over time, and is robust to outliers (with robust=True). The smoothness of trend vs seasonal is controlled by separate window parameters.

When to use:
- You want a clean trend/seasonal split for diagnostic plots or feature engineering.
- The seasonal pattern evolves slowly across years (e.g., a holiday peak that grows year-over-year).
- A few outliers shouldn't dominate the seasonal estimate.

Fit for our model:
- ✅ Drop-in upgrade for detect_seasonality() at processing.py:1130: compute max(0, 1 - var(remainder) / var(seasonal + remainder)) for an objective seasonal-strength score (see Seasonal strength scoring) instead of an ACF threshold.
- ✅ The trend component is a more stable input to yoy_growth() at processing.py:1147 than the raw series.
- ⚠ Requires ≥2 cycles (24mo), so it won't help Tier 1/2 keywords; those use the heuristic growth path anyway (processing.py:1247).
- 🔧 statsmodels.tsa.seasonal.STL(series, period=12, robust=True).fit().

title: MSTL (Multiple Seasonal-Trend decomposition)
tags: [seasonality, decomposition, multiple-seasonality]
applies_to: [tier_3]
data_needs: "Multiple seasonal cycles; typically used with sub-daily/daily data, but applies to any multi-period series"
status: candidate

MSTL (Multiple Seasonal-Trend decomposition)

Source: Bandara, Hyndman & Bergmeir 2021, MSTL: A Seasonal-Trend Decomposition Algorithm for Time Series with Multiple Seasonal Patterns Link: https://nixtlaverse.nixtla.io/statsforecast/docs/models/mstl.html Retrieved: 2026-05-15

What it is: Extends STL to multiple seasonal periods by iteratively fitting one STL per seasonality and subtracting the estimated seasonal component before fitting the next. In the statsforecast implementation, the trend is then forecast with a non-seasonal model (AutoARIMA, ETS, etc.) and each seasonal component with a SeasonalNaive. Reported to be faster and often more accurate than Prophet or TBATS on multi-seasonal series.

When to use:
- Series has two or more clear periodicities (e.g., weekly + yearly, intraday + daily).
- You want a single decomposition framework that scales to several seasonalities without TBATS's compute cost.

Fit for our model:
- ⚠ Monthly volume is single-seasonal (annual only), so MSTL gives no benefit over STL here.
- ✅ Would become relevant if we ever moved to weekly aggregation, where an annual cycle (period ≈ 52 weeks) and a within-month cycle (period ≈ 4.35 weeks) could both matter for some events.
- 🔧 statsforecast.models.MSTL(season_length=[12, ...], trend_forecaster=AutoARIMA()).

title: X-13ARIMA-SEATS
tags: [seasonality, decomposition, official-statistics]
applies_to: [tier_3]
data_needs: "Monthly or quarterly data with ≥3 years history; needs X-13 binary installed"
status: candidate

X-13ARIMA-SEATS

Source: US Census Bureau Link: https://www.statsmodels.org/dev/generated/statsmodels.tsa.x13.x13_arima_analysis.html Retrieved: 2026-05-15

What it is: Industry-standard seasonal adjustment program from the US Census Bureau. Combines RegARIMA pre-adjustment (for outliers, calendar effects, trading-day, Easter) with either X-11 or SEATS decomposition for the seasonal/trend split. Provides diagnostics (M-stats, F-tests) for whether seasonality is statistically significant.

When to use:
- Official statistics / regulatory contexts where a documented, audited seasonal-adjustment method is required.
- You need explicit modeling of trading-day or holiday calendar effects.

Fit for our model:
- ⚠ Heavyweight: requires installing the X-13 binary and writing per-series spec files; subprocess invocation is slow for our ~billions of keywords.
- ⚠ Designed for ~50 macro-economic series, not for high-throughput batch forecasting.
- ✅ Its built-in F-test for seasonality could inspire a better gate than the ACF threshold in detect_seasonality() at processing.py:1130.
- 🔧 statsmodels.tsa.x13.x13_arima_analysis(series) (binary must be on PATH).
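As a rough illustration of what an F-test gate could look like without the X-13 binary, the sketch below runs a one-way ANOVA across calendar months on a crudely detrended series. It is a simplified stand-in, not X-13's actual procedure (which tests the seasonal-irregular component produced by X-11). The function name `stable_seasonality_f_test` and the 2x12 moving-average detrending are assumptions made for the example.

```python
import numpy as np
from scipy.stats import f_oneway


def stable_seasonality_f_test(y, period=12):
    """One-way ANOVA of detrended values grouped by position in the cycle (sketch)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if n < 3 * period:
        raise ValueError("need at least 3 full cycles for 2+ values per group")
    # Centered 2 x period moving average as a crude trend estimate (even period).
    kernel = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = np.convolve(y, kernel, mode="valid")  # length n - period
    half = period // 2
    detrended = y[half:n - half] - trend
    position = np.arange(half, n - half) % period
    groups = [detrended[position == m] for m in range(period)]
    stat, pvalue = f_oneway(*groups)
    return float(stat), float(pvalue)


rng = np.random.default_rng(0)
t = np.arange(48)
seasonal_series = 1000 + 5 * t + 200 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 20, 48)
noise_series = 1000 + 5 * t + rng.normal(0, 20, 48)

f_seasonal, p_seasonal = stable_seasonality_f_test(seasonal_series)
f_noise, p_noise = stable_seasonality_f_test(noise_series)
```

A gate built this way would compare the p-value (or the F statistic) against a chosen cutoff. Any such cutoff would need calibrating on real keyword series.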

title: Prophet seasonality and holidays
tags: [seasonality, additive-model, holidays]
applies_to: [tier_2, tier_3]
data_needs: "≥1 cycle preferred; holiday calendars are optional but useful"
status: candidate

Prophet seasonality and holidays

Source: Taylor & Letham 2018, Forecasting at Scale Link: https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html Retrieved: 2026-05-15

What it is: Additive decomposable model: y(t) = trend + seasonality + holidays + noise. Seasonality is fit via partial Fourier series (configurable order per period). Holidays are user-supplied date lists with adjustable windows; each holiday gets its own learned effect. Robust to missing data and shifts in the trend by design.

When to use:
- The series has known calendar events whose magnitude you want to estimate (Black Friday, Ramadan, Super Bowl).
- You have <2 full cycles but enough data for at least one peak to inform the prior.
- You need a forecast that's auditable (each effect can be plotted separately).

Fit for our model:
- ✅ Directly addresses P1 in problems.md (narrow-seasonal SMAPE failures at processing.py:2054): a Halloween regressor with a 1-2 week window captures the peak even if the magnitude is wrong.
- ⚠ Default Prophet is slower than StatsForecast's AutoCES/HW ensemble at processing.py:1984 (Stan backend); batch-scoring billions of series is non-trivial.
- ⚠ Holiday list maintenance is real work: we'd need a curated calendar per locale.
- 🔧 prophet.Prophet(yearly_seasonality=True, holidays=holidays_df), or NeuralProphet for a faster PyTorch backend.
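A sketch of how the holiday frame and model call might be wired up. Prophet expects a holidays frame with `holiday` and `ds` columns (plus optional `lower_window` / `upper_window`) and a history frame with `ds` and `y`. The helper names `build_monthly_holidays` and `fit_prophet` are hypothetical. Because Prophet matches holidays to observations by date, the sketch stamps each event on the first day of its month so that it coincides with a monthly observation. That is an assumption about how the monthly series is timestamped; with daily data one would use the actual date and a window of a week or two.

```python
import pandas as pd


def build_monthly_holidays(name, month, years):
    """Holiday frame in Prophet's format, aligned to month-start timestamps."""
    return pd.DataFrame({
        "holiday": name,
        "ds": pd.to_datetime([pd.Timestamp(year=y, month=month, day=1) for y in years]),
        "lower_window": 0,
        "upper_window": 0,
    })


def fit_prophet(history: pd.DataFrame, holidays: pd.DataFrame, horizon: int = 12):
    """Fit Prophet on a monthly history (columns ds, y) and forecast `horizon` months."""
    from prophet import Prophet  # imported lazily: optional, heavyweight dependency

    model = Prophet(
        yearly_seasonality=True,
        weekly_seasonality=False,
        daily_seasonality=False,
        holidays=holidays,
    )
    model.fit(history)
    future = model.make_future_dataframe(periods=horizon, freq="MS")
    return model.predict(future)


# Include future years so the effect is also applied over the forecast horizon.
holidays_df = build_monthly_holidays("halloween", month=10, years=range(2021, 2027))
```

Calling `fit_prophet(history, holidays_df)` returns Prophet's forecast frame, which includes a separate column for the learned holiday effect alongside `yhat`.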

title: Fourier feature regression
tags: [seasonality, regression, features]
applies_to: [tier_2, tier_3]
data_needs: "≥1 cycle; works with short history because K is tunable"
status: candidate

Fourier feature regression

Source: Hyndman & Athanasopoulos, fpp3 ch. 7.4 (useful predictors) and ch. 10.5 (dynamic harmonic regression) Link: https://otexts.com/fpp3/dhr.html Retrieved: 2026-05-15

What it is: Encode seasonality as K pairs of sin/cos terms — sin(2πkt/m), cos(2πkt/m) for k = 1..K — and regress the series on them (typically with ARMA errors → "dynamic harmonic regression"). K controls smoothness: K=1 is a pure annual sinusoid; K=6 for monthly data is the maximum (Nyquist). Cheaper and more flexible than seasonal dummies for long periods.

When to use:
- Short history where seasonal dummies would over-fit (monthly data needs 11 dummy coefficients, each informed by only one observation per year).
- You want a smooth seasonal that interpolates across missing months.
- You have several seasonal periods, which can be modeled by summing one block of Fourier terms per period (e.g., weekly plus yearly terms on daily data).

Fit for our model:
- ✅ Could provide a parametric seasonal forecast for Tier 2 keywords (6-23mo) where AutoCES/HW need ≥2 cycles. Slots into the fallback path around processing.py:1247.
- ✅ Cheap to fit (just OLS on 2K sin/cos columns); easy to batch.
- ⚠ Picks the wrong shape for narrow peaks (Halloween) with low K; high K starts to over-fit. Choose K via AICc.
- 🔧 statsforecast.models.AutoARIMA(season_length=12) accepts Fourier features as exogenous regressors (extra columns in the training frame, with future values supplied at forecast time); sktime.forecasting.fbprophet.Prophet exposes the same idea.
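A minimal NumPy sketch of the OLS variant (no ARMA errors), assuming a linear trend plus K harmonics. The function names are illustrative. The design drops the sine column at k = m/2, where sin(πt) is identically zero for integer t.

```python
import numpy as np


def harmonic_design(t, period, K):
    """Design matrix: intercept, linear trend, and K Fourier harmonics."""
    if 2 * K > period:
        raise ValueError("K cannot exceed period / 2")
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t]
    for k in range(1, K + 1):
        angle = 2.0 * np.pi * k * t / period
        if 2 * k != period:  # the sine term vanishes at the Nyquist harmonic
            cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)


def fit_harmonic(y, period=12, K=2):
    """Ordinary least squares on the harmonic design; returns the coefficients."""
    y = np.asarray(y, dtype=float)
    X = harmonic_design(np.arange(len(y)), period, K)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def forecast_harmonic(beta, n_obs, horizon, period=12, K=2):
    """Evaluate the fitted curve at the next `horizon` time steps."""
    t_future = np.arange(n_obs, n_obs + horizon)
    return harmonic_design(t_future, period, K) @ beta


# Noise-free example: 18 months of history, forecast the next 6.
t_all = np.arange(24)
truth = 100 + 0.5 * t_all + 20 * np.sin(2 * np.pi * t_all / 12)
history = truth[:18]
beta = fit_harmonic(history, period=12, K=1)
forecast = forecast_harmonic(beta, n_obs=18, horizon=6, period=12, K=1)
```

With K=1 the model has four coefficients, so 18 observations are enough to recover this noise-free curve. On real series the choice of K trades peak sharpness against over-fitting.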

title: Multiple seasonalities (weekly + yearly etc.)
tags: [seasonality, multiple-seasonality]
applies_to: [tier_3]
data_needs: "High-frequency data (daily/hourly) with periods nested at different scales"
status: candidate

Multiple seasonalities (weekly + yearly etc.)

Source: Hyndman & Athanasopoulos, fpp3 ch. 13.1 Link: https://otexts.com/fpp3/weekly.html Retrieved: 2026-05-15

What it is: Framework for series with several nested seasonalities — e.g., daily data exhibits both weekly (period 7) and yearly (period 365.25) patterns. Three common approaches: (a) MSTL iterated decomposition, (b) sum of Fourier terms at different periods via dynamic harmonic regression, or (c) state-space models like TBATS that handle non-integer periods natively.

When to use:
- Daily or sub-daily data with both intra-week and intra-year patterns.
- Series with non-integer periods (365.25 days/year, about 52.18 weeks/year).

Fit for our model:
- ⚠ Our monthly aggregation eliminates all intra-month seasonality (weekday effects, intra-week traffic). At monthly granularity we have a single period of 12.
- ✅ Worth bookmarking if we ever publish weekly or daily forecasts (e.g., for short-term spike detection at processing.py:1180).
- 🔧 MSTL is the easiest entry; TBATS for non-integer periods.
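Approach (b) amounts to stacking one block of Fourier columns per period, and nothing in the construction requires the period to be an integer. A short sketch on a daily time index, with the helper name `multi_fourier` and the chosen orders (3 weekly, 5 yearly) as illustrative assumptions:

```python
import numpy as np


def multi_fourier(t, specs):
    """Intercept plus sin/cos pairs for each (period, K) in specs."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    for period, K in specs:
        for k in range(1, K + 1):
            angle = 2.0 * np.pi * k * t / period
            cols.extend([np.sin(angle), np.cos(angle)])
    return np.column_stack(cols)


# Three years of daily time steps; weekly period 7 and yearly period 365.25.
t = np.arange(3 * 365)
X = multi_fourier(t, [(7, 3), (365.25, 5)])
```

The resulting matrix can be used as the regressor block in OLS or in a regression with ARMA errors.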

title: Seasonal strength scoring
tags: [seasonality, feature-engineering, detection]
applies_to: [tier_2, tier_3]
data_needs: "Decomposable series (≥1 full cycle); works as a continuous score, no threshold required"
status: candidate

Seasonal strength scoring

Source: Wang, Smith & Hyndman 2006, Characteristic-based clustering for time series data; Hyndman & Athanasopoulos, fpp3 §4.3 Link: https://otexts.com/fpp3/stlfeatures.html Retrieved: 2026-05-15

What it is: Reduce a series to a small set of summary features. The seasonal-strength feature is F_S = max(0, 1 - Var(R_t) / Var(S_t + R_t)) where S_t, R_t come from an STL decomposition. Equals 0 when the seasonal component explains no variance beyond the remainder; approaches 1 when seasonality dominates. Same recipe gives trend strength via the trend component. Companion feature seasonal_peak_year identifies which month carries the largest seasonal value.

When to use:
- You need a continuous, interpretable seasonality score across many series for ranking, filtering, or feature input to a downstream classifier.
- You want a more principled replacement for ad-hoc thresholds on ACF, FFT magnitude, or coefficient of variation.

Fit for our model:
- ✅ Direct replacement candidate for the ACF-lag-12 threshold at processing.py:1130: F_S > 0.3 (a common rule of thumb) is more interpretable than "ACF at lag 12 > 0.5" and is robust to non-stationarity.
- ✅ Pair with seasonal_peak_year to enable peak-aware gates in the alt-gate at processing.py:2054 (e.g., "did the forecast peak land in the right month?"; see DTW).
- ✅ Cheap: one STL fit per series, reuse decomposition for forecasting.
- 🔧 statsmodels.tsa.seasonal.STL for the decomposition; compute F_S directly from .seasonal and .resid. R users have feasts::feat_stl().

title: TBATS
tags: [seasonality, state-space, multiple-seasonality, non-integer-period]
applies_to: [tier_3]
data_needs: "≥2 full cycles; computational cost grows with number of seasonalities"
status: candidate

TBATS

Source: De Livera, Hyndman & Snyder 2011, Forecasting time series with complex seasonal patterns using exponential smoothing Link: https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.tbats.TBATS.html Retrieved: 2026-05-15

What it is: Exponential smoothing state-space model with Trigonometric seasonality (via Fourier-like terms), Box-Cox transformation, ARMA errors, and Trend with damping plus Seasonal components. Handles multiple, possibly non-integer, periods (52.18 weeks/year etc.) within one ETS framework. Damping is built-in. Each seasonality uses its own Fourier order K, fitted automatically.

When to use:
- Multi-seasonal series where one of the periods isn't an integer.
- You want a unified ETS-style point-forecast model rather than chaining a decomposition with a separate forecaster.

Fit for our model:
- ⚠ Single-seasonal monthly data is already handled fine by HW additive/multiplicative in our ensemble at processing.py:1984, so TBATS gives no new capability there.
- ⚠ TBATS is noticeably slower than HW; would need benchmarking on our shard scale.
- ✅ Could replace the dual HW models in the ensemble with a single auto-damped TBATS if its handling of the variance pattern (additive vs multiplicative seasonality) proves better. Worth a notebook spike on a sample.
- 🔧 tbats.TBATS(seasonal_periods=[12]) or sktime.forecasting.tbats.TBATS.