Dampening / decay methods

Principled alternatives to the hardcoded 0.85^months_ahead × historical_min long-horizon floor at processing.py:2100 and the frozen-months prior-trend shadow at processing.py:2122.

title: Damped trend (Gardner-McKenzie)
tags: [dampening, exponential-smoothing, long-horizon]
applies_to: [tier_2, tier_3]
data_needs: "≥18-24 observations to estimate trend + φ jointly; works on monthly with 24+ months"
status: candidate

Damped trend (Gardner-McKenzie)

Source: Gardner & McKenzie (1985), "Forecasting Trends in Time Series," Management Science 31(10):1237-1246
Link: https://pubsonline.informs.org/doi/10.1287/mnsc.31.10.1237
Retrieved: 2026-05-15

What it is: A modification of Holt's linear-trend exponential smoothing that multiplies the trend by a damping parameter φ (typically 0.8 ≤ φ ≤ 0.98) at each forecast step, so the trend's contribution shrinks geometrically with horizon and the forecast converges to a finite asymptote rather than extrapolating linearly forever. The additive damped-trend model, ETS(A,Ad,N), is equivalent to an ARIMA(1,1,2) process. Damped-trend methods have proven very successful in practice and are arguably the most popular individual methods when forecasts are needed automatically for many series.
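The recursion behind the damped trend is short enough to show directly. Below is a minimal pure-Python sketch of the error-correction form as written in FPP3 §8.2. It assumes α, β and φ are fixed rather than estimated, and it uses a naive initialization from the first two observations. `damped_holt_forecast` is a hypothetical name, not a library function.

```python
from typing import List, Sequence


def damped_holt_forecast(
    y: Sequence[float], alpha: float, beta: float, phi: float, h: int
) -> List[float]:
    """Additive damped-trend (damped Holt) forecasts with fixed parameters.

    Assumption: naive initialization (level = y[0], trend = y[1] - y[0]);
    production implementations estimate the initial states as well.
    """
    if len(y) < 2:
        raise ValueError("need at least two observations")
    level, trend = float(y[0]), float(y[1] - y[0])
    for obs in y[1:]:
        prev_level = level
        # Level: blend of the new observation and the damped one-step prediction.
        level = alpha * obs + (1 - alpha) * (prev_level + phi * trend)
        # Trend: blend of the latest level change and the damped previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * phi * trend

    forecasts: List[float] = []
    damping_sum = 0.0
    for step in range(1, h + 1):
        damping_sum += phi ** step  # phi + phi**2 + ... + phi**step
        forecasts.append(level + damping_sum * trend)
    return forecasts
```

With phi=1.0 this reduces to Holt's linear method. With phi=0.9, each forecast increment is 90% of the previous one, so the path levels off at level + 9 × trend (9 = 0.9 / 0.1).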

When to use:
- When unfettered linear-trend extrapolation gives obviously implausible long-horizon values (the entire reason processing.py:2100 exists).
- When you want the dampening rate learned per keyword rather than fixed at 0.85 across the whole catalog.
- As the default for medium-horizon (12-24 month) forecasts on Tier 3 keywords.

Fit for our model:
- ✅ Direct, principled replacement for the decay_floor = 0.85^h × historical_min formula at processing.py:2100. Per-keyword φ replaces the global 0.85, and the asymptote comes from the fitted level and trend (level + trend × φ / (1 − φ)) rather than from a multiple of historical_min.
- ✅ Already available in our existing stack: statsforecast.models.AutoETS with damped=True, or as one of the damped ETS error/trend/seasonality triplets (ETS(A,Ad,A), ETS(M,Ad,M), etc.) that AutoETS selects by AICc.
- ✅ Slots into the ensemble at processing.py:1984: adding the damped-trend ETS variants to the model list gives the SMAPE backtest the option of picking it.
- ⚠ φ is usually restricted to 0.8-0.98. Below 0.8 the damping is so strong that the trend barely matters after a few steps. Above 0.98 a damped model is hard to distinguish from an undamped one. A fitted φ sitting on either bound deserves a closer look.
- ⚠ Damped trend does not decay toward zero or historical_min; it converges to the learned asymptote. If the policy intent is "decline toward a floor," combine it with mean-reversion priors or asymptotic-mean models.
- 🔧 from statsforecast.models import AutoETS; AutoETS(season_length=12, damped=True). See also sktime's ExponentialSmoothing with a trend set and damped_trend=True.

title: ETS dampening parameter (φ)
tags: [dampening, ets, parameter]
applies_to: [tier_2, tier_3]
data_needs: "same as the underlying ETS — ≥18-24 monthly observations for stable estimation"
status: candidate

ETS dampening parameter (φ)

Source: Hyndman & Athanasopoulos, Forecasting: Principles and Practice (3rd ed.), §8.2 "Methods with trend"
Link: https://otexts.com/fpp3/holt.html
Retrieved: 2026-05-15

What it is: φ is the single number that controls how aggressively the ETS trend dampens with horizon. φ = 1 recovers undamped Holt, where the trend persists linearly forever. As φ → 0, the trend's contribution vanishes and the forecast is flat at the level. The long-run forecast asymptote is level + trend × φ / (1 − φ), the closed-form sum of the geometric series φ + φ² + φ³ + …. Both the fable package and Nixtla's StatsForecast restrict φ to the range 0.8-0.98. FPP3 gives the rationale: smaller values damp the trend almost immediately, and values closer to 1 cannot be told apart from an undamped trend.
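The asymptote follows from summing the damped increments in the h-step forecast, where ℓ_T and b_T are the final level and trend:

```latex
\hat{y}_{T+h\mid T}
  = \ell_T + \left(\phi + \phi^2 + \dots + \phi^h\right) b_T
  = \ell_T + \frac{\phi\,(1-\phi^h)}{1-\phi}\, b_T
  \;\xrightarrow{\;h\to\infty\;}\; \ell_T + \frac{\phi}{1-\phi}\, b_T ,
  \qquad 0 < \phi < 1 .
```

After h steps, a fraction 1 − φ^h of the asymptotic trend contribution has been realized. With φ = 0.85, 95% of the trend contribution is realized by h = 19. With φ = 0.98, only about 38% is realized by h = 24, so values near the upper bound behave much like an undamped trend over typical horizons.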

When to use:
- Any time you fit ETS / Holt with damped=True: knowing how φ controls horizon behavior is essential for interpretation.
- As a diagnostic on the existing ensemble: if AutoETS keeps picking damped variants with φ ≈ 0.85, that supports the hardcoded 0.85 floor; a cluster well away from 0.85 argues against it.

Fit for our model:
- ✅ Inspecting the fitted φ distribution across keywords would empirically calibrate (or refute) the global 0.85 used at processing.py:2100. If most keywords fit φ ≈ 0.9, the global 0.85 is too aggressive; if many fit φ ≈ 0.82, it is reasonable. Treat this as a rough signal. φ damps the trend increments, whereas 0.85^h scales the floor itself, so the two numbers are analogous rather than identical.
- ✅ Per-keyword φ from AutoETS could be persisted alongside the forecast and used to replace the global geometric floor in the long-horizon path.
- ⚠ Without seasonal handling, a low φ on a seasonal keyword produces flat forecasts; pair it with a seasonal component.
- ⚠ φ is not identifiable on very short series, so the frozen-months logic at processing.py:2122 is still needed as a guard.
- 🔧 Use the same AutoETS(damped=True) as above. Inspect .model_ to log, per keyword, whether the chosen model is damped and what its φ is. The key names, e.g. "damped" and "phi", can differ across statsforecast versions, so confirm them before relying on them.
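Whatever route is used to pull φ out of the fitted models, the calibration question reduces to summarizing one number per keyword. The helper below, `summarize_phi`, is hypothetical and shown as a minimal sketch. It assumes the φ values have already been collected into a dict keyed by keyword, and that 0.8 and 0.98 are the active estimation bounds.

```python
import statistics
from typing import Dict, Mapping

PHI_LOWER, PHI_UPPER = 0.8, 0.98  # assumed estimation bounds for phi


def summarize_phi(phi_by_keyword: Mapping[str, float], tol: float = 1e-3) -> Dict[str, float]:
    """Summarize per-keyword fitted phi values against the global 0.85."""
    values = sorted(phi_by_keyword.values())
    if not values:
        raise ValueError("no phi values to summarize")
    n = len(values)
    return {
        "n": n,
        "median": statistics.median(values),
        "share_above_0_85": sum(v > 0.85 for v in values) / n,
        # An estimate sitting on a bound means the constraint was active.
        "share_at_lower_bound": sum(abs(v - PHI_LOWER) <= tol for v in values) / n,
        "share_at_upper_bound": sum(abs(v - PHI_UPPER) <= tol for v in values) / n,
    }
```

A large share above 0.85 points toward the global floor being too aggressive. Large shares on either bound suggest the bounds, not the data, are driving the estimates.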

title: Mean-reversion priors
tags: [dampening, bayesian, shrinkage]
applies_to: [tier_2, tier_3]
data_needs: "a credible population mean (per category, per cohort, per country); per-keyword history of any length"
status: candidate

Mean-reversion priors

Source: Stein's paradox / James-Stein estimator (1961); Efron (2010), Large-Scale Inference, ch. 1-3. Shares the same conceptual core as the empirical-Bayes priors and hierarchical pooling entries on the short-history page.
Link: https://efron.ckirby.su.domains/other/2010LSIexcerpt.pdf
Retrieved: 2026-05-15

What it is: A Bayesian shrinkage policy for long-horizon forecasts. As the horizon grows, the forecast reverts to a learned population mean (cohort average, category mean) rather than continuing the trend or decaying to a hardcoded floor. The shrinkage weight grows with horizon. At short horizons the forecast is almost entirely the model's; at long horizons (around h = 24) it is dominated by the population mean. Closed form for the Normal-Normal case: forecast_h = w(h) × population_mean + (1 − w(h)) × model_forecast_h. Here w(h) = σ_h² / (σ_h² + σ_pop²), where σ_h² is the model's h-step forecast variance and σ_pop² is the spread of keywords around the population mean. Because forecast variance grows with h, w(h) increases monotonically toward 1.
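Below is a minimal sketch of the Normal-Normal closed form. It assumes the model supplies an h-step forecast variance (for example, backed out of its prediction intervals) and that the cohort's mean and between-keyword variance are known. The weight on the population mean is the model's forecast variance divided by the total variance, so it rises as forecast uncertainty grows. `normal_normal_shrink` is a hypothetical helper.

```python
from typing import List, Sequence


def normal_normal_shrink(
    model_forecast: Sequence[float],
    forecast_var: Sequence[float],
    population_mean: float,
    population_var: float,
) -> List[float]:
    """Posterior mean of each horizon under a Normal population prior.

    w(h) = forecast_var[h] / (forecast_var[h] + population_var) is the
    weight on the population mean; it grows as the forecast variance grows.
    """
    if len(model_forecast) != len(forecast_var):
        raise ValueError("forecast and variance must have the same length")
    if population_var <= 0:
        raise ValueError("population_var must be positive")
    shrunk = []
    for yhat, var_h in zip(model_forecast, forecast_var):
        w = var_h / (var_h + population_var)
        shrunk.append(w * population_mean + (1 - w) * yhat)
    return shrunk
```

For instance, with a population variance of 100, forecast variances of 0, 100 and 300 give weights of 0, 0.5 and 0.75 on the population mean.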

When to use:
- Long-horizon (12+ months) forecasts where the per-keyword model has high posterior variance.
- When you have meaningful population groupings (category, country, brand-vs-generic) that give an actuarially defensible "this is roughly what keywords like this run at."
- As a softer alternative to the hard floor at processing.py:2100.

Fit for our model:
- ✅ Principled replacement for 0.85^h × historical_min at processing.py:2100. Instead of decaying toward each keyword's individual minimum, decay toward the cohort mean, which is more informative and less prone to one outlier zero-month anchoring the floor.
- ✅ Works on top of any underlying forecaster: bolt it on after the StatsForecast ensemble at processing.py:1984.
- ✅ Reuses cohort priors from the Empirical Bayes work (see empirical Bayes priors).
- ⚠ "Population mean" needs to be well-defined; bad cohorts cause systematic long-horizon bias.
- ⚠ The schedule w(h) is a hyperparameter: start with w(h) = 1 − exp(−h/τ) for τ ≈ 12 months and tune.
- 🔧 Closed-form NumPy implementation; no new library needed. For the full Bayesian version, use pymc or numpyro over the hierarchical model in Hierarchical pooling.
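Here is the suggested starting schedule, sketched in plain Python (NumPy would vectorize it but is not required). `population_mean` is assumed to come from the cohort priors, and `tau` is the time constant to tune. The function name is illustrative.

```python
import math
from typing import List, Sequence


def revert_to_population_mean(
    model_forecast: Sequence[float], population_mean: float, tau: float = 12.0
) -> List[float]:
    """Blend toward population_mean with w(h) = 1 - exp(-h / tau), h = 1, 2, ..."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    out = []
    for h, yhat in enumerate(model_forecast, start=1):
        w = 1.0 - math.exp(-h / tau)  # weight on the population mean
        out.append(w * population_mean + (1.0 - w) * yhat)
    return out
```

With τ = 12, w(1) ≈ 0.08, so even month 1 moves slightly toward the population mean. If h = 1 should be left untouched, shifting to w(h) = 1 − exp(−(h − 1)/τ) is a simple variant.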

title: Regularized long-horizon forecasts
tags: [dampening, regularization, penalty]
applies_to: [tier_2, tier_3]
data_needs: "≥12 observations; an ML-style forecaster (gradient-boosted, MLP, or RNN) where you can add a loss term"
status: candidate

Regularized long-horizon forecasts

Source: Common ML-forecasting practice; e.g., Lim et al. (2021), "Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting," International Journal of Forecasting; multi-horizon quantile regression with horizon-wise penalties; see also the sktime forecasting cookbook
Link: https://www.sktime.net/en/latest/examples/01_forecasting.html
Retrieved: 2026-05-15

What it is: Adds an explicit penalty term to the training/fitting loss that discourages large forecast variation at long horizons. Examples include an L2 penalty on step-to-step changes in the forecast path, λ Σ_h (ŷ_h − ŷ_{h−1})²; a smoothness penalty on the curvature of the forecast curve (second differences); or a penalty on the magnitude of the slope coefficient itself. This pushes the long-horizon shape toward being smooth and bounded without imposing a fixed geometric decay.
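Here is a framework-agnostic sketch of the first-difference penalty, written as a plain function over one multi-horizon forecast path. In a neural forecaster, the same expression would be written with the framework's tensor operations and used as the training loss. `lam` (the penalty weight λ) is the hyperparameter to cross-validate, and the function name is hypothetical.

```python
from typing import Sequence


def smoothness_penalized_mse(
    y_true: Sequence[float], y_pred: Sequence[float], lam: float
) -> float:
    """Mean squared error plus lam * sum of squared step-to-step forecast changes.

    y_pred is one multi-horizon forecast path (h = 1..H); the penalty is
    sum over h of (y_pred[h] - y_pred[h-1])**2 for consecutive horizons.
    """
    if len(y_true) != len(y_pred) or not y_pred:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_pred)
    roughness = sum((b - a) ** 2 for a, b in zip(y_pred, y_pred[1:]))
    return mse + lam * roughness
```

A flat path pays no roughness penalty. As lam grows, the optimizer is pushed toward flatter long-horizon shapes, even at some cost in fit.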

When to use:
- When the underlying forecaster (e.g., a learned GBM/MLP/RNN) is unconstrained and tends to produce spiky long-horizon shapes.
- When you have a calibrated reason to prefer smooth long horizons (most marketing decisions can't act on a spike 18 months out anyway).

Fit for our model:
- ⚠ Our current statsforecast ensemble at processing.py:1984 already has implicit horizon dampening (ETS damping, AutoCES's level decay), so explicit regularization adds little on top.
- ✅ Would become valuable if we add neural forecasters (NeuralProphet, N-HiTS, TFT), which usually need explicit smoothness penalties.
- ✅ Could replace the post-hoc geometric floor at processing.py:2100 with a smoother monotone-decay penalty applied to the fitted forecast, but the gain over damped ETS is marginal.
- ⚠ The hyperparameter (penalty weight) is hard to choose; cross-validate per cohort.
- 🔧 In darts, pass a custom loss_fn to the torch-based models. In sktime, whether a custom loss term is possible depends on the wrapped estimator. For statistical models, the same effect is achieved more naturally via damped trend.

title: Volatility-aware dampening
tags: [dampening, adaptive, heteroskedastic]
applies_to: [tier_2, tier_3]
data_needs: "≥12 observations to estimate keyword-level volatility (rolling CV, residual variance)"
status: candidate

Volatility-aware dampening

Source: Generalization of Gardner-McKenzie; per-series φ chosen as a function of historical CV / residual variance. Conceptually related to ARCH-family heteroskedastic models (Engle 1982) and to the volatility-targeting literature in finance.
Link: https://otexts.com/fpp3/holt.html (background on φ); no canonical paper for the per-series adaptive variant
Retrieved: 2026-05-15

What it is: Instead of a single global dampening rate (our 0.85), make the dampening rate a function of the keyword's historical volatility. High-CV (volatile) keywords get aggressive dampening (φ small, e.g., 0.80); low-CV (stable) keywords get gentle dampening (φ close to 0.98). The intuition: high volatility implies high model uncertainty, which warrants faster reversion to the level. Simple parameterization: φ_i = min(0.98, max(0.8, 1 − k × CV_i)) for some k chosen on a validation cohort. The upper clip keeps near-zero-CV keywords from getting φ = 1, which would mean an undamped trend.

When to use:
- When a single dampening parameter clearly under-dampens noisy keywords and over-dampens stable ones, which is the most plausible empirical failure mode of the current global 0.85.
- As a stepping stone to fully Bayesian variance-aware forecasting.

Fit for our model:
- ✅ Direct, lightweight upgrade over the global 0.85 at processing.py:2100. Per-keyword CV is already implicitly computed for _is_spiky_series (processing.py:1180); reuse it.
- ✅ Compatible with damped trend: feed the volatility-derived φ into AutoETS as a fixed parameter instead of letting it optimize.
- ⚠ Mapping CV → φ is heuristic and needs validation; the AutoETS-fitted φ on a long-history keyword should serve as a benchmark.
- ⚠ For very short series, CV is itself noisy; fall back to a global default or to a cohort-pooled CV estimate.
- 🔧 Pure NumPy. Measure volatility on the model's residuals rather than the raw series, since raw CV is dominated by seasonality. Scale it by the series mean (residual standard deviation ÷ mean level), because the residuals average close to zero and their own CV is unstable. Then set φ accordingly.
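Below is a sketch of the clipped CV → φ mapping. It assumes `residuals` come from whichever model is already fitted for the keyword and that the keyword's history has a positive mean. `volatility_phi` and `k` are placeholders; k would be chosen on a validation cohort.

```python
import statistics
from typing import Sequence


def volatility_phi(
    residuals: Sequence[float],
    series: Sequence[float],
    k: float,
    phi_min: float = 0.8,
    phi_max: float = 0.98,
) -> float:
    """Map residual volatility to a damping parameter phi in [phi_min, phi_max].

    Volatility is the residual standard deviation scaled by the series mean,
    not the CV of the residuals themselves (their mean is close to zero).
    """
    scale = statistics.fmean(series)
    if scale <= 0:
        raise ValueError("series mean must be positive to scale volatility")
    cv = statistics.pstdev(residuals) / scale
    # Higher volatility -> smaller phi -> faster damping; clip to the bounds.
    return min(phi_max, max(phi_min, 1.0 - k * cv))
```

The default bounds of 0.8 and 0.98 keep the output inside the usual φ estimation range, so a volatility-derived φ stays comparable with one AutoETS would fit.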

title: Asymptotic-mean models
tags: [dampening, asymptote, level-reversion]
applies_to: [tier_2, tier_3]
data_needs: "≥12 observations; a credible asymptote target (per-keyword median, per-cohort mean, or a fitted level component)"
status: candidate

Asymptotic-mean models

Source: A general feature of well-behaved exponential smoothing: ETS(A,Ad,·) converges to level + trend × φ / (1 − φ), around which any seasonal pattern keeps repeating; the AutoCES level component is itself a learned asymptote. Conceptually overlaps with damped trend and mean-reversion priors; cleanly summarized in FPP3 §8.2.
Link: https://otexts.com/fpp3/holt.html
Retrieved: 2026-05-15

What it is: Models whose long-horizon forecast deterministically converges to a learned constant (the "asymptote") rather than to zero (geometric decay) or to infinity (linear trend). The asymptote is typically derived from the model's level component or is an externally supplied target (cohort mean, multi-year median). For Holt damped, it is the level plus the remaining damped-trend contribution; for BSTS local-level and AutoCES, it is the level itself. Useful when you have a defensible policy answer to "what does this keyword do in the long run."

When to use:
- Whenever you want the long-horizon forecast to be a stable number rather than a decaying or exploding one.
- As a contract: "the forecast will plateau at value X, reached approximately by month T."
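For the damped-trend case, both numbers in the "plateau at X by month T" contract can be computed directly. The minimal sketch below assumes the fitted final level, trend and φ are already available from the model. `damped_plateau` and the absolute tolerance `tol` are illustrative names.

```python
import math
from typing import Tuple


def damped_plateau(level: float, trend: float, phi: float, tol: float) -> Tuple[float, int]:
    """Return (asymptote, first horizon h whose forecast is within tol of it).

    Uses the damped-trend forecast level + trend * (phi + ... + phi**h), whose
    remaining gap to the asymptote is |trend| * phi**(h + 1) / (1 - phi).
    """
    if not 0.0 < phi < 1.0:
        raise ValueError("phi must be in (0, 1) for a finite asymptote")
    if tol <= 0:
        raise ValueError("tol must be positive")
    asymptote = level + trend * phi / (1.0 - phi)
    gap_at_zero = abs(trend) * phi / (1.0 - phi)  # gap before any forecast step
    if gap_at_zero <= tol:
        return asymptote, 0
    # Smallest integer h with gap_at_zero * phi**h <= tol.
    h = math.ceil(math.log(tol / gap_at_zero) / math.log(phi))
    return asymptote, h
```

For example, a level of 100 with a trend of 5 and φ = 0.8 plateaus at 120, and the forecast comes within 1 unit of it at month 14.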

Fit for our model:
- ✅ AutoCES already estimates a level component that serves as an asymptote estimate; it is currently unused for the long-horizon floor at processing.py:2100. That level could replace historical_min as the floor target. Read it from the fitted model object rather than assuming an AutoCES.level_ attribute, since in statsforecast `level` is the prediction-interval argument of predict/forecast.
- ✅ Pairs naturally with the seasonal alt-gate at processing.py:2054: once the model passes shape validation, the level asymptote becomes the trustworthy long-horizon anchor.
- ⚠ "Asymptote" needs a definition per source: for Holt damped it is level + trend × φ/(1 − φ); for BSTS it is the level posterior mean. Document which you mean.
- ⚠ For genuinely declining keywords, an asymptote ≥ 0 may over-predict; combine with TSB probability decay if obsolescence is plausible.
- 🔧 No new library: inspect the model output of the existing AutoCES / AutoETS calls at processing.py:1984 and use the level estimate as the floor anchor.

title: Drift smoothing (random walk with drift, dampened)
tags: [dampening, drift, baseline, comparison]
applies_to: [tier_2, tier_3]
data_needs: "≥6 observations (drift requires at least 2; dampened variant benefits from more)"
status: candidate

Drift smoothing (random walk with drift, dampened)

Source: Hyndman & Athanasopoulos, Forecasting: Principles and Practice (3rd ed.), §5.2 "Some simple forecasting methods"
Link: https://otexts.com/fpp3/simple-methods.html
Retrieved: 2026-05-15

What it is: The drift method forecasts the last observed value plus a slope equal to the average per-period change in history (drift = (y_T − y_1) / (T − 1)); the dampened variant multiplies the drift contribution by a φ-style damping so the drift's effect decays with horizon. Sits between naive (no drift) and damped Holt (smoothed level and trend) — useful as the benchmark for any horizon-dampening change.

When to use:
- As the benchmark to compare any candidate dampening policy against: drift + damping is the simplest principled alternative to a hardcoded 0.85^h floor.
- For very short series where Holt damped is over-parameterized.

Fit for our model:
- ✅ Useful comparison point for the current 0.85^h × historical_min formula at processing.py:2100: log dampened drift alongside the floor and audit which is closer to held-out months.
- ✅ Cheap: statsforecast.models.RandomWalkWithDrift is a one-liner; the dampened version is a 10-line custom model.
- ⚠ Not a serious candidate for the production path; anything in the ensemble at processing.py:1984 should already beat it.
- ⚠ For seasonal keywords, sNaive (see naive baselines) is a better baseline than drift.
- 🔧 from statsforecast.models import RandomWalkWithDrift. For the dampened variant, write custom Python that damps the k-th drift increment by φ^k.
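The dampened variant is short enough to write by hand. The plain-Python sketch below assumes the damping is applied to each successive drift increment, following the same geometric pattern as the damped trend, so φ = 1 recovers the plain drift method. The function is illustrative and is not a statsforecast model class.

```python
from typing import List, Sequence


def damped_drift_forecast(y: Sequence[float], h: int, phi: float = 1.0) -> List[float]:
    """Random walk with drift, with the k-th drift increment damped by phi**k.

    phi = 1 gives the standard drift method: y_T + k * drift.
    0 < phi < 1 gives y_T + drift * (phi + ... + phi**k), which levels off
    at y_T + drift * phi / (1 - phi).
    """
    if len(y) < 2:
        raise ValueError("drift needs at least two observations")
    drift = (y[-1] - y[0]) / (len(y) - 1)  # average per-period change
    forecasts: List[float] = []
    cumulative = 0.0
    for step in range(1, h + 1):
        cumulative += phi ** step
        forecasts.append(y[-1] + drift * cumulative)
    return forecasts
```

Logging `damped_drift_forecast(history, h, phi=0.85)` next to the current floor gives the side-by-side audit described above.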