Change Points and Spikes

Methods for detecting structural breaks, level shifts, and transient spikes — relevant to GT collapses (processing.py:1775-1781), AI-mode launches that suddenly shift traffic, and distinguishing permanent level shifts post-Google-update from one-off viral spikes.

title: Bayesian online change-point detection (BOCPD) tags: [change-points, online, bayesian, structural-breaks] applies_to: [tier_2, tier_3] data_needs: "Sequential observations; assumes exponential-family likelihood with conjugate prior (e.g., Normal-Gamma)." status: candidate

Bayesian online change-point detection (BOCPD)

Source: Adams, R. P. & MacKay, D. J. C., "Bayesian Online Changepoint Detection" (2007) Link: https://arxiv.org/abs/0710.3742 Retrieved: 2026-05-15

What it is: A streaming Bayesian algorithm that maintains, at every time step, a posterior distribution over the run length: how many steps have passed since the last change point. The run-length posterior is updated recursively as each new observation arrives by combining a hazard prior (the probability of a change at each step) with the predictive likelihood of the new point under each candidate run. A change point is signalled when the posterior mass shifts abruptly from long run lengths to run lengths near 0; in practice this is read off as a drop in the most probable run length or as the mass on short runs crossing a threshold. Conjugate priors (e.g., Normal-Gamma for unknown mean and variance) make the updates closed-form.
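The run-length recursion can be sketched in a few lines of plain Python. This is a minimal illustration, not any library's implementation: it assumes Gaussian observations with known variance and a Normal prior on each segment's mean (simpler than the Normal-Gamma case), and the function names, the constant hazard of 0.02, and the prior values are all hypothetical choices.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bocpd_map_run_lengths(xs, hazard=0.02, mu0=0.0, var0=25.0, obs_var=1.0):
    """Return the most probable run length after each observation.

    Assumed model (illustrative only): Gaussian observations with known
    variance obs_var and a N(mu0, var0) prior on each segment's mean.
    """
    run_probs = [1.0]   # run_probs[r] = P(run length == r | data so far)
    means = [mu0]       # posterior mean of the segment mean, per run length
    variances = [var0]  # posterior variance of the segment mean, per run length
    map_run_lengths = []
    for x in xs:
        # Predictive likelihood of x under each candidate run.
        pred = [normal_pdf(x, m, v + obs_var) for m, v in zip(means, variances)]
        weighted = [p * q for p, q in zip(run_probs, pred)]
        # Either every run grows by one step, or a change resets it to length 0.
        growth = [w * (1.0 - hazard) for w in weighted]
        change = hazard * sum(weighted)
        joint = [change] + growth
        total = sum(joint)
        run_probs = [j / total for j in joint]
        # Conjugate update; the new length-0 run starts again from the prior.
        new_means, new_vars = [mu0], [var0]
        for m, v in zip(means, variances):
            post_var = 1.0 / (1.0 / v + 1.0 / obs_var)
            new_means.append(post_var * (m / v + x / obs_var))
            new_vars.append(post_var)
        means, variances = new_means, new_vars
        map_run_lengths.append(max(range(len(run_probs)), key=run_probs.__getitem__))
    return map_run_lengths


# Synthetic series: 20 points near 0, then 20 points near 10.
series = [0.1 * (-1) ** i for i in range(20)] + [10.0 + 0.1 * (-1) ** i for i in range(20)]
run_lengths = bocpd_map_run_lengths(series)
```

On this synthetic series the most probable run length climbs steadily, drops to 1 at the first observation of the new regime, and then climbs again. Note that with a constant hazard the mass on run length exactly 0 always equals the hazard, which is why the sketch tracks the most probable run length rather than that single entry.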

When to use:
- Online / streaming settings where you must decide now if a change just happened.
- You want a calibrated probability of change rather than a binary flag.
- Series with locally stationary segments and a roughly known noise model.

Fit for our model:
- ✅ Could replace the ad-hoc GT clipping at processing.py:1775-1781 with a principled "did a structural break just happen?" probability; downstream we could weight GT lower for months after a high-confidence change.
- ✅ Run-length posterior is a natural input to compute_confidence_score() (processing.py:1041) for a "structural stability" subscore.
- ⚠ We run in batch (per-shard, per-month), not streaming, so most of the online benefit is lost. Use the offline batch form, or just iterate it across the history.
- ⚠ Conjugate-prior version assumes Gaussian-like residuals; GT-on-[0,100] and zero-heavy GSC need a non-Gaussian or log-transformed treatment.
- 🔧 Python libs: bayesian_changepoint_detection (hildensia), bocd (BayesOnBikes), or Meta's kats.detectors.bocpd. Kats is closest to production-grade.

title: PELT (Pruned Exact Linear Time) tags: [change-points, offline, exact, multiple-changepoints] applies_to: [tier_2, tier_3] data_needs: "Full historical series available offline; cost function (l1, l2, rbf) appropriate to the noise." status: candidate

PELT (Pruned Exact Linear Time)

Source: Killick, R., Fearnhead, P. & Eckley, I. A., "Optimal Detection of Changepoints with a Linear Computational Cost" (JASA, 2012) Link: https://arxiv.org/abs/1101.1438 Retrieved: 2026-05-15

What it is: An offline algorithm that finds the segmentation minimising the sum of segment costs plus penalty × number of change points. PELT uses dynamic programming with a pruning step that discards candidate change points which can never become optimal, giving exact (not approximate) results. The expected cost is linear, O(n), when the number of change points grows with the series length; the worst case is O(n²). The penalty parameter trades off fit against the number of breaks; common choices are a BIC-style penalty proportional to log n or a tuned constant.
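The dynamic programme and its pruning step can be sketched without any dependency. This is a minimal illustration under stated assumptions: an l2 (sum of squared deviations from the segment mean) cost, no minimum segment length, and a hypothetical function name `pelt_l2`; it is not the ruptures implementation.

```python
def pelt_l2(xs, pen):
    """Exact penalised segmentation with an l2 cost (illustrative sketch).

    Returns breakpoints as segment end indices, the last one being len(xs).
    """
    n = len(xs)
    # Prefix sums give any segment's cost in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(a, b):
        # Sum of squared deviations from the mean over xs[a:b].
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)  # best[t] = optimal penalised cost of xs[:t]
    best[0] = -pen          # so that the first segment is not penalised
    last = [0] * (n + 1)    # last[t] = start of the final segment in that optimum
    candidates = [0]
    for t in range(1, n + 1):
        scores = [best[s] + cost(s, t) + pen for s in candidates]
        i = min(range(len(scores)), key=scores.__getitem__)
        best[t] = scores[i]
        last[t] = candidates[i]
        # Pruning: a start s with best[s] + cost(s, t) > best[t] can never
        # be optimal for any later t, so it is dropped.
        candidates = [s for s, sc in zip(candidates, scores) if sc - pen <= best[t]]
        candidates.append(t)

    # Backtrack through the stored segment starts.
    bkps, t = [], n
    while t > 0:
        bkps.append(t)
        t = last[t]
    return sorted(bkps)


steps = [0.0] * 10 + [5.0] * 10 + [1.0] * 10
print(pelt_l2(steps, pen=2.0))  # [10, 20, 30]
```

Raising `pen` removes the weaker breaks first, which is the fit-versus-parsimony trade-off described above.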

When to use:
- Offline batch processing of a full series (matches our pipeline).
- You want the exact optimal segmentation, not a greedy approximation.
- Multiple change points expected: PELT handles them jointly rather than recursively.

Fit for our model:
- ✅ Natural fit for processing.py:1654-1735 quality gates: segment the GT series, then evaluate each segment for the existing "isolated peak" / "zero-heavy 2014-2018" criteria rather than applying global heuristics.
- ✅ Could replace the GT collapse hack at processing.py:1775-1781: if the final segment is a long zero-flat segment after a high-amplitude one, classify as collapse rather than averaging.
- ✅ Could feed into _is_spiky_series (processing.py:1180): a segment with a single short high-variance break is a spike, not a regime.
- ⚠ Penalty tuning is crucial: under-penalize and you over-segment; the BIC default is reasonable but worth grid-searching on a labelled set.
- 🔧 Python: ruptures.Pelt(model="l2").fit(series).predict(pen=…). Use model="rbf" for non-parametric robustness.

title: Binary segmentation tags: [change-points, offline, approximate, fast] applies_to: [tier_2, tier_3] data_needs: "Full historical series; same cost-function choices as PELT." status: candidate

Binary segmentation

Source: Classical recursive splitting (Scott & Knott 1974; popularised by Sen & Srivastava 1975). Implementation reference: ruptures docs. Link: https://centre-borelli.github.io/ruptures-docs/user-guide/detection/binseg/ Retrieved: 2026-05-15

What it is: A greedy recursive algorithm. Find the single best change point in the whole series (the split that minimises total cost), then recurse on the two resulting segments. Stop when a penalty/threshold says additional splits aren't worth it. Much faster than PELT, at O(n log n), but only an approximation to the optimal segmentation, because the first split is chosen without knowing the later ones (and a poor early split contaminates downstream segments).
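The greedy recursion is short enough to write out directly. The sketch below assumes an l2 cost and stops splitting when the cost reduction of the best split does not exceed a penalty; the name `binseg_l2` and the `min_size` default are hypothetical, and library implementations may order the splits differently.

```python
def binseg_l2(xs, pen, min_size=2):
    """Greedy binary segmentation with an l2 cost (illustrative sketch).

    Returns breakpoints as segment end indices, the last one being len(xs).
    """
    n = len(xs)
    # Prefix sums give any segment's cost in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(a, b):
        # Sum of squared deviations from the mean over xs[a:b].
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    def split(lo, hi):
        # Too short to hold two segments of at least min_size points.
        if hi - lo < 2 * min_size:
            return []
        # Best single break inside xs[lo:hi].
        k = min(range(lo + min_size, hi - min_size + 1),
                key=lambda j: cost(lo, j) + cost(j, hi))
        gain = cost(lo, hi) - cost(lo, k) - cost(k, hi)
        if gain <= pen:
            return []  # the split is not worth the penalty
        # Recurse on the two resulting segments.
        return split(lo, k) + [k] + split(k, hi)

    return split(0, n) + [n]


steps = [0.0] * 10 + [5.0] * 10 + [1.0] * 10
print(binseg_l2(steps, pen=2.0))  # [10, 20, 30]
```

Each split is chosen using only the data inside its own segment, which is where the approximation comes from: a first split placed slightly wrong is never revisited.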

When to use:
- You need fast, "good enough" segmentation across millions of series.
- Initial exploration / heuristic baseline before investing in PELT.
- The signal is dominated by a single large break.

Fit for our model:
- ✅ Cheap to run inside the per-shard loop in processing.py: fast enough to add to every keyword before the existing GT quality gates (processing.py:1654-1735).
- ✅ Wild Binary Segmentation (WBS) variant is more robust to multiple breaks and similarly fast.
- ⚠ Greedy nature can miss multiple closely-spaced breaks; if you need them all, use PELT instead.
- ⚠ Approximate, so don't use it as the only signal for confidence-affecting decisions; pair it with a stability check.
- 🔧 Python: ruptures.Binseg(model="l2").fit(series).predict(n_bkps=…); supply either n_bkps or pen.

title: Prophet automatic changepoint detection tags: [change-points, prophet, sparse-prior, trend-changes] applies_to: [tier_2, tier_3] data_needs: "≥ ~24 months for the sparse prior to identify changes reliably; daily/weekly/monthly all work." status: candidate

Prophet automatic changepoint detection

Source: Taylor, S. J. & Letham, B., "Forecasting at Scale" (2018) — describes Prophet's changepoint mechanism. Link: https://facebook.github.io/prophet/docs/trend_changepoints.html Retrieved: 2026-05-15

What it is: Prophet specifies a large set of potential changepoints (default: 25 uniformly placed in the first 80% of the history) and places a Laplace (double-exponential) prior on the slope changes at each. The Laplace prior is equivalent to L1 regularization, which is sparsity-inducing: most slope changes shrink to zero and only a handful are kept. The changepoint_prior_scale parameter (default 0.05) tunes flexibility — large values admit more changes (risk: overfitting), small values keep the trend smoother.
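For reference, the piecewise-linear trend and its sparse prior can be written out as follows, using the notation of the Taylor & Letham paper: s_j are the candidate changepoint times, δ_j the slope change at s_j, k and m the base slope and offset, and τ the prior scale, which corresponds to changepoint_prior_scale.

```latex
g(t) = \bigl(k + \mathbf{a}(t)^{\top}\boldsymbol{\delta}\bigr)\,t
     + \bigl(m + \mathbf{a}(t)^{\top}\boldsymbol{\gamma}\bigr),
\qquad
a_j(t) =
\begin{cases}
1, & t \ge s_j \\
0, & \text{otherwise}
\end{cases},
\qquad
\gamma_j = -s_j\,\delta_j,
\qquad
\delta_j \sim \operatorname{Laplace}(0, \tau)
```

The offsets γ_j keep the trend continuous at each changepoint. The negative log-density of the Laplace prior is |δ_j|/τ plus a constant, so the MAP fit carries an L1 penalty of weight 1/τ on the slope changes: a smaller τ means a heavier penalty and fewer admitted changepoints.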

When to use:
- You already use Prophet (or are open to adding it) and want changepoint detection "for free" inside the same forecast.
- You want a smooth piecewise-linear trend that adapts to gradual regime shifts rather than declaring discrete breaks.
- The series has enough history (>1-2 years) for the sparse prior to localise changes.

Fit for our model:
- ✅ The sparse-prior framing is a clean replacement for the GT clipping rule at processing.py:1775-1781: the absence of recent admitted changepoints means "no regime shift, treat last month as noise".
- ✅ changepoint_prior_scale is one tuneable knob; we could pick it per-tier or via cross-validation on the held-out year already used at processing.py:1984 (StatsForecast ensemble).
- ⚠ Adds Prophet as a dependency alongside StatsForecast; coexistence is fine but increases CI / install surface.
- ⚠ Prophet trends are piecewise-linear, which doesn't model multiplicative growth as naturally as ETS/AutoCES; pair it with seasonality_mode="multiplicative" if used.
- 🔧 Python: prophet.Prophet(changepoint_prior_scale=0.05).fit(df); introspect m.changepoints and m.params['delta'] for the admitted slope changes.

title: CUSUM (cumulative sum) tags: [change-points, online, mean-shift, simple] applies_to: [tier_1, tier_2, tier_3] data_needs: "An estimate of the in-control mean and standard deviation; works on any scalar series." status: candidate

CUSUM (cumulative sum)

Source: Page, E. S., "Continuous Inspection Schemes" (Biometrika, 1954) Link: https://en.wikipedia.org/wiki/CUSUM Retrieved: 2026-05-15

What it is: A classic sequential test for a shift in the mean of a process. Maintain two running sums, one accumulating positive deviations above a target (S+) and one accumulating negative deviations below it (S-), and clamp each at zero whenever it would go negative. When either sum exceeds a control limit h, declare a change. Two knobs: the slack k (half the size of the smallest shift you want to detect, in standard deviations) and the limit h. Simple, transparent, easy to audit.
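A two-sided tabular CUSUM fits in a short loop. This is a minimal sketch: the function name, the defaults k=0.5 and h=5.0 (in standard-deviation units), and the choice to restart both sums after an alarm are illustrative assumptions, not a prescribed configuration.

```python
def cusum_alarms(xs, target, sigma, k=0.5, h=5.0):
    """Two-sided tabular CUSUM; returns the indices at which an alarm fires.

    target and sigma are the assumed in-control mean and standard deviation.
    """
    s_hi = 0.0  # accumulates deviations above target (S+)
    s_lo = 0.0  # accumulates deviations below target (S-)
    alarms = []
    for i, x in enumerate(xs):
        z = (x - target) / sigma
        # Subtract the slack k and clamp at zero so small noise never accumulates.
        s_hi = max(0.0, s_hi + z - k)
        s_lo = max(0.0, s_lo - z - k)
        if s_hi > h or s_lo > h:
            alarms.append(i)
            s_hi = s_lo = 0.0  # restart monitoring after the alarm
    return alarms


shifted = [0.0] * 10 + [2.0] * 10
print(cusum_alarms(shifted, target=0.0, sigma=1.0))  # [13, 17]
```

In the example the mean shifts by two standard deviations at index 10, so S+ grows by 1.5 per step and first crosses h = 5 on the fourth shifted point (index 13). That delay is the price of the slack k, and it shrinks as the shift gets larger.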

When to use:
- Cheap online monitoring at scale: works in O(n) with a tiny constant.
- The series is scalar and the change you care about is a mean shift (not variance or shape change).
- You want a result you can hand-debug: every alarm has a clear (S, h) story.

Fit for our model:
- ✅ Excellent for monitoring our blended monthly volume series (processing.py:1763) for sudden drops, e.g., to flag AI-mode launches or GSC outages after the fact.
- ✅ Pairs well with _is_spiky_series (processing.py:1180): CUSUM-detected upward break + quick reset = transient spike; sustained = level shift (see Level shift vs transient spike classification).
- ⚠ Needs reasonable in-control mean/std; on highly seasonal series, run CUSUM on deseasonalised residuals (STL.resid) instead of the raw level.
- ⚠ Sensitive to outliers: a single spike can trip the alarm.
- 🔧 Python: kats.detectors.cusum_detection.CUSUMDetector, or a 20-line custom loop. Statsmodels offers CUSUM-of-recursive-residuals for regression stability rather than series shift.

title: Level shift vs transient spike classification tags: [change-points, post-processing, classification] applies_to: [tier_2, tier_3] data_needs: "A detected change point + at least k months of post-shift observations (k ≥ 3 recommended)." status: candidate

Level shift vs transient spike classification

Source: Standard post-processing after change-point detection; STL residual analysis. See Cleveland et al., "STL: A seasonal-trend decomposition procedure based on loess" (1990) and Hyndman & Athanasopoulos, "Forecasting: Principles and Practice" §3.4. Link: https://otexts.com/fpp3/decomposition.html Retrieved: 2026-05-15

What it is: Once a change point is flagged (by PELT, BOCPD, CUSUM, etc.), classify it as:
- Level shift: the new mean persists for k or more months after the break (a regime change: algorithm update, AI-mode launch, business model change).
- Transient spike: the deviation reverts in fewer than k months (viral event, one-off promotion, news cycle).

A simple rule: compute STL residuals, then measure the ratio |mean(post-break residuals)| / median(|pre-break residuals|). If the ratio stays above a chosen threshold for k or more months it's a level shift; otherwise it's a spike. More principled: fit a piecewise model with and without a permanent step term and pick by AIC/BIC.
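The simple rule can be sketched as a small classifier. Several things here are assumptions rather than part of the rule as stated: the function name, the defaults k=3 and threshold=3.0, checking each post-break month individually (so that one huge month cannot pull a three-month mean over the threshold), and requiring all post-break deviations to sit on the same side. The input is assumed to be deviations from the pre-break baseline, for example a deseasonalised series minus its pre-break level; a trend fitted across the break would absorb part of a real level shift.

```python
from statistics import median


def classify_break(deviations, break_idx, k=3, threshold=3.0):
    """Label a flagged break as 'level_shift', 'transient_spike' or 'tentative'.

    deviations: deviations from the pre-break baseline, one per month.
    break_idx:  index of the first post-break month.
    """
    pre = deviations[:break_idx]
    post = deviations[break_idx:break_idx + k]
    if not pre:
        raise ValueError("need at least one pre-break observation")
    if len(post) < k:
        return "tentative"  # not enough post-break months to decide yet
    # Typical size of pre-break noise; guard against an all-zero history.
    scale = median(abs(d) for d in pre) or 1e-9
    # Every one of the k post-break months must stay far from the baseline...
    large = all(abs(d) / scale > threshold for d in post)
    # ...and on the same side as the first post-break month.
    same_side = all(d * post[0] > 0 for d in post)
    return "level_shift" if large and same_side else "transient_spike"


noise = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(classify_break(noise + [10.0, 9.0, 11.0, 10.0], break_idx=6))
print(classify_break(noise + [10.0, 1.0, -1.0, 0.0], break_idx=6))
print(classify_break(noise + [10.0, 9.0], break_idx=6))
```

The three calls illustrate a persistent shift, a one-month spike, and a break that is too recent to call.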

When to use:
- You already have candidate change points and need to decide how to handle them.
- The downstream treatment is different: level shifts should reset baselines, spikes should be masked from training data.

Fit for our model:
- ✅ Directly addresses what the clipping rule at processing.py:1775-1781 is trying to do: distinguish "GT collapsed to zero permanently" (level shift → trust the zero) from "GT had a spike, then back to noise" (spike → ignore the spike).
- ✅ Should feed into the GT quality gates at processing.py:1654-1735: a "post-shift" segment should not be evaluated against the same isolated-peak criteria as a clean stationary one.
- ✅ Pairs naturally with the prior-trend shadow / frozen months logic at processing.py:2122: if a level shift is detected, don't shadow the prior trend across it.
- ⚠ Needs at least 3-6 months of post-shift data to be confident; recent breaks are ambiguous (call them "tentative").
- 🔧 Python: statsmodels.tsa.seasonal.STL for residuals; custom logic for the classification. sktime's Detrender and STLTransformer cover the residual pipeline.

title: E-Divisive (nonparametric multiple changepoint) tags: [change-points, offline, nonparametric, multivariate] applies_to: [tier_2, tier_3] data_needs: "Univariate or multivariate series; no distributional assumptions." status: candidate

E-Divisive (nonparametric multiple changepoint)

Source: Matteson, D. S. & James, N. A., "A nonparametric approach for multiple change point analysis of multivariate data" (JASA, 2014) Link: https://arxiv.org/abs/1306.4933 Retrieved: 2026-05-15

What it is: A hierarchical divisive algorithm that recursively splits the series by maximising an energy statistic — a U-statistic measuring how different two empirical distributions are, with no distributional assumptions. At each step it finds the best single break (any difference in the full distribution, not just the mean), confirms its significance via a permutation test, and recurses. Computational cost O(k·T²) for k breaks and T observations; slower than PELT but assumption-free.
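The core of one divisive step, finding the single split that maximises the scaled energy statistic, can be sketched as below. This is a simplified illustration for a univariate series: it scans one split of the whole series, uses a brute-force O(T²) evaluation per candidate, and omits the permutation test and the recursion. The function names and the `min_size` default are hypothetical.

```python
from itertools import combinations


def mean_between(a, b, alpha=1.0):
    """Average |x - y|^alpha over all pairs with x in a and y in b."""
    return sum(abs(x - y) ** alpha for x in a for y in b) / (len(a) * len(b))


def mean_within(a, alpha=1.0):
    """Average |x - y|^alpha over all distinct pairs inside a (U-statistic form)."""
    pairs = list(combinations(a, 2))
    return sum(abs(x - y) ** alpha for x, y in pairs) / len(pairs)


def energy_divergence(a, b, alpha=1.0):
    """Sample divergence: large when the two samples look differently distributed."""
    return 2.0 * mean_between(a, b, alpha) - mean_within(a, alpha) - mean_within(b, alpha)


def best_energy_split(xs, min_size=3, alpha=1.0):
    """Return (k, q): the split index maximising the scaled divergence, and its value.

    min_size must be at least 2 so that each side has a within-sample pair.
    """
    n = len(xs)
    best_k, best_q = None, float("-inf")
    for k in range(min_size, n - min_size + 1):
        a, b = xs[:k], xs[k:]
        # Scale by mn / (m + n) so splits of different balance are comparable.
        q = len(a) * len(b) / n * energy_divergence(a, b, alpha)
        if q > best_q:
            best_k, best_q = k, q
    return best_k, best_q


sample = [0.0, 0.1, -0.1, 0.05, -0.05, 0.0, 5.0, 5.1, 4.9, 5.05, 4.95, 5.0]
split_index, score = best_energy_split(sample)
```

In the full method, the score of the best split would then be compared against scores obtained from permuted copies of the series to decide whether the break is significant.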

When to use:
- You don't want to commit to a noise model (Gaussian, Poisson…); important for our heterogeneous mix of GT/GSC/GKP.
- The change you care about is in the full distribution, not just the mean, e.g., variance shifts or distribution-shape changes.
- Multivariate series: you can run it on (GT, GSC, JS, GKP) jointly to detect joint regime shifts.

Fit for our model:
- ✅ Multivariate variant is interesting for the multi-source blend at processing.py:1763: a joint change across two sources is strong evidence of a real regime shift; a change in only GT is more likely a data artifact.
- ✅ Nonparametric strength is useful for processing.py:1654-1735 GT quality gates, where the series shape varies wildly.
- ⚠ Slower than PELT (O(T²) per break) and likely too slow per-keyword across 200 shards without optimisation; use it as a validation method on a sample, not for production scoring.
- ⚠ Permutation-test threshold needs calibration; the default α=0.05 can be aggressive on long series.
- 🔧 R: ecp::e.divisive() is the canonical implementation. Python: there's no first-class equivalent; ruptures provides a KernelCPD with RBF kernel which is conceptually close and offline.

title: Bayesian Online Changepoint Detection with Gaussian Process priors tags: [change-points, online, bayesian, gaussian-process, non-stationary] applies_to: [tier_3] data_needs: "Sufficient data per regime to fit a GP (~30+ points); typically slow." status: candidate

Bayesian Online Changepoint Detection with Gaussian Process priors

Source: Saatçi, Y., Turner, R. & Rasmussen, C. E., "Gaussian Process Change Point Models" (ICML, 2010) Link: https://www.gatsby.ucl.ac.uk/~turner/Publications/SaatciTurnerRasmussen2010.pdf Retrieved: 2026-05-15

What it is: An extension of BOCPD that replaces the i.i.d.-within-segment assumption with a Gaussian Process prior. Within each segment the GP models temporal correlation (so the segment can have its own trend / seasonality / autocorrelation), and the same run-length posterior tracks change points between segments. Solves a key limitation of vanilla BOCPD: standard BOCPD assumes observations within a segment are independent, which fails for any real time series with structure.

When to use:
- You have a non-stationary series where each regime itself has trend/seasonality, which is exactly our case for popular keywords with autocorrelated month-to-month dynamics.
- You can afford the GP fit cost (cubic in segment length without approximations).

Fit for our model:
- ✅ Conceptually a perfect match for high-value, long-history tier-3 keywords where the GT series has real structure (not white noise) and we still need to detect regime breaks (e.g., post-Google-update).
- ⚠ Computationally heavy: O(n³) per segment without sparse approximations; not realistic for the full 200-shard sweep but feasible for top-N keywords or as a validation tool.
- ⚠ More machinery than the team is used to; harder to debug than PELT or CUSUM.
- ⚠ Recent improvements (Sellier et al. 2023, Hilbert-space approximate Student-t process) make this tractable but are research-grade.
- 🔧 Python: no off-the-shelf package; build on top of GPflow or GPyTorch. For a lightweight first try, use vanilla BOCPD from bayesian_changepoint_detection and only escalate to the GP variant where simpler models clearly fail.