
Recovery window gate (cross-source upstream masking)

One of the four conditions that gate the cross-source upstream bot masking path. It distinguishes a bot incursion from a real regime change: bots are transient, so the traffic spikes and then drops back to baseline within a few months. Real organic growth doesn't collapse on that timescale.

Combined with the bot-like threshold, this defines the precise shape we mask: an extreme peak (≥ 500× neighbors) that also recovers (≤ 0.2× the peak within 4 months).

What it is

def _has_recovery(run):
    vals = run['_vals']
    peak_idx, peak_val = run['peak_idx'], run['peak_value']
    for k in range(1, recovery_window + 1):     # k=1..4
        idx = peak_idx + k
        if idx >= len(vals):
            return False
        if vals[idx] <= recovery_ratio * peak_val:   # recovery_ratio = 0.2
            return True
    return False

For each source run, the function looks at the 4 months immediately following the peak. The run "shows recovery" if any of those months drops to ≤ 20% of the peak value. If no month in the window drops that far, or the series ends before one does, the run shows no recovery. _is_bot_like may still be true for that run, but the mask doesn't fire unless at least one participating run shows recovery.

How it's computed

Defined at processing.py:KB-ANCHOR:recovery-window-gate, as a nested function inside detect_transient_cross_source_spikes. Its parameters come from detect_transient_cross_source_spikes()'s defaults: recovery_window=4, recovery_ratio=0.2.

The full cross-source mask predicate (lines 670–683):

  1. ≥ 2 sources have spike runs overlapping the same months
  2. None of the participating runs is seasonal (same calendar month elevated in prior years)
  3. At least one participating run shows recovery (this gate)
  4. At least one participating run is bot-like (bot-like threshold)

All four must hold.
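
As a rough illustration of how the four conditions combine, here is a minimal sketch. It is not the code at lines 670–683: the RunFlags container, the should_mask name, and the source labels are hypothetical, and each gate is assumed to have already been evaluated per run. Read literally, conditions 3 and 4 may be satisfied by different runs, and the sketch follows that reading.

```python
from dataclasses import dataclass


@dataclass
class RunFlags:
    """Per-run gate results (hypothetical container, not from processing.py)."""
    source: str
    is_seasonal: bool
    has_recovery: bool
    is_bot_like: bool


def should_mask(overlapping_runs):
    """Return True only when all four cross-source conditions hold.

    `overlapping_runs` is assumed to hold spike runs that already overlap
    the same months; the overlap detection itself is not shown here.
    """
    sources = {run.source for run in overlapping_runs}
    return (
        len(sources) >= 2                                        # 1. two or more sources
        and not any(r.is_seasonal for r in overlapping_runs)     # 2. no seasonal run
        and any(r.has_recovery for r in overlapping_runs)        # 3. recovery gate
        and any(r.is_bot_like for r in overlapping_runs)         # 4. bot-like threshold
    )


# Illustrative inputs only.
transient = [
    RunFlags("source_a", is_seasonal=False, has_recovery=True, is_bot_like=True),
    RunFlags("source_b", is_seasonal=False, has_recovery=False, is_bot_like=False),
]
no_recovery = [
    RunFlags("source_a", is_seasonal=False, has_recovery=False, is_bot_like=True),
    RunFlags("source_b", is_seasonal=False, has_recovery=False, is_bot_like=False),
]
print(should_mask(transient), should_mask(no_recovery))
```

In this sketch the first group passes all four conditions, while the second fails only the recovery gate, which is enough to leave the values unmasked.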

Why this choice

Bot incursions are transient by definition. Real organic growth doesn't drop to 20% of peak within 4 months — that pattern is characteristic of scripted traffic that runs for a brief period and then stops (script gets killed, IP gets blocked, attacker moves on). The 4-month window is short enough to distinguish bots (transient) from sustained surges (real organic), and 0.2× is the "clearly back to baseline or below" threshold.

If we required every month in the 4-month window to sit at or below the threshold, slow-decaying spikes would slip through unmasked. The "any month within the window" check (for k in range(1, recovery_window + 1)) is permissive in the right direction: a sharp one-month drop counts, and so does a gradual decline that first hits 0.2× at month 4.
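
To make the difference concrete, the sketch below compares the "any month" rule with a hypothetical stricter "every month" rule on three made-up post-peak series. The helper names and the numbers are illustrative assumptions, not taken from processing.py, and each series is assumed to cover the complete 4-month window.

```python
RECOVERY_RATIO = 0.2  # same value as the recovery_ratio default


def recovers_any(post_peak, peak_val, ratio=RECOVERY_RATIO):
    """Rule described above: one month at or below ratio * peak is enough."""
    return any(v <= ratio * peak_val for v in post_peak)


def recovers_all(post_peak, peak_val, ratio=RECOVERY_RATIO):
    """Hypothetical stricter rule: every month must be at or below the threshold."""
    return all(v <= ratio * peak_val for v in post_peak)


peak = 10_000                                # threshold is 0.2 * 10_000 = 2_000
sharp_drop = [1_500, 1_400, 1_600, 1_500]    # collapses immediately
gradual = [7_000, 4_500, 3_000, 1_900]       # crosses the threshold only at month 4
plateau = [9_500, 9_800, 9_200, 9_600]       # stays near the peak

for name, series in [("sharp_drop", sharp_drop), ("gradual", gradual), ("plateau", plateau)]:
    print(name, recovers_any(series, peak), recovers_all(series, peak))
```

The gradual series is the case that separates the two rules: it counts as recovered under "any month" but would be missed by the stricter variant. The plateau fails both, which is the intended outcome for a sustained surge.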

The asymmetry with bot-like detection matters. The bot-like threshold defers when post-window data is insufficient: its False means "we don't know yet". The recovery gate also returns False when the window can't be observed (the peak is too recent), but it treats that as a plain "no recovery" rather than a deferral. That makes sense, because we can't yet conclude transience either way, and the bot-like gate's deferral will mean the mask doesn't fire regardless.

Edge cases

  • Peak is in the most recent month: peak_idx + 1 >= len(vals), so the loop returns False on its first iteration. The spike is too recent to assess; masking is deferred until the next iteration.
  • Multiple short spikes in a row: each is its own "run" with its own peak, and each is evaluated independently. A cluster of 1-month spikes with 1-month dips in between can all show recovery, provided each dip is ≤ 0.2× the peak before it.
  • Genuinely persistent regime change: the run's vals stay near peak for ≥ 4 months, so this gate returns False. Because all four conditions must hold, a False here (when no other participating run shows recovery) is enough on its own to keep the cross-source mask from firing, whatever the other three conditions say, and the values stay intact through blending.
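
The edge cases above can be checked against a standalone variant of the gate. This is a sketch under stated assumptions: has_recovery takes its inputs explicitly instead of reading a run dict and enclosing-scope parameters, it assumes the peak value equals vals[peak_idx], and all series are invented for illustration.

```python
def has_recovery(vals, peak_idx, recovery_window=4, recovery_ratio=0.2):
    """Standalone variant of _has_recovery (assumes peak value == vals[peak_idx])."""
    peak_val = vals[peak_idx]
    for k in range(1, recovery_window + 1):
        idx = peak_idx + k
        if idx >= len(vals):
            return False  # series ended before any month recovered
        if vals[idx] <= recovery_ratio * peak_val:
            return True   # first month at or below 0.2x the peak
    return False          # full window observed, no month recovered


# Peak in the most recent month: nothing after it to inspect.
too_recent = has_recovery([100, 100, 100, 50_000], peak_idx=3)

# Cluster of 1-month spikes separated by dips: each peak is checked on its own.
cluster = [100, 50_000, 100, 60_000, 100, 100]
cluster_first = has_recovery(cluster, peak_idx=1)
cluster_second = has_recovery(cluster, peak_idx=3)

# Persistent regime change: values stay near the peak for the whole window.
persistent = has_recovery([100, 52_000, 48_000, 51_000, 49_000, 50_500], peak_idx=1)

# Truncated window: a drop already observed still counts, a missing one does not.
partial_with_drop = has_recovery([100, 50_000, 90], peak_idx=1)
partial_no_drop = has_recovery([100, 50_000, 40_000], peak_idx=1)

print(too_recent, cluster_first, cluster_second, persistent,
      partial_with_drop, partial_no_drop)
```

The last two cases show that a truncated window is not automatically a False: if a qualifying drop has already appeared in the months that do exist, the gate returns True before it runs out of data.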

See also

  • Bot-like threshold — the companion condition (extreme peak ratio)
  • GSC spike factor — the parallel in-blend path
  • _is_seasonal_spike() — the third gate in detect_transient_cross_source_spikes (skips spikes that recur in prior calendar years)