Forecast-divergence refresh signal

Third of the five refresh signals. Fires when the blended trend (the model's view of "what's really happening") drifts too far from raw GSC impressions (the ground-truth recent signal). A divergence means GT/GKP-driven smoothing is moving the published curve away from observed GSC, so refreshing the GT/GKP data can resolve the conflict.

What it is

Signal name: "divergence". Fires when, across the 3 most recent months present in both the blended series and raw GSC, the gap between their averages exceeds 30% of the raw GSC average. The keyword's processed volume must also be at least 100.

How it's computed

At processing.py:KB-ANCHOR:refresh-signal-forecast-divergence:

common_months = sorted(set(combined_volume_data_sorted.keys())
                     & set(impressions_dict_raw.keys()))
if len(common_months) >= 3 and processed_keyword_volume >= 100:
    last_3 = common_months[-3:]
    forecast_avg = mean(combined_volume_data_sorted[m] for m in last_3)
    raw_avg      = mean(impressions_dict_raw[m]        for m in last_3)
    divergence = abs(forecast_avg - raw_avg) / raw_avg
    if divergence > 0.3:
        strength = min(divergence, 5.0)

Side   Component
GT     min(2.0, strength × 3.0)
GKP    min(2.5, strength × 4.0)

GKP gets the larger contribution because GKP is the most-likely source of the smoothing pulling the blend away from GSC — refreshing GKP often resolves the divergence.
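
The excerpt and the component table can be combined into one self-contained function. This is a minimal sketch, not the code in processing.py: the function name `divergence_components`, its argument names, the dict return shape, and the guard against a zero raw average are all assumptions added for illustration. It also assumes month keys sort chronologically (for example "YYYY-MM" strings).

```python
from statistics import mean


def divergence_components(blended, raw_gsc, volume):
    """Return capped GT/GKP components, or None if the signal does not fire.

    Hypothetical helper: the name, arguments and return shape are illustrative.
    blended and raw_gsc map sortable month keys to values.
    """
    # Only months present in both series are compared.
    common_months = sorted(set(blended) & set(raw_gsc))
    if len(common_months) < 3 or volume < 100:
        return None
    last_3 = common_months[-3:]
    forecast_avg = mean(blended[m] for m in last_3)
    raw_avg = mean(raw_gsc[m] for m in last_3)
    if raw_avg == 0:
        # Assumption: skip instead of dividing by zero. The source excerpt
        # does not show how this case is handled.
        return None
    divergence = abs(forecast_avg - raw_avg) / raw_avg
    if divergence <= 0.3:
        return None
    strength = min(divergence, 5.0)
    return {
        "divergence": divergence,
        "gt": min(2.0, strength * 3.0),
        "gkp": min(2.5, strength * 4.0),
    }


# Made-up data: the blend sits 40% below raw GSC for three months.
blended = {"2024-03": 600, "2024-04": 600, "2024-05": 600}
raw_gsc = {"2024-03": 1000, "2024-04": 1000, "2024-05": 1000}
result = divergence_components(blended, raw_gsc, volume=1000)
print(result)
```

With this input the divergence is 0.4, which clears the 30% threshold, so both components are non-zero and the GKP component is the larger of the two.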

Why this choice

Empirically, 30% is the noise floor. Below that, blended-vs-raw GSC differences are dominated by GT smoothing of normal month-to-month noise rather than a meaningful disagreement. Above 30%, the gap usually traces to one specific cause: stale GKP holding the blend down on a real surge, a bot spike that GT correctly suppressed, or a seasonal turn the model lagged.

The volume ≥ 100 floor is stricter than the surge signal's ≥ 50 floor: relative-divergence ratios are even more volatile on small-volume keywords than relative-change ratios, so a higher floor avoids garbage-triggered refreshes.
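
To illustrate why small denominators make the ratio jumpy, the sketch below applies the same formula to one absolute gap at two scales. The numbers are made up, and `relative_divergence` is a hypothetical helper that mirrors the excerpt's formula. Note that the real floor is applied to processed_keyword_volume, not to the raw GSC average used here.

```python
def relative_divergence(forecast_avg, raw_avg):
    # Same formula as the excerpt: gap measured relative to raw GSC.
    return abs(forecast_avg - raw_avg) / raw_avg


# A gap of 20 impressions at two very different scales (made-up numbers).
small = relative_divergence(forecast_avg=60, raw_avg=40)
large = relative_divergence(forecast_avg=1020, raw_avg=1000)

print(small > 0.3)  # the tiny keyword crosses the 30% threshold
print(large > 0.3)  # the same gap is negligible at high volume
```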

The asymmetric 3.0× / 4.0× multipliers on GT/GKP components (vs the surge signal's 1.0×) reflect that a confirmed divergence is a higher-quality signal than a raw GSC swing — the swing might be GSC noise, but a divergence after blending means something in the pipeline is contradicting something else.
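
A quick calculation shows where each cap takes over, using only the multipliers and caps documented for this signal. The constant and variable names are illustrative, not taken from processing.py.

```python
GT_MULT, GT_CAP = 3.0, 2.0
GKP_MULT, GKP_CAP = 4.0, 2.5

# Divergence at which each component reaches its cap.
gt_saturation = GT_CAP / GT_MULT     # roughly 0.667
gkp_saturation = GKP_CAP / GKP_MULT  # 0.625

for divergence in (0.31, 0.5, 0.7, 2.0):
    strength = min(divergence, 5.0)
    gt = min(GT_CAP, strength * GT_MULT)
    gkp = min(GKP_CAP, strength * GKP_MULT)
    print(f"divergence={divergence:.2f} gt={gt:.2f} gkp={gkp:.2f}")
```

Both components sit at their caps once divergence reaches roughly 0.67, so the 5.0 ceiling on strength does not change either of these two components as written.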

Edge cases

  • Forecasted-only months: when combined_volume_data_sorted contains forecasted months past the GSC tail, they are excluded from the comparison because the intersection-of-keys filter drops them.
  • Newly added keyword: fewer than 3 overlapping months suppresses the signal even if the relative divergence is huge. Real divergence on a young keyword waits for the 3rd month.
  • Brand keywords with high GT smoothing: high-volume brands often see a 5–10% blended-vs-GSC gap as a steady state because GT compresses brand peaks. The 30% floor leaves that alone.
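
The three edge cases above can be checked with small made-up inputs. This is a sketch under assumptions: the dictionaries and variable names are hypothetical, and month keys are assumed to be "YYYY-MM" strings so that sorting them is chronological.

```python
# Forecasted-only months: the blend extends two months past the GSC tail.
blended = {
    "2024-03": 900, "2024-04": 950, "2024-05": 1000,
    "2024-06": 1100, "2024-07": 1150,  # forecasted, no GSC data yet
}
raw_gsc = {"2024-03": 880, "2024-04": 940, "2024-05": 1010}
common_months = sorted(set(blended) & set(raw_gsc))
print(common_months)  # the two forecasted months are not in the overlap

# Newly added keyword: only two overlapping months, so the length check
# fails no matter how large the gap between the series is.
young_blended = {"2024-04": 500, "2024-05": 520}
young_raw = {"2024-04": 100, "2024-05": 110}
young_common = sorted(set(young_blended) & set(young_raw))
young_eligible = len(young_common) >= 3
print(young_eligible)

# Brand keyword: a steady gap of 8% stays below the 30% threshold.
brand_divergence = abs(920 - 1000) / 1000
brand_fires = brand_divergence > 0.3
print(brand_fires)
```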

See also