processed_volume_trend_meta — confidence subscores

The second of three families inside processed_volume_trend_meta. It holds the 0–100 confidence score shown to customers, plus the four-subscore breakdown behind it.

Keys

Key | Type | Meaning
confidence_score | int 0–100 | The aggregated score shown in the product
confidence_breakdown.coverage | int 0–30 | How many of the 4 sources have data for this keyword, weighted
confidence_breakdown.agreement | int 0–15 | How consistent the sources' magnitudes / directions are
confidence_breakdown.freshness | int 0–25 | How recently each source was refreshed
confidence_breakdown.forecast | int 0–30 | Observed-fraction + normalised SMAPE of the forecast model

The four subscore maxima add up to 100 points, but the headline confidence_score is not a plain sum: the subscores are combined through a weighted geometric mean.
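The ranges in the Keys table can be checked mechanically. The following is a minimal sketch, assuming the meta is a parsed JSON dict in which confidence_breakdown is a nested object; that layout and the helper name validate_confidence_meta are illustrative assumptions, not confirmed pipeline code.

```python
# Maximum points per subscore, copied from the Keys table.
SUBSCORE_MAX = {"coverage": 30, "agreement": 15, "freshness": 25, "forecast": 30}


def _is_int_in_range(value, upper):
    """True when value is a real int (not a bool) within 0..upper inclusive."""
    return isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= upper


def validate_confidence_meta(meta):
    """Return a list of range problems; an empty list means every key is in range.

    Hypothetical helper: assumes confidence_breakdown is a nested dict.
    """
    problems = []
    score = meta.get("confidence_score")
    if not _is_int_in_range(score, 100):
        problems.append(f"confidence_score out of range: {score!r}")
    breakdown = meta.get("confidence_breakdown") or {}
    for name, max_points in SUBSCORE_MAX.items():
        value = breakdown.get(name)
        if not _is_int_in_range(value, max_points):
            problems.append(f"confidence_breakdown.{name} out of range: {value!r}")
    return problems
```

A check like this only confirms that the stored numbers are well-formed; it says nothing about which formula version produced them.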

How it's computed

The computation lives in processing.py's confidence-score helpers (around L891–L965). The score is a weighted geometric mean of the four subscore fractions, with a floor that prevents any single zero subscore from collapsing the total to ≈0.
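To make the effect of the floor concrete, here is a minimal sketch of a floored weighted geometric mean. It is not the code in processing.py: the function name combine_subscores is hypothetical, the default weights (each subscore's maximum divided by 100) are an assumption, and applying the floor to each fraction before taking logs is one plausible reading of the description. Only the 0.05 floor value comes from this page.

```python
import math

# Maximum points per subscore, from the Keys table.
SUBSCORE_MAX = {"coverage": 30, "agreement": 15, "freshness": 25, "forecast": 30}
GEOM_FLOOR = 0.05  # value of _CW_GEOM_FLOOR as documented on this page


def combine_subscores(breakdown, weights=None, floor=GEOM_FLOOR):
    """Combine subscores into a 0-100 score via a floored weighted geometric mean.

    Assumption: weights default to max_points / 100; the real weights may differ.
    """
    if weights is None:
        total_points = sum(SUBSCORE_MAX.values())
        weights = {name: pts / total_points for name, pts in SUBSCORE_MAX.items()}
    weight_sum = sum(weights.values())
    log_mean = 0.0
    for name, max_points in SUBSCORE_MAX.items():
        # Clamp each fraction to the floor so log() never sees zero.
        fraction = max(breakdown[name] / max_points, floor)
        log_mean += weights[name] * math.log(fraction)
    return round(100 * math.exp(log_mean / weight_sum))
```

Under these assumed weights, a keyword with full marks everywhere except agreement = 0 still scores about 64, whereas an unfloored geometric mean would return 0.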

Anchors that govern this:

  • processing.py:KB-ANCHOR:coverage-source-weighting — the 30-pt coverage subscore weights (GSC + 2 trend > GSC + 1 trend > GSC-only). Missing a trend source is penalised harder than the count alone would suggest.
  • processing.py:KB-ANCHOR:freshness-staleness-ladder — the per-source per-month staleness ladders (GT: [(1, 9), (3, 8), (6, 6), (12, 4), (18, 2)]; GKP: [(2, 7), (4, 6), (6, 4), (12, 2)]; GSC: flat 9).
  • processing.py:KB-ANCHOR:confidence-subscore-floor — _CW_GEOM_FLOOR = 0.05. The floor that prevents a single zero subscore from killing the total.
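The staleness ladders can be read as lookup tables. The sketch below assumes each pair is (maximum months stale, points awarded), that the bound is inclusive, that a source older than the last rung earns 0, and that the freshness subscore is the sum of the per-source points. The per-source maxima (9 + 7 + 9) do add up to the 25-point range, which is consistent with that reading but does not confirm it. The function names are hypothetical.

```python
# Ladders copied from the freshness-staleness-ladder anchor.
GT_LADDER = [(1, 9), (3, 8), (6, 6), (12, 4), (18, 2)]
GKP_LADDER = [(2, 7), (4, 6), (6, 4), (12, 2)]
GSC_FLAT = 9  # GSC earns a flat 9 whenever it is present


def ladder_points(months_stale, ladder):
    """Return the points for the first rung whose month bound covers months_stale.

    Assumption: bounds are inclusive and anything past the last rung scores 0.
    """
    for max_months, points in ladder:
        if months_stale <= max_months:
            return points
    return 0


def freshness_subscore(gt_months, gkp_months, gsc_present):
    """Sum per-source freshness points; pass None for a source with no data."""
    total = 0
    if gt_months is not None:
        total += ladder_points(gt_months, GT_LADDER)
    if gkp_months is not None:
        total += ladder_points(gkp_months, GKP_LADDER)
    if gsc_present:
        total += GSC_FLAT
    return total
```

For example, under these assumptions a keyword whose GT data is 5 months old, whose GKP data is 3 months old, and which has GSC data would get 6 + 6 + 9 = 21 freshness points.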

Important caveat: stored values may be stale

Per memory project_confidence_meta_is_stale: the stored confidence_score may have been written by an older formula. Don't trust the stored number for any analysis; re-compute via predict_sv_trends if you need current-formula values.

This caveat exists because the confidence-score formula has evolved. The archived _archive/confidence_score.md documents the most recent changes (removed Pearson, removed divergence penalty, rebalanced weights), and stored values lag any formula change until the next full re-write of the meta. Actively refreshed keywords are unaffected because they are re-written hourly, but long-tail keywords whose meta hasn't been touched in months may still carry old-formula values.

Edge cases

  • JS-only keywords — coverage subscore caps at _CW_COV_1_TREND_NO_GSC = 4. Confidence is low by design.
  • Forecast subscore for sub-gate keywords — _CW_FCST_NO_PREDICT_WITH_DATA = 0 when GT/GKP exist but is_predict_valid=False. Memory project_no_forecast_with_gt_gkp_is_low_conf covers this rule.
  • GSC-fresh-always — _CW_FRESH_GSC_PRESENT = 9 regardless of last_update. Memory project_gsc_is_always_fresh covers this: GSC is treated as always-fresh when present because the pipeline guarantees its recency.

Customer-facing surface

The headline confidence_score is uploaded to ES and shown in the product as a quality indicator on each keyword. The subscore breakdown is not displayed directly; it lives in this meta JSON for debugging and internal analysis.

See also

  • Overview
  • processed_volume_trend — the forecast quality fed into the forecast subscore
  • Archive: _archive/confidence_score.md — full formula writeup; Phase 3 will decompose into decision nodes
  • Memory: project_confidence_meta_is_stale, project_gsc_is_always_fresh, project_no_forecast_with_gt_gkp_is_low_conf