processed_volume_trend_meta: confidence subscores
The second of three families inside processed_volume_trend_meta. It holds the 0–100 confidence score shown to customers, plus its four-subscore breakdown.
Keys
| Key | Type | Meaning |
|---|---|---|
| confidence_score | int 0–100 | The aggregated score shown in the product |
| confidence_breakdown.coverage | int 0–30 | How many of the 4 sources have data for this keyword, weighted |
| confidence_breakdown.agreement | int 0–15 | How consistent the sources' magnitudes / directions are |
| confidence_breakdown.freshness | int 0–25 | How recently each source was refreshed |
| confidence_breakdown.forecast | int 0–30 | Observed-fraction + normalised SMAPE of the forecast model |

The subscore maxima sum to 100 (30 + 15 + 25 + 30), but the headline confidence_score is a weighted geometric mean of the four subscores, not their plain sum.
How it's computed
The score is computed in processing.py's confidence-score helpers (around L891–L965). It is a weighted geometric mean of the four subscore fractions, with a floor that prevents any single 0 from collapsing the total to ≈0.
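As a rough illustration of the paragraph above, the following Python sketch combines four subscores with a floored, weighted geometric mean. It is not the code in processing.py: the function name confidence_score_sketch, the choice of weights (each subscore's maximum divided by 100), and the scaling back to 0–100 are assumptions made for illustration. Only the subscore maxima and the 0.05 floor come from this page.

```python
import math

# Maximum points per subscore, taken from the Keys table.
SUBSCORE_MAX = {"coverage": 30, "agreement": 15, "freshness": 25, "forecast": 30}

# Floor value quoted on this page (KB-ANCHOR:confidence-subscore-floor).
_CW_GEOM_FLOOR = 0.05


def confidence_score_sketch(breakdown):
    """Floored, weighted geometric mean of subscore fractions, scaled to 0-100.

    ASSUMPTION: weights are proportional to each subscore's maximum.
    The real weights in processing.py may differ.
    """
    total_points = sum(SUBSCORE_MAX.values())
    log_sum = 0.0
    for name, max_points in SUBSCORE_MAX.items():
        fraction = breakdown[name] / max_points
        # Clamp into [floor, 1] so a zero subscore cannot zero the product.
        fraction = min(1.0, max(_CW_GEOM_FLOOR, fraction))
        weight = max_points / total_points  # assumed weighting
        log_sum += weight * math.log(fraction)
    return 100.0 * math.exp(log_sum)


full = {"coverage": 30, "agreement": 15, "freshness": 25, "forecast": 30}
no_forecast = dict(full, forecast=0)
print(confidence_score_sketch(full))
print(confidence_score_sketch(no_forecast))
```

Under these assumed weights, a keyword with full marks everywhere except a forecast subscore of 0 still lands at roughly 41 rather than 0, which is the behaviour the floor is there to provide.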
Anchors that govern this:
- processing.py:KB-ANCHOR:coverage-source-weighting covers the 30-pt coverage subscore weights (GSC + 2 trend > GSC + 1 trend > GSC-only). Missing a trend source is penalised harder than the count alone would suggest.
- processing.py:KB-ANCHOR:freshness-staleness-ladder defines the per-source, per-month staleness ladders (GT: [(1, 9), (3, 8), (6, 6), (12, 4), (18, 2)]; GKP: [(2, 7), (4, 6), (6, 4), (12, 2)]; GSC: flat 9).
- processing.py:KB-ANCHOR:confidence-subscore-floor sets _CW_GEOM_FLOOR = 0.05, the floor that prevents a single zero subscore from killing the total.
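The staleness ladders above can be read as lookup tables. The sketch below assumes each pair is (maximum age in months, points), that the bounds are inclusive, and that a source older than the last rung, or missing entirely, scores 0. None of these readings is confirmed by this page, and the helper names ladder_points and freshness_sketch are hypothetical.

```python
from typing import Optional

# Ladders and the GSC constant as quoted on this page.
GT_LADDER = [(1, 9), (3, 8), (6, 6), (12, 4), (18, 2)]
GKP_LADDER = [(2, 7), (4, 6), (6, 4), (12, 2)]
_CW_FRESH_GSC_PRESENT = 9


def ladder_points(ladder, months_stale: float) -> int:
    """Return the points of the first rung whose bound covers the age.

    ASSUMPTION: pairs are (max months stale, points), bounds inclusive.
    """
    for max_months, points in ladder:
        if months_stale <= max_months:
            return points
    return 0  # ASSUMPTION: older than the last rung earns nothing


def freshness_sketch(gt_months: Optional[float],
                     gkp_months: Optional[float],
                     gsc_present: bool) -> int:
    """Sum per-source freshness points; None means the source is missing."""
    total = 0
    if gt_months is not None:
        total += ladder_points(GT_LADDER, gt_months)
    if gkp_months is not None:
        total += ladder_points(GKP_LADDER, gkp_months)
    if gsc_present:
        total += _CW_FRESH_GSC_PRESENT  # flat, regardless of age
    return total


print(freshness_sketch(gt_months=5, gkp_months=3, gsc_present=True))
```

With all three sources fresh, the sketch yields 9 + 7 + 9 = 25, which matches the 0–25 range listed for the freshness subscore.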
Important caveat: stored values may be stale
Per memory project_confidence_meta_is_stale: the stored confidence_score may have been written by an older formula. Don't trust the stored number for any analysis; re-compute via predict_sv_trends if you need current-formula values.
This caveat exists because the confidence-score formula has evolved, and stored values lag any formula change until the next full rewrite of the meta. The archived _archive/confidence_score.md documents the most recent changes: removed Pearson, removed divergence penalty, rebalanced weights. For the most recent keywords this isn't an issue (they get rewritten hourly), but long-tail keywords whose meta hasn't been touched in months may still carry old-formula values.
Edge cases
- JS-only keywords: coverage subscore caps at _CW_COV_1_TREND_NO_GSC = 4. Confidence is low by design.
- Forecast subscore for sub-gate keywords: _CW_FCST_NO_PREDICT_WITH_DATA = 0 when GT/GKP exist but is_predict_valid=False. Memory project_no_forecast_with_gt_gkp_is_low_conf covers this rule.
- GSC-fresh-always: _CW_FRESH_GSC_PRESENT = 9 regardless of last_update. Per memory project_gsc_is_always_fresh, GSC is treated as always fresh when present because the pipeline guarantees its recency.
Customer-facing surface
The headline confidence_score is uploaded to ES and shown in the product as a quality indicator on each keyword. The subscore breakdown is not displayed directly; it lives in this meta JSON for debugging and internal analysis.
See also
- Overview
- processed_volume_trend: the forecast quality fed into the forecast subscore
- Archive: _archive/confidence_score.md, the full formula writeup; Phase 3 will decompose it into decision nodes
- Memory: project_confidence_meta_is_stale, project_gsc_is_always_fresh, project_no_forecast_with_gt_gkp_is_low_conf