Monitoring snapshots — `record_*_snapshot.py`¶

Code: record_confidence_snapshot.py, record_prediction_coverage_snapshot.py, record_bot_spike_counts_snapshot.py (all in project root). Last validated: 2026-05-21

What they do¶

Three small Python scripts that run once per hourly cycle (after pe_update.py, before upload_backend.sh) and write one row-set per iteration to history tables on isog. They never modify the hub — they're pure observability emitters.

All three share the same patterns:

- Scatter-gather across all 200 shards × 256 ranges of keywords.keywords_data_local
- Idempotent DDL via CREATE TABLE IF NOT EXISTS on first run
- 3-retry insert with exponential back-off; if every attempt fails, the snapshot is skipped (history simply misses that cycle's row-set)
- Dated logs at /home/data/{confidence_snapshot_logs,prediction_coverage_snapshot_logs,bot_spike_counts_logs}/
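
The scatter-gather pattern can be sketched roughly as follows. This is a minimal illustration, not the scripts' actual code: `scatter_gather`, `query_fn`, and the thread-pool size are hypothetical names and values, and `query_fn` stands in for whatever per-(shard, range) ClickHouse query each script issues. The sketch assumes each task returns a single count and that a failed task is logged and skipped.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger("snapshot")

N_SHARDS = 200   # from the description above
N_RANGES = 256   # from the description above


def scatter_gather(tasks, query_fn, max_workers=16):
    """Run query_fn(task) for every task in parallel and sum the results.

    Hypothetical helper: a task that raises is logged and skipped, so the
    returned total simply lacks that task's contribution.
    """
    total = 0
    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(query_fn, task): task for task in tasks}
        for future in as_completed(futures):
            task = futures[future]
            try:
                total += future.result()
            except Exception as exc:  # e.g. a client-side query timeout
                logger.warning("task %s failed: %s", task, exc)
                failed.append(task)
    return total, failed


def all_tasks(n_shards=N_SHARDS, n_ranges=N_RANGES):
    """Enumerate every (shard, range) pair to query."""
    return [(shard, rng) for shard in range(n_shards) for rng in range(n_ranges)]
```

Returning the list of failed tasks alongside the total lets the caller log how incomplete a given snapshot is without aborting the run.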

`record_confidence_snapshot.py`¶

Bins approved keywords by confidence score (0–100) into 10 buckets of width 10 and computes the count and the sum of confidence per (scope × bucket). Each run inserts 110 rows (1 timestamp × 11 scopes [global + top-10 countries] × 10 buckets) into keywords_volume.confidence_distribution_history.
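
The row arithmetic can be illustrated with a small in-memory sketch. It rests on assumptions that the text above does not confirm: `bucket_of` and `build_rows` are hypothetical names, a score of exactly 100 is clamped into the top bucket, and empty buckets are emitted with a zero count so that every run yields exactly 110 rows. The real script presumably aggregates in ClickHouse rather than in Python.

```python
N_BUCKETS = 10
BUCKET_WIDTH = 10


def bucket_of(confidence):
    """Map a 0-100 score to a bucket index 0-9.

    Assumption: a score of exactly 100 is clamped into bucket 9.
    """
    return min(int(confidence // BUCKET_WIDTH), N_BUCKETS - 1)


def build_rows(snapshot_at, keywords, top_countries):
    """Aggregate (country, confidence) pairs into history rows.

    Returns tuples (snapshot_at, scope, bucket, cnt, sum_conf), one per
    (scope, bucket) pair, including empty buckets (an assumption).
    """
    scopes = ["global"] + list(top_countries)
    agg = {(s, b): [0, 0.0] for s in scopes for b in range(N_BUCKETS)}
    top = set(top_countries)
    for country, conf in keywords:
        b = bucket_of(conf)
        # Every keyword counts towards 'global'; only top countries get their own scope.
        targets = ["global"] + ([country] if country in top else [])
        for scope in targets:
            cell = agg[(scope, b)]
            cell[0] += 1
            cell[1] += conf
    return [(snapshot_at, s, b, cnt, total) for (s, b), (cnt, total) in agg.items()]
```

With 10 top countries the dictionary holds 11 scopes × 10 buckets, which matches the 110 rows per run stated above.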

| Anchor | Constant | Note |
|---|---|---|
| `confidence-snapshot-top-countries` | `TOP_N_COUNTRIES = 10` | Differs from the bot-spike snapshot's 25 by deliberate choice |

Schema:

CREATE TABLE keywords_volume.confidence_distribution_history (
  snapshot_at DateTime,
  scope LowCardinality(String),  -- 'global' or country code
  bucket UInt8,                  -- 0–9 (× 10 = lower edge of the score bucket)
  cnt UInt64,
  sum_conf Float64
) ENGINE = MergeTree
ORDER BY (snapshot_at, scope, bucket)

Consumed by the demo app's api_confidence.py.

`record_prediction_coverage_snapshot.py`¶

Counts approved keywords per country whose processed_volume_trend JSON contains any month key greater than the current YYYYMM — i.e. keywords with a usable 12-month forecast attached. One row per country per run.

No load-bearing constants — uses today.strftime('%Y%m') dynamically. Nothing to anchor here.
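
The coverage test for a single keyword can be sketched in Python as below. This only illustrates the logic: `has_forecast` is a hypothetical name, the month keys are assumed to be zero-padded `YYYYMM` strings, and the actual script most likely evaluates the condition inside its ClickHouse query.

```python
import json
from datetime import date


def has_forecast(processed_volume_trend, today=None):
    """Return True if any month key is later than the current YYYYMM.

    Accepts either a JSON string or an already-parsed dict.
    """
    today = today or date.today()
    current = today.strftime("%Y%m")
    if isinstance(processed_volume_trend, str):
        trend = json.loads(processed_volume_trend)
    else:
        trend = processed_volume_trend
    # Zero-padded YYYYMM strings compare lexicographically in date order.
    return any(str(month) > current for month in trend)
```

For example, on 2026-05-21 a trend with keys `202605` and `202606` counts as covered, while one ending at `202605` does not.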

Schema:

CREATE TABLE keywords_volume.prediction_coverage_history (
  snapshot_at DateTime,
  country LowCardinality(String),
  cnt UInt64
) ENGINE = MergeTree
ORDER BY (snapshot_at, country)

Consumed by the demo app's api_prediction_coverage.py.

`record_bot_spike_counts_snapshot.py`¶

Two parallel scatter-gather scans:

- spiked_kw_count: keywords matching the spike predicate (GSC > SPIKE_MIN_RATIO × median AND GSC > SPIKE_MIN_ABS AND GT/GKP < p75). This is the same predicate as the /spikes demo page, so the snapshot and the page stay in sync.
- bot_corrected_kw_count: keywords with non-empty bot_spike_months in processed_volume_trend_meta (i.e. processing.py repaired a bot-driven spike for them).

Results are aggregated globally and for each of the top 25 countries: ~26 rows per run (1 global + 25 countries).

| Anchor | Constants | Note |
|---|---|---|
| `bot-spike-snapshot-predicate` | `SPIKE_MIN_RATIO = 10`, `SPIKE_MIN_ABS = 50` | Same defaults as `/spikes` page; tight coupling |
| `bot-spike-snapshot-top-countries` | `TOP_N_COUNTRIES = 25` | Wider than the confidence snapshot's 10, because bot patterns are country-localised |
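
As a rough illustration of how the two predicate constants combine, the spike test can be written as a pure function. The function name and its argument names (`gsc`, `median`, `gt_gkp`, `p75`) are hypothetical: the text does not say over which window the median and the 75th percentile are computed, so this sketch takes them as precomputed inputs.

```python
SPIKE_MIN_RATIO = 10
SPIKE_MIN_ABS = 50


def is_spike(gsc, median, gt_gkp, p75,
             min_ratio=SPIKE_MIN_RATIO, min_abs=SPIKE_MIN_ABS):
    """Evaluate the spike predicate for one keyword.

    All four inputs are assumed to be precomputed by the caller:
    gsc    - the GSC value under test
    median - the keyword's median GSC value
    gt_gkp - the GT/GKP value under test
    p75    - the 75th-percentile threshold for GT/GKP
    """
    return gsc > min_ratio * median and gsc > min_abs and gt_gkp < p75
```

All comparisons are strict, so a GSC value of exactly 10 × median does not qualify.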

Schema:

CREATE TABLE keywords_volume.bot_spike_counts_history (
  snapshot_at DateTime,
  scope LowCardinality(String),
  spiked_kw_count UInt64,
  bot_corrected_kw_count UInt64
) ENGINE = MergeTree
ORDER BY (snapshot_at, scope)

Failure modes (all 3)¶

| Symptom | Cause | Behaviour |
|---|---|---|
| Scatter-gather shard timeout | CH replica slow | Logged; that shard's contribution is missing from that cycle's counts |
| Insert into `keywords_volume.*_history` fails | isog write outage | 3 retries with 5 s back-off, then this cycle's snapshot is skipped |
| Snapshot script raises uncaught exception | Bug, OOM, etc. | `keep_running.sh` has no early exit, so later stages still run; that cycle's rows are missing from history |

A few missed snapshots are harmless; the time-series history just has a gap.
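
The retry-then-skip behaviour for inserts might look like the sketch below. It is an assumption-laden illustration: `insert_with_retry` and `insert_fn` are hypothetical names, "3 retries" is read as three attempts in total, and the 5 s figure is treated as the base delay of a doubling back-off.

```python
import logging
import time

logger = logging.getLogger("snapshot")


def insert_with_retry(insert_fn, rows, retries=3, base_delay=5.0, sleep=time.sleep):
    """Call insert_fn(rows) up to `retries` times.

    Returns True on success, or False when every attempt failed and the
    snapshot for this cycle should be skipped. The delay doubles after
    each failed attempt (5 s, 10 s, ...); both values are assumptions.
    """
    for attempt in range(1, retries + 1):
        try:
            insert_fn(rows)
            return True
        except Exception as exc:
            logger.warning("insert attempt %d/%d failed: %s", attempt, retries, exc)
            if attempt < retries:
                sleep(base_delay * 2 ** (attempt - 1))
    logger.error("all %d insert attempts failed; skipping this cycle's snapshot", retries)
    return False
```

Returning a boolean instead of raising keeps the failure local to the snapshot script, so a write outage costs one gap in the history and nothing more.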
