Monitoring snapshots — record_*_snapshot.py

Code: record_confidence_snapshot.py, record_prediction_coverage_snapshot.py, record_bot_spike_counts_snapshot.py (all in project root). Last validated: 2026-05-21

What they do

Three small Python scripts run once per hourly cycle (after pe_update.py, before upload_backend.sh), and each writes one row-set per cycle to its history table on isog. They never modify the hub; they are pure observability emitters.

All three share the same patterns:

  • Scatter-gather across all 200 shards × 256 ranges of keywords.keywords_data_local
  • Idempotent DDL via CREATE TABLE IF NOT EXISTS on first run
  • 3-retry insert with exponential back-off; on full failure the snapshot is skipped (the history just lacks that cycle's rows)
  • Dated logs at /home/data/{confidence_snapshot_logs,prediction_coverage_snapshot_logs,bot_spike_counts_logs}/
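The retry-then-skip pattern in the third bullet can be sketched as below. This is a minimal illustration, not the scripts' own code: the helper name insert_with_retry, the injected insert callable, and doubling the delay from a 5 s base (the base value taken from the failure-modes table further down) are all assumptions.

```python
import logging
import time
from typing import Callable, Sequence

log = logging.getLogger("snapshot")


def insert_with_retry(
    insert: Callable[[Sequence[tuple]], None],  # hypothetical: writes rows to the history table
    rows: Sequence[tuple],
    attempts: int = 3,
    base_delay_s: float = 5.0,  # assumed starting back-off
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests need not wait
) -> bool:
    """Try insert(rows) up to `attempts` times, doubling the wait after each failure.

    Returns True on success. Returns False if every attempt failed; the caller
    then skips this cycle's snapshot instead of raising.
    """
    delay = base_delay_s
    for attempt in range(1, attempts + 1):
        try:
            insert(rows)
            return True
        except Exception as exc:  # any write failure counts as a failed attempt
            log.warning("insert attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt < attempts:
                sleep(delay)
                delay *= 2  # exponential back-off
    log.error("all %d insert attempts failed; skipping snapshot", attempts)
    return False
```

Returning False rather than raising matches the documented behaviour: a fully failed insert costs one gap in the history, not the rest of the hourly cycle.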

record_confidence_snapshot.py

Bins approved keywords by confidence score (0–100) into 10 buckets of width 10, computes counts and sum-of-confidence per (scope × bucket), inserts 110 rows per run (1 timestamp × 11 scopes [global + top-10 countries] × 10 buckets) into keywords_volume.confidence_distribution_history.
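The binning can be sketched in plain Python. Several details are assumptions rather than confirmed behaviour: the helper names bucket_of and confidence_rows are hypothetical, a score of exactly 100 is clamped into bucket 9 so the schema's 0–9 bucket range holds, the top countries are passed in rather than computed, and empty buckets are emitted with zero counts so every run has the same (scope × bucket) shape.

```python
from collections import defaultdict
from typing import Iterable

N_BUCKETS = 10     # 10 buckets over the 0-100 score range
BUCKET_WIDTH = 10  # each bucket spans 10 confidence points


def bucket_of(conf: float) -> int:
    """Map a 0-100 score to bucket 0-9; exactly 100 is clamped into bucket 9 (assumption)."""
    return min(int(conf // BUCKET_WIDTH), N_BUCKETS - 1)


def confidence_rows(
    keywords: Iterable[tuple[str, float]],  # hypothetical (country, confidence) pairs
    top_countries: list[str],
) -> list[tuple[str, int, int, float]]:
    """Return (scope, bucket, cnt, sum_conf) for 'global' plus each top country."""
    wanted = set(top_countries)
    cnt = defaultdict(int)
    total = defaultdict(float)
    for country, conf in keywords:
        b = bucket_of(conf)
        cnt[("global", b)] += 1
        total[("global", b)] += conf
        if country in wanted:  # non-top countries only count towards 'global'
            cnt[(country, b)] += 1
            total[(country, b)] += conf
    scopes = ["global", *top_countries]
    # Zero-fill: every scope gets every bucket, even when empty (assumption).
    return [(s, b, cnt[(s, b)], total[(s, b)]) for s in scopes for b in range(N_BUCKETS)]
```

With 10 top countries the function returns 11 × 10 = 110 tuples, matching the per-run row count above.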

Anchor constant:

  • confidence-snapshot-top-countries: TOP_N_COUNTRIES = 10. Deliberately different from the bot-spike snapshot's 25.

Schema:

CREATE TABLE keywords_volume.confidence_distribution_history (
  snapshot_at DateTime,
  scope LowCardinality(String),  -- 'global' or country code
  bucket UInt8,                  -- 0–9 (× 10 = lower edge of the score bucket)
  cnt UInt64,
  sum_conf Float64
) ENGINE = MergeTree
ORDER BY (snapshot_at, scope, bucket)

Consumed by the demo app's api_confidence.py.

record_prediction_coverage_snapshot.py

Counts approved keywords per country whose processed_volume_trend JSON contains any month key greater than the current YYYYMM — i.e. keywords with a usable 12-month forecast attached. One row per country per run.

No load-bearing constants — uses today.strftime('%Y%m') dynamically. Nothing to anchor here.
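The coverage predicate can be restated as a small Python sketch. It assumes processed_volume_trend is a JSON object keyed by YYYYMM strings; the helper names has_forecast and coverage_counts are hypothetical, and where the real script evaluates this predicate (inside the shard query or in Python) is not stated here.

```python
import json
from collections import Counter
from datetime import date
from typing import Iterable, Optional


def has_forecast(trend_json: str, today: date) -> bool:
    """True if any month key in the trend JSON is later than the current YYYYMM."""
    current = today.strftime("%Y%m")
    try:
        trend = json.loads(trend_json)
    except (TypeError, json.JSONDecodeError):
        return False  # missing or malformed trend: no usable forecast
    # JSON object keys are strings; fixed-width YYYYMM strings sort like integers.
    return isinstance(trend, dict) and any(k > current for k in trend)


def coverage_counts(
    rows: Iterable[tuple[str, str]],  # hypothetical (country, processed_volume_trend) pairs
    today: Optional[date] = None,
) -> Counter:
    """Count keywords with a future month key, per country (one output row each)."""
    today = today or date.today()
    return Counter(country for country, trend in rows if has_forecast(trend, today))
```

Because YYYYMM keys are fixed-width digit strings, comparing them as strings gives the same order as comparing them as integers, so no parsing of the keys is needed.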

Schema:

CREATE TABLE keywords_volume.prediction_coverage_history (
  snapshot_at DateTime,
  country LowCardinality(String),
  cnt UInt64
) ENGINE = MergeTree
ORDER BY (snapshot_at, country)

Consumed by the demo app's api_prediction_coverage.py.

record_bot_spike_counts_snapshot.py

Two parallel scatter-gather scans:

  1. spiked_kw_count — keywords matching the spike predicate (GSC > SPIKE_MIN_RATIO × median AND GSC > SPIKE_MIN_ABS AND GT/GKP < p75). Same predicate as the /spikes demo page so the snapshot and the page stay in sync.
  2. bot_corrected_kw_count — keywords with non-empty bot_spike_months in processed_volume_trend_meta (i.e. processing.py repaired a bot-driven spike for them).
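The two per-keyword checks can be sketched as follows. SPIKE_MIN_RATIO and SPIKE_MIN_ABS use the anchored defaults; everything else is an assumption: KeywordStats and its fields are hypothetical, "median" is read as the median of the keyword's own GSC history, and "GT/GKP < p75" is read as a corroborating GT/GKP value below its 75th percentile.

```python
import json
from dataclasses import dataclass

SPIKE_MIN_RATIO = 10  # anchor bot-spike-snapshot-predicate; must match /spikes
SPIKE_MIN_ABS = 50


@dataclass
class KeywordStats:  # hypothetical per-keyword inputs
    gsc: float             # current GSC value
    gsc_median: float      # median of the keyword's GSC history (assumed meaning)
    reference: float       # GT/GKP signal (assumed meaning)
    reference_p75: float   # 75th percentile of that signal (assumed meaning)
    trend_meta_json: str   # processed_volume_trend_meta


def is_spiked(k: KeywordStats) -> bool:
    """GSC jumps far above its median, clears an absolute floor, and GT/GKP do not corroborate it."""
    return (
        k.gsc > SPIKE_MIN_RATIO * k.gsc_median
        and k.gsc > SPIKE_MIN_ABS
        and k.reference < k.reference_p75
    )


def is_bot_corrected(k: KeywordStats) -> bool:
    """True if processing recorded at least one repaired bot-spike month."""
    try:
        meta = json.loads(k.trend_meta_json)
    except (TypeError, json.JSONDecodeError):
        return False
    return isinstance(meta, dict) and bool(meta.get("bot_spike_months"))
```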

Both counts are aggregated globally and for each of the top 25 countries, giving ~26 rows per run (1 global + 25 countries).
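Turning per-keyword flags into the global plus top-25 rows might look like this. The input shape (one (country, is_spiked, is_bot_corrected) tuple per keyword) and the helper name spike_rows are hypothetical, and the top countries are assumed to arrive already ranked, since how they are ranked is not stated here.

```python
from collections import Counter
from typing import Iterable

TOP_N_COUNTRIES = 25  # anchor bot-spike-snapshot-top-countries


def spike_rows(
    flags: Iterable[tuple[str, bool, bool]],  # hypothetical (country, is_spiked, is_bot_corrected)
    top_countries: list[str],                 # assumed pre-ranked, most important first
) -> list[tuple[str, int, int]]:
    """Return (scope, spiked_kw_count, bot_corrected_kw_count): 'global' first, then each top country."""
    spiked = Counter()
    corrected = Counter()
    for country, s, c in flags:
        for scope in ("global", country):
            spiked[scope] += s     # bools add as 0/1
            corrected[scope] += c
    scopes = ["global", *top_countries[:TOP_N_COUNTRIES]]
    return [(scope, spiked[scope], corrected[scope]) for scope in scopes]
```

Counting every country and selecting the top ones only at the end keeps the loop simple; the per-country counters for countries outside the top 25 are simply never emitted.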

Anchor constants:

  • bot-spike-snapshot-predicate: SPIKE_MIN_RATIO = 10, SPIKE_MIN_ABS = 50. Same defaults as the /spikes page; the two are tightly coupled.
  • bot-spike-snapshot-top-countries: TOP_N_COUNTRIES = 25. Wider than the confidence snapshot's 10 because bot patterns are country-localised.

Schema:

CREATE TABLE keywords_volume.bot_spike_counts_history (
  snapshot_at DateTime,
  scope LowCardinality(String),
  spiked_kw_count UInt64,
  bot_corrected_kw_count UInt64
) ENGINE = MergeTree
ORDER BY (snapshot_at, scope)

Failure modes (all 3)

  • Scatter-gather shard timeout (cause: slow CH replica): logged, and that shard's contribution is missing from the row.
  • Insert into keywords_volume.*_history fails (cause: isog write outage): 3 retries with 5 s back-off, then this cycle's snapshot is skipped.
  • Snapshot script raises an uncaught exception (cause: bug, OOM, etc.): keep_running.sh has no early exit, so the next stages still run; that cycle's snapshot is missing from history.
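The shard-timeout failure mode can be sketched as a scatter-gather with one shared deadline. This is a minimal illustration built on concurrent.futures; the helper name scatter_gather, the query_shard callable returning a Counter of partial counts, and the thread-pool size are assumptions, not the scripts' actual mechanism.

```python
import logging
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Iterable

log = logging.getLogger("snapshot")


def scatter_gather(
    query_shard: Callable[[int], Counter],  # hypothetical per-shard query
    shards: Iterable[int],
    timeout_s: float,
    max_workers: int = 16,
) -> tuple[Counter, list[int]]:
    """Query every shard in parallel and sum the partial counts.

    Shards that raise or miss the shared deadline are logged and left out,
    so the merged totals simply lack their contribution.
    """
    total = Counter()
    missing = []
    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = {pool.submit(query_shard, s): s for s in shards}
    done, not_done = wait(futures, timeout=timeout_s)
    for fut in done:
        try:
            total.update(fut.result())  # adds counts key by key
        except Exception as exc:
            log.warning("shard %s failed: %s", futures[fut], exc)
            missing.append(futures[fut])
    for fut in not_done:
        log.warning("shard %s timed out", futures[fut])
        missing.append(futures[fut])
    # Don't block on stragglers; queued work is cancelled, running queries finish in the background.
    pool.shutdown(wait=False, cancel_futures=True)
    return total, sorted(missing)
```

Returning the missing shards alongside the totals lets the caller log which contributions are absent instead of failing the whole snapshot.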

A few missed snapshots are harmless; the time-series history just has a gap.

See also

  • keep_running.sh — orchestrator that invokes all three
  • Central hub table — read surface
  • demo-app — consumer of these history tables (Phase 5)
  • record_bot_spike_counts_snapshot.py:KB-ANCHOR:bot-spike-snapshot-predicate marks the same spike predicate used by /spikes in the demo app; changes must be applied in both places.