GT (Google Trends)
Code: kvprocessor.cpp — PullGTData() (historical init), PullGTNewData() (continuous). src/extract_gt.py handles extraction. Validity gate: processing.py:KB-ANCHOR:gt-validity-50-gate.
Last validated: 2026-05-21
What it carries
A relative search-popularity index (UInt8 values 0–100) by month for each keyword, going back to ~2014. GT is the only signal that lets us see shape (seasonality, secular trend, sudden change) for the long tail of keywords where GSC alone has too little history.
Where it lives and how it gets here
src/extract_gt.py fetches GT data via SearchAPI and writes it into keywords_volume.google_trends on isog. kvprocessor.cpp pulls from that table on every run.
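A minimal sketch of the fetch step follows. It assumes SearchAPI's endpoint is https://www.searchapi.io/api/v1/search and that a TIMESERIES response carries interest_over_time.timeline_data, with each point holding the keyword's value in values[0].extracted_value; verify both against the current SearchAPI docs. The helper names are hypothetical and this is not extract_gt.py's actual code. It only builds the request URL (no network call) and turns a response dict into (date, value) pairs of the kind that would then be written to keywords_volume.google_trends.

```python
from typing import List, Tuple
from urllib.parse import urlencode

# Assumed SearchAPI endpoint; check against the current SearchAPI docs.
SEARCHAPI_URL = "https://www.searchapi.io/api/v1/search"


def build_gt_request_url(keyword: str, api_key: str) -> str:
    """Build a SearchAPI Google Trends "Interest over time" request URL.

    Hypothetical helper: extract_gt.py may pass extra parameters
    (time range, geo) that are not shown here.
    """
    params = {
        "engine": "google_trends",
        "data_type": "TIMESERIES",
        "q": keyword,
        "api_key": api_key,
    }
    return f"{SEARCHAPI_URL}?{urlencode(params)}"


def parse_timeseries(response: dict) -> List[Tuple[str, int]]:
    """Extract (date label, 0-100 value) pairs from a TIMESERIES response.

    Assumes the response shape interest_over_time.timeline_data[*], with a
    single query per request so the value sits in values[0].extracted_value.
    """
    points = response.get("interest_over_time", {}).get("timeline_data", [])
    series = []
    for point in points:
        value = int(point["values"][0]["extracted_value"])
        # GT is a relative 0-100 index, so anything outside that range is bad data
        # (and would not fit the UInt8 0-100 convention used downstream).
        if not 0 <= value <= 100:
            raise ValueError(f"out-of-range GT value {value} at {point['date']}")
        series.append((point["date"], value))
    return series
```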
Which keywords are refreshed, and when, is driven by signals in processed_volume_trend_meta: keywords with needs_gt = true are queued and prioritised by tier (1 = urgent, 2 = deferred) and by recency. Tier assignment lives in the refresh-detection logic; see the Decisions section.
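The queue ordering can be sketched as a tuple sort. Everything here is an assumption for illustration: the GtCandidate row shape, the last_gt_fetch column, and reading "recency" as "never-fetched first, then stalest GT fetch first" are not taken from the actual refresh-detection code.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class GtCandidate:
    # Hypothetical row shape; the real processed_volume_trend_meta columns may differ.
    keyword: str
    needs_gt: bool
    tier: int  # 1 = urgent, 2 = deferred
    last_gt_fetch: Optional[date]  # None = never fetched (assumed column)


def build_refresh_queue(rows: List[GtCandidate]) -> List[str]:
    """Return keywords to refresh: tier 1 before tier 2, stalest first within a tier."""
    queued = [r for r in rows if r.needs_gt]
    # Sort key: tier first; then never-fetched rows (False sorts before True);
    # then the oldest fetch date. date.min is only a placeholder for None rows.
    queued.sort(
        key=lambda r: (r.tier, r.last_gt_fetch is not None, r.last_gt_fetch or date.min)
    )
    return [r.keyword for r in queued]
```

Because the tier is the first element of the sort key, no tier-2 keyword can be refreshed ahead of a tier-1 one, whatever its fetch date.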
Migration history
GT was previously fetched via DataForSEO's Google Trends Explore endpoint. It was moved to SearchAPI for cost and reliability reasons: DFSEO's per-task pricing, combined with its 250-tasks-per-minute rate limit and a 500K-per-day cap across all users, made it expensive and prone to throttling, and its responses were less stable than SearchAPI's. The kvprocessor side is API-agnostic (it only reads from the table), so the swap required changes to extract_gt.py only.
Validity gates in processing.py
Three rules cull bad / misleading GT in processing.py (anchor KB-ANCHOR:gt-validity-50-gate is the load-bearing gate; rules 1 and 2 sit just above it):
- Single-non-zero-not-recent: discard the GT series if there is exactly one non-zero data point and that point is not in the most recent month. (One stray reading that has already faded is more likely a bot blip than a real keyword.)
- Leading-zeros trim: five-plus years of zeros from the 2014–2018 GT history era are stripped before use.
- max(gt_trend) > 50: the primary "is this a usable trend" gate. See below.
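The three rules above can be sketched in Python as follows. The function name, the assumption that gt_trend is a plain list of monthly values ordered oldest to newest, and the reading of "five-plus years" as a leading zero run of at least 60 months are all assumptions; the real code at KB-ANCHOR:gt-validity-50-gate may differ.

```python
from typing import List, Optional

MONTHS_PER_YEAR = 12


def apply_gt_validity_gates(
    gt_trend: List[int], min_leading_zero_years: int = 5
) -> Optional[List[int]]:
    """Return the cleaned monthly GT series, or None if it should be discarded.

    Hypothetical helper; gt_trend is assumed to be ordered oldest -> newest.
    """
    if not gt_trend:
        return None

    nonzero = [i for i, v in enumerate(gt_trend) if v != 0]

    # Rule 1: a single non-zero reading that is not in the most recent month
    # is treated as a likely bot blip, so the whole series is discarded.
    if len(nonzero) == 1 and nonzero[0] != len(gt_trend) - 1:
        return None

    # Rule 2: strip the leading run of zeros when it spans five or more years.
    leading_zeros = nonzero[0] if nonzero else len(gt_trend)
    if leading_zeros >= min_leading_zero_years * MONTHS_PER_YEAR:
        gt_trend = gt_trend[leading_zeros:]

    # Rule 3 (the > 50 gate): keep the series only if its peak is strictly above 50.
    if not gt_trend or max(gt_trend) <= 50:
        return None
    return gt_trend
```

A series whose only non-zero point is the latest month passes rule 1 and is then judged by the > 50 gate like any other series.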
Why the > 50 threshold
The > 50 peak threshold guards against bot-driven and other anomalous spikes. GT values can be inflated by short bursts of bot or non-organic activity, and downstream spike-detection / repair logic may need to reduce or remove such peaks. The threshold ensures we only commit to a GT trend when at least one point is prominently high, high enough that the trend still carries useful signal after a spike-cleanup pass removes the worst peaks. Keywords whose entire GT series peaks at 50 or below are too marginal to trust through that cleanup.
See also
Reference docs (external):
- SearchAPI Google Trends: current upstream API. We use engine=google_trends with data_type=TIMESERIES (the "Interest over time" 0–100 monthly index).
- DataForSEO Google Trends Explore Live: legacy / deprecated upstream API; kept here as the prior implementation reference.
Internal:
- Central hub table: gt_*_array_store columns
- Refresh extractor extract_gt.py: fetching + writing GT data
- Decisions: needs_gt flag computation, GT/GSC correlation gate, GT collapse override, GT 2× soft cap
- Column-level reference: TBD (../fields/)