# Threat Classification Pipeline

> Three-stage hybrid classifier: instant keywords, async ML, and batched LLM refinement

## Overview

Every news item passes through a **three-stage classification pipeline** that provides instant results while progressively refining threat assessments using ML and LLM:

1. **Keyword classifier** (instant, `source: 'keyword'`): \~120 threat keywords across 5 severity tiers
2. **Browser-side ML** (async, `source: 'ml'`): Transformers.js NER + sentiment analysis
3. **LLM classifier** (batched async, `source: 'llm'`): Groq Llama 3.1 8B or Ollama local

The UI is never blocked waiting for AI. Users see keyword results instantly, with ML/LLM refinements arriving within seconds and persisting for all subsequent visitors.

## Stage 1: Keyword Classifier

Pattern-matches against \~120 threat keywords organized by severity tier and event category.
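As a rough illustration of tiered keyword matching, the sketch below checks a headline against a tiny subset of the keywords, most severe tier first, and applies the `\b` word-boundary rule to keywords of five characters or fewer. It is not the code in `src/services/threat-classifier.ts`: the names `TIERS`, `keywordRegex`, and `classifyByKeyword`, the "first hit wins" ordering, and the four-tier table are all assumptions made for the example.

```typescript
type ThreatLevel = 'critical' | 'high' | 'medium' | 'low' | 'info';

interface KeywordMatch {
  level: ThreatLevel;
  source: 'keyword';
  matchedKeyword?: string;
}

// Illustrative subset only; the real keyword tables are much larger.
const TIERS: ReadonlyArray<[ThreatLevel, readonly string[]]> = [
  ['critical', ['nuclear strike', 'declaration of war', 'coup']],
  ['high', ['war', 'airstrike', 'ransomware']],
  ['medium', ['riot', 'protest', 'power outage']],
  ['low', ['vote', 'summit', 'ceasefire']],
];

// Compiled regexes are cached so each keyword is compiled only once.
const regexCache = new Map<string, RegExp>();

// Escape regex metacharacters so keywords are matched literally.
function escapeRegex(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function keywordRegex(keyword: string): RegExp {
  let re = regexCache.get(keyword);
  if (!re) {
    const body = escapeRegex(keyword);
    // Short keywords get \b boundaries; longer phrases match as substrings.
    re = new RegExp(keyword.length <= 5 ? `\\b${body}\\b` : body, 'i');
    regexCache.set(keyword, re);
  }
  return re;
}

// Assumption: tiers are checked from most to least severe, first hit wins.
function classifyByKeyword(title: string): KeywordMatch {
  for (const [level, keywords] of TIERS) {
    const hit = keywords.find((kw) => keywordRegex(kw).test(title));
    if (hit) return { level, source: 'keyword', matchedKeyword: hit };
  }
  return { level: 'info', source: 'keyword' };
}
```

With this table, `classifyByKeyword('War in Ukraine escalates')` returns level `high` with `matchedKeyword: 'war'`, while `classifyByKeyword('Award ceremony honors patriot')` falls through to `info`, because neither `war` nor `riot` appears as a whole word.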
### Severity Tiers

#### Critical

Existential threats and major escalation:

**Military/Conflict**:

* `nuclear strike`, `nuclear attack`, `nuclear war`
* `invasion`, `declaration of war`, `declares war`
* `all-out war`, `full-scale war`
* `martial law`, `coup`, `coup attempt`
* `genocide`, `ethnic cleansing`
* `massive strikes`, `military strikes`, `retaliatory strikes`

**Iran-specific** (high geopolitical priority):

* `attack iran`, `attacks iran`, `strikes iran`
* `war with iran`, `war on iran`
* `iran retaliates`, `iran strikes`, `iran attacks`

**WMD**:

* `chemical attack`, `biological attack`, `dirty bomb`

**Health**:

* `pandemic declared`, `health emergency`

**Military alliance**:

* `nato article 5`

**Disaster**:

* `nuclear meltdown`, `evacuation order`

**Examples**:

* "Russia invades Baltic states" → `critical: conflict`
* "Iran launches retaliatory strikes" → `critical: military`
* "NATO invokes Article 5" → `critical: military`

#### High

Active conflict and severe threats:

**Conflict**:

* `war`, `armed conflict`
* `airstrike`, `drone strike`, `bombing`, `shelling`
* `casualties`, `killed in`
* `strike on`, `attack on`, `launches attack`

**Military**:

* `missile`, `missile launch`, `missiles fired`
* `troops deployed`, `military escalation`
* `ground offensive`, `military operation`
* `ballistic missile`, `cruise missile`

**Terrorism**:

* `hostage`, `terrorist`, `terror attack`, `assassination`

**Cyber**:

* `cyber attack`, `ransomware`, `data breach`

**Economic**:

* `sanctions`, `embargo`

**Disaster**:

* `earthquake`, `tsunami`, `hurricane`, `typhoon`

**Compound escalation**: HIGH military/conflict + critical geopolitical target → escalated to CRITICAL

Example: "US and Israel strikes on Iran" → `critical: military` (escalation logic)

**Source**: `src/services/threat-classifier.ts:329-337`

#### Medium

Political instability and infrastructure disruption:

* `protest`, `riot`, `unrest`, `demonstration`
* `military exercise`, `naval exercise`
* `arms deal`, `weapons sale`
* `diplomatic crisis`, `ambassador recalled`, `expel diplomats`
* `trade war`, `tariff`, `recession`, `inflation`
* `market crash`
* `flood`, `wildfire`, `volcano`, `eruption`
* `outbreak`, `epidemic`
* `oil spill`, `pipeline explosion`
* `blackout`, `power outage`, `internet outage`
* `derailment`

#### Low

Diplomatic activity and low-intensity events:

* `election`, `vote`, `referendum`
* `summit`, `treaty`, `agreement`, `negotiation`
* `talks`, `peacekeeping`, `humanitarian aid`
* `ceasefire`, `peace treaty`
* `climate change`, `emissions`, `pollution`
* `vaccine`, `vaccination`, `disease`, `virus`
* `interest rate`, `gdp`, `unemployment`, `regulation`

#### Info

General news with no specific threat classification.

**Exclusions**: Headlines containing lifestyle/entertainment keywords are auto-classified as INFO to prevent false positives:

* `protein`, `couples`, `relationship`, `dating`
* `diet`, `fitness`, `recipe`, `cooking`
* `shopping`, `fashion`, `celebrity`, `movie`
* `tv show`, `sports`, `game`, `concert`
* `strikes deal`, `strikes agreement` (not military strikes)

### Event Categories

* Wars, battles, armed clashes
* Civil unrest, demonstrations
* Troop movements, exercises
* Attacks, hostage situations
* Hacking, data breaches
* Natural disasters, accidents
* Treaties, summits, negotiations
* Sanctions, market events
* Pandemics, outbreaks
* Climate, pollution, spills
* Outages, pipeline explosions
* Assassinations, organized crime
* Tech-specific events (variant)
* Uncategorized news

### Keyword Matching Logic

Short keywords (≤5 chars) use `\b` word boundaries to prevent false positives:

* `war` matches "war in Ukraine" but not "award ceremony"
* `riot` matches "riot police" but not "patriot"
* `hack` matches "data hack" but not "hackathon"

**Short keyword list**: `war`, `coup`, `ban`, `vote`, `riot`, `hack`, `talks`, `ipo`, `gdp`, `virus`, `disease`, `flood`, `strikes`

Iran-specific keywords use a trailing boundary only (allowing prefix matches):

* `attack iran` uses `(?![\w-])` instead of `\b..\b`
* Prevents hyphen breaks: "US-Iran tensions" still matches

**Trailing boundary keywords**: All Iran-specific phrases from the CRITICAL tier

Compiled regexes are cached in a `Map` to avoid recompiling on every headline (10-15x performance improvement).

**Source**: `src/services/threat-classifier.ts:286-315`

### Variant-Specific Keywords

The **Tech Monitor** variant includes additional keywords for tech industry threats:

**High**:

* `major outage`, `global outage`, `service down`
* `zero-day`, `critical vulnerability`, `supply chain attack`
* `mass layoff`

**Medium**:

* `outage`, `breach`, `hack`, `vulnerability`
* `layoff`, `layoffs`, `antitrust`, `monopoly`
* `ban`, `shutdown`

**Low**:

* `ipo`, `funding`, `acquisition`, `merger`
* `launch`, `release`, `update`, `partnership`
* `startup`, `ai model`, `open source`

**Source**: `src/services/threat-classifier.ts:241-276`

## Stage 2: Browser-Side ML

Transformers.js runs **Named Entity Recognition (NER)**, **sentiment analysis**, and **topic classification** entirely in the browser:

* `Xenova/bert-base-NER`: entity extraction
* `Xenova/distilbert-base-uncased-finetuned-sst-2-english`: sentiment
* Topic classification model (custom fine-tuned)

**Loading**: ONNX models are downloaded on first use and cached in browser IndexedDB.

**User control**: "Browser Local Model" toggle in AI Flow settings. When disabled:

* ML worker is never initialized
* No ONNX model downloads
* No WebGL memory allocation
* Keyword classifier remains active

The toggle propagates dynamically: enabling it mid-session initializes the worker immediately.

ML confidence is typically lower than LLM but higher than keyword-only classification.

**Source**: `src/services/ml-worker.ts`

## Stage 3: LLM Classifier

Headlines are collected into a batch queue and fired as parallel `classifyEvent` RPCs.

### Batching Configuration

* Max headlines per batch.
* Wait time before flushing a partial batch (if fewer than 20 items).
* Base delay between API requests to prevent rate limiting.
* Random jitter (±200ms) added to stagger timing.
* Minimum gap between requests enforced.
* Failed jobs are retried up to 2 times before dropping.
* Queue is capped at 100 items. Excess classifications are dropped with a console warning.

### Error Handling

**429 response**:

* Batch queue pauses for **60 seconds**
* Failed job increments attempt counter and is requeued (if `attempts < MAX_RETRIES`)
* Remaining jobs in batch are requeued WITHOUT burning attempts
* Console warning: `[Classify] 429 — pausing AI classification for 60s`

**500 response**:

* Batch queue pauses for **30 seconds**
* Same retry logic as 429
* Prevents wasting API quota on transient failures
* Console warning: `[Classify] 500 — pausing AI classification for 30s`

**Other errors**:

* Individual job fails (no queue pause)
* Job is retried up to `MAX_RETRIES`
* After max retries, returns `null` (keyword classification remains)

**Source**: `src/services/threat-classifier.ts:412-495`

### LLM Provider Configuration

```typescript Groq (Cloud)
const GROQ_CONFIG = {
  model: 'llama-3.1-8b-instant',
  temperature: 0,
  maxTokens: 50,
  timeout: 5000
};
```

```typescript Ollama (Local)
const OLLAMA_CONFIG = {
  endpoint: 'http://localhost:11434/v1/chat/completions',
  model: 'llama3.1:8b', // auto-discovered
  temperature: 0.3
};
```

```typescript OpenRouter (Fallback)
const OPENROUTER_CONFIG = {
  model: 'meta-llama/llama-3.1-8b-instruct',
  temperature: 0
};
```

### Redis Caching

LLM results are cached with a 24h TTL to prevent redundant API calls:

```typescript
const cacheKey = `classify:${hashHeadline(title)}`;
const cached = await redis.get(cacheKey);
if (cached) return JSON.parse(cached);

const result = await classifyClient.classifyEvent({ title, ... });
await redis.setex(cacheKey, 86400, JSON.stringify(result));
return result;
```

**Deduplication**: Same headline viewed by 1,000 concurrent users triggers exactly one LLM call.
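The retry and pause rules listed under Error Handling can be condensed into one small decision function. The following is a sketch under stated assumptions, not the logic in `src/services/threat-classifier.ts`: the `Job` and `FailurePlan` shapes and the name `planAfterFailure` are hypothetical, treating every 5xx status like a 500 is an assumption, and the retry check is done before incrementing the counter so that a job gets two retries, which is one possible reading of "retried up to 2 times".

```typescript
// Hypothetical job shape; real queue entries may carry more fields.
interface Job {
  title: string;
  attempts: number;
}

interface FailurePlan {
  pauseMs: number; // how long the whole queue should pause
  requeue: Job[];  // jobs to put back on the queue
  dropped: Job[];  // jobs that exhausted their retries
}

const MAX_RETRIES = 2;

// Decide what happens after the request for `failed` returned `status`.
// `remaining` holds the other jobs of the same batch that were not resolved.
function planAfterFailure(status: number, failed: Job, remaining: Job[]): FailurePlan {
  // 429 pauses the queue for 60s, server errors for 30s, anything else not at all.
  // Assumption: all 5xx codes are treated like 500.
  const pauseMs = status === 429 ? 60_000 : status >= 500 ? 30_000 : 0;

  // Assumption: the limit is checked before the counter is incremented.
  const canRetry = failed.attempts < MAX_RETRIES;

  const requeue: Job[] = [];
  // Only the job that actually failed burns an attempt.
  if (canRetry) requeue.push({ ...failed, attempts: failed.attempts + 1 });
  // When the queue pauses, untouched jobs go back with their attempts unchanged.
  if (pauseMs > 0) requeue.push(...remaining);

  return { pauseMs, requeue, dropped: canRetry ? [] : [failed] };
}
```

For example, a 429 on a fresh job with one sibling in the batch yields a 60-second pause and both jobs requeued, with only the failed job's `attempts` raised to 1. A non-429, non-5xx error yields no pause and requeues just the failed job.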
## Classification Override Logic

When multiple sources provide results, the **highest confidence wins**:

```typescript
function selectBestClassification(
  keyword: ThreatClassification,
  ml: ThreatClassification | null,
  llm: ThreatClassification | null
): ThreatClassification {
  // Drop stages that have not produced a result yet
  const candidates = [keyword, ml, llm].filter(Boolean) as ThreatClassification[];
  return candidates.reduce((best, current) =>
    current.confidence > best.confidence ? current : best
  );
}
```

**Result tagging**: Each classification carries its `source` tag (`keyword`, `ml`, `llm`) so downstream consumers can weight confidence accordingly.

## Aggregate Threat for Clusters

News clusters (multiple sources reporting the same story) aggregate threat levels:

```typescript
export function aggregateThreats(
  items: Array<{ threat?: ThreatClassification; tier?: number }>
): ThreatClassification {
  // Only items that carry a classification take part in the aggregate
  const withThreat = items.filter(i => i.threat);

  // Level = max across items (compared by priority, kept as a level name)
  let maxLevel: ThreatLevel = 'info';
  for (const item of withThreat) {
    const level = item.threat!.level;
    if (THREAT_PRIORITY[level] > THREAT_PRIORITY[maxLevel]) maxLevel = level;
  }

  // Category = most frequent
  const catCounts = new Map<ThreatClassification['category'], number>();
  for (const item of withThreat) {
    const cat = item.threat!.category;
    catCounts.set(cat, (catCounts.get(cat) ?? 0) + 1);
  }
  const topCat = [...catCounts.entries()].sort((a, b) => b[1] - a[1])[0][0];

  // Confidence = weighted avg by source tier (lower tier = higher weight)
  let weightedSum = 0;
  let weightTotal = 0;
  for (const item of withThreat) {
    const weight = item.tier ? (6 - Math.min(item.tier, 5)) : 1;
    weightedSum += item.threat!.confidence * weight;
    weightTotal += weight;
  }

  return {
    level: maxLevel,
    category: topCat,
    confidence: weightTotal > 0 ? weightedSum / weightTotal : 0.5,
    source: 'keyword',
  };
}
```

**Source**: `src/services/threat-classifier.ts:521-570`

## Threat Color Mapping

Threat levels are color-coded with CSS variables for theme support:

* Red: `--threat-critical`
* Orange: `--threat-high`
* Yellow: `--threat-medium`
* Green: `--threat-low`
* Blue: `--threat-info`

```typescript
export function getThreatColor(level: ThreatLevel): string {
  return getCSSColor(THREAT_VAR_MAP[level] || '--text-dim');
}
```

**Runtime reads**: Use `getThreatColor()` instead of the static `THREAT_COLORS` object to support light/dark theme switching.

## Example Classifications

```json Critical: Nuclear Threat
{
  "level": "critical",
  "category": "military",
  "confidence": 0.9,
  "source": "keyword",
  "matchedKeyword": "nuclear strike"
}
```

```json High: Military Escalation
{
  "level": "high",
  "category": "conflict",
  "confidence": 0.85,
  "source": "llm",
  "reasoning": "Active military strikes on critical infrastructure"
}
```

```json Medium: Protest Activity
{
  "level": "medium",
  "category": "protest",
  "confidence": 0.7,
  "source": "keyword",
  "matchedKeyword": "riots"
}
```

```json Low: Diplomatic Talks
{
  "level": "low",
  "category": "diplomatic",
  "confidence": 0.6,
  "source": "keyword",
  "matchedKeyword": "peace treaty"
}
```

```json Info: General News
{
  "level": "info",
  "category": "general",
  "confidence": 0.3,
  "source": "keyword",
  "reason": "No threat keywords matched"
}
```

## Key Files

* `src/services/threat-classifier.ts`: Main classification engine
* `src/services/ml-worker.ts`: Browser-side Transformers.js ML
* `api/intelligence/classify-event.ts`: LLM classification handler
* `src/components/ThreatBadge.tsx`: UI threat level indicators