Methodology
The full pipeline from RSS feed to published article. Last updated: April 2026.
In one sentence: market.news ingests 80+ financial news feeds, clusters articles about the same event, synthesizes each cluster with an LLM, runs every synthesis through a second LLM acting as quality reviewer, and publishes only those that pass. This page explains every step in detail.
Why we built it this way
Three architectural decisions shape every other choice:
- Synthesis over aggregation. Aggregators (Google News, Apple News) surface other people's articles. We synthesize multiple sources into one article, which means the reader gets the full picture without clicking through 10 tabs. This requires AI; the volume isn't feasible manually.
- Multi-source by default. Single-source articles are vulnerable to one outlet's biases or errors. We require multi-source clustering except for breaking news from Tier-1 wires (where we trust the wire's reputation).
- Quality gating. AI generates plausible-sounding nonsense more often than people realize. Every article goes through a second-pass AI reviewer before publication. Currently ~40% of attempts get rejected.
The 8-stage pipeline
Stage 1: Ingestion (W1)
An n8n workflow polls 87 RSS feeds every 15 minutes. Sources are categorized by country (US, India, UK, Germany, etc.) and tier:
- Tier 1: wire services (Reuters, AP, Bloomberg) and primary sources (central bank press releases, exchange filings)
- Tier 2: major financial publishers (Financial Times, Wall Street Journal feeds, Mint, Handelsblatt, Korea Economic Daily)
- Tier 3: niche specialists (industry-specific or regional outlets)
Each fetched article is normalized (title, excerpt, source name, source tier, published date, URL) and stored in a Postgres database. We dedupe by URL hash and title hash to avoid processing the same article twice.
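The dedupe step can be sketched as follows. This is a minimal illustration, not our production code: the SHA-256 choice, the trailing-slash trimming, and the `Deduper` class (an in-memory stand-in for unique constraints on two hash columns) are all assumptions made for the example.

```python
import hashlib
import re


def normalize_title(title: str) -> str:
    """Lowercase and collapse everything except letters and digits to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def url_hash(url: str) -> str:
    # Assumption: trim whitespace and a trailing slash before hashing.
    return hashlib.sha256(url.strip().rstrip("/").encode("utf-8")).hexdigest()


def title_hash(title: str) -> str:
    return hashlib.sha256(normalize_title(title).encode("utf-8")).hexdigest()


class Deduper:
    """Hypothetical in-memory stand-in for unique indexes on the two hashes."""

    def __init__(self) -> None:
        self.seen_urls: set[str] = set()
        self.seen_titles: set[str] = set()

    def is_new(self, url: str, title: str) -> bool:
        """Return True and remember the article only if both hashes are unseen."""
        u, t = url_hash(url), title_hash(title)
        if u in self.seen_urls or t in self.seen_titles:
            return False
        self.seen_urls.add(u)
        self.seen_titles.add(t)
        return True
```

Checking both hashes catches the same story republished under a new URL as well as the same URL fetched twice.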
Stage 2: Clustering (W2)
A Postgres function runs every 5 minutes and groups newly-ingested articles by topic similarity. We use trigram similarity on normalized titles (lowercased, alphanumeric only) to cluster articles about the same event together. A cluster is the unit we synthesize.
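To make the idea concrete, here is a rough Python approximation of trigram similarity on normalized titles. The real clustering runs inside Postgres; the word padding imitates how the pg_trgm extension builds trigrams (using pg_trgm at all is an assumption here), and the 0.5 threshold and the greedy `cluster_titles` helper are hypothetical, chosen only for illustration.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase and keep alphanumeric words only."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def trigrams(text: str) -> set[str]:
    """Character trigrams per word, padded with two spaces before and one after."""
    grams: set[str] = set()
    for word in text.split():
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both titles."""
    ta, tb = trigrams(normalize_title(a)), trigrams(normalize_title(b))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cluster_titles(titles: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Greedy sketch: join the first cluster whose first title is similar enough."""
    clusters: list[list[str]] = []
    for title in titles:
        for cluster in clusters:
            if similarity(title, cluster[0]) >= threshold:
                cluster.append(title)
                break
        else:
            clusters.append([title])
    return clusters
```

Because the score depends only on shared character sequences, titles that differ in punctuation, casing, or a trailing phrase still land in the same cluster, while unrelated headlines share too few trigrams to match.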
Stage 3: Promotion
Clusters are eligible for synthesis when they meet at least one of the following criteria:
- 3+ sources from any tier, OR
- 1+ Tier-1 wire source older than 5 minutes, OR
- 2+ Tier-2 sources older than 30 minutes, OR
- 2+ sources of any tier older than 2 hours, OR
- Marketaux pre-vetted source (older than 2 minutes), OR
- Extreme-sentiment article (|sentiment| > 0.5) older than 5 minutes
Single-source clusters that don't meet any of the above criteria stay in "collecting" status until they get a second source or expire.
Stage 4: Synthesis (W3)
An LLM (Anthropic Claude Sonnet) reads all the source articles in a cluster and produces a structured intelligence report. The prompt includes:
- The ingested text of every source article (title and excerpt)
- Country and market context
- Earnings data from Finnhub if applicable
- FRED economic data if it's an economy/Fed story
- Strict instruction to use only facts from the source articles
- Structured JSON schema for the response
The output includes: headline, 5 fact bullets, sentiment score, source breakdown by tier, key numbers (earnings, prices), India/Asia cross-country angle, ripple effects across asset classes, and 3 forward signals ("what to watch next").
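A structured response only helps if it is checked before it moves on. The following is a sketch of such a check, not our actual schema: every field name is hypothetical, and the sentiment range of -1 to 1 is an assumption.

```python
import json

# Hypothetical field names mirroring the output described above.
REQUIRED_FIELDS = {
    "headline",
    "fact_bullets",
    "sentiment",
    "source_breakdown",
    "key_numbers",
    "cross_country_angle",
    "ripple_effects",
    "forward_signals",
}


def validate_report(raw: str) -> dict:
    """Parse the LLM's JSON response and enforce the basic shape."""
    report = json.loads(raw)
    missing = REQUIRED_FIELDS - report.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if len(report["fact_bullets"]) != 5:
        raise ValueError("expected exactly 5 fact bullets")
    if len(report["forward_signals"]) != 3:
        raise ValueError("expected exactly 3 forward signals")
    if not -1.0 <= report["sentiment"] <= 1.0:  # assumed range
        raise ValueError("sentiment out of range")
    return report
```

A shape check like this catches malformed output cheaply, leaving the second-pass reviewer to judge the content itself.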
Stage 5: Quality review (QC Agent)
Every synthesized article passes through a second-pass LLM (Anthropic Claude Haiku) acting as quality reviewer. The reviewer scores 0-100 across six criteria:
- Factual fidelity (40 pts): Every number, quote, and name in the synthesis must appear in the source articles. Hallucinations heavily penalized.
- Headline quality (15 pts): Specific, <90 chars, mentions concrete entity/event/number, not generic clickbait.
- Bullet quality (15 pts): 5 bullets, each a hard fact, each <120 chars.
- Source diversity (10 pts): 2+ distinct sources rewarded; single-source articles are capped at a total score of 70.
- Cross-country angle (10 pts): India/Asia angle present unless the story is purely about India.
- Coherence (10 pts): All bullets about the same event; story category sensible.
Verdict thresholds:
- ≥75: publish
- 60-74: publish but flag for editorial review
- <60: reject (do not publish)
- Any factual hallucination: automatic reject regardless of score
The reviewer's verdict and itemized issues are stored alongside every article in our database, visible on our internal dashboard.
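The scoring and verdict logic can be sketched as below. The criterion keys and verdict labels are hypothetical names, and applying the single-source cap to the total score is an interpretation of the rule above rather than a confirmed detail.

```python
# Maximum points per criterion; the six values sum to 100.
MAX_POINTS = {
    "factual_fidelity": 40,
    "headline_quality": 15,
    "bullet_quality": 15,
    "source_diversity": 10,
    "cross_country_angle": 10,
    "coherence": 10,
}


def verdict(scores: dict[str, int], hallucination_found: bool,
            single_source: bool = False) -> str:
    """Map itemized reviewer scores to a publication verdict."""
    for name, cap in MAX_POINTS.items():
        if not 0 <= scores[name] <= cap:
            raise ValueError(f"{name} must be between 0 and {cap}")
    if hallucination_found:
        return "reject"  # automatic, regardless of score
    total = sum(scores[name] for name in MAX_POINTS)
    if single_source:
        total = min(total, 70)  # assumed: the cap applies to the total
    if total >= 75:
        return "publish"
    if total >= 60:
        return "publish_flagged"
    return "reject"
```

Under this reading, a single-source article can never score above 70, so it is always flagged for editorial review when it is published.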
Stage 6: Publication
Articles passing the quality gate are pushed to Ghost CMS via the Admin API and automatically tagged with country, market, and category tags. Each Ghost post includes a hidden JSON-LD payload (Schema.org NewsArticle) for search engines and the structured indicators for our front-end to render.
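For illustration, a Schema.org NewsArticle payload could be assembled like this. The choice of properties, the function names, and the example values are assumptions; only the property names themselves come from the Schema.org vocabulary.

```python
import json
from datetime import datetime, timezone


def build_news_article_jsonld(headline: str, url: str, published_at: datetime,
                              source_urls: list[str], section: str) -> dict:
    """Build a minimal Schema.org NewsArticle object (property choice is illustrative)."""
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "mainEntityOfPage": url,
        "datePublished": published_at.isoformat(),
        "articleSection": section,
        "isBasedOn": source_urls,  # the source articles behind the synthesis
        "publisher": {"@type": "Organization", "name": "market.news"},
    }


def to_script_tag(payload: dict) -> str:
    """Serialize for embedding in HTML; escape "</" so the JSON cannot close the tag."""
    body = json.dumps(payload).replace("</", "<\\/")
    return f'<script type="application/ld+json">{body}</script>'
```

Listing the source URLs under `isBasedOn` is one way to make the multi-source basis of each article machine-readable.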
Stage 7: Distribution
Published articles flow to:
- The relevant country page on market.news (auto-tagged)
- Per-country news sitemap (Google News-format)
- Daily morning briefing email (per country)
- An auto-created topic on community.market.news for discussion
- Our RSS feed
Stage 8: Monitoring
An internal dashboard tracks pipeline health: ingestion rate, cluster counts, QC pass/review/reject rates, indexing status from Google Search Console, and referral traffic from AI assistants. We use these signals to adjust prompt design, source-tier weights, and promotion thresholds over time.
What we don't do
- We don't scrape paywalled content. Only public RSS feeds.
- We don't republish source articles verbatim. Every article is an original synthesis.
- We don't generate fake quotes or attributions. The QC reviewer specifically flags this.
- We don't accept payment for coverage. Editorial is independent of our advertising business.
- We don't make individual investment recommendations. All content is informational.
- We don't provide real-time prices. Live prices come from third-party widgets (TradingView, Finnhub) with their own delays.
Limitations & honesty
AI-synthesized content has known limitations and we want to be honest about them:
- Even with quality review, errors can slip through. Report any you find.
- Source articles themselves may contain errors that propagate to our synthesis.
- We do not have human reporters in markets; we do not break news. We synthesize what others are reporting.
- The cross-country angle is generated by the same AI that writes the article; it's an inference, not always a sourced claim.
- Our system performs better on stories with multiple Tier-1 sources and worse on niche or single-source stories.
We are continuously improving. The quality reviewer's rejection rate, the promotion criteria, and the synthesis prompts are all under active development.
Tech stack (for the curious)
Frontend: Next.js 14 (App Router), Tailwind CSS, hosted on a self-managed VPS.
Backend: FastAPI (Python), Postgres, Redis.
CMS: Ghost (self-hosted, headless).
Workflow orchestration: n8n (self-hosted).
AI: Anthropic Claude (Sonnet for synthesis, Haiku for quality review).
Data: Finnhub (earnings, FX), FRED (economic indicators), Marketaux (entity-tagged news).
CDN: Cloudflare.
Community: Discourse (self-hosted).
Questions about our methodology? Email [email protected]. See also our Editorial Standards and Corrections Policy.