StoryIntel

100% Cloudflare-Native News Intelligence Platform

Overview

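StoryIntel is a serverless news intelligence platform that crawls, enriches, classifies, and delivers relevant news articles to customers. All compute, storage, and orchestration run on Cloudflare's edge platform; third-party APIs supply article discovery, page fetching, enrichment data, and fallback LLM reasoning.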

Key Features

Real-time News Monitoring - Crawl multiple sources every 15 minutes
AI Classification - Multi-tier classification (rules, vectors, LLM)
Story Clustering - Group articles into evolving narratives
Pluggable Extraction - Extract structured data (entities, events, funding rounds)
Personalized Feeds - Match articles to customer profiles
Intelligent Alerts - Notify on high-relevance matches
Daily Briefings - AI-generated news summaries
Q&A Interface - Ask questions about your feed
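
The multi-tier classification listed above (rules, vectors, LLM) can be pictured as a cascade: cheaper tiers run first and later tiers are consulted only when confidence is low. The sketch below is an assumption about how such a cascade could be wired, not StoryIntel's actual implementation. The names `Tier`, `classifyTiered`, `ruleTier`, and `vectorTier`, and the 0.8 threshold, are hypothetical. Tiers are synchronous here for brevity; real vector and LLM tiers would likely be asynchronous.

```typescript
// Hypothetical types for illustration; not taken from the StoryIntel codebase.
interface Classification {
  label: string;
  confidence: number; // 0..1
  tier: string;       // which tier produced the result
}

interface Tier {
  name: string;
  // Returns null when the tier has no opinion about the text.
  classify(text: string): Classification | null;
}

// Run tiers in order (cheapest first) and stop at the first confident answer.
// If no tier is confident, return the most confident result seen, or null.
function classifyTiered(
  text: string,
  tiers: Tier[],
  minConfidence: number = 0.8 // assumed threshold
): Classification | null {
  let best: Classification | null = null;
  for (const tier of tiers) {
    const result = tier.classify(text);
    if (result === null) continue;
    if (result.confidence >= minConfidence) return result;
    if (best === null || result.confidence > best.confidence) best = result;
  }
  return best;
}

// Stand-in rule tier: a keyword pattern with high confidence.
const ruleTier: Tier = {
  name: "rules",
  classify: (text) =>
    /series [a-d]\b/i.test(text)
      ? { label: "funding", confidence: 0.95, tier: "rules" }
      : null,
};

// Stand-in vector tier: always returns a weak guess.
const vectorTier: Tier = {
  name: "vectors",
  classify: () => ({ label: "business", confidence: 0.6, tier: "vectors" }),
};
```

With these stand-ins, `classifyTiered("Acme raises Series B round", [ruleTier, vectorTier])` is answered by the rule tier alone, while a headline the rules do not match falls through to the weaker vector guess.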

Documentation

Architecture

Document	Description
architecture/overview.md	High-level stack diagram, why Cloudflare
architecture/system-flow.md	Granular pipeline flow (hero diagram)
architecture/cloudflare-foundation.md	Deep dive into each CF component

Plugin System

Document	Description
plugins/overview.md	Extension points: sources, extractors, outputs
plugins/extraction-plugins.md	Built-in extractors and custom creation
plugins/google-news-adapter.md	Google News source adapter

API Reference

Document	Description
api-reference/openapi.yaml	OpenAPI 3.1 specification

Operations

Document	Description
operations/deployment.md	Wrangler config, CI/CD, environments
operations/runbook.md	Operations procedures, troubleshooting
operations/security.md	Authentication, authorization, encryption
operations/testing.md	Test strategy, fixtures, mocks

Reference

Document	Description
reference/database-schema.sql	D1 schema (35+ tables)
reference/clickhouse-schema.sql	ClickHouse analytics schema
reference/taxonomy-seed.md	Classification labels
reference/external-apis.md	Google News, ZenRows, DataForSEO
reference/ai-agents.md	AI agents specification
reference/gaps-and-unknowns.md	Known issues, risks
reference/phased-build-plan.md	Implementation phases

Architecture at a Glance

                        CLOUDFLARE EDGE NETWORK
    ┌──────────────────────────────────────────────────────────────┐
    │                                                              │
    │   ┌──────────────────────────────────────────────────────┐   │
    │   │                    API GATEWAY                       │   │
    │   │              Cloudflare Workers (Edge)               │   │
    │   │         Auth • Rate Limiting • Routing               │   │
    │   └───────────────────────┬──────────────────────────────┘   │
    │                           │                                  │
    │   ┌───────────────────────┴──────────────────────────────┐   │
    │   │              PROCESSING PIPELINE                     │   │
    │   │  ┌─────────────────────────────────────────────────┐ │   │
    │   │  │ WORKFLOWS (Control)    QUEUES (Data)            │ │   │
    │   │  │ IngestKeyword          crawl.batch              │ │   │
    │   │  │ ProcessArticle         article.extract          │ │   │
    │   │  │ StoryCluster           notify.dispatch          │ │   │
    │   │  └─────────────────────────────────────────────────┘ │   │
    │   └──────────────────────────────────────────────────────┘   │
    │                           │                                  │
    │   ┌───────────────────────┴──────────────────────────────┐   │
    │   │              CLOUDFLARE FOUNDATION                   │   │
    │   │  D1 (SQLite) • R2 (Objects) • KV (Cache)            │   │
    │   │  Vectorize (7 indexes) • Workers AI                 │   │
    │   └──────────────────────────────────────────────────────┘   │
    │                                                              │
    └──────────────────────────────────────────────────────────────┘
                                │
    ┌───────────────────────────┴──────────────────────────────────┐
    │                    EXTERNAL SERVICES                         │
    │  Google News • ZenRows • DataForSEO • SharedCount • OpenAI  │
    └──────────────────────────────────────────────────────────────┘

See architecture/overview.md for the full stack diagram.

Quick Start

Prerequisites

Node.js 18+
pnpm 8+
Cloudflare account with Workers Paid plan
Wrangler CLI

Installation

# Clone repository
git clone https://github.com/lovelady/storyintel.git
cd storyintel/api

# Install dependencies
pnpm install

# Authenticate with Cloudflare
wrangler login

# Create D1 database
wrangler d1 create storyintel-dev

# Run migrations
wrangler d1 migrations apply storyintel-dev --local

# Start development server
pnpm dev

Configuration

Copy the example wrangler config:

cp wrangler.example.toml wrangler.toml

Set required secrets:

wrangler secret put ZENROWS_API_KEY
wrangler secret put DATA4SEO_API_KEY
wrangler secret put SHAREDCOUNT_API_KEY
wrangler secret put OPENAI_API_KEY
wrangler secret put JWT_SECRET
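
Secrets set with `wrangler secret put` are exposed to a Worker as string properties on its environment object. A minimal sketch of a startup check for the five secrets above follows; `REQUIRED_SECRETS`, `SecretEnv`, and `missingSecrets` are hypothetical names, and whether StoryIntel performs such a check is an assumption.

```typescript
// Secret names copied from the `wrangler secret put` commands above.
const REQUIRED_SECRETS = [
  "ZENROWS_API_KEY",
  "DATA4SEO_API_KEY",
  "SHAREDCOUNT_API_KEY",
  "OPENAI_API_KEY",
  "JWT_SECRET",
] as const;

type SecretName = (typeof REQUIRED_SECRETS)[number];

// Hypothetical slice of the Worker environment holding only the secrets.
type SecretEnv = Partial<Record<SecretName, string>>;

// Return the names of secrets that are unset or empty.
function missingSecrets(env: SecretEnv): SecretName[] {
  return REQUIRED_SECRETS.filter((name) => !env[name]);
}
```

A Worker could call `missingSecrets(env)` at the top of its handler and return a 500 with the missing names, which tends to be easier to diagnose than a failure deep inside an external API call.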

Project Structure

storyintel/
├── api/
│   ├── src/
│   │   ├── workers/          # Worker entry points
│   │   ├── workflows/        # Cloudflare Workflows
│   │   ├── services/         # Business logic
│   │   ├── db/               # Database layer
│   │   └── shared/           # Utilities
│   ├── migrations/           # D1 migrations
│   ├── docs/                 # Documentation (you are here)
│   │   ├── architecture/     # System design
│   │   ├── plugins/          # Extension points
│   │   ├── api-reference/    # OpenAPI spec
│   │   ├── operations/       # Deployment, runbook
│   │   └── reference/        # Schema, external APIs
│   └── tests/                # Test suites
├── console/                  # Admin console (React)
└── client-web/               # Customer web app

Cloudflare Resources

Workers (8)

Worker	Purpose
`api-gateway`	Public API routing
`admin-api`	Admin endpoints
`crawl-consumer`	Content fetching
`extract-consumer`	HTML parsing
`enrich-consumer`	Social metrics, backlinks
`classify-consumer`	Classification
`cluster-consumer`	Story clustering
`notify-consumer`	Alert dispatch

Queues (9)

Queue	Purpose
`crawl.batch`	Crawl jobs
`article.extract`	Extraction jobs
`article.enrich`	Enrichment jobs
`article.embed`	Embedding jobs
`article.classify`	Classification jobs
`story.cluster`	Clustering jobs
`profile.match`	Matching jobs
`notify.dispatch`	Notification jobs
`cost.track`	Cost tracking

Vectorize Indexes (7)

Index	Purpose
`articles`	Article embeddings (1536 dims)
`stories`	Story centroids
`profiles`	Customer preferences
`taxonomy`	Classification labels
`entities`	Named entities
`locations`	Geographic locations (225K+)
`authors`	Author embeddings
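
To illustrate what the `articles` and `profiles` indexes are used for, the sketch below ranks customer profiles against one article embedding by cosine similarity. Vectorize performs the nearest-neighbour search itself, so this in-memory version is only an illustration of the idea. The choice of cosine as the metric, the `ProfileVector` shape, and the `matchProfiles` helper are assumptions.

```typescript
// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical shape of a stored profile embedding.
interface ProfileVector {
  profileId: string;
  values: number[];
}

// Score every profile against the article, drop weak matches, best first.
function matchProfiles(
  article: number[],
  profiles: ProfileVector[],
  minScore: number
): { profileId: string; score: number }[] {
  return profiles
    .map((p) => ({ profileId: p.profileId, score: cosineSimilarity(article, p.values) }))
    .filter((m) => m.score >= minScore)
    .sort((x, y) => y.score - x.score);
}
```

Real embeddings would have the index's full dimensionality (1536 for `articles`); two-dimensional vectors are enough to see the ranking behaviour.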

Workflows (3)

Workflow	Purpose
`IngestKeyword`	Cron-triggered acquisition
`StoryCluster`	Periodic re-clustering
`RetentionCleanup`	Data lifecycle

API Overview

Authentication

# Customer API
curl -H "X-API-Key: si_live_abc123..." https://api.storyintel.com/v1/feed

# Admin API
curl -H "Authorization: Bearer eyJ..." https://api.storyintel.com/v1/admin/...
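
On the gateway side, the two header styles above could be told apart roughly as follows. This is a sketch under assumptions: `AuthScheme` and `detectAuthScheme` are hypothetical names, giving `X-API-Key` precedence when both headers are present is an arbitrary choice, and no key or token validation is shown.

```typescript
type AuthScheme =
  | { kind: "api-key"; key: string }
  | { kind: "bearer"; token: string }
  | { kind: "none" };

// Decide which credential a request carries, based only on its headers.
function detectAuthScheme(headers: Record<string, string>): AuthScheme {
  // HTTP header names are case-insensitive, so normalise them first.
  const lower: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    lower[name.toLowerCase()] = headers[name];
  }

  const apiKey = lower["x-api-key"];
  if (apiKey && apiKey.trim() !== "") {
    return { kind: "api-key", key: apiKey.trim() };
  }

  const auth = lower["authorization"];
  const match = auth ? /^Bearer\s+(\S+)$/i.exec(auth.trim()) : null;
  if (match) {
    return { kind: "bearer", token: match[1] };
  }

  return { kind: "none" };
}
```

A request with neither header, or with an `Authorization` header that is not a Bearer token, falls through to `none` and would typically receive a 401.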

Key Endpoints

# Feed and Discovery
GET  /v1/feed              - Personalized article feed
GET  /v1/stories           - Story feed
GET  /v1/articles/:id      - Single article
GET  /v1/search            - Full-text + semantic search

# Intelligence
GET  /v1/briefings/daily   - AI daily briefing
POST /v1/qa                - Ask questions

# Profile Management
GET  /v1/profiles          - List profiles
POST /v1/profiles          - Create profile

# Admin
POST /v1/admin/crawl/trigger   - Manual crawl
GET  /v1/admin/pipeline/status - System health
GET  /v1/admin/costs           - Cost tracking

See api-reference/openapi.yaml for complete specification.

Plugin Architecture

StoryIntel is extensible at three points:

Extension Point	Purpose	Examples
Source Adapters	Where we crawl from	Google News, RSS, Twitter
Extraction Plugins	What data we extract	Entities, Events, Funding Rounds
Output Adapters	Where data goes	Email, Slack, Webhook, Airtable

See plugins/overview.md for details.
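
As a rough mental model of the three extension points, the sketch below wires one source adapter, a list of extraction plugins, and a list of output adapters together. Every interface and name here (`SourceAdapter`, `ExtractionPlugin`, `OutputAdapter`, `runPlugins`, and the stand-in implementations) is hypothetical; the real contracts are the ones documented in plugins/overview.md.

```typescript
// Hypothetical shapes for illustration only.
interface Article {
  url: string;
  title: string;
  body: string;
}

interface SourceAdapter {
  name: string;
  discover(keyword: string): Article[];
}

interface ExtractionPlugin {
  name: string;
  extract(article: Article): unknown;
}

interface OutputAdapter {
  name: string;
  deliver(article: Article, extracted: Record<string, unknown>): void;
}

// Discover articles, run every extractor on each, then hand results to every output.
function runPlugins(
  keyword: string,
  source: SourceAdapter,
  extractors: ExtractionPlugin[],
  outputs: OutputAdapter[]
): number {
  const articles = source.discover(keyword);
  for (const article of articles) {
    const extracted: Record<string, unknown> = {};
    for (const plugin of extractors) extracted[plugin.name] = plugin.extract(article);
    for (const output of outputs) output.deliver(article, extracted);
  }
  return articles.length;
}

// Stand-in implementations so the flow can be exercised without any network access.
const staticSource: SourceAdapter = {
  name: "static",
  discover: (keyword) => [
    { url: "https://example.com/a", title: `${keyword} raises $5M`, body: "" },
  ],
};

const fundingExtractor: ExtractionPlugin = {
  name: "funding",
  extract: (article) => {
    const m = /\$(\d+)M/.exec(article.title);
    return m ? { amountUsd: Number(m[1]) * 1000000 } : null;
  },
};

const delivered: string[] = [];
const memoryOutput: OutputAdapter = {
  name: "memory",
  deliver: (article, extracted) => {
    delivered.push(`${article.url} ${JSON.stringify(extracted)}`);
  },
};
```

Keying the extracted data by plugin name keeps extractors independent of each other, so adding a new one does not change what existing outputs receive from the others.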

External Services

Service	Purpose	Cost Model
Google News	Article discovery	Free (rate limited)
ZenRows	Anti-bot bypass	Per request (~$0.005)
DataForSEO	Backlinks, fallback	Per request (~$0.004)
SharedCount	Social metrics	Per request (~$0.0001)
Workers AI	Embeddings, LLM	Included in plan
OpenAI	Complex reasoning	Per token (fallback)

See reference/external-apis.md for integration details.

Cost Management

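All paid operations are recorded in `cost_events`, with amounts stored in micro-dollars (`cost_micros`). To total today's spend per service: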

SELECT service, SUM(cost_micros)/1000000.0 as usd
FROM cost_events
WHERE date(timestamp) = date('now')
GROUP BY service;

Typical cost: $0.005 - $0.015 per article

At 10,000 articles/day: ~$50-150/day
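
The arithmetic behind these figures can be checked with a few lines of code. The sketch below mirrors the per-service aggregation of the SQL query in memory and projects daily spend from a per-article cost. `CostEvent`, `usdByService`, and `projectDailyUsd` are hypothetical names; keeping amounts as integer micro-dollars until the final division avoids accumulating floating-point error.

```typescript
const MICROS_PER_USD = 1000000;

// Hypothetical in-memory form of a cost_events row.
interface CostEvent {
  service: string;
  costMicros: number;
}

// Sum micro-dollars per service, then convert each total to USD.
function usdByService(events: CostEvent[]): Record<string, number> {
  const micros: Record<string, number> = {};
  for (const e of events) {
    micros[e.service] = (micros[e.service] ?? 0) + e.costMicros;
  }
  const usd: Record<string, number> = {};
  for (const service of Object.keys(micros)) {
    usd[service] = micros[service] / MICROS_PER_USD;
  }
  return usd;
}

// Project daily spend in USD from volume and a per-article cost in micro-dollars.
function projectDailyUsd(articlesPerDay: number, perArticleMicros: number): number {
  return (articlesPerDay * perArticleMicros) / MICROS_PER_USD;
}
```

For example, `projectDailyUsd(10000, 5000)` gives 50 and `projectDailyUsd(10000, 15000)` gives 150, matching the range above. If an article triggered exactly one ZenRows, one DataForSEO, and one SharedCount request at the listed prices (an assumption about call patterns), it would cost 5000 + 4000 + 100 = 9100 micro-dollars, or about $0.0091, which sits inside the typical range.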

Contributing

Create a feature branch from main
Write tests for new functionality
Ensure all tests pass: pnpm test
Submit a PR for review

License

Built on Cloudflare Workers, D1, R2, KV, Vectorize, Queues, Workflows, and Workers AI

Last updated: December 19, 2024 at 10:45 PM PST
