StoryIntel
100% Cloudflare-Native News Intelligence Platform
Overview
StoryIntel is a serverless news intelligence platform that crawls, enriches, classifies, and delivers relevant news articles to customers. The entire infrastructure runs on Cloudflare's edge platform.
Key Features
- Real-time News Monitoring - Crawl multiple sources every 15 minutes
- AI Classification - Multi-tier classification (rules, vectors, LLM)
- Story Clustering - Group articles into evolving narratives
- Pluggable Extraction - Extract structured data (entities, events, funding rounds)
- Personalized Feeds - Match articles to customer profiles
- Intelligent Alerts - Notify on high-relevance matches
- Daily Briefings - AI-generated news summaries
- Q&A Interface - Ask questions about your feed
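The multi-tier classification listed above (rules, then vectors, then LLM) can be pictured as a cascade in which cheaper tiers answer first and costlier ones run only when confidence is low. The following is a minimal sketch, not StoryIntel's actual implementation: the `Tier` type, the `classify` function, the stand-in tiers, and the 0.8 threshold are all hypothetical.

```typescript
// Hypothetical sketch of a tiered classifier; names and threshold are assumptions.
interface Verdict {
  label: string;
  confidence: number; // 0..1
  tier: string;
}

type Tier = {
  name: string;
  run: (text: string) => Promise<{ label: string; confidence: number } | null>;
};

// Try each tier in order and stop at the first sufficiently confident result.
async function classify(
  text: string,
  tiers: Tier[],
  minConfidence = 0.8,
): Promise<Verdict | null> {
  let best: Verdict | null = null;
  for (const tier of tiers) {
    const result = await tier.run(text);
    if (!result) continue;
    const verdict: Verdict = { ...result, tier: tier.name };
    if (verdict.confidence >= minConfidence) return verdict;
    if (!best || verdict.confidence > best.confidence) best = verdict;
  }
  return best; // otherwise fall back to the most confident weak guess
}

// Stand-in tiers: a keyword rule, then a stubbed "vector" tier.
const rulesTier: Tier = {
  name: "rules",
  run: async (text) =>
    /series [a-d]\b/i.test(text) ? { label: "funding", confidence: 0.95 } : null,
};
const vectorTier: Tier = {
  name: "vectors",
  run: async () => ({ label: "general", confidence: 0.6 }),
};
```

Ordering tiers from cheapest to most expensive keeps paid LLM calls for the articles that the earlier tiers cannot settle.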
Documentation
Architecture
| Document | Description |
|---|---|
| architecture/overview.md | High-level stack diagram, why Cloudflare |
| architecture/system-flow.md | Granular pipeline flow (hero diagram) |
| architecture/cloudflare-foundation.md | Deep dive into each CF component |
Plugin System
| Document | Description |
|---|---|
| plugins/overview.md | Extension points: sources, extractors, outputs |
| plugins/extraction-plugins.md | Built-in extractors and custom creation |
| plugins/google-news-adapter.md | Google News source adapter |
API Reference
| Document | Description |
|---|---|
| api-reference/openapi.yaml | OpenAPI 3.1 specification |
Operations
| Document | Description |
|---|---|
| operations/deployment.md | Wrangler config, CI/CD, environments |
| operations/runbook.md | Operations procedures, troubleshooting |
| operations/security.md | Authentication, authorization, encryption |
| operations/testing.md | Test strategy, fixtures, mocks |
Reference
| Document | Description |
|---|---|
| reference/database-schema.sql | D1 schema (35+ tables) |
| reference/clickhouse-schema.sql | ClickHouse analytics schema |
| reference/taxonomy-seed.md | Classification labels |
| reference/external-apis.md | Google News, ZenRows, DataForSEO |
| reference/ai-agents.md | AI agents specification |
| reference/gaps-and-unknowns.md | Known issues, risks |
| reference/phased-build-plan.md | Implementation phases |
Architecture at a Glance
CLOUDFLARE EDGE NETWORK
┌──────────────────────────────────────────────────────────────┐
│                                                              │
│   ┌──────────────────────────────────────────────────────┐   │
│   │                     API GATEWAY                      │   │
│   │              Cloudflare Workers (Edge)               │   │
│   │            Auth • Rate Limiting • Routing            │   │
│   └───────────────────────┬──────────────────────────────┘   │
│                           │                                  │
│   ┌───────────────────────┴──────────────────────────────┐   │
│   │                 PROCESSING PIPELINE                  │   │
│   │  ┌────────────────────────────────────────────────┐  │   │
│   │  │  WORKFLOWS (Control)     QUEUES (Data)         │  │   │
│   │  │  IngestKeyword           crawl.batch           │  │   │
│   │  │  ProcessArticle          article.extract       │  │   │
│   │  │  StoryCluster            notify.dispatch       │  │   │
│   │  └────────────────────────────────────────────────┘  │   │
│   └───────────────────────┬──────────────────────────────┘   │
│                           │                                  │
│   ┌───────────────────────┴──────────────────────────────┐   │
│   │                CLOUDFLARE FOUNDATION                 │   │
│   │       D1 (SQLite) • R2 (Objects) • KV (Cache)        │   │
│   │          Vectorize (7 indexes) • Workers AI          │   │
│   └──────────────────────────────────────────────────────┘   │
│                                                              │
└───────────────────────────┬──────────────────────────────────┘
                            │
┌───────────────────────────┴──────────────────────────────────┐
│                      EXTERNAL SERVICES                       │
│  Google News • ZenRows • DataForSEO • SharedCount • OpenAI   │
└──────────────────────────────────────────────────────────────┘
See architecture/overview.md for the full stack diagram.
Quick Start
Prerequisites
- Node.js 18+
- pnpm 8+
- Cloudflare account with Workers Paid plan
- Wrangler CLI
Installation
# Clone repository
git clone https://github.com/lovelady/storyintel.git
cd storyintel/api
# Install dependencies
pnpm install
# Authenticate with Cloudflare
wrangler login
# Create D1 database
wrangler d1 create storyintel-dev
# Run migrations
wrangler d1 migrations apply storyintel-dev --local
# Start development server
pnpm dev
Configuration
Copy the example wrangler config:
cp wrangler.example.toml wrangler.toml
Set required secrets:
wrangler secret put ZENROWS_API_KEY
wrangler secret put DATA4SEO_API_KEY
wrangler secret put SHAREDCOUNT_API_KEY
wrangler secret put OPENAI_API_KEY
wrangler secret put JWT_SECRET
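A Worker that starts without one of these secrets tends to fail late, on the first outbound call that needs it. As a hedged sketch, a startup check could report missing names up front. The `missingSecrets` helper is hypothetical, and `env` stands for whatever bindings object the Worker receives; only the five secret names come from the commands above.

```typescript
// Secret names taken from the `wrangler secret put` commands above.
const REQUIRED_SECRETS = [
  "ZENROWS_API_KEY",
  "DATA4SEO_API_KEY",
  "SHAREDCOUNT_API_KEY",
  "OPENAI_API_KEY",
  "JWT_SECRET",
] as const;

// Hypothetical helper: returns the names that are absent or empty in `env`.
function missingSecrets(env: Record<string, unknown>): string[] {
  return REQUIRED_SECRETS.filter((secretName) => {
    const value = env[secretName];
    return typeof value !== "string" || value.length === 0;
  });
}
```

Calling `missingSecrets(env)` at the top of a request handler and returning an error when the result is non-empty would make a misconfigured environment obvious.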
Project Structure
storyintel/
├── api/
│ ├── src/
│ │ ├── workers/ # Worker entry points
│ │ ├── workflows/ # Cloudflare Workflows
│ │ ├── services/ # Business logic
│ │ ├── db/ # Database layer
│ │ └── shared/ # Utilities
│ ├── migrations/ # D1 migrations
│ ├── docs/ # Documentation (you are here)
│ │ ├── architecture/ # System design
│ │ ├── plugins/ # Extension points
│ │ ├── api-reference/ # OpenAPI spec
│ │ ├── operations/ # Deployment, runbook
│ │ └── reference/ # Schema, external APIs
│ └── tests/ # Test suites
├── console/ # Admin console (React)
└── client-web/ # Customer web app
Cloudflare Resources
Workers (8)
| Worker | Purpose |
|---|---|
| api-gateway | Public API routing |
| admin-api | Admin endpoints |
| crawl-consumer | Content fetching |
| extract-consumer | HTML parsing |
| enrich-consumer | Social metrics, backlinks |
| classify-consumer | Classification |
| cluster-consumer | Story clustering |
| notify-consumer | Alert dispatch |
Queues (9)
| Queue | Purpose |
|---|---|
| crawl.batch | Crawl jobs |
| article.extract | Extraction jobs |
| article.enrich | Enrichment jobs |
| article.embed | Embedding jobs |
| article.classify | Classification jobs |
| story.cluster | Clustering jobs |
| profile.match | Matching jobs |
| notify.dispatch | Notification jobs |
| cost.track | Cost tracking |
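Because queue names are plain strings, a typo in a producer only surfaces at runtime. One way to guard against that, sketched here under the assumption that the nine names above are the complete set, is a literal union type plus a runtime guard. `QUEUE_NAMES`, `QueueName`, and `isQueueName` are illustrative names, not identifiers from the codebase.

```typescript
// Queue names copied from the table above.
const QUEUE_NAMES = [
  "crawl.batch",
  "article.extract",
  "article.enrich",
  "article.embed",
  "article.classify",
  "story.cluster",
  "profile.match",
  "notify.dispatch",
  "cost.track",
] as const;

// Union of the literal names, e.g. "crawl.batch" | "article.extract" | ...
type QueueName = (typeof QUEUE_NAMES)[number];

// Runtime guard for names arriving from config or request payloads.
function isQueueName(value: string): value is QueueName {
  return (QUEUE_NAMES as readonly string[]).indexOf(value) !== -1;
}
```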
Vectorize Indexes (7)
| Index | Purpose |
|---|---|
| articles | Article embeddings (1536 dims) |
| stories | Story centroids |
| profiles | Customer preferences |
| taxonomy | Classification labels |
| entities | Named entities |
| locations | Geographic locations (225K+) |
| authors | Author embeddings |
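The `stories` index is described as holding story centroids. How those centroids are computed is not stated here; a common approach, shown below purely as an assumption, is to keep a running mean of member embeddings and compare new articles to it by cosine similarity. Both function names are hypothetical.

```typescript
// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Running mean: fold one new embedding into a centroid of `memberCount` members.
function updateCentroid(
  centroid: number[],
  memberCount: number,
  embedding: number[],
): number[] {
  if (centroid.length !== embedding.length) throw new Error("dimension mismatch");
  return centroid.map((c, i) => c + (embedding[i] - c) / (memberCount + 1));
}
```

The incremental form avoids re-reading every member embedding each time an article joins a story.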
Workflows (3)
| Workflow | Purpose |
|---|---|
| IngestKeyword | Cron-triggered acquisition |
| StoryCluster | Periodic re-clustering |
| RetentionCleanup | Data lifecycle |
API Overview
Authentication
# Customer API
curl -H "X-API-Key: si_live_abc123..." https://api.storyintel.com/v1/feed
# Admin API
curl -H "Authorization: Bearer eyJ..." https://api.storyintel.com/v1/admin/...
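The same two schemes can be expressed in client code. The sketch below only builds header objects and makes no network call; the `Credentials` type and `authHeaders` function are hypothetical, while the header names come from the curl examples above.

```typescript
// Hypothetical credential shapes for the two APIs.
type Credentials =
  | { kind: "customer"; apiKey: string }
  | { kind: "admin"; token: string };

// Map a credential to the HTTP header the corresponding API expects.
function authHeaders(creds: Credentials): Record<string, string> {
  switch (creds.kind) {
    case "customer":
      return { "X-API-Key": creds.apiKey };
    case "admin":
      return { Authorization: `Bearer ${creds.token}` };
  }
}
```

The result can be passed as the `headers` option of `fetch`, for example when requesting `https://api.storyintel.com/v1/feed` with a customer key.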
Key Endpoints
# Feed and Discovery
GET /v1/feed - Personalized article feed
GET /v1/stories - Story feed
GET /v1/articles/:id - Single article
GET /v1/search - Full-text + semantic search
# Intelligence
GET /v1/briefings/daily - AI daily briefing
POST /v1/qa - Ask questions
# Profile Management
GET /v1/profiles - List profiles
POST /v1/profiles - Create profile
# Admin
POST /v1/admin/crawl/trigger - Manual crawl
GET /v1/admin/pipeline/status - System health
GET /v1/admin/costs - Cost tracking
See api-reference/openapi.yaml for complete specification.
Plugin Architecture
StoryIntel is extensible at three points:
| Extension Point | Purpose | Examples |
|---|---|---|
| Source Adapters | Where we crawl from | Google News, RSS, Twitter |
| Extraction Plugins | What data we extract | Entities, Events, Funding Rounds |
| Output Adapters | Where data goes | Email, Slack, Webhook, Airtable |
See plugins/overview.md for details.
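To make the three extension points concrete, here is one possible shape for them in TypeScript. These interfaces are illustrative assumptions, not the contracts defined in plugins/overview.md, and the `fundingAmounts` extractor and `runExtractors` helper are invented for the example.

```typescript
// Hypothetical shapes for the three extension points; the real contracts may differ.
interface SourceAdapter {
  name: string;
  discover(keyword: string): Promise<string[]>; // article URLs
}
interface ExtractionPlugin {
  name: string;
  extract(articleText: string): Array<Record<string, string>>;
}
interface OutputAdapter {
  name: string;
  deliver(payload: unknown): Promise<void>;
}

// Example extractor: pulls dollar amounts such as "$12M" out of article text.
const fundingAmounts: ExtractionPlugin = {
  name: "funding-amounts",
  extract(articleText) {
    const matches = articleText.match(/\$\d+(?:\.\d+)?[MBK]?/g) ?? [];
    return matches.map((amount) => ({ type: "funding_amount", amount }));
  },
};

// Run every registered extractor and key the results by plugin name.
function runExtractors(
  text: string,
  plugins: ExtractionPlugin[],
): Record<string, Array<Record<string, string>>> {
  const results: Record<string, Array<Record<string, string>>> = {};
  for (const plugin of plugins) results[plugin.name] = plugin.extract(text);
  return results;
}
```

Keeping each extension point behind a small interface lets a new source, extractor, or output be added without touching the pipeline that calls it.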
External Services
| Service | Purpose | Cost Model |
|---|---|---|
| Google News | Article discovery | Free (rate limited) |
| ZenRows | Anti-bot bypass | Per request (~$0.005) |
| DataForSEO | Backlinks, fallback | Per request (~$0.004) |
| SharedCount | Social metrics | Per request (~$0.0001) |
| Workers AI | Embeddings, LLM | Included in plan |
| OpenAI | Complex reasoning | Per token (fallback) |
See reference/external-apis.md for integration details.
Cost Management
All paid operations are tracked:
SELECT service, SUM(cost_micros)/1000000.0 as usd
FROM cost_events
WHERE date(timestamp) = date('now')
GROUP BY service;
Typical cost: $0.005 - $0.015 per article
At 10,000 articles/day: ~$50-150/day
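The daily estimate follows directly from the per-article range. The sketch below reproduces that arithmetic in integer micro-dollars, matching the `cost_micros` column used in the query above; the function names and default parameters are assumptions made for illustration.

```typescript
const MICROS_PER_USD = 1000000;

// Convert an integer micro-dollar amount to USD.
function microsToUsd(costMicros: number): number {
  return costMicros / MICROS_PER_USD;
}

// Project a daily USD range from per-article costs given in micro-dollars.
// Defaults mirror the $0.005 to $0.015 per-article figures above.
function dailyCostRangeUsd(
  articlesPerDay: number,
  lowMicrosPerArticle = 5000,
  highMicrosPerArticle = 15000,
): [number, number] {
  return [
    microsToUsd(articlesPerDay * lowMicrosPerArticle),
    microsToUsd(articlesPerDay * highMicrosPerArticle),
  ];
}

const [lowUsd, highUsd] = dailyCostRangeUsd(10000);
console.log(`$${lowUsd} to $${highUsd} per day`); // prints "$50 to $150 per day"
```

Working in integer micro-dollars until the final division avoids accumulating floating-point error when many small charges are summed.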
Contributing
- Create feature branch from main
- Write tests for new functionality
- Ensure all tests pass: pnpm test
- Submit PR for review
License
Proprietary - All rights reserved
Built on Cloudflare Workers, D1, R2, KV, Vectorize, Queues, Workflows, and Workers AI
Last updated: December 19, 2024 at 10:45 PM PST