External API Integrations

Integration specifications for Google News, ZenRows, RapidAPI, DataForSEO, SharedCount, and AI services


Table of Contents

  1. Google News
  2. ZenRows
  3. RapidAPI Google News
  4. Fetch Fallback Strategy
  5. DataForSEO
  6. SharedCount
  7. OpenAI
  8. Workers AI
  9. Error Handling
  10. Cost Tracking
  11. Rate Limiting

Google News

See also: Google News Crawling Playbook for comprehensive crawl strategy, rate limiting, and failure handling.

Overview

Google News is our primary discovery source. We use both RSS feeds and HTML scraping.

RSS Feed Endpoints

Base URL: https://news.google.com

| Endpoint | Description | Example |
|----------|-------------|---------|
| /rss | Top stories | https://news.google.com/rss |
| /rss/search | Keyword search | https://news.google.com/rss/search?q=artificial+intelligence |
| /rss/topics/{topic} | Topic feed | https://news.google.com/rss/topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRGRqTVhZU0FtVnVHZ0pWVXlnQVAB |

Query Parameters

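| Parameter | Description | Example |
|-----------|-------------|---------|
| q | Search query | q=artificial+intelligence |
| hl | Language | hl=en |
| gl | Country | gl=US |
| ceid | Combined locale (country:language) | ceid=US:en |
| when | Time filter, written as an operator inside q | when:1d, when:7d |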
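
As a sketch of how these parameters combine, the helper below assembles a feed URL. The name `buildGoogleNewsRssUrl` and its option names are hypothetical, and it assumes the time filter travels inside `q` as a `when:` operator, as the example column suggests.

```typescript
// Hypothetical helper: assembles a Google News RSS URL from the parameters above.
interface RssUrlOptions {
  query?: string;
  language?: string; // hl, defaults to "en"
  country?: string; // gl, defaults to "US"
  when?: string; // e.g. "1d" or "7d"; appended to q as "when:1d"
}

function buildGoogleNewsRssUrl(options: RssUrlOptions = {}): string {
  const hl = options.language ?? "en";
  const gl = options.country ?? "US";

  // Assumption: the time filter is a search operator inside q, not its own parameter.
  const q = [options.query, options.when ? `when:${options.when}` : undefined]
    .filter((part): part is string => Boolean(part))
    .join(" ");

  const params = new URLSearchParams();
  if (q) params.set("q", q);
  params.set("hl", hl);
  params.set("gl", gl);
  params.set("ceid", `${gl}:${hl}`);

  // Any q value goes to the search endpoint; otherwise request top stories.
  const path = q ? "/rss/search" : "/rss";
  return `https://news.google.com${path}?${params.toString()}`;
}
```

Calling `buildGoogleNewsRssUrl({ query: "artificial intelligence", when: "1d" })` targets `/rss/search`, while calling it with no arguments targets the top-stories feed.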

Request Example

// src/services/google-news.ts

interface GoogleNewsOptions {
  query?: string;
  language?: string; // ISO 639-1
  country?: string; // ISO 3166-1 alpha-2
  when?: '1h' | '1d' | '7d' | '30d';
}

export async function fetchGoogleNewsRSS(options: GoogleNewsOptions): Promise<RSSFeed> {
  const params = new URLSearchParams();

  // `when` is a search operator, so it is appended to q rather than sent on its own.
  const q = [options.query, options.when ? `when:${options.when}` : '']
    .filter(Boolean)
    .join(' ');
  if (q) {
    params.set('q', q);
  }

  const hl = options.language || 'en';
  const gl = options.country || 'US';
  params.set('hl', hl);
  params.set('gl', gl);
  params.set('ceid', `${gl}:${hl}`);

  // Any q value (query, time filter, or both) must go to the search endpoint.
  const url = q
    ? `https://news.google.com/rss/search?${params}`
    : `https://news.google.com/rss?${params}`;

  const response = await fetch(url, {
    headers: {
      'User-Agent': 'Mozilla/5.0 (compatible; Noozer/1.0)',
      'Accept': 'application/rss+xml, application/xml, text/xml',
    },
  });

  if (!response.ok) {
    throw new GoogleNewsError(`RSS fetch failed: ${response.status}`);
  }

  return parseRSS(await response.text());
}

RSS Response Structure

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>artificial intelligence - Google News</title>
<link>https://news.google.com/search?q=artificial%20intelligence</link>
<item>
<title>AI Breakthrough Announced - Tech News</title>
<link>https://news.google.com/rss/articles/CBMi...</link>
<guid isPermaLink="false">CBMi...</guid>
<pubDate>Mon, 15 Jan 2024 12:00:00 GMT</pubDate>
<description>Latest developments in artificial intelligence...</description>
<source url="https://technews.com">Tech News</source>
</item>
</channel>
</rss>
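
The `parseRSS` helper used earlier is not shown in this document. As a rough sketch of what item extraction could look like for the structure above, the code below pulls fields out with regular expressions. `RssItem`, `tagText`, and `parseRssItems` are hypothetical names, and the approach assumes the flat layout shown here with no CDATA sections or escaped entities; a real XML parser would be more robust.

```typescript
// Hypothetical shape for a parsed feed item; fields mirror the XML above.
interface RssItem {
  title: string;
  link: string;
  guid: string;
  pubDate: string;
  sourceName: string;
  sourceUrl: string;
}

// Returns the trimmed text of the first <tag>...</tag> inside a fragment, or "".
function tagText(fragment: string, tag: string): string {
  const match = fragment.match(new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`));
  return match ? match[1].trim() : "";
}

// Regex-based extraction: adequate for this flat structure, not a general XML parser.
function parseRssItems(xml: string): RssItem[] {
  const items = xml.match(/<item>[\s\S]*?<\/item>/g) ?? [];
  return items.map((item) => {
    const sourceUrl = item.match(/<source[^>]*\burl="([^"]*)"/);
    return {
      title: tagText(item, "title"),
      link: tagText(item, "link"),
      guid: tagText(item, "guid"),
      pubDate: tagText(item, "pubDate"),
      sourceName: tagText(item, "source"),
      sourceUrl: sourceUrl ? sourceUrl[1] : "",
    };
  });
}
```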

URL Resolution

Google News item links point at a Google redirect, not at the publisher. Resolve them by reading the redirect's Location header; when no redirect is returned, fall back to parsing the HTML:

export async function resolveGoogleNewsUrl(gnUrl: string): Promise<string> {
  // Google News URL format: https://news.google.com/rss/articles/CBMi...

  const response = await fetch(gnUrl, {
    redirect: 'manual', // inspect the redirect ourselves instead of following it
    headers: {
      'User-Agent': 'Mozilla/5.0 (compatible; Noozer/1.0)',
    },
  });

  // Read the redirect target from the Location header
  if (response.status === 302 || response.status === 301) {
    const location = response.headers.get('Location');
    if (location) {
      // Location may be relative, so resolve it against the request URL
      return new URL(location, gnUrl).toString();
    }
  }

  // Some URLs require JavaScript execution - fall back to HTML parsing
  const html = await response.text();
  const match = html.match(/data-n-au="([^"]+)"/);
  if (match) {
    return decodeURIComponent(match[1]);
  }

  throw new Error('Could not resolve Google News URL');
}

Rate Limits

| Limit | Value | Notes |
|-------|-------|-------|
| Requests/minute | ~60 | Estimated, not documented |
| Requests/hour | ~300 | Before soft blocks |
| IP rotation | Recommended | Use multiple egress IPs |

Best Practices

  1. Spread requests over time - Don't burst requests
  2. Rotate User-Agents - Vary the User-Agent header
  3. Respect robots.txt - Honor crawl delays
  4. Cache responses - RSS doesn't change frequently (5-15 min TTL)
  5. Use RSS over HTML - Less likely to be blocked
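
Practice 4 can be sketched with a small in-memory cache keyed by feed URL. `TtlCache` is a hypothetical class, not part of the codebase described here, and the sketch assumes a single long-lived process; in a Workers deployment, in-memory state is per isolate, so a shared store would be needed for a global cache.

```typescript
// Hypothetical in-memory cache illustrating a short TTL for RSS responses.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  // `now` is injectable so expiry can be exercised without waiting.
  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop the entry and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Example: a 10-minute TTL, inside the 5-15 minute range suggested above.
const rssCache = new TtlCache<string>(10 * 60 * 1000);
```

A caller would check `rssCache.get(url)` before fetching and call `rssCache.set(url, xml)` after a successful response.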

ZenRows

Overview

ZenRows provides anti-bot bypass for fetching article content when direct requests fail.

Authentication

API Key: Provided in dashboard
Base URL: https://api.zenrows.com/v1/

Request Format

interface ZenRowsRequest {
url: string;
apikey: string;
js_render?: boolean; // Execute JavaScript
antibot?: boolean; // Anti-bot bypass (extra cost)
premium_proxy?: boolean; // Premium residential proxies
proxy_country?: string; // ISO country code
wait?: number; // Wait ms after load
wait_for?: string; // CSS selector to wait for
css_extractor?: string; // Extract specific elements
json_response?: boolean; // Return structured JSON
}

Basic Request

// src/services/zenrows.ts

interface ZenRowsOptions {
jsRender?: boolean;
antibot?: boolean;
proxyCountry?: string;
waitFor?: string;
}

export async function fetchWithZenRows(
targetUrl: string,
options: ZenRowsOptions = {}
): Promise<FetchResult> {
const params = new URLSearchParams({
apikey: env.ZENROWS_API_KEY,
url: targetUrl,
});

if (options.jsRender) {
params.set('js_render', 'true');
}

if (options.antibot) {
params.set('antibot', 'true');
}

if (options.proxyCountry) {
params.set('proxy_country', options.proxyCountry);
}

if (options.waitFor) {
params.set('wait_for', options.waitFor);
}

const startTime = Date.now();

const response = await fetch(`https://api.zenrows.com/v1/?${params}`, {
headers: {
'Accept': 'text/html',
},
});

const latencyMs = Date.now() - startTime;

// Track cost
await trackCost({
service: 'zenrows',
operation: options.antibot ? 'fetch_antibot' : 'fetch_standard',
cost_micros: calculateZenRowsCost(options),
latency_ms: latencyMs,
success: response.ok,
});

if (!response.ok) {
const error = await response.text();
throw new ZenRowsError(response.status, error);
}

return {
html: await response.text(),
statusCode: response.status,
latencyMs,
};
}

Pricing and Cost Calculation

| Feature | Credits per Request |
|---------|---------------------|
| Standard request | 1 |
| JS rendering | 5 |
| Anti-bot | 10-25 |
| Premium proxy | 10-25 |
| Residential proxy | 25 |

function calculateZenRowsCost(options: ZenRowsOptions): number {
  // Base cost: $0.001 per credit, in microdollars
  const COST_PER_CREDIT_MICROS = 1000; // $0.001 = 1000 microdollars

  let credits = 1; // Base

  if (options.jsRender) {
    credits = 5;
  }

  if (options.antibot) {
    credits = 25; // Max tier
  }

  return credits * COST_PER_CREDIT_MICROS;
}

Error Codes

| Code | Meaning | Action |
|------|---------|--------|
| 401 | Invalid API key | Check credentials |
| 402 | Out of credits | Top up account |
| 422 | Invalid URL | Validate URL format |
| 429 | Rate limited | Back off and retry |
| 500 | ZenRows error | Retry with backoff |
| 520 | Target blocked | Try antibot mode |
| 521 | Target timeout | Increase wait time |
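
One way to act on this table is a small function that maps a status code to the next step. This is a sketch only: `ZenRowsAction` and `nextZenRowsAction` are hypothetical names, and the 2000 ms of extra wait for a 521 is an arbitrary illustrative value rather than a documented recommendation.

```typescript
// Hypothetical escalation policy derived from the error-code table above.
type ZenRowsAction =
  | { kind: "retry"; antibot: boolean; extraWaitMs: number }
  | { kind: "give_up"; reason: string };

function nextZenRowsAction(status: number, usedAntibot: boolean): ZenRowsAction {
  switch (status) {
    case 429: // rate limited
    case 500: // ZenRows error
      return { kind: "retry", antibot: usedAntibot, extraWaitMs: 0 };
    case 520: // target blocked: escalate to antibot mode once
      return usedAntibot
        ? { kind: "give_up", reason: "target blocked even with antibot" }
        : { kind: "retry", antibot: true, extraWaitMs: 0 };
    case 521: // target timeout: give the page more time (2000 ms is an assumption)
      return { kind: "retry", antibot: usedAntibot, extraWaitMs: 2000 };
    case 401:
      return { kind: "give_up", reason: "check credentials" };
    case 402:
      return { kind: "give_up", reason: "top up account" };
    case 422:
      return { kind: "give_up", reason: "validate URL format" };
    default:
      return { kind: "give_up", reason: `unhandled status ${status}` };
  }
}
```

Backoff timing for the retry cases is left to the caller, for example the retry helper described under Error Handling.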

Rate Limits

| Plan | Requests/second | Concurrent |
|------|-----------------|------------|
| Starter | 5 | 5 |
| Professional | 25 | 25 |
| Enterprise | 100+ | 100+ |

RapidAPI Google News

Overview

RapidAPI provides third-party Google News APIs as a fallback when both direct fetching and ZenRows fail. This is the third tier in our fetch strategy.

| API | Provider | Reliability | Cost |
|-----|----------|-------------|------|
| Google News API | newscatcher | High | $0.001/req |
| Real-Time News | apidojo | Medium | $0.0005/req |
| Google News | serpdog | High | $0.002/req |

Authentication

API Key: From RapidAPI dashboard
Base URL: Varies by provider
Header: X-RapidAPI-Key

Request Format (Newscatcher Example)

// src/services/rapidapi-news.ts

interface RapidAPINewsOptions {
query: string;
language?: string;
country?: string;
pageSize?: number;
}

export async function fetchWithRapidAPI(
options: RapidAPINewsOptions
): Promise<NewsResult[]> {
const startTime = Date.now();

const response = await fetch(
`https://google-news13.p.rapidapi.com/search?keyword=${encodeURIComponent(options.query)}&lr=${options.language || 'en-US'}`,
{
headers: {
'X-RapidAPI-Key': env.RAPIDAPI_KEY,
'X-RapidAPI-Host': 'google-news13.p.rapidapi.com',
},
}
);

const latencyMs = Date.now() - startTime;

if (!response.ok) {
throw new RapidAPIError(response.status, await response.text());
}

const data = await response.json();

// Track cost: ~$0.001 per request = 1000 microdollars
await trackCost({
service: 'rapidapi',
operation: 'google_news_search',
cost_micros: 1000,
latency_ms: latencyMs,
success: true,
});

return parseRapidAPIResults(data);
}

function parseRapidAPIResults(data: any): NewsResult[] {
return (data.items || []).map((item: any) => ({
title: item.title,
url: item.newsUrl,
source: item.publisher,
publishedAt: item.timestamp,
snippet: item.snippet,
imageUrl: item.images?.thumbnail,
}));
}

Response Structure

{
"status": "success",
"items": [
{
"title": "AI Breakthrough Announced",
"newsUrl": "https://example.com/article",
"publisher": "Tech News",
"timestamp": "2024-01-15T12:00:00Z",
"snippet": "Latest developments in AI...",
"images": {
"thumbnail": "https://example.com/thumb.jpg"
}
}
]
}

Error Handling

export class RapidAPIError extends ExternalAPIError {
constructor(statusCode: number, message: string) {
const retryable = [429, 500, 502, 503].includes(statusCode);
super('rapidapi', statusCode, message, retryable);
}
}

Error Codes

| Code | Meaning | Action |
|------|---------|--------|
| 401 | Invalid API key | Check RapidAPI credentials |
| 403 | Not subscribed | Subscribe to the API |
| 429 | Rate limited | Back off and retry |
| 500 | Provider error | Retry with backoff |

Pricing

| Plan | Requests/month | Cost |
|------|----------------|------|
| Basic | 500 | Free |
| Pro | 10,000 | $10/mo |
| Ultra | 100,000 | $50/mo |
| Mega | 1,000,000 | $200/mo |

Rate Limits

| Plan | Requests/second |
|------|-----------------|
| Basic | 1 |
| Pro | 5 |
| Ultra | 10 |
| Mega | 50 |

Fetch Fallback Strategy

Three-Tier Approach

FETCH FALLBACK STRATEGY

TIER 1: Direct Fetch (Free)
└── Try direct HTTP request to publisher
    ├── Success? → Done
    └── 403/Blocked/Timeout? → Tier 2

TIER 2: ZenRows ($$)
└── Anti-bot bypass with JS rendering
    ├── Success? → Done
    └── Failed? → Tier 3

TIER 3: RapidAPI ($)
└── Third-party Google News API
    ├── Success? → Done
    └── Failed? → Mark as failed, log for review

Implementation

// src/services/fetch-with-fallback.ts

type FetchMethod = 'direct' | 'zenrows' | 'rapidapi' | 'failed';

interface FetchResult {
html?: string;
articles?: NewsResult[];
method: FetchMethod;
latencyMs: number;
error?: string;
}

export async function fetchWithFallback(
  url: string,
  keywordId: string
): Promise<FetchResult> {
  const startTime = Date.now();

  // TIER 1: Direct fetch
  try {
    const html = await directFetch(url);
    await logCrawlHistory(keywordId, 'direct', null);
    return { html, method: 'direct', latencyMs: Date.now() - startTime };
  } catch (directError) {
    // Catch variables are `unknown` under strict TypeScript, so narrow before reading .message
    console.log(`Direct fetch failed: ${describeError(directError)}`);
  }

  // TIER 2: ZenRows
  try {
    const result = await fetchWithZenRows(url, { antibot: true });
    await logCrawlHistory(keywordId, 'zenrows', null);
    return { html: result.html, method: 'zenrows', latencyMs: Date.now() - startTime };
  } catch (zenrowsError) {
    console.log(`ZenRows failed: ${describeError(zenrowsError)}`);
  }

  // TIER 3: RapidAPI (for Google News search, not article fetch)
  try {
    const articles = await fetchWithRapidAPI({ query: extractKeyword(url) });
    await logCrawlHistory(keywordId, 'rapidapi', null);
    return { articles, method: 'rapidapi', latencyMs: Date.now() - startTime };
  } catch (rapidError) {
    console.log(`RapidAPI failed: ${describeError(rapidError)}`);
  }

  // All methods failed
  await logCrawlHistory(keywordId, 'failed', 'All fetch methods exhausted');
  return {
    method: 'failed',
    latencyMs: Date.now() - startTime,
    error: 'All fetch methods failed'
  };
}

// Extracts a printable message from any thrown value
function describeError(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}

async function logCrawlHistory(
keywordId: string,
method: FetchMethod,
error: string | null
): Promise<void> {
await env.DB.prepare(`
INSERT INTO keyword_crawl_history (
keyword_id, crawled_at, fetch_method, error_message
) VALUES (?, datetime('now'), ?, ?)
`).bind(keywordId, method, error).run();
}

Fallback Decision Matrix

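| Scenario | Tier 1 (Direct) | Tier 2 (ZenRows) | Tier 3 (RapidAPI) |
|----------|-----------------|------------------|-------------------|
| Normal article | Try first | If 403/blocked | If ZenRows fails |
| Paywall site | Skip | Try with antibot | Last resort |
| Rate limited | Retry later | Try immediately | If ZenRows fails |
| Known blocker | Skip | Try first | If ZenRows fails |
| JS-heavy site | Skip | Try with JS render | N/A |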
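
The matrix can be encoded as data so that the fetch loop only walks an ordered list. The sketch below is one possible encoding; the `Scenario` and `Tier` names are hypothetical, and how a URL is classified into a scenario is outside its scope.

```typescript
// Hypothetical encoding of the decision matrix as an ordered tier plan per scenario.
type Tier = "direct" | "zenrows" | "zenrows_antibot" | "zenrows_js" | "rapidapi";
type Scenario = "normal" | "paywall" | "rate_limited" | "known_blocker" | "js_heavy";

const TIER_PLAN: Record<Scenario, Tier[]> = {
  normal: ["direct", "zenrows", "rapidapi"],
  paywall: ["zenrows_antibot", "rapidapi"], // direct is skipped
  rate_limited: ["zenrows", "rapidapi"], // direct is retried later, not in this pass
  known_blocker: ["zenrows", "rapidapi"], // direct is skipped
  js_heavy: ["zenrows_js"], // RapidAPI is not applicable
};

// Returns a copy so callers cannot mutate the shared plan.
function tiersFor(scenario: Scenario): Tier[] {
  return [...TIER_PLAN[scenario]];
}
```

Keeping the plan in a table like this means a change to the matrix is a data change rather than a control-flow change.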

Cost Comparison

| Method | Cost per Request | Best For |
|--------|------------------|----------|
| Direct | $0 | Most news sites |
| ZenRows (standard) | $0.001 | Light anti-bot |
| ZenRows (antibot) | $0.025 | Heavy anti-bot |
| RapidAPI | $0.001 | Fallback discovery |
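
These per-request costs can be combined with the share of traffic each tier handles to estimate a blended cost per fetch. The sketch below is an approximation under two assumptions: every ZenRows fallback uses antibot mode (as the implementation above does), and the cost of earlier failed tiers is ignored. `blendedCostMicros` and `FetchMix` are hypothetical names.

```typescript
// Per-request costs in microdollars, taken from the cost comparison table.
const COST_MICROS = { direct: 0, zenrowsAntibot: 25000, rapidapi: 1000 };

// Share of fetches served by each tier, in percent.
interface FetchMix {
  directPct: number;
  zenrowsPct: number;
  rapidapiPct: number;
}

// Weighted average cost of one fetch, in microdollars.
function blendedCostMicros(mix: FetchMix): number {
  return (
    (mix.directPct / 100) * COST_MICROS.direct +
    (mix.zenrowsPct / 100) * COST_MICROS.zenrowsAntibot +
    (mix.rapidapiPct / 100) * COST_MICROS.rapidapi
  );
}
```

For a mix of 85% direct, 12% ZenRows, and 2.5% RapidAPI, this works out to roughly 3,025 microdollars (about $0.003) per fetch, almost all of it from ZenRows.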

Admin Dashboard Metrics

The admin dashboard shows fallback rates via v_fetch_fallback_rates view:

-- Example output
| date | total | direct_pct | zenrows_pct | rapidapi_pct | failed_pct |
|------------|-------|------------|-------------|--------------|------------|
| 2024-01-15 | 5000 | 85.0 | 12.0 | 2.5 | 0.5 |
| 2024-01-14 | 4800 | 82.0 | 14.0 | 3.0 | 1.0 |

Alert Thresholds:

  • Direct rate < 70%: Investigate blocking patterns
  • ZenRows rate > 20%: Cost concern, check blocked domains
  • RapidAPI rate > 5%: ZenRows may have issues
  • Failed rate > 2%: Critical - review failing domains
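
The thresholds above translate directly into a check over one row of the view. The following is a minimal sketch: `FallbackRates` and `fallbackAlerts` are hypothetical names, and the field names assume the view's columns mapped to camelCase.

```typescript
// One row of fallback percentages (field names are assumptions).
interface FallbackRates {
  directPct: number;
  zenrowsPct: number;
  rapidapiPct: number;
  failedPct: number;
}

// Returns one message per threshold breached; an empty array means healthy.
function fallbackAlerts(rates: FallbackRates): string[] {
  const alerts: string[] = [];
  if (rates.directPct < 70) alerts.push("direct < 70%: investigate blocking patterns");
  if (rates.zenrowsPct > 20) alerts.push("zenrows > 20%: cost concern, check blocked domains");
  if (rates.rapidapiPct > 5) alerts.push("rapidapi > 5%: ZenRows may have issues");
  if (rates.failedPct > 2) alerts.push("failed > 2%: critical, review failing domains");
  return alerts;
}
```

Both example rows shown above stay within every threshold, so they would produce no alerts.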

DataForSEO

Overview

DataForSEO provides:

  1. Google News SERP data (alternative discovery)
  2. Backlink analysis
  3. Domain authority scores

Authentication

Login: Email
Password: API password (from dashboard)
Base URL: https://api.dataforseo.com/v3/
Authorization: Basic base64(login:password)
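
A minimal sketch of building that header, using only the standard `TextEncoder` and `btoa` Web APIs so it does not depend on Node's `Buffer`. The function name `basicAuthHeader` is hypothetical.

```typescript
// Builds an HTTP Basic Authorization header value from a login and password.
function basicAuthHeader(login: string, password: string): string {
  // btoa only accepts Latin-1 input, so encode to UTF-8 bytes first.
  const bytes = new TextEncoder().encode(`${login}:${password}`);
  let binary = "";
  bytes.forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  return `Basic ${btoa(binary)}`;
}
```

The UTF-8 step only matters when the credentials contain non-ASCII characters; for plain ASCII input the result is the same as calling `btoa` on the string directly.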

Google News API

// src/services/dataforseo.ts

interface NewsSearchParams {
keyword: string;
location_code?: number; // 2840 = US
language_code?: string; // en
date_range?: string; // past_24_hours, past_week, past_month
limit?: number; // Max results (up to 100)
}

export async function searchGoogleNews(params: NewsSearchParams): Promise<NewsResult[]> {
// btoa is built into the Workers runtime; Buffer requires the nodejs_compat flag
const auth = btoa(`${env.D4SEO_LOGIN}:${env.D4SEO_PASSWORD}`);

const body = [{
keyword: params.keyword,
location_code: params.location_code || 2840,
language_code: params.language_code || 'en',
date_range: params.date_range || 'past_24_hours',
}];

const startTime = Date.now();

const response = await fetch('https://api.dataforseo.com/v3/serp/google/news/live/advanced', {
method: 'POST',
headers: {
'Authorization': `Basic ${auth}`,
'Content-Type': 'application/json',
},
body: JSON.stringify(body),
});

const latencyMs = Date.now() - startTime;

if (!response.ok) {
throw new DataForSEOError(response.status, await response.text());
}

const data = await response.json();

// Track cost
await trackCost({
service: 'data4seo',
operation: 'google_news_search',
cost_micros: 2500, // $0.0025 per request
latency_ms: latencyMs,
success: true,
});

return parseNewsResults(data);
}

Response Structure

{
"tasks": [{
"result": [{
"keyword": "artificial intelligence",
"items_count": 100,
"items": [{
"type": "news_search",
"title": "AI Breakthrough",
"url": "https://example.com/article",
"domain": "example.com",
"source": "Example News",
"date": "2024-01-15T12:00:00+00:00",
"snippet": "Article description...",
"image_url": "https://example.com/image.jpg"
}]
}]
}]
}
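
The `parseNewsResults` helper is not shown in this document. A minimal sketch of how the nested response above might be flattened follows. The `NewsResult` shape is assumed to match the fields produced by `parseRapidAPIResults`, and the response interfaces are hypothetical and cover only the fields shown in the sample.

```typescript
// Assumed result shape, mirroring the fields used for RapidAPI results.
interface NewsResult {
  title: string;
  url: string;
  source: string;
  publishedAt: string;
  snippet: string;
  imageUrl?: string;
}

// Hypothetical typings covering only the fields in the sample response.
interface DataForSeoNewsItem {
  type?: string;
  title?: string;
  url?: string;
  source?: string;
  date?: string;
  snippet?: string;
  image_url?: string;
}

interface DataForSeoResponse {
  tasks?: Array<{
    result?: Array<{ items?: DataForSeoNewsItem[] | null }> | null;
  }> | null;
}

// Walks tasks -> result -> items and keeps news items that have a URL.
function parseNewsResults(data: DataForSeoResponse): NewsResult[] {
  const results: NewsResult[] = [];
  for (const task of data.tasks ?? []) {
    for (const result of task.result ?? []) {
      for (const item of result.items ?? []) {
        if (item.type !== "news_search" || !item.url) continue;
        results.push({
          title: item.title ?? "",
          url: item.url,
          source: item.source ?? "",
          publishedAt: item.date ?? "",
          snippet: item.snippet ?? "",
          imageUrl: item.image_url,
        });
      }
    }
  }
  return results;
}
```

The null checks are deliberate: the sketch assumes a failed task may carry no `result` array, which would otherwise throw during iteration.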

Backlinks API

interface BacklinkParams {
target: string; // Domain or URL
limit?: number;
}

export async function getBacklinkSummary(params: BacklinkParams): Promise<BacklinkSummary> {
  // btoa is built into the Workers runtime; Buffer requires the nodejs_compat flag
  const auth = btoa(`${env.D4SEO_LOGIN}:${env.D4SEO_PASSWORD}`);

  const body = [{
    target: params.target,
    limit: params.limit || 1,
  }];

  const startTime = Date.now();

  const response = await fetch('https://api.dataforseo.com/v3/backlinks/summary/live', {
    method: 'POST',
    headers: {
      'Authorization': `Basic ${auth}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
  });

  const latencyMs = Date.now() - startTime;

  // Cost: $0.02 per domain
  await trackCost({
    service: 'data4seo',
    operation: 'backlink_summary',
    cost_micros: 20000,
    latency_ms: latencyMs,
    success: response.ok,
  });

  // Fail before parsing: an error response is not guaranteed to be valid JSON
  if (!response.ok) {
    throw new DataForSEOError(response.status, await response.text());
  }

  const data = await response.json();
  return parseBacklinkSummary(data);
}

Response Structure

{
"tasks": [{
"result": [{
"target": "nytimes.com",
"total_backlinks": 15000000,
"referring_domains": 250000,
"referring_main_domains": 180000,
"rank": 85,
"backlinks_spam_score": 5,
"broken_backlinks": 50000,
"broken_pages": 1000
}]
}]
}

Pricing

| API | Cost per Request |
|-----|------------------|
| Google News SERP | $0.0025 |
| Backlink Summary | $0.02 |
| Backlink History | $0.02 |
| Domain Analytics | $0.05 |

Rate Limits

| Plan | Requests/minute |
|------|-----------------|
| Standard | 2000 |
| Plus | 5000 |
| Pro | 10000 |

SharedCount

Overview

SharedCount provides social media engagement metrics for URLs.

Authentication

API Key: From dashboard
Base URL: https://api.sharedcount.com/v1.0/

Request Format

// src/services/sharedcount.ts

interface SocialMetrics {
facebook: {
share_count: number;
comment_count: number;
reaction_count: number;
};
twitter: number;
pinterest: number;
linkedin: number;
total_engagement: number;
}

export async function fetchSocialMetrics(url: string): Promise<SocialMetrics> {
const params = new URLSearchParams({
apikey: env.SHAREDCOUNT_API_KEY,
url: url,
});

const startTime = Date.now();

const response = await fetch(`https://api.sharedcount.com/v1.0/?${params}`);

const latencyMs = Date.now() - startTime;

if (!response.ok) {
throw new SharedCountError(response.status);
}

const data = await response.json();

// Track cost: $80 / 100,000 requests = $0.0008 per request on the Plus plan
await trackCost({
service: 'sharedcount',
operation: 'fetch_metrics',
cost_micros: 800,
latency_ms: latencyMs,
success: true,
});

return parseSocialMetrics(data);
}

function parseSocialMetrics(data: any): SocialMetrics {
const facebook = data.Facebook || {};
const twitter = data.Twitter || 0;
const pinterest = data.Pinterest || 0;
const linkedin = data.LinkedIn || 0;

const fbTotal = (facebook.share_count || 0) +
(facebook.comment_count || 0) +
(facebook.reaction_count || 0);

return {
facebook: {
share_count: facebook.share_count || 0,
comment_count: facebook.comment_count || 0,
reaction_count: facebook.reaction_count || 0,
},
twitter,
pinterest,
linkedin,
total_engagement: fbTotal + twitter + pinterest + linkedin,
};
}

Response Structure

{
"Facebook": {
"share_count": 1500,
"comment_count": 200,
"reaction_count": 850
},
"Twitter": 750,
"Pinterest": 50,
"LinkedIn": 120,
"StumbleUpon": 0
}

Pricing

| Plan | Requests/month | Cost |
|------|----------------|------|
| Free | 500 | $0 |
| Basic | 10,000 | $40/mo |
| Plus | 100,000 | $80/mo |
| Business | 500,000 | $200/mo |

Rate Limits

| Plan | Requests/second |
|------|-----------------|
| Free | 1 |
| Basic | 5 |
| Plus | 10 |
| Business | 25 |

OpenAI

Overview

OpenAI provides:

  1. Text embeddings (text-embedding-3-small)
  2. LLM inference for classification and summarization

Authentication

API Key: sk-...
Base URL: https://api.openai.com/v1/

Embeddings API

// src/services/openai.ts

export async function generateEmbedding(text: string): Promise<number[]> {
const response = await fetch('https://api.openai.com/v1/embeddings', {
method: 'POST',
headers: {
'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'text-embedding-3-small',
input: text,
dimensions: 1536,
}),
});

if (!response.ok) {
throw new OpenAIError(response.status, await response.text());
}

const data = await response.json();

// Cost: $0.00002 per 1K tokens
const tokenCount = Math.ceil(text.length / 4); // Rough estimate
await trackCost({
service: 'openai',
operation: 'embedding',
input_units: tokenCount,
cost_micros: Math.ceil(tokenCount / 1000 * 20), // $0.00002 = 20 microdollars per 1K
success: true,
});

return data.data[0].embedding;
}

export async function generateEmbeddingsBatch(texts: string[]): Promise<number[][]> {
const response = await fetch('https://api.openai.com/v1/embeddings', {
method: 'POST',
headers: {
'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'text-embedding-3-small',
input: texts,
dimensions: 1536,
}),
});

if (!response.ok) {
throw new OpenAIError(response.status, await response.text());
}

const data = await response.json();

const totalTokens = data.usage.total_tokens;
await trackCost({
service: 'openai',
operation: 'embedding_batch',
input_units: totalTokens,
cost_micros: Math.ceil(totalTokens / 1000 * 20),
success: true,
});

return data.data.map((d: any) => d.embedding);
}

Chat Completions API

interface ClassificationResult {
topics: Array<{ label: string; confidence: number }>;
industries: Array<{ label: string; confidence: number }>;
sentiment: number;
entities: Array<{ name: string; type: string }>;
}

export async function classifyWithLLM(
headline: string,
bodyText: string,
taxonomyLabels: string[]
): Promise<ClassificationResult> {
const systemPrompt = `You are a news classifier. Given an article, extract:
1. Topics (from provided taxonomy)
2. Industries mentioned
3. Overall sentiment (-1 to 1)
4. Named entities

Respond in JSON format only.`;

const userPrompt = `Taxonomy labels: ${taxonomyLabels.join(', ')}

Article headline: ${headline}

Article body (truncated):
${bodyText.slice(0, 3000)}

Classify this article:`;

const response = await fetch('https://api.openai.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-4o-mini',
messages: [
{ role: 'system', content: systemPrompt },
{ role: 'user', content: userPrompt },
],
response_format: { type: 'json_object' },
temperature: 0.3,
max_tokens: 500,
}),
});

if (!response.ok) {
throw new OpenAIError(response.status, await response.text());
}

const data = await response.json();

// Cost: $0.15/1M input, $0.60/1M output for gpt-4o-mini.
// $1 per 1M tokens equals 1 microdollar per token, so the rates apply per token as-is.
const inputTokens = data.usage.prompt_tokens;
const outputTokens = data.usage.completion_tokens;
const costMicros = Math.ceil(inputTokens * 0.15 + outputTokens * 0.60);

await trackCost({
service: 'openai',
operation: 'classification',
input_units: inputTokens,
output_units: outputTokens,
cost_micros: costMicros,
success: true,
});

return JSON.parse(data.choices[0].message.content);
}

Pricing

| Model | Input | Output |
|-------|-------|--------|
| text-embedding-3-small | $0.02/1M tokens | - |
| gpt-4o-mini | $0.15/1M tokens | $0.60/1M tokens |
| gpt-4o | $2.50/1M tokens | $10.00/1M tokens |
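
Because prices are quoted per 1M tokens and costs are tracked in microdollars, the conversion is easy to get wrong by a factor of 1,000. The sketch below shows one way to compute it from the table above using integer arithmetic. `openAiCostMicros` and `PRICE_MICROS_PER_MILLION` are hypothetical names.

```typescript
// Prices from the table above, in microdollars per 1M tokens ($1 = 1,000,000 microdollars).
const PRICE_MICROS_PER_MILLION = {
  "text-embedding-3-small": { input: 20000, output: 0 },
  "gpt-4o-mini": { input: 150000, output: 600000 },
  "gpt-4o": { input: 2500000, output: 10000000 },
};

type PricedModel = keyof typeof PRICE_MICROS_PER_MILLION;

// Cost of one call in microdollars, rounded up to the next whole microdollar.
function openAiCostMicros(model: PricedModel, inputTokens: number, outputTokens = 0): number {
  const price = PRICE_MICROS_PER_MILLION[model];
  const totalPerMillion = inputTokens * price.input + outputTokens * price.output;
  return Math.ceil(totalPerMillion / 1000000);
}
```

For example, a gpt-4o-mini call with 1,000 input tokens and 500 output tokens costs 150 + 300 = 450 microdollars, or $0.00045.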

Rate Limits

| Tier | RPM | TPM |
|------|-----|-----|
| Tier 1 | 500 | 30,000 |
| Tier 2 | 5,000 | 150,000 |
| Tier 3 | 5,000 | 1,000,000 |

Workers AI

Overview

Cloudflare Workers AI provides serverless inference for embeddings and LLMs.

Available Models

| Task | Model | Dimensions |
|------|-------|------------|
| Embeddings | @cf/baai/bge-base-en-v1.5 | 768 |
| Embeddings | @cf/baai/bge-large-en-v1.5 | 1024 |
| Text Gen | @cf/meta/llama-3-8b-instruct | - |
| NER | @cf/huggingface/distilbert-ner | - |

Embeddings

// src/services/workers-ai.ts

export async function generateEmbeddingWorkersAI(
ai: Ai,
text: string
): Promise<number[]> {
const result = await ai.run('@cf/baai/bge-base-en-v1.5', {
text: [text],
});

// Workers AI included in Workers Paid plan
// Pricing: $0.011 per 1,000 neurons
await trackCost({
service: 'workers_ai',
operation: 'embedding',
cost_micros: 11, // Rough estimate per embedding
success: true,
});

return result.data[0];
}

Text Generation

export async function classifyWithWorkersAI(
ai: Ai,
prompt: string
): Promise<string> {
const result = await ai.run('@cf/meta/llama-3-8b-instruct', {
messages: [
{ role: 'system', content: 'You are a news classifier. Respond in JSON.' },
{ role: 'user', content: prompt },
],
max_tokens: 500,
});

// Pricing: varies by model
await trackCost({
service: 'workers_ai',
operation: 'classification',
cost_micros: 50, // Estimate
success: true,
});

return result.response;
}

Named Entity Recognition

interface Entity {
word: string;
entity_group: string; // PER, ORG, LOC, MISC
score: number;
}

export async function extractEntitiesWorkersAI(
ai: Ai,
text: string
): Promise<Entity[]> {
const result = await ai.run('@cf/huggingface/distilbert-ner', {
text: text.slice(0, 5000), // Model limit
});

await trackCost({
service: 'workers_ai',
operation: 'ner',
cost_micros: 20,
success: true,
});

return result;
}

Error Handling

Unified Error Types

// src/errors/external-api.ts

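export class ExternalAPIError extends Error {
  constructor(
    public service: string,
    public statusCode: number,
    // Plain parameter: declaring it `public` would overwrite the formatted Error.message below
    message: string,
    public retryable: boolean = true,
    public retryAfterMs?: number
  ) {
    super(`${service} error (${statusCode}): ${message}`);
    this.name = new.target.name; // report the concrete subclass name in logs
  }
}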

export class GoogleNewsError extends ExternalAPIError {
constructor(message: string) {
super('google_news', 0, message, true);
}
}

export class ZenRowsError extends ExternalAPIError {
constructor(statusCode: number, message: string) {
const retryable = [429, 500, 520, 521].includes(statusCode);
super('zenrows', statusCode, message, retryable);
}
}

export class DataForSEOError extends ExternalAPIError {
constructor(statusCode: number, message: string) {
const retryable = [429, 500].includes(statusCode);
super('data4seo', statusCode, message, retryable);
}
}

export class SharedCountError extends ExternalAPIError {
constructor(statusCode: number) {
const retryable = [429, 500].includes(statusCode);
super('sharedcount', statusCode, 'Request failed', retryable);
}
}

export class OpenAIError extends ExternalAPIError {
constructor(statusCode: number, message: string) {
const retryable = [429, 500, 503].includes(statusCode);
let retryAfter: number | undefined;

if (statusCode === 429) {
retryAfter = 60000; // 1 minute default
}

super('openai', statusCode, message, retryable, retryAfter);
}
}

Retry Logic

// src/utils/retry.ts

interface RetryOptions {
maxRetries: number;
baseDelayMs: number;
maxDelayMs: number;
exponentialBase: number;
}

const DEFAULT_OPTIONS: RetryOptions = {
maxRetries: 3,
baseDelayMs: 1000,
maxDelayMs: 30000,
exponentialBase: 2,
};

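export async function withRetry<T>(
  fn: () => Promise<T>,
  options: Partial<RetryOptions> = {}
): Promise<T> {
  const opts = { ...DEFAULT_OPTIONS, ...options };
  let lastError: unknown;

  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;

      // Non-retryable API errors fail immediately
      if (error instanceof ExternalAPIError && !error.retryable) {
        throw error;
      }

      // Out of attempts: skip the final sleep and rethrow below
      if (attempt === opts.maxRetries) {
        break;
      }

      if (error instanceof ExternalAPIError && error.retryAfterMs) {
        // Honor the service's requested delay instead of the backoff schedule
        await sleep(error.retryAfterMs);
        continue;
      }

      // Exponential backoff capped at maxDelayMs, plus up to 20% jitter
      const delay = Math.min(
        opts.baseDelayMs * Math.pow(opts.exponentialBase, attempt),
        opts.maxDelayMs
      );
      const jitter = delay * 0.2 * Math.random();
      await sleep(delay + jitter);
    }
  }

  throw lastError;
}

// Resolves after `ms` milliseconds
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}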
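
To make the backoff schedule concrete, the delay calculation can be isolated as a pure function. The sketch below mirrors the formula used in the retry helper; `backoffDelayMs` is a hypothetical name, and the random source is a parameter so the result can be checked deterministically.

```typescript
interface BackoffOptions {
  baseDelayMs: number;
  maxDelayMs: number;
  exponentialBase: number;
}

// Same defaults as the retry helper above.
const BACKOFF_DEFAULTS: BackoffOptions = {
  baseDelayMs: 1000,
  maxDelayMs: 30000,
  exponentialBase: 2,
};

// Delay before the retry that follows failed attempt `attempt` (0-based).
function backoffDelayMs(
  attempt: number,
  opts: BackoffOptions = BACKOFF_DEFAULTS,
  random: () => number = Math.random
): number {
  const delay = Math.min(
    opts.baseDelayMs * Math.pow(opts.exponentialBase, attempt),
    opts.maxDelayMs
  );
  // Up to 20% jitter spreads out clients that failed at the same moment.
  return delay + delay * 0.2 * random();
}
```

With jitter ignored, the default delays are 1000, 2000, 4000, 8000, and 16000 ms, after which the 30000 ms cap applies. With the default `maxRetries` of 3, only the first three are used.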

Circuit Breaker

// src/utils/circuit-breaker.ts

interface CircuitBreakerState {
failures: number;
lastFailure: number;
state: 'closed' | 'open' | 'half-open';
}

const circuitBreakers = new Map<string, CircuitBreakerState>();

export async function withCircuitBreaker<T>(
service: string,
fn: () => Promise<T>,
options = { failureThreshold: 5, recoveryTimeMs: 60000 }
): Promise<T> {
const state = circuitBreakers.get(service) || {
failures: 0,
lastFailure: 0,
state: 'closed' as const,
};

// Check if circuit is open
if (state.state === 'open') {
if (Date.now() - state.lastFailure > options.recoveryTimeMs) {
state.state = 'half-open';
} else {
throw new Error(`Circuit breaker open for ${service}`);
}
}

try {
const result = await fn();

// Success - reset circuit
state.failures = 0;
state.state = 'closed';
circuitBreakers.set(service, state);

return result;
} catch (error) {
state.failures++;
state.lastFailure = Date.now();

if (state.failures >= options.failureThreshold) {
state.state = 'open';
}

circuitBreakers.set(service, state);
throw error;
}
}
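
The breaker's behavior is easier to follow as explicit state transitions. The sketch below restates the same rules as two pure functions with an explicit clock. The names `allowsRequest` and `afterCall` are hypothetical, and the constants copy the default options above.

```typescript
type BreakerState = "closed" | "open" | "half-open";

interface Breaker {
  failures: number;
  lastFailure: number; // epoch ms of the most recent failure
  state: BreakerState;
}

const FAILURE_THRESHOLD = 5;
const RECOVERY_TIME_MS = 60000;

// May a request go through at time `now`? An open circuit admits a probe
// only once the recovery time has elapsed since the last failure.
function allowsRequest(b: Breaker, now: number): boolean {
  return b.state !== "open" || now - b.lastFailure > RECOVERY_TIME_MS;
}

// Returns the breaker state after a call that finished at time `now`.
function afterCall(b: Breaker, succeeded: boolean, now: number): Breaker {
  if (succeeded) {
    // Any success closes the circuit and clears the failure count.
    return { failures: 0, lastFailure: b.lastFailure, state: "closed" };
  }
  const failures = b.failures + 1;
  return {
    failures,
    lastFailure: now,
    state: failures >= FAILURE_THRESHOLD ? "open" : b.state,
  };
}
```

Because the failure count is only cleared on success, a failed probe after the recovery window reopens the circuit immediately and restarts the 60-second wait.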

Cost Tracking

Tracking Function

// src/services/cost-tracker.ts

interface CostEvent {
service: 'zenrows' | 'rapidapi' | 'data4seo' | 'sharedcount' | 'openai' | 'workers_ai';
operation: string;
article_id?: string;
customer_id?: string;
workflow_id?: string;
input_units?: number;
output_units?: number;
cost_micros: number;
latency_ms?: number;
success: boolean;
error_code?: string;
metadata?: Record<string, any>;
}

export async function trackCost(event: CostEvent): Promise<void> {
await env.DB.prepare(`
INSERT INTO cost_events (
id, timestamp, service, operation, article_id, customer_id, workflow_id,
input_units, output_units, cost_micros, latency_ms, success, error_code, metadata
) VALUES (?, datetime('now'), ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
`).bind(
crypto.randomUUID(),
event.service,
event.operation,
event.article_id || null,
event.customer_id || null,
event.workflow_id || null,
event.input_units || 0,
event.output_units || 0,
event.cost_micros,
event.latency_ms || null,
event.success ? 1 : 0,
event.error_code || null,
event.metadata ? JSON.stringify(event.metadata) : null,
).run();
}

Cost Summary

export async function getDailyCostSummary(date: string): Promise<CostSummary> {
const rows = await env.DB.prepare(`
SELECT
service,
operation,
SUM(cost_micros) as total_micros,
COUNT(*) as operation_count,
SUM(CASE WHEN success = 1 THEN 1 ELSE 0 END) as success_count,
AVG(latency_ms) as avg_latency_ms
FROM cost_events
WHERE date(timestamp) = ?
GROUP BY service, operation
ORDER BY total_micros DESC
`).bind(date).all();

return {
date,
by_service: rows.results,
total_micros: rows.results.reduce((sum, r) => sum + r.total_micros, 0),
total_usd: rows.results.reduce((sum, r) => sum + r.total_micros, 0) / 1000000,
};
}

Rate Limiting

Per-Service Rate Limiters

// src/utils/rate-limiter.ts

const SERVICE_LIMITS = {
google_news: { rpm: 60, concurrent: 5 },
zenrows: { rpm: 300, concurrent: 25 },
data4seo: { rpm: 2000, concurrent: 50 },
sharedcount: { rpm: 600, concurrent: 10 },
openai: { rpm: 500, concurrent: 20 },
};

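export async function withRateLimit<T>(
  service: keyof typeof SERVICE_LIMITS,
  fn: () => Promise<T>
): Promise<T> {
  const limits = SERVICE_LIMITS[service];

  // Key by the current minute so the window is fixed. Rewriting one shared key
  // with a fresh 60s TTL on every request would keep extending the window.
  const minute = Math.floor(Date.now() / 60000);
  const key = `ratelimit:${service}:${minute}`;

  // Check current minute's count
  const current = await env.RATE_LIMITS.get(key);
  const count = current ? parseInt(current, 10) : 0;

  if (count >= limits.rpm) {
    throw new RateLimitError(service, count);
  }

  // Increment count. The get/put pair is not atomic, so treat the limit as approximate.
  await env.RATE_LIMITS.put(key, String(count + 1), { expirationTtl: 120 });

  return fn();
}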

Concurrent Request Limiter

class ConcurrencyLimiter {
private active = new Map<string, number>();

async acquire(service: string, limit: number): Promise<() => void> {
const current = this.active.get(service) || 0;

if (current >= limit) {
// Wait for a slot
await this.waitForSlot(service, limit);
}

this.active.set(service, (this.active.get(service) || 0) + 1);

return () => {
const count = this.active.get(service) || 1;
this.active.set(service, count - 1);
};
}

private async waitForSlot(service: string, limit: number): Promise<void> {
while ((this.active.get(service) || 0) >= limit) {
await sleep(100);
}
}
}

export const concurrencyLimiter = new ConcurrencyLimiter();

Last updated: 2024-01-15