External API Integrations

Integration specifications for Google News, ZenRows, RapidAPI, DataForSEO, SharedCount, and AI services (OpenAI, Workers AI)

Google News
ZenRows
RapidAPI Google News
DataForSEO
SharedCount
OpenAI
Workers AI
Fetch Fallback Strategy
Error Handling
Cost Tracking
Rate Limiting

Google News

See also: Google News Crawling Playbook for comprehensive crawl strategy, rate limiting, and failure handling.

Overview

Google News is our primary discovery source. We use both RSS feeds and HTML scraping.

RSS Feed Endpoints

Base URL: https://news.google.com/rss

Endpoint	Description	Example
`/rss`	Top stories	`https://news.google.com/rss`
`/rss/search`	Keyword search	`https://news.google.com/rss/search?q=artificial+intelligence`
`/rss/topics/{topic}`	Topic feed	`https://news.google.com/rss/topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRGRqTVhZU0FtVnVHZ0pWVXlnQVAB`

Query Parameters

Parameter	Description	Example
`q`	Search query	`q=artificial+intelligence`
`hl`	Language	`hl=en`
`gl`	Country	`gl=US`
`ceid`	Combined locale	`ceid=US:en`
`when`	Time filter, written as an operator inside `q` (not a standalone parameter)	`q=artificial+intelligence+when:1d`

Request Example

// src/services/google-news.ts

import { GoogleNewsError } from '../errors/external-api';
// parseRSS and RSSFeed are defined in the project's RSS parsing module (not shown)

interface GoogleNewsOptions {
  query?: string;
  language?: string;  // ISO 639-1
  country?: string;   // ISO 3166-1 alpha-2
  when?: '1h' | '1d' | '7d' | '30d';
}

export async function fetchGoogleNewsRSS(options: GoogleNewsOptions): Promise<RSSFeed> {
  const params = new URLSearchParams();
  
  // `when` is not a standalone parameter; it is appended to the search query
  const q = [options.query, options.when && `when:${options.when}`]
    .filter(Boolean)
    .join(' ');
  if (q) {
    params.set('q', q);
  }
  
  const hl = options.language || 'en';
  const gl = options.country || 'US';
  params.set('hl', hl);
  params.set('gl', gl);
  params.set('ceid', `${gl}:${hl}`);
  
  // Any query, including a bare time filter, must go to the search endpoint
  const url = q
    ? `https://news.google.com/rss/search?${params}`
    : `https://news.google.com/rss?${params}`;
  
  const response = await fetch(url, {
    headers: {
      'User-Agent': 'Mozilla/5.0 (compatible; Noozer/1.0)',
      'Accept': 'application/rss+xml, application/xml, text/xml',
    },
  });
  
  if (!response.ok) {
    throw new GoogleNewsError(`RSS fetch failed: ${response.status}`);
  }
  
  return parseRSS(await response.text());
}

RSS Response Structure

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>artificial intelligence - Google News</title>
    <link>https://news.google.com/search?q=artificial%20intelligence</link>
    <item>
      <title>AI Breakthrough Announced - Tech News</title>
      <link>https://news.google.com/rss/articles/CBMi...</link>
      <guid isPermaLink="false">CBMi...</guid>
      <pubDate>Mon, 15 Jan 2024 12:00:00 GMT</pubDate>
      <description>Latest developments in artificial intelligence...</description>
      <source url="https://technews.com">Tech News</source>
    </item>
  </channel>
</rss>

URL Resolution

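Google News article links point at news.google.com rather than at the publisher, so they must be resolved. The resolver reads the redirect target from the `Location` header instead of letting fetch follow it, and falls back to scanning the HTML when no redirect is returned:

export async function resolveGoogleNewsUrl(gnUrl: string): Promise<string> {
  // Google News URL format: https://news.google.com/rss/articles/CBMi...
  
  const response = await fetch(gnUrl, {
    redirect: 'manual', // inspect the redirect ourselves instead of following it
    headers: {
      'User-Agent': 'Mozilla/5.0 (compatible; Noozer/1.0)',
    },
  });
  
  // Any 3xx (301, 302, 303, 307, 308) with a Location header carries the target URL
  if (response.status >= 300 && response.status < 400) {
    const location = response.headers.get('Location');
    if (location) {
      // Location may be relative; resolve it against the original URL
      return new URL(location, gnUrl).toString();
    }
  }
  
  // Some URLs require JavaScript execution - fall back to HTML parsing
  const html = await response.text();
  const match = html.match(/data-n-au="([^"]+)"/);
  if (match) {
    return decodeURIComponent(match[1]);
  }
  
  throw new GoogleNewsError(`Could not resolve Google News URL: ${gnUrl}`);
}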

Rate Limits

Limit	Value	Notes
Requests/minute	~60	Short-burst ceiling; estimated, not documented
Requests/hour	~300	Sustained ceiling before soft blocks; also estimated
IP rotation	Recommended	Use multiple egress IPs

Best Practices

Spread requests over time - Don't burst requests
Rotate User-Agents - Vary the User-Agent header
Respect robots.txt - Honor crawl delays
Cache responses - RSS doesn't change frequently (5-15 min TTL)
Use RSS over HTML - Less likely to be blocked
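The caching advice above can be sketched as a small in-memory TTL cache keyed by feed URL. `RssCache` and `getFeedCached` are hypothetical helpers, not part of the codebase. The sketch assumes a single process: in Workers each isolate would keep its own copy, so a shared store such as KV or the Cache API would be needed for reuse across isolates.

```typescript
// Hypothetical in-memory TTL cache for RSS bodies, keyed by feed URL.
// Entries older than ttlMs are treated as missing and evicted on read.
export class RssCache {
  private entries = new Map<string, { body: string; storedAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // ttlMs defaults to 10 minutes, inside the 5-15 minute guidance; `now` is injectable for tests
  constructor(ttlMs: number = 10 * 60 * 1000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(url: string): string | undefined {
    const entry = this.entries.get(url);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(url); // stale: drop it and report a miss
      return undefined;
    }
    return entry.body;
  }

  set(url: string, body: string): void {
    this.entries.set(url, { body, storedAt: this.now() });
  }
}

// Read through the cache; `fetchBody` performs the real request on a miss
export async function getFeedCached(
  cache: RssCache,
  url: string,
  fetchBody: (url: string) => Promise<string>,
): Promise<string> {
  const cached = cache.get(url);
  if (cached !== undefined) return cached;
  const body = await fetchBody(url);
  cache.set(url, body);
  return body;
}
```

Injecting the clock keeps expiry testable without real waiting; in production the default `Date.now` is used.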

ZenRows

Overview

ZenRows provides anti-bot bypass for fetching article content when direct requests fail.

Authentication

API Key: Provided in dashboard
Base URL: https://api.zenrows.com/v1/

Request Format

interface ZenRowsRequest {
  url: string;
  apikey: string;
  js_render?: boolean;        // Execute JavaScript
  antibot?: boolean;          // Anti-bot bypass (extra cost)
  premium_proxy?: boolean;    // Premium residential proxies
  proxy_country?: string;     // ISO country code (requires premium_proxy)
  wait?: number;              // Wait ms after load
  wait_for?: string;          // CSS selector to wait for
  css_extractor?: string;     // Extract specific elements
  json_response?: boolean;    // Return structured JSON
}

Basic Request

// src/services/zenrows.ts

import { ZenRowsError } from '../errors/external-api';
import { trackCost } from './cost-tracker';

interface ZenRowsOptions {
  jsRender?: boolean;
  antibot?: boolean;
  proxyCountry?: string;
  waitFor?: string;
}

// Named separately from the fallback module's FetchResult, which has a different shape
interface ZenRowsResult {
  html: string;
  statusCode: number;
  latencyMs: number;
}

export async function fetchWithZenRows(
  targetUrl: string,
  options: ZenRowsOptions = {}
): Promise<ZenRowsResult> {
  const params = new URLSearchParams({
    apikey: env.ZENROWS_API_KEY,
    url: targetUrl,
  });
  
  if (options.jsRender) {
    params.set('js_render', 'true');
  }
  
  if (options.antibot) {
    params.set('antibot', 'true');
  }
  
  if (options.proxyCountry) {
    // proxy_country only takes effect with premium proxies enabled
    params.set('premium_proxy', 'true');
    params.set('proxy_country', options.proxyCountry);
  }
  
  if (options.waitFor) {
    params.set('wait_for', options.waitFor);
  }
  
  const startTime = Date.now();
  
  const response = await fetch(`https://api.zenrows.com/v1/?${params}`, {
    headers: {
      'Accept': 'text/html',
    },
  });
  
  const latencyMs = Date.now() - startTime;
  
  // Track cost (recorded for failed requests too)
  await trackCost({
    service: 'zenrows',
    operation: options.antibot ? 'fetch_antibot' : 'fetch_standard',
    cost_micros: calculateZenRowsCost(options),
    latency_ms: latencyMs,
    success: response.ok,
  });
  
  if (!response.ok) {
    const error = await response.text();
    throw new ZenRowsError(response.status, error);
  }
  
  return {
    html: await response.text(),
    statusCode: response.status,
    latencyMs,
  };
}

Pricing and Cost Calculation

Feature	Credits per Request
Standard request	1
JS rendering	5
Anti-bot	10-25
Premium proxy	10-25
Residential proxy	25

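function calculateZenRowsCost(options: ZenRowsOptions): number {
  // $0.001 per credit, expressed in microdollars ($1 = 1,000,000 micros)
  const COST_PER_CREDIT_MICROS = 1000;
  
  // Credit tiers from the table above; the most expensive feature in use sets the price
  let credits = 1; // Standard request
  
  if (options.jsRender) {
    credits = Math.max(credits, 5);
  }
  
  if (options.proxyCountry) {
    // Premium proxy (required for proxy_country): 10 credits, 25 with JS rendering
    credits = Math.max(credits, options.jsRender ? 25 : 10);
  }
  
  if (options.antibot) {
    credits = 25; // Top of the anti-bot range, as a conservative estimate
  }
  
  return credits * COST_PER_CREDIT_MICROS;
}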

Error Codes

Code	Meaning	Action
401	Invalid API key	Check credentials
402	Out of credits	Top up account
422	Invalid URL	Validate URL format
429	Rate limited	Back off and retry
500	ZenRows error	Retry with backoff
520	Target blocked	Try antibot mode
521	Target timeout	Increase wait time
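The Action column can be encoded as a small decision helper. `nextZenRowsAttempt` is a hypothetical sketch: it escalates to anti-bot on 520, lengthens the `wait` on 521, backs off on 429/500, and stops on codes where retrying the same request cannot help. The 5 s wait step and 30 s cap are assumptions, not ZenRows guidance.

```typescript
// Hypothetical retry planner derived from the ZenRows error-code table.
interface AttemptOptions {
  jsRender?: boolean;
  antibot?: boolean;
  waitMs?: number; // maps to ZenRows' `wait` parameter
}

type ZenRowsPlan =
  | { kind: 'retry'; options: AttemptOptions; backoff: boolean }
  | { kind: 'stop'; reason: string };

export function nextZenRowsAttempt(status: number, current: AttemptOptions): ZenRowsPlan {
  switch (status) {
    case 401:
      return { kind: 'stop', reason: 'Invalid API key: check credentials' };
    case 402:
      return { kind: 'stop', reason: 'Out of credits: top up account' };
    case 422:
      return { kind: 'stop', reason: 'Invalid URL: validate format' };
    case 429:
    case 500:
      // Same request, but only after a backoff delay
      return { kind: 'retry', options: current, backoff: true };
    case 520:
      // Target blocked: escalate once to anti-bot, then give up
      if (current.antibot) return { kind: 'stop', reason: 'Still blocked with antibot' };
      return { kind: 'retry', options: { ...current, antibot: true }, backoff: false };
    case 521: {
      // Target timeout: add 5 s of wait (assumed step), capped at 30 s
      const waitMs = Math.min((current.waitMs ?? 0) + 5000, 30000);
      if (waitMs === current.waitMs) return { kind: 'stop', reason: 'Wait already at cap' };
      return { kind: 'retry', options: { ...current, waitMs }, backoff: false };
    }
    default:
      return { kind: 'stop', reason: `Unhandled status ${status}` };
  }
}
```

Returning a plan instead of performing the retry keeps the policy testable and lets the caller decide how backoff is scheduled.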

Rate Limits

Plan	Requests/second	Concurrent
Starter	5	5
Professional	25	25
Enterprise	100+	100+

RapidAPI Google News

Overview

RapidAPI provides third-party Google News APIs as a fallback when both direct fetching and ZenRows fail. This is the third tier in our fetch strategy.

Recommended APIs

API	Provider	Reliability	Cost
Google News API	newscatcher	High	$0.001/req
Real-Time News	apidojo	Medium	$0.0005/req
Google News	serpdog	High	$0.002/req

Authentication

API Key: From RapidAPI dashboard
Base URL: Varies by provider
Header: X-RapidAPI-Key

Request Format (Newscatcher Example)

// src/services/rapidapi-news.ts

import { trackCost } from './cost-tracker';

interface RapidAPINewsOptions {
  query: string;
  language?: string;  // sent as `lr`, e.g. 'en-US'
  country?: string;   // currently unused
  pageSize?: number;  // currently unused
}

export async function fetchWithRapidAPI(
  options: RapidAPINewsOptions
): Promise<NewsResult[]> {
  // URLSearchParams encodes every value, including `lr`
  const params = new URLSearchParams({
    keyword: options.query,
    lr: options.language || 'en-US',
  });
  
  const startTime = Date.now();
  
  const response = await fetch(`https://google-news13.p.rapidapi.com/search?${params}`, {
    headers: {
      'X-RapidAPI-Key': env.RAPIDAPI_KEY,
      'X-RapidAPI-Host': 'google-news13.p.rapidapi.com',
    },
  });
  
  const latencyMs = Date.now() - startTime;
  
  if (!response.ok) {
    throw new RapidAPIError(response.status, await response.text());
  }
  
  const data = await response.json();
  
  // Track cost: ~$0.001 per request = 1000 microdollars
  await trackCost({
    service: 'rapidapi',
    operation: 'google_news_search',
    cost_micros: 1000,
    latency_ms: latencyMs,
    success: true,
  });
  
  return parseRapidAPIResults(data);
}

function parseRapidAPIResults(data: any): NewsResult[] {
  return (data.items || []).map((item: any) => ({
    title: item.title,
    url: item.newsUrl,
    source: item.publisher,
    publishedAt: item.timestamp,
    snippet: item.snippet,
    imageUrl: item.images?.thumbnail,
  }));
}

Response Structure

{
  "status": "success",
  "items": [
    {
      "title": "AI Breakthrough Announced",
      "newsUrl": "https://example.com/article",
      "publisher": "Tech News",
      "timestamp": "2024-01-15T12:00:00Z",
      "snippet": "Latest developments in AI...",
      "images": {
        "thumbnail": "https://example.com/thumb.jpg"
      }
    }
  ]
}

Error Handling

export class RapidAPIError extends ExternalAPIError {
  constructor(statusCode: number, message: string) {
    const retryable = [429, 500, 502, 503].includes(statusCode);
    super('rapidapi', statusCode, message, retryable);
  }
}

Error Codes

Code	Meaning	Action
401	Invalid API key	Check RapidAPI credentials
403	Not subscribed	Subscribe to the API
429	Rate limited	Back off and retry
500	Provider error	Retry with backoff

Pricing

Plan	Requests/month	Cost
Basic	500	Free
Pro	10,000	$10/mo
Ultra	100,000	$50/mo
Mega	1,000,000	$200/mo

Rate Limits

Plan	Requests/second
Basic	1
Pro	5
Ultra	10
Mega	50

Fetch Fallback Strategy

Three-Tier Approach

┌─────────────────────────────────────────────────────────────────────────────┐
│                         FETCH FALLBACK STRATEGY                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   TIER 1: Direct Fetch (Free)                                              │
│   └── Try direct HTTP request to publisher                                 │
│       └── Success? → Done                                                  │
│       └── 403/Blocked/Timeout? → Tier 2                                    │
│                                                                             │
│   TIER 2: ZenRows ($$)                                                     │
│   └── Anti-bot bypass with JS rendering                                    │
│       └── Success? → Done                                                  │
│       └── Failed? → Tier 3                                                 │
│                                                                             │
│   TIER 3: RapidAPI ($)                                                     │
│   └── Third-party Google News API                                          │
│       └── Success? → Done                                                  │
│       └── Failed? → Mark as failed, log for review                         │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Implementation

// src/services/fetch-with-fallback.ts

import { fetchWithZenRows } from './zenrows';
import { fetchWithRapidAPI } from './rapidapi-news';
// directFetch and extractKeyword are project helpers defined elsewhere

type FetchMethod = 'direct' | 'zenrows' | 'rapidapi' | 'failed';

interface FetchResult {
  html?: string;
  articles?: NewsResult[];
  method: FetchMethod;
  latencyMs: number;
  error?: string;
}

// Catch-clause variables are typed `unknown` under strict TypeScript
function errorMessage(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

export async function fetchWithFallback(
  url: string,
  keywordId: string
): Promise<FetchResult> {
  const startTime = Date.now();
  
  // TIER 1: Direct fetch
  try {
    const html = await directFetch(url);
    await logCrawlHistory(keywordId, 'direct', null);
    return { html, method: 'direct', latencyMs: Date.now() - startTime };
  } catch (directError) {
    console.log(`Direct fetch failed: ${errorMessage(directError)}`);
  }
  
  // TIER 2: ZenRows, anti-bot bypass with JS rendering
  try {
    const result = await fetchWithZenRows(url, { antibot: true, jsRender: true });
    await logCrawlHistory(keywordId, 'zenrows', null);
    return { html: result.html, method: 'zenrows', latencyMs: Date.now() - startTime };
  } catch (zenrowsError) {
    console.log(`ZenRows failed: ${errorMessage(zenrowsError)}`);
  }
  
  // TIER 3: RapidAPI (for Google News search, not article fetch)
  try {
    const articles = await fetchWithRapidAPI({ query: extractKeyword(url) });
    await logCrawlHistory(keywordId, 'rapidapi', null);
    return { articles, method: 'rapidapi', latencyMs: Date.now() - startTime };
  } catch (rapidError) {
    console.log(`RapidAPI failed: ${errorMessage(rapidError)}`);
  }
  
  // All methods failed
  await logCrawlHistory(keywordId, 'failed', 'All fetch methods exhausted');
  return { 
    method: 'failed', 
    latencyMs: Date.now() - startTime,
    error: 'All fetch methods failed' 
  };
}

async function logCrawlHistory(
  keywordId: string,
  method: FetchMethod,
  error: string | null
): Promise<void> {
  await env.DB.prepare(`
    INSERT INTO keyword_crawl_history (
      keyword_id, crawled_at, fetch_method, error_message
    ) VALUES (?, datetime('now'), ?, ?)
  `).bind(keywordId, method, error).run();
}
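`directFetch` is not defined in this document. A minimal sketch, assuming it should throw on anything the fallback treats as a Tier 1 failure (any non-2xx status including 403, network errors, and timeouts) so that `fetchWithFallback` moves on to ZenRows. `DirectFetchError`, the 10 s default timeout, and the injectable `fetchImpl` parameter are hypothetical additions; `AbortSignal.timeout` is available in Workers and Node 17.3+.

```typescript
// Hypothetical Tier 1 fetcher: throws on non-2xx or timeout so the caller falls through.
export class DirectFetchError extends Error {
  status: number | undefined;
  constructor(message: string, status?: number) {
    super(message);
    this.name = 'DirectFetchError';
    this.status = status;
  }
}

export async function directFetch(
  url: string,
  timeoutMs = 10_000,
  fetchImpl: typeof fetch = fetch, // injectable for tests
): Promise<string> {
  let response: Response;
  try {
    response = await fetchImpl(url, {
      headers: {
        'User-Agent': 'Mozilla/5.0 (compatible; Noozer/1.0)',
        'Accept': 'text/html,application/xhtml+xml',
      },
      signal: AbortSignal.timeout(timeoutMs), // aborts the request after timeoutMs
    });
  } catch (err) {
    // Network failure or timeout
    throw new DirectFetchError(`Direct fetch failed: ${(err as Error).message}`);
  }
  if (!response.ok) {
    // 403 and other block pages end up here and trigger Tier 2
    throw new DirectFetchError(`Direct fetch returned ${response.status}`, response.status);
  }
  return response.text();
}
```

Keeping the status on the error would let a caller distinguish "blocked" (403) from "slow" (timeout) if the tiers ever need different handling.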

Fallback Decision Matrix

Scenario	Tier 1 (Direct)	Tier 2 (ZenRows)	Tier 3 (RapidAPI)
Normal article	Try first	If 403/blocked	If ZenRows fails
Paywall site	Skip	Try with antibot	Last resort
Rate limited	Retry later	Try immediately	If ZenRows fails
Known blocker	Skip	Try first	If ZenRows fails
JS-heavy site	Skip	Try with JS render	N/A
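The matrix can be expressed as an ordered plan per scenario. `planFetchTiers` and the scenario names are hypothetical; the ZenRows options follow `fetchWithFallback` (anti-bot on) except for JS-heavy sites, where the matrix calls for JS rendering and marks RapidAPI as not applicable.

```typescript
// Hypothetical encoding of the fallback decision matrix as ordered tier plans.
type Tier = 'direct' | 'zenrows' | 'rapidapi';
type Scenario = 'normal' | 'paywall' | 'rate_limited' | 'known_blocker' | 'js_heavy';

interface TierStep {
  tier: Tier;
  zenrows?: { antibot?: boolean; jsRender?: boolean };
}

export function planFetchTiers(scenario: Scenario): TierStep[] {
  switch (scenario) {
    case 'normal':
      return [{ tier: 'direct' }, { tier: 'zenrows', zenrows: { antibot: true } }, { tier: 'rapidapi' }];
    case 'paywall':
    case 'known_blocker':
      // Direct is skipped; ZenRows first, RapidAPI as last resort
      return [{ tier: 'zenrows', zenrows: { antibot: true } }, { tier: 'rapidapi' }];
    case 'rate_limited':
      // Direct is retried later by the scheduler; this attempt goes straight to ZenRows
      return [{ tier: 'zenrows', zenrows: { antibot: true } }, { tier: 'rapidapi' }];
    case 'js_heavy':
      // RapidAPI is N/A: it returns search results, not rendered pages
      return [{ tier: 'zenrows', zenrows: { jsRender: true } }];
  }
}
```

A driver loop would then try each step in order and stop at the first success, which generalises the fixed three-tier sequence.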

Cost Comparison

Method	Cost per Request	Best For
Direct	$0	Most news sites
ZenRows (standard)	$0.001	Light anti-bot
ZenRows (antibot)	$0.025	Heavy anti-bot
RapidAPI	$0.001	Fallback discovery
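Because tiers run in sequence, a request that ends at RapidAPI or fails has already paid for a ZenRows attempt. A hypothetical estimator, assuming every ZenRows attempt uses anti-bot pricing ($0.025) and is billed even when it fails (as `fetchWithZenRows` records cost on failure), and counting one RapidAPI call ($0.001) for every request that reached Tier 3:

```typescript
// Hypothetical daily fetch-cost estimate from fallback-rate percentages.
const ZENROWS_ANTIBOT_MICROS = 25_000; // $0.025
const RAPIDAPI_MICROS = 1_000;          // $0.001

interface FallbackMix {
  total: number;       // requests that day
  zenrowsPct: number;  // % that ended at ZenRows
  rapidapiPct: number; // % that ended at RapidAPI
  failedPct: number;   // % that failed every tier
}

export function estimateFetchCostMicros(mix: FallbackMix): number {
  // Requests ending at ZenRows, RapidAPI, or failure all made a ZenRows attempt
  const zenrowsAttempts = Math.round(
    (mix.total * (mix.zenrowsPct + mix.rapidapiPct + mix.failedPct)) / 100,
  );
  // Requests ending at RapidAPI or failure also made a RapidAPI call
  const rapidapiAttempts = Math.round((mix.total * (mix.rapidapiPct + mix.failedPct)) / 100);
  return zenrowsAttempts * ZENROWS_ANTIBOT_MICROS + rapidapiAttempts * RAPIDAPI_MICROS;
}
```

For the 2024-01-15 dashboard row (5,000 requests), this gives 750 ZenRows attempts and 150 RapidAPI calls: 18,750,000 + 150,000 = 18,900,000 microdollars, about $18.90. Direct fetches contribute nothing, so the ZenRows share dominates spend.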

Admin Dashboard Metrics

The admin dashboard shows fallback rates via the `v_fetch_fallback_rates` view:

-- Example output
| date       | total | direct_pct | zenrows_pct | rapidapi_pct | failed_pct |
|------------|-------|------------|-------------|--------------|------------|
| 2024-01-15 | 5000  | 85.0       | 12.0        | 2.5          | 0.5        |
| 2024-01-14 | 4800  | 82.0       | 14.0        | 3.0          | 1.0        |

Alert Thresholds:

Direct rate < 70%: Investigate blocking patterns
ZenRows rate > 20%: Cost concern, check blocked domains
RapidAPI rate > 5%: ZenRows may have issues
Failed rate > 2%: Critical - review failing domains
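The thresholds above can be checked per row of the view. `fallbackAlerts` is a hypothetical helper; the column names follow the example output.

```typescript
// Hypothetical check of one v_fetch_fallback_rates row against the alert thresholds.
interface FallbackRateRow {
  date: string;
  direct_pct: number;
  zenrows_pct: number;
  rapidapi_pct: number;
  failed_pct: number;
}

export function fallbackAlerts(row: FallbackRateRow): string[] {
  const alerts: string[] = [];
  // Strict comparisons: a value exactly at the threshold does not alert
  if (row.direct_pct < 70) alerts.push('Direct rate < 70%: investigate blocking patterns');
  if (row.zenrows_pct > 20) alerts.push('ZenRows rate > 20%: cost concern, check blocked domains');
  if (row.rapidapi_pct > 5) alerts.push('RapidAPI rate > 5%: ZenRows may have issues');
  if (row.failed_pct > 2) alerts.push('CRITICAL: failed rate > 2%, review failing domains');
  return alerts;
}
```

Both example rows (2024-01-15 and 2024-01-14) produce no alerts.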

DataForSEO

Overview

DataForSEO provides:

Google News SERP data (alternative discovery)
Backlink analysis
Domain authority scores

Authentication

Login: Email
Password: API password (from dashboard)
Base URL: https://api.dataforseo.com/v3/
Authorization: Basic base64(login:password)
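The header can be built with `btoa`, which exists in Workers and Node 16+ (unlike `Buffer`, which Workers only provide with the `nodejs_compat` flag). `basicAuthHeader` is a hypothetical helper; since `btoa` accepts only Latin-1 strings, it UTF-8 encodes the credentials first so non-ASCII characters do not throw.

```typescript
// Hypothetical helper: builds an HTTP Basic Authorization header value.
export function basicAuthHeader(login: string, password: string): string {
  // Encode to UTF-8 bytes, then map each byte to a Latin-1 char for btoa
  const bytes = new TextEncoder().encode(`${login}:${password}`);
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return `Basic ${btoa(binary)}`;
}
```

For example, `basicAuthHeader('Aladdin', 'open sesame')` returns `Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==`, the example value from RFC 7617.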

Google News API

// src/services/dataforseo.ts

interface NewsSearchParams {
  keyword: string;
  location_code?: number;     // 2840 = US
  language_code?: string;     // en
  date_range?: string;        // past_24_hours, past_week, past_month
  limit?: number;             // Max results (up to 100)
}

export async function searchGoogleNews(params: NewsSearchParams): Promise<NewsResult[]> {
  // btoa works in Workers without the nodejs_compat flag that Buffer requires
  const auth = btoa(`${env.D4SEO_LOGIN}:${env.D4SEO_PASSWORD}`);
  
  const body = [{
    keyword: params.keyword,
    location_code: params.location_code || 2840,
    language_code: params.language_code || 'en',
    date_range: params.date_range || 'past_24_hours',
  }];
  
  const startTime = Date.now();
  
  const response = await fetch('https://api.dataforseo.com/v3/serp/google/news/live/advanced', {
    method: 'POST',
    headers: {
      'Authorization': `Basic ${auth}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
  });
  
  const latencyMs = Date.now() - startTime;
  
  if (!response.ok) {
    throw new DataForSEOError(response.status, await response.text());
  }
  
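  const data = await response.json();
  
  // DataForSEO reports task-level errors inside an HTTP 200 response; 20000 means success
  const task = data.tasks?.[0];
  if (task?.status_code !== 20000) {
    throw new DataForSEOError(task?.status_code ?? 0, task?.status_message ?? 'Unknown task error');
  }
  
  // Track cost
  await trackCost({
    service: 'data4seo',
    operation: 'google_news_search',
    cost_micros: 2500, // $0.0025 per request
    latency_ms: latencyMs,
    success: true,
  });
  
  return parseNewsResults(data);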
}

Response Structure

{
  "tasks": [{
    "result": [{
      "keyword": "artificial intelligence",
      "items_count": 100,
      "items": [{
        "type": "news_search",
        "title": "AI Breakthrough",
        "url": "https://example.com/article",
        "domain": "example.com",
        "source": "Example News",
        "date": "2024-01-15T12:00:00+00:00",
        "snippet": "Article description...",
        "image_url": "https://example.com/image.jpg"
      }]
    }]
  }]
}
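`parseNewsResults` is referenced above but not shown. A sketch for the response structure above, assuming the `NewsResult` shape from the RapidAPI section and keeping only items of type `news_search` (other item types are skipped rather than guessed at):

```typescript
// Hypothetical parser for the live/advanced Google News response.
interface NewsResult {
  title: string;
  url: string;
  source: string;
  publishedAt: string;
  snippet: string;
  imageUrl?: string;
}

interface D4SEONewsItem {
  type: string;
  title: string;
  url: string;
  source: string;
  date: string;
  snippet?: string;
  image_url?: string;
}

interface D4SEONewsResponse {
  tasks?: Array<{ result?: Array<{ items?: D4SEONewsItem[] | null }> | null }>;
}

export function parseNewsResults(data: D4SEONewsResponse): NewsResult[] {
  const results: NewsResult[] = [];
  // Flatten tasks -> result -> items, tolerating missing or null levels
  for (const task of data.tasks ?? []) {
    for (const result of task.result ?? []) {
      for (const item of result.items ?? []) {
        if (item.type !== 'news_search') continue;
        results.push({
          title: item.title,
          url: item.url,
          source: item.source,
          publishedAt: item.date,
          snippet: item.snippet ?? '',
          imageUrl: item.image_url,
        });
      }
    }
  }
  return results;
}
```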

Backlinks API

interface BacklinkParams {
  target: string;  // Domain or URL
  limit?: number;
}

export async function getBacklinkSummary(params: BacklinkParams): Promise<BacklinkSummary> {
  const auth = btoa(`${env.D4SEO_LOGIN}:${env.D4SEO_PASSWORD}`);
  
  const body = [{
    target: params.target,
    limit: params.limit || 1,
  }];
  
  const startTime = Date.now();
  
  const response = await fetch('https://api.dataforseo.com/v3/backlinks/summary/live', {
    method: 'POST',
    headers: {
      'Authorization': `Basic ${auth}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
  });
  
  const latencyMs = Date.now() - startTime;
  
  // Cost: $0.02 per domain
  await trackCost({
    service: 'data4seo',
    operation: 'backlink_summary',
    cost_micros: 20000,
    latency_ms: latencyMs,
    success: response.ok,
  });
  
  // Check the status before parsing: error bodies may not be valid JSON
  if (!response.ok) {
    throw new DataForSEOError(response.status, await response.text());
  }
  
  const data = await response.json();
  return parseBacklinkSummary(data);
}

Backlink Response

{
  "tasks": [{
    "result": [{
      "target": "nytimes.com",
      "total_backlinks": 15000000,
      "referring_domains": 250000,
      "referring_main_domains": 180000,
      "rank": 85,
      "backlinks_spam_score": 5,
      "broken_backlinks": 50000,
      "broken_pages": 1000
    }]
  }]
}

Pricing

API	Cost per Request
Google News SERP	$0.0025
Backlink Summary	$0.02
Backlink History	$0.02
Domain Analytics	$0.05

Rate Limits

Plan	Requests/minute
Standard	2000
Plus	5000
Pro	10000

SharedCount

Overview

SharedCount provides social media engagement metrics for URLs.

Authentication

API Key: From dashboard
Base URL: https://api.sharedcount.com/v1.0/

Request Format

// src/services/sharedcount.ts

interface SocialMetrics {
  facebook: {
    share_count: number;
    comment_count: number;
    reaction_count: number;
  };
  twitter: number;
  pinterest: number;
  linkedin: number;
  total_engagement: number;
}

export async function fetchSocialMetrics(url: string): Promise<SocialMetrics> {
  const params = new URLSearchParams({
    apikey: env.SHAREDCOUNT_API_KEY,
    url: url,
  });
  
  const startTime = Date.now();
  
  const response = await fetch(`https://api.sharedcount.com/v1.0/?${params}`);
  
  const latencyMs = Date.now() - startTime;
  
  if (!response.ok) {
    throw new SharedCountError(response.status);
  }
  
  const data = await response.json();
  
  // Track cost: Plus is $80 for 100,000 requests, about $0.0008 (800 micros) per request at full usage
  await trackCost({
    service: 'sharedcount',
    operation: 'fetch_metrics',
    cost_micros: 800,
    latency_ms: latencyMs,
    success: true,
  });
  
  return parseSocialMetrics(data);
}

function parseSocialMetrics(data: any): SocialMetrics {
  const facebook = data.Facebook || {};
  const twitter = data.Twitter || 0;
  const pinterest = data.Pinterest || 0;
  const linkedin = data.LinkedIn || 0;
  
  const fbTotal = (facebook.share_count || 0) + 
                  (facebook.comment_count || 0) + 
                  (facebook.reaction_count || 0);
  
  return {
    facebook: {
      share_count: facebook.share_count || 0,
      comment_count: facebook.comment_count || 0,
      reaction_count: facebook.reaction_count || 0,
    },
    twitter,
    pinterest,
    linkedin,
    total_engagement: fbTotal + twitter + pinterest + linkedin,
  };
}

Response Structure

{
  "Facebook": {
    "share_count": 1500,
    "comment_count": 200,
    "reaction_count": 850
  },
  "Twitter": 750,
  "Pinterest": 50,
  "LinkedIn": 120,
  "StumbleUpon": 0
}

Pricing

Plan	Requests/month	Cost
Free	500	$0
Basic	10,000	$40/mo
Plus	100,000	$80/mo
Business	500,000	$200/mo
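Since plans are flat monthly fees, the per-request cost depends on how much of the quota is used. A hypothetical helper that derives the full-utilisation cost in microdollars from the table above:

```typescript
// Hypothetical: effective cost per request (microdollars) when a plan's quota is fully used.
interface Plan {
  name: string;
  requestsPerMonth: number;
  usdPerMonth: number;
}

const SHAREDCOUNT_PLANS: Plan[] = [
  { name: 'Free', requestsPerMonth: 500, usdPerMonth: 0 },
  { name: 'Basic', requestsPerMonth: 10_000, usdPerMonth: 40 },
  { name: 'Plus', requestsPerMonth: 100_000, usdPerMonth: 80 },
  { name: 'Business', requestsPerMonth: 500_000, usdPerMonth: 200 },
];

export function microsPerRequest(plan: Plan): number {
  // $1 = 1,000,000 micros
  return Math.round((plan.usdPerMonth * 1_000_000) / plan.requestsPerMonth);
}
```

This yields 4,000 micros on Basic, 800 on Plus, and 400 on Business. Lower utilisation raises the effective figure, so a flat per-request constant is only exact at one usage level.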

Rate Limits

Plan	Requests/second
Free	1
Basic	5
Plus	10
Business	25

OpenAI

Overview

OpenAI provides:

Text embeddings (text-embedding-3-small)
LLM inference for classification and summarization

Authentication

API Key: sk-...
Base URL: https://api.openai.com/v1/

Embeddings API

// src/services/openai.ts

export async function generateEmbedding(text: string): Promise<number[]> {
  const response = await fetch('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'text-embedding-3-small',
      input: text,
      dimensions: 1536,
    }),
  });
  
  if (!response.ok) {
    throw new OpenAIError(response.status, await response.text());
  }
  
  const data = await response.json();
  
  // Cost: $0.00002 per 1K tokens ($0.02 per 1M) = 20 microdollars per 1K.
  // Use the token count reported by the API rather than estimating from characters.
  const tokenCount = data.usage.total_tokens;
  await trackCost({
    service: 'openai',
    operation: 'embedding',
    input_units: tokenCount,
    cost_micros: Math.ceil(tokenCount / 1000 * 20),
    success: true,
  });
  
  return data.data[0].embedding;
}

export async function generateEmbeddingsBatch(texts: string[]): Promise<number[][]> {
  const response = await fetch('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'text-embedding-3-small',
      input: texts,
      dimensions: 1536,
    }),
  });
  
  if (!response.ok) {
    throw new OpenAIError(response.status, await response.text());
  }
  
  const data = await response.json();
  
  const totalTokens = data.usage.total_tokens;
  await trackCost({
    service: 'openai',
    operation: 'embedding_batch',
    input_units: totalTokens,
    cost_micros: Math.ceil(totalTokens / 1000 * 20),
    success: true,
  });
  
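  // Each item carries an `index`; sort by it so vectors line up with `texts`
  return data.data
    .sort((a: any, b: any) => a.index - b.index)
    .map((d: any) => d.embedding);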
}
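A single embeddings request accepts at most 2048 inputs, so large batches need splitting. A hypothetical sketch: `chunk` and `embedAll` are not part of the codebase, and `embedBatch` stands in for a function like `generateEmbeddingsBatch`. Requests run sequentially to keep the rate predictable against RPM/TPM limits.

```typescript
// Hypothetical helper: split inputs into request-sized chunks and embed them in order.
const MAX_INPUTS_PER_REQUEST = 2048;

export function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error('size must be positive');
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

export async function embedAll(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
  batchSize = MAX_INPUTS_PER_REQUEST,
): Promise<number[][]> {
  const vectors: number[][] = [];
  // Sequential on purpose; results are appended in input order
  for (const batch of chunk(texts, batchSize)) {
    vectors.push(...(await embedBatch(batch)));
  }
  return vectors;
}
```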

Chat Completions API

interface ClassificationResult {
  topics: Array<{ label: string; confidence: number }>;
  industries: Array<{ label: string; confidence: number }>;
  sentiment: number;
  entities: Array<{ name: string; type: string }>;
}

export async function classifyWithLLM(
  headline: string,
  bodyText: string,
  taxonomyLabels: string[]
): Promise<ClassificationResult> {
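  const systemPrompt = `You are a news classifier. Given an article, extract:
1. Topics (from provided taxonomy), each with a confidence from 0 to 1
2. Industries mentioned, each with a confidence from 0 to 1
3. Overall sentiment (-1 to 1)
4. Named entities, each with a type

Respond in JSON only, using exactly these keys:
{"topics": [{"label": string, "confidence": number}], "industries": [{"label": string, "confidence": number}], "sentiment": number, "entities": [{"name": string, "type": string}]}`;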

  const userPrompt = `Taxonomy labels: ${taxonomyLabels.join(', ')}

Article headline: ${headline}

Article body (truncated):
${bodyText.slice(0, 3000)}

Classify this article:`;

  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'gpt-4o-mini',
      messages: [
        { role: 'system', content: systemPrompt },
        { role: 'user', content: userPrompt },
      ],
      response_format: { type: 'json_object' },
      temperature: 0.3,
      max_tokens: 500,
    }),
  });
  
  if (!response.ok) {
    throw new OpenAIError(response.status, await response.text());
  }
  
  const data = await response.json();
  
  // Cost: $0.15/1M input, $0.60/1M output for gpt-4o-mini.
  // $X per 1M tokens is X microdollars per token, so multiply directly.
  const inputTokens = data.usage.prompt_tokens;
  const outputTokens = data.usage.completion_tokens;
  const costMicros = Math.ceil(inputTokens * 0.15 + outputTokens * 0.60);
  
  await trackCost({
    service: 'openai',
    operation: 'classification',
    input_units: inputTokens,
    output_units: outputTokens,
    cost_micros: costMicros,
    success: true,
  });
  
  return JSON.parse(data.choices[0].message.content);
}

Pricing

Model	Input	Output
text-embedding-3-small	$0.02/1M tokens	-
gpt-4o-mini	$0.15/1M tokens	$0.60/1M tokens
gpt-4o	$2.50/1M tokens	$10.00/1M tokens
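The table converts to microdollars without scaling: $X per 1M tokens is exactly X microdollars per token. A hypothetical helper using the prices listed above:

```typescript
// Hypothetical cost helper based on the per-1M-token prices in the table.
const PRICES_USD_PER_1M = {
  'text-embedding-3-small': { input: 0.02, output: 0 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
  'gpt-4o': { input: 2.5, output: 10.0 },
} as const;

type PricedModel = keyof typeof PRICES_USD_PER_1M;

export function openAICostMicros(model: PricedModel, inputTokens: number, outputTokens = 0): number {
  const price = PRICES_USD_PER_1M[model];
  // USD per 1M tokens == microdollars per token
  return Math.ceil(inputTokens * price.input + outputTokens * price.output);
}
```

For example, 1,000 input and 500 output tokens on gpt-4o-mini cost 150 + 300 = 450 microdollars ($0.00045).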

Rate Limits

Tier	RPM	TPM
Tier 1	500	30,000
Tier 2	5,000	150,000
Tier 3	5,000	1,000,000

Workers AI

Overview

Cloudflare Workers AI provides serverless inference for embeddings and LLMs.

Available Models

Task	Model	Dimensions
Embeddings	@cf/baai/bge-base-en-v1.5	768
Embeddings	@cf/baai/bge-large-en-v1.5	1024
Text Gen	@cf/meta/llama-3-8b-instruct	-
NER	@cf/huggingface/distilbert-ner	-

Embeddings

// src/services/workers-ai.ts

export async function generateEmbeddingWorkersAI(
  ai: Ai,
  text: string
): Promise<number[]> {
  const result = await ai.run('@cf/baai/bge-base-en-v1.5', {
    text: [text],
  });
  
  // Workers AI is billed in neurons: $0.011 per 1,000 neurons (11 micros per neuron)
  // beyond the daily free allocation
  await trackCost({
    service: 'workers_ai',
    operation: 'embedding',
    cost_micros: 11, // Rough estimate per embedding
    success: true,
  });
  
  return result.data[0];
}

Text Generation

export async function classifyWithWorkersAI(
  ai: Ai,
  prompt: string
): Promise<string> {
  const result = await ai.run('@cf/meta/llama-3-8b-instruct', {
    messages: [
      { role: 'system', content: 'You are a news classifier. Respond in JSON.' },
      { role: 'user', content: prompt },
    ],
    max_tokens: 500,
  });
  
  // Pricing: varies by model
  await trackCost({
    service: 'workers_ai',
    operation: 'classification',
    cost_micros: 50, // Estimate
    success: true,
  });
  
  return result.response;
}
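`classifyWithWorkersAI` returns raw text, and smaller instruction-tuned models sometimes wrap JSON in prose. A hypothetical helper that extracts the first balanced `{...}` object (ignoring braces inside strings) and parses it, returning `null` instead of throwing:

```typescript
// Hypothetical helper: pull the first top-level JSON object out of a model reply.
export function extractJsonObject<T = unknown>(text: string): T | null {
  const start = text.indexOf('{');
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Track string boundaries so braces inside strings are ignored
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') depth++;
    else if (ch === '}') {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(text.slice(start, i + 1)) as T;
        } catch {
          return null; // balanced but not valid JSON
        }
      }
    }
  }
  return null; // unbalanced braces
}
```

Only the first object is considered; if models return several, a stricter prompt is usually a better fix than a smarter parser.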

Named Entity Recognition

interface Entity {
  word: string;
  entity_group: string;  // PER, ORG, LOC, MISC
  score: number;
}

export async function extractEntitiesWorkersAI(
  ai: Ai,
  text: string
): Promise<Entity[]> {
  const result = await ai.run('@cf/huggingface/distilbert-ner', {
    text: text.slice(0, 5000), // Model limit
  });
  
  await trackCost({
    service: 'workers_ai',
    operation: 'ner',
    cost_micros: 20,
    success: true,
  });
  
  return result;
}

Error Handling

Unified Error Types

// src/errors/external-api.ts

export class ExternalAPIError extends Error {
  constructor(
    public service: string,
    public statusCode: number,
    public detail: string, // raw provider message; `message` holds the formatted string
    public retryable: boolean = true,
    public retryAfterMs?: number
  ) {
    super(`${service} error (${statusCode}): ${detail}`);
    this.name = new.target.name; // e.g. 'ZenRowsError'
  }
}

export class GoogleNewsError extends ExternalAPIError {
  constructor(message: string) {
    super('google_news', 0, message, true);
  }
}

export class ZenRowsError extends ExternalAPIError {
  constructor(statusCode: number, message: string) {
    const retryable = [429, 500, 520, 521].includes(statusCode);
    super('zenrows', statusCode, message, retryable);
  }
}

export class DataForSEOError extends ExternalAPIError {
  constructor(statusCode: number, message: string) {
    const retryable = [429, 500].includes(statusCode);
    super('data4seo', statusCode, message, retryable);
  }
}

export class SharedCountError extends ExternalAPIError {
  constructor(statusCode: number) {
    const retryable = [429, 500].includes(statusCode);
    super('sharedcount', statusCode, 'Request failed', retryable);
  }
}

export class OpenAIError extends ExternalAPIError {
  constructor(statusCode: number, message: string) {
    const retryable = [429, 500, 503].includes(statusCode);
    let retryAfter: number | undefined;
    
    if (statusCode === 429) {
      retryAfter = 60000; // 1 minute default
    }
    
    super('openai', statusCode, message, retryable, retryAfter);
  }
}
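`OpenAIError` falls back to a fixed 60 s on 429. Where the response is available, the standard `Retry-After` header (a delay in seconds or an HTTP date, per RFC 9110) could supply a better value. `parseRetryAfterMs` is a hypothetical helper, not part of the existing error classes:

```typescript
// Hypothetical: convert a Retry-After header value into a delay in milliseconds.
// Accepts delay-seconds ("120") or an HTTP date; returns undefined if unparseable.
export function parseRetryAfterMs(value: string | null, nowMs: number = Date.now()): number | undefined {
  if (value === null) return undefined;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed) * 1000;
  }
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;
  return Math.max(0, dateMs - nowMs); // a date in the past means "retry now"
}
```

A caller would pass `response.headers.get('Retry-After')` and use the result as `retryAfterMs` when it is defined.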

Retry Logic

// src/utils/retry.ts

import { ExternalAPIError } from '../errors/external-api';

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

interface RetryOptions {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
  exponentialBase: number;
}

const DEFAULT_OPTIONS: RetryOptions = {
  maxRetries: 3,
  baseDelayMs: 1000,
  maxDelayMs: 30000,
  exponentialBase: 2,
};

export async function withRetry<T>(
  fn: () => Promise<T>,
  options: Partial<RetryOptions> = {}
): Promise<T> {
  const opts = { ...DEFAULT_OPTIONS, ...options };
  let lastError: unknown;
  
  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      
      // Permanent failures are not retried
      if (error instanceof ExternalAPIError && !error.retryable) {
        throw error;
      }
      
      // No point waiting after the final attempt
      if (attempt === opts.maxRetries) {
        break;
      }
      
      // Prefer the server-provided delay when there is one
      if (error instanceof ExternalAPIError && error.retryAfterMs) {
        await sleep(error.retryAfterMs);
        continue;
      }
      
      // Exponential backoff capped at maxDelayMs, plus up to 20% jitter
      const delay = Math.min(
        opts.baseDelayMs * Math.pow(opts.exponentialBase, attempt),
        opts.maxDelayMs
      );
      const jitter = delay * 0.2 * Math.random();
      await sleep(delay + jitter);
    }
  }
  
  throw lastError;
}

Circuit Breaker

// src/utils/circuit-breaker.ts

interface CircuitBreakerState {
  failures: number;
  lastFailure: number;
  state: 'closed' | 'open' | 'half-open';
}

// Module-level state: in Workers, each isolate keeps its own breakers
const circuitBreakers = new Map<string, CircuitBreakerState>();

export async function withCircuitBreaker<T>(
  service: string,
  fn: () => Promise<T>,
  options = { failureThreshold: 5, recoveryTimeMs: 60000 }
): Promise<T> {
  // Annotated so the state can move to 'half-open' or 'open' later
  const state: CircuitBreakerState = circuitBreakers.get(service) ?? {
    failures: 0,
    lastFailure: 0,
    state: 'closed',
  };
  
  // Check if circuit is open
  if (state.state === 'open') {
    if (Date.now() - state.lastFailure > options.recoveryTimeMs) {
      state.state = 'half-open';
    } else {
      throw new Error(`Circuit breaker open for ${service}`);
    }
  }
  
  try {
    const result = await fn();
    
    // Success - reset circuit
    state.failures = 0;
    state.state = 'closed';
    circuitBreakers.set(service, state);
    
    return result;
  } catch (error) {
    state.failures++;
    state.lastFailure = Date.now();
    
    if (state.failures >= options.failureThreshold) {
      state.state = 'open';
    }
    
    circuitBreakers.set(service, state);
    throw error;
  }
}

Cost Tracking

Tracking Function

// src/services/cost-tracker.ts

interface CostEvent {
  service: 'zenrows' | 'rapidapi' | 'data4seo' | 'sharedcount' | 'openai' | 'workers_ai';
  operation: string;
  article_id?: string;
  customer_id?: string;
  workflow_id?: string;
  input_units?: number;
  output_units?: number;
  cost_micros: number;
  latency_ms?: number;
  success: boolean;
  error_code?: string;
  metadata?: Record<string, any>;
}

export async function trackCost(event: CostEvent): Promise<void> {
  await env.DB.prepare(`
    INSERT INTO cost_events (
      id, timestamp, service, operation, article_id, customer_id, workflow_id,
      input_units, output_units, cost_micros, latency_ms, success, error_code, metadata
    ) VALUES (?, datetime('now'), ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
  `).bind(
    crypto.randomUUID(),
    event.service,
    event.operation,
    event.article_id ?? null,
    event.customer_id ?? null,
    event.workflow_id ?? null,
    event.input_units ?? 0,
    event.output_units ?? 0,
    event.cost_micros,
    event.latency_ms ?? null, // ?? keeps a legitimate 0 ms instead of storing NULL
    event.success ? 1 : 0,
    event.error_code ?? null,
    event.metadata ? JSON.stringify(event.metadata) : null,
  ).run();
}

Cost Summary

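// Assumes `env.DB` is in scope and `timestamp` holds SQLite's
// 'YYYY-MM-DD HH:MM:SS' text, as written by datetime('now') in trackCost.

interface CostSummaryRow {
  service: string;
  operation: string;
  total_micros: number;
  operation_count: number;
  success_count: number;
  avg_latency_ms: number | null;
}

interface CostSummary {
  date: string;                   // 'YYYY-MM-DD' (UTC)
  by_service: CostSummaryRow[];   // one row per (service, operation) pair
  total_micros: number;
  total_usd: number;
}

export async function getDailyCostSummary(date: string): Promise<CostSummary> {
  const { results } = await env.DB.prepare(`
    SELECT 
      service,
      operation,
      SUM(cost_micros) AS total_micros,
      COUNT(*) AS operation_count,
      SUM(CASE WHEN success = 1 THEN 1 ELSE 0 END) AS success_count,
      AVG(latency_ms) AS avg_latency_ms
    FROM cost_events
    -- Half-open range instead of date(timestamp) = ?, so an index on timestamp can be used
    WHERE timestamp >= ? AND timestamp < date(?, '+1 day')
    GROUP BY service, operation
    ORDER BY total_micros DESC
  `).bind(date, date).all<CostSummaryRow>();
  
  // Sum once and derive USD from the same integer total
  const totalMicros = results.reduce((sum, r) => sum + r.total_micros, 0);

  return {
    date,
    by_service: results,
    total_micros: totalMicros,
    total_usd: totalMicros / 1_000_000,
  };
}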
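
Because the query groups by both `service` and `operation`, `by_service` holds one row per operation. A dashboard that wants one line per service can roll those rows up in memory. The sketch below assumes the row shape returned by the query above; `rollUpByService` and its output fields are hypothetical names.

```typescript
// Row shape assumed from the daily summary query (per service + operation).
interface CostRow {
  service: string;
  operation: string;
  total_micros: number;
  operation_count: number;
  success_count: number;
}

interface ServiceTotal {
  service: string;
  total_micros: number;
  operation_count: number;
  success_rate: number; // 0..1
}

// Hypothetical helper: merge per-operation rows into one entry per service,
// sorted by spend (most expensive first).
export function rollUpByService(rows: CostRow[]): ServiceTotal[] {
  const totals = new Map<string, { micros: number; ops: number; ok: number }>();

  for (const r of rows) {
    const t = totals.get(r.service) ?? { micros: 0, ops: 0, ok: 0 };
    t.micros += r.total_micros;
    t.ops += r.operation_count;
    t.ok += r.success_count;
    totals.set(r.service, t);
  }

  return Array.from(totals, ([service, t]) => ({
    service,
    total_micros: t.micros,
    operation_count: t.ops,
    success_rate: t.ops > 0 ? t.ok / t.ops : 0,
  })).sort((a, b) => b.total_micros - a.total_micros);
}
```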

Rate Limiting

Per-Service Rate Limiters

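// src/utils/rate-limiter.ts

// Requests per minute (rpm) and maximum in-flight requests (concurrent) per service.
const SERVICE_LIMITS = {
  google_news: { rpm: 60, concurrent: 5 },
  zenrows: { rpm: 300, concurrent: 25 },
  data4seo: { rpm: 2000, concurrent: 50 },
  sharedcount: { rpm: 600, concurrent: 10 },
  openai: { rpm: 500, concurrent: 20 },
} as const;

export async function withRateLimit<T>(
  service: keyof typeof SERVICE_LIMITS,
  fn: () => Promise<T>
): Promise<T> {
  const limits = SERVICE_LIMITS[service];

  // Fixed one-minute window: the key includes the minute number, so every minute
  // starts a fresh counter. (A single key whose TTL is refreshed on every put
  // never expires under steady traffic, so its count would never reset.)
  const minute = Math.floor(Date.now() / 60_000);
  const key = `ratelimit:${service}:${minute}`;
  
  // Check current minute's count
  const current = await env.RATE_LIMITS.get(key);
  const count = current ? parseInt(current, 10) : 0;
  
  if (count >= limits.rpm) {
    throw new RateLimitError(service, count);
  }
  
  // Increment count. KV is eventually consistent and this read-then-write is not
  // atomic, so under concurrent traffic the count is approximate; use a Durable
  // Object if the limit must be exact. The 120 s TTL outlives the window
  // (KV requires expirationTtl >= 60).
  await env.RATE_LIMITS.put(key, String(count + 1), { expirationTtl: 120 });
  
  return fn();
}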
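
The fixed-window behaviour can be checked without KV by putting the counter behind a small store interface. This is a sketch under assumptions: `CounterStore`, `MemoryStore`, and `checkFixedWindow` are hypothetical names, the in-memory store ignores TTLs, and the clock is passed in so the minute boundary can be exercised in a test.

```typescript
// Hypothetical minimal store: the subset of KV-like behaviour the limiter needs.
interface CounterStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// In-memory stand-in for tests (no expiry; old window keys simply stay unused).
export class MemoryStore implements CounterStore {
  private data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }

  async put(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

const WINDOW_MS = 60_000;

// Returns true (and records the request) if the current window has room,
// false if the service has already used its rpm budget for this minute.
export async function checkFixedWindow(
  store: CounterStore,
  service: string,
  rpm: number,
  now: number = Date.now(),
): Promise<boolean> {
  // Each minute gets its own key, so the count resets at every window boundary
  const key = `ratelimit:${service}:${Math.floor(now / WINDOW_MS)}`;
  const count = parseInt((await store.get(key)) ?? '0', 10);
  if (count >= rpm) return false;
  await store.put(key, String(count + 1));
  return true;
}
```

A fixed window is simple but can admit up to twice the rpm across a window boundary (a full budget at the end of one minute and again at the start of the next); a sliding window or token bucket smooths that out at the cost of more state.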

Concurrent Request Limiter

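class ConcurrencyLimiter {
  // In-memory state is per isolate: each Worker isolate keeps its own counts,
  // so this caps concurrency within one isolate, not across the deployment.
  private active = new Map<string, number>();
  private waiters = new Map<string, Array<() => void>>();
  
  async acquire(service: string, limit: number): Promise<() => void> {
    const current = this.active.get(service) ?? 0;
    
    if (current >= limit) {
      // Wait in FIFO order instead of polling; release() hands its slot directly
      // to the next waiter, so the active count is already correct on wake-up
      await new Promise<void>((resolve) => {
        const queue = this.waiters.get(service) ?? [];
        queue.push(() => resolve());
        this.waiters.set(service, queue);
      });
    } else {
      this.active.set(service, current + 1);
    }
    
    let released = false;
    return () => {
      if (released) return; // calling release twice must not free a second slot
      released = true;

      const next = this.waiters.get(service)?.shift();
      if (next) {
        next(); // transfer the slot; the active count stays the same
      } else {
        this.active.set(service, Math.max(0, (this.active.get(service) ?? 0) - 1));
      }
    };
  }
}

export const concurrencyLimiter = new ConcurrencyLimiter();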
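
Whatever limiter is used, the release function has to run even when the wrapped call throws, or the slot leaks and the service eventually stalls. A minimal sketch of a `try`/`finally` wrapper follows. `withConcurrency` and the `Limiter` interface are hypothetical names; the interface only assumes an `acquire(service, limit)` method that resolves to a release callback, as `ConcurrencyLimiter` does.

```typescript
// Anything whose acquire() resolves to a release callback.
interface Limiter {
  acquire(service: string, limit: number): Promise<() => void>;
}

// Hypothetical helper: hold one slot for the duration of fn, releasing it on
// success and on error alike.
export async function withConcurrency<T>(
  limiter: Limiter,
  service: string,
  limit: number,
  fn: () => Promise<T>,
): Promise<T> {
  const release = await limiter.acquire(service, limit);
  try {
    // `return await` matters: a bare `return fn()` would run `finally`
    // (and release the slot) before fn's promise settles
    return await fn();
  } finally {
    release();
  }
}
```

For example, a ZenRows fetch could be wrapped as `withConcurrency(concurrencyLimiter, 'zenrows', SERVICE_LIMITS.zenrows.concurrent, fn)`, and the whole call nested inside `withRateLimit('zenrows', ...)` so both the per-minute and the in-flight limits apply.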

Last updated: 2024-01-15
