Taxonomy Seed Data

Initial classification labels for topics, industries, tactics, and risk signals


Overview

The taxonomy provides a hierarchical classification system for news articles:

| Category | Purpose | Example |
|----------|---------|---------|
| topic | What the article is about | Technology, Politics, Sports |
| industry | Business sector relevance | Healthcare, Finance, Retail |
| tactic | PR/communication patterns | Product Launch, Crisis Response |
| risk | Potential concerns | Regulatory, Reputation, Financial |

Topics Hierarchy

Level 1: Primary Topics (each followed by its Level 2 subtopics)

  • Business - Economy, Markets, Corporate, Startups, Real Estate
  • Technology - AI/ML, Software, Hardware, Cybersecurity, Crypto, Social Media
  • Politics - US Politics, International Relations, Policy, Elections
  • Science - Research, Space, Environment, Climate
  • Health - Public Health, Medical Research, Mental Health
  • Entertainment - Movies/TV, Music, Gaming, Celebrity
  • Sports - Football, Basketball, Baseball, Soccer, Esports
  • World - Europe, Asia, Middle East, Americas
  • Crime - Violent Crime, White Collar, Cybercrime
  • Legal - Court Cases, Legislation, Regulatory
  • Opinion - Editorial, Analysis, Commentary

Topic Keyword Patterns

| Label | Patterns |
|-------|----------|
| AI and Machine Learning | artificial intelligence, AI, machine learning, ML, deep learning, neural network, GPT, LLM, ChatGPT, OpenAI, Anthropic |
| Crypto and Blockchain | bitcoin, ethereum, cryptocurrency, crypto, blockchain, NFT, DeFi, Web3 |
| Climate | climate change, global warming, carbon, emissions, renewable energy, sustainability, net zero |
| US Politics | Congress, Senate, White House, Capitol, Democrat, Republican, GOP, Biden, Trump |
| Cybersecurity | cybersecurity, security breach, hack, hacked, ransomware, malware, phishing, data breach |
| Space | NASA, SpaceX, rocket, satellite, astronaut, Mars, moon landing, space exploration |
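
Short patterns such as AI or ML can match inside unrelated words ("said", "HTML") if they are applied as plain substrings. The following is a minimal sketch of one way to apply the patterns, assuming whole-word matching and case-sensitive handling of short all-caps keywords. The names TOPIC_PATTERNS, compile_pattern, and match_topics are hypothetical and not part of the taxonomy itself.

```python
import re

# Two rows copied from the pattern table; the dict layout is an assumption.
TOPIC_PATTERNS = {
    "AI and Machine Learning": ["artificial intelligence", "AI", "machine learning", "LLM"],
    "Space": ["NASA", "SpaceX", "rocket", "satellite"],
}


def compile_pattern(keyword: str) -> re.Pattern:
    """Build a whole-word regex; short all-caps keywords stay case-sensitive."""
    flags = 0 if keyword.isupper() and len(keyword) <= 4 else re.IGNORECASE
    return re.compile(r"(?<!\w)" + re.escape(keyword) + r"(?!\w)", flags)


def match_topics(text: str) -> dict:
    """Return each matching label with the keywords found in the text."""
    hits = {}
    for label, keywords in TOPIC_PATTERNS.items():
        found = [kw for kw in keywords if compile_pattern(kw).search(text)]
        if found:
            hits[label] = found
    return hits
```

With this approach, "He said the plan was fair" matches nothing, while "NASA said the rocket launch went well" matches Space through both NASA and rocket.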

Industries Hierarchy

Technology Sector

  • Software and SaaS
  • Semiconductors
  • Internet and Social Media
  • Cloud Computing
  • AI and Data
  • Consumer Electronics

Financial Services

  • Banking
  • Insurance
  • Investment Management
  • Fintech
  • Private Equity
  • Cryptocurrency

Healthcare

  • Pharmaceuticals
  • Biotechnology
  • Medical Devices
  • Healthcare Providers
  • Digital Health

Consumer

  • Retail
  • E-commerce
  • Consumer Goods
  • Food and Beverage
  • Restaurants and Hospitality

Energy

  • Oil and Gas
  • Utilities
  • Renewable Energy
  • Mining

Industrial

  • Manufacturing
  • Aerospace and Defense
  • Automotive
  • Construction
  • Logistics

Media and Entertainment

  • Film and Television
  • Music
  • Publishing
  • Gaming
  • Streaming

Other Sectors

  • Telecommunications
  • Real Estate
  • Professional Services
  • Education
  • Government

Industry Keyword Patterns

| Label | Patterns |
|-------|----------|
| Software and SaaS | SaaS, software as a service, enterprise software, Salesforce, Microsoft, Oracle, SAP |
| Pharmaceuticals | pharma, pharmaceutical, drug maker, FDA approval, clinical trial, Pfizer, Merck, J and J |
| Semiconductors | semiconductor, chip, processor, NVIDIA, Intel, AMD, TSMC, chip shortage |
| Fintech | fintech, payment, digital payment, Stripe, Square, PayPal, mobile banking |
| Automotive | automotive, automaker, car manufacturer, Tesla, Ford, GM, Toyota, EV, electric vehicle |

PR Tactics

Announcement Types

  • Product Launch - New product or service announcement
  • Feature Update - Updates to existing products
  • Partnership Announcement - Strategic partnerships, collaborations
  • Acquisition Announcement - Mergers and acquisitions activity
  • Funding Announcement - Investment rounds, fundraising
  • IPO Announcement - Going public
  • Executive Hire - Leadership changes
  • Earnings Release - Financial results

Crisis Communications

  • Crisis Response - Response to negative events
  • Apology Statement - Public apologies
  • Recall Announcement - Product recalls

Thought Leadership

  • Research Report - Studies, surveys, industry reports
  • Expert Commentary - Op-eds, expert opinions

Corporate Communications

  • ESG Announcement - Sustainability, social responsibility
  • Sponsored Content - Paid/partner content

Tactic Keyword Patterns

| Label | Patterns |
|-------|----------|
| Product Launch | launches new, unveils, introduces, announces new, now available, coming soon |
| Funding Announcement | raises dollar, secures funding, closes round, Series A/B/C/D, seed round, funding round |
| Acquisition | acquires, to acquire, acquired by, acquisition, merger, deal valued at |
| Crisis Response | apologizes, regrets, takes responsibility, in response to, addressing concerns |
| Earnings Release | quarterly results, earnings report, revenue of, profit of, financial results |

Risk Signals

Financial Risk

  • Revenue Decline - Declining sales or revenue
  • Profit Warning - Earnings below expectations
  • Debt Concerns - Leverage, credit issues
  • Bankruptcy Risk - Insolvency, Chapter 11

Regulatory Risk

  • Investigation - Government probes, inquiries
  • Lawsuit - Legal actions, class actions
  • Regulatory Action - Fines, penalties, enforcement
  • Antitrust - Competition concerns
  • Privacy Violation - GDPR, data protection issues

Operational Risk

  • Security Breach - Cyber attacks, data breaches
  • Product Recall - Safety issues, defects
  • Service Outage - System failures, downtime
  • Layoffs - Workforce reductions

Reputation Risk

  • Executive Misconduct - Leadership scandals
  • Public Backlash - Boycotts, negative reactions
  • Whistleblower - Internal complaints, leaks

Strategic Risk

  • Market Share Loss - Competitive pressure
  • Failed Launch - Product failures
  • Executive Departure - Key person leaving

Risk Signal Patterns

| Label | Severity | Patterns |
|-------|----------|----------|
| Investigation | High | under investigation, SEC probe, DOJ inquiry, FTC investigation, subpoena |
| Lawsuit | Medium | lawsuit, legal action, sued, class action, litigation, plaintiff |
| Security Breach | High | data breach, security breach, hack, hacked, ransomware, compromised |
| Layoffs | Low | layoffs, job cuts, workforce reduction, downsizing, eliminating positions |
| Bankruptcy Risk | High | bankruptcy, Chapter 11, insolvency, creditor, restructuring |
| Product Recall | High | recall, recalled, safety concern, defect, malfunction |
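
When an article triggers several risk signals, the severity column gives a natural way to rank them. The following is a minimal sketch, assuming the numeric ranking Low < Medium < High and whole-word, case-insensitive matching. SEVERITY_RANK, RISK_SIGNALS, detect_risks, and overall_severity are hypothetical names, and only three rows of the table are included.

```python
import re

# Assumed ordering of the severity levels used in the table.
SEVERITY_RANK = {"Low": 1, "Medium": 2, "High": 3}

# Three rows from the risk signal table: label -> (severity, patterns).
RISK_SIGNALS = {
    "Investigation": ("High", ["under investigation", "SEC probe", "subpoena"]),
    "Lawsuit": ("Medium", ["lawsuit", "sued", "class action"]),
    "Layoffs": ("Low", ["layoffs", "job cuts", "downsizing"]),
}


def detect_risks(text):
    """Return (label, severity) pairs for matched signals, most severe first."""
    matches = []
    for label, (severity, patterns) in RISK_SIGNALS.items():
        # \b keeps "sued" from matching inside "issued".
        if any(re.search(r"\b" + re.escape(p) + r"\b", text, re.IGNORECASE) for p in patterns):
            matches.append((label, severity))
    matches.sort(key=lambda m: SEVERITY_RANK[m[1]], reverse=True)
    return matches


def overall_severity(text):
    """Return the highest severity among matched signals, or None if nothing matched."""
    matches = detect_risks(text)
    return matches[0][1] if matches else None
```

Taking the maximum severity is one possible aggregation; a production system might instead keep every matched signal and weigh them separately.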

Seed SQL

-- TOPICS
INSERT INTO taxonomy_labels (id, category, label, parent_id, description, keyword_patterns) VALUES
-- Primary topics
('topic-business', 'topic', 'Business', NULL, 'Business and corporate news', '["business","corporate","company"]'),
('topic-technology', 'topic', 'Technology', NULL, 'Technology and innovation', '["technology","tech","digital"]'),
('topic-politics', 'topic', 'Politics', NULL, 'Political news', '["politics","government","election"]'),
('topic-science', 'topic', 'Science', NULL, 'Scientific news', '["science","research","study"]'),
('topic-health', 'topic', 'Health', NULL, 'Health news', '["health","medical","healthcare"]'),
('topic-entertainment', 'topic', 'Entertainment', NULL, 'Entertainment news', '["entertainment","movie","music"]'),
('topic-sports', 'topic', 'Sports', NULL, 'Sports news', '["sports","game","championship"]'),
('topic-world', 'topic', 'World', NULL, 'International news', '["international","global","world"]'),
('topic-crime', 'topic', 'Crime', NULL, 'Crime news', '["crime","arrest","police"]'),
('topic-legal', 'topic', 'Legal', NULL, 'Legal news', '["legal","court","lawsuit"]'),
-- Technology subtopics
('topic-ai', 'topic', 'AI and Machine Learning', 'topic-technology', 'AI news', '["AI","machine learning","GPT","LLM"]'),
('topic-crypto', 'topic', 'Crypto and Blockchain', 'topic-technology', 'Crypto news', '["bitcoin","crypto","blockchain"]'),
('topic-cybersecurity', 'topic', 'Cybersecurity', 'topic-technology', 'Security news', '["cybersecurity","hack","breach"]'),
('topic-software', 'topic', 'Software', 'topic-technology', 'Software news', '["software","app","SaaS"]');

-- INDUSTRIES
INSERT INTO taxonomy_labels (id, category, label, parent_id, description, keyword_patterns) VALUES
('ind-tech', 'industry', 'Technology', NULL, 'Tech sector', '["tech company","software company"]'),
('ind-finance', 'industry', 'Financial Services', NULL, 'Finance sector', '["bank","financial services"]'),
('ind-healthcare', 'industry', 'Healthcare', NULL, 'Healthcare sector', '["healthcare","hospital","pharma"]'),
('ind-consumer', 'industry', 'Consumer', NULL, 'Consumer sector', '["retail","consumer goods"]'),
('ind-energy', 'industry', 'Energy', NULL, 'Energy sector', '["oil","gas","energy"]'),
('ind-software', 'industry', 'Software and SaaS', 'ind-tech', 'Software companies', '["SaaS","enterprise software"]'),
('ind-semiconductor', 'industry', 'Semiconductors', 'ind-tech', 'Chip makers', '["semiconductor","chip","NVIDIA"]'),
('ind-pharma', 'industry', 'Pharmaceuticals', 'ind-healthcare', 'Drug makers', '["pharma","drug maker","FDA"]'),
('ind-fintech', 'industry', 'Fintech', 'ind-finance', 'Financial technology', '["fintech","payment","Stripe"]');

-- TACTICS
INSERT INTO taxonomy_labels (id, category, label, description, keyword_patterns) VALUES
('tactic-launch', 'tactic', 'Product Launch', 'New product announcement', '["launches","unveils","introduces"]'),
('tactic-funding', 'tactic', 'Funding Announcement', 'Investment news', '["raises","funding round","Series"]'),
('tactic-acquisition', 'tactic', 'Acquisition', 'M and A news', '["acquires","merger","acquisition"]'),
('tactic-partnership', 'tactic', 'Partnership', 'Partnership news', '["partners with","partnership"]'),
('tactic-earnings', 'tactic', 'Earnings Release', 'Financial results', '["earnings","quarterly results"]'),
('tactic-crisis', 'tactic', 'Crisis Response', 'Crisis communication', '["apologizes","response to"]'),
('tactic-esg', 'tactic', 'ESG Announcement', 'Sustainability news', '["sustainability","ESG","carbon"]');

-- RISKS
INSERT INTO taxonomy_labels (id, category, label, description, keyword_patterns) VALUES
('risk-investigation', 'risk', 'Investigation', 'Government probe', '["investigation","probe","subpoena"]'),
('risk-lawsuit', 'risk', 'Lawsuit', 'Legal action', '["lawsuit","sued","litigation"]'),
('risk-breach', 'risk', 'Security Breach', 'Cyber incident', '["breach","hack","ransomware"]'),
('risk-recall', 'risk', 'Product Recall', 'Safety issue', '["recall","safety","defect"]'),
('risk-layoffs', 'risk', 'Layoffs', 'Workforce reduction', '["layoffs","job cuts","downsizing"]'),
('risk-bankruptcy', 'risk', 'Bankruptcy Risk', 'Insolvency', '["bankruptcy","Chapter 11"]'),
('risk-decline', 'risk', 'Revenue Decline', 'Financial decline', '["revenue decline","profit warning"]'),
('risk-departure', 'risk', 'Executive Departure', 'Leadership exit', '["CEO resigns","steps down"]');
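
The seed statements assume a taxonomy_labels table whose definition is not shown on this page. The following is a minimal sketch of loading a few of the rows above and reading back a parent's children, assuming an SQLite database, a self-referencing parent_id column, and keyword_patterns stored as JSON text. The column types and the children_of helper are assumptions, not the actual schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema; only the column names come from the seed statements.
conn.execute("""
    CREATE TABLE taxonomy_labels (
        id TEXT PRIMARY KEY,
        category TEXT NOT NULL,
        label TEXT NOT NULL,
        parent_id TEXT REFERENCES taxonomy_labels(id),
        description TEXT,
        keyword_patterns TEXT  -- JSON array stored as text
    )
""")

# Three rows copied from the TOPICS seed data.
rows = [
    ("topic-technology", "topic", "Technology", None,
     "Technology and innovation", '["technology","tech","digital"]'),
    ("topic-ai", "topic", "AI and Machine Learning", "topic-technology",
     "AI news", '["AI","machine learning","GPT","LLM"]'),
    ("topic-crypto", "topic", "Crypto and Blockchain", "topic-technology",
     "Crypto news", '["bitcoin","crypto","blockchain"]'),
]
conn.executemany("INSERT INTO taxonomy_labels VALUES (?, ?, ?, ?, ?, ?)", rows)


def children_of(parent_id):
    """Map each child label of parent_id to its decoded keyword list."""
    cur = conn.execute(
        "SELECT label, keyword_patterns FROM taxonomy_labels "
        "WHERE parent_id = ? ORDER BY id",
        (parent_id,),
    )
    return {label: json.loads(patterns) for label, patterns in cur}
```

Calling children_of("topic-technology") returns the two subtopics with their keyword lists, while a leaf such as "topic-ai" returns an empty dict.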

Classification Logic

Rule Priority

  1. Keyword Rules (fastest, free) - Direct pattern matching
  2. Vector Similarity (fast, free) - Compare to taxonomy centroids
  3. LLM Classification (slow, costly) - For ambiguous cases
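
Step 2 compares an article to taxonomy centroids. The following is a minimal sketch of that comparison, assuming each centroid is the mean of example embeddings for a label and that cosine similarity serves as the confidence score. Neither assumption is confirmed by this page, the function names are hypothetical, and real embeddings would have far more than the three dimensions used in the usage note.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_label(embedding, centroids):
    """Return (label, similarity) for the centroid closest to the embedding."""
    label = max(centroids, key=lambda name: cosine(embedding, centroids[name]))
    return label, cosine(embedding, centroids[label])
```

For example, with toy centroids built as {"Technology": centroid([[1, 0, 0], [0.8, 0.2, 0]]), "Sports": centroid([[0, 1, 0], [0, 0.9, 0.1]])}, an embedding of [0.9, 0.1, 0.0] is assigned to Technology.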

Confidence Thresholds

| Method | Confidence | Action |
|--------|------------|--------|
| Rules | 0.7+ | Accept |
| Vector | 0.8+ | Accept |
| Vector | 0.5 to below 0.8 | Use LLM |
| LLM | 0.7+ | Accept |
| Any | Below 0.5 | Flag for review |
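
The rule priority and thresholds can be combined into a single routing function. The following is a minimal sketch, assuming each stage is a callable that returns a (label, confidence) pair, with the rules stage returning None when nothing matches. Two behaviors are assumptions because the table does not cover them: a rules result below 0.7 falls through to the vector stage, and an LLM result below 0.7 is flagged for review. The classify function and its parameters are hypothetical.

```python
def classify(text, rules, vector, llm):
    """Route text through the three stages and return (label, method, action)."""
    # Stage 1: keyword rules, accepted at 0.7 or higher.
    hit = rules(text)
    if hit is not None and hit[1] >= 0.7:
        return hit[0], "rules", "accept"

    # Stage 2: vector similarity, accepted at 0.8 or higher.
    label, score = vector(text)
    if score >= 0.8:
        return label, "vector", "accept"

    # Stage 3: the LLM is consulted only for the ambiguous 0.5 to 0.8 band.
    if score >= 0.5:
        llm_label, llm_score = llm(text)
        if llm_score >= 0.7:
            return llm_label, "llm", "accept"
        return llm_label, "llm", "review"  # assumption: low LLM confidence goes to review

    # Below 0.5: flag for human review.
    return label, "vector", "review"
```

Passing the stages as callables keeps the routing logic independent of how rules, embeddings, or the LLM call are implemented, so each can be stubbed out in tests.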

Last updated: 2024-01-15