Taxonomy Seed Data
Initial classification labels for topics, industries, tactics, and risk signals
Overview
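The taxonomy is a hierarchical classification system for news articles. Every label belongs to one of four categories, and a label can point to a parent label to form the hierarchy: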
| Category | Purpose | Example |
|---|---|---|
| topic | What the article is about | Technology, Politics, Sports |
| industry | Business sector relevance | Healthcare, Finance, Retail |
| tactic | PR/communication patterns | Product Launch, Crisis Response |
| risk | Potential concerns | Regulatory, Reputation, Financial |
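For illustration, one row of the `taxonomy_labels` table could be modeled in application code as shown below. The column names come from the seed SQL in this document. The `TaxonomyLabel` class, the `from_row` helper, and the decision to reject unknown categories are assumptions; the document does not define them.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple

# The four categories from the overview table.
CATEGORIES = frozenset({"topic", "industry", "tactic", "risk"})


@dataclass(frozen=True)
class TaxonomyLabel:
    """Hypothetical in-memory model of one taxonomy_labels row."""
    id: str
    category: str
    label: str
    parent_id: Optional[str]
    description: str
    keyword_patterns: Tuple[str, ...]

    @classmethod
    def from_row(cls, row: tuple) -> "TaxonomyLabel":
        label_id, category, label, parent_id, description, patterns_json = row
        # Reject anything outside the four documented categories.
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        # keyword_patterns is stored as a JSON array string, e.g. '["AI","LLM"]'.
        patterns = tuple(json.loads(patterns_json or "[]"))
        return cls(label_id, category, label, parent_id, description, patterns)


label = TaxonomyLabel.from_row(
    ("topic-ai", "topic", "AI and Machine Learning", "topic-technology",
     "AI news", '["AI","machine learning","GPT","LLM"]')
)
print(label.keyword_patterns)  # ('AI', 'machine learning', 'GPT', 'LLM')
```

Storing the parsed patterns as a tuple keeps the frozen dataclass hashable, so labels can be used as set members or dictionary keys.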
Topics Hierarchy
Level 1: Primary Topics (Level 2 subtopics listed after each)
- Business - Economy, Markets, Corporate, Startups, Real Estate
- Technology - AI/ML, Software, Hardware, Cybersecurity, Crypto, Social Media
- Politics - US Politics, International Relations, Policy, Elections
- Science - Research, Space, Environment, Climate
- Health - Public Health, Medical Research, Mental Health
- Entertainment - Movies/TV, Music, Gaming, Celebrity
- Sports - Football, Basketball, Baseball, Soccer, Esports
- World - Europe, Asia, Middle East, Americas
- Crime - Violent Crime, White Collar, Cybercrime
- Legal - Court Cases, Legislation, Regulatory
- Opinion - Editorial, Analysis, Commentary
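Each subtopic points to its primary topic through `parent_id` in the seed SQL, so assigning a subtopic also implies its primary topic. Below is a minimal sketch that resolves the full path. It assumes the parent links have been loaded into a plain dictionary. `ancestors` is a hypothetical helper, and the cycle guard is a defensive assumption.

```python
from typing import Dict, List, Optional

# Parent links copied from the seed SQL (subset).
PARENTS: Dict[str, Optional[str]] = {
    "topic-technology": None,
    "topic-business": None,
    "topic-ai": "topic-technology",
    "topic-crypto": "topic-technology",
    "topic-cybersecurity": "topic-technology",
    "topic-software": "topic-technology",
}


def ancestors(label_id: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the path from the root topic down to label_id."""
    path: List[str] = []
    seen = set()
    current: Optional[str] = label_id
    while current is not None:
        if current in seen:  # guard against accidental cycles in the data
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        path.append(current)
        current = parents.get(current)  # unknown ids are treated as roots
    return list(reversed(path))


print(ancestors("topic-ai", PARENTS))  # ['topic-technology', 'topic-ai']
```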
Topic Keyword Patterns
| Label | Patterns |
|---|---|
| AI and Machine Learning | artificial intelligence, AI, machine learning, ML, deep learning, neural network, GPT, LLM, ChatGPT, OpenAI, Anthropic |
| Crypto and Blockchain | bitcoin, ethereum, cryptocurrency, crypto, blockchain, NFT, DeFi, Web3 |
| Climate | climate change, global warming, carbon, emissions, renewable energy, sustainability, net zero |
| US Politics | Congress, Senate, White House, Capitol, Democrat, Republican, GOP, Biden, Trump |
| Cybersecurity | cybersecurity, security breach, hack, hacked, ransomware, malware, phishing, data breach |
| Space | NASA, SpaceX, rocket, satellite, astronaut, Mars, moon landing, space exploration |
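Short patterns such as `AI`, `ML`, and `GOP` would match inside unrelated words ("said", "HTML") under naive substring search. One way to apply the patterns above is to use word-boundary regular expressions and to match all-uppercase acronyms case-sensitively, so that "500 ml" does not trigger `ML`. This is a sketch under those assumptions. `compile_patterns` and `match_labels` are hypothetical names.

```python
import re
from typing import Dict, List

TOPIC_PATTERNS: Dict[str, List[str]] = {
    "AI and Machine Learning": ["artificial intelligence", "AI", "machine learning", "ML", "LLM", "OpenAI"],
    "Space": ["NASA", "SpaceX", "rocket", "satellite", "astronaut"],
}


def compile_patterns(patterns: List[str]) -> list:
    compiled = []
    for p in patterns:
        # All-caps acronyms (AI, ML, NASA) must match exactly; other terms ignore case.
        flags = 0 if p.isupper() else re.IGNORECASE
        # \b keeps "AI" from matching inside "said" and "ML" inside "HTML".
        compiled.append(re.compile(r"\b" + re.escape(p) + r"\b", flags))
    return compiled


COMPILED = {label: compile_patterns(ps) for label, ps in TOPIC_PATTERNS.items()}


def match_labels(text: str) -> List[str]:
    """Return every label with at least one matching pattern, in table order."""
    return [label for label, regexes in COMPILED.items()
            if any(r.search(text) for r in regexes)]


print(match_labels("OpenAI said its new LLM is ready"))  # ['AI and Machine Learning']
print(match_labels("She said to add 500 ml of water"))   # []
```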
Industries Hierarchy
Technology Sector
- Software and SaaS
- Semiconductors
- Internet and Social Media
- Cloud Computing
- AI and Data
- Consumer Electronics
Financial Services
- Banking
- Insurance
- Investment Management
- Fintech
- Private Equity
- Cryptocurrency
Healthcare
- Pharmaceuticals
- Biotechnology
- Medical Devices
- Healthcare Providers
- Digital Health
Consumer
- Retail
- E-commerce
- Consumer Goods
- Food and Beverage
- Restaurants and Hospitality
Energy
- Oil and Gas
- Utilities
- Renewable Energy
- Mining
Industrial
- Manufacturing
- Aerospace and Defense
- Automotive
- Construction
- Logistics
Media and Entertainment
- Film and Television
- Music
- Publishing
- Gaming
- Streaming
Other Sectors
- Telecommunications
- Real Estate
- Professional Services
- Education
- Government
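The industries list has two levels: a sector and its subsectors. If a classifier scores subsectors, one plausible way to derive sector scores is to give each sector the highest score among its children. This max roll-up is an assumption for illustration and is not a documented rule. The `roll_up` function and the sample scores are hypothetical, and the code assumes the hierarchy is acyclic.

```python
from typing import Dict, Optional

# Subsector -> sector links, using ids from the seed SQL.
PARENT: Dict[str, Optional[str]] = {
    "ind-tech": None,
    "ind-finance": None,
    "ind-healthcare": None,
    "ind-software": "ind-tech",
    "ind-semiconductor": "ind-tech",
    "ind-pharma": "ind-healthcare",
    "ind-fintech": "ind-finance",
}


def roll_up(scores: Dict[str, float], parent: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Propagate each score to every ancestor, keeping the maximum seen."""
    result = dict(scores)
    for label_id, score in scores.items():
        current = parent.get(label_id)
        while current is not None:
            result[current] = max(result.get(current, 0.0), score)
            current = parent.get(current)
    return result


print(roll_up({"ind-semiconductor": 0.9, "ind-software": 0.4}, PARENT))
# {'ind-semiconductor': 0.9, 'ind-software': 0.4, 'ind-tech': 0.9}
```

Taking the maximum means one strong subsector signal is enough to tag the sector. Averaging would be an alternative, but it would dilute a clear match whenever weaker sibling matches are also present.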
Industry Keyword Patterns
| Label | Patterns |
|---|---|
| Software and SaaS | SaaS, software as a service, enterprise software, Salesforce, Microsoft, Oracle, SAP |
| Pharmaceuticals | pharma, pharmaceutical, drug maker, FDA approval, clinical trial, Pfizer, Merck, J&J |
| Semiconductors | semiconductor, chip, processor, NVIDIA, Intel, AMD, TSMC, chip shortage |
| Fintech | fintech, payment, digital payment, Stripe, Square, PayPal, mobile banking |
| Automotive | automotive, automaker, car manufacturer, Tesla, Ford, GM, Toyota, EV, electric vehicle |
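Several industry patterns are ambiguous on their own (`chip`, `payment`, `EV`), whereas company names are strong signals. One simple heuristic, assumed here and not specified by this document, is to let confidence grow with the number of distinct patterns that match. Under this heuristic, a single ambiguous hit scores 0.5 and stays below the 0.7 rules threshold listed under Confidence Thresholds. The formula `1 - 0.5 ** hits` and the `rule_confidence` name are illustrative only.

```python
import re
from typing import List


def rule_confidence(text: str, patterns: List[str]) -> float:
    """Hypothetical score: each distinct matching pattern halves the remaining doubt."""
    hits = 0
    for p in patterns:
        # Same matching convention as the topic sketch: word boundaries,
        # case-sensitive for all-caps acronyms, case-insensitive otherwise.
        flags = 0 if p.isupper() else re.IGNORECASE
        if re.search(r"\b" + re.escape(p) + r"\b", text, flags):
            hits += 1
    return 1.0 - 0.5 ** hits


SEMICONDUCTORS = ["semiconductor", "chip", "processor", "NVIDIA", "Intel", "AMD", "TSMC", "chip shortage"]

print(rule_confidence("A new chip for phones", SEMICONDUCTORS))              # 0.5
print(rule_confidence("NVIDIA and TSMC expand chip output", SEMICONDUCTORS))  # 0.875
```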
PR Tactics
Announcement Types
- Product Launch - New product or service announcement
- Feature Update - Updates to existing products
- Partnership Announcement - Strategic partnerships, collaborations
- Acquisition Announcement - M&A activity
- Funding Announcement - Investment rounds, fundraising
- IPO Announcement - Going public
- Executive Hire - Leadership changes
- Earnings Release - Financial results
Crisis Communications
- Crisis Response - Response to negative events
- Apology Statement - Public apologies
- Recall Announcement - Product recalls
Thought Leadership
- Research Report - Studies, surveys, industry reports
- Expert Commentary - Op-eds, expert opinions
Corporate Communications
- ESG Announcement - Sustainability, social responsibility
- Sponsored Content - Paid/partner content
Tactic Keyword Patterns
| Label | Patterns |
|---|---|
| Product Launch | launches new, unveils, introduces, announces new, now available, coming soon |
| Funding Announcement | raises $, secures funding, closes round, Series A/B/C/D, seed round, funding round |
| Acquisition | acquires, to acquire, acquired by, acquisition, merger, deal valued at |
| Crisis Response | apologizes, regrets, takes responsibility, in response to, addressing concerns |
| Earnings Release | quarterly results, earnings report, revenue of, profit of, financial results |
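Patterns such as `Series A/B/C/D` are compact notations rather than literal strings, and funding headlines usually pair them with a dollar amount. The sketch below expands both into regular expressions that also capture the amount and the round letter. The regexes, the `parse_funding` name, and the supported amount formats (`$25 million`, `$300M`) are assumptions.

```python
import re
from typing import Optional, Tuple

# Matches "raises $25 million", "raised $1.2B", "raise $300M".
AMOUNT = re.compile(
    r"\braise[sd]?\s+\$(\d+(?:\.\d+)?)\s*(million|billion|[MB])\b",
    re.IGNORECASE,
)
# Expands the "Series A/B/C/D" notation into a single character class.
SERIES = re.compile(r"\bSeries\s+([A-D])\b")

MULTIPLIER = {"million": 1e6, "m": 1e6, "billion": 1e9, "b": 1e9}


def parse_funding(text: str) -> Tuple[Optional[float], Optional[str]]:
    """Return (amount in dollars, series letter); either may be None."""
    amount = None
    m = AMOUNT.search(text)
    if m:
        amount = float(m.group(1)) * MULTIPLIER[m.group(2).lower()]
    s = SERIES.search(text)
    return amount, (s.group(1) if s else None)


print(parse_funding("Acme raises $25 million in Series B round"))  # (25000000.0, 'B')
```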
Risk Signals
Financial Risk
- Revenue Decline - Declining sales or revenue
- Profit Warning - Earnings below expectations
- Debt Concerns - Leverage, credit issues
- Bankruptcy Risk - Insolvency, Chapter 11
Regulatory Risk
- Investigation - Government probes, inquiries
- Lawsuit - Legal actions, class actions
- Regulatory Action - Fines, penalties, enforcement
- Antitrust - Competition concerns
- Privacy Violation - GDPR, data protection issues
Operational Risk
- Security Breach - Cyber attacks, data breaches
- Product Recall - Safety issues, defects
- Service Outage - System failures, downtime
- Layoffs - Workforce reductions
Reputation Risk
- Executive Misconduct - Leadership scandals
- Public Backlash - Boycotts, negative reactions
- Whistleblower - Internal complaints, leaks
Strategic Risk
- Market Share Loss - Competitive pressure
- Failed Launch - Product failures
- Executive Departure - Key person leaving
Risk Signal Patterns
| Label | Severity | Patterns |
|---|---|---|
| Investigation | High | under investigation, SEC probe, DOJ inquiry, FTC investigation, subpoena |
| Lawsuit | Medium | lawsuit, legal action, sued, class action, litigation, plaintiff |
| Security Breach | High | data breach, security breach, hack, hacked, ransomware, compromised |
| Layoffs | Low | layoffs, job cuts, workforce reduction, downsizing, eliminating positions |
| Bankruptcy Risk | High | bankruptcy, Chapter 11, insolvency, creditor, restructuring |
| Product Recall | High | recall, recalled, safety concern, defect, malfunction |
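The Severity column suggests a simple summary for an article that trips several risk signals: report the most severe one. Below is a minimal sketch. It assumes the three levels are strictly ordered Low < Medium < High and that severity lives in application code, since the seed SQL has no severity column. `article_severity` is a hypothetical name.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Severity per risk label, taken from the Risk Signal Patterns table (subset).
RISK_SEVERITY = {
    "Investigation": Severity.HIGH,
    "Lawsuit": Severity.MEDIUM,
    "Layoffs": Severity.LOW,
    "Product Recall": Severity.HIGH,
}


def article_severity(matched: Iterable[str]) -> Optional[Severity]:
    """Highest severity among matched risk labels, or None if none are known."""
    levels = [RISK_SEVERITY[m] for m in matched if m in RISK_SEVERITY]
    return max(levels) if levels else None


print(article_severity(["Layoffs", "Lawsuit"]).name)  # MEDIUM
```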
Seed SQL
```sql
-- TOPICS
INSERT INTO taxonomy_labels (id, category, label, parent_id, description, keyword_patterns) VALUES
-- Primary topics
('topic-business', 'topic', 'Business', NULL, 'Business and corporate news', '["business","corporate","company"]'),
('topic-technology', 'topic', 'Technology', NULL, 'Technology and innovation', '["technology","tech","digital"]'),
('topic-politics', 'topic', 'Politics', NULL, 'Political news', '["politics","government","election"]'),
('topic-science', 'topic', 'Science', NULL, 'Scientific news', '["science","research","study"]'),
('topic-health', 'topic', 'Health', NULL, 'Health news', '["health","medical","healthcare"]'),
('topic-entertainment', 'topic', 'Entertainment', NULL, 'Entertainment news', '["entertainment","movie","music"]'),
('topic-sports', 'topic', 'Sports', NULL, 'Sports news', '["sports","game","championship"]'),
('topic-world', 'topic', 'World', NULL, 'International news', '["international","global","world"]'),
('topic-crime', 'topic', 'Crime', NULL, 'Crime news', '["crime","arrest","police"]'),
('topic-legal', 'topic', 'Legal', NULL, 'Legal news', '["legal","court","lawsuit"]'),
-- Technology subtopics
('topic-ai', 'topic', 'AI and Machine Learning', 'topic-technology', 'AI news', '["AI","machine learning","GPT","LLM"]'),
('topic-crypto', 'topic', 'Crypto and Blockchain', 'topic-technology', 'Crypto news', '["bitcoin","crypto","blockchain"]'),
('topic-cybersecurity', 'topic', 'Cybersecurity', 'topic-technology', 'Security news', '["cybersecurity","hack","breach"]'),
('topic-software', 'topic', 'Software', 'topic-technology', 'Software news', '["software","app","SaaS"]');
-- INDUSTRIES
INSERT INTO taxonomy_labels (id, category, label, parent_id, description, keyword_patterns) VALUES
('ind-tech', 'industry', 'Technology', NULL, 'Tech sector', '["tech company","software company"]'),
('ind-finance', 'industry', 'Financial Services', NULL, 'Finance sector', '["bank","financial services"]'),
('ind-healthcare', 'industry', 'Healthcare', NULL, 'Healthcare sector', '["healthcare","hospital","pharma"]'),
('ind-consumer', 'industry', 'Consumer', NULL, 'Consumer sector', '["retail","consumer goods"]'),
('ind-energy', 'industry', 'Energy', NULL, 'Energy sector', '["oil","gas","energy"]'),
('ind-software', 'industry', 'Software and SaaS', 'ind-tech', 'Software companies', '["SaaS","enterprise software"]'),
('ind-semiconductor', 'industry', 'Semiconductors', 'ind-tech', 'Chip makers', '["semiconductor","chip","NVIDIA"]'),
('ind-pharma', 'industry', 'Pharmaceuticals', 'ind-healthcare', 'Drug makers', '["pharma","drug maker","FDA"]'),
('ind-fintech', 'industry', 'Fintech', 'ind-finance', 'Financial technology', '["fintech","payment","Stripe"]');
-- TACTICS
INSERT INTO taxonomy_labels (id, category, label, description, keyword_patterns) VALUES
('tactic-launch', 'tactic', 'Product Launch', 'New product announcement', '["launches","unveils","introduces"]'),
('tactic-funding', 'tactic', 'Funding Announcement', 'Investment news', '["raises","funding round","Series"]'),
('tactic-acquisition', 'tactic', 'Acquisition', 'M&A news', '["acquires","merger","acquisition"]'),
('tactic-partnership', 'tactic', 'Partnership', 'Partnership news', '["partners with","partnership"]'),
('tactic-earnings', 'tactic', 'Earnings Release', 'Financial results', '["earnings","quarterly results"]'),
('tactic-crisis', 'tactic', 'Crisis Response', 'Crisis communication', '["apologizes","response to"]'),
('tactic-esg', 'tactic', 'ESG Announcement', 'Sustainability news', '["sustainability","ESG","carbon"]');
-- RISKS
INSERT INTO taxonomy_labels (id, category, label, description, keyword_patterns) VALUES
('risk-investigation', 'risk', 'Investigation', 'Government probe', '["investigation","probe","subpoena"]'),
('risk-lawsuit', 'risk', 'Lawsuit', 'Legal action', '["lawsuit","sued","litigation"]'),
('risk-breach', 'risk', 'Security Breach', 'Cyber incident', '["breach","hack","ransomware"]'),
('risk-recall', 'risk', 'Product Recall', 'Safety issue', '["recall","safety","defect"]'),
('risk-layoffs', 'risk', 'Layoffs', 'Workforce reduction', '["layoffs","job cuts","downsizing"]'),
('risk-bankruptcy', 'risk', 'Bankruptcy Risk', 'Insolvency', '["bankruptcy","Chapter 11"]'),
('risk-decline', 'risk', 'Revenue Decline', 'Financial decline', '["revenue decline","profit warning"]'),
('risk-departure', 'risk', 'Executive Departure', 'Leadership exit', '["CEO resigns","steps down"]');
```
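The seed statements assume that a `taxonomy_labels` table already exists. The sketch below loads a subset of the seed rows into an in-memory SQLite database using Python's standard `sqlite3` module. It then lists every descendant of a label with a recursive CTE. The `CREATE TABLE` definition is an assumption, because the document does not show the real schema. This includes the column types, the `CHECK` constraint on `category`, and the self-referencing foreign key on `parent_id`.

```python
import sqlite3

# Hypothetical schema matching the columns used by the seed INSERTs.
SCHEMA = """
CREATE TABLE taxonomy_labels (
    id               TEXT PRIMARY KEY,
    category         TEXT NOT NULL CHECK (category IN ('topic','industry','tactic','risk')),
    label            TEXT NOT NULL,
    parent_id        TEXT REFERENCES taxonomy_labels(id),
    description      TEXT,
    keyword_patterns TEXT NOT NULL DEFAULT '[]'
);
"""

# A subset of the seed rows above; the tactic INSERT omits parent_id, which stays NULL.
SEED = """
INSERT INTO taxonomy_labels (id, category, label, parent_id, description, keyword_patterns) VALUES
('topic-technology', 'topic', 'Technology', NULL, 'Technology and innovation', '["technology","tech","digital"]'),
('topic-ai', 'topic', 'AI and Machine Learning', 'topic-technology', 'AI news', '["AI","machine learning","GPT","LLM"]'),
('topic-crypto', 'topic', 'Crypto and Blockchain', 'topic-technology', 'Crypto news', '["bitcoin","crypto","blockchain"]');
INSERT INTO taxonomy_labels (id, category, label, description, keyword_patterns) VALUES
('tactic-launch', 'tactic', 'Product Launch', 'New product announcement', '["launches","unveils","introduces"]');
"""

# Walk down the hierarchy from a starting label, at any depth.
DESCENDANTS = """
WITH RECURSIVE tree(id) AS (
    SELECT id FROM taxonomy_labels WHERE parent_id = ?
    UNION ALL
    SELECT t.id FROM taxonomy_labels t JOIN tree ON t.parent_id = tree.id
)
SELECT id FROM tree ORDER BY id;
"""


def load(conn: sqlite3.Connection) -> None:
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA + SEED)


conn = sqlite3.connect(":memory:")
load(conn)
children = [row[0] for row in conn.execute(DESCENDANTS, ("topic-technology",))]
print(children)  # ['topic-ai', 'topic-crypto']
```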
Classification Logic
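Rule Priority
Methods are applied in order, cheapest first:
- Keyword Rules (fastest, free) - Direct matching against each label's keyword patterns
- Vector Similarity (fast, free) - Compare the article embedding with each label's centroid
- LLM Classification (slow, costly) - Reserved for ambiguous cases the first two methods cannot settle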
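The vector stage compares an article embedding with one centroid per label. The following minimal sketch uses cosine similarity in plain Python. It assumes that each centroid is the mean of example-article embeddings for that label and that the cosine score serves directly as the confidence value. The toy 3-dimensional vectors and the `nearest_label` name are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of example embeddings for one label."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest_label(embedding: List[float], centroids: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the label whose centroid is most similar, with that similarity."""
    return max(((label, cosine(embedding, c)) for label, c in centroids.items()),
               key=lambda pair: pair[1])


centroids = {
    "topic-ai": centroid([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]),  # mean is [0.9, 0.1, 0.0]
    "topic-crypto": [0.0, 1.0, 0.0],
}
label, score = nearest_label([1.0, 0.0, 0.0], centroids)
print(label, round(score, 3))  # topic-ai 0.994
```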
Confidence Thresholds
| Method | Confidence | Action |
|---|---|---|
| Rules | 0.7+ | Accept |
| Vector | 0.8+ | Accept |
| Vector | 0.5 to <0.8 | Escalate to LLM |
| LLM | 0.7+ | Accept |
| Any | Below 0.5 | Flag for review |
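Read as a cascade, the priority list and the threshold table can be combined into one routing function. This sketch fills in cases that the table leaves open, and these choices are assumptions. A rule result below 0.7 falls through to the vector stage. An LLM result below 0.7 is flagged for review. "Below 0.5" applies to the vector score once rules have failed. `classify` and the `llm` callback are hypothetical; the callback stands in for whatever model call the pipeline actually makes.

```python
from typing import Callable, Optional, Tuple

RULE_ACCEPT = 0.7
VECTOR_ACCEPT = 0.8
VECTOR_ESCALATE = 0.5
LLM_ACCEPT = 0.7


def classify(rule_conf: float,
             vector_conf: float,
             llm: Callable[[], float]) -> Tuple[str, Optional[str]]:
    """Return (action, method) following the threshold table.

    `llm` is called lazily, so the costly stage runs only for the 0.5 to <0.8 vector band.
    """
    if rule_conf >= RULE_ACCEPT:
        return "accept", "rules"
    if vector_conf >= VECTOR_ACCEPT:
        return "accept", "vector"
    if vector_conf >= VECTOR_ESCALATE:
        if llm() >= LLM_ACCEPT:
            return "accept", "llm"
        return "review", None  # assumption: an LLM score below 0.7 goes to review
    return "review", None  # vector score below 0.5: flag for review


print(classify(0.3, 0.65, lambda: 0.9))  # ('accept', 'llm')
```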
Last updated: 2024-01-15