feat(app): Add wiki re-evaluation and fix wolfra bug

- Implemented a "Re-evaluate Wikipedia" button in the UI.

- Added a backend endpoint to trigger targeted Wikipedia metric extraction.

- Hardened the LLM metric extraction prompt to prevent hallucinations.

- Corrected several database path errors that caused data loss.

- Updated application version to 0.6.4 and documented the ongoing issue.
This commit is contained in:
2026-01-23 16:05:44 +00:00
parent d3ea4e340a
commit b4595ef974
7 changed files with 1427 additions and 791 deletions

View File

@@ -94,7 +94,39 @@ Wir kapseln das neue Projekt vollständig ab ("Fork & Clean").
## 7. Historie & Fixes (Jan 2026) ## 7. Historie & Fixes (Jan 2026)
* **[STABILITY] v0.7.2: Robust Metric Parsing (Jan 23, 2026)**
* **Legacy Logic Restored:** Re-implemented the robust, regex-based number parsing logic (formerly in legacy helpers) as `MetricParser`.
* **German Formats:** Correctly handles "1.000" (thousands) vs "1,5" (decimal) and mixed formats.
* **Citation Cleaning:** Filters out Wikipedia citations like `[3]` and years in parentheses (e.g. "80 (2020)" -> 80).
* **Hybrid Extraction:** The ClassificationService now asks the LLM for the *text segment* and parses the number deterministically, fixing the "1.005 -> 1" LLM hallucination.
* **[ONGOING] v0.6.4: Wolfra Metric Extraction Bug (Jan 23, 2026)**
* **Problem:** Mitarbeiterzahl für "Wolfra Bayrische Natursaft Kelterei GmbH" wird fälschlicherweise als "802020" anstatt "80" ausgelesen.
* **Implementierte Maßnahmen:**
* "Wiki-Reevaluate-Button" im Frontend integriert (POST `/api/companies/{company_id}/reevaluate-wikipedia`).
* `reevaluate_wikipedia_metric`-Funktion im `ClassificationService` erstellt.
* Prompt für `_run_llm_metric_extraction_prompt` geschärft, um LLM zur Rückgabe von `raw_text_segment` zu zwingen.
* Datenbankpfad-Konfiguration in `company-explorer/backend/config.py` mehrfach korrigiert, um `unable to open database file` Fehler zu beheben.
* Fehler in `ClassificationService._get_wikipedia_content` behoben (`wiki_data.get('text')` zu `wiki_data.get('full_text')` geändert).
* **Aktueller Status:** Problem **nicht gelöst**. Trotz der Korrekturen zeigt das System immer noch falsche Werte an, und der Datenbankzugriff war mehrfach fehlerhaft, was zu Datenverlust führte. Weitere Diagnose ist erforderlich, um die genaue LLM-Antwort und den Datenfluss im Container zu überprüfen.
* **[STABILITY] v0.7.1: AI Robustness & UI Fixes (Jan 21, 2026)**
* **SDK Stabilität:** Umstellung auf `gemini-2.0-flash` im Legacy-SDK zur Behebung von `404 Not Found` Fehlern bei `1.5-flash-latest`.
* **API-Key Management:** Implementierung eines robusten Ladevorgangs für den Google API Key (Fallback von Environment-Variable auf lokale Datei `/app/gemini_api_key.txt`).
* **Classification Prompt:** Schärfung des Prompts auf "Best-Fit"-Entscheidungen, um zu konservative "Others"-Einstufungen bei klaren Kandidaten (z.B. Thermen) zu vermeiden.
* **Frontend Rendering:** Fix eines UI-Crashs im Inspector. Metriken werden jetzt auch angezeigt, wenn nur der standardisierte Wert (Fläche) vorhanden ist. Null-Safety für `.toLocaleString()` hinzugefügt.
* **Scraping:** Wiederherstellung der Stabilität durch Entfernung fehlerhafter `trafilatura` Abhängigkeiten; Nutzung von `BeautifulSoup` als robustem Standard.
* **[MAJOR] v0.7.0: Quantitative Potential Analysis (Jan 20, 2026)** * **[MAJOR] v0.7.0: Quantitative Potential Analysis (Jan 20, 2026)**
...
...
## 11. Lessons Learned (Retrospektive Jan 21, 2026)
1. **KI statt Regex für Zahlen:** Anstatt komplexe Python-Funktionen für deutsche Zahlenformate ("1,7 Mio.") zu schreiben, ist es stabiler, das LLM anzuweisen, den Wert direkt als Integer (1700000) zu liefern.
2. **Abhängigkeiten isolieren:** Änderungen an zentralen `core_utils.py` führen schnell zu Import-Fehlern in anderen Modulen. Spezifische Logik (wie Metrik-Parsing) sollte lokal im Service bleiben.
3. **UI Null-Safety:** Quantitative Daten sind oft unvollständig (z.B. Fläche vorhanden, aber Besucherzahl nicht). Das Frontend muss robust gegen `null`-Werte in den Metrik-Feldern sein, um den Render-Prozess nicht zu unterbrechen.
4. **SDK-Versionen:** Die Google-API ist in stetigem Wandel. Der explizite Rückgriff auf stabile Modelle wie `gemini-2.0-flash` ist im Legacy-SDK sicherer als die Nutzung von `-latest` Tags.
* **Zweistufige Analyse:** * **Zweistufige Analyse:**
1. **Strict Classification:** Ordnet Firmen einer Notion-Branche zu (oder "Others"). 1. **Strict Classification:** Ordnet Firmen einer Notion-Branche zu (oder "Others").
2. **Metric Cascade:** Sucht gezielt nach der branchenspezifischen Metrik ("Scraper Search Term"). 2. **Metric Cascade:** Sucht gezielt nach der branchenspezifischen Metrik ("Scraper Search Term").

View File

@@ -58,6 +58,9 @@ class AnalysisRequest(BaseModel):
company_id: int company_id: int
force_scrape: bool = False force_scrape: bool = False
class IndustryUpdateModel(BaseModel):
industry_ai: str
# --- Events --- # --- Events ---
@app.on_event("startup") @app.on_event("startup")
def on_startup(): def on_startup():
@@ -137,6 +140,137 @@ def analyze_company(req: AnalysisRequest, background_tasks: BackgroundTasks, db:
background_tasks.add_task(run_analysis_task, company.id) background_tasks.add_task(run_analysis_task, company.id)
return {"status": "queued"} return {"status": "queued"}
@app.put("/api/companies/{company_id}/industry")
def update_company_industry(
company_id: int,
data: IndustryUpdateModel,
background_tasks: BackgroundTasks,
db: Session = Depends(get_db)
):
company = db.query(Company).filter(Company.id == company_id).first()
if not company:
raise HTTPException(404, detail="Company not found")
# 1. Update Industry
company.industry_ai = data.industry_ai
company.updated_at = datetime.utcnow()
db.commit()
# 2. Trigger Metric Re-extraction in Background
background_tasks.add_task(run_metric_reextraction_task, company.id)
return {"status": "updated", "industry_ai": company.industry_ai}
@app.post("/api/companies/{company_id}/reevaluate-wikipedia")
def reevaluate_wikipedia(company_id: int, background_tasks: BackgroundTasks, db: Session = Depends(get_db)):
company = db.query(Company).filter(Company.id == company_id).first()
if not company:
raise HTTPException(404, detail="Company not found")
background_tasks.add_task(run_wikipedia_reevaluation_task, company.id)
return {"status": "queued"}
@app.delete("/api/companies/{company_id}")
def delete_company(company_id: int, db: Session = Depends(get_db)):
company = db.query(Company).filter(Company.id == company_id).first()
if not company:
raise HTTPException(404, detail="Company not found")
# Delete related data first (Cascade might handle this but being explicit is safer)
db.query(EnrichmentData).filter(EnrichmentData.company_id == company_id).delete()
db.query(Signal).filter(Signal.company_id == company_id).delete()
db.query(Contact).filter(Contact.company_id == company_id).delete()
db.delete(company)
db.commit()
return {"status": "deleted"}
@app.post("/api/companies/{company_id}/override/website")
def override_website(company_id: int, url: str, db: Session = Depends(get_db)):
company = db.query(Company).filter(Company.id == company_id).first()
if not company:
raise HTTPException(404, detail="Company not found")
company.website = url
company.updated_at = datetime.utcnow()
db.commit()
return {"status": "updated", "website": company.website}
@app.post("/api/companies/{company_id}/override/impressum")
def override_impressum(company_id: int, url: str, background_tasks: BackgroundTasks, db: Session = Depends(get_db)):
company = db.query(Company).filter(Company.id == company_id).first()
if not company:
raise HTTPException(404, detail="Company not found")
# Create or update manual impressum lock
existing = db.query(EnrichmentData).filter(
EnrichmentData.company_id == company_id,
EnrichmentData.source_type == "impressum_override"
).first()
if not existing:
db.add(EnrichmentData(
company_id=company_id,
source_type="impressum_override",
content={"url": url},
is_locked=True
))
else:
existing.content = {"url": url}
existing.is_locked = True
db.commit()
return {"status": "updated"}
def run_wikipedia_reevaluation_task(company_id: int):
from .database import SessionLocal
db = SessionLocal()
try:
company = db.query(Company).filter(Company.id == company_id).first()
if not company: return
logger.info(f"Re-evaluating Wikipedia metric for {company.name} (Industry: {company.industry_ai})")
industry = db.query(Industry).filter(Industry.name == company.industry_ai).first()
if industry:
classifier.reevaluate_wikipedia_metric(company, db, industry)
logger.info(f"Wikipedia metric re-evaluation complete for {company.name}")
else:
logger.warning(f"Industry '{company.industry_ai}' not found for re-evaluation.")
except Exception as e:
logger.error(f"Wikipedia Re-evaluation Task Error: {e}", exc_info=True)
finally:
db.close()
def run_metric_reextraction_task(company_id: int):
from .database import SessionLocal
db = SessionLocal()
try:
company = db.query(Company).filter(Company.id == company_id).first()
if not company: return
logger.info(f"Re-extracting metrics for {company.name} (Industry: {company.industry_ai})")
industries = db.query(Industry).all()
industry = next((i for i in industries if i.name == company.industry_ai), None)
if industry:
classifier.extract_metrics_for_industry(company, db, industry)
company.status = "ENRICHED"
db.commit()
logger.info(f"Metric re-extraction complete for {company.name}")
else:
logger.warning(f"Industry '{company.industry_ai}' not found for re-extraction.")
except Exception as e:
logger.error(f"Metric Re-extraction Task Error: {e}", exc_info=True)
finally:
db.close()
def run_discovery_task(company_id: int): def run_discovery_task(company_id: int):
from .database import SessionLocal from .database import SessionLocal
db = SessionLocal() db = SessionLocal()

View File

@@ -10,10 +10,10 @@ try:
class Settings(BaseSettings): class Settings(BaseSettings):
# App Info # App Info
APP_NAME: str = "Company Explorer" APP_NAME: str = "Company Explorer"
VERSION: str = "0.7.0" VERSION: str = "0.6.4"
DEBUG: bool = True DEBUG: bool = True
# Database (Store in App dir for simplicity) # Database (FINAL CORRECT PATH for Docker Container)
DATABASE_URL: str = "sqlite:////app/companies_v3_fixed_2.db" DATABASE_URL: str = "sqlite:////app/companies_v3_fixed_2.db"
# API Keys # API Keys
@@ -32,20 +32,25 @@ try:
except ImportError: except ImportError:
# Fallback wenn pydantic-settings nicht installiert ist # Fallback wenn pydantic-settings nicht installiert ist
class Settings: class FallbackSettings:
APP_NAME = "Company Explorer" APP_NAME = "Company Explorer"
VERSION = "0.2.1" VERSION = "0.6.4"
DEBUG = True DEBUG = True
DATABASE_URL = "sqlite:////app/logs_debug/companies_debug.db" DATABASE_URL = "sqlite:////app/companies_v3_fixed_2.db" # FINAL CORRECT PATH
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY") GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
SERP_API_KEY = os.getenv("SERP_API_KEY") SERP_API_KEY = os.getenv("SERP_API_KEY")
LOG_DIR = "/app/logs_debug" LOG_DIR = "/app/logs_debug"
settings = Settings() settings = FallbackSettings()
# Ensure Log Dir # Ensure Log Dir
try:
os.makedirs(settings.LOG_DIR, exist_ok=True) os.makedirs(settings.LOG_DIR, exist_ok=True)
except FileExistsError:
pass
except Exception as e:
logging.warning(f"Could not create log directory {settings.LOG_DIR}: {e}")
# API Key Loading Helper (from file if env missing) # API Key Loading Helper (from file if env missing)
def load_api_key_from_file(filename: str) -> Optional[str]: def load_api_key_from_file(filename: str) -> Optional[str]:
@@ -54,10 +59,10 @@ def load_api_key_from_file(filename: str) -> Optional[str]:
with open(filename, 'r') as f: with open(filename, 'r') as f:
return f.read().strip() return f.read().strip()
except Exception as e: except Exception as e:
print(f"Could not load key from {filename}: {e}") # Print because logging might not be ready logging.warning(f"Could not load key from {filename}: {e}")
return None return None
# Auto-load keys if not in env # Auto-load keys assuming the app runs in the Docker container's /app context
if not settings.GEMINI_API_KEY: if not settings.GEMINI_API_KEY:
settings.GEMINI_API_KEY = load_api_key_from_file("/app/gemini_api_key.txt") settings.GEMINI_API_KEY = load_api_key_from_file("/app/gemini_api_key.txt")

View File

@@ -0,0 +1,135 @@
import re
import logging
from typing import Optional, Union
logger = logging.getLogger(__name__)
class MetricParser:
"""
Robust parser for extracting numeric values from text, specialized for
German formats and business metrics (Revenue, Employees).
Reconstructs legacy logic to handle thousands separators and year-suffixes.
"""
@staticmethod
def extract_numeric_value(text: str, is_revenue: bool = False) -> Optional[float]:
"""
Extracts a float value from a string, handling German locale and suffixes.
Args:
text: The raw text containing the number (e.g. "1.005 Mitarbeiter (2020)").
is_revenue: If True, prioritizes currency logic (e.g. handling "Mio").
Returns:
The parsed float value or None if no valid number found.
"""
if not text:
return None
# 1. Cleaning: Remove Citations [1], [note 2]
clean_text = re.sub(r'\[.*?\]', '', text)
# 2. Cleaning: Remove Year/Date in parentheses to prevent "80 (2020)" -> 802020
# Matches (2020), (Stand 2021), (31.12.2022), etc.
# We replace them with space to avoid merging numbers.
clean_text = re.sub(r'\(\s*(?:Stand\s*|ab\s*)?(?:19|20)\d{2}.*?\)', ' ', clean_text)
# 3. Identify Multipliers (Mio, Mrd)
multiplier = 1.0
lower_text = clean_text.lower().replace('.', '') # Remove dots for word matching (e.g. "Mio." -> "mio")
if any(x in lower_text for x in ['mrd', 'milliarde', 'billion']): # German Billion = 10^12? Usually in business context here Mrd=10^9
multiplier = 1_000_000_000.0
elif any(x in lower_text for x in ['mio', 'million']):
multiplier = 1_000_000.0
# 4. Extract the number candidate
# We look for the FIRST pattern that looks like a number.
# Must contain at least one digit.
# We iterate over matches to skip pure punctuation like "..."
matches = re.finditer(r'[\d\.,]+', clean_text)
for match in matches:
candidate = match.group(0)
# Check if it actually has a digit
if not re.search(r'\d', candidate):
continue
# Clean trailing/leading punctuation (e.g. "80." -> "80")
candidate = candidate.strip('.,')
if not candidate:
continue
try:
val = MetricParser._parse_german_number_string(candidate)
return val * multiplier
except Exception as e:
# If this candidate fails (e.g. "1.2.3.4"), try the next one?
# For now, let's assume the first valid-looking number sequence is the target.
# But "Wolfra ... 80" -> "..." skipped. "80" matched.
# "1.005 Mitarbeiter" -> "1.005" matched.
logger.debug(f"Failed to parse number string '{candidate}': {e}")
continue
return None
@staticmethod
def _parse_german_number_string(s: str) -> float:
"""
Parses a number string dealing with ambiguous separators.
Logic based on Lessons Learned:
- "1.005" -> 1005.0 (Dot followed by exactly 3 digits = Thousands)
- "1,5" -> 1.5 (Comma = Decimal)
- "1.234,56" -> 1234.56
"""
# Count separators
dots = s.count('.')
commas = s.count(',')
# Case 1: No separators
if dots == 0 and commas == 0:
return float(s)
# Case 2: Mixed separators (Standard German: 1.000.000,00)
if dots > 0 and commas > 0:
# Assume . is thousands, , is decimal
s = s.replace('.', '').replace(',', '.')
return float(s)
# Case 3: Only Dots
if dots > 0:
# Ambiguity: "1.005" (1005) vs "1.5" (1.5)
# Rule: If dot is followed by EXACTLY 3 digits (and it's the last dot or multiple dots), likely thousands.
# But "1.500" is 1500. "1.5" is 1.5.
# Split by dot
parts = s.split('.')
# Check if all parts AFTER the first one have exactly 3 digits
# E.g. 1.000.000 -> parts=["1", "000", "000"] -> OK -> Thousands
# 1.5 -> parts=["1", "5"] -> "5" len is 1 -> Decimal
all_segments_are_3_digits = all(len(p) == 3 for p in parts[1:])
if all_segments_are_3_digits:
# Treat as thousands separator
return float(s.replace('.', ''))
else:
# Treat as decimal (US format or simple float)
# But wait, German uses comma for decimal.
# If we are parsing strict German text, "1.5" might be invalid or actually mean 1st May?
# Usually in Wikipedia DE: "1.5 Mio" -> 1.5 Million.
# So if it's NOT 3 digits, it's likely a decimal point (US style or just typo/format variation).
# User Rule: "1.005" -> 1005.
return float(s) # Python handles 1.5 correctly
# Case 4: Only Commas
if commas > 0:
# German Decimal: "1,5" -> 1.5
# Or English Thousands: "1,000" -> 1000?
# User context is German Wikipedia ("Mitarbeiter", "Umsatz").
# Assumption: Comma is ALWAYS decimal in this context, UNLESS followed by 3 digits AND likely English?
# Safer bet for German data: Comma is decimal.
return float(s.replace(',', '.'))
return float(s)

View File

@@ -1,6 +1,7 @@
import json import json
import logging import logging
import re import re
from datetime import datetime
from typing import Optional, Dict, Any, List from typing import Optional, Dict, Any, List
from sqlalchemy.orm import Session from sqlalchemy.orm import Session
@@ -8,6 +9,7 @@ from sqlalchemy.orm import Session
from backend.database import Company, Industry, RoboticsCategory, EnrichmentData from backend.database import Company, Industry, RoboticsCategory, EnrichmentData
from backend.lib.core_utils import call_gemini_flash, safe_eval_math, run_serp_search from backend.lib.core_utils import call_gemini_flash, safe_eval_math, run_serp_search
from backend.services.scraping import scrape_website_content from backend.services.scraping import scrape_website_content
from backend.lib.metric_parser import MetricParser
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
@@ -32,7 +34,7 @@ class ClassificationService:
if enrichment and enrichment.content: if enrichment and enrichment.content:
wiki_data = enrichment.content wiki_data = enrichment.content
return wiki_data.get('text') return wiki_data.get('full_text')
return None return None
def _run_llm_classification_prompt(self, website_text: str, company_name: str, industry_definitions: List[Dict[str, str]]) -> Optional[str]: def _run_llm_classification_prompt(self, website_text: str, company_name: str, industry_definitions: List[Dict[str, str]]) -> Optional[str]:
@@ -75,27 +77,33 @@ class ClassificationService:
def _run_llm_metric_extraction_prompt(self, text_content: str, search_term: str, industry_name: str) -> Optional[Dict[str, Any]]: def _run_llm_metric_extraction_prompt(self, text_content: str, search_term: str, industry_name: str) -> Optional[Dict[str, Any]]:
""" """
Uses LLM to extract the specific metric value from text. Uses LLM to extract the specific metric value from text.
Updated to look specifically for area (m²) even if not the primary search term.
""" """
prompt = r""" prompt = r"""
Du bist ein Datenextraktions-Spezialist. Du bist ein Datenextraktions-Spezialist für Unternehmens-Kennzahlen.
Analysiere den folgenden Text, um spezifische Metrik-Informationen zu extrahieren. Analysiere den folgenden Text, um spezifische Werte zu extrahieren.
--- KONTEXT --- --- KONTEXT ---
Unternehmen ist in der Branche: {industry_name} Branche: {industry_name}
Gesuchter Wert (Rohdaten): '{search_term}' Primär gesuchte Metrik: '{search_term}'
--- TEXT --- --- TEXT ---
{text_content_excerpt} {text_content_excerpt}
--- AUFGABE --- --- AUFGABE ---
1. Finde den numerischen Wert für '{search_term}'. 1. Finde den numerischen Wert für die primäre Metrik '{search_term}'.
2. Versuche auch, eine explizit genannte Gesamtfläche in Quadratmetern (m²) zu finden, falls relevant und vorhanden. 2. EXTREM WICHTIG: Suche im gesamten Text nach einer Angabe zur Gesamtfläche, Nutzfläche, Grundstücksfläche oder Verkaufsfläche in Quadratmetern (m²).
In Branchen wie Freizeitparks, Flughäfen oder Thermen ist dies oft separat im Fließtext versteckt (z.B. "Die Therme verfügt über eine Gesamtfläche von 4.000 m²").
3. Achte auf deutsche Zahlenformate (z.B. 1.005 für tausend-fünf).
4. Regel: Extrahiere IMMER den umgebenden Satz oder die Zeile in 'raw_text_segment'. Rate NIEMALS einen numerischen Wert, ohne den Beweis dafür zu liefern.
Gib NUR ein JSON-Objekt zurück: Gib NUR ein JSON-Objekt zurück:
'raw_value': Der gefundene numerische Wert für '{search_term}' (als Zahl). null, falls nicht gefunden. 'raw_text_segment': Das Snippet für '{search_term}' (z.B. "ca. 1.500 Besucher (2020)"). MUSS IMMER AUSGEFÜLLT SEIN WENN EIN WERT GEFUNDEN WURDE.
'raw_unit': Die Einheit des raw_value (z.B. "Betten", "Stellplätze"). null, falls nicht gefunden. 'raw_value': Der numerische Wert für '{search_term}'. null, falls nicht gefunden.
'area_value': Ein gefundener numerischer Wert für eine Gesamtfläche in m² (als Zahl). null, falls nicht gefunden. 'raw_unit': Die Einheit (z.B. "Besucher", "Passagiere"). null, falls nicht gefunden.
'metric_name': Der Name der Metrik, nach der gesucht wurde (also '{search_term}'). 'area_text_segment': Das Snippet, das eine Fläche (m²) erwähnt (z.B. "4.000 m² Gesamtfläche"). null, falls nicht gefunden.
'area_value': Der gefundene Wert der Fläche in m² (als Zahl). null, falls nicht gefunden.
'metric_name': '{search_term}'.
""".format( """.format(
industry_name=industry_name, industry_name=industry_name,
search_term=search_term, search_term=search_term,
@@ -112,10 +120,20 @@ class ClassificationService:
def _parse_standardization_logic(self, formula: str, raw_value: float) -> Optional[float]: def _parse_standardization_logic(self, formula: str, raw_value: float) -> Optional[float]:
if not formula or raw_value is None: if not formula or raw_value is None:
return None return None
# Clean formula: Replace 'wert'/'Value' and strip area units like m² or alphanumeric noise
# that Notion sync might bring in (e.g. "wert * 25m2" -> "wert * 25")
formula_cleaned = formula.replace("wert", str(raw_value)).replace("Value", str(raw_value)) formula_cleaned = formula.replace("wert", str(raw_value)).replace("Value", str(raw_value))
# Remove common unit strings and non-math characters (except dots and parentheses)
formula_cleaned = re.sub(r'(?i)m[²2]', '', formula_cleaned)
formula_cleaned = re.sub(r'(?i)qm', '', formula_cleaned)
# We leave the final safety check to safe_eval_math
try: try:
return safe_eval_math(formula_cleaned) return safe_eval_math(formula_cleaned)
except: except Exception as e:
logger.error(f"Failed to parse standardization logic '{formula}' with value {raw_value}: {e}")
return None return None
def _extract_and_calculate_metric_cascade( def _extract_and_calculate_metric_cascade(
@@ -147,18 +165,52 @@ class ClassificationService:
logger.info(f"Checking {source_name} for '{search_term}' for {company.name}") logger.info(f"Checking {source_name} for '{search_term}' for {company.name}")
try: try:
content = content_loader() content = content_loader()
print(f"--- DEBUG: Content length for {source_name}: {len(content) if content else 0}")
if not content: continue if not content: continue
llm_result = self._run_llm_metric_extraction_prompt(content, search_term, industry_name) llm_result = self._run_llm_metric_extraction_prompt(content, search_term, industry_name)
if llm_result and (llm_result.get("raw_value") is not None or llm_result.get("area_value") is not None): print(f"--- DEBUG: LLM Result for {source_name}: {llm_result}")
results["calculated_metric_value"] = llm_result.get("raw_value")
is_revenue = "umsatz" in search_term.lower() or "revenue" in search_term.lower()
# Hybrid Extraction Logic:
# 1. Try to parse from the text segment using our robust Python parser (prioritized for German formats)
parsed_value = None
if llm_result and llm_result.get("raw_text_segment"):
parsed_value = MetricParser.extract_numeric_value(llm_result["raw_text_segment"], is_revenue=is_revenue)
if parsed_value is not None:
logger.info(f"Successfully parsed '{llm_result['raw_text_segment']}' to {parsed_value} using MetricParser.")
# 2. Fallback to LLM's raw_value if parser failed or no segment found
# NEW: Also run MetricParser on the raw_value if it's a string, to catch errors like "802020"
final_value = parsed_value
if final_value is None and llm_result.get("raw_value"):
final_value = MetricParser.extract_numeric_value(str(llm_result["raw_value"]), is_revenue=is_revenue)
if final_value is not None:
logger.info(f"Successfully cleaned LLM raw_value '{llm_result['raw_value']}' to {final_value}")
# Ultimate fallback to original raw_value if still None (though parser is very robust)
if final_value is None:
final_value = llm_result.get("raw_value")
if llm_result and (final_value is not None or llm_result.get("area_value") is not None or llm_result.get("area_text_segment")):
results["calculated_metric_value"] = final_value
results["calculated_metric_unit"] = llm_result.get("raw_unit") results["calculated_metric_unit"] = llm_result.get("raw_unit")
results["metric_source"] = source_name results["metric_source"] = source_name
if llm_result.get("area_value") is not None: # 3. Area Extraction Logic (Cascading)
results["standardized_metric_value"] = llm_result.get("area_value") area_val = llm_result.get("area_value")
elif llm_result.get("raw_value") is not None and standardization_logic: # Try to refine area_value if a segment exists
results["standardized_metric_value"] = self._parse_standardization_logic(standardization_logic, llm_result["raw_value"]) if llm_result.get("area_text_segment"):
refined_area = MetricParser.extract_numeric_value(llm_result["area_text_segment"], is_revenue=False)
if refined_area is not None:
area_val = refined_area
logger.info(f"Refined area to {area_val} from segment '{llm_result['area_text_segment']}'")
if area_val is not None:
results["standardized_metric_value"] = area_val
elif final_value is not None and standardization_logic:
results["standardized_metric_value"] = self._parse_standardization_logic(standardization_logic, final_value)
return results return results
except Exception as e: except Exception as e:
@@ -166,32 +218,20 @@ class ClassificationService:
return results return results
def classify_company_potential(self, company: Company, db: Session) -> Company: def extract_metrics_for_industry(self, company: Company, db: Session, industry: Industry) -> Company:
logger.info(f"Starting classification for {company.name}") """
Extracts and calculates metrics for a given industry.
Splits out from classify_company_potential to allow manual overrides.
"""
if not industry or not industry.scraper_search_term:
logger.warning(f"No metric configuration for industry '{industry.name if industry else 'None'}'")
return company
# 1. Load Industries
industries = self._load_industry_definitions(db)
industry_defs = [{"name": i.name, "description": i.description} for i in industries]
# 2. Industry Classification
website_content = scrape_website_content(company.website)
if website_content:
industry_name = self._run_llm_classification_prompt(website_content, company.name, industry_defs)
company.industry_ai = industry_name if industry_name in [i.name for i in industries] else "Others"
else:
company.industry_ai = "Others"
db.commit()
# 3. Metric Extraction
if company.industry_ai != "Others":
industry = next((i for i in industries if i.name == company.industry_ai), None)
if industry and industry.scraper_search_term:
# Derive standardized unit # Derive standardized unit
std_unit = "" if "" in (industry.standardization_logic or "") else "Einheiten" std_unit = "" if "" in (industry.standardization_logic or "") else "Einheiten"
metrics = self._extract_and_calculate_metric_cascade( metrics = self._extract_and_calculate_metric_cascade(
db, company, company.industry_ai, industry.scraper_search_term, industry.standardization_logic, std_unit db, company, industry.name, industry.scraper_search_term, industry.standardization_logic, std_unit
) )
company.calculated_metric_name = metrics["calculated_metric_name"] company.calculated_metric_name = metrics["calculated_metric_name"]
@@ -201,6 +241,113 @@ class ClassificationService:
company.standardized_metric_unit = metrics["standardized_metric_unit"] company.standardized_metric_unit = metrics["standardized_metric_unit"]
company.metric_source = metrics["metric_source"] company.metric_source = metrics["metric_source"]
# Keep track of refinement
company.last_classification_at = datetime.utcnow() company.last_classification_at = datetime.utcnow()
db.commit() db.commit()
return company return company
def reevaluate_wikipedia_metric(self, company: Company, db: Session, industry: Industry) -> Company:
"""
Runs the metric extraction cascade for ONLY the Wikipedia source.
"""
logger.info(f"Starting Wikipedia re-evaluation for '{company.name}'")
if not industry or not industry.scraper_search_term:
logger.warning(f"Cannot re-evaluate: No metric configuration for industry '{industry.name}'")
return company
search_term = industry.scraper_search_term
content = self._get_wikipedia_content(db, company.id)
if not content:
logger.warning("No Wikipedia content found to re-evaluate.")
return company
try:
llm_result = self._run_llm_metric_extraction_prompt(content, search_term, industry.name)
if not llm_result:
raise ValueError("LLM metric extraction returned empty result.")
is_revenue = "umsatz" in search_term.lower() or "revenue" in search_term.lower()
# Hybrid Extraction Logic (same as in cascade)
parsed_value = None
if llm_result.get("raw_text_segment"):
parsed_value = MetricParser.extract_numeric_value(llm_result["raw_text_segment"], is_revenue=is_revenue)
if parsed_value is not None:
logger.info(f"Successfully parsed '{llm_result['raw_text_segment']}' to {parsed_value} using MetricParser.")
final_value = parsed_value
if final_value is None and llm_result.get("raw_value"):
final_value = MetricParser.extract_numeric_value(str(llm_result["raw_value"]), is_revenue=is_revenue)
if final_value is not None:
logger.info(f"Successfully cleaned LLM raw_value '{llm_result['raw_value']}' to {final_value}")
if final_value is None:
final_value = llm_result.get("raw_value")
# Update company metrics if a value was found
if final_value is not None:
company.calculated_metric_name = search_term
company.calculated_metric_value = final_value
company.calculated_metric_unit = llm_result.get("raw_unit")
company.metric_source = "wikipedia_reevaluated"
# Handle standardization
std_unit = "" if "" in (industry.standardization_logic or "") else "Einheiten"
company.standardized_metric_unit = std_unit
area_val = llm_result.get("area_value")
if llm_result.get("area_text_segment"):
refined_area = MetricParser.extract_numeric_value(llm_result["area_text_segment"], is_revenue=False)
if refined_area is not None:
area_val = refined_area
if area_val is not None:
company.standardized_metric_value = area_val
elif industry.standardization_logic:
company.standardized_metric_value = self._parse_standardization_logic(industry.standardization_logic, final_value)
else:
company.standardized_metric_value = None
company.last_classification_at = datetime.utcnow()
db.commit()
logger.info(f"Successfully re-evaluated and updated metrics for {company.name} from Wikipedia.")
else:
logger.warning(f"Re-evaluation for {company.name} did not yield a metric value.")
except Exception as e:
logger.error(f"Error during Wikipedia re-evaluation for {company.name}: {e}")
return company
def classify_company_potential(self, company: Company, db: Session) -> Company:
logger.info(f"Starting complete classification for {company.name}")
# 1. Load Industries
industries = self._load_industry_definitions(db)
industry_defs = [{"name": i.name, "description": i.description} for i in industries]
# 2. Industry Classification (Website-based)
# STRENG: Nur wenn Branche noch auf "Others" steht oder neu ist, darf die KI klassifizieren
valid_industry_names = [i.name for i in industries]
if company.industry_ai and company.industry_ai != "Others" and company.industry_ai in valid_industry_names:
logger.info(f"KEEPING manual/existing industry '{company.industry_ai}' for {company.name}")
else:
website_content = scrape_website_content(company.website)
if website_content:
industry_name = self._run_llm_classification_prompt(website_content, company.name, industry_defs)
company.industry_ai = industry_name if industry_name in valid_industry_names else "Others"
logger.info(f"AI CLASSIFIED {company.name} as '{company.industry_ai}'")
else:
company.industry_ai = "Others"
logger.warning(f"No website content for {company.name}, setting industry to Others")
db.commit()
# 3. Metric Extraction
if company.industry_ai != "Others":
industry = next((i for i in industries if i.name == company.industry_ai), None)
if industry:
self.extract_metrics_for_industry(company, db, industry)
return company

View File

@@ -16,6 +16,7 @@ function App() {
const [isSettingsOpen, setIsSettingsOpen] = useState(false) const [isSettingsOpen, setIsSettingsOpen] = useState(false)
const [selectedCompanyId, setSelectedCompanyId] = useState<number | null>(null) const [selectedCompanyId, setSelectedCompanyId] = useState<number | null>(null)
const [selectedContactId, setSelectedContactId] = useState<number | null>(null) const [selectedContactId, setSelectedContactId] = useState<number | null>(null)
const [backendVersion, setBackendVersion] = useState('');
// Navigation State // Navigation State
const [view, setView] = useState<'companies' | 'contacts'>('companies') const [view, setView] = useState<'companies' | 'contacts'>('companies')
@@ -37,6 +38,13 @@ function App() {
localStorage.setItem('theme', theme) localStorage.setItem('theme', theme)
}, [theme]) }, [theme])
useEffect(() => {
fetch(`${API_BASE}/health`)
.then(res => res.json())
.then(data => setBackendVersion(data.version || ''))
.catch(() => setBackendVersion('N/A'))
}, [])
const toggleTheme = () => setTheme(prev => prev === 'dark' ? 'light' : 'dark') const toggleTheme = () => setTheme(prev => prev === 'dark' ? 'light' : 'dark')
const handleCompanySelect = (id: number) => { const handleCompanySelect = (id: number) => {
@@ -80,7 +88,7 @@ function App() {
</div> </div>
<div> <div>
<h1 className="text-xl font-bold text-slate-900 dark:text-white tracking-tight">Company Explorer</h1> <h1 className="text-xl font-bold text-slate-900 dark:text-white tracking-tight">Company Explorer</h1>
<p className="text-xs text-blue-600 dark:text-blue-400 font-medium">ROBOTICS EDITION <span className="text-slate-500 dark:text-slate-600 ml-2">v0.6.1</span></p> <p className="text-xs text-blue-600 dark:text-blue-400 font-medium">ROBOTICS EDITION {backendVersion && <span className="text-slate-500 dark:text-slate-600 ml-2">v{backendVersion}</span>}</p>
</div> </div>
</div> </div>

View File

@@ -1,6 +1,6 @@
import { useEffect, useState } from 'react' import { useEffect, useState } from 'react'
import axios from 'axios' import axios from 'axios'
import { X, ExternalLink, Bot, Briefcase, Calendar, Globe, Users, DollarSign, MapPin, Tag, RefreshCw as RefreshCwIcon, Search as SearchIcon, Pencil, Check, Download, Clock, Lock, Unlock } from 'lucide-react' import { X, ExternalLink, Bot, Briefcase, Calendar, Globe, Users, DollarSign, MapPin, Tag, RefreshCw as RefreshCwIcon, Search as SearchIcon, Pencil, Check, Download, Clock, Lock, Unlock, Calculator, Ruler, Database, Trash2 } from 'lucide-react'
import clsx from 'clsx' import clsx from 'clsx'
import { ContactsManager, Contact } from './ContactsManager' import { ContactsManager, Contact } from './ContactsManager'
@@ -35,6 +35,14 @@ type CompanyDetail = {
signals: Signal[] signals: Signal[]
enrichment_data: EnrichmentData[] enrichment_data: EnrichmentData[]
contacts?: Contact[] contacts?: Contact[]
// NEU v0.7.0: Quantitative Metrics
calculated_metric_name: string | null
calculated_metric_value: number | null
calculated_metric_unit: string | null
standardized_metric_value: number | null
standardized_metric_unit: string | null
metric_source: string | null
metric_source_url: string | null
} }
export function Inspector({ companyId, initialContactId, onClose, apiBase }: InspectorProps) { export function Inspector({ companyId, initialContactId, onClose, apiBase }: InspectorProps) {
@@ -71,6 +79,11 @@ export function Inspector({ companyId, initialContactId, onClose, apiBase }: Ins
const [isEditingImpressum, setIsEditingImpressum] = useState(false) const [isEditingImpressum, setIsEditingImpressum] = useState(false)
const [impressumUrlInput, setImpressumUrlInput] = useState("") const [impressumUrlInput, setImpressumUrlInput] = useState("")
// NEU: Industry Override
const [industries, setIndustries] = useState<any[]>([])
const [isEditingIndustry, setIsEditingIndustry] = useState(false)
const [industryInput, setIndustryInput] = useState("")
const fetchData = (silent = false) => { const fetchData = (silent = false) => {
if (!companyId) return if (!companyId) return
if (!silent) setLoading(true) if (!silent) setLoading(true)
@@ -78,6 +91,7 @@ export function Inspector({ companyId, initialContactId, onClose, apiBase }: Ins
axios.get(`${apiBase}/companies/${companyId}`) axios.get(`${apiBase}/companies/${companyId}`)
.then(res => { .then(res => {
const newData = res.data const newData = res.data
console.log("FETCHED COMPANY DATA:", newData) // DEBUG: Log raw data from API
setData(newData) setData(newData)
// Auto-stop processing if status changes to ENRICHED or we see data // Auto-stop processing if status changes to ENRICHED or we see data
@@ -100,7 +114,13 @@ export function Inspector({ companyId, initialContactId, onClose, apiBase }: Ins
setIsEditingWiki(false) setIsEditingWiki(false)
setIsEditingWebsite(false) setIsEditingWebsite(false)
setIsEditingImpressum(false) setIsEditingImpressum(false)
setIsEditingIndustry(false)
setIsProcessing(false) // Reset on ID change setIsProcessing(false) // Reset on ID change
// Load industries for dropdown
axios.get(`${apiBase}/industries`)
.then(res => setIndustries(res.data))
.catch(console.error)
}, [companyId]) }, [companyId])
const handleDiscover = async () => { const handleDiscover = async () => {
@@ -204,6 +224,54 @@ export function Inspector({ companyId, initialContactId, onClose, apiBase }: Ins
} }
} }
const handleIndustryOverride = async () => {
if (!companyId) return
setIsProcessing(true)
try {
await axios.put(`${apiBase}/companies/${companyId}/industry`, { industry_ai: industryInput })
setIsEditingIndustry(false)
fetchData()
} catch (e) {
alert("Industry update failed")
console.error(e)
} finally {
setIsProcessing(false)
}
}
const handleReevaluateWikipedia = async () => {
if (!companyId) return
setIsProcessing(true)
try {
await axios.post(`${apiBase}/companies/${companyId}/reevaluate-wikipedia`)
// Polling effect will handle the rest
} catch (e) {
console.error(e)
setIsProcessing(false) // Stop on direct error
}
}
const handleDelete = async () => {
console.log("[Inspector] Delete requested for ID:", companyId)
if (!companyId) return;
if (!window.confirm(`Are you sure you want to delete "${data?.name}"? This action cannot be undone.`)) {
console.log("[Inspector] Delete cancelled by user")
return
}
try {
console.log("[Inspector] Sending DELETE request...")
await axios.delete(`${apiBase}/companies/${companyId}`)
console.log("[Inspector] Delete successful")
onClose() // Close the inspector on success
window.location.reload() // Force reload to show updated list
} catch (e: any) {
console.error("[Inspector] Delete failed:", e)
alert("Failed to delete company: " + (e.response?.data?.detail || e.message))
}
}
const handleLockToggle = async (sourceType: string, currentLockStatus: boolean) => { const handleLockToggle = async (sourceType: string, currentLockStatus: boolean) => {
if (!companyId) return if (!companyId) return
try { try {
@@ -265,6 +333,13 @@ export function Inspector({ companyId, initialContactId, onClose, apiBase }: Ins
<div className="flex justify-between items-start mb-4"> <div className="flex justify-between items-start mb-4">
<h2 className="text-xl font-bold text-slate-900 dark:text-white leading-tight">{data.name}</h2> <h2 className="text-xl font-bold text-slate-900 dark:text-white leading-tight">{data.name}</h2>
<div className="flex items-center gap-2"> <div className="flex items-center gap-2">
<button
onClick={handleDelete}
className="p-1.5 text-slate-500 hover:text-red-600 dark:hover:text-red-500 transition-colors"
title="Delete Company"
>
<Trash2 className="h-4 w-4" />
</button>
<button <button
onClick={handleExport} onClick={handleExport}
className="p-1.5 text-slate-500 hover:text-blue-600 dark:hover:text-blue-400 transition-colors" className="p-1.5 text-slate-500 hover:text-blue-600 dark:hover:text-blue-400 transition-colors"
@@ -287,17 +362,22 @@ export function Inspector({ companyId, initialContactId, onClose, apiBase }: Ins
<div className="flex flex-wrap gap-2 text-sm items-center"> <div className="flex flex-wrap gap-2 text-sm items-center">
{!isEditingWebsite ? ( {!isEditingWebsite ? (
<div className="flex items-center gap-2"> <div className="flex items-center gap-2 group">
{data.website && data.website !== "k.A." ? ( {data.website && data.website !== "k.A." ? (
<a href={data.website} target="_blank" className="flex items-center gap-1 text-blue-600 dark:text-blue-400 hover:text-blue-800 dark:hover:text-blue-300 transition-colors font-medium"> <a
<ExternalLink className="h-3 w-3" /> {new URL(data.website).hostname.replace('www.', '')} href={data.website.startsWith('http') ? data.website : `https://${data.website}`}
target="_blank"
rel="noopener noreferrer"
className="flex items-center gap-1 text-blue-600 dark:text-blue-400 hover:text-blue-800 dark:hover:text-blue-300 transition-colors font-medium"
>
<Globe className="h-3.5 w-3.5" /> {new URL(data.website.startsWith('http') ? data.website : `https://${data.website}`).hostname.replace('www.', '')}
</a> </a>
) : ( ) : (
<span className="text-slate-500 italic">No website</span> <span className="text-slate-500 italic text-xs">No website</span>
)} )}
<button <button
onClick={() => { setWebsiteInput(data.website && data.website !== "k.A." ? data.website : ""); setIsEditingWebsite(true); }} onClick={() => { setWebsiteInput(data.website && data.website !== "k.A." ? data.website : ""); setIsEditingWebsite(true); }}
className="p-1 text-slate-400 hover:text-slate-900 dark:hover:text-white transition-colors" className="p-1 text-slate-400 hover:text-slate-900 dark:hover:text-white transition-colors opacity-0 group-hover:opacity-100"
title="Edit Website URL" title="Edit Website URL"
> >
<Pencil className="h-3 w-3" /> <Pencil className="h-3 w-3" />
@@ -310,7 +390,7 @@ export function Inspector({ companyId, initialContactId, onClose, apiBase }: Ins
value={websiteInput} value={websiteInput}
onChange={e => setWebsiteInput(e.target.value)} onChange={e => setWebsiteInput(e.target.value)}
placeholder="https://..." placeholder="https://..."
className="bg-white dark:bg-slate-800 border border-slate-300 dark:border-slate-700 rounded px-2 py-0.5 text-xs text-slate-900 dark:text-white focus:ring-1 focus:ring-blue-500 outline-none w-48" className="bg-white dark:bg-slate-800 border border-slate-300 dark:border-slate-700 rounded px-2 py-0.5 text-[10px] text-slate-900 dark:text-white focus:ring-1 focus:ring-blue-500 outline-none w-48"
autoFocus autoFocus
/> />
<button <button
@@ -327,20 +407,6 @@ export function Inspector({ companyId, initialContactId, onClose, apiBase }: Ins
</button> </button>
</div> </div>
)} )}
{data.industry_ai && (
<span className="flex items-center gap-1 px-2 py-0.5 bg-slate-100 dark:bg-slate-800 text-slate-700 dark:text-slate-300 rounded border border-slate-200 dark:border-slate-700">
<Briefcase className="h-3 w-3" /> {data.industry_ai}
</span>
)}
<span className={clsx(
"px-2 py-0.5 rounded text-[10px] font-bold uppercase tracking-wider",
data.status === 'ENRICHED' ? "bg-green-100 dark:bg-green-900/40 text-green-700 dark:text-green-400 border border-green-200 dark:border-green-800/50" :
data.status === 'DISCOVERED' ? "bg-blue-100 dark:bg-blue-900/40 text-blue-700 dark:text-blue-400 border border-blue-200 dark:border-blue-800/50" :
"bg-slate-100 dark:bg-slate-800 text-slate-600 dark:text-slate-400 border border-slate-200 dark:border-slate-700"
)}>
{data.status}
</span>
</div> </div>
{/* Tab Navigation */} {/* Tab Navigation */}
@@ -501,6 +567,75 @@ export function Inspector({ companyId, initialContactId, onClose, apiBase }: Ins
{/* Core Classification */}
<div className="bg-blue-50/50 dark:bg-blue-900/10 rounded-xl p-5 border border-blue-100 dark:border-blue-900/50 mb-6">
<div className="grid grid-cols-2 gap-6">
<div>
<div className="text-[10px] text-blue-600 dark:text-blue-400 uppercase font-bold tracking-tight mb-2">Industry Focus</div>
{!isEditingIndustry ? (
<div className="flex items-center gap-3">
<div className="p-2 bg-white dark:bg-slate-800 rounded-lg shadow-sm">
<Briefcase className="h-5 w-5 text-blue-600 dark:text-blue-400" />
</div>
<div>
<div className="text-sm font-semibold text-slate-900 dark:text-white">{data.industry_ai || "Not Classified"}</div>
<button
onClick={() => { setIndustryInput(data.industry_ai || "Others"); setIsEditingIndustry(true); }}
className="text-xs text-blue-600 dark:text-blue-400 hover:underline"
>
Change Industry & Re-Extract
</button>
</div>
</div>
) : (
<div className="space-y-2">
<select
value={industryInput}
onChange={e => setIndustryInput(e.target.value)}
className="w-full bg-white dark:bg-slate-800 border border-slate-300 dark:border-slate-700 rounded px-2 py-1.5 text-sm text-slate-900 dark:text-white focus:ring-1 focus:ring-blue-500 outline-none"
autoFocus
>
<option value="Others">Others</option>
{industries.map(ind => (
<option key={ind.id} value={ind.name}>{ind.name}</option>
))}
</select>
<div className="flex gap-2">
<button
onClick={handleIndustryOverride}
className="flex-1 px-3 py-1.5 bg-blue-600 text-white rounded text-xs font-medium hover:bg-blue-700 transition-colors flex items-center justify-center gap-2"
>
<Check className="h-3.5 w-3.5" /> Save & Re-Extract
</button>
<button
onClick={() => setIsEditingIndustry(false)}
className="px-3 py-1.5 bg-slate-200 dark:bg-slate-700 text-slate-700 dark:text-slate-300 rounded text-xs font-medium hover:bg-slate-300 dark:hover:bg-slate-600 transition-colors"
>
<X className="h-3.5 w-3.5" />
</button>
</div>
</div>
)}
</div>
<div>
<div className="text-[10px] text-slate-500 uppercase font-bold tracking-tight mb-2">Analysis Status</div>
<div className="flex items-center gap-3">
<div className="p-2 bg-white dark:bg-slate-800 rounded-lg shadow-sm">
<Bot className="h-5 w-5 text-slate-500" />
</div>
<div className={clsx(
"px-3 py-1 rounded-full text-xs font-bold",
data.status === 'ENRICHED' ? "bg-green-100 text-green-700 border border-green-200" :
data.status === 'DISCOVERED' ? "bg-blue-100 text-blue-700 border border-blue-200" :
"bg-slate-100 text-slate-600 border border-slate-200"
)}>
{data.status}
</div>
</div>
</div>
</div>
</div>
{/* AI Analysis Dossier */} {/* AI Analysis Dossier */}
{aiAnalysis && ( {aiAnalysis && (
<div className="space-y-4"> <div className="space-y-4">
@@ -582,6 +717,16 @@ export function Inspector({ companyId, initialContactId, onClose, apiBase }: Ins
)} )}
{/* Re-evaluate Button */}
<button
onClick={handleReevaluateWikipedia}
disabled={isProcessing}
className="p-1 text-slate-400 hover:text-blue-600 dark:hover:text-blue-400 transition-colors disabled:opacity-50"
title="Re-run metric extraction from Wikipedia text"
>
<RefreshCwIcon className={clsx("h-3.5 w-3.5", isProcessing && "animate-spin")} />
</button>
{!isEditingWiki ? ( {!isEditingWiki ? (
@@ -723,40 +868,70 @@ export function Inspector({ companyId, initialContactId, onClose, apiBase }: Ins
) : null} ) : null}
</div> </div>
{/* Robotics Scorecard */} {/* Quantitative Potential Analysis (v0.7.0) */}
<div> <div>
<h3 className="text-sm font-semibold text-slate-500 dark:text-slate-400 uppercase tracking-wider mb-3 flex items-center gap-2"> <h3 className="text-sm font-semibold text-slate-500 dark:text-slate-400 uppercase tracking-wider mb-3 flex items-center gap-2">
<Bot className="h-4 w-4" /> Robotics Potential <Bot className="h-4 w-4" /> Quantitative Potential
</h3> </h3>
<div className="grid grid-cols-2 gap-4"> {data.calculated_metric_value != null || data.standardized_metric_value != null ? (
{['cleaning', 'transport', 'security', 'service'].map(type => { <div className="bg-slate-50 dark:bg-slate-950 rounded-lg p-4 border border-slate-200 dark:border-slate-800 space-y-4">
const sig = data.signals.find(s => s.signal_type.includes(type)) {/* Calculated Metric */}
const score = sig ? sig.confidence : 0 {data.calculated_metric_value != null && (
<div className="flex items-start gap-3">
<div className="p-2 bg-white dark:bg-slate-800 rounded-lg text-blue-500 mt-1">
<Calculator className="h-4 w-4" />
</div>
<div>
<div className="text-[10px] text-slate-500 uppercase font-bold tracking-tight">{data.calculated_metric_name || 'Calculated Metric'}</div>
<div className="text-xl text-slate-900 dark:text-white font-bold">
{data.calculated_metric_value.toLocaleString('de-DE')}
<span className="text-sm font-medium text-slate-500 ml-1">{data.calculated_metric_unit}</span>
</div>
</div>
</div>
)}
return ( {/* Standardized Metric */}
<div key={type} className="bg-slate-50 dark:bg-slate-800/50 p-3 rounded-lg border border-slate-200 dark:border-slate-700"> {data.standardized_metric_value != null && (
<div className="flex justify-between mb-1"> <div className="flex items-start gap-3 pt-4 border-t border-slate-200 dark:border-slate-800">
<span className="text-sm text-slate-700 dark:text-slate-300 capitalize">{type}</span> <div className="p-2 bg-white dark:bg-slate-800 rounded-lg text-green-500 mt-1">
<span className={clsx("text-sm font-bold", score > 70 ? "text-green-600 dark:text-green-400" : score > 30 ? "text-yellow-600 dark:text-yellow-400" : "text-slate-500")}> <Ruler className="h-4 w-4" />
{score}%
</span>
</div> </div>
<div className="w-full bg-slate-200 dark:bg-slate-700 h-1.5 rounded-full overflow-hidden"> <div>
<div <div className="text-[10px] text-slate-500 uppercase font-bold tracking-tight">Standardized Potential ({data.standardized_metric_unit})</div>
className={clsx("h-full rounded-full", score > 70 ? "bg-green-500" : score > 30 ? "bg-yellow-500" : "bg-slate-500")} <div className="text-xl text-green-600 dark:text-green-400 font-bold">
style={{ width: `${score}%` }} {data.standardized_metric_value.toLocaleString('de-DE')}
/> <span className="text-sm font-medium text-slate-500 ml-1">{data.standardized_metric_unit}</span>
</div> </div>
{sig?.proof_text && ( <p className="text-xs text-slate-500 mt-1">Comparable value for potential analysis.</p>
<p className="text-xs text-slate-500 dark:text-slate-500 mt-2 line-clamp-2" title={sig.proof_text}> </div>
"{sig.proof_text}" </div>
</p> )}
{/* Source */}
{data.metric_source && (
<div className="flex justify-end items-center gap-1 text-[10px] text-slate-500 pt-2 border-t border-slate-200 dark:border-slate-800">
<Database className="h-3 w-3" />
<span>Source:</span>
{data.metric_source_url ? (
<a href={data.metric_source_url} target="_blank" rel="noopener noreferrer" className="font-medium text-blue-600 dark:text-blue-400 capitalize hover:underline">
{data.metric_source}
</a>
) : (
<span className="font-medium text-slate-600 dark:text-slate-400 capitalize">{data.metric_source}</span>
)} )}
</div> </div>
) )}
})}
</div> </div>
) : (
<div className="p-4 rounded-xl border border-dashed border-slate-200 dark:border-slate-800 text-center text-slate-500 dark:text-slate-600">
<Bot className="h-5 w-5 mx-auto mb-2 opacity-20" />
<p className="text-xs">No quantitative data calculated yet.</p>
<p className="text-xs mt-1">Run "Analyze Potential" to extract metrics.</p>
</div>
)}
</div> </div>
{/* Meta Info */} {/* Meta Info */}