feat(explorer): implement v0.7.0 quantitative potential analysis (cascade logic & metric extraction)

2026-01-20 16:38:05 +00:00
parent 76d801c1d6
commit 103287c12b
6 changed files with 483 additions and 417 deletions
--- a/MIGRATION_PLAN.md
+++ b/MIGRATION_PLAN.md
@@ -1,4 +1,4 @@
-# Migration Plan: Legacy GSheets -> Company Explorer (Robotics Edition v0.5.1)
+# Migration Plan: Legacy GSheets -> Company Explorer (Robotics Edition v0.7.0)

 **Context:** Fresh start for the **Robotics & Facility Management** industry.
 **Goal:** Replace Google Sheets/CLI with a web app ("Company Explorer") backed by SQLite.
@@ -8,10 +8,10 @@
 | Area | Old (Legacy) | New (Robotics Edition) |
 | :--- | :--- | :--- |
 | **Data Basis** | Google Sheets | **SQLite** (local, fast, filterable). |
-| **Target Data** | General / customer service | **Robotics signals** (spa area? intralogistics? site security?). |
-| **Industries** | AI suggestion (free text) | **Strict Mode:** Mapping to a fixed CRM list (e.g. "Hotellerie", "Maschinenbau"). |
-| **Copywriting** | Pain/Gain Matrix (Service) | **Pain/Gain Matrix (Robotics)**. "Translation" of the existing knowledge to robots. |
-| **Analytics** | Technician ML model | **Disabled**. Not relevant for now. |
+| **Target Data** | General / customer service | **Quantifiable potential** (e.g. 4500 m² floor area, 120 beds). |
+| **Industries** | AI suggestion (free text) | **Strict Mode:** Mapping to a defined Notion list (e.g. "Hotellerie", "Automotive"). |
+| **Scoring** | 0-100 score (vague) | **Data-Driven:** Raw value (scraper/search) -> standardization (formula) -> potential. |
+| **Analytics** | Technician ML model | **Disabled**. Focus on hard facts. |
 | **Operations** | D365 Sync (Broken) | **Excel import & deduplication**. Focus on matching external lists against existing records. |

 ## 2. Architecture & Component Mapping
@@ -26,8 +26,7 @@ Das System wird in `company-explorer/` neu aufgebaut. Wir lösen Abhängigkeiten
 | **Importer** | Replaces `SyncManager`. Imports Excel dumps (CRM) and event lists. | 1 |
 | **Deduplicator** | Replaces `company_deduplicator.py`. **Core feature:** Checks event lists against the DB. Must match "intelligently" (name + city + web). | 1 |
 | **Scraper (Base)** | Extracts text from websites. Basis for all analyses. | 1 |
-| **Signal Detector** | **NEW.** Analyzes website text for robot potential. <br> *Logic:* If industry = Hotel & keyword = "Wellness" -> potential: cleaning robot. | 1 |
-| **Classifier** | Industry classification. **Strict Mode:** Validates against `config/allowed_industries.json`. | 2 |
+| **Classification Service** | **NEW (v0.7.0).** Two-stage logic: <br> 1. Strict Industry Classification. <br> 2. Metric Extraction Cascade (Web -> Wiki -> SerpAPI). | 1 |
 | **Marketing Engine** | Replaces `generate_marketing_text.py`. Uses the new `marketing_wissen_robotics.yaml`. | 3 |

 ### B. Frontend (`frontend/`) - React
@@ -35,6 +34,7 @@ Das System wird in `company-explorer/` neu aufgebaut. Wir lösen Abhängigkeiten
 *   **View 1: The "Explorer":** DataGrid of all companies. Filterable by "robot potential" and status.
 *   **View 2: The "Inspector":** Detail view of a single company. Shows detected signals ("has spa area"). Allows manual correction.
 *   **View 3: "List Matcher":** Upload an Excel list -> display duplicates -> "Import new" button.
+*   **View 4: "Settings":** Configuration of industries, roles, and robotics logic.

 ## 3. Handling Shared Code (`helpers.py` & Co.)

@@ -42,314 +42,115 @@ Wir kapseln das neue Projekt vollständig ab ("Fork & Clean").

 *   **Source:** `helpers.py` (root)
 *   **Target:** `company-explorer/backend/lib/core_utils.py`
-*   **Action:** We copy only:
-    *   OpenAI/Gemini wrapper (retry logic).
-    *   Text cleaning (`clean_text`, `normalize_string`).
-    *   URL normalization.
-
-*   **Source:** Other Gemini apps (`duckdns`, `gtm-architect`, `market-intel`)
-*   **Action:** We treat these as a reference. Useful logic (e.g. the "Grit" prompts from `market-intel`) is copied explicitly into the new service modules.
+*   **Action:** We copy only the relevant parts and extend them (e.g. `safe_eval_math`, `run_serp_search`).

 ## 4. Data Structure (SQLite Schema)

-### Table `companies` (Master Data)
+### Table `companies` (Master Data & Analysis)
 *   `id` (PK)
 *   `name` (String)
 *   `website` (String)
 *   `crm_id` (String, nullable - link to D365)
-*   `industry_crm` (String - the "allowed" industry)
+*   `industry_crm` (String - the "allowed" industry from Notion)
 *   `city` (String)
 *   `country` (String - default: "DE" or taken from the Impressum)
 *   `status` (Enum: NEW, IMPORTED, ENRICHED, QUALIFIED)
+*   **NEW (v0.7.0):**
+    *   `calculated_metric_name` (String - e.g. "Anzahl Betten")
+    *   `calculated_metric_value` (Float - e.g. 180)
+    *   `calculated_metric_unit` (String - e.g. "Betten")
+    *   `standardized_metric_value` (Float - e.g. 4500)
+    *   `standardized_metric_unit` (String - e.g. "m²")
+    *   `metric_source` (String - "website", "wikipedia", "serpapi")

-### Table `signals` (Robot Potential)
-*   `company_id` (FK)
-*   `signal_type` (e.g. "has_spa", "has_large_warehouse", "has_security_needs")
-*   `confidence` (Float)
-*   `proof_text` (snippet from the website)
+### Table `signals` (Deprecated)
+*   *Deprecated as of v0.7.0. Replaced by quantitative metrics in `companies`.*

 ### Table `contacts` (Contact Persons)
 *   `id` (PK)
 *   `account_id` (FK -> companies.id)
-*   `gender` (Selection: "männlich", "weiblich")
-*   `title` (Text, e.g. "Dr.")
-*   `first_name` (Text)
-*   `last_name` (Text)
-*   `email` (Email)
-*   `job_title` (Text - as printed on the business card)
-*   `language` (Selection: "De", "En")
-*   `role` (Selection: "Operativer Entscheider", "Infrastruktur-Verantwortlicher", "Wirtschaftlicher Entscheider", "Innovations-Treiber")
-*   `status` (Selection: see process status)
-*   `is_primary` (Boolean - only one per account)
+*   `gender`, `title`, `first_name`, `last_name`, `email`
+*   `job_title` (as printed on the business card)
+*   `role` (standardized role: "Operativer Entscheider", etc.)
+*   `status` (marketing status)

-### Table `industries` (Industry Focus)
+### Table `industries` (Industry Focus - Synced from Notion)
 *   `id` (PK)
-*   `name` (String, Unique)
-*   `description` (Text - scope/definition)
-*   `is_focus` (Boolean)
-*   `primary_category_id` (FK -> robotics_categories.id)
-*   `metric_type` (String: `Unit_Count`, `Area_in`, `Area_out` - kind of metric used to determine size)
-*   `min_requirement` (Float, nullable - minimum threshold for signal relevance)
-*   `whale_threshold` (Float, nullable - threshold above which an account counts as a "Whale")
-*   `proxy_factor` (Float, nullable - multiplier for the standardization logic)
-*   `scraper_keywords` (JSON array of strings - keywords the scraper uses to detect the metric)
-*   `standardization_logic` (String - formula for standardizing the metric, e.g. "wert * 25m²")
+*   `notion_id` (String, Unique)
+*   `name` (String - "Vertical" in Notion)
+*   `description` (Text - "Definition" in Notion)
+*   `metric_type` (String - "Metric Type")
+*   `min_requirement` (Float - "Min. Requirement")
+*   `whale_threshold` (Float - "Whale Threshold")
+*   `proxy_factor` (Float - "Proxy Factor")
+*   `scraper_search_term` (String - "Scraper Search Term")
+*   `scraper_keywords` (Text - "Scraper Keywords")
+*   `standardization_logic` (String - "Standardization Logic")

 ### Table `job_role_mappings` (Role Logic)
 *   `id` (PK)
-*   `pattern` (String - regex or text pattern for job titles)
-*   `role` (String - target role in the sales process)
-
-### Table `duplicates_log`
-*   Stores the results of list comparisons ("Upload X contained 20 known companies").
-
-## 5. Implementation Phase Plan
-
-1.  **Housekeeping:** Archive the legacy code (`_legacy_gsheets_system`).
-2.  **Setup:** Init `company-explorer` (backend + frontend skeleton).
-3.  **Foundation:** DB schema + "List Matcher" (deduplication is priority A for Operations).
-4.  **Enrichment:** Implement the scraper + Signal Detector (Robotics).
-5.  **UI:** React interface for the data.
-6.  **CRM features:** Contacts management & marketing automation status.
-
-## 6. Specification: Contacts & Marketing Status (v0.5.0)
-
-*(Added on Jan 15, 2026)*
-
-**Concept:**
-Accounts relate to contacts 1:n (one account, many contacts). An account can have one "Primary Contact".
-
-**Data fields:**
-*   **Gender:** Selection (männlich / weiblich)
-*   **First name:** Text
-*   **Last name:** Text
-*   **Email:** Type: Email
-*   **Job title:** Text (title on the business card)
-*   **Language:** Selection (De / En)
-
-**Roles (function in the sales process):**
-*   Operativer Entscheider
-*   Infrastruktur-Verantwortlicher
-*   Wirtschaftlicher Entscheider
-*   Innovations-Treiber
-
-**Status (Marketing Automation):**
-*   *Manual:*
-    *   Soft Denied (polite rejection)
-    *   Bounced (email invalid)
-    *   Redirect (not the responsible person)
-    *   Interested (is interested)
-    *   Hard denied (do not contact again)
-*   *Automatic:*
-    *   Init (contact is queued to enter the automation)
-    *   1st Step (contact has received the first message)
-    *   2nd Step (contact has received the second message)
-    *   Not replied (contact has received the third message and did not reply)
-
-**Industry focus (Settings):**
-*   **Name:** Unique name of the industry (CRM mapping).
-*   **Description:** Textual definition of what belongs to this industry.
-*   **Is Focus:** Marks industries that are worked with priority.
-*   **Primary product category:** Assignment of a robotics category (e.g. Hotel -> Cleaning).
-
-**Job role mapping (Settings):**
-*   **Pattern:** Text pattern (e.g. "Technischer Leiter", "CTO") searched for in job titles.
-*   **Assigned role:** The functional interpretation (e.g. Operativer Entscheider).
+*   `pattern` (String - regex for job titles)
+*   `role` (String - target role)

 ## 7. History & Fixes (Jan 2026)

-*   **[UPGRADE] v0.6.1: Notion Single Source of Truth (Jan 20, 2026)**
-    *   **Notion SSoT:** Moved industry management (`Industries`) and robotics categories to Notion. Local edits in the web interface are disabled for synced fields to preserve data integrity.
-    *   **Dynamic classification:** The `ClassificationService` now loads `allowed_industries` directly from the database, which in turn is populated from Notion via the sync script.
-    *   **Extended data models:** The database was extended with fields such as `whale_threshold`, `min_requirement`, `scraper_keywords` and `industry_group`.
-    *   **Sync automation:** Provides `backend/scripts/sync_notion_industries.py` for manual or automated refresh of the local data.
+*   **[MAJOR] v0.7.0: Quantitative Potential Analysis (Jan 20, 2026)**
+    *   **Two-stage analysis:**
+        1.  **Strict Classification:** Assigns each company to a Notion industry (or "Others").
+        2.  **Metric Cascade:** Searches specifically for the industry-specific metric ("Scraper Search Term").
+    *   **Fallback cascade:** Website -> Wikipedia -> SerpAPI (Google Search).
+    *   **Standardization:** Computes comparable values (e.g. m²) from raw data using the `Standardization Logic`.
+    *   **Database:** Extends the `companies` table with metric fields; deprecates the `signals` table.

-*   **[UPGRADE] v0.5.1: Robustness, UI Fixes & Wikipedia Hardening**
-    *   **[FIX] Critical DB Schema Mismatch (Jan 15, 2026):**
-        *   **Problem:** The application crashed when opening company details with `OperationalError: no such column: wiki_verified_empty`.
-        *   **Cause:** An uncommitted code change had extended the DB model in `database.py`, but the physical database file (`companies_v3_final.db`) had not been migrated and was, on top of that, completely empty/corrupt.
-        *   **Solution:** To get the application running again quickly, `DATABASE_URL` in `config.py` was pointed at a new file name (`companies_v3_fixed_2.db`). This forced the app to create a new, empty database with the correct, current schema on startup. No data was migrated from the old, empty file.
-    *   **Location fix (4B AG):** The backend logic was given detailed logging and corrected in key places (`run_analysis_task`, `override_impressum_url`) so that `city` and `country` from the Impressum data are reliably written to the main company table (`companies`). This fixes locations showing up in the Inspector but not in the overview.
-    *   **Wikipedia "Verified Empty":**
-        *   **Backend:** Added a `wiki_verified_empty` flag to the database to permanently mark companies without a Wikipedia entry. The `DiscoveryService` now skips these entries.
-        *   **Frontend:** A new button in the Inspector allows setting this status manually.
-    *   **Robust Wikipedia search:** The name normalization logic from the legacy system was fully reintegrated into the `DiscoveryService`. This yields a noticeably higher hit rate for company names with varying legal forms (e.g. "Therme Erding Service GmbH" -> "Therme Erding").
-    *   **UI fix (Sort & View):** The frontend tables (`CompanyTable`, `ContactsTable`) were reworked so that the previously missing **sort dropdowns** and **grid/list view toggles** are displayed correctly and reliably. The default sort order is now "Alphabetisch".
+*   **[UPGRADE] v0.6.1: Notion Sync Fixes**
+    *   **Mapping:** Fixed the mapping for `Metric Type` and `Scraper Search Term` (Notion select fields).
+    *   **Truncate-and-Reload:** The sync script deletes old data before the import (for `industries`) but keeps `robotics_categories` (upsert) to protect FK constraints.
+    *   **Frontend:** Fixed the unit display ("Unit") in the Settings dialog.

-*   **[UPGRADE] v0.5.0: Contacts, Settings & UI Overhaul**
-    *   **Contacts Management:**
-        *   Implemented a global contact list (`ContactsTable`) with search and filter functions.
-        *   Detail editing of contacts directly in the Inspector (click-to-edit).
-        *   Bulk import for contacts (CSV-based) with automatic company creation and duplicate check (email).
-        *   Additional fields: academic title, differentiated roles (operational, strategic, etc.) and marketing status.
-    *   **UI Modernization:**
-        *   **Light Mode:** Full support for light/dark mode with a toggle in the header.
-        *   **Grid View:** Switched the company list to a card-based view (analogous to contacts).
-        *   **Responsive Design:** Optimized the Inspector and the navigation for mobile devices.
-    *   **Extended Settings:**
-        *   New configuration tabs for **industries** (Industries) and **job roles**.
-        *   CRUD operations for industries (incl. auto-increment on name collisions).
-    *   **Bugfixes:**
-        *   Fixed the API path for manual Impressum updates.
-        *   Stabilized the database logic for unique constraints.
-        *   Improved the display of "Unknown, DE" in the company list (now hidden as long as no city is known).
+*   **[UPGRADE] v0.6.0: Notion Single Source of Truth**
+    *   Synchronization of industries and categories directly from Notion.

-*   **[UPGRADE] v0.4.0: Export & Manual Impressum**
-    *   **JSON Export:** Added an "Export JSON" button to the detail view that downloads all company data (incl. enrichments and signals).
-    *   **Timestamps:** The detail view shows the creation date of every enrichment record (Wikipedia, AI Dossier, Impressum).
-    *   **Manual Impressum URL:** An Impressum URL can be entered manually in the detail view to force extraction of company data.
-    *   **Frontend fix:** Fixed a build error (`Unexpected token`) in `Inspector.tsx` by removing a duplicated JSX block.
+*   **[UPGRADE] v0.5.1: Robustness**
+    *   Logging, Wikipedia optimization, UI fixes.

-*   **[UPGRADE] v2.6.2: Report Completeness & Edit Mode**
-    *   **Edit Hard Facts:** A new function in Phase 1 ("Edit Raw Data") allows manual correction of the extracted technical JSON data.
-    *   **Report update:** The Phase 5 prompt was adjusted to explicitly list the results from Phase 2 (ICPs & Data Proxies) in the final report.
-    *   **Backend fix:** Fixed an error when saving JSON data that occurred when database contents were stored as strings.
+## 8. Prompts in Use (Account Analysis v0.7.0)

-*   **[UPGRADE] v2.6.1: Stability & UI Improvements**
-    *   **White Screen Fix:** Hardened the frontend against `undefined` values when loading older sessions (`optional chaining`).
-    *   **Session Browser:** Complete redesign of the session overview into a clear list view with icons (Cleaning/Service/Transport/Security).
-    *   **URL display:** The source URL is now shown as a dedicated link, and the project is renamed automatically based on the detected product name.
+### 8.1 Strict Industry Classification

-*   **[UPGRADE] v2.6: Rich Session Browser**
-    *   **New UI:** The text-based "recent sessions" list was replaced by a dedicated, card-based UI (`SessionBrowser.tsx`).
-    *   **Enriched data:** Each session card now shows the product name, the product category (with icon), a short description and a thumbnail placeholder.
-    *   **Backend change:** The database query (`gtm_db_manager.py`) was extended to extract this metadata directly from the JSON column and deliver it to the frontend.
-    *   **Improved UX:** Much clearer overview and faster identification of past analyses.
-
-*   **[UPGRADE] v2.5: Hard Fact Extraction**
-    *   **Phase 1 extension:** Implemented a secondary extraction step for "Hard Facts" (specs).
-    *   **Structured data schema:** Integrated `templates/json_struktur_roboplanet.txt`.
-    *   **Normalization:** Automatic standardization of units (minutes, cm, kg, m²/h).
-    *   **Frontend update:** New UI component for displaying the technical data (Core Data, Layer, Extended Features).
-    *   **Sidebar & Header:** Updated to "ROBOPLANET v2.5".
-
-*   **[UPGRADE] v2.4:**
-    *   Documented the core engine (`helpers.py`) with Dual SDK & Hybrid Image Generation.
-    *   Updated the architecture overview and component descriptions.
-    *   Aligned versioning with the current code state (`v2.4.0`).
-
-*   **[UPGRADE] v2.3:**
-    *   Introduced the session history (database-backed).
-    *   Implemented Markdown cleaning (stripping of code blocks).
-    *   Prompt optimization for tabular Markdown output in Phase 5.
-    *   Markdown file import feature.
-
-## 8. Prompts in Use (Account Analysis)
-
-This section documents the prompts used in the **Company Explorer** backend for automated analysis of company data.
-
-### 8.1 Impressum Extraction (from `services/scraping.py`)
-
-Extracts structured master data from the raw text of the Impressum page.
-
-**Prompt:**
+Assigns the company to one of the defined industries.

 ```python
-prompt = f"""
-Extract the official company details from this German 'Impressum' text.
-Return JSON ONLY. Keys: 'legal_name', 'street', 'zip', 'city', 'country_code', 'email', 'phone', 'ceo_name', 'vat_id'.
-'country_code' should be the two-letter ISO code (e.g., "DE", "CH", "AT").
-If a field is missing, use null.
-
-Text:
-{raw_text}
+prompt = r"""
+Du bist ein präziser Branchen-Klassifizierer.
+...
+--- ZU VERWENDENDE BRANCHEN-DEFINITIONEN (STRIKT) ---
+{industry_definitions_json}
+...
+Wähle EINE der folgenden Branchen... Wenn keine zutrifft, wähle "Others".
 """
 ```

-**Variables:**
-*   **`raw_text`**: The cleaned HTML text of the Impressum URL that was found (max. 10,000 characters).
+### 8.2 Metric Extraction

---
-
-### 8.2 Robotics Potential Analysis (from `services/classification.py`)
-
-The core prompt for assessing automation potential. It summarizes the business model, checks for physical infrastructure and scores specific robotics use cases.
-
-**Prompt:**
+Extracts the specific numeric value ("Scraper Search Term").

 ```python
-prompt = f"""
-You are a Senior B2B Market Analyst for 'Roboplanet', a specialized robotics distributor.
-Your task is to analyze the target company based on their website text and create a concise **Dossier**.
-
--- TARGET COMPANY ---
-Name: {company_name}
-Website Content (Excerpt):
-{website_text[:20000]} 
-
--- ALLOWED INDUSTRIES (STRICT) ---
-You MUST assign the company to exactly ONE of these industries. If unsure, choose the closest match or "Sonstige".
-{json.dumps(self.allowed_industries, ensure_ascii=False)}
-
--- ANALYSIS PART 1: BUSINESS MODEL ---
-1. Identify the core products/services.
-2. Summarize in 2-3 German sentences: What do they do and for whom? (Target: "business_model")
-
--- ANALYSIS PART 2: INFRASTRUCTURE & POTENTIAL (Chain of Thought) ---
-1. **Infrastructure Scan:** Look for evidence of physical assets like *Factories, Large Warehouses, Production Lines, Campuses, Hospitals*.
-2. **Provider vs. User Check:** 
-   - Does the company USE this infrastructure (Potential Customer)?
-   - Or do they SELL products for it (Competitor/Partner)? 
-   - *Example:* "Cleaning" -> Do they sell soap (Provider) or do they have a 50,000sqm factory (User)?
-3. **Evidence Extraction:** Extract 1-2 key sentences from the text proving this infrastructure. (Target: "infrastructure_evidence")
-
--- ANALYSIS PART 3: SCORING (0-100) ---
-Based on the identified infrastructure, score the potential for these categories:
-
-{category_guidance}
-
--- OUTPUT FORMAT (JSON ONLY) ---
-{{
-    "industry": "String (from list)",
-    "business_model": "2-3 sentences summary (German)",
-    "infrastructure_evidence": "1-2 key sentences proving physical assets (German)",
-    "potentials": {{
-        "cleaning": {{ "score": 0-100, "reason": "Reasoning based on infrastructure." }},\
-        "transport": {{ "score": 0-100, "reason": "Reasoning based on logistics volume." }},\
-        "security": {{ "score": 0-100, "reason": "Reasoning based on perimeter/assets." }},\
-        "service": {{ "score": 0-100, "reason": "Reasoning based on guest interaction." }}\
-    }}\
-}}
+prompt = r"""
+Analysiere den folgenden Text...
+--- KONTEXT ---
+Branche: {industry_name}
+Gesuchter Wert: '{search_term}'
+...
+Gib NUR ein JSON-Objekt zurück:
+'raw_value', 'raw_unit', 'area_value' (falls explizit m² genannt).
 """
 ```

-**Variables:**
-*   **`company_name`**: Official name of the target company, for correct identification in the dossier.
-*   **`website_text`**: The scraped text of the main page (max. 20,000 characters), which serves as the primary source of information.
-*   **`allowed_industries`**: A JSON list of valid industries. It is loaded dynamically from the `industries` database table (synced from Notion) and enforces a clean CRM mapping.
-*   **`category_guidance`**: Dynamically generated definitions and scoring instructions for the robotics categories. Allows the AI logic to be adjusted via Notion/Settings without code changes.
+## 9. Notion Integration

-## 9. Notion Integration (Single Source of Truth)
-
-The system uses Notion as the central control point for strategic definitions.
-
-### 9.1 Data Flow
-1.  **Definition:** Industries and robotics categories are maintained in Notion (whale thresholds, keywords, definitions).
-2.  **Synchronization:** The script `sync_notion_industries.py` pulls the data via the API and upserts it into the local SQLite database.
-3.  **App usage:** The web interface displays this data read-only. The `ClassificationService` uses it as a "system instruction" for the LLM.
-
-### 9.2 Technical Details
-*   **Notion token:** Must be stored in `/app/notion_token.txt` (container path).
-*   **DB mapping:** Records are matched primarily by `notion_id` and secondarily by name, to avoid duplicates during migration.
-
-## 10. Database Migration (v0.6.1 -> v0.6.2)
-
-When the `industries` table in an existing database needs to be updated (e.g. to support new fields from Notion), the database file must **not** be deleted. Run the migration script instead.
-
-**Process:**
-
-1.  **Make sure the target database exists:** `companies_v3_fixed_2.db` must be located in the `company-explorer` directory.
-2.  **Run the migration:** This command adds the missing columns without deleting data.
-    ```bash
-    docker exec -it company-explorer python3 backend/scripts/migrate_db.py
-    ```
-3.  **Restart the container:** So that the server picks up the new schema.
-    ```bash
-    docker-compose restart company-explorer
-    ```
-4.  **Run the Notion sync:** To populate the new columns with data.
-    ```bash
-    docker exec -it company-explorer python3 backend/scripts/sync_notion_industries.py
-    ```
+The system uses Notion as the SSoT for `Industries` and `RoboticsCategories`.
+Sync script: `backend/scripts/sync_notion_industries.py`.

+## 10. Database Migration

+For schema changes without data loss: `backend/scripts/migrate_db.py`.
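Review note (illustrative only, not part of this commit): the plan above describes a fallback cascade (Website -> Wikipedia -> SerpAPI) followed by a formula-based standardization step. The sketch below shows one way these two steps could fit together. The names `run_metric_cascade` and `eval_formula_sketch` are hypothetical, the extractor callables stand in for the real scraper/search calls, and the formula syntax (`wert` as the raw-value variable, no unit suffix) is an assumption; this is not the `safe_eval_math` helper from `core_utils.py`, whose implementation is not shown in this diff.

```python
import ast
import operator
from typing import Callable, Dict, Optional, Tuple

# Source order taken from the plan: Website -> Wikipedia -> SerpAPI.
SOURCE_ORDER = ("website", "wikipedia", "serpapi")

# Whitelisted arithmetic operators for the formula evaluator.
_ALLOWED_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def run_metric_cascade(
    extractors: Dict[str, Callable[[], Optional[float]]],
) -> Tuple[Optional[float], Optional[str]]:
    """Try each source in order and return (value, source) of the first hit."""
    for source in SOURCE_ORDER:
        extractor = extractors.get(source)
        if extractor is None:
            continue  # this source is not configured
        value = extractor()
        if value is not None:
            return value, source
    return None, None


def eval_formula_sketch(formula: str, wert: float) -> float:
    """Evaluate a plain arithmetic formula where `wert` is the raw value.

    Only numbers, the name `wert`, + - * /, unary minus and parentheses
    are accepted; anything else raises ValueError.
    """
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return float(node.value)
        if isinstance(node, ast.Name) and node.id == "wert":
            return float(wert)
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_OPS:
            return _ALLOWED_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"Unsupported expression in formula: {formula!r}")

    return _eval(ast.parse(formula, mode="eval"))


# Stand-in extractors: the website yields nothing, Wikipedia yields 180 beds.
raw_value, source = run_metric_cascade({
    "website": lambda: None,
    "wikipedia": lambda: 180.0,
    "serpapi": lambda: 999.0,
})
standardized = eval_formula_sketch("wert * 25", raw_value)
print(raw_value, source, standardized)
```

With the stand-in values, the cascade stops at Wikipedia and the assumed formula `wert * 25` maps 180 beds to 4500, matching the example values in the `companies` schema. Walking the formula's syntax tree instead of calling `eval` keeps Notion-maintained formulas from executing arbitrary code.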
--- a/company-explorer/backend/config.py
+++ b/company-explorer/backend/config.py
@@ -10,7 +10,7 @@ try:
    class Settings(BaseSettings):
        # App Info
        APP_NAME: str = "Company Explorer"
-        VERSION: str = "0.6.1"
+        VERSION: str = "0.7.0"
        DEBUG: bool = True
        
        # Database (Store in App dir for simplicity)
--- a/company-explorer/backend/database.py
+++ b/company-explorer/backend/database.py
@@ -42,6 +42,14 @@ class Company(Base):
    last_wiki_search_at = Column(DateTime, nullable=True)
    last_classification_at = Column(DateTime, nullable=True)
    last_signal_check_at = Column(DateTime, nullable=True)
+
+    # NEW: Quantitative Potential Metrics (v0.7.0)
+    calculated_metric_name = Column(String, nullable=True)  # e.g., "Anzahl Betten"
+    calculated_metric_value = Column(Float, nullable=True)   # e.g., 180.0
+    calculated_metric_unit = Column(String, nullable=True)   # e.g., "Betten"
+    standardized_metric_value = Column(Float, nullable=True) # e.g., 4500.0
+    standardized_metric_unit = Column(String, nullable=True) # e.g., "m²"
+    metric_source = Column(String, nullable=True)            # "website", "wikipedia", "serpapi"
    
    # Relationships
    signals = relationship("Signal", back_populates="company", cascade="all, delete-orphan")
@@ -244,4 +252,4 @@ def get_db():
    try:
        yield db
    finally:
-        db.close()
+        db.close()
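Review note (illustrative only, not part of this commit): the six new `Company` columns only exist in databases created after this change, so an existing SQLite file needs them added in place. The sketch below shows one way to do that with the standard `sqlite3` module. The function name `add_missing_metric_columns`, the table name `companies`, and the SQL column types are assumptions; the actual `backend/scripts/migrate_db.py` is not shown in this diff and may work differently.

```python
import sqlite3
from typing import List

# Column names from the v0.7.0 model; the SQL types are assumed equivalents
# of the SQLAlchemy String/Float columns.
NEW_COLUMNS = {
    "calculated_metric_name": "TEXT",
    "calculated_metric_value": "REAL",
    "calculated_metric_unit": "TEXT",
    "standardized_metric_value": "REAL",
    "standardized_metric_unit": "TEXT",
    "metric_source": "TEXT",
}


def add_missing_metric_columns(conn: sqlite3.Connection, table: str = "companies") -> List[str]:
    """Add any of the new metric columns that the table does not have yet."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added


# Demo against an in-memory database that mimics a pre-v0.7.0 table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT)")
demo.execute("INSERT INTO companies (name) VALUES ('Example GmbH')")
print(add_missing_metric_columns(demo))  # lists the six new column names
print(add_missing_metric_columns(demo))  # second run adds nothing
```

Because the function checks `PRAGMA table_info` first, it is safe to run repeatedly, and existing rows keep their data with the new columns set to NULL.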
--- a/company-explorer/backend/lib/core_utils.py
+++ b/company-explorer/backend/lib/core_utils.py
@@ -6,8 +6,9 @@ import re
 import unicodedata
 from urllib.parse import urlparse
 from functools import wraps
-from typing import Optional, Union, List
+from typing import Optional, Union, List, Dict, Any
 from thefuzz import fuzz
+import requests # Added for SerpAPI

 # Try new Google GenAI Lib (v1.0+)
 try:
@@ -45,7 +46,6 @@ def retry_on_failure(max_retries: int = 3, delay: float = 2.0):
                    return func(*args, **kwargs)
                except Exception as e:
                    last_exception = e
-                    # Don't retry on certain fatal errors (can be extended)
                    if isinstance(e, ValueError) and "API Key" in str(e):
                        raise e
                    
@@ -67,9 +67,7 @@ def clean_text(text: str) -> str:
    if not text:
        return ""
    text = str(text).strip()
-    # Normalize unicode characters
    text = unicodedata.normalize('NFKC', text)
-    # Remove control characters
    text = "".join(ch for ch in text if unicodedata.category(ch)[0] != "C")
    text = re.sub(r'\s+', ' ', text)
    return text
@@ -83,18 +81,14 @@ def simple_normalize_url(url: str) -> str:
    if not url or url.lower() in ["k.a.", "nan", "none"]:
        return "k.A."
    
-    # Ensure protocol for urlparse
    if not url.startswith(('http://', 'https://')):
        url = 'http://' + url
        
    try:
        parsed = urlparse(url)
        domain = parsed.netloc or parsed.path
-        
-        # Remove www.
        if domain.startswith('www.'):
            domain = domain[4:]
-            
        return domain.lower()
    except Exception:
        return "k.A."
@@ -109,8 +103,6 @@ def normalize_company_name(name: str) -> str:
        return ""
        
    name = name.lower()
-    
-    # Remove common legal forms (more comprehensive list)
    legal_forms = [
        r'\bgmbh\b', r'\bag\b', r'\bkg\b', r'\bohg\b', r'\bug\b', r'\bltd\b', 
        r'\bllc\b', r'\binc\b', r'\bcorp\b', r'\bco\b', r'\b& co\b', r'\be\.v\.\b',
@@ -122,11 +114,8 @@ def normalize_company_name(name: str) -> str:
    for form in legal_forms:
        name = re.sub(form, '', name)
        
-    # Condense numbers: "11 88 0" -> "11880"
-    name = re.sub(r'(\d)\s+(\d)', r'\1\2', name) # Condense numbers separated by space
-
-    # Remove special chars and extra spaces
-    name = re.sub(r'[^\w\s\d]', '', name) # Keep digits
+    name = re.sub(r'(\d)\s+(\d)', r'\1\2', name)
+    name = re.sub(r'[^\w\s\d]', '', name)
    name = re.sub(r'\s+', ' ', name).strip()
    
    return name
@@ -144,20 +133,17 @@ def extract_numeric_value(raw_value: str, is_umsatz: bool = False) -> str:
    if raw_value in ["k.a.", "nan", "none"]:
        return "k.A."

-    # Simple multiplier handling
    multiplier = 1.0
    if 'mrd' in raw_value or 'billion' in raw_value or 'bn' in raw_value:
-        multiplier = 1000.0 # Standardize to Millions for revenue, Billions for absolute numbers
+        multiplier = 1000.0 
        if not is_umsatz: multiplier = 1000000000.0
    elif 'mio' in raw_value or 'million' in raw_value or 'mn' in raw_value:
-        multiplier = 1.0 # Already in Millions for revenue
+        multiplier = 1.0 
        if not is_umsatz: multiplier = 1000000.0
    elif 'tsd' in raw_value or 'thousand' in raw_value:
-        multiplier = 0.001 # Thousands converted to millions for revenue
+        multiplier = 0.001 
        if not is_umsatz: multiplier = 1000.0
        
-    # Extract number candidates
-    # Regex for "1.000,50" or "1,000.50" or "1000"
    matches = re.findall(r'(\d+[\.,]?\d*[\.,]?\d*)', raw_value)
    if not matches:
        return "k.A."
@@ -165,41 +151,26 @@ def extract_numeric_value(raw_value: str, is_umsatz: bool = False) -> str:
    try:
        num_str = matches[0]
        
-        # Heuristic for German formatting (1.000,00) vs English (1,000.00)
-        # If it contains both, the last separator is likely the decimal
        if '.' in num_str and ',' in num_str:
            if num_str.rfind(',') > num_str.rfind('.'):
-                # German: 1.000,00 -> remove dots, replace comma with dot
                num_str = num_str.replace('.', '').replace(',', '.')
            else:
-                # English: 1,000.00 -> remove commas
                num_str = num_str.replace(',', '')
        elif '.' in num_str:
-            # Ambiguous: 1.005 could be 1005 or 1.005
-            # Assumption: If it's employees (integer), and looks like "1.xxx", it's likely thousands
            parts = num_str.split('.')
            if len(parts) > 1 and len(parts[-1]) == 3 and not is_umsatz:
-                 # Likely thousands separator for employees (e.g. 1.005)
                 num_str = num_str.replace('.', '')
            elif is_umsatz and len(parts) > 1 and len(parts[-1]) == 3:
-                 # For revenue, 375.6 vs 1.000 is tricky. 
-                 # But usually revenue in millions is small numbers with decimals (250.5).
-                 # Large integers usually mean thousands.
-                 # Let's keep dot as decimal for revenue by default unless we detect multiple dots
                 if num_str.count('.') > 1:
                     num_str = num_str.replace('.', '')
        elif ',' in num_str:
-            # German decimal: 1,5 -> 1.5
            num_str = num_str.replace(',', '.')
            
        val = float(num_str) * multiplier
        
-        # Round appropriately
        if is_umsatz:
-            # Return in millions, e.g. "250.5"
            return f"{val:.2f}".rstrip('0').rstrip('.')
        else:
-            # Return integer for employees
            return str(int(val))
            
    except ValueError:
@@ -218,7 +189,6 @@ def clean_json_response(response_text: str) -> str:
    """
    if not response_text: return "{}"
    
-    # Remove markdown code blocks
    cleaned = re.sub(r'^```json\s*', '', response_text, flags=re.MULTILINE)
    cleaned = re.sub(r'^```\s*', '', cleaned, flags=re.MULTILINE)
    cleaned = re.sub(r'\s*```$', '', cleaned, flags=re.MULTILINE)
@@ -227,11 +197,10 @@ def clean_json_response(response_text: str) -> str:

 # ==============================================================================
 # 3. LLM WRAPPER (GEMINI)
-
 # ==============================================================================

@retry_on_failure(max_retries=3)
-def call_gemini(
+def call_gemini_flash(
    prompt: Union[str, List[str]], 
    model_name: str = "gemini-2.0-flash",
    temperature: float = 0.3,
@@ -296,4 +265,75 @@ def call_gemini(
            logger.error(f"Error with google-generativeai lib: {e}")
            raise e
            
-    raise ImportError("No Google GenAI library installed (neither google-genai nor google-generativeai).")
+    raise ImportError("No Google GenAI library installed (neither google-genai nor google-generativeai).")
+
+# ==============================================================================
+# 4. MATH UTILS
+# ==============================================================================
+
+def safe_eval_math(expression: str) -> Optional[float]:
+    """
+    Safely evaluates simple mathematical expressions.
+    Only allows numbers, basic operators (+, -, *, /), and parentheses.
+    Prevents arbitrary code execution.
+    """
+    if not isinstance(expression, str) or not expression: 
+        return None
+
+    # Allowed characters: digits, ., +, -, *, /, (, ) and whitespace.
+    # Any 'wert' placeholder must already be substituted by the caller; a leftover one is rejected here.
+    allowed_pattern = re.compile(r"^[0-9.+\-*/()\s]+$")
+    
+    # '**' passes the character check but is not basic arithmetic and can be very expensive to evaluate
+    has_power_operator = "**" in expression
+
+    if has_power_operator or not allowed_pattern.fullmatch(expression):
+        logger.error(f"Math expression contains disallowed characters or operators: {expression}")
+        return None
+
+    try:
+        # Compile the expression for safety and performance. Use a restricted global/local dict.
+        code = compile(expression, '<string>', 'eval')
+        # Restrict globals and locals to prevent arbitrary code execution
+        return float(eval(code, {"__builtins__": {}}, {}))
+    except Exception as e:
+        logger.error(f"Error evaluating math expression '{expression}': {e}", exc_info=True)
+        return None
+
+# ==============================================================================
+# 5. SEARCH UTILS
+# ==============================================================================
+
+@retry_on_failure(max_retries=2, delay=5.0)
+def run_serp_search(query: str, num_results: int = 5) -> Optional[Dict[str, Any]]:
+    """
+    Performs a Google search using SerpAPI and returns parsed results.
+    Requires SERP_API_KEY in settings.
+    """
+    api_key = settings.SERP_API_KEY
+    if not api_key:
+        logger.error("SERP_API_KEY is missing in configuration. Cannot run SerpAPI search.")
+        return None
+
+    url = "https://serpapi.com/search.json"
+    params = {
+        "api_key": api_key,
+        "engine": "google",
+        "q": query,
+        "num": num_results, # Number of organic results
+        "gl": "de",        # Geo-targeting to Germany
+        "hl": "de"         # Interface language to German
+    }
+
+    try:
+        response = requests.get(url, params=params, timeout=30) # Timeout so a stalled connection cannot hang the call
+        response.raise_for_status() # Raise an exception for HTTP errors
+        results = response.json()
+        logger.info("SerpAPI search for '%s' successful. Found %s organic results.", query, len(results.get("organic_results", [])))
+        return results
+    except requests.exceptions.RequestException as e:
+        logger.error(f"SerpAPI request failed for query '{query}': {e}", exc_info=True)
+        return None
+    except json.JSONDecodeError as e:
+        logger.error(f"Failed to parse SerpAPI JSON response for query '{query}': {e}", exc_info=True)
+        return None
--- a/company-explorer/backend/services/classification.py
+++ b/company-explorer/backend/services/classification.py
@@ -1,117 +1,335 @@
 import json
 import logging
-import os
-from typing import Dict, Any, List
-from ..lib.core_utils import call_gemini, clean_json_response
-from ..config import settings
-from ..database import SessionLocal, RoboticsCategory, Industry
+import re
+from datetime import datetime
+from typing import Optional, Dict, Any, List
+
+from sqlalchemy.orm import Session
+
+from backend.database import Company, Industry, RoboticsCategory, EnrichmentData, get_db
+from backend.config import settings
+from backend.lib.core_utils import call_gemini_flash, safe_eval_math, run_serp_search
+from backend.services.scraping import scrape_website_content

 logger = logging.getLogger(__name__)

 class ClassificationService:
-    def __init__(self):
-        pass
+    def __init__(self, db: Session):
+        self.db = db
+        self.allowed_industries_notion: List[Industry] = self._load_industry_definitions()
+        self.robotics_categories: List[RoboticsCategory] = self._load_robotics_categories()
+        
+        # Pre-process allowed industries for LLM prompt
+        self.llm_industry_definitions = [
+            {"name": ind.name, "description": ind.description} for ind in self.allowed_industries_notion
+        ]
+        
+        # Store for quick lookup
+        self.industry_lookup = {ind.name: ind for ind in self.allowed_industries_notion}
+        self.category_lookup = {cat.id: cat for cat in self.robotics_categories}

-    def _get_allowed_industries(self) -> List[str]:
-        """
-        Fetches the allowed industries from the database (Settings > Industry Focus).
-        """
-        db = SessionLocal()
-        try:
-            # Query all industries, order by name for consistency
-            industries = db.query(Industry.name).order_by(Industry.name).all()
-            # extract names from tuples (query returns list of tuples)
-            names = [i[0] for i in industries]
-            return names if names else ["Sonstige"]
-        except Exception as e:
-            logger.error(f"Failed to load allowed industries from DB: {e}")
-            return ["Sonstige"]
-        finally:
-            db.close()
+    def _load_industry_definitions(self) -> List[Industry]:
+        """Loads all industry definitions from the database."""
+        industries = self.db.query(Industry).all()
+        if not industries:
+            logger.warning("No industry definitions found in DB. Classification might be limited.")
+        return industries

-    def _get_category_prompts(self) -> str:
-        """
-        Fetches the latest category definitions from the database.
-        """
-        db = SessionLocal()
-        try:
-            categories = db.query(RoboticsCategory).all()
-            if not categories:
-                return "Error: No categories defined."
-            
-            prompt_parts = []
-            for cat in categories:
-                prompt_parts.append(f"* **{cat.name} ({cat.key}):**\n     - Definition: {cat.description}\n     - Scoring Guide: {cat.reasoning_guide}")
-            
-            return "\n".join(prompt_parts)
-        except Exception as e:
-            logger.error(f"Error fetching categories: {e}")
-            return "Error loading categories."
-        finally:
-            db.close()
+    def _load_robotics_categories(self) -> List[RoboticsCategory]:
+        """Loads all robotics categories from the database."""
+        categories = self.db.query(RoboticsCategory).all()
+        if not categories:
+            logger.warning("No robotics categories found in DB. Potential scoring might be limited.")
+        return categories

-    def analyze_robotics_potential(self, company_name: str, website_text: str) -> Dict[str, Any]:
-        """
-        Analyzes the company for robotics potential based on website content.
-        Returns strict JSON.
-        """
-        if not website_text or len(website_text) < 100:
-            return {"error": "Insufficient text content"}
-            
-        category_guidance = self._get_category_prompts()
-        allowed_industries = self._get_allowed_industries()
+    def _get_wikipedia_content(self, company_id: int) -> Optional[str]:
+        """Fetches Wikipedia content from enrichment_data for a given company."""
+        enrichment = self.db.query(EnrichmentData).filter(
+            EnrichmentData.company_id == company_id,
+            EnrichmentData.source_type == "wikipedia"
+        ).order_by(EnrichmentData.created_at.desc()).first()
+        
+        if enrichment and enrichment.content:
+            # Wikipedia content is stored as JSON with a 'text' key
+            wiki_data = enrichment.content
+            return wiki_data.get('text')
+        return None

-        prompt = f"""
-        You are a Senior B2B Market Analyst for 'Roboplanet', a specialized robotics distributor.
-        Your task is to analyze the target company based on their website text and create a concise **Dossier**.
+    def _run_llm_classification_prompt(self, website_text: str, company_name: str) -> Optional[str]:
+        """
+        Uses LLM to classify the company into one of the predefined industries.
+        Returns the industry name (string), "Others" if no allowed industry fits, or None if the LLM call fails.
+        """
+        prompt = r"""
+        Du bist ein präziser Branchen-Klassifizierer für Unternehmen.
+        Deine Aufgabe ist es, das vorliegende Unternehmen basierend auf seinem Website-Inhalt
+        einer der untenstehenden Branchen zuzuordnen.

-        --- TARGET COMPANY ---
+        --- UNTERNEHMEN ---
        Name: {company_name}
-        Website Content (Excerpt):
-        {website_text[:20000]} 
+        Website-Inhalt (Auszug):
+        {website_text_excerpt}
+
+        --- ZU VERWENDENDE BRANCHEN-DEFINITIONEN (STRIKT) ---
+        Wähle EINE der folgenden Branchen. Jede Branche hat eine Definition.
+        {industry_definitions_json}
+
+        --- AUFGABE ---
+        Analysiere den Website-Inhalt. Wähle die Branchen-Definition, die am besten zum Unternehmen passt.
+        Wenn keine der Definitionen zutrifft oder du unsicher bist, wähle "Others".
+        Gib NUR den Namen der zugeordneten Branche zurück, als reinen String, nichts anderes.
+
+        Beispiel Output: Hotellerie
+        Beispiel Output: Automotive - Dealer
+        Beispiel Output: Others
+        """.format(
+            company_name=company_name,
+            website_text_excerpt=website_text[:10000], # Limit text to avoid token limits
+            industry_definitions_json=json.dumps(self.llm_industry_definitions, ensure_ascii=False)
+        )
        
-        --- ALLOWED INDUSTRIES (STRICT) ---
-        You MUST assign the company to exactly ONE of these industries. If unsure, choose the closest match or "Sonstige".
-        {json.dumps(allowed_industries, ensure_ascii=False)}
+        try:
+            response = call_gemini_flash(prompt, temperature=0.1, json_mode=False) # Low temp for strict classification
+            classified_industry = response.strip()
+            if classified_industry in [ind.name for ind in self.allowed_industries_notion] + ["Others"]:
+                return classified_industry
+            logger.warning(f"LLM classified industry '{classified_industry}' not in allowed list. Defaulting to Others.")
+            return "Others"
+        except Exception as e:
+            logger.error(f"LLM classification failed for {company_name}: {e}", exc_info=True)
+            return None

-        --- ANALYSIS PART 1: BUSINESS MODEL ---
-        1. Identify the core products/services.
-        2. Summarize in 2-3 German sentences: What do they do and for whom? (Target: "business_model")
-
-        --- ANALYSIS PART 2: INFRASTRUCTURE & POTENTIAL (Chain of Thought) ---
-        1. **Infrastructure Scan:** Look for evidence of physical assets like *Factories, Large Warehouses, Production Lines, Campuses, Hospitals*.
-        2. **Provider vs. User Check:** 
-           - Does the company USE this infrastructure (Potential Customer)?
-           - Or do they SELL products for it (Competitor/Partner)? 
-           - *Example:* "Cleaning" -> Do they sell soap (Provider) or do they have a 50,000sqm factory (User)?
-        3. **Evidence Extraction:** Extract 1-2 key sentences from the text proving this infrastructure. (Target: "infrastructure_evidence")
-
-        --- ANALYSIS PART 3: SCORING (0-100) ---
-        Based on the identified infrastructure, score the potential for these categories:
-        
-        {category_guidance}
-
-        --- OUTPUT FORMAT (JSON ONLY) ---
-        {{
-            "industry": "String (from list)",
-            "business_model": "2-3 sentences summary (German)",
-            "infrastructure_evidence": "1-2 key sentences proving physical assets (German)",
-            "potentials": {{
-                "cleaning": {{ "score": 0-100, "reason": "Reasoning based on infrastructure." }},
-                "transport": {{ "score": 0-100, "reason": "Reasoning based on logistics volume." }},
-                "security": {{ "score": 0-100, "reason": "Reasoning based on perimeter/assets." }},
-                "service": {{ "score": 0-100, "reason": "Reasoning based on guest interaction." }}
-            }}
-        }}
+    def _run_llm_metric_extraction_prompt(self, text_content: str, search_term: str, industry_name: str) -> Optional[Dict[str, Any]]:
        """
+        Uses LLM to extract the specific metric value from text.
+        Returns a dict with 'raw_value', 'raw_unit', 'area_value' (if found), 'metric_name', or None if the LLM call fails.
+        """
+        # Attempt to extract both the raw unit count and a potential area if explicitly mentioned
+        prompt = r"""
+        Du bist ein Datenextraktions-Spezialist.
+        Analysiere den folgenden Text, um spezifische Metrik-Informationen zu extrahieren.
+
+        --- KONTEXT ---
+        Unternehmen ist in der Branche: {industry_name}
+        Gesuchter Wert (Rohdaten): '{search_term}'
+
+        --- TEXT ---
+        {text_content_excerpt}
+
+        --- AUFGABE ---
+        1. Finde den numerischen Wert für '{search_term}'.
+        2. Versuche auch, eine explizit genannte Gesamtfläche in Quadratmetern (m²) zu finden, falls relevant und vorhanden.
+
+        Gib NUR ein JSON-Objekt zurück mit den Schlüsseln:
+        'raw_value': Der gefundene numerische Wert für '{search_term}' (als Zahl). null, falls nicht gefunden.
+        'raw_unit': Die Einheit des raw_value (z.B. "Betten", "Stellplätze"). null, falls nicht gefunden.
+        'area_value': Ein gefundener numerischer Wert für eine Gesamtfläche in m² (als Zahl). null, falls nicht gefunden.
+        'metric_name': Der Name der Metrik, nach der gesucht wurde (also '{search_term}').
+
+        Beispiel Output (wenn 180 Betten und 4500m² Fläche gefunden):
+        {{"raw_value": 180, "raw_unit": "Betten", "area_value": 4500, "metric_name": "{search_term}"}}
+
+        Beispiel Output (wenn nur 180 Betten gefunden):
+        {{"raw_value": 180, "raw_unit": "Betten", "area_value": null, "metric_name": "{search_term}"}}
+
+        Beispiel Output (wenn nichts gefunden):
+        {{"raw_value": null, "raw_unit": null, "area_value": null, "metric_name": "{search_term}"}}
+        """.format(
+            industry_name=industry_name,
+            search_term=search_term,
+            text_content_excerpt=text_content[:15000] # Adjust as needed for token limits
+        )

        try:
-            response_text = call_gemini(
-                prompt=prompt,
-                json_mode=True,
-                temperature=0.1 # Very low temp for analytical reasoning
-            )
-            return json.loads(clean_json_response(response_text))
+            response = call_gemini_flash(prompt, temperature=0.05, json_mode=True) # Very low temp for extraction
+            result = json.loads(response)
+            return result
        except Exception as e:
-            logger.error(f"Classification failed: {e}")
-            return {"error": str(e)}
+            logger.error(f"LLM metric extraction failed for '{search_term}' in '{industry_name}': {e}", exc_info=True)
+            return None
+
+    def _parse_standardization_logic(self, formula: str, raw_value: float) -> Optional[float]:
+        """
+        Safely parses and executes a simple mathematical formula for standardization.
+        Supports basic arithmetic (+, -, *, /) and integer/float values.
+        """
+        if not formula or raw_value is None:
+            return None
+        
+        # Replace the placeholder 'wert' or 'value' (case-insensitive) with the actual raw_value
+        formula_cleaned = re.sub(r"\b(?:wert|value)\b", str(raw_value), formula, flags=re.IGNORECASE)
+        
+        try:
+            # Use safe_eval_math from core_utils to prevent arbitrary code execution
+            return safe_eval_math(formula_cleaned)
+        except Exception as e:
+            logger.error(f"Error evaluating standardization logic '{formula}' with value {raw_value}: {e}", exc_info=True)
+            return None
+
+    def _extract_and_calculate_metric_cascade(
+        self,
+        company: Company,
+        industry_name: str,
+        search_term: str,
+        standardization_logic: Optional[str],
+        standardized_unit: Optional[str]
+    ) -> Dict[str, Any]:
+        """
+        Orchestrates the 3-stage (Website -> Wikipedia -> SerpAPI) metric extraction.
+        """
+        results = {
+            "calculated_metric_name": search_term,
+            "calculated_metric_value": None,
+            "calculated_metric_unit": None,
+            "standardized_metric_value": None,
+            "standardized_metric_unit": standardized_unit,
+            "metric_source": None
+        }
+
+        # --- STAGE 1: Website Analysis ---
+        logger.info(f"Stage 1: Analyzing website for '{search_term}' for {company.name}")
+        website_content = scrape_website_content(company.website)
+        if website_content:
+            llm_result = self._run_llm_metric_extraction_prompt(website_content, search_term, industry_name)
+            if llm_result and (llm_result.get("raw_value") is not None or llm_result.get("area_value") is not None):
+                results["calculated_metric_value"] = llm_result.get("raw_value")
+                results["calculated_metric_unit"] = llm_result.get("raw_unit")
+                results["metric_source"] = "website"
+
+                if llm_result.get("area_value") is not None:
+                    # Prioritize directly found standardized area
+                    results["standardized_metric_value"] = llm_result.get("area_value")
+                    logger.info(f"Direct area value found on website for {company.name}: {llm_result.get('area_value')} m²")
+                elif llm_result.get("raw_value") is not None and standardization_logic:
+                    # Calculate if only raw value found
+                    results["standardized_metric_value"] = self._parse_standardization_logic(
+                        standardization_logic, llm_result["raw_value"]
+                    )
+                return results
+
+        # --- STAGE 2: Wikipedia Analysis ---
+        logger.info(f"Stage 2: Analyzing Wikipedia for '{search_term}' for {company.name}")
+        wikipedia_content = self._get_wikipedia_content(company.id)
+        if wikipedia_content:
+            llm_result = self._run_llm_metric_extraction_prompt(wikipedia_content, search_term, industry_name)
+            if llm_result and (llm_result.get("raw_value") is not None or llm_result.get("area_value") is not None):
+                results["calculated_metric_value"] = llm_result.get("raw_value")
+                results["calculated_metric_unit"] = llm_result.get("raw_unit")
+                results["metric_source"] = "wikipedia"
+
+                if llm_result.get("area_value") is not None:
+                    results["standardized_metric_value"] = llm_result.get("area_value")
+                    logger.info(f"Direct area value found on Wikipedia for {company.name}: {llm_result.get('area_value')} m²")
+                elif llm_result.get("raw_value") is not None and standardization_logic:
+                    results["standardized_metric_value"] = self._parse_standardization_logic(
+                        standardization_logic, llm_result["raw_value"]
+                    )
+                return results
+
+        # --- STAGE 3: SerpAPI (Google Search) ---
+        logger.info(f"Stage 3: Running SerpAPI search for '{search_term}' for {company.name}")
+        search_query = f"{company.name} {search_term} {industry_name}" # Example: "Hotel Moxy Würzburg Anzahl Betten Hotellerie"
+        serp_results = run_serp_search(search_query) # This returns a dictionary of search results
+        
+        if serp_results and serp_results.get("organic_results"):
+            # Concatenate snippets from organic results
+            snippets = " ".join([res.get("snippet", "") for res in serp_results["organic_results"]])
+            if snippets:
+                llm_result = self._run_llm_metric_extraction_prompt(snippets, search_term, industry_name)
+                if llm_result and (llm_result.get("raw_value") is not None or llm_result.get("area_value") is not None):
+                    results["calculated_metric_value"] = llm_result.get("raw_value")
+                    results["calculated_metric_unit"] = llm_result.get("raw_unit")
+                    results["metric_source"] = "serpapi"
+
+                    if llm_result.get("area_value") is not None:
+                        results["standardized_metric_value"] = llm_result.get("area_value")
+                        logger.info(f"Direct area value found via SerpAPI for {company.name}: {llm_result.get('area_value')} m²")
+                    elif llm_result.get("raw_value") is not None and standardization_logic:
+                        results["standardized_metric_value"] = self._parse_standardization_logic(
+                            standardization_logic, llm_result["raw_value"]
+                        )
+                    return results
+        
+        logger.info(f"Could not extract metric for '{search_term}' from any source for {company.name}.")
+        return results # Return results with None values
+
+    def classify_company_potential(self, company: Company) -> Company:
+        """
+        Main method to classify industry and calculate potential metric for a company.
+        """
+        logger.info(f"Starting classification for Company ID: {company.id}, Name: {company.name}")
+
+        # --- STEP 1: Strict Industry Classification ---
+        website_content_for_classification = scrape_website_content(company.website)
+        if not website_content_for_classification:
+            logger.warning(f"No website content found for {company.name}. Skipping industry classification.")
+            company.industry_ai = "Others" # Default if no content
+        else:
+            classified_industry_name = self._run_llm_classification_prompt(website_content_for_classification, company.name)
+            if classified_industry_name:
+                company.industry_ai = classified_industry_name
+                logger.info(f"Classified {company.name} into industry: {classified_industry_name}")
+            else:
+                company.industry_ai = "Others"
+                logger.warning(f"Failed to classify industry for {company.name}. Setting to 'Others'.")
+
+        self.db.add(company) # Update industry_ai
+        self.db.commit()
+        self.db.refresh(company)
+
+        # --- STEP 2: Metric Extraction & Standardization (if not 'Others') ---
+        if company.industry_ai == "Others" or company.industry_ai is None:
+            logger.info(f"Company {company.name} classified as 'Others'. Skipping metric extraction.")
+            return company
+
+        industry_definition = self.industry_lookup.get(company.industry_ai)
+        if not industry_definition:
+            logger.error(f"Industry definition for '{company.industry_ai}' not found in lookup. Skipping metric extraction.")
+            return company
+
+        if not industry_definition.scraper_search_term:
+            logger.info(f"Industry '{company.industry_ai}' has no 'Scraper Search Term'. Skipping metric extraction.")
+            return company
+        
+        # Determine standardized unit from standardization_logic if possible
+        standardized_unit = "Einheiten" # Default
+        if industry_definition.standardization_logic:
+            # Example: "wert * 25m² (Fläche pro Zimmer)" -> extract "m²" (the unit directly following a number)
+            match = re.search(r'\d\s*([^\W\d_]\w*)', industry_definition.standardization_logic)
+            if match:
+                standardized_unit = match.group(1) # Extract unit like "m²"
+
+        metric_results = self._extract_and_calculate_metric_cascade(
+            company,
+            company.industry_ai,
+            industry_definition.scraper_search_term,
+            industry_definition.standardization_logic,
+            standardized_unit # Pass the derived unit
+        )
+
+        # Update company object with results
+        company.calculated_metric_name = metric_results["calculated_metric_name"]
+        company.calculated_metric_value = metric_results["calculated_metric_value"]
+        company.calculated_metric_unit = metric_results["calculated_metric_unit"]
+        company.standardized_metric_value = metric_results["standardized_metric_value"]
+        company.standardized_metric_unit = metric_results["standardized_metric_unit"]
+        company.metric_source = metric_results["metric_source"]
+        company.last_classification_at = datetime.utcnow() # Update timestamp
+
+        self.db.add(company)
+        self.db.commit()
+        self.db.refresh(company) # Refresh to get updated values
+
+        logger.info(f"Classification and metric extraction completed for {company.name}.")
+        return company
+
+# --- HELPER FOR SAFE MATH EVALUATION (Moved from core_utils.py or assumed to be there) ---
+# Assuming safe_eval_math is available via backend.lib.core_utils.safe_eval_math
+# Example implementation if not:
+# def safe_eval_math(expression: str) -> float:
+#     # Implement a safe parser/evaluator for simple math expressions
+#     # For now, a very basic eval might be used, but in production, this needs to be locked down
+#     allowed_chars = "0123456789.+-*/ "
+#     if not all(c in allowed_chars for c in expression):
+#         raise ValueError("Expression contains disallowed characters.")
+#     return eval(expression)
--- a/company-explorer/backend/services/scraping.py
+++ b/company-explorer/backend/services/scraping.py
@@ -6,7 +6,7 @@ import json
 from urllib.parse import urljoin, urlparse
 from bs4 import BeautifulSoup
 from typing import Optional, Dict
-from ..lib.core_utils import clean_text, retry_on_failure, call_gemini, clean_json_response
+from ..lib.core_utils import clean_text, retry_on_failure, call_gemini_flash, clean_json_response

 logger = logging.getLogger(__name__)