feat(company-explorer): add wikipedia integration, robotics settings, and manual overrides
- Ported robust Wikipedia extraction logic (categories, first paragraph) from the legacy system.
- Implemented database-driven Robotics Category configuration with a frontend settings UI.
- Updated the Robotics Potential analysis to use Chain-of-Thought infrastructure reasoning.
- Added Manual Override features for the Wikipedia URL (with locking) and the Website URL (with a re-scrape trigger).
- Enhanced the Inspector UI with a Wikipedia profile, category tags, and action buttons.
GEMINI.md
@@ -15,6 +15,7 @@ The system is modular and consists of the following key components:
 * **`company_deduplicator.py`:** A module for intelligent duplicate checking, both for external lists and internal CRM data.
 * **`generate_marketing_text.py`:** An engine for creating personalized marketing texts.
 * **`app.py`:** A Flask application that provides an API to run the different modules.
+* **`company-explorer/`:** A new React/FastAPI-based application (v2.x) replacing the legacy CLI tools. It focuses on identifying robotics potential in companies.

 ## Git Workflow & Conventions

@@ -23,61 +24,27 @@ The system is modular and consists of the following key components:
 - Description: detailed changes as a list with `- ` at the start of each line (no bullet points).
 - **File renames:** To preserve a file's Git history, it must always be renamed with `git mv alter_name.py neuer_name.py`.
 - **Commit & push process:** Changes are committed locally first. Pushing to the remote server happens only after your explicit confirmation.
-- **Viewing history:** Web UIs such as Gitea may not display the complete history of a renamed file. The correct, complete history can be viewed on the command line with `git log --follow <dateiname>`.
-
-## Building and Running
-
-The project is designed to be run in a Docker container. The `Dockerfile` contains the instructions to build the container.
-
-**To build the Docker container:**
-
-```bash
-docker build -t company-enrichment .
-```
-
-**To run the Docker container:**
-
-```bash
-docker run -p 8080:8080 company-enrichment
-```
-
-The application will be available at `http://localhost:8080`.
-
-## Development Conventions
-
-* **Configuration:** The project uses a `config.py` file to manage configuration settings.
-* **Dependencies:** Python dependencies are listed in the `requirements.txt` file.
-* **Modularity:** The code is modular and well-structured, with helper functions and classes to handle specific tasks.
-* **API:** The Flask application in `app.py` provides an API to interact with the system.
-* **Logging:** The project uses the `logging` module to log information and errors.
-* **Error Handling:** The `readme.md` indicates a critical error related to the `openai` library. The next step is to downgrade the library to a compatible version.
-
-## Current Status (Jan 05, 2026) - GTM & Market Intel Fixes
-
-* **GTM Architect (v2.4) - UI/UX Refinement:**
-    * **Corporate Design Integration:** A central, customizable `CORPORATE_DESIGN_PROMPT` was introduced in `config.py` to ensure all generated images strictly follow a "clean, professional, photorealistic" B2B style, avoiding comic aesthetics.
-    * **Aspect Ratio Control:** Implemented user-selectable aspect ratios (16:9, 9:16, 1:1, 4:3) in the frontend (Phase 6), passing through to the Google Imagen/Gemini 2.5 API.
-    * **Frontend Fix:** Resolved a double-declaration bug in `App.tsx` that prevented the build.
-
-* **Market Intelligence Tool (v1.2) - Backend Hardening:**
-    * **"Failed to fetch" Resolved:** Fixed a critical Nginx routing issue by forcing the frontend to use relative API paths (`./api`) instead of absolute ports, ensuring requests correctly pass through the reverse proxy in Docker.
-    * **Large Payload Fix:** Increased `client_max_body_size` to 50M in both Nginx configurations (`nginx-proxy.conf` and frontend `nginx.conf`) to prevent 413 errors when uploading large knowledge base files during campaign generation.
-    * **JSON Stability:** The Python Orchestrator and Node.js bridge were hardened against invalid JSON output. The system now robustly handles stdout noise and logs full raw output to `/app/Log/server_dump.txt` in case of errors.
-    * **Language Support:** Implemented a `--language` flag. The tool now correctly respects the frontend language selection (defaulting to German) and forces the LLM to output German text for signals, ICPs, and outreach campaigns.
-    * **Logging:** Fixed log volume mounting paths to ensure debug logs are persisted and accessible.
-
-## Current Status (Jan 2026) - GTM Architect & Core Updates
-
-* **GTM Architect (v2.2) - FULLY OPERATIONAL:**
-    * **Image Generation Fixed:** Successfully implemented a hybrid image generation pipeline.
-        * **Text-to-Image:** Uses `imagen-4.0-generate-001` for generic scenes.
-        * **Image-to-Image:** Uses `gemini-2.5-flash-image` with reference image upload for product-consistent visuals.
-        * **Prompt Engineering:** Strict prompts ensure the product design remains unaltered.
-    * **Library Upgrade:** Migrated core AI logic to `google-genai` (v1.x) to resolve deprecation warnings and access newer models. `Pillow` added for image processing.
-    * **Model Update:** Switched text generation to `gemini-2.0-flash` due to regional unavailability of 1.5.
-    * **Frontend Stability:** Fixed a critical React crash in Phase 3 by handling object-based role descriptions robustly.
-    * **Infrastructure:** Updated Docker configurations (`gtm-architect/requirements.txt`) to support new dependencies.
-
+## Current Status (Jan 08, 2026) - Company Explorer (Robotics Edition)
+
+* **Robotics Potential Analysis (v2.3):**
+    * **Logic Overhaul:** Switched from keyword-based scanning to a **"Chain-of-Thought" Infrastructure Analysis**. The AI now evaluates physical assets (factories, warehouses, solar parks) to determine robotics needs.
+    * **Provider vs. User:** Implemented strict reasoning to distinguish between companies *selling* cleaning products (providers) and those *operating* factories (users/potential clients).
+    * **Configurable Logic:** Added a database-backed configuration system for robotics categories (`cleaning`, `transport`, `security`, `service`). Users can now define the "Trigger Logic" and "Scoring Guide" directly in the frontend settings.
+
+* **Wikipedia Integration (v2.1):**
+    * **Deep Extraction:** Implemented the "Legacy" extraction logic (`WikipediaService`). It now pulls the **first paragraph** (cleaned of references), **categories** (filtered for relevance), revenue, employees, and HQ location.
+    * **Google-First Discovery:** Uses SerpAPI to find the correct Wikipedia article, validating via domain match and city.
+    * **Visual Inspector:** The frontend `Inspector` now displays a comprehensive Wikipedia profile including category tags.
+
+* **Manual Overrides & Control:**
+    * **Wikipedia Override:** Added a UI to manually correct the Wikipedia URL. This triggers a re-scan and **locks** the record (`is_locked` flag) to prevent auto-overwrite.
+    * **Website Override:** Added a UI to manually correct the company website. This automatically clears old scraping data to force a fresh analysis on the next run.
+
+* **Architecture & DB:**
+    * **Database:** Updated the `companies_v3_final.db` schema to include `RoboticsCategory` and `EnrichmentData.is_locked`.
+    * **Services:** Refactored `ClassificationService` and `DiscoveryService` for better modularity and robustness.
+
 ## Next Steps
-* **Monitor Logs:** Check `Log_from_docker/` for detailed execution traces of the GTM Architect.
-* **Feedback Loop:** Verify the quality of the generated GTM strategies and adjust prompts in `gtm_architect_orchestrator.py` if necessary.
+* **Quality Assurance:** Implement a dedicated "Review Mode" to validate high-potential leads.
+* **Data Import:** Finalize the "List Matcher" to import and deduplicate Excel lists against the new DB.
@@ -17,7 +17,7 @@ setup_logging()
 import logging
 logger = logging.getLogger(__name__)

-from .database import init_db, get_db, Company, Signal, EnrichmentData
+from .database import init_db, get_db, Company, Signal, EnrichmentData, RoboticsCategory
 from .services.deduplication import Deduplicator
 from .services.discovery import DiscoveryService
 from .services.scraping import ScraperService
@@ -97,7 +97,10 @@ def list_companies(

 @app.get("/api/companies/{company_id}")
 def get_company(company_id: int, db: Session = Depends(get_db)):
-    company = db.query(Company).options(joinedload(Company.signals)).filter(Company.id == company_id).first()
+    company = db.query(Company).options(
+        joinedload(Company.signals),
+        joinedload(Company.enrichment_data)
+    ).filter(Company.id == company_id).first()
     if not company:
         raise HTTPException(status_code=404, detail="Company not found")
     return company
@@ -154,6 +157,27 @@ def bulk_import_names(req: BulkImportRequest, db: Session = Depends(get_db)):
         db.rollback()
         raise HTTPException(status_code=500, detail=str(e))

+@app.get("/api/robotics/categories")
+def list_robotics_categories(db: Session = Depends(get_db)):
+    """Lists all configured robotics categories."""
+    return db.query(RoboticsCategory).all()
+
+class CategoryUpdate(BaseModel):
+    description: str
+    reasoning_guide: str
+
+@app.put("/api/robotics/categories/{id}")
+def update_robotics_category(id: int, cat: CategoryUpdate, db: Session = Depends(get_db)):
+    """Updates a robotics category definition."""
+    category = db.query(RoboticsCategory).filter(RoboticsCategory.id == id).first()
+    if not category:
+        raise HTTPException(404, "Category not found")
+
+    category.description = cat.description
+    category.reasoning_guide = cat.reasoning_guide
+    db.commit()
+    return category
+
 @app.post("/api/enrich/discover")
 def discover_company(req: AnalysisRequest, background_tasks: BackgroundTasks, db: Session = Depends(get_db)):
     """
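The update semantics of the `PUT /api/robotics/categories/{id}` endpoint above can be sketched without FastAPI or a database: only `description` and `reasoning_guide` are writable (mirroring `CategoryUpdate`), while `key` and `name` stay fixed. The `store` dict and `update_category` helper below are hypothetical stand-ins, not part of the diff.

```python
# Hypothetical in-memory model of the category update endpoint.
# Only description and reasoning_guide are writable; key and name stay fixed.
store = {
    1: {"key": "cleaning", "name": "Cleaning Robots",
        "description": "old", "reasoning_guide": "old guide"},
}

def update_category(cat_id: int, payload: dict) -> dict:
    cat = store.get(cat_id)
    if cat is None:
        raise KeyError("Category not found")  # the endpoint maps this to HTTP 404
    cat["description"] = payload["description"]
    cat["reasoning_guide"] = payload["reasoning_guide"]
    return cat
```

Because the classifier reloads categories from the database on each analysis run, an update made through this endpoint takes effect on the next `analyze_robotics_potential` call without a restart.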
@@ -172,6 +196,71 @@ def discover_company(req: AnalysisRequest, background_tasks: BackgroundTasks, db
         logger.error(f"Discovery Error: {e}")
         raise HTTPException(status_code=500, detail=str(e))

+@app.post("/api/companies/{company_id}/override/wiki")
+def override_wiki_url(company_id: int, url: str = Query(...), db: Session = Depends(get_db)):
+    """
+    Manually sets the Wikipedia URL for a company and triggers re-extraction.
+    Locks the data against auto-discovery.
+    """
+    company = db.query(Company).filter(Company.id == company_id).first()
+    if not company:
+        raise HTTPException(404, "Company not found")
+
+    logger.info(f"Manual Override for {company.name}: Setting Wiki URL to {url}")
+
+    # Update or create EnrichmentData entry
+    existing_wiki = db.query(EnrichmentData).filter(
+        EnrichmentData.company_id == company.id,
+        EnrichmentData.source_type == "wikipedia"
+    ).first()
+
+    # Extract data immediately
+    wiki_data = {"url": url}
+    if url and url != "k.A.":
+        try:
+            wiki_data = discovery.extract_wikipedia_data(url)
+            wiki_data['url'] = url  # Ensure URL is correct
+        except Exception as e:
+            logger.error(f"Extraction failed for manual URL: {e}")
+            wiki_data["error"] = str(e)
+
+    if not existing_wiki:
+        db.add(EnrichmentData(
+            company_id=company.id,
+            source_type="wikipedia",
+            content=wiki_data,
+            is_locked=True
+        ))
+    else:
+        existing_wiki.content = wiki_data
+        existing_wiki.updated_at = datetime.utcnow()
+        existing_wiki.is_locked = True  # LOCK IT
+
+    db.commit()
+    return {"status": "updated", "data": wiki_data}
+
+@app.post("/api/companies/{company_id}/override/website")
+def override_website_url(company_id: int, url: str = Query(...), db: Session = Depends(get_db)):
+    """
+    Manually sets the Website URL for a company.
+    Clears existing scrape data to force a fresh analysis on next run.
+    """
+    company = db.query(Company).filter(Company.id == company_id).first()
+    if not company:
+        raise HTTPException(404, "Company not found")
+
+    logger.info(f"Manual Override for {company.name}: Setting Website to {url}")
+    company.website = url
+
+    # Remove old scrape data since the URL changed
+    db.query(EnrichmentData).filter(
+        EnrichmentData.company_id == company.id,
+        EnrichmentData.source_type == "website_scrape"
+    ).delete()
+
+    db.commit()
+    return {"status": "updated", "website": url}
+
 def run_discovery_task(company_id: int):
     # New Session for Background Task
     from .database import SessionLocal
@@ -182,27 +271,38 @@ def run_discovery_task(company_id: int):

     logger.info(f"Running Discovery Task for {company.name}")

-    # 1. Website Search
+    # 1. Website Search (Always try if missing)
     if not company.website or company.website == "k.A.":
         found_url = discovery.find_company_website(company.name, company.city)
         if found_url and found_url != "k.A.":
             company.website = found_url
             logger.info(f"-> Found URL: {found_url}")

-    # 2. Wikipedia Search
-    wiki_url = discovery.find_wikipedia_url(company.name)
-    company.last_wiki_search_at = datetime.utcnow()
-
+    # 2. Wikipedia Search & Extraction
+    # Check if locked
     existing_wiki = db.query(EnrichmentData).filter(
         EnrichmentData.company_id == company.id,
-        EnrichmentData.source_type == "wikipedia_url"
+        EnrichmentData.source_type == "wikipedia"
     ).first()

-    if not existing_wiki:
-        db.add(EnrichmentData(company_id=company.id, source_type="wikipedia_url", content={"url": wiki_url}))
+    if existing_wiki and existing_wiki.is_locked:
+        logger.info(f"Skipping Wiki Discovery for {company.name} - Data is LOCKED.")
     else:
-        existing_wiki.content = {"url": wiki_url}
-        existing_wiki.updated_at = datetime.utcnow()
+        # Pass available info for better validation
+        current_website = company.website if company.website and company.website != "k.A." else None
+        wiki_url = discovery.find_wikipedia_url(company.name, website=current_website, city=company.city)
+        company.last_wiki_search_at = datetime.utcnow()
+
+        wiki_data = {"url": wiki_url}
+        if wiki_url and wiki_url != "k.A.":
+            logger.info(f"Extracting full data from Wikipedia for {company.name}...")
+            wiki_data = discovery.extract_wikipedia_data(wiki_url)
+
+        if not existing_wiki:
+            db.add(EnrichmentData(company_id=company.id, source_type="wikipedia", content=wiki_data))
+        else:
+            existing_wiki.content = wiki_data
+            existing_wiki.updated_at = datetime.utcnow()

     if company.status == "NEW" and company.website and company.website != "k.A.":
         company.status = "DISCOVERED"
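The lock check in `run_discovery_task` reduces to a small predicate: auto-discovery runs unless a record exists and carries the `is_locked` flag. A sketch with a hypothetical helper name (`should_run_wiki_discovery`), treating the enrichment row as a plain dict:

```python
from typing import Optional

def should_run_wiki_discovery(existing: Optional[dict]) -> bool:
    """Auto-discovery runs unless a manual override locked the record."""
    return not (existing and existing.get("is_locked"))
```

A locked record is only refreshed through the manual override endpoint, which performs its own extraction and keeps `is_locked` set.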
@@ -77,13 +77,30 @@ class EnrichmentData(Base):
     id = Column(Integer, primary_key=True, index=True)
     company_id = Column(Integer, ForeignKey("companies.id"))

-    source_type = Column(String)  # "website_scrape", "wikipedia_api", "google_serp"
+    source_type = Column(String)  # "website_scrape", "wikipedia", "google_serp"
     content = Column(JSON)  # The raw data
+    is_locked = Column(Boolean, default=False)  # Manual override flag

     created_at = Column(DateTime, default=datetime.utcnow)
+    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)

     company = relationship("Company", back_populates="enrichment_data")

+
+class RoboticsCategory(Base):
+    """
+    Stores definitions for robotics categories to allow user customization via UI.
+    """
+    __tablename__ = "robotics_categories"
+
+    id = Column(Integer, primary_key=True, index=True)
+    key = Column(String, unique=True, index=True)  # e.g. "cleaning", "service"
+    name = Column(String)  # Display Name
+    description = Column(Text)  # The core definition used in LLM prompts
+    reasoning_guide = Column(Text)  # Instructions for the Chain-of-Thought
+
+    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
+
+
 class ImportLog(Base):
     """
     Logs bulk imports (e.g. from Excel lists).
@@ -104,6 +121,47 @@ class ImportLog(Base):

 def init_db():
     Base.metadata.create_all(bind=engine)
+    init_robotics_defaults()
+
+def init_robotics_defaults():
+    """Seeds the database with default robotics categories if empty."""
+    db = SessionLocal()
+    try:
+        if db.query(RoboticsCategory).count() == 0:
+            defaults = [
+                {
+                    "key": "cleaning",
+                    "name": "Cleaning Robots",
+                    "description": "Does the company manage large floors, hospitals, hotels, or public spaces? (Keywords: Hygiene, Cleaning, SPA, Facility Management)",
+                    "reasoning_guide": "High (80-100): Large industrial floors, shopping malls, hospitals, airports. Medium (40-79): Mid-sized production, large offices, supermarkets. Low (0-39): Small offices, software consultancies."
+                },
+                {
+                    "key": "transport",
+                    "name": "Intralogistics / Transport",
+                    "description": "Do they move goods internally? (Keywords: Warehouse, Intralogistics, Production line, Hospital logistics)",
+                    "reasoning_guide": "High: Manufacturing, E-Commerce fulfillment, Hospitals. Low: Pure service providers, law firms."
+                },
+                {
+                    "key": "security",
+                    "name": "Security & Surveillance",
+                    "description": "Do they have large perimeters, solar parks, wind farms, or night patrols? (Keywords: Werkschutz, Security, Monitoring)",
+                    "reasoning_guide": "High: Critical infrastructure, large open-air storage, factories with valuable assets, 24/7 operations. Medium: Standard corporate HQs. Low: Offices in shared buildings."
+                },
+                {
+                    "key": "service",
+                    "name": "Service / Waiter Robots",
+                    "description": "Do they operate restaurants, nursing homes, or event venues where food/items need to be served to people?",
+                    "reasoning_guide": "High: Restaurants, Hotels (Room Service), Nursing Homes (Meal delivery). Low: B2B manufacturing, closed offices, pure installation services."
+                }
+            ]
+            for d in defaults:
+                db.add(RoboticsCategory(**d))
+            db.commit()
+            print("Seeded Robotics Categories.")
+    except Exception as e:
+        print(f"Error seeding robotics defaults: {e}")
+    finally:
+        db.close()
+
 def get_db():
     db = SessionLocal()
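One caveat with this schema change: `Base.metadata.create_all()` only creates tables that are missing; it does not add new columns such as `EnrichmentData.is_locked` to a table that already exists. A minimal sketch of a guard for that case, assuming a SQLite database as the diff suggests (the `ensure_column` helper is hypothetical, not part of the commit):

```python
import sqlite3

def ensure_column(conn: sqlite3.Connection, table: str, column: str, decl: str) -> None:
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if column not in cols:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")

# Simulate a pre-existing table that predates the is_locked column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrichment_data (id INTEGER PRIMARY KEY, content TEXT)")
ensure_column(conn, "enrichment_data", "is_locked", "BOOLEAN DEFAULT 0")
```

The helper is idempotent, so it can run safely on every startup alongside `init_db()`.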
@@ -3,8 +3,11 @@ import logging
 import random
 import os
 import re
+import unicodedata
+from urllib.parse import urlparse
 from functools import wraps
 from typing import Optional, Union, List
+from thefuzz import fuzz

 # Try the new Google GenAI lib (v1.0+)
 try:
@@ -64,6 +67,10 @@ def clean_text(text: str) -> str:
     if not text:
         return ""
     text = str(text).strip()
+    # Normalize unicode characters
+    text = unicodedata.normalize('NFKC', text)
+    # Remove control characters
+    text = "".join(ch for ch in text if unicodedata.category(ch)[0] != "C")
     text = re.sub(r'\s+', ' ', text)
     return text
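For a standalone check of the behavior added above (the function body is copied verbatim from the hunk): NFKC folds compatibility characters such as the `ﬁ` ligature, and the category-`C` filter strips control and format characters before whitespace is collapsed.

```python
import re
import unicodedata

def clean_text(text: str) -> str:
    if not text:
        return ""
    text = str(text).strip()
    # NFKC folds ligatures, full-width forms, etc.
    text = unicodedata.normalize('NFKC', text)
    # Drop characters whose Unicode general category starts with "C" (control/format).
    text = "".join(ch for ch in text if unicodedata.category(ch)[0] != "C")
    text = re.sub(r'\s+', ' ', text)
    return text
```

Note one consequence of this ordering: newlines are category `Cc`, so they are removed before the whitespace collapse, which joins words across line breaks instead of separating them with a space.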
@@ -71,8 +78,104 @@ def normalize_string(s: str) -> str:
     """Basic normalization (lowercase, stripped)."""
     return s.lower().strip() if s else ""

+def simple_normalize_url(url: str) -> str:
+    """Normalizes a URL to its core domain (e.g. 'https://www.example.com/foo' -> 'example.com')."""
+    if not url or url.lower() in ["k.a.", "nan", "none"]:
+        return "k.A."
+
+    # Ensure protocol for urlparse
+    if not url.startswith(('http://', 'https://')):
+        url = 'http://' + url
+
+    try:
+        parsed = urlparse(url)
+        domain = parsed.netloc or parsed.path
+
+        # Remove www.
+        if domain.startswith('www.'):
+            domain = domain[4:]
+
+        return domain.lower()
+    except Exception:
+        return "k.A."
+
+def normalize_company_name(name: str) -> str:
+    """Normalizes a company name by removing legal forms and special characters."""
+    if not name:
+        return ""
+
+    name = name.lower()
+
+    # Remove common legal forms
+    legal_forms = [
+        r'\bgmbh\b', r'\bag\b', r'\bkg\b', r'\bohg\b', r'\bug\b', r'\bltd\b',
+        r'\bllc\b', r'\binc\b', r'\bcorp\b', r'\bco\b', r'\b& co\b', r'\be\.v\.\b'
+    ]
+    for form in legal_forms:
+        name = re.sub(form, '', name)
+
+    # Remove special chars and extra spaces
+    name = re.sub(r'[^\w\s]', '', name)
+    name = re.sub(r'\s+', ' ', name).strip()
+
+    return name
+
+def extract_numeric_value(raw_value: str, is_umsatz: bool = False) -> str:
+    """
+    Extracts a numeric value from a string, handling 'Mio', 'Mrd', etc.
+    Returns a string representation of the number or 'k.A.'.
+    """
+    if not raw_value:
+        return "k.A."
+
+    raw_value = str(raw_value).strip().lower()
+    if raw_value in ["k.a.", "nan", "none"]:
+        return "k.A."
+
+    # Simple multiplier handling
+    multiplier = 1.0
+    if 'mrd' in raw_value or 'billion' in raw_value:
+        multiplier = 1000.0 if is_umsatz else 1000000000.0
+    elif 'mio' in raw_value or 'million' in raw_value:
+        multiplier = 1.0 if is_umsatz else 1000000.0
+    elif 'tsd' in raw_value or 'thousand' in raw_value:
+        multiplier = 0.001 if is_umsatz else 1000.0
+
+    # Extract the number; matches 123,45 or 123.45
+    matches = re.findall(r'(\d+[.,]?\d*)', raw_value)
+    if not matches:
+        return "k.A."
+
+    try:
+        # Take the first number found
+        num_str = matches[0].replace(',', '.')
+        # Fix thousands separators, e.g. 1.000.000 -> 1000000
+        if num_str.count('.') > 1:
+            num_str = num_str.replace('.', '')
+
+        val = float(num_str) * multiplier
+
+        # Round appropriately
+        if is_umsatz:
+            # Return in millions, e.g. "250.5"
+            return f"{val:.2f}".rstrip('0').rstrip('.')
+        else:
+            # Return integer for employees
+            return str(int(val))
+    except ValueError:
+        return "k.A."
+
+def fuzzy_similarity(str1: str, str2: str) -> float:
+    """Returns fuzzy similarity between two strings (0.0 to 1.0)."""
+    if not str1 or not str2:
+        return 0.0
+    return fuzz.ratio(str1, str2) / 100.0
+
 # ==============================================================================
 # 3. LLM WRAPPER (GEMINI)
 # ==============================================================================

 @retry_on_failure(max_retries=3)
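A self-contained sanity check of two of the helpers added above; the bodies are copied from the hunk so the examples run standalone. For `is_umsatz=True` (revenue), the result is expressed in millions, so "2,5 Mrd" yields "2500".

```python
import re
from urllib.parse import urlparse

def simple_normalize_url(url: str) -> str:
    """Normalizes a URL to its core domain ('https://www.example.com/foo' -> 'example.com')."""
    if not url or url.lower() in ["k.a.", "nan", "none"]:
        return "k.A."
    if not url.startswith(('http://', 'https://')):
        url = 'http://' + url  # ensure a scheme so urlparse fills netloc
    try:
        parsed = urlparse(url)
        domain = parsed.netloc or parsed.path
        if domain.startswith('www.'):
            domain = domain[4:]
        return domain.lower()
    except Exception:
        return "k.A."

def extract_numeric_value(raw_value: str, is_umsatz: bool = False) -> str:
    """Extracts a number from strings like 'ca. 2,5 Mrd. Euro'; revenue is returned in millions."""
    if not raw_value:
        return "k.A."
    raw_value = str(raw_value).strip().lower()
    if raw_value in ["k.a.", "nan", "none"]:
        return "k.A."
    multiplier = 1.0
    if 'mrd' in raw_value or 'billion' in raw_value:
        multiplier = 1000.0 if is_umsatz else 1000000000.0
    elif 'mio' in raw_value or 'million' in raw_value:
        multiplier = 1.0 if is_umsatz else 1000000.0
    elif 'tsd' in raw_value or 'thousand' in raw_value:
        multiplier = 0.001 if is_umsatz else 1000.0
    matches = re.findall(r'(\d+[.,]?\d*)', raw_value)
    if not matches:
        return "k.A."
    try:
        num_str = matches[0].replace(',', '.')
        if num_str.count('.') > 1:  # 1.000.000 -> 1000000
            num_str = num_str.replace('.', '')
        val = float(num_str) * multiplier
        if is_umsatz:
            return f"{val:.2f}".rstrip('0').rstrip('.')
        return str(int(val))
    except ValueError:
        return "k.A."
```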
@@ -4,6 +4,7 @@ import os
 from typing import Dict, Any, List
 from ..lib.core_utils import call_gemini
 from ..config import settings
+from ..database import SessionLocal, RoboticsCategory

 logger = logging.getLogger(__name__)
@@ -21,6 +22,27 @@ class ClassificationService:
             logger.error(f"Failed to load allowed industries: {e}")
             return ["Sonstige"]

+    def _get_category_prompts(self) -> str:
+        """
+        Fetches the latest category definitions from the database.
+        """
+        db = SessionLocal()
+        try:
+            categories = db.query(RoboticsCategory).all()
+            if not categories:
+                return "Error: No categories defined."
+
+            prompt_parts = []
+            for cat in categories:
+                prompt_parts.append(f"* **{cat.name} ({cat.key}):**\n - Definition: {cat.description}\n - Scoring Guide: {cat.reasoning_guide}")
+
+            return "\n".join(prompt_parts)
+        except Exception as e:
+            logger.error(f"Error fetching categories: {e}")
+            return "Error loading categories."
+        finally:
+            db.close()
+
     def analyze_robotics_potential(self, company_name: str, website_text: str) -> Dict[str, Any]:
         """
         Analyzes the company for robotics potential based on website content.
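The string that `_get_category_prompts` injects into the LLM prompt can be sketched with plain dicts standing in for `RoboticsCategory` rows (illustrative data, same format string as above):

```python
# Plain dicts stand in for RoboticsCategory ORM rows.
categories = [
    {"key": "cleaning", "name": "Cleaning Robots",
     "description": "Large floors, hospitals, hotels?",
     "reasoning_guide": "High: malls, airports. Low: small offices."},
]

prompt_parts = [
    f"* **{c['name']} ({c['key']}):**\n - Definition: {c['description']}\n - Scoring Guide: {c['reasoning_guide']}"
    for c in categories
]
category_guidance = "\n".join(prompt_parts)
```

Because the guidance is rebuilt from the database on every call, edits made in the frontend settings UI change the scoring prompt immediately.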
@@ -28,36 +50,49 @@ class ClassificationService:
         """
         if not website_text or len(website_text) < 100:
             return {"error": "Insufficient text content"}

+        category_guidance = self._get_category_prompts()
+
         prompt = f"""
-You are a Senior B2B Market Analyst for 'Roboplanet', a robotics distributor.
-Your job is to analyze a target company based on their website text and determine their potential for using robots.
+You are a Senior B2B Market Analyst for 'Roboplanet', a specialized robotics distributor.
+Your task is to analyze a target company based on their website text to determine their **operational need** for service robotics.

 --- TARGET COMPANY ---
 Name: {company_name}
 Website Content (Excerpt):
-{website_text[:15000]}
+{website_text[:20000]}

 --- ALLOWED INDUSTRIES (STRICT) ---
 You MUST assign the company to exactly ONE of these industries. If unsure, choose the closest match or "Sonstige".
 {json.dumps(self.allowed_industries, ensure_ascii=False)}

---- ANALYSIS TASKS ---
-1. **Industry Classification:** Pick one from the list.
-2. **Robotics Potential Scoring (0-100):**
-   - **Cleaning:** Does the company manage large floors, hospitals, hotels, or public spaces? (Keywords: Hygiene, Cleaning, SPA, Facility Management)
-   - **Transport/Logistics:** Do they move goods internally? (Keywords: Warehouse, Intralogistics, Production line, Hospital logistics)
-   - **Security:** Do they have large perimeters or night patrols? (Keywords: Werkschutz, Security, Monitoring)
-   - **Service:** Do they interact with guests/patients? (Keywords: Reception, Restaurant, Nursing)
-3. **Explanation:** A short, strategic reason for the scoring (German).
+--- ANALYSIS GUIDELINES (CHAIN OF THOUGHT) ---
+1. **Infrastructure Analysis:** What physical assets does this company likely operate based on their business model?
+   - Factories / Production Plants? (-> Needs Cleaning, Security, Intralogistics)
+   - Large Warehouses? (-> Needs Intralogistics, Security, Floor Washing)
+   - Offices / Headquarters? (-> Needs Vacuuming, Window Cleaning)
+   - Critical Infrastructure (Solar Parks, Wind Farms)? (-> Needs Perimeter Security, Inspection)
+   - Hotels / Hospitals? (-> Needs Service, Cleaning, Transport)
+
+2. **Provider vs. User Distinction (CRITICAL):**
+   - If a company SELLS cleaning products (e.g., 3M, Henkel), they do NOT necessarily have a higher need for cleaning robots than any other manufacturer. Do not score them high just because the word "cleaning" appears. Score them based on their *factories*.
+   - If a company SELLS security services, they might be a potential PARTNER, but check if they *manage* sites.
+
+3. **Scale Assessment:**
+   - 5 locations implies more need than 1.
+   - "Global player" implies large facilities.
+
+--- SCORING CATEGORIES (0-100) ---
+Based on the current strategic focus of Roboplanet:
|
||||||
|
|
||||||
|
{category_guidance}
|
||||||
|
|
||||||
--- OUTPUT FORMAT (JSON ONLY) ---
|
--- OUTPUT FORMAT (JSON ONLY) ---
|
||||||
{{
|
{{
|
||||||
"industry": "String (from list)",
|
"industry": "String (from list)",
|
||||||
"summary": "Short business summary (German)",
|
"summary": "Concise analysis of their infrastructure and business model (German)",
|
||||||
"potentials": {{
|
"potentials": {{
|
||||||
"cleaning": {{ "score": 0-100, "reason": "..." }},
|
"cleaning": {{ "score": 0-100, "reason": "Specific reasoning based on infrastructure (e.g. 'Operates 5 production plants in DE')." }},
|
||||||
"transport": {{ "score": 0-100, "reason": "..." }},
|
"transport": {{ "score": 0-100, "reason": "..." }},
|
||||||
"security": {{ "score": 0-100, "reason": "..." }},
|
"security": {{ "score": 0-100, "reason": "..." }},
|
||||||
"service": {{ "score": 0-100, "reason": "..." }}
|
"service": {{ "score": 0-100, "reason": "..." }}
|
||||||
@@ -69,7 +104,7 @@ class ClassificationService:
|
|||||||
response_text = call_gemini(
|
response_text = call_gemini(
|
||||||
prompt=prompt,
|
prompt=prompt,
|
||||||
json_mode=True,
|
json_mode=True,
|
||||||
temperature=0.2 # Low temp for consistency
|
temperature=0.1 # Very low temp for analytical reasoning
|
||||||
)
|
)
|
||||||
return json.loads(response_text)
|
return json.loads(response_text)
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
|
|||||||
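The prompt above asks the model for a fixed JSON shape (industry, summary, per-category potentials). A minimal, hypothetical consumer of that shape is sketched below — the field names mirror the OUTPUT FORMAT block, but the score clamping and the "best category" pick are illustrative only and not part of the actual service:

```python
import json

def summarize_potentials(raw: str) -> dict:
    """Parse the model's JSON reply and return the best-scoring category."""
    data = json.loads(raw)
    potentials = data.get("potentials", {})
    # Clamp scores defensively in case the model returns out-of-range values.
    scores = {
        name: max(0, min(100, int(entry.get("score", 0))))
        for name, entry in potentials.items()
    }
    best = max(scores, key=scores.get) if scores else None
    return {"industry": data.get("industry"), "best_category": best, "scores": scores}

reply = (
    '{"industry": "Logistik", "potentials": {'
    '"cleaning": {"score": 40, "reason": "..."}, '
    '"transport": {"score": 85, "reason": "..."}}}'
)
result = summarize_potentials(reply)
# result["best_category"] == "transport"
```

With `json_mode=True` and a low temperature, the reply should already be valid JSON, so the clamping mostly guards against numeric drift rather than malformed output.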
@@ -5,6 +5,7 @@ from typing import Optional, Dict, Tuple
 from urllib.parse import urlparse
 from ..config import settings
 from ..lib.core_utils import retry_on_failure, normalize_string
+from .wikipedia_service import WikipediaService
 
 logger = logging.getLogger(__name__)
 
@@ -21,6 +22,9 @@ class DiscoveryService:
         self.api_key = settings.SERP_API_KEY
         if not self.api_key:
             logger.warning("SERP_API_KEY not set. Discovery features will fail.")
 
+        # Initialize the specialized Wikipedia Service
+        self.wiki_service = WikipediaService()
 
     @retry_on_failure(max_retries=2)
     def find_company_website(self, company_name: str, city: Optional[str] = None) -> str:
@@ -67,42 +71,42 @@ class DiscoveryService:
         return "k.A."
 
     @retry_on_failure(max_retries=2)
-    def find_wikipedia_url(self, company_name: str) -> str:
+    def find_wikipedia_url(self, company_name: str, website: str = None, city: str = None) -> str:
         """
-        Searches for a specific German Wikipedia article.
+        Searches for a specific German Wikipedia article using the robust WikipediaService.
+        Includes validation via website domain and city.
         """
         if not self.api_key:
             return "k.A."
 
-        query = f"{company_name} Wikipedia"
 
         try:
-            params = {
-                "engine": "google",
-                "q": query,
-                "api_key": self.api_key,
-                "num": 3,
-                "gl": "de",
-                "hl": "de"
-            }
-            response = requests.get("https://serpapi.com/search", params=params, timeout=15)
-            response.raise_for_status()
-            data = response.json()
-
-            for result in data.get("organic_results", []):
-                link = result.get("link", "")
-                if "de.wikipedia.org/wiki/" in link:
-                    # Basic validation: Is the title roughly the company?
-                    title = result.get("title", "").replace(" – Wikipedia", "")
-                    if self._check_name_similarity(company_name, title):
-                        return link
-
+            # Delegate to the robust service
+            # parent_name could be added if available in the future
+            page = self.wiki_service.search_company_article(
+                company_name=company_name,
+                website=website,
+                crm_city=city
+            )
+
+            if page:
+                return page.url
+
             return "k.A."
 
         except Exception as e:
-            logger.error(f"Wiki Search Error: {e}")
+            logger.error(f"Wiki Search Error via Service: {e}")
             return "k.A."
 
+    def extract_wikipedia_data(self, url: str) -> dict:
+        """
+        Extracts full company data from a given Wikipedia URL.
+        """
+        try:
+            return self.wiki_service.extract_company_data(url)
+        except Exception as e:
+            logger.error(f"Wiki Extraction Error for {url}: {e}")
+            return {"url": url, "error": str(e)}
+
     def _is_credible_url(self, url: str) -> bool:
         """Filters out social media, directories, and junk."""
         if not url: return False
@@ -118,9 +122,3 @@ class DiscoveryService:
         except:
             return False
 
-    def _check_name_similarity(self, name1: str, name2: str) -> bool:
-        """Simple fuzzy check for validation."""
-        n1 = normalize_string(name1)
-        n2 = normalize_string(name2)
-        # Very permissive: if one is contained in the other
-        return n1 in n2 or n2 in n1
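The change above drops the permissive containment check (`n1 in n2 or n2 in n1`) in favour of the service's `fuzzy_similarity(...) > 0.85` threshold. A minimal stdlib sketch of that comparison — assuming `difflib.SequenceMatcher.ratio()` as the similarity measure, which may differ from the real `fuzzy_similarity` helper in `core_utils`:

```python
from difflib import SequenceMatcher

def fuzzy_similarity(a: str, b: str) -> float:
    # Stand-in for the project's helper: case-insensitive sequence ratio in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Containment (old behavior) accepts any prefix/substring relationship:
old_match = "siemens" in "siemens energy" or "siemens energy" in "siemens"

# The strict ratio threshold (new behavior) rejects the same pair:
new_match = fuzzy_similarity("siemens", "siemens energy") > 0.85
```

Under this sketch `old_match` is `True` while `new_match` is `False` — partial-name matches no longer pass on name similarity alone, which is why the new validation falls back to hard facts (domain, city, parent) first.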
company-explorer/backend/services/wikipedia_service.py (new file, 448 lines)
@@ -0,0 +1,448 @@
#!/usr/bin/env python3
"""
wikipedia_service.py

Service class for interacting with Wikipedia, including search,
validation, and extraction of company data.
"""

import logging
import re
from urllib.parse import unquote

import requests
import wikipedia
from bs4 import BeautifulSoup

# Import settings and helpers
from ..config import settings
from ..lib.core_utils import (
    retry_on_failure,
    simple_normalize_url,
    normalize_company_name,
    extract_numeric_value,
    clean_text,
    fuzzy_similarity
)

logger = logging.getLogger(__name__)

class WikipediaService:
    """
    Handles searching for Wikipedia articles and extracting relevant
    company data. Includes validation logic for articles.
    """
    def __init__(self, user_agent=None):
        """
        Initialize the scraper with a requests session.
        """
        self.user_agent = user_agent or 'Mozilla/5.0 (compatible; CompanyExplorer/1.0; +http://www.example.com/bot)'
        self.session = requests.Session()
        self.session.headers.update({'User-Agent': self.user_agent})

        self.keywords_map = {
            'branche': ['branche', 'wirtschaftszweig', 'industry', 'taetigkeit', 'sektor', 'produkte', 'leistungen'],
            'umsatz': ['umsatz', 'erloes', 'revenue', 'jahresumsatz', 'konzernumsatz', 'ergebnis'],
            'mitarbeiter': ['mitarbeiter', 'mitarbeiterzahl', 'beschaeftigte', 'employees', 'number of employees', 'personal', 'belegschaft'],
            'sitz': ['sitz', 'hauptsitz', 'unternehmenssitz', 'firmensitz', 'headquarters', 'standort', 'sitz des unternehmens', 'anschrift', 'adresse']
        }

        try:
            # Default to German for now, could be configurable
            wiki_lang = 'de'
            wikipedia.set_lang(wiki_lang)
            wikipedia.set_rate_limiting(False)
            logger.info(f"Wikipedia library language set to '{wiki_lang}'. Rate limiting DISABLED.")
        except Exception as e:
            logger.warning(f"Error setting Wikipedia language or rate limiting: {e}")

    @retry_on_failure(max_retries=3)
    def serp_wikipedia_lookup(self, company_name: str, lang: str = 'de') -> str:
        """
        Searches for the best Wikipedia URL for a company using Google Search (via SerpAPI).
        Prioritizes Knowledge Graph hits and then organic results.

        Args:
            company_name (str): The name of the company to search for.
            lang (str): The language code for Wikipedia search (e.g., 'de').

        Returns:
            str: The URL of the best hit or None if nothing suitable was found.
        """
        logger.info(f"Starting SerpAPI Wikipedia search for '{company_name}'...")
        serp_key = settings.SERP_API_KEY
        if not serp_key:
            logger.warning("SerpAPI Key not configured. Skipping search.")
            return None

        query = f'site:{lang}.wikipedia.org "{company_name}"'
        params = {"engine": "google", "q": query, "api_key": serp_key, "hl": lang}

        try:
            response = requests.get("https://serpapi.com/search", params=params, timeout=15)
            response.raise_for_status()
            data = response.json()

            # 1. Check Knowledge Graph (highest priority)
            if "knowledge_graph" in data and "source" in data["knowledge_graph"]:
                source = data["knowledge_graph"]["source"]
                if "link" in source and f"{lang}.wikipedia.org" in source["link"]:
                    url = source["link"]
                    logger.info(f" -> Hit found in Knowledge Graph: {url}")
                    return url

            # 2. Check organic results
            if "organic_results" in data:
                for result in data.get("organic_results", []):
                    link = result.get("link")
                    if link and f"{lang}.wikipedia.org/wiki/" in link:
                        logger.info(f" -> Best organic hit found: {link}")
                        return link

            logger.warning(f" -> No suitable Wikipedia URL found for '{company_name}' in SerpAPI results.")
            return None
        except Exception as e:
            logger.error(f"Error during SerpAPI request for '{company_name}': {e}")
            return None

    @retry_on_failure(max_retries=3)
    def _get_page_soup(self, url: str) -> BeautifulSoup:
        """
        Fetches HTML from a URL and returns a BeautifulSoup object.
        """
        if not url or not isinstance(url, str) or not url.lower().startswith(("http://", "https://")):
            logger.warning(f"_get_page_soup: Invalid URL '{str(url)[:100]}...'")
            return None
        try:
            response = self.session.get(url, timeout=15)
            response.raise_for_status()
            # Handle encoding
            response.encoding = response.apparent_encoding
            soup = BeautifulSoup(response.text, 'html.parser')
            return soup
        except Exception as e:
            logger.error(f"_get_page_soup: Error fetching or parsing HTML from {str(url)[:100]}...: {e}")
            raise e

    def _extract_first_paragraph_from_soup(self, soup: BeautifulSoup) -> str:
        """
        Extracts the first meaningful paragraph from the Wikipedia article soup.
        Mimics the sophisticated cleaning from the legacy system.
        """
        if not soup: return "k.A."
        paragraph_text = "k.A."
        try:
            content_div = soup.find('div', class_='mw-parser-output')
            search_area = content_div if content_div else soup
            paragraphs = search_area.find_all('p', recursive=False)
            if not paragraphs: paragraphs = search_area.find_all('p')

            for p in paragraphs:
                # Remove references [1], [2], etc.
                for sup in p.find_all('sup', class_='reference'): sup.decompose()
                # Remove hidden spans
                for span in p.find_all('span', style=lambda v: v and 'display:none' in v): span.decompose()
                # Remove coordinates
                for span in p.find_all('span', id='coordinates'): span.decompose()

                text = clean_text(p.get_text(separator=' ', strip=True))

                # Filter out meta-paragraphs or too short ones
                if text != "k.A." and len(text) > 50 and not re.match(r'^(Datei:|Abbildung:|Siehe auch:|Einzelnachweise|Siehe auch|Literatur)', text, re.IGNORECASE):
                    paragraph_text = text[:2000]  # Limit length
                    break
        except Exception as e:
            logger.error(f"Error extracting first paragraph: {e}")
        return paragraph_text

    def extract_categories(self, soup: BeautifulSoup) -> str:
        """
        Extracts Wikipedia categories from the soup object, filtering out meta-categories.
        """
        if not soup: return "k.A."
        cats_filtered = []
        try:
            cat_div = soup.find('div', id="mw-normal-catlinks")
            if cat_div:
                ul = cat_div.find('ul')
                if ul:
                    cats = [clean_text(li.get_text()) for li in ul.find_all('li')]
                    cats_filtered = [c for c in cats if c and isinstance(c, str) and c.strip() and "kategorien:" not in c.lower()]
        except Exception as e:
            logger.error(f"Error extracting categories: {e}")
        return ", ".join(cats_filtered) if cats_filtered else "k.A."

    def _validate_article(self, page, company_name: str, website: str, crm_city: str, parent_name: str = None) -> bool:
        """
        Performs a fact-based check of whether a Wikipedia article matches the company.
        Prioritizes hard facts (Domain, City) over pure name similarity.
        """
        if not page or not hasattr(page, 'html'):
            return False

        logger.debug(f"Validating article '{page.title}' for company '{company_name}'...")

        try:
            page_html = page.html()
            soup = BeautifulSoup(page_html, 'html.parser')
        except Exception as e:
            logger.error(f"Could not parse HTML for article '{page.title}': {e}")
            return False

        # --- Stage 1: Website Domain Validation (very strong signal) ---
        normalized_domain = simple_normalize_url(website)
        if normalized_domain != "k.A.":
            # Search for domain in "External links" section or infobox
            external_links = soup.select('.external, .infobox a[href*="."]')
            for link in external_links:
                href = link.get('href', '')
                if normalized_domain in href:
                    logger.info(f" => VALIDATION SUCCESS (Domain Match): Domain '{normalized_domain}' found in links.")
                    return True

        # --- Stage 2: City Validation (strong signal) ---
        if crm_city and crm_city.lower() != 'k.a.':
            infobox_sitz_raw = self._extract_infobox_value(soup, 'sitz')
            if infobox_sitz_raw and infobox_sitz_raw.lower() != 'k.a.':
                if crm_city.lower() in infobox_sitz_raw.lower():
                    logger.info(f" => VALIDATION SUCCESS (City Match): CRM City '{crm_city}' found in Infobox City '{infobox_sitz_raw}'.")
                    return True

        # --- Stage 3: Parent Validation ---
        normalized_parent = normalize_company_name(parent_name) if parent_name else None
        if normalized_parent:
            page_content_for_check = (page.title + " " + page.summary).lower()
            if normalized_parent in page_content_for_check:
                logger.info(f" => VALIDATION SUCCESS (Parent Match): Parent Name '{parent_name}' found in article.")
                return True

        # --- Stage 4: Name Similarity (Fallback with stricter rules) ---
        normalized_company = normalize_company_name(company_name)
        normalized_title = normalize_company_name(page.title)
        similarity = fuzzy_similarity(normalized_title, normalized_company)

        if similarity > 0.85:  # Stricter threshold
            logger.info(f" => VALIDATION SUCCESS (High Similarity): High name similarity ({similarity:.2f}).")
            return True

        logger.debug(f" => VALIDATION FAILED: No hard fact (Domain, City, Parent) and similarity ({similarity:.2f}) too low.")
        return False

    def search_company_article(self, company_name: str, website: str = None, crm_city: str = None, parent_name: str = None):
        """
        Searches and validates a matching Wikipedia article using the 'Google-First' strategy.
        1. Finds the best URL via SerpAPI.
        2. Validates the found article with hard facts.
        """
        if not company_name:
            return None

        logger.info(f"Starting 'Google-First' Wikipedia search for '{company_name}'...")

        # 1. Find the best URL candidate via Google Search
        url_candidate = self.serp_wikipedia_lookup(company_name)

        if not url_candidate:
            logger.warning(f" -> No URL found via SerpAPI. Search aborted.")
            return None

        # 2. Load and validate the found article
        try:
            page_title = unquote(url_candidate.split('/wiki/')[-1].replace('_', ' '))
            page = wikipedia.page(title=page_title, auto_suggest=False, redirect=True)

            # Use the new fact-based validation
            if self._validate_article(page, company_name, website, crm_city, parent_name):
                logger.info(f" -> Article '{page.title}' successfully validated.")
                return page
            else:
                logger.warning(f" -> Article '{page.title}' could not be validated.")
                return None
        except wikipedia.exceptions.PageError:
            logger.error(f" -> Error: Found URL '{url_candidate}' did not lead to a valid Wikipedia page.")
            return None
        except Exception as e:
            logger.error(f" -> Unexpected error processing page '{url_candidate}': {e}")
            return None

    def _extract_infobox_value(self, soup: BeautifulSoup, target: str) -> str:
        """
        Extracts targeted values (Industry, Revenue, etc.) from the infobox.
        """
        if not soup or target not in self.keywords_map:
            return "k.A."
        keywords = self.keywords_map[target]
        infobox = soup.select_one('table[class*="infobox"]')
        if not infobox: return "k.A."

        value_found = "k.A."
        try:
            rows = infobox.find_all('tr')
            for row in rows:
                cells = row.find_all(['th', 'td'], recursive=False)
                header_text, value_cell = None, None

                if len(cells) >= 2:
                    if cells[0].name == 'th':
                        header_text, value_cell = cells[0].get_text(strip=True), cells[1]
                    elif cells[0].name == 'td' and cells[1].name == 'td':
                        style = cells[0].get('style', '').lower()
                        is_header_like = 'font-weight' in style and ('bold' in style or '700' in style) or cells[0].find(['b', 'strong'], recursive=False)
                        if is_header_like:
                            header_text, value_cell = cells[0].get_text(strip=True), cells[1]

                if header_text and value_cell:
                    if any(kw in header_text.lower() for kw in keywords):
                        for sup in value_cell.find_all(['sup', 'span']):
                            sup.decompose()

                        raw_value_text = value_cell.get_text(separator=' ', strip=True)

                        if target == 'branche' or target == 'sitz':
                            value_found = clean_text(raw_value_text).split('\n')[0].strip()
                        elif target == 'umsatz':
                            value_found = extract_numeric_value(raw_value_text, is_umsatz=True)
                        elif target == 'mitarbeiter':
                            value_found = extract_numeric_value(raw_value_text, is_umsatz=False)

                        value_found = value_found if value_found else "k.A."
                        logger.info(f" --> Infobox '{target}' found: '{value_found}'")
                        break
        except Exception as e:
            logger.error(f"Error iterating infobox rows for '{target}': {e}")
            return "k.A."

        return value_found

    def _parse_sitz_string_detailed(self, raw_sitz_string_input: str) -> dict:
        """
        Attempts to extract City and Country in detail from a raw Sitz string.
        """
        sitz_stadt_val, sitz_land_val = "k.A.", "k.A."
        if not raw_sitz_string_input or not isinstance(raw_sitz_string_input, str):
            return {'sitz_stadt': sitz_stadt_val, 'sitz_land': sitz_land_val}

        temp_sitz = raw_sitz_string_input.strip()
        if not temp_sitz or temp_sitz.lower() == "k.a.":
            return {'sitz_stadt': sitz_stadt_val, 'sitz_land': sitz_land_val}

        known_countries_detailed = {
            "deutschland": "Deutschland", "germany": "Deutschland", "de": "Deutschland",
            "österreich": "Österreich", "austria": "Österreich", "at": "Österreich",
            "schweiz": "Schweiz", "switzerland": "Schweiz", "ch": "Schweiz", "suisse": "Schweiz",
            "usa": "USA", "u.s.": "USA", "united states": "USA", "vereinigte staaten": "USA",
            "vereinigtes königreich": "Vereinigtes Königreich", "united kingdom": "Vereinigtes Königreich", "uk": "Vereinigtes Königreich",
        }
        region_to_country = {
            "nrw": "Deutschland", "nordrhein-westfalen": "Deutschland", "bayern": "Deutschland", "hessen": "Deutschland",
            "zg": "Schweiz", "zug": "Schweiz", "zh": "Schweiz", "zürich": "Schweiz",
            "ca": "USA", "california": "USA", "ny": "USA", "new york": "USA",
        }

        extracted_country = ""
        original_temp_sitz = temp_sitz

        klammer_match = re.search(r'\(([^)]+)\)$', temp_sitz)
        if klammer_match:
            suffix_in_klammer = klammer_match.group(1).strip().lower()
            if suffix_in_klammer in known_countries_detailed:
                extracted_country = known_countries_detailed[suffix_in_klammer]
                temp_sitz = temp_sitz[:klammer_match.start()].strip(" ,")
            elif suffix_in_klammer in region_to_country:
                extracted_country = region_to_country[suffix_in_klammer]
                temp_sitz = temp_sitz[:klammer_match.start()].strip(" ,")

        if not extracted_country and ',' in temp_sitz:
            parts = [p.strip() for p in temp_sitz.split(',')]
            if len(parts) > 1:
                last_part_lower = parts[-1].lower()
                if last_part_lower in known_countries_detailed:
                    extracted_country = known_countries_detailed[last_part_lower]
                    temp_sitz = ", ".join(parts[:-1]).strip(" ,")
                elif last_part_lower in region_to_country:
                    extracted_country = region_to_country[last_part_lower]
                    temp_sitz = ", ".join(parts[:-1]).strip(" ,")

        sitz_land_val = extracted_country if extracted_country else "k.A."
        sitz_stadt_val = re.sub(r'^\d{4,8}\s*', '', temp_sitz).strip(" ,")

        if not sitz_stadt_val:
            sitz_stadt_val = "k.A." if sitz_land_val != "k.A." else re.sub(r'^\d{4,8}\s*', '', original_temp_sitz).strip(" ,") or "k.A."

        return {'sitz_stadt': sitz_stadt_val, 'sitz_land': sitz_land_val}

    @retry_on_failure(max_retries=3)
    def extract_company_data(self, url_or_page) -> dict:
        """
        Extracts structured company data from a Wikipedia article (URL or page object).
        """
        default_result = {
            'url': 'k.A.', 'title': 'k.A.', 'sitz_stadt': 'k.A.', 'sitz_land': 'k.A.',
            'first_paragraph': 'k.A.', 'branche': 'k.A.', 'umsatz': 'k.A.',
            'mitarbeiter': 'k.A.', 'categories': 'k.A.', 'full_text': ''
        }
        page = None

        try:
            if isinstance(url_or_page, str) and "wikipedia.org" in url_or_page:
                page_title = unquote(url_or_page.split('/wiki/')[-1].replace('_', ' '))
                page = wikipedia.page(title=page_title, auto_suggest=False, redirect=True)
            elif not isinstance(url_or_page, str):  # Assumption: it is a page object
                page = url_or_page
            else:
                logger.warning(f"extract_company_data: Invalid Input '{str(url_or_page)[:100]}...")
                return default_result

            logger.info(f"Extracting data for Wiki Article: {page.title[:100]}...")

            # Extract basic data directly from page object
            first_paragraph = page.summary.split('\n')[0] if page.summary else 'k.A.'
            categories = ", ".join(page.categories)
            full_text = page.content

            # BeautifulSoup needed for infobox and refined extraction
            soup = self._get_page_soup(page.url)
            if not soup:
                logger.warning(f" -> Could not load page for Soup parsing. Extracting basic data only.")
                return {
                    'url': page.url, 'title': page.title, 'sitz_stadt': 'k.A.', 'sitz_land': 'k.A.',
                    'first_paragraph': page.summary.split('\n')[0] if page.summary else 'k.A.',
                    'branche': 'k.A.', 'umsatz': 'k.A.',
                    'mitarbeiter': 'k.A.', 'categories': ", ".join(page.categories), 'full_text': full_text
                }

            # Refined Extraction from Soup
            first_paragraph = self._extract_first_paragraph_from_soup(soup)
            categories = self.extract_categories(soup)

            # Extract infobox data
            branche_val = self._extract_infobox_value(soup, 'branche')
            umsatz_val = self._extract_infobox_value(soup, 'umsatz')
            mitarbeiter_val = self._extract_infobox_value(soup, 'mitarbeiter')
            raw_sitz_string = self._extract_infobox_value(soup, 'sitz')
            parsed_sitz = self._parse_sitz_string_detailed(raw_sitz_string)
            sitz_stadt_val = parsed_sitz['sitz_stadt']
            sitz_land_val = parsed_sitz['sitz_land']

            result = {
                'url': page.url,
                'title': page.title,
                'sitz_stadt': sitz_stadt_val,
                'sitz_land': sitz_land_val,
                'first_paragraph': first_paragraph,
                'branche': branche_val,
                'umsatz': umsatz_val,
                'mitarbeiter': mitarbeiter_val,
                'categories': categories,
                'full_text': full_text
            }

            logger.info(f" -> Extracted Data: City='{sitz_stadt_val}', Country='{sitz_land_val}', Rev='{umsatz_val}', Emp='{mitarbeiter_val}'")
            return result

        except wikipedia.exceptions.PageError:
            logger.error(f" -> Error: Wikipedia article for '{str(url_or_page)[:100]}' could not be found (PageError).")
            return {**default_result, 'url': str(url_or_page) if isinstance(url_or_page, str) else 'k.A.'}
        except Exception as e:
            logger.error(f" -> Unexpected error extracting from '{str(url_or_page)[:100]}': {e}")
            return {**default_result, 'url': str(url_or_page) if isinstance(url_or_page, str) else 'k.A.'}
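The `_parse_sitz_string_detailed` method above packs several heuristics into one pass: a country named in a trailing parenthesis or after the last comma is split off, and a leading postal code is stripped from the city. A condensed, self-contained sketch of that core idea (using a deliberately tiny country table — the real method also handles regions and more countries):

```python
import re

COUNTRIES = {"deutschland": "Deutschland", "schweiz": "Schweiz", "usa": "USA"}

def parse_sitz(raw: str) -> dict:
    city, country = raw.strip(), "k.A."
    # Country in a trailing parenthesis, e.g. "München (Deutschland)"
    m = re.search(r'\(([^)]+)\)$', city)
    if m and m.group(1).strip().lower() in COUNTRIES:
        country = COUNTRIES[m.group(1).strip().lower()]
        city = city[:m.start()].strip(" ,")
    # Otherwise: country after the last comma, e.g. "Zug, Schweiz"
    elif ',' in city:
        parts = [p.strip() for p in city.split(',')]
        if parts[-1].lower() in COUNTRIES:
            country = COUNTRIES[parts[-1].lower()]
            city = ", ".join(parts[:-1]).strip(" ,")
    # Strip a leading postal code, e.g. "80333 München"
    city = re.sub(r'^\d{4,8}\s*', '', city).strip(" ,") or "k.A."
    return {"sitz_stadt": city, "sitz_land": country}

# parse_sitz("80333 München (Deutschland)")
#   -> {"sitz_stadt": "München", "sitz_land": "Deutschland"}
```

The parenthesis check runs first because German infoboxes frequently annotate the country that way; the comma fallback only fires when no parenthesized country was found, mirroring the order in the service.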
@@ -2,8 +2,9 @@ import { useState, useEffect } from 'react'
 import axios from 'axios'
 import { CompanyTable } from './components/CompanyTable'
 import { ImportWizard } from './components/ImportWizard'
-import { Inspector } from './components/Inspector' // NEW
+import { Inspector } from './components/Inspector'
-import { LayoutDashboard, UploadCloud, Search, RefreshCw } from 'lucide-react'
+import { RoboticsSettings } from './components/RoboticsSettings' // NEW
+import { LayoutDashboard, UploadCloud, Search, RefreshCw, Settings } from 'lucide-react'
 
 // Base URL detection (Production vs Dev)
 const API_BASE = import.meta.env.BASE_URL === '/ce/' ? '/ce/api' : '/api';
@@ -16,7 +17,8 @@ function App() {
   const [stats, setStats] = useState<Stats>({ total: 0 })
   const [refreshKey, setRefreshKey] = useState(0)
   const [isImportOpen, setIsImportOpen] = useState(false)
-  const [selectedCompanyId, setSelectedCompanyId] = useState<number | null>(null) // NEW
+  const [isSettingsOpen, setIsSettingsOpen] = useState(false) // NEW
+  const [selectedCompanyId, setSelectedCompanyId] = useState<number | null>(null)
 
   const fetchStats = async () => {
     try {
@@ -48,6 +50,13 @@ function App() {
         onSuccess={() => setRefreshKey(k => k + 1)}
       />
 
+      {/* Robotics Logic Settings */}
+      <RoboticsSettings
+        isOpen={isSettingsOpen}
+        onClose={() => setIsSettingsOpen(false)}
+        apiBase={API_BASE}
+      />
+
       {/* Inspector Sidebar */}
       <Inspector
         companyId={selectedCompanyId}
@@ -73,6 +82,14 @@ function App() {
             <span className="text-white font-bold">{stats.total}</span> Companies
           </div>
 
+          <button
+            onClick={() => setIsSettingsOpen(true)}
+            className="p-2 hover:bg-slate-800 rounded-full transition-colors text-slate-400 hover:text-white"
+            title="Configure Robotics Logic"
+          >
+            <Settings className="h-5 w-5" />
+          </button>
+
           <button
             onClick={() => setRefreshKey(k => k + 1)}
             className="p-2 hover:bg-slate-800 rounded-full transition-colors text-slate-400 hover:text-white"
Inspector.tsx:

@@ -1,6 +1,6 @@
 import { useEffect, useState } from 'react'
 import axios from 'axios'
-import { X, ExternalLink, Robot, Briefcase, Calendar } from 'lucide-react'
+import { X, ExternalLink, Bot, Briefcase, Calendar, Globe, Users, DollarSign, MapPin, Tag, RefreshCw as RefreshCwIcon, Search as SearchIcon, Pencil, Check } from 'lucide-react'
 import clsx from 'clsx'
 
 interface InspectorProps {

@@ -16,6 +16,12 @@ type Signal = {
   proof_text: string
 }
 
+type EnrichmentData = {
+  source_type: string
+  content: any
+  is_locked?: boolean
+}
+
 type CompanyDetail = {
   id: number
   name: string

@@ -24,25 +30,99 @@ type CompanyDetail = {
   status: string
   created_at: string
   signals: Signal[]
+  enrichment_data: EnrichmentData[]
 }
 
 export function Inspector({ companyId, onClose, apiBase }: InspectorProps) {
   const [data, setData] = useState<CompanyDetail | null>(null)
   const [loading, setLoading] = useState(false)
+  const [isProcessing, setIsProcessing] = useState(false)
+
+  // Manual Override State
+  const [isEditingWiki, setIsEditingWiki] = useState(false)
+  const [wikiUrlInput, setWikiUrlInput] = useState("")
+  const [isEditingWebsite, setIsEditingWebsite] = useState(false)
+  const [websiteInput, setWebsiteInput] = useState("")
 
-  useEffect(() => {
+  const fetchData = () => {
     if (!companyId) return
     setLoading(true)
     axios.get(`${apiBase}/companies/${companyId}`)
       .then(res => setData(res.data))
       .catch(console.error)
       .finally(() => setLoading(false))
+  }
+
+  useEffect(() => {
+    fetchData()
+    setIsEditingWiki(false)
+    setIsEditingWebsite(false)
   }, [companyId])
 
+  const handleDiscover = async () => {
+    if (!companyId) return
+    setIsProcessing(true)
+    try {
+      await axios.post(`${apiBase}/enrich/discover`, { company_id: companyId })
+      setTimeout(fetchData, 3000)
+    } catch (e) {
+      console.error(e)
+    } finally {
+      setIsProcessing(false)
+    }
+  }
+
+  const handleAnalyze = async () => {
+    if (!companyId) return
+    setIsProcessing(true)
+    try {
+      await axios.post(`${apiBase}/enrich/analyze`, { company_id: companyId })
+      setTimeout(fetchData, 5000)
+    } catch (e) {
+      console.error(e)
+    } finally {
+      setIsProcessing(false)
+    }
+  }
+
+  const handleWikiOverride = async () => {
+    if (!companyId) return
+    setIsProcessing(true)
+    try {
+      await axios.post(`${apiBase}/companies/${companyId}/override/wiki?url=${encodeURIComponent(wikiUrlInput)}`)
+      setIsEditingWiki(false)
+      fetchData()
+    } catch (e) {
+      alert("Update failed")
+      console.error(e)
+    } finally {
+      setIsProcessing(false)
+    }
+  }
+
+  const handleWebsiteOverride = async () => {
+    if (!companyId) return
+    setIsProcessing(true)
+    try {
+      await axios.post(`${apiBase}/companies/${companyId}/override/website?url=${encodeURIComponent(websiteInput)}`)
+      setIsEditingWebsite(false)
+      fetchData()
+    } catch (e) {
+      alert("Update failed")
+      console.error(e)
+    } finally {
+      setIsProcessing(false)
+    }
+  }
+
   if (!companyId) return null
 
+  const wikiEntry = data?.enrichment_data?.find(e => e.source_type === 'wikipedia')
+  const wiki = wikiEntry?.content
+  const isLocked = wikiEntry?.is_locked
+
   return (
-    <div className="fixed inset-y-0 right-0 w-[500px] bg-slate-900 border-l border-slate-800 shadow-2xl transform transition-transform duration-300 ease-in-out z-40 overflow-y-auto">
+    <div className="fixed inset-y-0 right-0 w-[550px] bg-slate-900 border-l border-slate-800 shadow-2xl transform transition-transform duration-300 ease-in-out z-40 overflow-y-auto">
       {loading ? (
         <div className="p-8 text-slate-500">Loading details...</div>
       ) : !data ? (

@@ -53,30 +133,241 @@ export function Inspector({ companyId, onClose, apiBase }: InspectorProps) {
         <div className="p-6 border-b border-slate-800 bg-slate-950/50">
           <div className="flex justify-between items-start mb-4">
             <h2 className="text-xl font-bold text-white leading-tight">{data.name}</h2>
-            <button onClick={onClose} className="text-slate-400 hover:text-white">
-              <X className="h-6 w-6" />
-            </button>
+            <div className="flex items-center gap-2">
+              <button
+                onClick={fetchData}
+                className="p-1.5 text-slate-500 hover:text-white transition-colors"
+                title="Refresh"
+              >
+                <RefreshCwIcon className={clsx("h-4 w-4", loading && "animate-spin")} />
+              </button>
+              <button onClick={onClose} className="p-1.5 text-slate-400 hover:text-white transition-colors">
+                <X className="h-6 w-6" />
+              </button>
+            </div>
           </div>
 
-          <div className="flex flex-wrap gap-2 text-sm">
-            {data.website && (
-              <a href={data.website} target="_blank" className="flex items-center gap-1 text-blue-400 hover:underline">
-                <ExternalLink className="h-3 w-3" /> {new URL(data.website).hostname.replace('www.', '')}
-              </a>
+          <div className="flex flex-wrap gap-2 text-sm items-center">
+            {!isEditingWebsite ? (
+              <div className="flex items-center gap-2">
+                {data.website && data.website !== "k.A." ? (
+                  <a href={data.website} target="_blank" className="flex items-center gap-1 text-blue-400 hover:text-blue-300 transition-colors">
+                    <ExternalLink className="h-3 w-3" /> {new URL(data.website).hostname.replace('www.', '')}
+                  </a>
+                ) : (
+                  <span className="text-slate-500 italic">No website</span>
+                )}
+                <button
+                  onClick={() => { setWebsiteInput(data.website && data.website !== "k.A." ? data.website : ""); setIsEditingWebsite(true); }}
+                  className="p-1 text-slate-600 hover:text-white transition-colors"
+                  title="Edit Website URL"
+                >
+                  <Pencil className="h-3 w-3" />
+                </button>
+              </div>
+            ) : (
+              <div className="flex items-center gap-1 animate-in fade-in zoom-in duration-200">
+                <input
+                  type="text"
+                  value={websiteInput}
+                  onChange={e => setWebsiteInput(e.target.value)}
+                  placeholder="https://..."
+                  className="bg-slate-800 border border-slate-700 rounded px-2 py-0.5 text-xs text-white focus:ring-1 focus:ring-blue-500 outline-none w-48"
+                  autoFocus
+                />
+                <button
+                  onClick={handleWebsiteOverride}
+                  className="p-1 bg-green-900/50 text-green-400 rounded hover:bg-green-900 transition-colors"
+                >
+                  <Check className="h-3 w-3" />
+                </button>
+                <button
+                  onClick={() => setIsEditingWebsite(false)}
+                  className="p-1 text-slate-500 hover:text-red-400 transition-colors"
+                >
+                  <X className="h-3 w-3" />
+                </button>
+              </div>
             )}
 
             {data.industry_ai && (
               <span className="flex items-center gap-1 px-2 py-0.5 bg-slate-800 text-slate-300 rounded border border-slate-700">
                 <Briefcase className="h-3 w-3" /> {data.industry_ai}
               </span>
             )}
+            <span className={clsx(
+              "px-2 py-0.5 rounded text-[10px] font-bold uppercase tracking-wider",
+              data.status === 'ENRICHED' ? "bg-green-900/40 text-green-400 border border-green-800/50" :
+              data.status === 'DISCOVERED' ? "bg-blue-900/40 text-blue-400 border border-blue-800/50" :
+              "bg-slate-800 text-slate-400 border border-slate-700"
+            )}>
+              {data.status}
+            </span>
+          </div>
+
+          {/* Action Bar */}
+          <div className="mt-6 flex gap-2">
+            <button
+              onClick={handleDiscover}
+              disabled={isProcessing}
+              className="flex-1 flex items-center justify-center gap-2 bg-slate-800 hover:bg-slate-700 disabled:opacity-50 text-white text-xs font-bold py-2 rounded-md border border-slate-700 transition-all"
+            >
+              <SearchIcon className="h-3.5 w-3.5" />
+              {isProcessing ? "Processing..." : "DISCOVER"}
+            </button>
+            <button
+              onClick={handleAnalyze}
+              disabled={isProcessing || !data.website || data.website === 'k.A.'}
+              className="flex-1 flex items-center justify-center gap-2 bg-blue-600 hover:bg-blue-500 disabled:opacity-50 text-white text-xs font-bold py-2 rounded-md transition-all shadow-lg shadow-blue-900/20"
+            >
+              <Bot className="h-3.5 w-3.5" />
+              {isProcessing ? "Analyzing..." : "ANALYZE POTENTIAL"}
+            </button>
           </div>
         </div>
 
-        {/* Robotics Scorecard */}
-        <div className="p-6 space-y-6">
+        <div className="p-6 space-y-8">
+          {/* Wikipedia Section */}
+          <div className="space-y-4">
+            <div className="flex items-center justify-between">
+              <h3 className="text-sm font-semibold text-slate-400 uppercase tracking-wider flex items-center gap-2">
+                <Globe className="h-4 w-4" /> Company Profile (Wikipedia)
+              </h3>
+              {!isEditingWiki ? (
+                <button
+                  onClick={() => { setWikiUrlInput(wiki?.url || ""); setIsEditingWiki(true); }}
+                  className="p-1 text-slate-500 hover:text-blue-400 transition-colors"
+                  title="Edit / Override URL"
+                >
+                  <Pencil className="h-3.5 w-3.5" />
+                </button>
+              ) : (
+                <div className="flex items-center gap-1">
+                  <button
+                    onClick={handleWikiOverride}
+                    className="p-1 bg-green-900/50 text-green-400 rounded hover:bg-green-900 transition-colors"
+                    title="Save & Rescan"
+                  >
+                    <Check className="h-3.5 w-3.5" />
+                  </button>
+                  <button
+                    onClick={() => setIsEditingWiki(false)}
+                    className="p-1 text-slate-500 hover:text-red-400 transition-colors"
+                    title="Cancel"
+                  >
+                    <X className="h-3.5 w-3.5" />
+                  </button>
+                </div>
+              )}
+            </div>
+
+            {isEditingWiki && (
+              <div className="mb-2">
+                <input
+                  type="text"
+                  value={wikiUrlInput}
+                  onChange={e => setWikiUrlInput(e.target.value)}
+                  placeholder="Paste Wikipedia URL here..."
+                  className="w-full bg-slate-800 border border-slate-700 rounded px-2 py-1 text-sm text-white focus:ring-1 focus:ring-blue-500 outline-none"
+                />
+                <p className="text-[10px] text-slate-500 mt-1">Paste a valid URL. Saving will trigger a re-scan.</p>
+              </div>
+            )}
+
+            {wiki && wiki.url !== 'k.A.' && !isEditingWiki ? (
+              <div>
+                {/* ... existing wiki content ... */}
+                <div className="bg-slate-800/30 rounded-xl p-5 border border-slate-800/50 relative overflow-hidden">
+                  <div className="absolute top-0 right-0 p-3 opacity-10">
+                    <Globe className="h-16 w-16" />
+                  </div>
+
+                  {isLocked && (
+                    <div className="absolute top-2 right-2 flex items-center gap-1 px-1.5 py-0.5 bg-yellow-900/30 border border-yellow-800/50 rounded text-[9px] text-yellow-500">
+                      <Tag className="h-2.5 w-2.5" /> Manual Override
+                    </div>
+                  )}
+
+                  <p className="text-sm text-slate-300 leading-relaxed italic mb-4">
+                    "{wiki.first_paragraph}"
+                  </p>
+
+                  <div className="grid grid-cols-2 gap-y-4 gap-x-6">
+                    <div className="flex items-center gap-3">
+                      <div className="p-2 bg-slate-900 rounded-lg text-blue-400">
+                        <Users className="h-4 w-4" />
+                      </div>
+                      <div>
+                        <div className="text-[10px] text-slate-500 uppercase font-bold tracking-tight">Employees</div>
+                        <div className="text-sm text-slate-200 font-medium">{wiki.mitarbeiter || 'k.A.'}</div>
+                      </div>
+                    </div>
+
+                    <div className="flex items-center gap-3">
+                      <div className="p-2 bg-slate-900 rounded-lg text-green-400">
+                        <DollarSign className="h-4 w-4" />
+                      </div>
+                      <div>
+                        <div className="text-[10px] text-slate-500 uppercase font-bold tracking-tight">Revenue</div>
+                        <div className="text-sm text-slate-200 font-medium">{wiki.umsatz ? `${wiki.umsatz} Mio. €` : 'k.A.'}</div>
+                      </div>
+                    </div>
+
+                    <div className="flex items-center gap-3">
+                      <div className="p-2 bg-slate-900 rounded-lg text-orange-400">
+                        <MapPin className="h-4 w-4" />
+                      </div>
+                      <div>
+                        <div className="text-[10px] text-slate-500 uppercase font-bold tracking-tight">Headquarters</div>
+                        <div className="text-sm text-slate-200 font-medium">{wiki.sitz_stadt}{wiki.sitz_land ? `, ${wiki.sitz_land}` : ''}</div>
+                      </div>
+                    </div>
+
+                    <div className="flex items-center gap-3">
+                      <div className="p-2 bg-slate-900 rounded-lg text-purple-400">
+                        <Briefcase className="h-4 w-4" />
+                      </div>
+                      <div>
+                        <div className="text-[10px] text-slate-500 uppercase font-bold tracking-tight">Wiki Industry</div>
+                        <div className="text-sm text-slate-200 font-medium truncate max-w-[150px]" title={wiki.branche}>{wiki.branche || 'k.A.'}</div>
+                      </div>
+                    </div>
+                  </div>
+
+                  {wiki.categories && wiki.categories !== 'k.A.' && (
+                    <div className="mt-6 pt-5 border-t border-slate-800/50">
+                      <div className="flex items-start gap-2 text-xs text-slate-500 mb-2">
+                        <Tag className="h-3 w-3 mt-0.5" /> Categories
+                      </div>
+                      <div className="flex flex-wrap gap-1.5">
+                        {wiki.categories.split(',').map((cat: string) => (
+                          <span key={cat} className="px-2 py-0.5 bg-slate-900 text-slate-400 rounded-full text-[10px] border border-slate-800">
+                            {cat.trim()}
+                          </span>
+                        ))}
+                      </div>
+                    </div>
+                  )}
+
+                  <div className="mt-4 flex justify-end">
+                    <a href={wiki.url} target="_blank" className="text-[10px] text-blue-500 hover:text-blue-400 flex items-center gap-1 font-bold">
+                      WIKIPEDIA <ExternalLink className="h-2.5 w-2.5" />
+                    </a>
+                  </div>
+                </div>
+              </div>
+            ) : !isEditingWiki ? (
+              <div className="p-4 rounded-xl border border-dashed border-slate-800 text-center text-slate-600">
+                <Globe className="h-5 w-5 mx-auto mb-2 opacity-20" />
+                <p className="text-xs">No Wikipedia profile found yet.</p>
+              </div>
+            ) : null}
+          </div>
+
+          {/* Robotics Scorecard */}
           <div>
             <h3 className="text-sm font-semibold text-slate-400 uppercase tracking-wider mb-3 flex items-center gap-2">
-              <Robot className="h-4 w-4" /> Robotics Potential
+              <Bot className="h-4 w-4" /> Robotics Potential
             </h3>
 
             <div className="grid grid-cols-2 gap-4">

@@ -110,10 +401,13 @@ export function Inspector({ companyId, onClose, apiBase }: InspectorProps) {
           </div>
 
           {/* Meta Info */}
-          <div className="pt-6 border-t border-slate-800">
-            <div className="text-xs text-slate-500 flex items-center gap-2">
+          <div className="pt-6 border-t border-slate-800 flex items-center justify-between">
+            <div className="text-[10px] text-slate-500 flex items-center gap-2 uppercase font-bold tracking-widest">
               <Calendar className="h-3 w-3" /> Added: {new Date(data.created_at).toLocaleDateString()}
             </div>
+            <div className="text-[10px] text-slate-600 italic">
+              ID: CE-{data.id.toString().padStart(4, '0')}
+            </div>
           </div>
         </div>
       </div>
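The override buttons in the Inspector post to `POST /companies/{id}/override/wiki` and `POST /companies/{id}/override/website`, passing the URL as a query parameter. The backend routes are not part of this diff; below is a minimal sketch of the URL validation such a handler would plausibly perform before persisting and re-scraping. The helper name and checks are assumptions, not the shipped implementation.

```python
from urllib.parse import urlparse

def validate_override_url(url: str, require_wikipedia: bool = False) -> str:
    """Normalize and validate a manually entered override URL.

    Hypothetical helper: mirrors the contract the frontend relies on
    (absolute http(s) URL; optionally a wikipedia.org host).
    """
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not an absolute http(s) URL: {url!r}")
    if require_wikipedia and "wikipedia.org" not in parsed.netloc:
        raise ValueError(f"Not a Wikipedia URL: {url!r}")
    return parsed.geturl()
```

Rejecting non-absolute input early matters here because the frontend also uses the sentinel string "k.A." for missing websites, which must never be stored as an override.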
company-explorer/frontend/src/components/RoboticsSettings.tsx (new file, 134 lines):

@@ -0,0 +1,134 @@
import { useState, useEffect } from 'react'
import axios from 'axios'
import { X, Save, Settings, Loader2 } from 'lucide-react'

interface RoboticsSettingsProps {
  isOpen: boolean
  onClose: () => void
  apiBase: string
}

type Category = {
  id: number
  key: string
  name: string
  description: string
  reasoning_guide: string
}

export function RoboticsSettings({ isOpen, onClose, apiBase }: RoboticsSettingsProps) {
  const [categories, setCategories] = useState<Category[]>([])
  const [loading, setLoading] = useState(false)
  const [savingId, setSavingId] = useState<number | null>(null)

  useEffect(() => {
    if (isOpen) {
      setLoading(true)
      axios.get(`${apiBase}/robotics/categories`)
        .then(res => setCategories(res.data))
        .catch(console.error)
        .finally(() => setLoading(false))
    }
  }, [isOpen])

  const handleSave = async (cat: Category) => {
    setSavingId(cat.id)
    try {
      await axios.put(`${apiBase}/robotics/categories/${cat.id}`, {
        description: cat.description,
        reasoning_guide: cat.reasoning_guide
      })
      // Success indicator?
    } catch (e) {
      alert("Failed to save settings")
    } finally {
      setSavingId(null)
    }
  }

  const handleChange = (id: number, field: keyof Category, value: string) => {
    setCategories(prev => prev.map(c =>
      c.id === id ? { ...c, [field]: value } : c
    ))
  }

  if (!isOpen) return null

  return (
    <div className="fixed inset-0 z-50 flex items-center justify-center bg-black/80 backdrop-blur-sm">
      <div className="bg-slate-900 border border-slate-800 rounded-xl shadow-2xl w-full max-w-4xl max-h-[90vh] flex flex-col">
        {/* Header */}
        <div className="p-6 border-b border-slate-800 flex justify-between items-center bg-slate-950/50 rounded-t-xl">
          <div className="flex items-center gap-3">
            <div className="p-2 bg-blue-600/20 rounded-lg text-blue-400">
              <Settings className="h-6 w-6" />
            </div>
            <div>
              <h2 className="text-xl font-bold text-white">Robotics Logic Configuration</h2>
              <p className="text-sm text-slate-400">Define how the AI assesses potential for each category.</p>
            </div>
          </div>
          <button onClick={onClose} className="text-slate-400 hover:text-white transition-colors">
            <X className="h-6 w-6" />
          </button>
        </div>

        {/* Content */}
        <div className="flex-1 overflow-y-auto p-6 space-y-6">
          {loading ? (
            <div className="flex items-center justify-center py-20 text-slate-500">
              <Loader2 className="h-8 w-8 animate-spin" />
            </div>
          ) : (
            <div className="grid grid-cols-1 gap-6">
              {categories.map(cat => (
                <div key={cat.id} className="bg-slate-800/30 border border-slate-700/50 rounded-lg p-5">
                  <div className="flex justify-between items-start mb-4">
                    <h3 className="text-lg font-bold text-white flex items-center gap-2">
                      <span className="capitalize">{cat.name}</span>
                      <span className="text-xs font-mono text-slate-500 bg-slate-900 px-1.5 py-0.5 rounded border border-slate-800">{cat.key}</span>
                    </h3>
                    <button
                      onClick={() => handleSave(cat)}
                      disabled={savingId === cat.id}
                      className="flex items-center gap-2 px-3 py-1.5 bg-blue-600 hover:bg-blue-500 disabled:opacity-50 text-white text-xs font-bold rounded transition-colors"
                    >
                      {savingId === cat.id ? <Loader2 className="h-3 w-3 animate-spin" /> : <Save className="h-3 w-3" />}
                      SAVE
                    </button>
                  </div>

                  <div className="grid grid-cols-1 md:grid-cols-2 gap-6">
                    <div className="space-y-2">
                      <label className="text-xs font-bold text-slate-400 uppercase tracking-wider">Definition (When to trigger?)</label>
                      <textarea
                        value={cat.description}
                        onChange={(e) => handleChange(cat.id, 'description', e.target.value)}
                        className="w-full h-32 bg-slate-950 border border-slate-700 rounded p-3 text-sm text-slate-200 focus:ring-1 focus:ring-blue-500 outline-none resize-none font-mono leading-relaxed"
                      />
                      <p className="text-[10px] text-slate-500">
                        Instructions for the AI on what business models or assets imply this need.
                      </p>
                    </div>

                    <div className="space-y-2">
                      <label className="text-xs font-bold text-slate-400 uppercase tracking-wider">Scoring Guide (High/Med/Low)</label>
                      <textarea
                        value={cat.reasoning_guide}
                        onChange={(e) => handleChange(cat.id, 'reasoning_guide', e.target.value)}
                        className="w-full h-32 bg-slate-950 border border-slate-700 rounded p-3 text-sm text-slate-200 focus:ring-1 focus:ring-blue-500 outline-none resize-none font-mono leading-relaxed"
                      />
                      <p className="text-[10px] text-slate-500">
                        Explicit examples for scoring logic to ensure consistency.
                      </p>
                    </div>
                  </div>
                </div>
              ))}
            </div>
          )}
        </div>
      </div>
    </div>
  )
}
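The `description` and `reasoning_guide` fields edited in this settings dialog feed the Chain-of-Thought robotics analysis mentioned in the commit message. The shipped prompt builder is not in this diff; a plausible sketch of how the category rows could be assembled into a prompt section (assumed structure and field names taken from the `Category` type above):

```python
def build_category_prompt(categories: list) -> str:
    """Assemble robotics category definitions into an analysis prompt section.

    Assumption: each dict carries the fields the settings UI exposes
    (key, name, description, reasoning_guide).
    """
    parts = []
    for cat in categories:
        parts.append(
            f"### {cat['name']} ({cat['key']})\n"
            f"Definition: {cat['description']}\n"
            f"Scoring guide: {cat['reasoning_guide']}"
        )
    return "\n\n".join(parts)
```

Keeping the definitions in the database (rather than hardcoded in the prompt template) is what makes the PUT endpoint above sufficient to retune the analysis without a redeploy.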
requirements.txt:

@@ -13,3 +13,6 @@ google-genai
 pillow
 python-multipart
 python-dotenv
+wikipedia
+google-search-results
+
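The two new dependencies back the ported Wikipedia extraction (categories, first paragraph). The extraction code itself is not shown in this diff; a minimal sketch of the "first paragraph" step as pure string handling, assuming the fetched page extract separates paragraphs with newlines and that "k.A." is the no-data sentinel the frontend already checks for:

```python
def first_paragraph(extract: str) -> str:
    """Return the first non-empty paragraph of a Wikipedia page extract.

    Hypothetical helper; 'k.A.' mirrors the sentinel used elsewhere
    in the app for missing values.
    """
    for para in extract.split("\n"):
        para = para.strip()
        if para:
            return para
    return "k.A."
```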
@@ -1,39 +1,42 @@
-import time
+import json
 from notion_client import Client
 
-def final_push():
-    # --- KONFIGURATION DIREKT IN DER FUNKTION ---
-    token = "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"
-    database_id = "acf0e7e1-fff2-425b-81a1-00fbc76085b8"
-
-    notion = Client(auth=token)
-
-    print(f"🚀 Starte Injektion in DB: {database_id}")
-
-    sectors = [
-        {"name": "Hotellerie", "desc": "Relevant für Empfang, Reinigung Zimmer, Parkplatz & Spa. Fokus auf Wellness vs. Business."},
-        {"name": "Pflege & Kliniken", "desc": "Hohe Hygienestandards, Desinfektion, Transport von Mahlzeiten/Wäsche."},
-        {"name": "Lager & Produktion", "desc": "Großflächenreinigung, Objektschutz (Security), Intralogistik-Transport."},
-        {"name": "Einzelhandel", "desc": "Frequenzorientierte Reinigung, interaktive Verkaufsförderung (Ads), Nachtreinigung."}
-    ]
-
-    for s in sectors:
-        try:
-            notion.pages.create(
-                parent={"database_id": database_id},
-                properties={
-                    "Name": {"title": [{"text": {"content": s["name"]}}]},
-                    "Beschreibung": {"rich_text": [{"text": {"content": s["desc"]}}]},
-                    "Art": {"select": {"name": "Sector"}}
-                }
-            )
-            print(f"  ✅ {s['name']} wurde erfolgreich angelegt.")
-            time.sleep(0.5)
-        except Exception as e:
-            print(f"  ❌ Fehler bei {s['name']}: {e}")
-
-    print("\n🏁 FERTIG. Schau jetzt in dein Notion Dashboard!")
+# SETUP
+TOKEN = "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"
+SECTOR_DB_ID = "59a4598a20084ddaa035f5eba750a1be"
+
+notion = Client(auth=TOKEN)
+
+def inspect_via_page():
+    print(f"🔍 Suche nach einer Seite in DB {SECTOR_DB_ID}...")
+
+    try:
+        # 1. Wir holen uns die erste verfügbare Seite aus der Datenbank
+        response = notion.databases.query(
+            database_id=SECTOR_DB_ID,
+            page_size=1
+        )
+
+        results = response.get("results")
+        if not results:
+            print("⚠️ Keine Seiten in der Datenbank gefunden. Bitte lege manuell eine an.")
+            return
+
+        page = results[0]
+        print(f"✅ Seite gefunden: '{page['id']}'")
+
+        # 2. Wir inspizieren die Properties der Seite
+        properties = page.get("properties", {})
+
+        print("\n--- INTERNE PROPERTY-MAP DER SEITE ---")
+        print(json.dumps(properties, indent=2))
+
+        print("\n--- ZUSAMMENFASSUNG FÜR DEINE PIPELINE ---")
+        for prop_name, prop_data in properties.items():
+            print(f"Spaltenname: '{prop_name}' | ID: {prop_data.get('id')} | Typ: {prop_data.get('type')}")
+
+    except Exception as e:
+        print(f"💥 Fehler beim Inspect: {e}")
 
 if __name__ == "__main__":
-    final_push()
+    inspect_via_page()
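The inspection loop in the script above flattens a Notion page's property map into name/id/type rows. That step works without any API call; a sketch as a pure function over an already-fetched `properties` dict (same field access as the script):

```python
def summarize_properties(properties: dict) -> list:
    """Flatten a Notion page's property map into (name, id, type) rows,
    matching what the inspection loop prints for the pipeline."""
    return [
        (name, prop.get("id"), prop.get("type"))
        for name, prop in properties.items()
    ]
```

Factoring this out makes the mapping testable with a fixture dict instead of a live Notion database.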