# Gemini Code Assistant Context

## CRITICAL RULE: DOCUMENTATION PRESERVATION (DO NOT IGNORE)

**IT IS STRICTLY FORBIDDEN TO DELETE DOCUMENTATION OR TO REPLACE IT WITH PLACEHOLDERS SUCH AS `... (rest of the file)`.**

This has happened several times in the past and has led to massive data loss in critical files such as `MIGRATION_PLAN.md`.

**Rules for the agent:**

1. **Never** delete large blocks of text unless the user *explicitly* requests it.
2. **Always** check `git diff` before creating a commit. If a documentation file loses 100 lines, that is almost always a mistake.
3. When updating documentation: **only** add new information or precisely correct outdated passages. **Never** overwrite the rest of the file.
4. If you need to restore a file, use `git log -p <filename>` and make sure you really restore *everything*.

---

## Important Notes

- **Project documentation:** The primary and most comprehensive documentation for this project lives in `readme.md`. Please consult that file for a detailed understanding of the architecture and the individual modules.
- **Git repository:** This project is managed via a Git repository; all code changes are versioned. See the "Git Workflow & Conventions" section for our working rules.
- **IMPORTANT:** The AI agent can commit changes, but for security reasons it often cannot run `git push`. Please run `git push` manually when the agent reports this.

---

## ‼️ Current Project Focus (March 2026): Migration & Stabilization

**The system was successfully stabilized on March 7, 2026 and prepared for the move to the Ubuntu VM (`docker1`).**

All critical components (Company Explorer, Connector, Lead Engine) are now functional and configured to be resilient.

All remaining tasks for the move are centralized here:
➡️ **[`RELOCATION.md`](./RELOCATION.md)**

---

## ✅ Current Status (March 7, 2026) - STABLE & RESILIENT

The system is running stably and is prepared for production operation. Significant progress has been made:

### 1. SuperOffice Connector (v2.1.1 - "Echo Shield")

* **Echo prevention:** A robust "Echo Shield" was implemented in the worker. The worker identifies its own actions (via `ChangedByAssociateId`) and thereby avoids infinite loops. Changes are now only processed for external, relevant field updates (Name, Website, JobTitle).
* **Webhook:** Successfully registered at `https://floke-ai.duckdns.org/connector/webhook` with secure token validation.

### 2. Company Explorer (v0.7.4)

* **Database:** Schema integrity restored. Missing columns (`street`, `zip_code`, `unsubscribe_token`, `strategy_briefing`) were added via migration scripts. No more 500 errors.
* **Frontend:** Build pipeline with PostCSS/Tailwind styling repaired, so the UI works flawlessly again.

### 3. Lead Engine (Trading Twins - Fully Functional)

* **Integration:** Service successfully integrated into the Docker stack and reachable via Nginx under `/lead/` and `/feedback/`.
* **Persistent state:** Lead data and job status are now reliably stored in a SQLite database (`/app/data/trading_twins.db`).
* **Round-trip functionality:** The complete process (Lead -> CE -> AI -> Teams notification -> email with calendar links -> Outlook appointment) works end-to-end.
* **Bug fixes (debugging iterations):**
    * **`sqlalchemy` & imports:** Ensured `sqlalchemy` is installed and corrected module paths (`trading_twins`) in the Docker build.
    * **Nginx routing:** Configuration optimized so `/feedback/` and `/lead/` are forwarded correctly to the FastAPI server. Global `auth_basic` removed to allow access to public endpoints.
    * **FastAPI `root_path`:** Cleaned up to avoid conflicts with Nginx paths.
    * **Server stability:** `uvicorn` now starts as a separate process, and `monitor.py` imports the modules cleanly.
    * **API keys:** All required keys (`INFO_*`, `CAL_*`, `SERP_API`, `WEBHOOK_*`, `GEMINI_API_KEY`) are correctly mapped from `.env` into the containers.


### 4. DuckDNS & DNS Monitor

* **Successfully reactivated:** The DynDNS service is running and updating the IP; network connectivity is stable.

---

## Git Workflow & Conventions

### Closing Out the Working Day with `#fertig`

To finish a work step or a task, use the `#fertig` command.

**IMPORTANT:** Do **not** use `/fertig` or plain `fertig`. Only the command with the hash sign (`#`) is recognized correctly.

When you enter `#fertig`, the agent performs the following steps:

1. **Analysis:** The agent checks whether any code changes have been made since the last commit.
2. **Summary:** It generates an automatic work summary based on the code changes.
3. **Status update:** The agent runs the script `python3 dev_session.py --report-status` in the background.
    - The time invested in the current session is calculated and stored in Notion.
    - A new status report with the summary is attached to the Notion task.
    - The task's status in Notion is set to "Done" (or another appropriate status).
4. **Commit & push:** If code changes exist, a commit is created and a `git push` is requested interactively.

### ⚠️ Troubleshooting: Git `push`/`pull` Errors in Docker Containers

Occasionally, `git push` or `git pull` commands run from inside the `gemini-session` Docker container fail with errors such as `Could not resolve host` or `Failed to connect to <Gitea-Domain>`, even though the external Gitea URL (e.g. `floke-gitea.duckdns.org`) is reachable from the host system. This happens because the Docker container may not share the host's DNS resolution mechanisms or have a direct route to the external address.

**Problem:** Standard DNS resolution and external hostnames fail inside the Docker container.

**Solution:** To establish a robust, direct connection to the Gitea container on the *same Docker host*, switch the Git remote URL to the **local IP address of the Docker host** and use **token-based authentication**.

**Configuration steps:**

1. **Determine the local IP of the Docker host:**
    * Find the local IP address of the server (e.g. your Diskstation) on which the Docker containers run. Example: `192.168.178.6`.
2. **Get the Gitea token from `.env`:**
    * Find the Gitea token (available in the format `<Token>` in the `.env` file or in the previous `git remote -v` output). Example: `318c736205934dd066b6bbcb1d732931eaa7c8c4`.
3. **Update the Git remote URL:**
    * Use the following command to update the remote URL. Replace `<Username>`, `<Token>`, and `<Local-IP-Adresse>` with your values.

    ```bash
    git remote set-url origin http://<Username>:<Token>@<Local-IP-Adresse>:3000/Floke/Brancheneinstufung2.git
    ```

    * **Example (with your data):**

    ```bash
    git remote set-url origin http://Floke:318c736205934dd066b6bbcb1d732931eaa7c8c4@192.168.178.6:3000/Floke/Brancheneinstufung2.git
    ```

    *(Note: For internal Docker communication, `http` instead of `https` is often sufficient and can avoid problems with SSL certificates.)*
4. **Verification:**
    * Run `git fetch` to test the new configuration. It should now work without a password prompt:

    ```bash
    git fetch
    ```

This configuration ensures a stable Git connection within your Docker environment.

---

## Project Overview

This project is a Python-based system for automated company data enrichment and lead generation. It focuses on identifying B2B companies with high potential for robotics automation (Cleaning, Transport, Security, Service).

The system architecture has evolved from a CLI-based toolset to a modern web application (`company-explorer`) backed by Docker containers.

## Current Status (Jan 15, 2026) - Company Explorer (Robotics Edition v0.5.0)

### 1. Contacts Management (v0.5)

* **Full CRUD:** Integrated Contact Management system with direct editing capabilities.
* **Global List View:** Dedicated view for all contacts across all companies with search and filter.
* **Data Model:** Supports advanced fields like Academic Title, Role Interpretation (Decision Maker vs. User), and Marketing Automation Status.
* **Bulk Import:** CSV-based bulk import for contacts that automatically creates missing companies and prevents duplicates via email matching.

### 2. UI/UX Modernization

* **Light/Dark Mode:** Full theme support with toggle.
* **Grid Layout:** Unified card-based layout for both Company and Contact lists.
* **Mobile Responsiveness:** Optimized Inspector overlay and navigation for mobile devices.
* **Tabbed Inspector:** Clean separation between Company Overview and Contact Management within the details pane.

### 3. Advanced Configuration (Settings)

* **Industry Verticals:** Database-backed configuration for target industries (Description, Focus Flag, Primary Product).
* **Job Role Mapping:** Configurable patterns (Regex/Text) to map job titles on business cards to internal roles (e.g., "CTO" -> "Innovation Driver").
* **Robotics Categories:** Existing AI reasoning logic remains configurable via the UI.

### 4. Robotics Potential Analysis (v2.3)

* **Chain-of-Thought Logic:** The AI analysis (`ClassificationService`) uses multi-step reasoning to evaluate physical infrastructure.
* **Provider vs. User:** Strict differentiation logic implemented.

### 5. Web Scraping & Legal Data (v2.2)

* **Impressum Scraping:** 2-Hop Strategy and Root Fallback logic.
* **Manual Overrides:** Users can manually correct Wikipedia, Website, and Impressum URLs directly in the UI.

## Lessons Learned & Best Practices

1. **Numeric Extraction (German Locale):**
    * **Problem:** "1.005 Mitarbeiter" was extracted as "1" (treating the dot as a decimal separator).
    * **Solution:** Implemented context-aware logic. If a number has a dot followed by exactly 3 digits (and no comma), it is treated as a thousands separator.
    * **Revenue:** For revenue (`is_revenue=True`), dots are generally treated as decimals (e.g. "375.6 Mio") unless unambiguous multiple dots exist. Billion/Mrd is converted to 1000 Million.
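
    The locale rules above can be sketched as a small standalone parser. The function name and exact rule precedence here are illustrative assumptions, not the project's actual extraction code:

    ```python
    import re

    def parse_german_number(text: str, is_revenue: bool = False) -> float | None:
        """Extract the first number, treating '1.005' as 1005 (thousands separator)."""
        match = re.search(r"\d[\d.,]*", text)
        if match is None:
            return None
        token = match.group(0).rstrip(".,")
        if "," in token:
            # German style: dots are thousands separators, the comma is the decimal mark.
            return float(token.replace(".", "").replace(",", "."))
        if is_revenue and token.count(".") == 1:
            # Revenue values like "375.6 Mio" keep the dot as a decimal point.
            return float(token)
        # A dot followed by exactly 3 digits marks thousands: "1.005" -> 1005.
        if re.fullmatch(r"\d{1,3}(\.\d{3})+", token):
            return float(token.replace(".", ""))
        return float(token)

    print(parse_german_number("1.005 Mitarbeiter"))           # 1005.0
    print(parse_german_number("375.6 Mio", is_revenue=True))  # 375.6
    ```

    The key design choice is the ordering: a comma always wins as the decimal mark, and the dot-plus-three-digits pattern counts as a thousands separator only when no comma is present.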

2. **The Wolfra/Greilmeier/Erding Fixes (Advanced Metric Parsing):**
    * **Problem:** Simple regex parsers fail on complex sentences with multiple numbers, concatenated years, or misleading prefixes.
    * **Solution (Hybrid Extraction & Regression Testing):**
        1. **LLM Guidance:** The LLM provides an `expected_value` (e.g., "8.000 m²").
        2. **Robust Python Parser (`MetricParser`):** This parser aggressively cleans the `expected_value` (stripping units like "m²") to get a numerical target. It then intelligently searches the full text for this target, ignoring other numbers (like the "2" in "An 2 Standorten").
        3. **Specific Bug Fixes:**
            - **Year-Suffix:** Logic to detect and remove trailing years from concatenated numbers (e.g., "802020" -> "80").
            - **Year-Prefix:** Logic to ignore year-like numbers (1900-2100) if other, more likely candidates exist in the text.
            - **Sentence Truncation:** Removed overly aggressive logic that cut off sentences after a hyphen, which caused metrics at the end of a phrase to be missed.
    * **Safeguard:** These specific cases are now locked in via `test_metric_parser.py` to prevent future regressions.
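
    The year-suffix and year-prefix heuristics described above can be sketched as follows. Function names, the 1900-2100 window, and the candidate-selection rule are assumptions for illustration; the real logic lives in `MetricParser`:

    ```python
    def strip_trailing_year(token: str) -> str:
        """Detect a year (1900-2100) glued onto the end of a number: '802020' -> '80'."""
        if len(token) > 4:
            head, tail = token[:-4], token[-4:]
            if tail.isdigit() and 1900 <= int(tail) <= 2100:
                return head
        return token

    def pick_candidate(numbers: list[int]) -> int | None:
        """Prefer non-year candidates; fall back to a year only if nothing else exists."""
        non_years = [n for n in numbers if not (1900 <= n <= 2100)]
        return (non_years or numbers or [None])[0]

    print(strip_trailing_year("802020"))   # '80'
    print(pick_candidate([2022, 200000]))  # 200000
    ```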

3. **LLM JSON Stability:**
    * **Problem:** LLMs often wrap JSON in Markdown blocks (` ```json `), causing `json.loads()` to fail.
    * **Solution:** ALWAYS use a `clean_json_response` helper that strips the markers before parsing. Never trust raw LLM output.
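
    A minimal `clean_json_response` along these lines (the helper's name comes from the codebase; this implementation is an assumed sketch):

    ```python
    import json
    import re

    def clean_json_response(raw: str):
        """Strip Markdown code fences (```json ... ```) before parsing LLM output."""
        cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
        return json.loads(cleaned)

    print(clean_json_response('```json\n{"name": "Acme"}\n```'))  # {'name': 'Acme'}
    ```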

4. **LLM Structure Inconsistency:**
    * **Problem:** Even with `json_mode=True`, models sometimes wrap the result in a list `[...]` instead of a flat object `{...}`, breaking frontend property access.
    * **Solution:** Implement a check: `if isinstance(result, list): result = result[0]`.

5. **Scraping Navigation:**
    * **Problem:** Searching for "Impressum" only on the *scraped* URL (which might be a subpage found via Google) often fails.
    * **Solution:** Always implement a fallback to the **Root Domain** AND a **2-Hop check** via the "Kontakt" page.

6. **Frontend State Management:**
    * **Problem:** Users didn't see when a background job finished.
    * **Solution:** A polling mechanism (`setInterval`) tied to an `isProcessing` state is superior to static timeouts for long-running AI tasks.

7. **Hyper-Personalized Marketing Engine (v3.2) - "Deep Persona Injection":**
    * **Problem:** Marketing texts were too generic and didn't reflect the specific psychological or operative profile of the different target roles (e.g., CFO vs. Facility Manager).
    * **Solution (Deep Sync & Prompt Hardening):**
        1. **Extended Schema:** Added `description`, `convincing_arguments`, and `kpis` to the `Persona` database model to store richer profile data.
        2. **Notion Master Sync:** Updated the synchronization logic to pull these deep insights directly from the Notion "Personas / Roles" database.
        3. **Role-Centric Prompts:** The `MarketingMatrix` generator was re-engineered to inject the persona's "Mindset" and "KPIs" into the prompt.
    * **Example (Healthcare):**
        - **Infrastructure Lead:** Now focuses on "IT Security", "DSGVO Compliance", and "WLAN integration".
        - **Economic Buyer (CFO):** Focuses on "ROI Amortization", "Reduction of Overtime", and "Flexible Financing (RaaS)".
    * **Verification:** Verified that the transition from a company-specific **Opener** (e.g., observing staff shortages at Klinikum Erding) to the **Role-specific Intro** (e.g., pitching transport robots to reduce walking distances for nursing directors) is seamless and logical.

## Metric Parser - Regression Tests

To ensure the stability and accuracy of the metric extraction logic, a dedicated test suite (`/company-explorer/backend/tests/test_metric_parser.py`) has been created. It covers the following critical, real-world bug fixes:

1. **`test_wolfra_concatenated_year_bug`**:
    * **Problem:** A number and year were concatenated (e.g., "802020").
    * **Test:** Ensures the parser correctly identifies and strips the trailing year, extracting `80`.

2. **`test_erding_year_prefix_bug`**:
    * **Problem:** A year appeared before the actual metric in the sentence (e.g., "2022 ... 200.000 Besucher").
    * **Test:** Verifies that the parser's "Smart Year Skip" logic ignores the year and correctly extracts `200000`.

3. **`test_greilmeier_multiple_numbers_bug`**:
    * **Problem:** The text contained multiple numbers ("An 2 Standorten ... 8.000 m²"), and the parser incorrectly picked the first one.
    * **Test:** Confirms that when an `expected_value` (like "8.000 m²") is provided, the parser correctly cleans it and extracts the corresponding number (`8000`), ignoring other irrelevant numbers.

These tests are crucial for preventing regressions as the parser logic evolves.

## Notion Maintenance & Data Sync

Since the "Golden Record" for Industry Verticals (Pains, Gains, Products) resides in Notion, specific tools are available to read and sync this data.

**Location:** `/app/company-explorer/backend/scripts/notion_maintenance/`

**Prerequisites:**

- Ensure `.env` is loaded with `NOTION_API_KEY` and the correct DB IDs.

**Key Scripts:**

1. **`check_relations.py` (Reader - Deep):**
    * **Purpose:** Reads Verticals and resolves linked Product Categories (Relation IDs -> Names). Essential for verifying the "Primary/Secondary Product" logic.
    * **Usage:** `python3 check_relations.py`

2. **`update_notion_full.py` (Writer - Batch):**
    * **Purpose:** Batch-updates Pains and Gains for multiple verticals. Use this as a template when refining the messaging strategy.
    * **Usage:** Edit the dictionary in the script, then run `python3 update_notion_full.py`.

3. **`list_notion_structure.py` (Schema Discovery):**
    * **Purpose:** Lists all property keys and page titles. Use this to debug schema changes (e.g. if a column was renamed).
    * **Usage:** `python3 list_notion_structure.py`

## Next Steps (Updated Feb 27, 2026)

***NOTE:*** *This section is outdated. The current next steps concern the migration preparation and are documented in [`RELOCATION.md`](./RELOCATION.md).*

* **Notion Content:** Finalize "Pains" and "Gains" for all 25 verticals in the Notion master database.
* **Intelligence:** Run `generate_matrix.py` in the Company Explorer backend to populate the matrix for all new English vertical names.
* **Automation:** Register the production webhook (requires `admin-webhooks` rights) to enable real-time CRM sync without manual job injection.
* **Execution:** Connect the "Sending Engine" (the actual email dispatch logic) to the SuperOffice fields.
* **Monitoring:** Monitor the 'Atomic PATCH' logs in production for any 400 errors regarding field length or specific character sets.

## Company Explorer Access & Debugging

The **Company Explorer** is the central intelligence engine.

**Core Paths:**

* **Database:** `/app/companies_v3_fixed_2.db` (SQLite)
* **Backend Code:** `/app/company-explorer/backend/`
* **Logs:** `/app/logs_debug/company_explorer_debug.log`

**Accessing Data:**

To inspect live data without starting the full stack, use `sqlite3` directly or the helper scripts (if the environment permits).

* **Direct SQL:** `sqlite3 /app/companies_v3_fixed_2.db "SELECT * FROM companies WHERE name LIKE '%Firma%';"`
* **Python (requires env):** The app runs in a Docker container. When debugging from outside (CLI agent), Python dependencies like `sqlalchemy` might be missing in the global scope. Prefer `sqlite3` for quick checks.

**Key Endpoints (Internal API :8000):**

* `POST /api/provision/superoffice-contact`: Triggers the text generation logic.
* `GET /api/companies/{id}`: Full company profile including enrichment data.

**Troubleshooting:**

* **"BaseModel" Error:** Usually a mix-up between Pydantic and SQLAlchemy `Base`. Check the imports in `database.py`.
* **Missing Dependencies:** The CLI agent runs in `/app` but not necessarily inside the container's venv. Use standard tools (`grep`, `sqlite3`) where possible.

---

## Critical Debugging Session (Feb 21, 2026) - Re-Stabilizing the Analysis Engine

A critical session was required to fix a series of cascading failures in the `ClassificationService`. The key takeaways are documented here to prevent future issues.

1. **The "Phantom" `NameError`:**
    * **Symptom:** The application crashed with a `NameError: name 'joinedload' is not defined`, even though the import was correctly added to `classification.py`.
    * **Root Cause:** The `uvicorn` server's hot-reload mechanism within the Docker container did not reliably pick up file changes made from outside the container. A simple `docker-compose restart` was insufficient to clear the process's cached state.
    * **Solution:** After any significant code change, especially to imports or core logic, a forced re-creation of the container is **mandatory**.

    ```bash
    # Correct way to apply changes:
    docker-compose up -d --build --force-recreate company-explorer
    ```

2. **The "Invisible" Logs:**
    * **Symptom:** No debug logs were being written, making it impossible to trace the execution flow.
    * **Root Cause:** The `LOG_DIR` path in `/company-explorer/backend/config.py` was misconfigured (`/app/logs_debug`) and did not point to the actual, historical log directory (`/app/Log_from_docker`).
    * **Solution:** Configuration paths must be treated as absolute and verified. Correcting the `LOG_DIR` path immediately resolved the issue.

3. **Inefficient Debugging Loop:**
    * **Symptom:** The cycle of triggering a background job via API, waiting, and then manually checking logs was slow and inefficient.
    * **Root Cause:** Lack of a tool to test the core application logic in isolation.
    * **Solution:** A dedicated, interactive test script was created (`/company-explorer/backend/scripts/debug_single_company.py`). It runs the entire analysis for a single company in the foreground, providing immediate, detailed feedback. This pattern is invaluable for complex, multi-step processes and should be standard for future development.

## Production Migration & Multi-Campaign Support (Feb 27, 2026)

The system has been fully migrated to the SuperOffice production environment (`online3.superoffice.com`, tenant `Cust26720`).

### 1. Final UDF Mappings (Production)

These ProgIDs are verified and active for the production tenant:

| Field Purpose | Entity | ProgID | Notes |
| :--- | :--- | :--- | :--- |
| **MA Subject** | Person | `SuperOffice:19` | |
| **MA Intro** | Person | `SuperOffice:20` | |
| **MA Social Proof** | Person | `SuperOffice:21` | |
| **MA Unsubscribe** | Person | `SuperOffice:22` | URL format |
| **MA Campaign** | Person | `SuperOffice:23` | List field (uses `:DisplayText`) |
| **Vertical** | Contact | `SuperOffice:83` | List field (mapped via JSON) |
| **AI Summary** | Contact | `SuperOffice:84` | Truncated to 132 chars |
| **AI Last Update** | Contact | `SuperOffice:85` | Format: `[D:MM/DD/YYYY HH:MM:SS]` |
| **Opener Primary** | Contact | `SuperOffice:86` | |
| **Opener Secondary** | Contact | `SuperOffice:87` | |
| **Last Outreach** | Contact | `SuperOffice:88` | |

### 2. Vertical ID Mapping (Production)

The full list of 25 verticals with their internal SuperOffice IDs (list `udlist331`):

`Automotive - Dealer: 1613, Corporate - Campus: 1614, Energy - Grid & Utilities: 1615, Energy - Solar/Wind: 1616, Healthcare - Care Home: 1617, Healthcare - Hospital: 1618, Hospitality - Gastronomy: 1619, Hospitality - Hotel: 1620, Industry - Manufacturing: 1621, Infrastructure - Communities: 1622, Infrastructure - Public: 1623, Infrastructure - Transport: 1624, Infrastructure - Parking: 1625, Leisure - Entertainment: 1626, Leisure - Fitness: 1627, Leisure - Indoor Active: 1628, Leisure - Outdoor Park: 1629, Leisure - Wet & Spa: 1630, Logistics - Warehouse: 1631, Others: 1632, Reinigungsdienstleister: 1633, Retail - Food: 1634, Retail - Non-Food: 1635, Retail - Shopping Center: 1636, Tech - Data Center: 1637`.

### 3. Technical Lessons Learned (SO REST API)

1. **Atomic PATCH (Stability):** Bundling all contact updates into a single `PATCH` request to the `/Contact/{id}` endpoint is far more stable than sequential UDF updates. If one field fails (e.g. an invalid property), the whole transaction might roll back or partially fail; proactive validation is key.
2. **Website Sync (`Urls` Array):** Updating the website via REST requires manipulating the `Urls` array property. Simple field assignment to `UrlAddress` fails during `PATCH`.
    * *Correct format:* `"Urls": [{"Value": "https://example.com", "Description": "AI Discovered"}]`.
3. **List Resolution (`:DisplayText`):** To get the clean string value of a list field (like Campaign Name) without extra API calls, use the pseudo-field `ProgID:DisplayText` in the `$select` parameter.
4. **Field Length Limits:** Standard SuperOffice text UDFs are limited to approx. 140-254 characters. AI-generated summaries must be truncated (e.g. to 132 chars) to avoid 400 Bad Request errors.
5. **Docker `env_file` Importance:** For production, mapping individual variables in `docker-compose.yml` is error-prone. Using `env_file: .env` ensures all services stay synchronized with the latest UDF IDs and mappings.
6. **Production URL Schema:** The production API is strictly hosted on `online3.superoffice.com` (for this tenant), while OAuth remains at `online.superoffice.com`.
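
Lessons 1, 2, and 4 above can be combined into a single atomic PATCH body. The builder below is a hedged sketch: the `op`/`path` document shape and the UDF path syntax are assumptions that should be checked against the SuperOffice REST documentation; the ProgID `SuperOffice:84` and the 132-character limit come from the mapping table above.

```python
def build_contact_patch(summary: str, website: str) -> list[dict]:
    """One atomic PATCH body instead of sequential UDF updates."""
    return [
        # Website must go through the Urls array; assigning UrlAddress fails.
        {"op": "replace", "path": "Urls",
         "value": [{"Value": website, "Description": "AI Discovered"}]},
        # Text UDFs cap at ~140-254 chars, so truncate the AI summary to 132.
        {"op": "replace", "path": "UserDefinedFields/SuperOffice:84",
         "value": summary[:132]},
    ]

patch = build_contact_patch("A" * 200, "https://example.com")
print(len(patch[1]["value"]))  # 132
```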

### 4. Campaign Trigger Logic

The `worker.py` (v1.8) now extracts the `campaign_tag` from `SuperOffice:23:DisplayText`. This tag is passed to the Company Explorer's provisioning API. If a matching entry exists in the `MarketingMatrix` for that tag, campaign-specific texts are used; otherwise, it falls back to the "standard" cold-outreach (Kaltakquise) texts.

### 5. SuperOffice Authentication (Critical Update Feb 28, 2026)

**Problem:** Authentication failures ("Invalid refresh token" or "Invalid client_id") occurred because a standard `load_dotenv()` did not override stale environment variables already present in the shell process.

**Solution:** Always use `load_dotenv(override=True)` in Python scripts to force loading the actual values from the `.env` file.

**Correct Authentication Pattern (Python):**

```python
from dotenv import load_dotenv
import os

# CRITICAL: override=True ensures we read from .env even if env vars are already set
load_dotenv(override=True)

client_id = os.getenv("SO_CLIENT_ID")
# ...
```

**Known Working Config (Production):**

* **Environment:** `online3`
* **Tenant:** `Cust26720`
* **Token Logic:** The `AuthHandler` implementation in `health_check_so.py` is the reference standard. Avoid using the legacy `superoffice_client.py` without verifying it uses `override=True`.

### 6. Sales & Opportunities (Roboplanet Specifics)

When creating sales via the API, specific constraints apply because the tenant is shared with Wackler:

* **SaleTypeId:** MUST be **14** (`GE:"Roboplanet Verkauf";`) to ensure the sale is assigned to the correct business unit.
    * *Alternative:* ID 16 (`GE:"Roboplanet Teststellung";`) for trials.
* **Mandatory Fields:**
    * `Saledate` (Estimated Date): Must be provided in ISO format (e.g., `YYYY-MM-DDTHH:MM:SSZ`).
    * `Person`: Linking to a specific person, not just the company, is highly recommended.
* **Context:** Avoid creating sales on the parent company "Wackler Service Group" (ID 3). Always target the specific lead company.
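
A payload sketch under the constraints above. The field names and casing are assumptions based on the REST entity layout and should be verified against the API; the IDs 14/16 and the ISO date requirement come from this section.

```python
from datetime import datetime, timezone

def build_sale_payload(contact_id: int, person_id: int, heading: str) -> dict:
    return {
        "Heading": heading,
        "Contact": {"ContactId": contact_id},
        "Person": {"PersonId": person_id},  # link a specific person, not just the company
        "SaleTypeId": 14,                   # GE:"Roboplanet Verkauf" (16 = Teststellung)
        "Saledate": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }

payload = build_sale_payload(4711, 815, "2x Omnie CD-01")
print(payload["SaleTypeId"])  # 14
```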
### Analysis of the SuperOffice `Sale` Entity (March 2026)

- **Goal:** Create a report showing which customers have been offered or have bought which products. The initial assumption was that product information is often captured as free-text entries rather than via the official product catalog.
- **Problem:** Examining the data structure showed that the API endpoints for querying `Quote` objects (offers) and `QuoteLines` (offer line items) across `Sale`, `Contact`, or `Project` relations did not work reliably. Many queries resulted in `500 Internal Server Errors` or empty result sets, making a direct sale-to-product mapping impossible.
- **Key finding (data structure):**
    1. **Free text instead of structured data:** Analyzing a concrete `Sale` object (ID `342243`) confirmed the original hypothesis. Product information (e.g. `2xOmnie CD-01 mit Nachlass`) is entered as free text directly into the `Heading` (subject) field of the `Sale` object. Linked `Quote` or `QuoteLine` entities often do not exist.
    2. **Data quality of links:** A significant number of `Sale` objects in the system have no link to a `Contact` object (`Contact: null`). This makes automatically mapping sales to customers considerably harder.
- **Next step / solution:** A script (`/app/connector-superoffice/generate_customer_product_report.py`) was developed to address these problems. It queries only `Sale` objects that have a valid `Contact` link (`$filter=Contact ne null`), then extracts the customer name and the sale's `Heading` field and searches the latter for predefined product keywords. The results are written to a CSV file (`product_report.csv`) for manual analysis. This approach is the only reliable way to extract the desired information from the system.
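
The core of that approach can be sketched as follows. The keyword list is illustrative and the API fetch is stubbed out; the real implementation is `generate_customer_product_report.py`:

```python
import csv

PRODUCT_KEYWORDS = ["Omnie", "CD-01"]  # illustrative keyword list

def extract_products(heading: str) -> list[str]:
    """Keyword-search the free-text Heading field of a Sale."""
    return [kw for kw in PRODUCT_KEYWORDS if kw.lower() in heading.lower()]

def write_report(sales: list[dict], path: str) -> None:
    """sales: Sale objects already filtered via `$filter=Contact ne null`."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["customer", "heading", "matched_products"])
        for sale in sales:
            writer.writerow([
                sale["Contact"]["Name"],
                sale["Heading"],
                "; ".join(extract_products(sale["Heading"])),
            ])

write_report(
    [{"Contact": {"Name": "Klinikum Erding"}, "Heading": "2xOmnie CD-01 mit Nachlass"}],
    "product_report.csv",
)
```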

### 7. Service & Tickets (Anfragen)

SuperOffice tickets represent the support and request system. Like sales, they are organized to allow separation between Roboplanet and Wackler.

* **Entity Name:** `ticket`
* **Roboplanet-Specific Categories (CategoryId):**
    * **ID 46:** `GE:"Lead Roboplanet";`
    * **ID 47:** `GE:"Vertriebspartner Roboplanet";`
    * **ID 48:** `GE:"Weitergabe Roboplanet";`
    * **Hierarchical:** `Roboplanet/Support` (often used for technical issues).
* **Key Fields:**
    * `ticketId`: Internal ID.
    * `title`: The subject of the request.
    * `contactId` / `personId`: Links to company and contact person.
    * `ticketStatusId`: 1 (Unbearbeitet / open), 2 (In Arbeit / in progress), 3 (Bearbeitet / done).
    * `ownedBy`: Often "ROBO" for Roboplanet staff.
* **Cross-Links:** Tickets can be linked to `saleId` (to track support during a sale) or `projectId`.

---

This is the core logic used to generate the company-specific opener.