[30388f42] Infrastructure Hardening: Repaired CE/Connector DB schema, fixed frontend styling build, implemented robust echo shield in worker v2.1.1, and integrated Lead Engine into gateway.

2026-03-07 14:08:42 +00:00
parent efcaa57cf0
commit ae2303b733
404 changed files with 24100 additions and 13301 deletions
--- a/GEMINI.md
+++ b/GEMINI.md
@@ -20,6 +20,41 @@ This has happened several times in the past and has led to massive data loss
 - **Git repository:** This project is managed in a Git repository. All code changes are versioned. See the "Git Workflow & Conventions" section for our working rules.
    - **IMPORTANT:** The AI agent can commit changes but, for security reasons, often cannot run `git push`. Please run `git push` manually when the agent reports this.

+---
+## ‼️ Current Project Focus (March 2026): Migration & Stabilization
+
+**On March 7, 2026, the system was fully stabilized and prepared for the move to the Ubuntu VM (`docker1`).**
+
+All current tasks for the move are centralized here:
+➡️ **[`RELOCATION.md`](./RELOCATION.md)**
+
+---
+
+## ✅ Current Status (March 7, 2026) - STABLE
+
+The system runs stably on the Synology development environment.
+
+### 1. SuperOffice Connector (v2.1.1 - "Echo Shield")
+*   **Echo prevention (hardening):** On startup, the worker (`worker.py`) identifies itself dynamically (`/Associate/Me`) and strictly ignores all events triggered by its own user (e.g. ID 528).
+*   **Field filter:** Changes are processed only when relevant fields (Name, URL, JobTitle) are affected. Irrelevant updates (e.g. `lastUpdated`) are skipped.
+*   **Webhook:** Registered at `https://floke-ai.duckdns.org/connector/webhook` with token validation in the query string.
+
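The echo shield and field filter described above can be sketched as a single predicate. This is a minimal illustration, not the code in `worker.py`: the function name `should_process`, the event keys `ChangedByAssociateId` and `Changes`, and the lowercase field names are assumptions about the webhook payload.

```python
# Hypothetical set of fields that make an event worth processing
# (lowercased forms of Name, URL, JobTitle from the notes above).
RELEVANT_FIELDS = {"name", "url", "jobtitle"}


def should_process(event: dict, own_associate_id: int) -> bool:
    """Return True only for events that are not echoes and touch a relevant field."""
    # Echo shield: drop anything triggered by the worker's own API user.
    if event.get("ChangedByAssociateId") == own_associate_id:
        return False
    # Field filter: at least one changed field must be relevant.
    changed = {str(field).lower() for field in event.get("Changes", [])}
    return bool(changed & RELEVANT_FIELDS)
```

In this sketch `own_associate_id` would be the ID returned by the `/Associate/Me` lookup at startup (528 in the example above), so the check keeps working if the API user changes.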
+### 2. Company Explorer (v0.7.4)
+*   **Database:** Schema repaired (`fix_missing_columns.py` was run). The missing columns (`street`, `zip_code`, `unsubscribe_token`) are now present.
+*   **Frontend:** Build pipeline repaired. PostCSS/Tailwind generate correct styling again.
+*   **Persistence:** The database is stored safely in the Docker volume `explorer_db_data`.
+
+### 3. Lead Engine (Trading Twins)
+*   **Integration:** Integrated into `docker-compose.yml` and reachable via the gateway at `/lead/`.
+*   **Persistence:** Uses the volume `lead_engine_data`.
+*   **Status:** The UI is running. Email ingest via MS Graph still needs credentials.
+
+### 4. Infrastructure
+*   **Secrets:** All API keys (OpenAI, Gemini, SO, DuckDNS) are kept centrally in the `.env` file.
+*   **DuckDNS:** The service is running and updates the IP successfully.
+
+---
+
 ## Git Workflow & Conventions

 ### Wrapping up the workday with `#fertig`
@@ -37,6 +72,38 @@ When you enter `#fertig`, the agent performs the following steps:
    - The task's status in Notion is set to "Done" (or another suitable status).
 4.  **Commit & Push:** If there are code changes, a commit is created and a `git push` is requested interactively.

+### ⚠️ Troubleshooting: Git `push`/`pull` errors in Docker containers
+
+Occasionally, `git push` or `git pull` run from inside the `gemini-session` Docker container fails with errors such as `Could not resolve host` or `Failed to connect to <Gitea-Domain>`, even though the external Gitea URL (e.g. `floke-gitea.duckdns.org`) is reachable from the host system. The cause is that the Docker container may not use the same DNS resolution mechanisms or may lack a direct connection to the external address.
+
+**Problem:** Standard DNS resolution and external hostnames fail inside the Docker container.
+
+**Solution:** To establish a robust, direct connection to the Gitea container on the *same Docker host*, switch the Git remote URL to the **local IP address of the Docker host** and to **token-based authentication**.
+
+**Configuration steps:**
+
+1.  **Determine the local IP of the Docker host:**
+    *   Find the local IP address of the server (e.g. your DiskStation) on which the Docker containers run. Example: `192.168.178.6`.
+2.  **Get the Gitea token from `.env`:**
+    *   Find the Gitea token (available as `<Token>` in the `.env` file or in the earlier `git remote -v` output). Example: `318c736205934dd066b6bbcb1d732931eaa7c8c4`.
+3.  **Update the Git remote URL:**
+    *   Use the following command to update the remote URL. Replace `<Username>`, `<Token>`, and `<Local-IP-Adresse>` with your values.
+    ```bash
+    git remote set-url origin http://<Username>:<Token>@<Local-IP-Adresse>:3000/Floke/Brancheneinstufung2.git
+    ```
+    *   **Example (with your data):**
+    ```bash
+    git remote set-url origin http://Floke:318c736205934dd066b6bbcb1d732931eaa7c8c4@192.168.178.6:3000/Floke/Brancheneinstufung2.git
+    ```
+    *(Note: For internal Docker communication, `http` instead of `https` is often sufficient and can avoid problems with SSL certificates.)*
+4.  **Verification:**
+    *   Run `git fetch` to test the new configuration. It should now work without a password prompt:
+    ```bash
+    git fetch
+    ```
+
+This configuration ensures a stable Git connection inside your Docker environment.
+

 ## Project Overview

@@ -105,6 +172,17 @@ The system architecture has evolved from a CLI-based toolset to a modern web app
    *   **Problem:** Users didn't see when a background job finished.
    *   **Solution:** Implementing a polling mechanism (`setInterval`) tied to an `isProcessing` state is superior to static timeouts for long-running AI tasks.

+7.  **Hyper-Personalized Marketing Engine (v3.2) - "Deep Persona Injection":**
+    *   **Problem:** Marketing texts were too generic and didn't reflect the specific psychological or operative profile of the different target roles (e.g., CFO vs. Facility Manager).
+    *   **Solution (Deep Sync & Prompt Hardening):**
+        1.  **Extended Schema:** Added `description`, `convincing_arguments`, and `kpis` to the `Persona` database model to store richer profile data.
+        2.  **Notion Master Sync:** Updated the synchronization logic to pull these deep insights directly from the Notion "Personas / Roles" database.
+        3.  **Role-Centric Prompts:** The `MarketingMatrix` generator was re-engineered to inject the persona's "Mindset" and "KPIs" into the prompt.
+    *   **Example (Healthcare):**
+        - **Infrastructure Lead:** Now focuses on "IT Security", "DSGVO Compliance", and "WLAN integration".
+        - **Economic Buyer (CFO):** Focuses on "ROI Amortization", "Reduction of Overtime", and "Flexible Financing (RaaS)".
+    *   **Verification:** Verified that the transition from a company-specific **Opener** (e.g., observing staff shortages at Klinikum Erding) to the **Role-specific Intro** (e.g., pitching transport robots to reduce walking distances for nursing directors) is seamless and logical.
+
 ## Metric Parser - Regression Tests
 To ensure the stability and accuracy of the metric extraction logic, a dedicated test suite (`/company-explorer/backend/tests/test_metric_parser.py`) has been created. It covers the following critical, real-world bug fixes:

@@ -122,8 +200,185 @@ To ensure the stability and accuracy of the metric extraction logic, a dedicated

 These tests are crucial for preventing regressions as the parser logic evolves.

-## Next Steps
-*   **Marketing Automation:** Implement the actual sending logic (or export) based on the contact status.
-*   **Job Role Mapping Engine:** Connect the configured patterns to the contact import/creation process to auto-assign roles.
-*   **Industry Classification Engine:** Connect the configured industries to the AI Analysis prompt to enforce the "Strict Mode" mapping.
-*   **Export:** Generate Excel/CSV enriched reports (already partially implemented via JSON export).
+## Notion Maintenance & Data Sync
+
+Since the "Golden Record" for Industry Verticals (Pains, Gains, Products) resides in Notion, specific tools are available to read and sync this data.
+
+**Location:** `/app/company-explorer/backend/scripts/notion_maintenance/`
+
+**Prerequisites:**
+- Ensure `.env` is loaded with `NOTION_API_KEY` and correct DB IDs.
+
+**Key Scripts:**
+
+1.  **`check_relations.py` (Reader - Deep):**
+    -   **Purpose:** Reads Verticals and resolves linked Product Categories (Relation IDs -> Names). Essential for verifying the "Primary/Secondary Product" logic.
+    -   **Usage:** `python3 check_relations.py`
+
+2.  **`update_notion_full.py` (Writer - Batch):**
+    -   **Purpose:** Batch updates Pains and Gains for multiple verticals. Use this as a template when refining the messaging strategy.
+    -   **Usage:** Edit the dictionary in the script, then run `python3 update_notion_full.py`.
+
+3.  **`list_notion_structure.py` (Schema Discovery):**
+    -   **Purpose:** Lists all property keys and page titles. Use this to debug schema changes (e.g. if a column was renamed).
+    -   **Usage:** `python3 list_notion_structure.py`
+
+## Next Steps (Updated Feb 27, 2026)
+
+***NOTE:*** *This section is outdated. The current next steps concern the migration preparation and are documented in [`RELOCATION.md`](./RELOCATION.md).*
+
+*   **Notion Content:** Finalize "Pains" and "Gains" for all 25 verticals in the Notion master database.
+*   **Intelligence:** Run `generate_matrix.py` in the Company Explorer backend to populate the matrix for all new English vertical names.
+*   **Automation:** Register the production webhook (requires `admin-webhooks` rights) to enable real-time CRM sync without manual job injection.
+*   **Execution:** Connect the "Sending Engine" (the actual email dispatch logic) to the SuperOffice fields.
+*   **Monitoring:** Monitor the 'Atomic PATCH' logs in production for any 400 errors regarding field length or specific character sets.
+
+
+## Company Explorer Access & Debugging
+
+The **Company Explorer** is the central intelligence engine.
+
+**Core Paths:**
+*   **Database:** `/app/companies_v3_fixed_2.db` (SQLite)
+*   **Backend Code:** `/app/company-explorer/backend/`
+*   **Logs:** `/app/logs_debug/company_explorer_debug.log`
+
+**Accessing Data:**
+To inspect live data without starting the full stack, use `sqlite3` directly or the helper scripts (if the environment permits).
+
+*   **Direct SQL:** `sqlite3 /app/companies_v3_fixed_2.db "SELECT * FROM companies WHERE name LIKE '%Firma%';"`
+*   **Python (requires env):** The app runs in a Docker container. When debugging from outside (CLI agent), Python dependencies like `sqlalchemy` might be missing in the global scope. Prefer `sqlite3` for quick checks.
+
+**Key Endpoints (Internal API :8000):**
+*   `POST /api/provision/superoffice-contact`: Triggers the text generation logic.
+*   `GET /api/companies/{id}`: Full company profile including enrichment data.
+
+**Troubleshooting:**
+*   **"BaseModel" Error:** Usually a mix-up between Pydantic and SQLAlchemy `Base`. Check imports in `database.py`.
+*   **Missing Dependencies:** The CLI agent runs in `/app` but not necessarily inside the container's venv. Use standard tools (`grep`, `sqlite3`) where possible.
+
+---
+
+## Critical Debugging Session (Feb 21, 2026) - Re-Stabilizing the Analysis Engine
+
+A critical session was required to fix a series of cascading failures in the `ClassificationService`. The key takeaways are documented here to prevent future issues.
+
+1.  **The "Phantom" `NameError`:**
+    *   **Symptom:** The application crashed with a `NameError: name 'joinedload' is not defined`, even though the import was correctly added to `classification.py`.
+    *   **Root Cause:** The `uvicorn` server's hot-reload mechanism within the Docker container did not reliably pick up file changes made from outside the container. A simple `docker-compose restart` was insufficient to clear the process's cached state.
+    *   **Solution:** After any significant code change, especially to imports or core logic, a forced-recreation of the container is **mandatory**.
+        ```bash
+        # Correct Way to Apply Changes:
+        docker-compose up -d --build --force-recreate company-explorer
+        ```
+
+2.  **The "Invisible" Logs:**
+    *   **Symptom:** No debug logs were being written, making it impossible to trace the execution flow.
+    *   **Root Cause:** The `LOG_DIR` path in `/company-explorer/backend/config.py` was misconfigured (`/app/logs_debug`) and did not point to the actual, historical log directory (`/app/Log_from_docker`).
+    *   **Solution:** Configuration paths must be treated as absolute and verified. Correcting the `LOG_DIR` path immediately resolved the issue.
+
+3.  **Inefficient Debugging Loop:**
+    *   **Symptom:** The cycle of triggering a background job via API, waiting, and then manually checking logs was slow and inefficient.
+    *   **Root Cause:** Lack of a tool to test the core application logic in isolation.
+    *   **Solution:** The creation of a dedicated, interactive test script (`/company-explorer/backend/scripts/debug_single_company.py`). This script allows running the entire analysis for a single company in the foreground, providing immediate and detailed feedback. This pattern is invaluable for complex, multi-step processes and should be a standard for future development.
+## Production Migration & Multi-Campaign Support (Feb 27, 2026)
+
+The system has been fully migrated to the SuperOffice production environment (`online3.superoffice.com`, tenant `Cust26720`). 
+
+### 1. Final UDF Mappings (Production)
+These ProgIDs are verified and active for the production tenant:
+
+| Field Purpose | Entity | ProgID | Notes |
+| :--- | :--- | :--- | :--- |
+| **MA Subject** | Person | `SuperOffice:19` | |
+| **MA Intro** | Person | `SuperOffice:20` | |
+| **MA Social Proof** | Person | `SuperOffice:21` | |
+| **MA Unsubscribe** | Person | `SuperOffice:22` | URL format |
+| **MA Campaign** | Person | `SuperOffice:23` | List field (uses `:DisplayText`) |
+| **Vertical** | Contact | `SuperOffice:83` | List field (mapped via JSON) |
+| **AI Summary** | Contact | `SuperOffice:84` | Truncated to 132 chars |
+| **AI Last Update** | Contact | `SuperOffice:85` | Format: `[D:MM/DD/YYYY HH:MM:SS]` |
+| **Opener Primary** | Contact | `SuperOffice:86` | |
+| **Opener Secondary**| Contact | `SuperOffice:87` | |
+| **Last Outreach** | Contact | `SuperOffice:88` | |
+
+### 2. Vertical ID Mapping (Production)
+The full list of 25 verticals with their internal SuperOffice IDs (List `udlist331`):
+`Automotive - Dealer: 1613, Corporate - Campus: 1614, Energy - Grid & Utilities: 1615, Energy - Solar/Wind: 1616, Healthcare - Care Home: 1617, Healthcare - Hospital: 1618, Hospitality - Gastronomy: 1619, Hospitality - Hotel: 1620, Industry - Manufacturing: 1621, Infrastructure - Communities: 1622, Infrastructure - Public: 1623, Infrastructure - Transport: 1624, Infrastructure - Parking: 1625, Leisure - Entertainment: 1626, Leisure - Fitness: 1627, Leisure - Indoor Active: 1628, Leisure - Outdoor Park: 1629, Leisure - Wet & Spa: 1630, Logistics - Warehouse: 1631, Others: 1632, Reinigungsdienstleister: 1633, Retail - Food: 1634, Retail - Non-Food: 1635, Retail - Shopping Center: 1636, Tech - Data Center: 1637`.
+
+### 3. Technical Lessons Learned (SO REST API)
+
+1.  **Atomic PATCH (Stability):** Bundling all contact updates into a single `PATCH` request to the `/Contact/{id}` endpoint is far more stable than sequential UDF updates. If one field fails (e.g. an invalid property), the whole request may be rolled back or only partially applied, so validate the payload before sending it.
+2.  **Website Sync (`Urls` Array):** Updating the website via REST requires manipulating the `Urls` array property. Simple field assignment to `UrlAddress` fails during `PATCH`.
+    *   *Correct Format:* `"Urls": [{"Value": "https://example.com", "Description": "AI Discovered"}]`.
+3.  **List Resolution (`:DisplayText`):** To get the clean string value of a list field (like Campaign Name) without extra API calls, use the pseudo-field `ProgID:DisplayText` in the `$select` parameter.
+4.  **Field Length Limits:** Standard SuperOffice text UDFs are limited to approx. 140-254 characters. AI-generated summaries must be truncated (e.g. 132 chars) to avoid 400 Bad Request errors.
+5.  **Docker `env_file` Importance:** For production, mapping individual variables in `docker-compose.yml` is error-prone. Using `env_file: .env` ensures all services stay synchronized with the latest UDF IDs and mappings.
+6.  **Production URL Schema:** The production API is strictly hosted on `online3.superoffice.com` (for this tenant), while OAuth remains at `online.superoffice.com`.
+
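Lessons 1, 2, and 4 can be combined into one payload builder. The sketch below is an illustration under assumptions: the helper name `build_contact_patch` and the `UserDefinedFields` wrapper key are not taken from the connector code, and the exact body shape the production `PATCH` expects should be checked against the tenant. The ProgIDs, the `Urls` format, the 132-character limit, and the date format come from the notes above.

```python
from datetime import datetime

SUMMARY_LIMIT = 132  # truncation length documented for the AI Summary field


def build_contact_patch(website: str, summary: str, updated_at: datetime) -> dict:
    """Bundle all contact updates into one body for a single PATCH request."""
    return {
        # The website must be written through the Urls array, not UrlAddress.
        "Urls": [{"Value": website, "Description": "AI Discovered"}],
        # Assumed wrapper key for UDF values; ProgIDs are from the mapping table.
        "UserDefinedFields": {
            "SuperOffice:84": summary[:SUMMARY_LIMIT],
            "SuperOffice:85": updated_at.strftime("[D:%m/%d/%Y %H:%M:%S]"),
        },
    }
```

Building the whole body in one place also gives a natural spot for validation before the request is sent.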
+### 4. Campaign Trigger Logic
+The `worker.py` (v1.8) now extracts the `campaign_tag` from `SuperOffice:23:DisplayText`. This tag is passed to the Company Explorer's provisioning API. If a matching entry exists in the `MarketingMatrix` for that tag, its campaign-specific texts are used; otherwise, the API falls back to the "standard" cold-outreach (Kaltakquise) texts.
+
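The fallback rule can be sketched as a small lookup. Here the `MarketingMatrix` is modeled as a plain dictionary keyed by campaign tag, which is an assumption made for illustration (the real matrix is a database model), and `select_texts` is a hypothetical helper name.

```python
from typing import Optional


def select_texts(matrix: dict, campaign_tag: Optional[str]) -> dict:
    """Return campaign-specific texts, or the "standard" texts as a fallback."""
    key = (campaign_tag or "").strip()
    # An empty, missing, or unknown tag falls through to the standard entry.
    return matrix.get(key) or matrix["standard"]
```

Treating an empty tag like an unknown one keeps contacts without a campaign on the standard texts.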
+### 5. SuperOffice Authentication (Critical Update Feb 28, 2026)
+
+**Problem:** Authentication failures ("Invalid refresh token" or "Invalid client_id") occurred because standard `load_dotenv()` did not override stale environment variables present in the shell process.
+
+**Solution:** Always use `load_dotenv(override=True)` in Python scripts to force loading the actual values from the `.env` file.
+
+**Correct Authentication Pattern (Python):**
+```python
+from dotenv import load_dotenv
+import os
+
+# CRITICAL: override=True ensures we read from .env even if env vars are already set
+load_dotenv(override=True)
+
+client_id = os.getenv("SO_CLIENT_ID")
+# ...
+```
+
+**Known Working Config (Production):**
+*   **Environment:** `online3`
+*   **Tenant:** `Cust26720`
+*   **Token Logic:** The `AuthHandler` implementation in `health_check_so.py` is the reference standard. Avoid using legacy `superoffice_client.py` without verifying it uses `override=True`.
+
+### 6. Sales & Opportunities (Roboplanet Specifics)
+
+When creating sales via API, specific constraints apply due to the shared tenant with Wackler:
+
+*   **SaleTypeId:** MUST be **14** (`GE:"Roboplanet Verkauf";`) to ensure the sale is assigned to the correct business unit.
+    *   *Alternative:* ID 16 (`GE:"Roboplanet Teststellung";`) for trials.
+*   **Mandatory Fields:**
+    *   `Saledate` (Estimated Date): Must be provided in ISO format (e.g., `YYYY-MM-DDTHH:MM:SSZ`).
+    *   `Person`: Not strictly required, but linking to a specific person (not just the company) is highly recommended.
+*   **Context:** Avoid creating sales on the parent company "Wackler Service Group" (ID 3). Always target the specific lead company.
+
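These constraints can be collected in one guard function. This is a sketch only: `build_sale_payload` is a hypothetical helper, and the nested key names (`SaleType`, `Contact`, `Person`, `Heading`) are assumptions about the request body that should be verified against the API. The IDs 14, 16, and 3 and the date format come from the notes above.

```python
from datetime import datetime, timezone
from typing import Optional

ROBOPLANET_SALE_TYPE_ID = 14   # "Roboplanet Verkauf"
ROBOPLANET_TRIAL_TYPE_ID = 16  # "Roboplanet Teststellung"
WACKLER_PARENT_CONTACT_ID = 3  # parent company, never a sale target


def build_sale_payload(heading: str, contact_id: int, sale_date: datetime,
                       person_id: Optional[int] = None, trial: bool = False) -> dict:
    """Build a sale body that respects the shared-tenant constraints."""
    if contact_id == WACKLER_PARENT_CONTACT_ID:
        raise ValueError("Do not create sales on the parent company (ID 3).")
    payload = {
        "Heading": heading,
        # sale_date must be timezone-aware; it is normalized to UTC here.
        "Saledate": sale_date.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "SaleType": {"Id": ROBOPLANET_TRIAL_TYPE_ID if trial else ROBOPLANET_SALE_TYPE_ID},
        "Contact": {"ContactId": contact_id},
    }
    if person_id is not None:
        payload["Person"] = {"PersonId": person_id}
    return payload
```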
+### Analysis of the SuperOffice `Sale` Entity (March 2026)
+
+- **Goal:** Produce a report showing which customers were offered or bought which products. The initial assumption was that product information is often recorded as free-text entries rather than through the official product catalog.
+- **Problem:** Investigating the data structure showed that the API endpoints for querying `Quote` objects (quotes) and `QuoteLines` (quote line items) across `Sale`, `Contact`, or `Project` relations did not work reliably. Many queries resulted in `500 Internal Server Error` responses or empty result sets, which made a direct link from sale to product impossible.
+- **Key findings (data structure):**
+    1.  **Free text instead of structured data:** Analysis of a concrete `Sale` object (ID `342243`) confirmed the original hypothesis. Product information (e.g. `2xOmnie CD-01 mit Nachlass`) is entered as free text directly in the `Heading` field (subject) of the `Sale` object. Often there are no linked `Quote` or `QuoteLine` entities.
+    2.  **Data quality of links:** A significant number of `Sale` objects in the system have no link to a `Contact` object (`Contact: null`). This makes automatic assignment of sales to customers considerably harder.
+- **Next step / approach:** A script (`/app/connector-superoffice/generate_customer_product_report.py`) was developed to address these problems. It queries only `Sale` objects that have a valid `Contact` link (`$filter=Contact ne null`). It then extracts the customer name and the sale's `Heading` field and searches the latter for predefined product keywords. The results are saved to a CSV file (`product_report.csv`) for manual analysis. This approach is the only reliable way to extract the desired information from the system.
+
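The keyword scan at the heart of that report can be sketched as follows. This is not the content of `generate_customer_product_report.py`: the function names, the shape of the `sales` records, the `Name` key on the contact, and the keyword list are all assumptions for illustration.

```python
import csv
import io

# Hypothetical keyword list; only the example heading above mentions "Omnie".
PRODUCT_KEYWORDS = ["Omnie"]


def extract_rows(sales: list, keywords: list) -> list:
    """Keep sales that have a contact and whose heading mentions a keyword."""
    rows = []
    for sale in sales:
        contact = sale.get("Contact")
        if not contact:  # mirrors the `Contact ne null` filter
            continue
        heading = sale.get("Heading") or ""
        hits = [k for k in keywords if k.lower() in heading.lower()]
        if hits:
            rows.append({"customer": contact.get("Name", ""),
                         "heading": heading,
                         "products": ";".join(hits)})
    return rows


def rows_to_csv(rows: list) -> str:
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["customer", "heading", "products"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Matching is case-insensitive substring search, which suits free-text headings but will need manual review for ambiguous product names.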
+### 7. Service & Tickets (Anfragen)
+
+SuperOffice Tickets represent the support and request system. Like Sales, they are organized to allow separation between Roboplanet and Wackler.
+
+*   **Entity Name:** `ticket`
+*   **Roboplanet Specific Categories (CategoryId):**
+    *   **ID 46:** `GE:"Lead Roboplanet";`
+    *   **ID 47:** `GE:"Vertriebspartner Roboplanet";`
+    *   **ID 48:** `GE:"Weitergabe Roboplanet";`
+    *   **Hierarchical:** `Roboplanet/Support` (often used for technical issues).
+*   **Key Fields:**
+    *   `ticketId`: Internal ID.
+    *   `title`: The subject of the request.
+    *   `contactId` / `personId`: Links to company and contact person.
+    *   `ticketStatusId`: 1 (Unbearbeitet), 2 (In Arbeit), 3 (Bearbeitet).
+    *   `ownedBy`: Often "ROBO" for Roboplanet staff.
+*   **Cross-Links:** Tickets can be linked to `saleId` (to track support during a sale) or `projectId`.
+
+---
+This is the core logic used to generate the company-specific opener.