# Gemini Code Assistant Context

## CRITICAL RULE: DOCUMENTATION PRESERVATION (DO NOT IGNORE)

**IT IS STRICTLY FORBIDDEN TO DELETE DOCUMENTATION OR TO REPLACE IT WITH PLACEHOLDERS SUCH AS `... (rest of the file)`.**

This has happened several times in the past and has led to massive data loss in critical files such as `MIGRATION_PLAN.md`.

**Rules for the agent:**

1. **Never** delete large blocks of text unless the user *explicitly* requests it.
2. **Always** check `git diff` before creating a commit. If a documentation file loses 100 lines, that is almost always a mistake.
3. When updating documentation: **only** add new information or precisely correct outdated passages. **Never** overwrite the rest of the file.
4. If you need to restore a file, use `git log -p <filename>` and make sure you really restore *everything*.

---

## Important Notes

- **Project documentation:** The primary and most comprehensive documentation for this project lives in `readme.md`. Please consult that file for a detailed understanding of the architecture and the individual modules.
- **Git repository:** This project is managed via a Git repository; all code changes are versioned. See the "Git Workflow & Conventions" section for our working rules.
- **IMPORTANT:** The AI agent can commit changes, but for security reasons it often cannot run `git push`. Please run `git push` manually when the agent reports this.

---

## ‼️ Current Project Focus (March 2026): Migration & Stabilization

**The system was successfully stabilized on March 7, 2026 and prepared for the move to the Ubuntu VM (`docker1`).**

All critical components (Company Explorer, Connector, Lead Engine) are now functional and configured to be resilient.

All remaining tasks for the move are centralized here:
➡️ **[`RELOCATION.md`](./RELOCATION.md)**

---

## ✅ Current Status (March 7, 2026) - STABLE & RESILIENT

The system is running stably and is prepared for production operation. Significant progress has been made:

### 1. SuperOffice Connector (v2.1.1 - "Echo Shield")

* **Echo prevention:** A robust "Echo Shield" was implemented in the worker. The worker identifies its own actions (via `ChangedByAssociateId`) and thereby avoids infinite loops. Changes are now only processed for external, relevant field updates (Name, Website, JobTitle).
* **Webhook:** Successfully registered at `https://floke-ai.duckdns.org/connector/webhook` with secure token validation.

### 2. Company Explorer (v0.7.4)

* **Database:** Schema integrity restored. Missing columns (`street`, `zip_code`, `unsubscribe_token`, `strategy_briefing`) were added via migration scripts. No more 500 errors.
* **Frontend:** Build pipeline with PostCSS/Tailwind styling repaired, so the UI works flawlessly again.

### 3. Lead Engine (Trading Twins - Fully Functional)

* **Integration:** Service successfully integrated into the Docker stack and reachable via Nginx under `/lead/` and `/feedback/`.
* **Persistent state:** Lead data and job status are now reliably stored in a SQLite database (`/app/data/trading_twins.db`).
* **Round-trip functionality:** The complete process (Lead -> CE -> AI -> Teams notification -> email with calendar links -> Outlook appointment) works end-to-end.
* **Bug fixes (debugging iterations):**
    * **`sqlalchemy` & imports:** Ensured `sqlalchemy` is installed and corrected module paths (`trading_twins`) in the Docker build.
    * **Nginx routing:** Configuration optimized so `/feedback/` and `/lead/` are forwarded correctly to the FastAPI server. Global `auth_basic` removed to allow access to public endpoints.
    * **FastAPI `root_path`:** Cleaned up to avoid conflicts with Nginx paths.
    * **Server stability:** `uvicorn` now starts as a separate process, and `monitor.py` imports the modules cleanly.
    * **API keys:** All required keys (`INFO_*`, `CAL_*`, `SERP_API`, `WEBHOOK_*`, `GEMINI_API_KEY`) are correctly mapped from `.env` into the containers.


### 4. DuckDNS & DNS Monitor

* **Successfully reactivated:** The DynDNS service is running and updating the IP; network connectivity is stable.

---

## Git Workflow & Conventions

### Closing Out the Working Day with `#fertig`

To finish a work step or a task, use the `#fertig` command.

**IMPORTANT:** Do **not** use `/fertig` or plain `fertig`. Only the command with the hash sign (`#`) is recognized correctly.

When you enter `#fertig`, the agent performs the following steps:

1. **Analysis:** The agent checks whether any code changes have been made since the last commit.
2. **Summary:** It generates an automatic work summary based on the code changes.
3. **Status update:** The agent runs the script `python3 dev_session.py --report-status` in the background.
    - The time invested in the current session is calculated and stored in Notion.
    - A new status report with the summary is attached to the Notion task.
    - The task's status in Notion is set to "Done" (or another appropriate status).
4. **Commit & push:** If code changes exist, a commit is created and a `git push` is requested interactively.

### ⚠️ Troubleshooting: Git `push`/`pull` Errors in Docker Containers

Occasionally, `git push` or `git pull` commands run from inside the `gemini-session` Docker container fail with errors such as `Could not resolve host` or `Failed to connect to <Gitea-Domain>`, even though the external Gitea URL (e.g. `floke-gitea.duckdns.org`) is reachable from the host system. This happens because the Docker container may not share the host's DNS resolution mechanisms or have a direct route to the external address.

**Problem:** Standard DNS resolution and external hostnames fail inside the Docker container.

**Solution:** To establish a robust, direct connection to the Gitea container on the *same Docker host*, switch the Git remote URL to the **local IP address of the Docker host** and use **token-based authentication**.

**Configuration steps:**

1. **Determine the local IP of the Docker host:**
    * Find the local IP address of the server (e.g. your Diskstation) on which the Docker containers run. Example: `192.168.178.6`.
2. **Get the Gitea token from `.env`:**
    * Find the Gitea token (available in the format `<Token>` in the `.env` file or in the previous `git remote -v` output). Example: `318c736205934dd066b6bbcb1d732931eaa7c8c4`.
3. **Update the Git remote URL:**
    * Use the following command to update the remote URL. Replace `<Username>`, `<Token>`, and `<Local-IP-Adresse>` with your values.

    ```bash
    git remote set-url origin http://<Username>:<Token>@<Local-IP-Adresse>:3000/Floke/Brancheneinstufung2.git
    ```

    * **Example (with your data):**

    ```bash
    git remote set-url origin http://Floke:318c736205934dd066b6bbcb1d732931eaa7c8c4@192.168.178.6:3000/Floke/Brancheneinstufung2.git
    ```

    *(Note: For internal Docker communication, `http` instead of `https` is often sufficient and can avoid problems with SSL certificates.)*
4. **Verification:**
    * Run `git fetch` to test the new configuration. It should now work without a password prompt:

    ```bash
    git fetch
    ```

This configuration ensures a stable Git connection within your Docker environment.

---

## Project Overview

This project is a Python-based system for automated company data enrichment and lead generation. It focuses on identifying B2B companies with high potential for robotics automation (Cleaning, Transport, Security, Service).

The system architecture has evolved from a CLI-based toolset to a modern web application (`company-explorer`) backed by Docker containers.

## Current Status (Jan 15, 2026) - Company Explorer (Robotics Edition v0.5.0)

### 1. Contacts Management (v0.5)

* **Full CRUD:** Integrated Contact Management system with direct editing capabilities.
* **Global List View:** Dedicated view for all contacts across all companies with search and filter.
* **Data Model:** Supports advanced fields like Academic Title, Role Interpretation (Decision Maker vs. User), and Marketing Automation Status.
* **Bulk Import:** CSV-based bulk import for contacts that automatically creates missing companies and prevents duplicates via email matching.

### 2. UI/UX Modernization

* **Light/Dark Mode:** Full theme support with toggle.
* **Grid Layout:** Unified card-based layout for both Company and Contact lists.
* **Mobile Responsiveness:** Optimized Inspector overlay and navigation for mobile devices.
* **Tabbed Inspector:** Clean separation between Company Overview and Contact Management within the details pane.

### 3. Advanced Configuration (Settings)

* **Industry Verticals:** Database-backed configuration for target industries (Description, Focus Flag, Primary Product).
* **Job Role Mapping:** Configurable patterns (Regex/Text) to map job titles on business cards to internal roles (e.g., "CTO" -> "Innovation Driver").
* **Robotics Categories:** Existing AI reasoning logic remains configurable via the UI.

### 4. Robotics Potential Analysis (v2.3)

* **Chain-of-Thought Logic:** The AI analysis (`ClassificationService`) uses multi-step reasoning to evaluate physical infrastructure.
* **Provider vs. User:** Strict differentiation logic implemented.

### 5. Web Scraping & Legal Data (v2.2)

* **Impressum Scraping:** 2-Hop Strategy and Root Fallback logic.
* **Manual Overrides:** Users can manually correct Wikipedia, Website, and Impressum URLs directly in the UI.

## Lessons Learned & Best Practices

1. **Numeric Extraction (German Locale):**
    * **Problem:** "1.005 Mitarbeiter" was extracted as "1" (treating the dot as a decimal separator).
    * **Solution:** Implemented context-aware logic. If a number has a dot followed by exactly 3 digits (and no comma), it is treated as a thousands separator.
    * **Revenue:** For revenue (`is_revenue=True`), dots are generally treated as decimals (e.g. "375.6 Mio") unless unambiguous multiple dots exist. Billion/Mrd is converted to 1000 Million.
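
    The locale rules above can be sketched as a small standalone parser. The function name and exact rule precedence here are illustrative assumptions, not the project's actual extraction code:

    ```python
    import re

    def parse_german_number(text: str, is_revenue: bool = False) -> float | None:
        """Extract the first number, treating '1.005' as 1005 (thousands separator)."""
        match = re.search(r"\d[\d.,]*", text)
        if match is None:
            return None
        token = match.group(0).rstrip(".,")
        if "," in token:
            # German style: dots are thousands separators, the comma is the decimal mark.
            return float(token.replace(".", "").replace(",", "."))
        if is_revenue and token.count(".") == 1:
            # Revenue values like "375.6 Mio" keep the dot as a decimal point.
            return float(token)
        # A dot followed by exactly 3 digits marks thousands: "1.005" -> 1005.
        if re.fullmatch(r"\d{1,3}(\.\d{3})+", token):
            return float(token.replace(".", ""))
        return float(token)

    print(parse_german_number("1.005 Mitarbeiter"))           # 1005.0
    print(parse_german_number("375.6 Mio", is_revenue=True))  # 375.6
    ```

    The key design choice is the ordering: a comma always wins as the decimal mark, and the dot-plus-three-digits pattern counts as a thousands separator only when no comma is present.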

2. **The Wolfra/Greilmeier/Erding Fixes (Advanced Metric Parsing):**
    * **Problem:** Simple regex parsers fail on complex sentences with multiple numbers, concatenated years, or misleading prefixes.
    * **Solution (Hybrid Extraction & Regression Testing):**
        1. **LLM Guidance:** The LLM provides an `expected_value` (e.g., "8.000 m²").
        2. **Robust Python Parser (`MetricParser`):** This parser aggressively cleans the `expected_value` (stripping units like "m²") to get a numerical target. It then intelligently searches the full text for this target, ignoring other numbers (like the "2" in "An 2 Standorten").
        3. **Specific Bug Fixes:**
            - **Year-Suffix:** Logic to detect and remove trailing years from concatenated numbers (e.g., "802020" -> "80").
            - **Year-Prefix:** Logic to ignore year-like numbers (1900-2100) if other, more likely candidates exist in the text.
            - **Sentence Truncation:** Removed overly aggressive logic that cut off sentences after a hyphen, which caused metrics at the end of a phrase to be missed.
    * **Safeguard:** These specific cases are now locked in via `test_metric_parser.py` to prevent future regressions.
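
    The year-suffix and year-prefix heuristics described above can be sketched as follows. Function names, the 1900-2100 window, and the candidate-selection rule are assumptions for illustration; the real logic lives in `MetricParser`:

    ```python
    def strip_trailing_year(token: str) -> str:
        """Detect a year (1900-2100) glued onto the end of a number: '802020' -> '80'."""
        if len(token) > 4:
            head, tail = token[:-4], token[-4:]
            if tail.isdigit() and 1900 <= int(tail) <= 2100:
                return head
        return token

    def pick_candidate(numbers: list[int]) -> int | None:
        """Prefer non-year candidates; fall back to a year only if nothing else exists."""
        non_years = [n for n in numbers if not (1900 <= n <= 2100)]
        return (non_years or numbers or [None])[0]

    print(strip_trailing_year("802020"))   # '80'
    print(pick_candidate([2022, 200000]))  # 200000
    ```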

3. **LLM JSON Stability:**
    * **Problem:** LLMs often wrap JSON in Markdown blocks (` ```json `), causing `json.loads()` to fail.
    * **Solution:** ALWAYS use a `clean_json_response` helper that strips the markers before parsing. Never trust raw LLM output.
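
    A minimal `clean_json_response` along these lines (the helper's name comes from the codebase; this implementation is an assumed sketch):

    ```python
    import json
    import re

    def clean_json_response(raw: str):
        """Strip Markdown code fences (```json ... ```) before parsing LLM output."""
        cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
        return json.loads(cleaned)

    print(clean_json_response('```json\n{"name": "Acme"}\n```'))  # {'name': 'Acme'}
    ```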

4. **LLM Structure Inconsistency:**
    * **Problem:** Even with `json_mode=True`, models sometimes wrap the result in a list `[...]` instead of a flat object `{...}`, breaking frontend property access.
    * **Solution:** Implement a check: `if isinstance(result, list): result = result[0]`.

5. **Scraping Navigation:**
    * **Problem:** Searching for "Impressum" only on the *scraped* URL (which might be a subpage found via Google) often fails.
    * **Solution:** Always implement a fallback to the **Root Domain** AND a **2-Hop check** via the "Kontakt" page.

6. **Frontend State Management:**
    * **Problem:** Users didn't see when a background job finished.
    * **Solution:** A polling mechanism (`setInterval`) tied to an `isProcessing` state is superior to static timeouts for long-running AI tasks.

7. **Hyper-Personalized Marketing Engine (v3.2) - "Deep Persona Injection":**
    * **Problem:** Marketing texts were too generic and didn't reflect the specific psychological or operative profile of the different target roles (e.g., CFO vs. Facility Manager).
    * **Solution (Deep Sync & Prompt Hardening):**
        1. **Extended Schema:** Added `description`, `convincing_arguments`, and `kpis` to the `Persona` database model to store richer profile data.
        2. **Notion Master Sync:** Updated the synchronization logic to pull these deep insights directly from the Notion "Personas / Roles" database.
        3. **Role-Centric Prompts:** The `MarketingMatrix` generator was re-engineered to inject the persona's "Mindset" and "KPIs" into the prompt.
    * **Example (Healthcare):**
        - **Infrastructure Lead:** Now focuses on "IT Security", "DSGVO Compliance", and "WLAN integration".
        - **Economic Buyer (CFO):** Focuses on "ROI Amortization", "Reduction of Overtime", and "Flexible Financing (RaaS)".
    * **Verification:** Verified that the transition from a company-specific **Opener** (e.g., observing staff shortages at Klinikum Erding) to the **Role-specific Intro** (e.g., pitching transport robots to reduce walking distances for nursing directors) is seamless and logical.

## Metric Parser - Regression Tests

To ensure the stability and accuracy of the metric extraction logic, a dedicated test suite (`/company-explorer/backend/tests/test_metric_parser.py`) has been created. It covers the following critical, real-world bug fixes:

1. **`test_wolfra_concatenated_year_bug`**:
    * **Problem:** A number and year were concatenated (e.g., "802020").
    * **Test:** Ensures the parser correctly identifies and strips the trailing year, extracting `80`.

2. **`test_erding_year_prefix_bug`**:
    * **Problem:** A year appeared before the actual metric in the sentence (e.g., "2022 ... 200.000 Besucher").
    * **Test:** Verifies that the parser's "Smart Year Skip" logic ignores the year and correctly extracts `200000`.

3. **`test_greilmeier_multiple_numbers_bug`**:
    * **Problem:** The text contained multiple numbers ("An 2 Standorten ... 8.000 m²"), and the parser incorrectly picked the first one.
    * **Test:** Confirms that when an `expected_value` (like "8.000 m²") is provided, the parser correctly cleans it and extracts the corresponding number (`8000`), ignoring other irrelevant numbers.

These tests are crucial for preventing regressions as the parser logic evolves.

## Notion Maintenance & Data Sync

Since the "Golden Record" for Industry Verticals (Pains, Gains, Products) resides in Notion, specific tools are available to read and sync this data.

**Location:** `/app/company-explorer/backend/scripts/notion_maintenance/`

**Prerequisites:**

- Ensure `.env` is loaded with `NOTION_API_KEY` and the correct DB IDs.

**Key Scripts:**

1. **`check_relations.py` (Reader - Deep):**
    * **Purpose:** Reads Verticals and resolves linked Product Categories (Relation IDs -> Names). Essential for verifying the "Primary/Secondary Product" logic.
    * **Usage:** `python3 check_relations.py`

2. **`update_notion_full.py` (Writer - Batch):**
    * **Purpose:** Batch-updates Pains and Gains for multiple verticals. Use this as a template when refining the messaging strategy.
    * **Usage:** Edit the dictionary in the script, then run `python3 update_notion_full.py`.

3. **`list_notion_structure.py` (Schema Discovery):**
    * **Purpose:** Lists all property keys and page titles. Use this to debug schema changes (e.g. if a column was renamed).
    * **Usage:** `python3 list_notion_structure.py`

## Next Steps (Updated Feb 27, 2026)

***NOTE:*** *This section is outdated. The current next steps concern the migration preparation and are documented in [`RELOCATION.md`](./RELOCATION.md).*

* **Notion Content:** Finalize "Pains" and "Gains" for all 25 verticals in the Notion master database.
* **Intelligence:** Run `generate_matrix.py` in the Company Explorer backend to populate the matrix for all new English vertical names.
* **Automation:** Register the production webhook (requires `admin-webhooks` rights) to enable real-time CRM sync without manual job injection.
* **Execution:** Connect the "Sending Engine" (the actual email dispatch logic) to the SuperOffice fields.
* **Monitoring:** Monitor the 'Atomic PATCH' logs in production for any 400 errors regarding field length or specific character sets.

## Company Explorer Access & Debugging

The **Company Explorer** is the central intelligence engine.

**Core Paths:**

* **Database:** `/app/companies_v3_fixed_2.db` (SQLite)
* **Backend Code:** `/app/company-explorer/backend/`
* **Logs:** `/app/logs_debug/company_explorer_debug.log`

**Accessing Data:**

To inspect live data without starting the full stack, use `sqlite3` directly or the helper scripts (if the environment permits).

* **Direct SQL:** `sqlite3 /app/companies_v3_fixed_2.db "SELECT * FROM companies WHERE name LIKE '%Firma%';"`
* **Python (requires env):** The app runs in a Docker container. When debugging from outside (CLI agent), Python dependencies like `sqlalchemy` might be missing in the global scope. Prefer `sqlite3` for quick checks.

**Key Endpoints (Internal API :8000):**

* `POST /api/provision/superoffice-contact`: Triggers the text generation logic.
* `GET /api/companies/{id}`: Full company profile including enrichment data.

**Troubleshooting:**

* **"BaseModel" Error:** Usually a mix-up between Pydantic and SQLAlchemy `Base`. Check the imports in `database.py`.
* **Missing Dependencies:** The CLI agent runs in `/app` but not necessarily inside the container's venv. Use standard tools (`grep`, `sqlite3`) where possible.

---

## Critical Debugging Session (Feb 21, 2026) - Re-Stabilizing the Analysis Engine

A critical session was required to fix a series of cascading failures in the `ClassificationService`. The key takeaways are documented here to prevent future issues.

1. **The "Phantom" `NameError`:**
    * **Symptom:** The application crashed with a `NameError: name 'joinedload' is not defined`, even though the import was correctly added to `classification.py`.
    * **Root Cause:** The `uvicorn` server's hot-reload mechanism within the Docker container did not reliably pick up file changes made from outside the container. A simple `docker-compose restart` was insufficient to clear the process's cached state.
    * **Solution:** After any significant code change, especially to imports or core logic, a forced re-creation of the container is **mandatory**.

    ```bash
    # Correct way to apply changes:
    docker-compose up -d --build --force-recreate company-explorer
    ```

2. **The "Invisible" Logs:**
    * **Symptom:** No debug logs were being written, making it impossible to trace the execution flow.
    * **Root Cause:** The `LOG_DIR` path in `/company-explorer/backend/config.py` was misconfigured (`/app/logs_debug`) and did not point to the actual, historical log directory (`/app/Log_from_docker`).
    * **Solution:** Configuration paths must be treated as absolute and verified. Correcting the `LOG_DIR` path immediately resolved the issue.

3. **Inefficient Debugging Loop:**
    * **Symptom:** The cycle of triggering a background job via API, waiting, and then manually checking logs was slow and inefficient.
    * **Root Cause:** Lack of a tool to test the core application logic in isolation.
    * **Solution:** A dedicated, interactive test script was created (`/company-explorer/backend/scripts/debug_single_company.py`). It runs the entire analysis for a single company in the foreground, providing immediate, detailed feedback. This pattern is invaluable for complex, multi-step processes and should be standard for future development.

## Production Migration & Multi-Campaign Support (Feb 27, 2026)

The system has been fully migrated to the SuperOffice production environment (`online3.superoffice.com`, tenant `Cust26720`).

### 1. Final UDF Mappings (Production)

These ProgIDs are verified and active for the production tenant:

| Field Purpose | Entity | ProgID | Notes |
| :--- | :--- | :--- | :--- |
| **MA Subject** | Person | `SuperOffice:19` | |
| **MA Intro** | Person | `SuperOffice:20` | |
| **MA Social Proof** | Person | `SuperOffice:21` | |
| **MA Unsubscribe** | Person | `SuperOffice:22` | URL format |
| **MA Campaign** | Person | `SuperOffice:23` | List field (uses `:DisplayText`) |
| **Vertical** | Contact | `SuperOffice:83` | List field (mapped via JSON) |
| **AI Summary** | Contact | `SuperOffice:84` | Truncated to 132 chars |
| **AI Last Update** | Contact | `SuperOffice:85` | Format: `[D:MM/DD/YYYY HH:MM:SS]` |
| **Opener Primary** | Contact | `SuperOffice:86` | |
| **Opener Secondary** | Contact | `SuperOffice:87` | |
| **Last Outreach** | Contact | `SuperOffice:88` | |

### 2. Vertical ID Mapping (Production)

The full list of 25 verticals with their internal SuperOffice IDs (list `udlist331`):

`Automotive - Dealer: 1613, Corporate - Campus: 1614, Energy - Grid & Utilities: 1615, Energy - Solar/Wind: 1616, Healthcare - Care Home: 1617, Healthcare - Hospital: 1618, Hospitality - Gastronomy: 1619, Hospitality - Hotel: 1620, Industry - Manufacturing: 1621, Infrastructure - Communities: 1622, Infrastructure - Public: 1623, Infrastructure - Transport: 1624, Infrastructure - Parking: 1625, Leisure - Entertainment: 1626, Leisure - Fitness: 1627, Leisure - Indoor Active: 1628, Leisure - Outdoor Park: 1629, Leisure - Wet & Spa: 1630, Logistics - Warehouse: 1631, Others: 1632, Reinigungsdienstleister: 1633, Retail - Food: 1634, Retail - Non-Food: 1635, Retail - Shopping Center: 1636, Tech - Data Center: 1637`.

### 3. Technical Lessons Learned (SO REST API)

1. **Atomic PATCH (Stability):** Bundling all contact updates into a single `PATCH` request to the `/Contact/{id}` endpoint is far more stable than sequential UDF updates. If one field fails (e.g. an invalid property), the whole transaction might roll back or partially fail; proactive validation is key.
2. **Website Sync (`Urls` Array):** Updating the website via REST requires manipulating the `Urls` array property. Simple field assignment to `UrlAddress` fails during `PATCH`.
    * *Correct format:* `"Urls": [{"Value": "https://example.com", "Description": "AI Discovered"}]`.
3. **List Resolution (`:DisplayText`):** To get the clean string value of a list field (like Campaign Name) without extra API calls, use the pseudo-field `ProgID:DisplayText` in the `$select` parameter.
4. **Field Length Limits:** Standard SuperOffice text UDFs are limited to approx. 140-254 characters. AI-generated summaries must be truncated (e.g. to 132 chars) to avoid 400 Bad Request errors.
5. **Docker `env_file` Importance:** For production, mapping individual variables in `docker-compose.yml` is error-prone. Using `env_file: .env` ensures all services stay synchronized with the latest UDF IDs and mappings.
6. **Production URL Schema:** The production API is strictly hosted on `online3.superoffice.com` (for this tenant), while OAuth remains at `online.superoffice.com`.
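
Lessons 1, 2, and 4 above can be combined into a single atomic PATCH body. The builder below is a hedged sketch: the `op`/`path` document shape and the UDF path syntax are assumptions that should be checked against the SuperOffice REST documentation; the ProgID `SuperOffice:84` and the 132-character limit come from the mapping table above.

```python
def build_contact_patch(summary: str, website: str) -> list[dict]:
    """One atomic PATCH body instead of sequential UDF updates."""
    return [
        # Website must go through the Urls array; assigning UrlAddress fails.
        {"op": "replace", "path": "Urls",
         "value": [{"Value": website, "Description": "AI Discovered"}]},
        # Text UDFs cap at ~140-254 chars, so truncate the AI summary to 132.
        {"op": "replace", "path": "UserDefinedFields/SuperOffice:84",
         "value": summary[:132]},
    ]

patch = build_contact_patch("A" * 200, "https://example.com")
print(len(patch[1]["value"]))  # 132
```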

### 4. Campaign Trigger Logic

The `worker.py` (v1.8) now extracts the `campaign_tag` from `SuperOffice:23:DisplayText`. This tag is passed to the Company Explorer's provisioning API. If a matching entry exists in the `MarketingMatrix` for that tag, campaign-specific texts are used; otherwise, it falls back to the "standard" cold-outreach (Kaltakquise) texts.

### 5. SuperOffice Authentication (Critical Update Feb 28, 2026)

**Problem:** Authentication failures ("Invalid refresh token" or "Invalid client_id") occurred because a standard `load_dotenv()` did not override stale environment variables already present in the shell process.

**Solution:** Always use `load_dotenv(override=True)` in Python scripts to force loading the actual values from the `.env` file.

**Correct Authentication Pattern (Python):**

```python
from dotenv import load_dotenv
import os

# CRITICAL: override=True ensures we read from .env even if env vars are already set
load_dotenv(override=True)

client_id = os.getenv("SO_CLIENT_ID")
# ...
```

**Known Working Config (Production):**

* **Environment:** `online3`
* **Tenant:** `Cust26720`
* **Token Logic:** The `AuthHandler` implementation in `health_check_so.py` is the reference standard. Avoid using the legacy `superoffice_client.py` without verifying it uses `override=True`.

### 6. Sales & Opportunities (Roboplanet Specifics)

When creating sales via the API, specific constraints apply because the tenant is shared with Wackler:

* **SaleTypeId:** MUST be **14** (`GE:"Roboplanet Verkauf";`) to ensure the sale is assigned to the correct business unit.
    * *Alternative:* ID 16 (`GE:"Roboplanet Teststellung";`) for trials.
* **Mandatory Fields:**
    * `Saledate` (Estimated Date): Must be provided in ISO format (e.g., `YYYY-MM-DDTHH:MM:SSZ`).
    * `Person`: Linking to a specific person, not just the company, is highly recommended.
* **Context:** Avoid creating sales on the parent company "Wackler Service Group" (ID 3). Always target the specific lead company.
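
A payload sketch under the constraints above. The field names and casing are assumptions based on the REST entity layout and should be verified against the API; the IDs 14/16 and the ISO date requirement come from this section.

```python
from datetime import datetime, timezone

def build_sale_payload(contact_id: int, person_id: int, heading: str) -> dict:
    return {
        "Heading": heading,
        "Contact": {"ContactId": contact_id},
        "Person": {"PersonId": person_id},  # link a specific person, not just the company
        "SaleTypeId": 14,                   # GE:"Roboplanet Verkauf" (16 = Teststellung)
        "Saledate": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }

payload = build_sale_payload(4711, 815, "2x Omnie CD-01")
print(payload["SaleTypeId"])  # 14
```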
### Analysis of the SuperOffice `Sale` Entity (March 2026)

- **Goal:** Create a report showing which customers have been offered or have bought which products. The initial assumption was that product information is often captured as free-text entries rather than via the official product catalog.
- **Problem:** Examining the data structure showed that the API endpoints for querying `Quote` objects (offers) and `QuoteLines` (offer line items) across `Sale`, `Contact`, or `Project` relations did not work reliably. Many queries resulted in `500 Internal Server Errors` or empty result sets, making a direct sale-to-product mapping impossible.
- **Key finding (data structure):**
    1. **Free text instead of structured data:** Analyzing a concrete `Sale` object (ID `342243`) confirmed the original hypothesis. Product information (e.g. `2xOmnie CD-01 mit Nachlass`) is entered as free text directly into the `Heading` (subject) field of the `Sale` object. Linked `Quote` or `QuoteLine` entities often do not exist.
    2. **Data quality of links:** A significant number of `Sale` objects in the system have no link to a `Contact` object (`Contact: null`). This makes automatically mapping sales to customers considerably harder.
- **Next step / solution:** A script (`/app/connector-superoffice/generate_customer_product_report.py`) was developed to address these problems. It queries only `Sale` objects that have a valid `Contact` link (`$filter=Contact ne null`), then extracts the customer name and the sale's `Heading` field and searches the latter for predefined product keywords. The results are written to a CSV file (`product_report.csv`) for manual analysis. This approach is the only reliable way to extract the desired information from the system.
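
The core of that approach can be sketched as follows. The keyword list is illustrative and the API fetch is stubbed out; the real implementation is `generate_customer_product_report.py`:

```python
import csv

PRODUCT_KEYWORDS = ["Omnie", "CD-01"]  # illustrative keyword list

def extract_products(heading: str) -> list[str]:
    """Keyword-search the free-text Heading field of a Sale."""
    return [kw for kw in PRODUCT_KEYWORDS if kw.lower() in heading.lower()]

def write_report(sales: list[dict], path: str) -> None:
    """sales: Sale objects already filtered via `$filter=Contact ne null`."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["customer", "heading", "matched_products"])
        for sale in sales:
            writer.writerow([
                sale["Contact"]["Name"],
                sale["Heading"],
                "; ".join(extract_products(sale["Heading"])),
            ])

write_report(
    [{"Contact": {"Name": "Klinikum Erding"}, "Heading": "2xOmnie CD-01 mit Nachlass"}],
    "product_report.csv",
)
```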

### 7. Service & Tickets (Anfragen)

SuperOffice tickets represent the support and request system. Like sales, they are organized to allow separation between Roboplanet and Wackler.

* **Entity Name:** `ticket`
* **Roboplanet-Specific Categories (CategoryId):**
    * **ID 46:** `GE:"Lead Roboplanet";`
    * **ID 47:** `GE:"Vertriebspartner Roboplanet";`
    * **ID 48:** `GE:"Weitergabe Roboplanet";`
    * **Hierarchical:** `Roboplanet/Support` (often used for technical issues).
* **Key Fields:**
    * `ticketId`: Internal ID.
    * `title`: The subject of the request.
    * `contactId` / `personId`: Links to company and contact person.
    * `ticketStatusId`: 1 (Unbearbeitet / open), 2 (In Arbeit / in progress), 3 (Bearbeitet / done).
    * `ownedBy`: Often "ROBO" for Roboplanet staff.
* **Cross-Links:** Tickets can be linked to `saleId` (to track support during a sale) or `projectId`.

---

This is the core logic used to generate the company-specific opener.