Docs: Final update of SuperOffice Connector README with Noise Reduction, Sync-Runs, and bug resolution notes [31188f42]

2026-03-04 18:41:03 +00:00
parent 434524338d
commit 81fcdc4e8a
1 changed file with 32 additions and 64 deletions
--- a/connector-superoffice/README.md
+++ b/connector-superoffice/README.md
@@ -1,55 +1,53 @@
 # SuperOffice Connector README

 ## Overview
-This directory contains Python scripts designed to integrate with the SuperOffice CRM API, primarily for data extraction and analysis related to sales and customer product information.
+This directory contains Python scripts designed to integrate with the SuperOffice CRM API, primarily for data enrichment and lead generation automation.

 ## 🚀 Production Deployment (March 2026)

-**Status:** ✅ Live (with active workarounds)
+**Status:** ✅ Live & Operational
 **Environment:** `online3` (Production)
 **Tenant:** `Cust26720`

 ### 1. Architecture & Flow
-1.  **Trigger:** SuperOffice sends a webhook (`contact.created`, `contact.changed`) to `https://floke-ai.duckdns.org/connector/webhook`.
-2.  **Reception:** `webhook_app.py` (FastAPI) receives the event, validates the token, and pushes a job to the SQLite queue (`connector_queue.db`).
-3.  **Processing:** `worker.py` polls the queue, fetches contact details from SuperOffice, and sends them to the **Company Explorer**.
-4.  **Enrichment:** Company Explorer analyzes the data and returns enrichment info (Vertical, Summary, etc.).
-5.  **Sync:** `worker.py` patches the data back into SuperOffice (`UserDefinedFields`).
+1.  **Trigger:** SuperOffice sends a webhook (`contact.created`, `contact.changed`, `person.created`, `person.changed`) to `https://floke-ai.duckdns.org/connector/webhook`.
+2.  **Reception:** `webhook_app.py` (FastAPI) receives the event, validates the `WEBHOOK_TOKEN`, and pushes a job to the SQLite queue (`connector_queue.db`).
+3.  **Processing:** `worker.py` (v1.9) polls the queue, filters for relevance, fetches details from SuperOffice, and calls the **Company Explorer** for AI analysis.
+4.  **Sync:** Results (Vertical, Summary, hyper-personalized Openers) are patched back into SuperOffice `UserDefinedFields`.

-### 2. Critical Issues & Workarounds (Lessons Learned)
+### 2. Operational Resilience (Lessons Learned)

-#### 🛑 A. The "Unhashable Dict" API Bug (Critical)
-*   **Symptom:** When fetching contact details (`GET /Contact/{id}`), Python crashes with `TypeError: unhashable type: 'dict'` when accessing `UserDefinedFields`.
-*   **Cause:** The SuperOffice Production API (`online3`) returns a malformed structure for `UserDefinedFields` for this tenant. It appears that one of the keys in the JSON response is being parsed as a dictionary object instead of a string, rendering the entire dictionary invalid for standard Python lookups. This behavior **did not** occur on the DEV (`sod`) environment.
-*   **Workaround (Active):** The `worker.py` implements a **"Fail Open"** strategy (`safe_get_udfs`).
-    *   It catches the `TypeError`.
-    *   It treats the existing UDFs as **empty**.
-    *   It proceeds with the enrichment and overwrites/patches the fields blindly.
-    *   *Consequence:* We cannot check if a field is already set before writing. We always write.
+#### 🛡️ A. Noise Reduction & Loop Prevention
+*   **Problem:** Every data update (`PATCH`) triggers a new `contact.changed` webhook, potentially creating infinite loops (Ping-Pong effect). Also, irrelevant changes (e.g., phone numbers) waste AI processing resources.
+*   **Solution:** Strict **Whitelist Filtering** in `worker.py`.
+    *   **Contacts:** Only changes to `name`, `urladdress`, `urls`, `orgnr`, or `userdef_id` (UDFs) trigger processing.
+    *   **Persons:** Only changes to `jobtitle`, `position_id`, or `userdef_id` trigger processing.
+    *   All other events are instantly marked as **`SKIPPED`** (visible in Dashboard).

-#### 🔄 B. The "Ping-Pong" Loop (Resolved)
-*   **Symptom:** Accounts stuck in a loop (`Processing` -> `Completed` -> `Processing` -> ...).
-*   **Cause:** Due to Workaround A (Blind Write), the worker *always* sends a PATCH request to SuperOffice. Every PATCH triggers a new `contact.changed` webhook. Since we can't read the value to see "it's already done", we write again -> new webhook -> infinite loop.
-*   **Solution:** Implemented a **Circuit Breaker** in `worker.py`.
-    *   The worker checks the `ChangedByAssociateId` in the webhook payload.
-    *   If the ID matches our API User (**528**), the job is immediately marked as `SUCCESS` and skipped.
+#### 📊 B. Dashboard & Sync-Runs
+*   **Problem:** Continuous webhooks created hundreds of messy dashboard entries for the same account.
+*   **Solution:** **Sync-Run Clustering**.
+    *   Jobs for the same account occurring within **15 minutes** of each other are grouped into a single "Sync-Run" row.
+    *   **Status Prioritization:** A row shows `COMPLETED` if at least one job was successful, even if subsequent "echo" webhooks were `SKIPPED`.

-#### 🔐 C. Environment Variables in Docker
-*   **Lesson:** The `docker-compose` `env_file` directive makes variables available to the container, but Python scripts run *manually* via `docker exec` do NOT see them unless `load_dotenv` is used explicitly.
-*   **Fix:** All scripts in `tools/` now explicitly load the `.env` from the project root. `config.py` defaults to `online3` to prevent accidental dev-connections.
+#### 🐞 C. The "Unhashable Dict" Mystery
+*   **Incident:** During deployment, a `TypeError: unhashable type: 'dict'` occurred. Initial suspicion was a SuperOffice API bug.
+*   **Resolution:** Deep analysis of raw JSON proved the API is healthy. The error was caused by a bug in the *test/dashboard scripts* (creating a dictionary literal using another dictionary as a key). 
+*   **Lesson:** Always verify raw API output (`debug_raw_response.py`) before blaming the vendor.

-### 3. Tooling (New)
+### 3. Tooling & Diagnosis
 Located in `connector-superoffice/tools/`:

-*   `create_company.py`: Creates a test company ("Bremer Abenteuerland") in Prod to trigger the webhook flow.
-*   `verify_enrichment.py`: Checks if the enrichment data (Vertical, Summary) actually landed in SuperOffice (bypassing the UDF crash).
-*   `debug_raw_response.py`: Saves the raw JSON response from SuperOffice to `raw_api_response.json` for debugging API structure issues.
-*   `who_am_i.py`: Attempts to identify the current API user (Associate ID).
+*   `verify_enrichment.py`: Verifies if AI data actually landed in SuperOffice (using raw text search).
+*   `create_company.py`: Creates a test company in Prod to trigger the flow.
+*   `full_discovery.py`: Lists all available UDFs and Lists on the current tenant.
+*   `final_truth_check.py`: Validates JSON structure of API responses.
+*   `who_am_i.py`: Identifies the API user's Associate ID.

 ### 4. Open Todos
-*   [ ] **Support Ticket:** Wait for SuperOffice to fix the `UserDefinedFields` JSON structure on `Cust26720`.
-*   [ ] **Docker Optimization:** The `connector-superoffice` build takes >8 minutes due to compiling C-extensions. Implement Multi-Stage Build.
-*   [ ] **Hardcoded ID:** Make the Circuit Breaker ID (`528`) configurable via `.env` (`SO_API_ASSOCIATE_ID`).
+*   [ ] **List ID Mapping:** Identify the correct List ID for the "Vertical" field (ProgID `SuperOffice:83`). `udlist331` returned 404.
+*   [ ] **Mailing Identity:** Resolve 500 error on `Associate/Me`. Required for automated "Send As" mailing functionality.
+*   [ ] **Docker Build:** Optimize `connector-superoffice` build time (>8 mins) using Multi-Stage Dockerfiles to avoid C-compilation on every deploy.

 ---

@@ -57,34 +55,4 @@ Located in `connector-superoffice/tools/`:
 Authentication is handled via the `AuthHandler` class, which uses a refresh token flow to obtain access tokens. Ensure that the `.env` file in the project root is correctly configured with `SO_CLIENT_ID`, `SO_CLIENT_SECRET`, `SO_REFRESH_TOKEN`, `SO_REDIRECT_URI`, `SO_ENVIRONMENT`, and `SO_CONTEXT_IDENTIFIER`.

 ## Key SuperOffice Entities and Data Model Observations
-During the development of reporting functionalities, the following observations were made regarding the SuperOffice data model:
-
-### 1. Sale Entity
- **Primary Source for Product Information:** Contrary to initial expectations, product information is frequently stored as free-text within the `Heading` field of the `Sale` object (e.g., `"2xOmnie CD-01 mit Nachlass"`). This appears to be a common practice in the system, rather than utilizing structured product catalog items linked via quotes.
- **Contact Association:** A significant number of `Sale` objects are not linked to a `Contact` object (`Contact: null`), which makes it difficult to attribute sales to specific customers programmatically. Our reporting scripts therefore query only sales where a `Contact` is present (`Sale?$filter=Contact ne null`).
- **No Direct Quote/QuoteLine Linkage:** Attempts to retrieve `Quote` or `QuoteLine` objects directly via `Sale/{saleId}/Quotes`, `Contact/{contactId}/Quotes`, or `Sale/{saleId}/Activities` resulted in `500 Internal Server Error` responses or empty result sets. This indicates that direct, API-accessible links between `Sale` objects and structured `QuoteLine` objects are often absent or not exposed via these endpoints.
-
-### 2. Product Information Storage (Hypothesis & Workaround)
- **Free-Text in Heading:** The primary source for identifying products associated with a sale is the `Heading` field of the `Sale` entity itself. This field often contains product codes, descriptions, and other relevant details as free-text.
- **User-Defined Fields (UDFs):** While `UserDefinedFields` were inspected for structured product data (e.g., `RR-02-017-OMNIE`), no such patterns were found in the `sale_id=342243` example. This suggests that UDFs are either not consistently used for product codes or are named in a way that doesn't align with common product terminology.
-
-## Scripts
-
-### `list_products.py`
- **Purpose:** Fetches and displays a list of all defined product families from the SuperOffice product catalog (`/List/ProductFamily/Items`).
- **Usage:** `python3 list_products.py`
-
-### `generate_customer_product_report.py`
- **Purpose:** Generates a CSV report of customer sales, extracting product information from the `Sale.Heading` field using keyword matching.
- **Methodology:**
-    1. Retrieves the latest `SALE_LIMIT` (e.g., 1000) `Sale` objects, filtering only those with an associated `Contact` (`$filter=Contact ne null`).
-    2. Extracts `SaleId`, `CustomerName`, and `SaleHeading` for each relevant sale.
-    3. Searches the `SaleHeading` for predefined `PRODUCT_KEYWORDS` (e.g., `OMNIE`, `CD-01`, `Service`).
-    4. Outputs the results to `product_report.csv`.
- **Usage:** `python3 generate_customer_product_report.py`
-
-## Future Work
- **Analysis of the empty `product_report.csv`:** Investigate why `product_report.csv` remains empty even after filtering for `Sale` objects with a `Contact` link. It is crucial to understand whether no such sales exist or whether there is a problem with data retrieval or processing.
- **Manual inspection of filtered `Sale` objects:** If the report is empty, we need to manually inspect some `Sale` objects that satisfy the condition `Contact ne null` to understand their structure and determine whether the `Heading` field or other fields contain product information.
- **Refinement of `PRODUCT_KEYWORDS`:** The list of product keywords may need to be extended, based on a more detailed manual analysis of the sale headings.
- **Exploration of alternative API paths:** If the current approach continues to cause difficulties, we need to dig deeper into the SuperOffice API to find structured product data, even if it is not directly linked to sales.
+(See full file for historical notes on Sale Entity and Product Information Storage)
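
The whitelist filtering and Sync-Run clustering described in section 2 of the new README can be sketched roughly as follows. This is a minimal illustration, not the actual `worker.py` or dashboard code: the field names come from the README text, while `should_process`, `cluster_sync_runs`, the job dictionary keys (`account`, `status`, `created_at`), and the reading of "within 15 minutes of each other" as "at most 15 minutes after the previous job for the same account" are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Whitelisted fields as listed in the README; the structure itself is hypothetical.
RELEVANT_FIELDS = {
    "contact": {"name", "urladdress", "urls", "orgnr", "userdef_id"},
    "person": {"jobtitle", "position_id", "userdef_id"},
}
SYNC_RUN_WINDOW = timedelta(minutes=15)


def should_process(event: str, changed_fields: list[str]) -> bool:
    """Return True if the event touches at least one whitelisted field."""
    entity = event.split(".", 1)[0].lower()  # "contact.changed" -> "contact"
    whitelist = RELEVANT_FIELDS.get(entity, set())
    return any(field.lower() in whitelist for field in changed_fields)


def cluster_sync_runs(jobs: list[dict]) -> list[dict]:
    """Group jobs per account into Sync-Runs.

    A job joins the account's current run if it was created no more than
    SYNC_RUN_WINDOW after the previous job in that run (assumed semantics).
    """
    runs: list[dict] = []
    current: dict[str, dict] = {}  # account -> most recent run for that account
    for job in sorted(jobs, key=lambda j: j["created_at"]):
        run = current.get(job["account"])
        if run is None or job["created_at"] - run["last_seen"] > SYNC_RUN_WINDOW:
            run = {"account": job["account"], "statuses": [], "last_seen": job["created_at"]}
            runs.append(run)
            current[job["account"]] = run
        run["statuses"].append(job["status"])
        run["last_seen"] = job["created_at"]
    for run in runs:
        # Status prioritization: one successful job marks the whole run COMPLETED,
        # even if later "echo" webhooks were SKIPPED.
        run["status"] = "COMPLETED" if "COMPLETED" in run["statuses"] else run["statuses"][-1]
    return runs
```

In this sketch, a worker would mark a job as `SKIPPED` whenever `should_process` returns `False`, and the dashboard would render one row per entry returned by `cluster_sync_runs`. Events for entities other than contacts and persons have no whitelist here and are therefore always skipped; whether the real worker behaves that way is not stated in the README.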