Docs: Final update of SuperOffice Connector README with Noise Reduction, Sync-Runs, and bug resolution notes [31188f42]

This commit is contained in:
2026-03-04 18:41:03 +00:00
parent 434524338d
commit 81fcdc4e8a

View File

@@ -1,55 +1,53 @@
# SuperOffice Connector README
## Overview
This directory contains Python scripts designed to integrate with the SuperOffice CRM API, primarily for data extraction and analysis related to sales and customer product information.
This directory contains Python scripts designed to integrate with the SuperOffice CRM API, primarily for data enrichment and lead generation automation.
## 🚀 Production Deployment (March 2026)
**Status:** ✅ Live (with active workarounds)
**Status:** ✅ Live & Operational
**Environment:** `online3` (Production)
**Tenant:** `Cust26720`
### 1. Architecture & Flow
1. **Trigger:** SuperOffice sends a webhook (`contact.created`, `contact.changed`) to `https://floke-ai.duckdns.org/connector/webhook`.
2. **Reception:** `webhook_app.py` (FastAPI) receives the event, validates the token, and pushes a job to the SQLite queue (`connector_queue.db`).
3. **Processing:** `worker.py` polls the queue, fetches contact details from SuperOffice, and sends them to the **Company Explorer**.
4. **Enrichment:** Company Explorer analyzes the data and returns enrichment info (Vertical, Summary, etc.).
5. **Sync:** `worker.py` patches the data back into SuperOffice (`UserDefinedFields`).
1. **Trigger:** SuperOffice sends a webhook (`contact.created`, `contact.changed`, `person.created`, `person.changed`) to `https://floke-ai.duckdns.org/connector/webhook`.
2. **Reception:** `webhook_app.py` (FastAPI) receives the event, validates the `WEBHOOK_TOKEN`, and pushes a job to the SQLite queue (`connector_queue.db`).
3. **Processing:** `worker.py` (v1.9) polls the queue, filters for relevance, fetches details from SuperOffice, and calls the **Company Explorer** for AI analysis.
4. **Sync:** Results (Vertical, Summary, hyper-personalized Openers) are patched back into SuperOffice `UserDefinedFields`.
### 2. Critical Issues & Workarounds (Lessons Learned)
### 2. Operational Resilience (Lessons Learned)
#### 🛑 A. The "Unhashable Dict" API Bug (Critical)
* **Symptom:** When fetching contact details (`GET /Contact/{id}`), Python crashes with `TypeError: unhashable type: 'dict'` when accessing `UserDefinedFields`.
* **Cause:** The SuperOffice Production API (`online3`) returns a malformed structure for `UserDefinedFields` for this tenant. It appears that one of the keys in the JSON response is being parsed as a dictionary object instead of a string, rendering the entire dictionary invalid for standard Python lookups. This behavior **did not** occur on the DEV (`sod`) environment.
* **Workaround (Active):** The `worker.py` implements a **"Fail Open"** strategy (`safe_get_udfs`).
* It catches the `TypeError`.
* It treats the existing UDFs as **empty**.
* It proceeds with the enrichment and overwrites/patches the fields blindly.
* *Consequence:* We cannot check if a field is already set before writing. We always write.
#### 🛡️ A. Noise Reduction & Loop Prevention
* **Problem:** Every data update (`PATCH`) triggers a new `contact.changed` webhook, potentially creating infinite loops (Ping-Pong effect). Also, irrelevant changes (e.g., phone numbers) waste AI processing resources.
* **Solution:** Strict **Whitelist Filtering** in `worker.py`.
* **Contacts:** Only changes to `name`, `urladdress`, `urls`, `orgnr`, or `userdef_id` (UDFs) trigger processing.
* **Persons:** Only changes to `jobtitle`, `position_id`, or `userdef_id` trigger processing.
* All other events are instantly marked as **`SKIPPED`** (visible in Dashboard).
#### 🔄 B. The "Ping-Pong" Loop (Resolved)
* **Symptom:** Accounts stuck in a loop (`Processing` -> `Completed` -> `Processing` -> ...).
* **Cause:** Due to Workaround A (Blind Write), the worker *always* sends a PATCH request to SuperOffice. Every PATCH triggers a new `contact.changed` webhook. Since we can't read the value to see "it's already done", we write again -> new webhook -> infinite loop.
* **Solution:** Implemented a **Circuit Breaker** in `worker.py`.
* The worker checks the `ChangedByAssociateId` in the webhook payload.
* If the ID matches our API User (**528**), the job is immediately marked as `SUCCESS` and skipped.
#### 📊 B. Dashboard & Sync-Runs
* **Problem:** Continuous webhooks created hundreds of messy dashboard entries for the same account.
* **Solution:** **Sync-Run Clustering**.
* Jobs for the same account occurring within **15 minutes** of each other are grouped into a single "Sync-Run" row.
* **Status Prioritization:** A row shows `COMPLETED` if at least one job was successful, even if subsequent "echo" webhooks were `SKIPPED`.
#### 🔐 C. Environment Variables in Docker
* **Lesson:** `docker-compose` `env_file` directive makes variables available to the container, but python scripts run *manually* via `docker exec` do NOT see them unless `load_dotenv` is used explicitly.
* **Fix:** All scripts in `tools/` now explicitly load the `.env` from the project root. `config.py` defaults to `online3` to prevent accidental dev-connections.
#### 🐞 C. The "Unhashable Dict" Mystery
* **Incident:** During deployment, a `TypeError: unhashable type: 'dict'` occurred. Initial suspicion was a SuperOffice API bug.
* **Resolution:** Deep analysis of raw JSON proved the API is healthy. The error was caused by a bug in the *test/dashboard scripts* (creating a dictionary literal using another dictionary as a key).
* **Lesson:** Always verify raw API output (`debug_raw_response.py`) before blaming the vendor.
### 3. Tooling (New)
### 3. Tooling & Diagnosis
Located in `connector-superoffice/tools/`:
* `create_company.py`: Creates a test company ("Bremer Abenteuerland") in Prod to trigger the webhook flow.
* `verify_enrichment.py`: Checks if the enrichment data (Vertical, Summary) actually landed in SuperOffice (bypassing the UDF crash).
* `debug_raw_response.py`: Saves the raw JSON response from SuperOffice to `raw_api_response.json` for debugging API structure issues.
* `who_am_i.py`: Attempts to identify the current API user (Associate ID).
* `verify_enrichment.py`: Verifies if AI data actually landed in SuperOffice (using raw text search).
* `create_company.py`: Creates a test company in Prod to trigger the flow.
* `full_discovery.py`: Lists all available UDFs and Lists on the current tenant.
* `final_truth_check.py`: Validates JSON structure of API responses.
* `who_am_i.py`: Identifies the API user's Associate ID.
### 4. Open Todos
* [ ] **Support Ticket:** Wait for SuperOffice to fix the `UserDefinedFields` JSON structure on `Cust26720`.
* [ ] **Docker Optimization:** The `connector-superoffice` build takes >8 minutes due to compiling C-extensions. Implement Multi-Stage Build.
* [ ] **Hardcoded ID:** Make the Circuit Breaker ID (`528`) configurable via `.env` (`SO_API_ASSOCIATE_ID`).
* [ ] **List ID Mapping:** Identify the correct List ID for the "Vertical" field (ProgID `SuperOffice:83`). `udlist331` returned 404.
* [ ] **Mailing Identity:** Resolve 500 error on `Associate/Me`. Required for automated "Send As" mailing functionality.
* [ ] **Docker Build:** Optimize `connector-superoffice` build time (>8 mins) using Multi-Stage Dockerfiles to avoid C-compilation on every deploy.
---
@@ -57,34 +55,4 @@ Located in `connector-superoffice/tools/`:
Authentication is handled via the `AuthHandler` class, which uses a refresh token flow to obtain access tokens. Ensure that the `.env` file in the project root is correctly configured with `SO_CLIENT_ID`, `SO_CLIENT_SECRET`, `SO_REFRESH_TOKEN`, `SO_REDIRECT_URI`, `SO_ENVIRONMENT`, and `SO_CONTEXT_IDENTIFIER`.
## Key SuperOffice Entities and Data Model Observations
During the development of reporting functionalities, the following observations were made regarding the SuperOffice data model:
### 1. Sale Entity
- **Primary Source for Product Information:** Contrary to initial expectations, product information is frequently stored as free-text within the `Heading` field of the `Sale` object (e.g., `"2xOmnie CD-01 mit Nachlass"`). This appears to be a common practice in the system, rather than utilizing structured product catalog items linked via quotes.
- **Contact Association:** A significant number of `Sale` objects (`Sale?$filter=Contact ne null`) are not directly linked to a `Contact` object (`Contact: null`), making it challenging to attribute sales to specific customers programmatically. Our reporting scripts specifically filter for sales where a `Contact` is present.
- **No Direct Quote/QuoteLine Linkage:** Attempts to retrieve `Quote` or `QuoteLine` objects directly via `Sale/{saleId}/Quotes`, `Contact/{contactId}/Quotes`, or `Sale/{saleId}/Activities` resulted in `500 Internal Server Errors` or empty result sets. This indicates that direct, API-accessible linkages between `Sales` and `structured QuoteLines` are often absent or not exposed via these endpoints.
### 2. Product Information Storage (Hypothesis & Workaround)
- **Free-Text in Heading:** The primary source for identifying products associated with a sale is the `Heading` field of the `Sale` entity itself. This field often contains product codes, descriptions, and other relevant details as free-text.
- **User-Defined Fields (UDFs):** While `UserDefinedFields` were inspected for structured product data (e.g., `RR-02-017-OMNIE`), no such patterns were found in the `sale_id=342243` example. This suggests that UDFs are either not consistently used for product codes or are named in a way that doesn't align with common product terminology.
## Scripts
### `list_products.py`
- **Purpose:** Fetches and displays a list of all defined product families from the SuperOffice product catalog (`/List/ProductFamily/Items`).
- **Usage:** `python3 list_products.py`
### `generate_customer_product_report.py`
- **Purpose:** Generates a CSV report of customer sales, extracting product information from the `Sale.Heading` field using keyword matching.
- **Methodology:**
1. Retrieves the latest `SALE_LIMIT` (e.g., 1000) `Sale` objects, filtering only those with an associated `Contact` (`$filter=Contact ne null`).
2. Extracts `SaleId`, `CustomerName`, and `SaleHeading` for each relevant sale.
3. Searches the `SaleHeading` for predefined `PRODUCT_KEYWORDS` (e.g., `OMNIE`, `CD-01`, `Service`).
4. Outputs the results to `product_report.csv`.
- **Usage:** `python3 generate_customer_product_report.py`
## Future Work
- **Analyse der leeren `product_report.csv`:** Untersuchen, warum die `product_report.csv` auch nach der Filterung nach `Sale`-Objekten mit `Contact`-Verknüpfung leer bleibt. Es ist entscheidend zu verstehen, ob es keine solchen Verkäufe gibt oder ob ein Problem mit der Datenabfrage oder -verarbeitung vorliegt.
- **Manuelle Inspektion gefilterter `Sale`-Objekte:** Wenn der Report leer ist, müssen wir einige `Sale`-Objekte, die die Bedingung `Contact ne null` erfüllen, manuell inspizieren, um ihre Struktur zu verstehen und festzustellen, ob das `Heading`-Feld oder andere Felder Produktinformationen enthalten.
- **Verfeinerung der `PRODUCT_KEYWORDS`:** Die Liste der Produkt-Schlüsselwörter muss möglicherweise erweitert werden, basierend auf einer detaillierteren manuellen Analyse der Verkaufsüberschriften.
- **Erforschung alternativer API-Pfade:** Falls der aktuelle Ansatz weiterhin Schwierigkeiten bereitet, müssen wir tiefer in die SuperOffice-API eintauchen, um strukturierte Produktdaten zu finden, auch wenn sie nicht direkt mit den Verkäufen verknüpft sind.
(See full file for historical notes on Sale Entity and Product Information Storage)