SuperOffice Connector README

Overview

This directory contains Python scripts designed to integrate with the SuperOffice CRM API, primarily for data extraction and analysis related to sales and customer product information.

🚀 Production Deployment (March 2026)

Status: Live (with active workarounds)
Environment: online3 (Production)
Tenant: Cust26720

1. Architecture & Flow

  1. Trigger: SuperOffice sends a webhook (contact.created, contact.changed) to https://floke-ai.duckdns.org/connector/webhook.
  2. Reception: webhook_app.py (FastAPI) receives the event, validates the token, and pushes a job to the SQLite queue (connector_queue.db).
  3. Processing: worker.py polls the queue, fetches contact details from SuperOffice, and sends them to the Company Explorer.
  4. Enrichment: Company Explorer analyzes the data and returns enrichment info (Vertical, Summary, etc.).
  5. Sync: worker.py patches the data back into SuperOffice (UserDefinedFields).
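The queue hand-off in steps 2 and 3 can be sketched with the standard-library sqlite3 module. This is a minimal illustration, not the code in webhook_app.py or worker.py. Only the file name connector_queue.db comes from this README. The table name jobs, its columns, the PENDING status and the helper names are hypothetical.

```python
import json
import sqlite3

# Hypothetical schema; the real connector_queue.db layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    event   TEXT NOT NULL,
    payload TEXT NOT NULL,
    status  TEXT NOT NULL DEFAULT 'PENDING'
)
"""


def open_queue(path="connector_queue.db"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def enqueue(conn, event, payload):
    """Receiver side: store a validated webhook event as a pending job."""
    with conn:  # commits the INSERT, or rolls back on error
        cur = conn.execute(
            "INSERT INTO jobs (event, payload) VALUES (?, ?)",
            (event, json.dumps(payload)),
        )
    return cur.lastrowid


def claim_next_job(conn):
    """Worker side: take the oldest pending job and mark it PROCESSING.

    Returns (job_id, event, payload) or None. Assumes a single worker
    process; several concurrent workers would need an atomic claim.
    """
    with conn:
        row = conn.execute(
            "SELECT id, event, payload FROM jobs"
            " WHERE status = 'PENDING' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE jobs SET status = 'PROCESSING' WHERE id = ?", (row[0],))
    return row[0], row[1], json.loads(row[2])


def mark_done(conn, job_id, status="SUCCESS"):
    """Worker side: record the final status of a job."""
    with conn:
        conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
```

A polling worker would call claim_next_job in a loop and sleep briefly whenever it returns None. Keeping the payload as a JSON string keeps the receiver independent of the fields the worker later needs.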

2. Critical Issues & Workarounds (Lessons Learned)

🛑 A. The "Unhashable Dict" API Bug (Critical)

  • Symptom: When fetching contact details (GET /Contact/{id}), Python crashes with TypeError: unhashable type: 'dict' when accessing UserDefinedFields.
  • Cause: The SuperOffice Production API (online3) appears to return UserDefinedFields for this tenant in an unexpected structure: somewhere in it, a dictionary (JSON object) arrives where the code expects a plain string. Parsed JSON object keys are always strings, so the crash happens when such a dictionary value is used as a lookup key (e.g. in a dict or set lookup), which requires a hashable value. This behavior did not occur on the DEV (sod) environment.
  • Workaround (Active): The worker.py implements a "Fail Open" strategy (safe_get_udfs).
    • It catches the TypeError.
    • It treats the existing UDFs as empty.
    • It proceeds with the enrichment and overwrites/patches the fields blindly.
    • Consequence: We cannot check if a field is already set before writing. We always write.
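A minimal sketch of the fail-open idea, assuming the contact JSON has already been parsed into a Python dict. It is not the actual safe_get_udfs from worker.py: the hash() loop stands in for whatever lookup really crashes on online3, and needs_write is a hypothetical helper that shows why an empty result forces a blind write.

```python
import logging

logger = logging.getLogger(__name__)


def safe_get_udfs(contact):
    """Fail-open read of UserDefinedFields (sketch).

    Returns a {prog_id: value} mapping, or {} if the structure is unusable.
    """
    try:
        udfs = contact.get("UserDefinedFields") or {}
        # Hashing each value mimics the lookups that crash on online3:
        # a dict where a string was expected raises
        # TypeError: unhashable type: 'dict'.
        for value in udfs.values():
            hash(value)
        return dict(udfs)
    except (TypeError, AttributeError) as exc:
        # AttributeError covers a non-dict UserDefinedFields (e.g. a list).
        logger.warning("UserDefinedFields unreadable, treating as empty: %s", exc)
        return {}


def needs_write(current_udfs, prog_id, new_value):
    # With the fail-open result {} this is always True: the worker writes blindly.
    return current_udfs.get(prog_id) != new_value
```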

🔄 B. The "Ping-Pong" Loop (Resolved)

  • Symptom: Accounts stuck in a loop (Processing -> Completed -> Processing -> ...).
  • Cause: Due to Workaround A (Blind Write), the worker always sends a PATCH request to SuperOffice. Every PATCH triggers a new contact.changed webhook. Since we can't read the value to see "it's already done", we write again -> new webhook -> infinite loop.
  • Solution: Implemented a Circuit Breaker in worker.py.
    • The worker checks the ChangedByAssociateId in the webhook payload.
    • If the ID matches our API User (528), the job is immediately marked as SUCCESS and skipped.
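The circuit breaker can be sketched as a single check that runs before any SuperOffice call. This assumes ChangedByAssociateId is a top-level key of the parsed webhook payload and may arrive as an int or a numeric string; handle_job and its "PROCESSED" status are hypothetical placeholders for the real worker logic.

```python
API_ASSOCIATE_ID = 528  # our API user, per this README


def changed_by_us(payload, api_associate_id=API_ASSOCIATE_ID):
    """True if the webhook was triggered by our own PATCH (sketch)."""
    changed_by = payload.get("ChangedByAssociateId")
    try:
        return int(changed_by) == api_associate_id
    except (TypeError, ValueError):
        # Missing or non-numeric: not provably ours, so process normally.
        return False


def handle_job(payload):
    if changed_by_us(payload):
        return "SUCCESS"  # echo of our own write: mark done, skip enrichment
    # ... fetch contact, call Company Explorer, PATCH UDFs (omitted) ...
    return "PROCESSED"
```

The trade-off: any change made through the API user is ignored, while edits by a human associate still trigger enrichment as intended.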

🔐 C. Environment Variables in Docker

  • Lesson: The docker-compose env_file directive makes variables available to the container, but in our setup Python scripts run manually via docker exec did NOT see them unless load_dotenv was called explicitly.
  • Fix: All scripts in tools/ now explicitly load the .env from the project root. config.py defaults to online3 to prevent accidental connections to the DEV environment.
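A minimal sketch of loading the project-root .env from a script in connector-superoffice/tools/, assuming the python-dotenv package is installed. The helper names are hypothetical, and so is the assumption that the online3 default in config.py is read from SO_ENVIRONMENT.

```python
import os
from pathlib import Path


def project_root_env(script_path):
    """Path of the .env in the project root for a script in connector-superoffice/tools/."""
    # parents[0] = tools/, parents[1] = connector-superoffice/, parents[2] = project root
    return Path(script_path).resolve().parents[2] / ".env"


def load_project_env(script_path):
    from dotenv import load_dotenv  # requires the python-dotenv package

    # Variables already set in the process environment are kept (override=False).
    load_dotenv(project_root_env(script_path))


def so_environment():
    # Fall back to production, mirroring the online3 default described above.
    return os.getenv("SO_ENVIRONMENT", "online3")
```

A tool script would call load_project_env(__file__) before reading any SO_* variable, so it behaves the same inside and outside docker exec.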

3. Tooling (New)

Located in connector-superoffice/tools/:

  • create_company.py: Creates a test company ("Bremer Abenteuerland") in Prod to trigger the webhook flow.
  • verify_enrichment.py: Checks if the enrichment data (Vertical, Summary) actually landed in SuperOffice (bypassing the UDF crash).
  • debug_raw_response.py: Saves the raw JSON response from SuperOffice to raw_api_response.json for debugging API structure issues.
  • who_am_i.py: Attempts to identify the current API user (Associate ID).
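When working with the dump from debug_raw_response.py, a small scan can point at the UDF entries that are not plain scalars. This sketch assumes raw_api_response.json holds a single Contact object whose UserDefinedFields is a JSON object; the function names are hypothetical.

```python
import json
from pathlib import Path

SCALARS = (str, int, float, bool, type(None))


def find_non_scalar_udfs(raw):
    """Return (key, type name) pairs for UDF values that are not plain scalars."""
    udfs = raw.get("UserDefinedFields") or {}
    if not isinstance(udfs, dict):
        # The whole field has an unexpected shape (e.g. a list).
        return [("UserDefinedFields", type(udfs).__name__)]
    return [
        (key, type(value).__name__)
        for key, value in udfs.items()
        if not isinstance(value, SCALARS)
    ]


def inspect_dump(path="raw_api_response.json"):
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    return find_non_scalar_udfs(raw)
```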

4. Open Todos

  • Support Ticket: Wait for SuperOffice to fix the UserDefinedFields JSON structure on Cust26720.
  • Docker Optimization: The connector-superoffice build takes more than 8 minutes because C extensions are compiled during the build. Implement a multi-stage build.
  • Hardcoded ID: Make the Circuit Breaker ID (528) configurable via .env (SO_API_ASSOCIATE_ID).
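One way the Hardcoded ID todo could be addressed, as a sketch: read SO_API_ASSOCIATE_ID (the name proposed above) and fall back to the current value 528. The function name is hypothetical.

```python
import os

DEFAULT_API_ASSOCIATE_ID = 528  # current hardcoded value


def api_associate_id():
    """Read SO_API_ASSOCIATE_ID from the environment, falling back to 528."""
    raw = os.getenv("SO_API_ASSOCIATE_ID", "").strip()
    if not raw:
        return DEFAULT_API_ASSOCIATE_ID
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"SO_API_ASSOCIATE_ID must be an integer, got {raw!r}") from None
```

Failing loudly on a malformed value is deliberate: a wrong ID would silently disable the circuit breaker and reopen the ping-pong loop.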

Authentication (Legacy)

Authentication is handled via the AuthHandler class, which uses a refresh token flow to obtain access tokens. Ensure that the .env file in the project root is correctly configured with SO_CLIENT_ID, SO_CLIENT_SECRET, SO_REFRESH_TOKEN, SO_REDIRECT_URI, SO_ENVIRONMENT, and SO_CONTEXT_IDENTIFIER.
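For orientation, a generic OAuth 2.0 refresh_token grant (RFC 6749, section 6) can be built with the standard library as below. This is not the AuthHandler implementation. The token URL is a parameter because the exact host is an assumption: SuperOffice documents a /login/common/oauth/tokens endpoint per environment, which should be checked against the official docs. SO_ENVIRONMENT and SO_CONTEXT_IDENTIFIER are not used here.

```python
import json
import urllib.parse
import urllib.request


def build_refresh_request(token_url, client_id, client_secret, refresh_token, redirect_uri):
    """Build (but do not send) a refresh_token grant request."""
    body = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
        "redirect_uri": redirect_uri,
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_access_token(request, timeout=30.0):
    """Send the request and return the access_token field of the JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["access_token"]
```

The arguments map to SO_CLIENT_ID, SO_CLIENT_SECRET, SO_REFRESH_TOKEN and SO_REDIRECT_URI from the .env file.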

Key SuperOffice Entities and Data Model Observations

During the development of reporting functionalities, the following observations were made regarding the SuperOffice data model:

1. Sale Entity

  • Primary Source for Product Information: Contrary to initial expectations, product information is frequently stored as free text in the Heading field of the Sale object (e.g., "2xOmnie CD-01 mit Nachlass"). This appears to be common practice in this system, rather than using structured product catalog items linked via quotes.
  • Contact Association: A significant number of Sale objects are not directly linked to a Contact object (Contact: null), which makes it difficult to attribute sales to specific customers programmatically. Our reporting scripts therefore only request sales where a Contact is present (Sale?$filter=Contact ne null).
  • No Direct Quote/QuoteLine Linkage: Attempts to retrieve Quote or QuoteLine objects via Sale/{saleId}/Quotes, Contact/{contactId}/Quotes, or Sale/{saleId}/Activities returned 500 Internal Server Errors or empty result sets. This suggests that direct, API-accessible links between Sales and structured QuoteLines are often absent or not exposed via these endpoints.

2. Product Information Storage (Hypothesis & Workaround)

  • Free-Text in Heading: The primary source for identifying products associated with a sale is the Heading field of the Sale entity itself. This field often contains product codes, descriptions, and other relevant details as free-text.
  • User-Defined Fields (UDFs): While UserDefinedFields were inspected for structured product data (e.g., RR-02-017-OMNIE), no such patterns were found in the sale_id=342243 example. This suggests that UDFs are either not consistently used for product codes or are named in a way that doesn't align with common product terminology.
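The UDF inspection can be repeated on other sales with a pattern scan. The regular expression is a hypothetical generalisation of the single example RR-02-017-OMNIE (two letters, two digits, three digits, then a product name) and will likely need adjusting once more codes are known.

```python
import re

# Hypothetical pattern derived from one example code: RR-02-017-OMNIE
PRODUCT_CODE_RE = re.compile(r"\b[A-Z]{2}-\d{2}-\d{3}-[A-Z]+\b")


def product_codes_in_udfs(udfs):
    """Map each UDF key to the product codes found in its (string) value."""
    hits = {}
    for key, value in udfs.items():
        if isinstance(value, str):
            codes = PRODUCT_CODE_RE.findall(value.upper())
            if codes:
                hits[key] = codes
    return hits
```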

Scripts

list_products.py

  • Purpose: Fetches and displays a list of all defined product families from the SuperOffice product catalog (/List/ProductFamily/Items).
  • Usage: python3 list_products.py
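The request behind list_products.py can be sketched as a plain authenticated GET. The api_base layout (for example https://online3.superoffice.com/Cust26720/api/v1) and the response shape (a JSON array of list items with Id and Name) are assumptions to verify against a real response; the helper names are hypothetical.

```python
import json
import urllib.request


def product_family_request(api_base, access_token):
    """GET request for /List/ProductFamily/Items (api_base layout is an assumption)."""
    return urllib.request.Request(
        api_base.rstrip("/") + "/List/ProductFamily/Items",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
    )


def parse_product_families(body):
    """Return (Id, Name) pairs, assuming a JSON array of list-item objects."""
    return [(item.get("Id"), item.get("Name")) for item in json.loads(body)]
```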

generate_customer_product_report.py

  • Purpose: Generates a CSV report of customer sales, extracting product information from the Sale.Heading field using keyword matching.
  • Methodology:
    1. Retrieves the latest SALE_LIMIT (e.g., 1000) Sale objects, filtering only those with an associated Contact ($filter=Contact ne null).
    2. Extracts SaleId, CustomerName, and SaleHeading for each relevant sale.
    3. Searches the SaleHeading for predefined PRODUCT_KEYWORDS (e.g., OMNIE, CD-01, Service).
    4. Outputs the results to product_report.csv.
  • Usage: python3 generate_customer_product_report.py
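The methodology can be sketched in a few functions. This is an illustration, not the script itself. The query-parameter syntax ($filter, $top) is an assumption about the Sale endpoint, and the sketch only writes rows where at least one keyword matched, so an empty CSV then means either that no sales came back or that no heading matched.

```python
import csv
import io
import urllib.parse

SALE_LIMIT = 1000
PRODUCT_KEYWORDS = ["OMNIE", "CD-01", "Service"]  # examples named in this README
FIELDS = ["SaleId", "CustomerName", "SaleHeading", "Products"]


def sale_query(api_base, limit=SALE_LIMIT):
    """Sale URL restricted to sales with a Contact (query syntax is an assumption)."""
    params = {"$filter": "Contact ne null", "$top": limit}
    # Keep "$" literal and encode spaces as %20 rather than "+".
    query = urllib.parse.urlencode(params, safe="$", quote_via=urllib.parse.quote)
    return f"{api_base.rstrip('/')}/Sale?{query}"


def match_products(heading, keywords=PRODUCT_KEYWORDS):
    """Keywords that occur in the heading, compared case-insensitively."""
    text = (heading or "").casefold()
    return [kw for kw in keywords if kw.casefold() in text]


def render_report(rows, keywords=PRODUCT_KEYWORDS):
    """CSV text for rows (dicts with SaleId, CustomerName, SaleHeading) with a keyword hit."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        products = match_products(row.get("SaleHeading"), keywords)
        if products:
            writer.writerow({**row, "Products": ";".join(products)})
    return buf.getvalue()
```

To produce product_report.csv, write the returned text to a file opened with newline="" so the CSV line endings are not doubled on Windows.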

Future Work

  • Analyze the empty product_report.csv: Investigate why product_report.csv remains empty even after filtering for Sale objects linked to a Contact. It is essential to understand whether no such sales exist or whether there is a problem with data retrieval or processing.
  • Manual inspection of filtered Sale objects: If the report is empty, we need to manually inspect some Sale objects that satisfy the condition Contact ne null to understand their structure and determine whether the Heading field or other fields contain product information.
  • Refine PRODUCT_KEYWORDS: The list of product keywords may need to be extended, based on a more detailed manual analysis of sale headings.
  • Explore alternative API paths: If the current approach keeps running into difficulties, we need to dig deeper into the SuperOffice API to find structured product data, even if it is not directly linked to sales.
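To tell "no such sales" apart from a processing problem, a funnel count over the raw Sale objects shows where rows disappear. This sketch assumes the raw objects carry Contact and Heading keys as described in the data-model notes; report_funnel is a hypothetical helper.

```python
def report_funnel(sales, keywords):
    """Count how many raw Sale dicts survive each stage of the report pipeline."""
    with_contact = [s for s in sales if s.get("Contact")]
    with_heading = [s for s in with_contact if (s.get("Heading") or "").strip()]
    matched = [
        s for s in with_heading
        if any(kw.casefold() in s["Heading"].casefold() for kw in keywords)
    ]
    return {
        "fetched": len(sales),
        "with_contact": len(with_contact),
        "with_heading": len(with_heading),
        "keyword_match": len(matched),
    }
```

Read from left to right: a zero in "fetched" points at the query, a drop at "with_contact" at the Contact link, and a drop only at "keyword_match" at the keyword list.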