Gemini Code Assistant Context
CRITICAL RULE: DOCUMENTATION PRESERVATION (DO NOT IGNORE)
IT IS STRICTLY FORBIDDEN TO DELETE DOCUMENTATION OR TO REPLACE IT WITH PLACEHOLDERS SUCH AS "... (rest of the file)".
This has happened repeatedly in the past and has led to massive data loss in critical files such as MIGRATION_PLAN.md.
Rules for the agent:
- Never delete large blocks of text unless the user explicitly requests it.
- Always check `git diff` before creating a commit. If a documentation file loses 100 lines, that is almost always a mistake.
- When updating documentation: only add new information or precisely correct outdated content. Never overwrite the rest of the file.
- If you need to restore a file, use `git log -p <filename>` and make sure that you really restore everything.
Important Notes
- Project documentation: The primary and most comprehensive documentation for this project lives in `readme.md`. Please consult that file for a detailed understanding of the architecture and the individual modules.
- Git repository: This project is managed via a Git repository. All code changes are versioned. See the section "Git Workflow & Conventions" for our working rules.
- IMPORTANT: The AI agent can commit changes but, for security reasons, often cannot run `git push`. Please run `git push` manually when the agent reports this.
Git Workflow & Conventions
Finishing the workday with #fertig
To complete a work step or a task, use the command `#fertig`.
IMPORTANT: Do not use `/fertig` or plain `fertig`. Only the command with the hash sign (`#`) is recognized correctly.
When you enter `#fertig`, the agent performs the following steps:
- Analysis: The agent checks whether any code changes have been made since the last commit.
- Summary: It generates an automatic work summary based on the code changes.
- Status update: The agent runs the script `python3 dev_session.py --report-status` in the background.
  - The time invested in the current session is calculated and stored in Notion.
  - A new status report containing the summary is attached to the Notion task.
  - The task's status in Notion is set to "Done" (or another appropriate status).
- Commit & Push: If code changes exist, a commit is created and a `git push` is requested interactively.
Project Overview
This project is a Python-based system for automated company data enrichment and lead generation. It focuses on identifying B2B companies with high potential for robotics automation (Cleaning, Transport, Security, Service).
The system architecture has evolved from a CLI-based toolset to a modern web application (company-explorer) backed by Docker containers.
Current Status (Jan 15, 2026) - Company Explorer (Robotics Edition v0.5.0)
1. Contacts Management (v0.5)
- Full CRUD: Integrated Contact Management system with direct editing capabilities.
- Global List View: Dedicated view for all contacts across all companies with search and filter.
- Data Model: Supports advanced fields like Academic Title, Role Interpretation (Decision Maker vs. User), and Marketing Automation Status.
- Bulk Import: CSV-based bulk import for contacts that automatically creates missing companies and prevents duplicates via email matching.
2. UI/UX Modernization
- Light/Dark Mode: Full theme support with toggle.
- Grid Layout: Unified card-based layout for both Company and Contact lists.
- Mobile Responsiveness: Optimized Inspector overlay and navigation for mobile devices.
- Tabbed Inspector: Clean separation between Company Overview and Contact Management within the details pane.
3. Advanced Configuration (Settings)
- Industry Verticals: Database-backed configuration for target industries (Description, Focus Flag, Primary Product).
- Job Role Mapping: Configurable patterns (Regex/Text) to map job titles on business cards to internal roles (e.g., "CTO" -> "Innovation Driver").
- Robotics Categories: Existing AI reasoning logic remains configurable via the UI.
4. Robotics Potential Analysis (v2.3)
- Chain-of-Thought Logic: The AI analysis (`ClassificationService`) uses multi-step reasoning to evaluate physical infrastructure.
- Provider vs. User: Strict differentiation logic implemented.
5. Web Scraping & Legal Data (v2.2)
- Impressum Scraping: 2-Hop Strategy and Root Fallback logic.
- Manual Overrides: Users can manually correct Wikipedia, Website, and Impressum URLs directly in the UI.
Lessons Learned & Best Practices
- Numeric Extraction (German Locale):
  - Problem: "1.005 Mitarbeiter" was extracted as "1" (the dot was treated as a decimal point).
  - Solution: Implemented context-aware logic. If a number has a dot followed by exactly 3 digits (and no comma), it is treated as a thousands separator.
  - Revenue: For revenue (`is_revenue=True`), dots are generally treated as decimals (e.g. "375.6 Mio") unless unambiguous multiple dots exist. Billion/Mrd is converted to 1000 Million.
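The heuristic above can be sketched as follows. This is a minimal illustration, not the production code; `parse_german_number` is a hypothetical name.

```python
import re

def parse_german_number(text: str, is_revenue: bool = False) -> float:
    """Parse a German-formatted number out of a text snippet.

    Heuristic: a single dot followed by exactly three digits (with no
    comma present) is a thousands separator; for revenue values a
    single dot is treated as a decimal point instead.
    """
    match = re.search(r"\d[\d.,]*", text)
    if not match:
        raise ValueError(f"no number found in {text!r}")
    num = match.group()
    if "," in num:
        # German decimal comma: "1.234,56" -> 1234.56
        return float(num.replace(".", "").replace(",", "."))
    if num.count(".") >= 2:
        # Multiple dots are unambiguous thousands separators: "1.234.567"
        return float(num.replace(".", ""))
    if "." in num:
        integer, frac = num.split(".")
        if len(frac) == 3 and not is_revenue:
            # "1.005" -> 1005 (thousands separator)
            return float(integer + frac)
        # "375.6" with is_revenue=True -> 375.6 (decimal)
        return float(num)
    return float(num)
```

The `is_revenue` flag mirrors the rule in the bullet above: headcount-style numbers favor the thousands-separator reading, revenue favors the decimal reading.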
- The Wolfra/Greilmeier/Erding Fixes (Advanced Metric Parsing):
  - Problem: Simple regex parsers fail on complex sentences with multiple numbers, concatenated years, or misleading prefixes.
  - Solution (Hybrid Extraction & Regression Testing):
    - LLM Guidance: The LLM provides an `expected_value` (e.g., "8.000 m²").
    - Robust Python Parser (`MetricParser`): This parser aggressively cleans the `expected_value` (stripping units like "m²") to get a numerical target. It then intelligently searches the full text for this target, ignoring other numbers (like "2" in "An 2 Standorten").
    - Specific Bug Fixes:
      - Year-Suffix: Logic to detect and remove trailing years from concatenated numbers (e.g., "802020" -> "80").
      - Year-Prefix: Logic to ignore year-like numbers (1900-2100) if other, more likely candidates exist in the text.
      - Sentence Truncation: Removed overly aggressive logic that cut off sentences after a hyphen, which caused metrics at the end of a phrase to be missed.
  - Safeguard: These specific cases are now locked in via `test_metric_parser.py` to prevent future regressions.
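The hint-guided search and the year-suffix fix can be sketched together. This is an illustrative simplification, not the real `MetricParser`; it assumes the `expected_value` hint uses thousands-separator dots.

```python
import re
from typing import Optional

def find_expected_metric(text: str, expected_value: str) -> Optional[float]:
    """Find the LLM-suggested metric in the full text.

    Sketch of the hybrid approach: clean the expected_value hint
    (strip units like "m²"), then scan every number candidate in the
    text for a match, ignoring unrelated numbers.
    """
    digits = re.sub(r"[^\d.,]", "", expected_value)
    if not digits:
        return None
    # Assumes dots in the hint are thousands separators ("8.000" -> 8000)
    target = float(digits.replace(".", "").replace(",", "."))

    for candidate in re.findall(r"\d[\d.]*", text):
        value = float(candidate.replace(".", ""))
        # Year-suffix fix: strip a concatenated trailing year ("802020" -> "80")
        if value != target and len(candidate) > 4 and re.search(r"(19|20)\d\d$", candidate):
            value = float(candidate[:-4].replace(".", ""))
        if value == target:
            return value
    return None
```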
- LLM JSON Stability:
  - Problem: LLMs often wrap JSON in Markdown blocks (```json), causing `json.loads()` to fail.
  - Solution: ALWAYS use a `clean_json_response` helper that strips markers before parsing. Never trust raw LLM output.
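A minimal sketch of such a helper (the actual `clean_json_response` in the codebase may handle more cases):

```python
import json
import re

def clean_json_response(raw: str):
    """Strip Markdown code-fence wrappers before parsing LLM output."""
    cleaned = raw.strip()
    # Remove an opening fence such as ```json or ``` at the start
    cleaned = re.sub(r"^```(?:json)?\s*", "", cleaned)
    # Remove a closing ``` fence at the end
    cleaned = re.sub(r"\s*```$", "", cleaned)
    return json.loads(cleaned)
```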
- LLM Structure Inconsistency:
  - Problem: Even with `json_mode=True`, models sometimes wrap the result in a list `[...]` instead of a flat object `{...}`, breaking frontend property access.
  - Solution: Implement a check: `if isinstance(result, list): result = result[0]`.
- Scraping Navigation:
  - Problem: Searching for "Impressum" only on the scraped URL (which might be a subpage found via Google) often fails.
  - Solution: Always implement a fallback to the Root Domain AND a 2-Hop check via the "Kontakt" page.
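The fallback order can be sketched like this (a simplified illustration; `impressum_candidates` and the `page_links` mapping of link text to href are hypothetical names, not the real scraper API):

```python
from urllib.parse import urljoin, urlparse

def impressum_candidates(scraped_url: str, page_links: dict) -> list:
    """Ordered list of URLs to check for an Impressum page.

    Order: direct "Impressum" link on the scraped page, a root-domain
    guess, then the "Kontakt" page as hop 1 of the 2-hop check.
    """
    candidates = []
    # Hop 0: an Impressum link on the scraped page itself
    for text, href in page_links.items():
        if "impressum" in text.lower():
            candidates.append(urljoin(scraped_url, href))
    # Root fallback: the scraped URL may be a deep subpage
    parsed = urlparse(scraped_url)
    candidates.append(f"{parsed.scheme}://{parsed.netloc}/impressum")
    # 2-hop: the "Kontakt" page often links to the Impressum
    for text, href in page_links.items():
        if "kontakt" in text.lower():
            candidates.append(urljoin(scraped_url, href))
    # Deduplicate while preserving order
    return list(dict.fromkeys(candidates))
```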
- Frontend State Management:
  - Problem: Users didn't see when a background job finished.
  - Solution: A polling mechanism (`setInterval`) tied to an `isProcessing` state is superior to static timeouts for long-running AI tasks.
- Hyper-Personalized Marketing Engine (v3.2) - "Deep Persona Injection":
  - Problem: Marketing texts were too generic and didn't reflect the specific psychological or operative profile of the different target roles (e.g., CFO vs. Facility Manager).
  - Solution (Deep Sync & Prompt Hardening):
    - Extended Schema: Added `description`, `convincing_arguments`, and `kpis` to the `Persona` database model to store richer profile data.
    - Notion Master Sync: Updated the synchronization logic to pull these deep insights directly from the Notion "Personas / Roles" database.
    - Role-Centric Prompts: The `MarketingMatrix` generator was re-engineered to inject the persona's "Mindset" and "KPIs" into the prompt.
  - Example (Healthcare):
    - Infrastructure Lead: Now focuses on "IT Security", "DSGVO Compliance", and "WLAN integration".
    - Economic Buyer (CFO): Focuses on "ROI Amortization", "Reduction of Overtime", and "Flexible Financing (RaaS)".
  - Verification: Confirmed that the transition from a company-specific Opener (e.g., observing staff shortages at Klinikum Erding) to the role-specific Intro (e.g., pitching transport robots to reduce walking distances for nursing directors) is seamless and logical.
Metric Parser - Regression Tests
To ensure the stability and accuracy of the metric extraction logic, a dedicated test suite (/company-explorer/backend/tests/test_metric_parser.py) has been created. It covers the following critical, real-world bug fixes:
- `test_wolfra_concatenated_year_bug`:
  - Problem: A number and year were concatenated (e.g., "802020").
  - Test: Ensures the parser correctly identifies and strips the trailing year, extracting `80`.
- `test_erding_year_prefix_bug`:
  - Problem: A year appeared before the actual metric in the sentence (e.g., "2022 ... 200.000 Besucher").
  - Test: Verifies that the parser's "Smart Year Skip" logic ignores the year and correctly extracts `200000`.
- `test_greilmeier_multiple_numbers_bug`:
  - Problem: The text contained multiple numbers ("An 2 Standorten ... 8.000 m²"), and the parser incorrectly picked the first one.
  - Test: Confirms that when an `expected_value` (like "8.000 m²") is provided, the parser correctly cleans it and extracts the corresponding number (8000), ignoring other irrelevant numbers.
These tests are crucial for preventing regressions as the parser logic evolves.
Notion Maintenance & Data Sync
Since the "Golden Record" for Industry Verticals (Pains, Gains, Products) resides in Notion, specific tools are available to read and sync this data.
Location: /app/company-explorer/backend/scripts/notion_maintenance/
Prerequisites:
- Ensure `.env` is loaded with `NOTION_API_KEY` and the correct DB IDs.
Key Scripts:
- `check_relations.py` (Reader - Deep):
  - Purpose: Reads Verticals and resolves linked Product Categories (Relation IDs -> Names). Essential for verifying the "Primary/Secondary Product" logic.
  - Usage: `python3 check_relations.py`
- `update_notion_full.py` (Writer - Batch):
  - Purpose: Batch updates Pains and Gains for multiple verticals. Use this as a template when refining the messaging strategy.
  - Usage: Edit the dictionary in the script, then run `python3 update_notion_full.py`.
- `list_notion_structure.py` (Schema Discovery):
  - Purpose: Lists all property keys and page titles. Use this to debug schema changes (e.g. if a column was renamed).
  - Usage: `python3 list_notion_structure.py`
Next Steps (Updated Feb 27, 2026)
- Notion Content: Finalize "Pains" and "Gains" for all 25 verticals in the Notion master database.
- Intelligence: Run `generate_matrix.py` in the Company Explorer backend to populate the matrix for all new English vertical names.
- Automation: Register the production webhook (requires `admin-webhooks` rights) to enable real-time CRM sync without manual job injection.
- Execution: Connect the "Sending Engine" (the actual email dispatch logic) to the SuperOffice fields.
- Monitoring: Monitor the 'Atomic PATCH' logs in production for any 400 errors regarding field length or specific character sets.
Company Explorer Access & Debugging
The Company Explorer is the central intelligence engine.
Core Paths:
- Database: `/app/companies_v3_fixed_2.db` (SQLite)
- Backend Code: `/app/company-explorer/backend/`
- Logs: `/app/logs_debug/company_explorer_debug.log`
Accessing Data: To inspect live data without starting the full stack, use `sqlite3` directly or the helper scripts (if the environment permits).
- Direct SQL: `sqlite3 /app/companies_v3_fixed_2.db "SELECT * FROM companies WHERE name LIKE '%Firma%';"`
- Python (requires env): The app runs in a Docker container. When debugging from outside (CLI agent), Python dependencies like `sqlalchemy` might be missing in the global scope. Prefer `sqlite3` for quick checks.
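When Python itself is available, the stdlib `sqlite3` module suffices for read-only checks. A sketch (the `companies` table name matches the SQL example above; other schema details are assumptions to verify first):

```python
import sqlite3

def find_companies(db_path: str, name_fragment: str) -> list:
    """Read-only lookup against the Company Explorer SQLite database.

    Pass "/app/companies_v3_fixed_2.db" as db_path. mode=ro keeps the
    debug session from writing to a file the container may hold open.
    """
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        cur = conn.execute(
            "SELECT * FROM companies WHERE name LIKE ?",
            (f"%{name_fragment}%",),
        )
        return cur.fetchall()
    finally:
        conn.close()
```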
Key Endpoints (Internal API :8000):
- `POST /api/provision/superoffice-contact`: Triggers the text generation logic.
- `GET /api/companies/{id}`: Full company profile including enrichment data.
Troubleshooting:
- "BaseModel" Error: Usually a mix-up between Pydantic and SQLAlchemy `Base`. Check the imports in `database.py`.
- Missing Dependencies: The CLI agent runs in `/app` but not necessarily inside the container's venv. Use standard tools (`grep`, `sqlite3`) where possible.
Critical Debugging Session (Feb 21, 2026) - Re-Stabilizing the Analysis Engine
A critical session was required to fix a series of cascading failures in the ClassificationService. The key takeaways are documented here to prevent future issues.
- The "Phantom" `NameError`:
  - Symptom: The application crashed with `NameError: name 'joinedload' is not defined`, even though the import was correctly added to `classification.py`.
  - Root Cause: The `uvicorn` server's hot-reload mechanism within the Docker container did not reliably pick up file changes made from outside the container. A simple `docker-compose restart` was insufficient to clear the process's cached state.
  - Solution: After any significant code change, especially to imports or core logic, a forced recreation of the container is mandatory:
    ```bash
    # Correct way to apply changes:
    docker-compose up -d --build --force-recreate company-explorer
    ```
- The "Invisible" Logs:
  - Symptom: No debug logs were being written, making it impossible to trace the execution flow.
  - Root Cause: The `LOG_DIR` path in `/company-explorer/backend/config.py` was misconfigured (`/app/logs_debug`) and did not point to the actual, historical log directory (`/app/Log_from_docker`).
  - Solution: Configuration paths must be treated as absolute and verified. Correcting the `LOG_DIR` path immediately resolved the issue.
- Inefficient Debugging Loop:
  - Symptom: The cycle of triggering a background job via API, waiting, and then manually checking logs was slow and inefficient.
  - Root Cause: Lack of a tool to test the core application logic in isolation.
  - Solution: A dedicated, interactive test script (`/company-explorer/backend/scripts/debug_single_company.py`) was created. It runs the entire analysis for a single company in the foreground, providing immediate and detailed feedback. This pattern is invaluable for complex, multi-step processes and should be a standard for future development.
Production Migration & Multi-Campaign Support (Feb 27, 2026)
The system has been fully migrated to the SuperOffice production environment (online3.superoffice.com, tenant Cust26720).
1. Final UDF Mappings (Production)
These ProgIDs are verified and active for the production tenant:
| Field Purpose | Entity | ProgID | Notes |
|---|---|---|---|
| MA Subject | Person | SuperOffice:19 | |
| MA Intro | Person | SuperOffice:20 | |
| MA Social Proof | Person | SuperOffice:21 | |
| MA Unsubscribe | Person | SuperOffice:22 | URL format |
| MA Campaign | Person | SuperOffice:23 | List field (uses `:DisplayText`) |
| Vertical | Contact | SuperOffice:83 | List field (mapped via JSON) |
| AI Summary | Contact | SuperOffice:84 | Truncated to 132 chars |
| AI Last Update | Contact | SuperOffice:85 | Format: `[D:MM/DD/YYYY HH:MM:SS]` |
| Opener Primary | Contact | SuperOffice:86 | |
| Opener Secondary | Contact | SuperOffice:87 | |
| Last Outreach | Contact | SuperOffice:88 | |
2. Vertical ID Mapping (Production)
The full list of 25 verticals with their internal SuperOffice IDs (List udlist331):
Automotive - Dealer: 1613, Corporate - Campus: 1614, Energy - Grid & Utilities: 1615, Energy - Solar/Wind: 1616, Healthcare - Care Home: 1617, Healthcare - Hospital: 1618, Hospitality - Gastronomy: 1619, Hospitality - Hotel: 1620, Industry - Manufacturing: 1621, Infrastructure - Communities: 1622, Infrastructure - Public: 1623, Infrastructure - Transport: 1624, Infrastructure - Parking: 1625, Leisure - Entertainment: 1626, Leisure - Fitness: 1627, Leisure - Indoor Active: 1628, Leisure - Outdoor Park: 1629, Leisure - Wet & Spa: 1630, Logistics - Warehouse: 1631, Others: 1632, Reinigungsdienstleister: 1633, Retail - Food: 1634, Retail - Non-Food: 1635, Retail - Shopping Center: 1636, Tech - Data Center: 1637.
3. Technical Lessons Learned (SO REST API)
- Atomic PATCH (Stability): Bundling all contact updates into a single `PATCH` request to the `/Contact/{id}` endpoint is far more stable than sequential UDF updates. If one field fails (e.g. invalid property), the whole transaction might roll back or partially fail, so proactive validation is key.
- Website Sync (`Urls` Array): Updating the website via REST requires manipulating the `Urls` array property. Simple field assignment to `UrlAddress` fails during `PATCH`.
  - Correct Format: `"Urls": [{"Value": "https://example.com", "Description": "AI Discovered"}]`
- List Resolution (`:DisplayText`): To get the clean string value of a list field (like Campaign Name) without extra API calls, use the pseudo-field `ProgID:DisplayText` in the `$select` parameter.
- Field Length Limits: Standard SuperOffice text UDFs are limited to approx. 140-254 characters. AI-generated summaries must be truncated (e.g. 132 chars) to avoid 400 Bad Request errors.
- Docker `env_file` Importance: For production, mapping individual variables in `docker-compose.yml` is error-prone. Using `env_file: .env` ensures all services stay synchronized with the latest UDF IDs and mappings.
- Production URL Schema: The production API is strictly hosted on `online3.superoffice.com` (for this tenant), while OAuth remains at `online.superoffice.com`.
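Assuming the JSON-Patch style body that SuperOffice REST `PATCH` endpoints accept, the atomic-update pattern can be sketched as a pure payload builder (the helper name and the exact `path` strings are illustrative; verify them against the live API before use):

```python
SUMMARY_LIMIT = 132  # stay well under the ~140-254 char UDF limits

def build_contact_patch(website: str, summary: str, udf: dict) -> list:
    """Bundle all Contact updates into one PATCH payload (op list)."""
    ops = [
        # Website must go through the Urls array, not UrlAddress
        {"op": "replace", "path": "/Urls",
         "value": [{"Value": website, "Description": "AI Discovered"}]},
        # Truncate AI text proactively to avoid 400 Bad Request
        {"op": "replace", "path": "/UserDefinedFields/SuperOffice:84",
         "value": summary[:SUMMARY_LIMIT]},
    ]
    for prog_id, value in udf.items():
        ops.append({"op": "replace",
                    "path": f"/UserDefinedFields/{prog_id}",
                    "value": value})
    return ops
```

Sending this as a single request means one round trip and one place to validate every field before anything is written.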
4. Campaign Trigger Logic
The `worker.py` (v1.8) now extracts the `campaign_tag` from `SuperOffice:23:DisplayText`. This tag is passed to the Company Explorer's provisioning API. If a matching entry exists in the `MarketingMatrix` for that tag, its specific texts are used; otherwise, it falls back to the "standard" Kaltakquise texts.
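The fallback rule reduces to a simple lookup. A minimal sketch (the real code queries the `MarketingMatrix` table rather than an in-memory dict; `select_texts` is a hypothetical name):

```python
def select_texts(campaign_tag: str, marketing_matrix: dict) -> dict:
    """Pick campaign-specific texts, falling back to 'standard'."""
    # Unknown or empty tags fall back to the standard Kaltakquise texts
    return marketing_matrix.get(campaign_tag) or marketing_matrix["standard"]
```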
5. SuperOffice Authentication (Critical Update Feb 28, 2026)
Problem: Authentication failures ("Invalid refresh token" or "Invalid client_id") occurred because a standard `load_dotenv()` call did not override stale environment variables already present in the shell process.
Solution: Always use `load_dotenv(override=True)` in Python scripts to force loading the actual values from the `.env` file.
Correct Authentication Pattern (Python):
```python
from dotenv import load_dotenv
import os

# CRITICAL: override=True ensures we read from .env even if env vars are already set
load_dotenv(override=True)
client_id = os.getenv("SO_CLIENT_ID")
# ...
```
Known Working Config (Production):
- Environment: `online3`
- Tenant: `Cust26720`
- Token Logic: The `AuthHandler` implementation in `health_check_so.py` is the reference standard. Avoid using the legacy `superoffice_client.py` without verifying that it uses `override=True`.
6. Sales & Opportunities (Roboplanet Specifics)
When creating sales via API, specific constraints apply due to the shared tenant with Wackler:
- SaleTypeId: MUST be 14 (`GE:"Roboplanet Verkauf";`) to ensure the sale is assigned to the correct business unit.
  - Alternative: ID 16 (`GE:"Roboplanet Teststellung";`) for trials.
- Mandatory Fields:
  - `Saledate` (Estimated Date): Must be provided in ISO format (e.g., `YYYY-MM-DDTHH:MM:SSZ`).
  - `Person`: Linking to a specific person, not just the company, is highly recommended.
- Context: Avoid creating sales on the parent company "Wackler Service Group" (ID 3). Always target the specific lead company.
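A hedged sketch of a sale payload honoring these constraints (`build_roboplanet_sale` is a hypothetical helper; field names and casing should be verified against the live Sale endpoint):

```python
from datetime import datetime, timezone

def build_roboplanet_sale(contact_id: int, person_id: int, heading: str,
                          trial: bool = False) -> dict:
    """Payload for a Roboplanet sale with the mandatory fields set."""
    return {
        "Contact": {"ContactId": contact_id},
        "Person": {"PersonId": person_id},  # link a person, not just the company
        "Heading": heading,
        # 14 = "Roboplanet Verkauf", 16 = "Roboplanet Teststellung"
        "SaleType": {"SaleTypeId": 16 if trial else 14},
        # Saledate is mandatory and must be ISO formatted
        "Saledate": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```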
7. Service & Tickets (Anfragen)
SuperOffice Tickets represent the support and request system. Like Sales, they are organized to allow separation between Roboplanet and Wackler.
- Entity Name: `ticket`
- Roboplanet-Specific Categories (CategoryId):
  - ID 46: `GE:"Lead Roboplanet";`
  - ID 47: `GE:"Vertriebspartner Roboplanet";`
  - ID 48: `GE:"Weitergabe Roboplanet";`
  - Hierarchical: `Roboplanet/Support` (often used for technical issues).
- Key Fields:
  - `ticketId`: Internal ID.
  - `title`: The subject of the request.
  - `contactId` / `personId`: Links to the company and contact person.
  - `ticketStatusId`: 1 (Unbearbeitet), 2 (In Arbeit), 3 (Bearbeitet).
  - `ownedBy`: Often "ROBO" for Roboplanet staff.
- Cross-Links: Tickets can be linked to a `saleId` (to track support during a sale) or a `projectId`.
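For quick log or debug output, the codes above can be wrapped in small lookup tables (a sketch; `describe_ticket` is illustrative and the canonical enums live in SuperOffice, not in our codebase):

```python
TICKET_STATUS = {1: "Unbearbeitet", 2: "In Arbeit", 3: "Bearbeitet"}
ROBOPLANET_CATEGORIES = {
    46: "Lead Roboplanet",
    47: "Vertriebspartner Roboplanet",
    48: "Weitergabe Roboplanet",
}

def describe_ticket(ticket: dict) -> str:
    """One-line human-readable summary of a SuperOffice ticket dict."""
    status = TICKET_STATUS.get(ticket["ticketStatusId"], "Unbekannt")
    category = ROBOPLANET_CATEGORIES.get(ticket.get("categoryId", -1), "Andere")
    return f"#{ticket['ticketId']} [{category} / {status}] {ticket['title']}"
```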