[2ff88f42] Full End-to-End integration: Webhooks, Auto-Enrichment, Notion-Sync, UI updates and new Connector Architecture

This commit is contained in:
2026-02-19 16:05:52 +00:00
parent 262add256f
commit 17346b3fcb
21 changed files with 1107 additions and 203 deletions

View File

@@ -1 +1 @@
{"task_id": "30b88f42-8544-80ea-88a7-e7c03e749c30", "token": "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", "session_start_time": "2026-02-18T14:32:47.873998"}
{"task_id": "2ff88f42-8544-8000-8314-c9013414d1d0", "token": "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", "session_start_time": "2026-02-19T08:32:53.260193"}

View File

@@ -7,40 +7,48 @@ Automatisierte Anreicherung von B2B-Firmendaten im CRM mit KI-generierten Insigh
---
## 1. Architecture: The "SuperOffice Connector"
## 1. Architecture: "Event-Driven Messaging" (v2.0)
We implement a dedicated microservice (`connector-superoffice`) that acts as a bridge.
We have moved the architecture from a polling model to an **event-driven webhook model**. This follows modern best practices and is a prerequisite for scaling to 16,000+ companies.
### The Principle: "Brain & Muscle"
* **The brain (Company Explorer):** All intelligence lives here. Its database stores the company data, the signals, and the **Marketing Matrix** (text modules). It decides *which* text is generated for *which* contact.
* **The muscle (Connector):** A "dumb" messenger. It receives events, asks the brain for instructions, and executes them in SuperOffice.
```mermaid
graph LR
    subgraph CRM ["SuperOffice CRM (Cloud/On-Prem)"]
        SO_DB[("Master Data<br/>(Contact)")]
        SO_UDF[("AI Fields<br/>(UDFs)")]
    end
    subgraph Middleware ["Integration Layer (Docker)"]
        Connector["🔄 SuperOffice Connector<br/>(Python Service)"]
    end
    subgraph Intelligence ["GTM Engine"]
        CE_API["⚡ Company Explorer API"]
        CE_DB[("Enrichment DB")]
    end
    %% Data Flow
    SO_DB -->|"1. Pull (Delta/Initial)"| Connector
    Connector -->|"2. Push (Import)"| CE_API
    CE_API -.->|"3. Enrichment Process"| CE_DB
    CE_DB -->|"4. Result"| CE_API
    Connector -->|"5. Poll Results"| CE_API
    Connector -->|"6. Write-Back (Update UDFs)"| SO_UDF
    %% Styling
    style CRM fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    style Middleware fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
    style Intelligence fill:#fff3e0,stroke:#ef6c00,stroke-width:2px
```
```mermaid
graph TD
    subgraph "SuperOffice CRM"
        SO_Contact["👤 Contact / Person"]
        SO_Webhook["🚀 Webhook<br/>(contact.changed)"]
    end
    subgraph "Connector (Docker)"
        WebhookApp["📥 Webhook Receiver<br/>(FastAPI :8002)"]
        Queue[("📦 Job Queue<br/>(SQLite/Redis)")]
        Worker["👷‍♂️ Worker Process"]
    end
    subgraph "Company Explorer (Docker)"
        CE_API["🧠 Provisioning API<br/>(:8000)"]
        MatrixDB[("📚 Marketing Matrix<br/>(SQLite)")]
    end
    SO_Contact -->|1. Change| SO_Webhook
    SO_Webhook -->|2. POST Event| WebhookApp
    WebhookApp -->|3. Enqueue Job| Queue
    Worker -->|4. Fetch Job| Queue
    Worker -->|5. Request: 'Provision Me'| CE_API
    CE_API -->|6. Lookup Logic| MatrixDB
    CE_API -->|7. Return: Final Texts| Worker
    Worker -->|8. Write-Back UDFs| SO_Contact
```
**Technical communication:**
Since both services (`connector-superoffice` and `company-explorer`) run in the same Docker network, they talk to each other directly, with minimal latency, via the internal Docker DNS (`http://company-explorer:8000`). Traffic never takes a detour through the public internet.
## 2. Data Model & Extension
To keep the CRM data clean, we **never** write to standard fields (such as `Name` or `Department`), but exclusively to dedicated User Defined Fields (**UDFs**).
@@ -61,6 +69,17 @@ Folgende Felder sollten am Objekt `Company` (bzw. `Contact` in SuperOffice-Termi
| :--- | :--- | :--- |
| `external_crm_id` | String/Int | Stores the `ContactId` from SuperOffice for unambiguous mapping (primary-key mapping). |
### 2.2. Data Integrity: "The Double Truth"
To safeguard data quality, we maintain two truths for master data (name, website):
1. **CRM truth:** What currently stands in SuperOffice (often maintained manually, potentially outdated).
2. **Explorer truth:** What the Company Explorer found on the web.
**Synchronization:**
* On every webhook event, the connector sends the current CRM values (`crm_name`, `crm_website`) to the Company Explorer.
* The Company Explorer stores them and computes a **`data_mismatch_score`** (0.0 = match, 1.0 = mismatch).
* **UI:** The frontend visualizes this score, so deviations are immediately apparent.
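The website comparison behind the mismatch score can be sketched roughly as follows. This is a minimal sketch mirroring the normalization used in the Provisioning API; the `0.8` "clear mismatch" value matches the current implementation, while the function names are illustrative:

```python
def normalize_url(url: str) -> str:
    """Strip scheme, 'www.' prefix and trailing slashes so that
    'https://www.example.de/' and 'example.de' compare equal."""
    u = str(url).lower().strip()
    for prefix in ("https://", "http://"):
        if u.startswith(prefix):
            u = u[len(prefix):]
    if u.startswith("www."):
        u = u[4:]
    return u.strip("/")

def mismatch_score(explorer_website: str, crm_website: str) -> float:
    """0.0 = both truths agree; 0.8 = clear mismatch (heuristic)."""
    if not explorer_website or not crm_website:
        return 0.0  # nothing to compare
    if normalize_url(explorer_website) == normalize_url(crm_website):
        return 0.0
    return 0.8
```

The score is intentionally coarse; a fuzzier similarity metric could later replace the binary check.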
## 2.1. Mapping of CRM Concepts (SuperOffice vs. D365)
To keep the integration efficient, a strategic decision was made on how core CRM concepts are mapped:
@@ -88,15 +107,16 @@ Um die Integration effizient zu gestalten, wurde eine strategische Entscheidung
2. For each hit, it sends an update to the SuperOffice API (`PUT /Contact/{id}`).
3. Only the UDFs defined above are updated.
### Phase 3: Continuous Sync (Polling)
*Goal: process new companies automatically.*
### Phase 3: Continuous Sync (Event-Driven)
*Goal: scalable, near-real-time processing.*
1. **Cycle:** Every 60 minutes.
2. **Logic:** "Give me all companies that were created or changed since `LAST_RUN`."
3. **Action:** New companies are automatically sent off for analysis.
4. **Benefit:** No complex firewall configuration for inbound webhooks required (outbound-only traffic).
---
1. **Trigger:** A `contact.changed` or `person.changed` event in SuperOffice (e.g. a user sets the status to `Init`).
2. **Transport:** SuperOffice webhook -> Connector `POST /webhook` -> queue.
3. **Processing:** The `Worker` picks up the job and calls the Company Explorer's **Provisioning API**.
4. **Benefits:**
    * **No unnecessary traffic:** We only process contacts where something actually changed.
    * **Real time:** Changes take effect immediately.
    * **SO conformance:** We use the official, efficient path for integrations.
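The event-driven steps above can be sketched as a worker loop. In this minimal sketch, `provision` and `write_back` stand in for the real HTTP calls to the Company Explorer and SuperOffice; their signatures are assumptions, not the real client API:

```python
from typing import Callable, Dict, List

def process_pending_jobs(
    queue: List[Dict],
    provision: Callable[[Dict], Dict],
    write_back: Callable[[int, Dict], None],
) -> int:
    """Drain the job queue: ask the Company Explorer ('provision') for texts,
    then write them back to SuperOffice ('write_back'). Returns the number
    of completed jobs."""
    done = 0
    for job in queue:
        if job["status"] != "PENDING":
            continue
        job["status"] = "PROCESSING"
        # Corresponds to POST /api/provision/superoffice-contact
        result = provision(job["payload"])
        if result.get("status") == "success":
            write_back(job["payload"]["so_contact_id"], result.get("texts", {}))
            job["status"] = "COMPLETED"
            done += 1
        else:
            # Upstream still discovering/analyzing: leave the job for a retry
            job["status"] = "PENDING"
    return done
```

Jobs that come back as `"processing"` simply stay in the queue, which gives the asynchronous discovery/analysis steps time to finish.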
## 4. Security & Authentication

View File

@@ -49,6 +49,14 @@ Um externen Diensten (wie der `lead-engine`) eine einfache und robuste Anbindung
| **`company_explorer_connector.py`** | **NEW:** A central Python script that serves as the "official" client wrapper for the Company Explorer API. It encapsulates the complexity of the asynchronous enrichment processes. |
| **`handle_company_workflow()`** | The connector's core function. It implements the complete "find-or-create-and-enrich" workflow: <br> 1. **Check:** Determines whether a company already exists. <br> 2. **Create:** Creates the company if it is new. <br> 3. **Kick off:** Starts the asynchronous `discover` process. <br> 4. **Wait (polling):** Monitors the company's status until a website has been found. <br> 5. **Analyze:** Starts the asynchronous `analyze` process. <br> **Benefit:** Gives the calling service a simple, quasi-synchronous interface and guarantees that the process steps run in the correct order. |
### D. Provisioning API (Internal)
A dedicated endpoint was created for seamless integration with the SuperOffice Connector:
| Endpoint | Method | Purpose |
| :--- | :--- | :--- |
| `/api/provision/superoffice-contact` | POST | Returns "enrichment packages" (texts, status) for a given CRM contact. Reads from `MarketingMatrix`. |
## 3. Handling Shared Code (`helpers.py` & Co.)
We fully encapsulate the new project ("Fork & Clean").
@@ -105,6 +113,15 @@ Wir kapseln das neue Projekt vollständig ab ("Fork & Clean").
* `pattern` (String - regex for job titles)
* `role` (String - target role)
### Table `marketing_matrix` (NEW v2.1)
* **Purpose:** Stores static, approved marketing texts (Notion sync).
* `id` (PK)
* `industry_id` (FK -> industries.id)
* `role_id` (FK -> job_role_mappings.id)
* `subject` (Text)
* `intro` (Text)
* `social_proof` (Text)
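A lookup against this table could look roughly like the following plain-`sqlite3` sketch of the schema above (in the backend this goes through SQLAlchemy; the helper name is illustrative):

```python
import sqlite3

def fetch_texts(conn: sqlite3.Connection, industry_id: int, role_id: int):
    """Return (subject, intro, social_proof) for one Industry x Role cell,
    or None if no approved texts exist for that combination."""
    return conn.execute(
        "SELECT subject, intro, social_proof FROM marketing_matrix "
        "WHERE industry_id = ? AND role_id = ?",
        (industry_id, role_id),
    ).fetchone()

# In-memory demo with the schema from above
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE marketing_matrix ("
    "id INTEGER PRIMARY KEY, industry_id INTEGER, role_id INTEGER, "
    "subject TEXT, intro TEXT, social_proof TEXT)"
)
conn.execute(
    "INSERT INTO marketing_matrix (industry_id, role_id, subject, intro, social_proof) "
    "VALUES (1, 19, 'Subject A', 'Intro A', 'Proof A')"
)
```

Returning `None` for a missing cell lets the Provisioning API respond with empty `texts` instead of failing.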
## 7. History & Fixes (Jan 2026)
* **[CRITICAL] v0.7.4: Service Restoration & Logic Fix (Jan 24, 2026)**

View File

@@ -32,7 +32,7 @@ setup_logging()
import logging
logger = logging.getLogger(__name__)
from .database import init_db, get_db, Company, Signal, EnrichmentData, RoboticsCategory, Contact, Industry, JobRoleMapping, ReportedMistake
from .database import init_db, get_db, Company, Signal, EnrichmentData, RoboticsCategory, Contact, Industry, JobRoleMapping, ReportedMistake, MarketingMatrix
from .services.deduplication import Deduplicator
from .services.discovery import DiscoveryService
from .services.scraping import ScraperService
@@ -42,8 +42,7 @@ from .services.classification import ClassificationService
app = FastAPI(
title=settings.APP_NAME,
version=settings.VERSION,
description="Backend for Company Explorer (Robotics Edition)",
root_path="/ce"
description="Backend for Company Explorer (Robotics Edition)"
)
app.add_middleware(
@@ -65,6 +64,7 @@ class CompanyCreate(BaseModel):
    city: Optional[str] = None
    country: str = "DE"
    website: Optional[str] = None
    crm_id: Optional[str] = None

class BulkImportRequest(BaseModel):
    names: List[str]
@@ -84,6 +84,20 @@ class ReportMistakeRequest(BaseModel):
    quote: Optional[str] = None
    user_comment: Optional[str] = None
class ProvisioningRequest(BaseModel):
    so_contact_id: int
    so_person_id: Optional[int] = None
    crm_name: Optional[str] = None
    crm_website: Optional[str] = None
    job_title: Optional[str] = None  # used by the role-classification fallback

class ProvisioningResponse(BaseModel):
    status: str
    company_name: str
    website: Optional[str] = None
    vertical_name: Optional[str] = None
    role_name: Optional[str] = None
    texts: Dict[str, Optional[str]] = {}
# --- Events ---
@app.on_event("startup")
def on_startup():
@@ -100,6 +114,141 @@ def on_startup():
def health_check(username: str = Depends(authenticate_user)):
    return {"status": "ok", "version": settings.VERSION, "db": settings.DATABASE_URL}
@app.post("/api/provision/superoffice-contact", response_model=ProvisioningResponse)
def provision_superoffice_contact(
    req: ProvisioningRequest,
    background_tasks: BackgroundTasks,
    db: Session = Depends(get_db),
    username: str = Depends(authenticate_user)
):
    # 1. Find Company (via SO ID)
    company = db.query(Company).filter(Company.crm_id == str(req.so_contact_id)).first()
    if not company:
        # AUTO-CREATE logic
        if not req.crm_name:
            # Cannot create without a name. Should not happen if the Connector does its job.
            raise HTTPException(400, "Cannot create company: crm_name missing")
        company = Company(
            name=req.crm_name,
            crm_id=str(req.so_contact_id),
            crm_name=req.crm_name,
            crm_website=req.crm_website,
            status="NEW"
        )
        db.add(company)
        db.commit()
        db.refresh(company)
        logger.info(f"Auto-created company {company.name} from SuperOffice request.")
        # Trigger discovery
        background_tasks.add_task(run_discovery_task, company.id)
        return ProvisioningResponse(
            status="processing",
            company_name=company.name
        )

    # 1b. Check status & progress.
    # If NEW or DISCOVERED, we are not ready to provide texts yet.
    if company.status in ["NEW", "DISCOVERED"]:
        # If we have a website, make sure analysis is triggered
        if company.status == "DISCOVERED" or (company.website and company.website != "k.A."):
            background_tasks.add_task(run_analysis_task, company.id)
        elif company.status == "NEW":
            # Make sure discovery runs
            background_tasks.add_task(run_discovery_task, company.id)
        return ProvisioningResponse(
            status="processing",
            company_name=company.name
        )

    # 1c. Update CRM snapshot data ("The Double Truth")
    changed = False
    if req.crm_name:
        company.crm_name = req.crm_name
        changed = True
    if req.crm_website:
        company.crm_website = req.crm_website
        changed = True

    # Simple mismatch check
    if company.website and company.crm_website:
        def norm(u): return str(u).lower().replace("https://", "").replace("http://", "").replace("www.", "").strip("/")
        if norm(company.website) != norm(company.crm_website):
            company.data_mismatch_score = 0.8  # high mismatch
            changed = True
        else:
            if company.data_mismatch_score != 0.0:
                company.data_mismatch_score = 0.0
                changed = True
    if changed:
        company.updated_at = datetime.utcnow()
        db.commit()

    # 2. Find contact (person)
    if req.so_person_id is None:
        # Just a company sync, no texts needed
        return ProvisioningResponse(
            status="success",
            company_name=company.name,
            website=company.website,
            vertical_name=company.industry_ai
        )
    person = db.query(Contact).filter(Contact.so_person_id == req.so_person_id).first()

    # 3. Determine role
    role_name = None
    if person and person.role:
        role_name = person.role
    elif req.job_title:
        # Simple classification fallback
        mappings = db.query(JobRoleMapping).all()
        for m in mappings:
            # Check pattern type (regex vs. simple substring) - simplified here
            pattern_clean = m.pattern.replace("%", "").lower()
            if pattern_clean in req.job_title.lower():
                role_name = m.role
                break

    # 4. Determine vertical (industry)
    vertical_name = company.industry_ai

    # 5. Fetch texts from the matrix
    texts = {"subject": None, "intro": None, "social_proof": None}
    if vertical_name and role_name:
        industry_obj = db.query(Industry).filter(Industry.name == vertical_name).first()
        if industry_obj:
            # Find any mapping for this role to query the matrix
            # (assuming the matrix is linked to *one* canonical mapping per role string)
            role_ids = [m.id for m in db.query(JobRoleMapping).filter(JobRoleMapping.role == role_name).all()]
            if role_ids:
                matrix_entry = db.query(MarketingMatrix).filter(
                    MarketingMatrix.industry_id == industry_obj.id,
                    MarketingMatrix.role_id.in_(role_ids)
                ).first()
                if matrix_entry:
                    texts["subject"] = matrix_entry.subject
                    texts["intro"] = matrix_entry.intro
                    texts["social_proof"] = matrix_entry.social_proof

    return ProvisioningResponse(
        status="success",
        company_name=company.name,
        website=company.website,
        vertical_name=vertical_name,
        role_name=role_name,
        texts=texts
    )
@app.get("/api/companies")
def list_companies(
    skip: int = 0,
@@ -234,6 +383,7 @@ def create_company(company: CompanyCreate, db: Session = Depends(get_db), userna
        city=company.city,
        country=company.country,
        website=company.website,
        crm_id=company.crm_id,
        status="NEW"
    )
    db.add(new_company)
@@ -665,10 +815,23 @@ def run_analysis_task(company_id: int):
# --- Serve Frontend ---
static_path = "/frontend_static"
if not os.path.exists(static_path):
    static_path = os.path.join(os.path.dirname(__file__), "../static")
    # Local dev fallback
    static_path = os.path.join(os.path.dirname(__file__), "../../frontend/dist")
    if not os.path.exists(static_path):
        static_path = os.path.join(os.path.dirname(__file__), "../static")

logger.info(f"Static files path: {static_path} (Exists: {os.path.exists(static_path)})")

if os.path.exists(static_path):
    @app.get("/")
    async def serve_index():
        return FileResponse(os.path.join(static_path, "index.html"))
    app.mount("/", StaticFiles(directory=static_path, html=True), name="static")
else:
    @app.get("/")
    def root_no_frontend():
        return {"message": "Company Explorer API is running, but frontend was not found.", "path_tried": static_path}

if __name__ == "__main__":
    import uvicorn

View File

@@ -93,6 +93,10 @@ class Contact(Base):
    job_title = Column(String)  # business-card title
    language = Column(String, default="De")  # "De", "En"

    # SuperOffice mapping
    so_contact_id = Column(Integer, nullable=True, index=True)  # SuperOffice Contact ID (company)
    so_person_id = Column(Integer, nullable=True, unique=True, index=True)  # SuperOffice Person ID

    role = Column(String)  # "Operativer Entscheider", etc.
    status = Column(String, default="")  # marketing status
@@ -248,6 +252,30 @@ class ReportedMistake(Base):
    company = relationship("Company", back_populates="reported_mistakes")

class MarketingMatrix(Base):
    """
    Stores the static marketing texts for Industry x Role combinations.
    Source: Notion (synced).
    """
    __tablename__ = "marketing_matrix"

    id = Column(Integer, primary_key=True, index=True)

    # The combination keys
    industry_id = Column(Integer, ForeignKey("industries.id"), nullable=False)
    role_id = Column(Integer, ForeignKey("job_role_mappings.id"), nullable=False)

    # The content
    subject = Column(Text, nullable=True)
    intro = Column(Text, nullable=True)
    social_proof = Column(Text, nullable=True)

    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)

    industry = relationship("Industry")
    role = relationship("JobRoleMapping")
# ==============================================================================
# UTILS
# ==============================================================================

View File

@@ -72,6 +72,10 @@ def extract_select(prop):
    if not prop or "select" not in prop or not prop["select"]: return ""
    return prop["select"]["name"]

def extract_number(prop):
    if not prop or "number" not in prop: return None
    return prop["number"]

def sync():
    logger.info("--- Starting Enhanced Sync ---")
@@ -131,6 +135,16 @@ def sync():
        ind.pains = extract_rich_text(props.get("Pains"))
        ind.gains = extract_rich_text(props.get("Gains"))

        # Metrics & scraper config (NEW)
        ind.metric_type = extract_select(props.get("Metric Type"))
        ind.min_requirement = extract_number(props.get("Min. Requirement"))
        ind.whale_threshold = extract_number(props.get("Whale Threshold"))
        ind.proxy_factor = extract_number(props.get("Proxy Factor"))
        ind.scraper_search_term = extract_rich_text(props.get("Scraper Search Term"))
        ind.scraper_keywords = extract_rich_text(props.get("Scraper Keywords"))
        ind.standardization_logic = extract_rich_text(props.get("Standardization Logic"))

        # Status / priority
        prio = extract_select(props.get("Priorität"))
        if not prio: prio = extract_select(props.get("Freigegeben"))

View File

@@ -194,9 +194,6 @@ export function RoboticsSettings({ isOpen, onClose, apiBase }: RoboticsSettingsP
</span>
)}
</div>
<div className="flex flex-wrap gap-2 mt-1">
{ind.status_notion && <span className="text-[10px] border border-slate-300 dark:border-slate-700 px-1.5 rounded text-slate-500">{ind.status_notion}</span>}
</div>
</div>
<div className="text-right">
<div className="flex items-center gap-1.5 justify-end">
@@ -236,7 +233,16 @@ export function RoboticsSettings({ isOpen, onClose, apiBase }: RoboticsSettingsP
<div><span className="block text-slate-400 font-bold uppercase">Whale &gt;</span><span className="text-slate-700 dark:text-slate-200">{ind.whale_threshold || "-"}</span></div>
<div><span className="block text-slate-400 font-bold uppercase">Min Req</span><span className="text-slate-700 dark:text-slate-200">{ind.min_requirement || "-"}</span></div>
<div><span className="block text-slate-400 font-bold uppercase">Unit</span><span className="text-slate-700 dark:text-slate-200 truncate">{ind.scraper_search_term || "-"}</span></div>
<div><span className="block text-slate-400 font-bold uppercase">Product</span><span className="text-slate-700 dark:text-slate-200 truncate">{roboticsCategories.find(c => c.id === ind.primary_category_id)?.name || "-"}</span></div>
<div>
<span className="block text-slate-400 font-bold uppercase">Product</span>
<span className="text-slate-700 dark:text-slate-200 truncate">{roboticsCategories.find(c => c.id === ind.primary_category_id)?.name || "-"}</span>
{ind.secondary_category_id && (
<div className="mt-1 pt-1 border-t border-slate-100 dark:border-slate-800">
<span className="block text-orange-400 font-bold uppercase text-[9px]">Sec. Prod</span>
<span className="text-slate-700 dark:text-slate-200 truncate">{roboticsCategories.find(c => c.id === ind.secondary_category_id)?.name || "-"}</span>
</div>
)}
</div>
</div>
{ind.scraper_keywords && <div className="text-[10px]"><span className="text-slate-400 font-bold uppercase mr-2">Keywords:</span><span className="text-slate-600 dark:text-slate-400 font-mono">{ind.scraper_keywords}</span></div>}
{ind.standardization_logic && <div className="text-[10px]"><span className="text-slate-400 font-bold uppercase mr-2">Standardization:</span><span className="text-slate-600 dark:text-slate-400 font-mono">{ind.standardization_logic}</span></div>}

View File

@@ -0,0 +1,11 @@
import sys
import os
# Add backend to path so imports work
sys.path.append(os.path.join(os.getcwd(), "company-explorer"))
from backend.database import init_db
print("Initializing Database Schema...")
init_db()
print("Done.")

View File

@@ -0,0 +1,39 @@
import sqlite3
import os

DB_PATH = "/app/companies_v3_fixed_2.db"
# If running outside the container, adjust the path
if not os.path.exists(DB_PATH):
    DB_PATH = "companies_v3_fixed_2.db"

def upgrade():
    print(f"Upgrading database at {DB_PATH}...")
    conn = sqlite3.connect(DB_PATH)
    cursor = conn.cursor()

    # Add columns to the contacts table
    try:
        cursor.execute("ALTER TABLE contacts ADD COLUMN so_contact_id INTEGER")
        print("✅ Added column: so_contact_id")
    except sqlite3.OperationalError as e:
        if "duplicate column" in str(e):
            print("  Column so_contact_id already exists")
        else:
            print(f"❌ Error adding so_contact_id: {e}")

    try:
        cursor.execute("ALTER TABLE contacts ADD COLUMN so_person_id INTEGER")
        print("✅ Added column: so_person_id")
    except sqlite3.OperationalError as e:
        if "duplicate column" in str(e):
            print("  Column so_person_id already exists")
        else:
            print(f"❌ Error adding so_person_id: {e}")

    conn.commit()
    conn.close()
    print("Upgrade complete.")

if __name__ == "__main__":
    upgrade()

View File

@@ -0,0 +1,25 @@
FROM python:3.11-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy source code
COPY . .
# Expose port for Webhook
EXPOSE 8000
# Make sure scripts are executable
RUN chmod +x start.sh
# Start both worker and webhook
CMD ["./start.sh"]

View File

@@ -1,178 +1,90 @@
# SuperOffice Connector ("The Butler") - GTM Engine
# SuperOffice Connector ("The Muscle") - GTM Engine
This is the microservice that connects **SuperOffice CRM** to the **GTM Engine**.
The connector acts as a "butler": it prepares hyper-personalized marketing texts in the background and stores them in the CRM so that sales can send them with minimal effort.
This is the "dumb" microservice that connects **SuperOffice CRM** to the **Company Explorer Intelligence**.
The connector acts as a pure messenger ("muscle"): it receives webhook events, asks the "brain" (Company Explorer) for instructions, and executes them in the CRM.
## 1. Architecture: The "Butler Model"
## 1. Architecture: "The Intelligent Hub & The Loyal Messenger"
We decided against complex automation *inside* SuperOffice (via CRMScript triggers), because the DEV environment is limited and the logic is more flexible when it lives outside the CRM.
We opted for an **event-driven architecture** to guarantee scalability and real-time processing.
**The data flow:**
1. **Master data (Notion):** Pains & gains are maintained per vertical in Notion.
2. **Text matrix (local):** The script `build_matrix.py` "multiplies" the vertical data with the role data, generates the final texts via Gemini, and stores them in a local SQLite DB (`marketing_matrix.db`).
3. **Polling & injection (daemon):** The script `polling_daemon_final.py` runs periodically, checks for contact changes in SuperOffice, and "injects" the matching texts from the local matrix DB into the UDF fields.
1. **Trigger:** A user changes a contact in SuperOffice (e.g. status -> `Init`).
2. **Transport:** SuperOffice sends a `POST` event to our webhook endpoint (`:8003/webhook`).
3. **Queueing:** The `Webhook Receiver` validates the event and immediately stores it in a local `SQLite` queue (`connector_queue.db`).
4. **Processing:** A separate `Worker` process picks up the job.
5. **Provisioning:** The worker asks the **Company Explorer** (`POST /api/provision/superoffice-contact`): "What should I do with person ID 123?".
6. **Write-back:** The Company Explorer returns the finished text package (subject, intro, proof). The worker writes it into the SuperOffice UDF fields via the REST API.
```mermaid
graph TD
    subgraph "A: Strategy & Content"
        NotionDB["🏢 Notion DB<br/>(Industries & Roles)"]
    end
    subgraph "B: Local Engine (Python)"
        MatrixBuilder["⚙️ build_matrix.py"]
        GeminiAPI["🧠 Gemini API"]
        MatrixDB[("💾 marketing_matrix.db")]
        PollingDaemon["🤖 polling_daemon_final.py"]
    end
    subgraph "C: CRM System"
        SuperOfficeAPI["☁️ SuperOffice API"]
        Contact["👤 Contact / Person<br/>(with UDFs)"]
    end
    NotionDB -->|1. Reads Pains/Gains| MatrixBuilder
    MatrixBuilder -->|2. Prompts| GeminiAPI
    GeminiAPI -->|3. Generates texts| MatrixBuilder
    MatrixBuilder -->|4. Stores in DB| MatrixDB
    PollingDaemon -->|5. Checks for changes| SuperOfficeAPI
    PollingDaemon -->|6. Fetches matching texts| MatrixDB
    PollingDaemon -->|7. Writes to UDFs| SuperOfficeAPI
    SuperOfficeAPI -->|8. Updates| Contact
```

## 2. Core Components

* **`webhook_app.py` (FastAPI):**
    * Listens on port `8000` (external: `8003`).
    * Receives events and validates the token (`WEBHOOK_SECRET`).
    * Writes jobs to the queue.
    * Endpoint: `POST /webhook`.
* **`queue_manager.py` (SQLite):**
    * Manages the local job queue.
    * Status transitions: `PENDING` -> `PROCESSING` -> `COMPLETED` / `FAILED`.
    * Persists jobs across container restarts.

## 2. Core Components (Scripts)
* **`worker.py`:**
    * Runs as a background process.
    * Polls the queue every 5 seconds.
    * Talks to the Company Explorer (internal: `http://company-explorer:8000`) and the SuperOffice API.
    * Handles errors and retries.
* **`superoffice_client.py`:**
    * The centerpiece: a Python class that encapsulates all communication with the SuperOffice API (auth, GET, PUT, search).
    * Handles pagination, so even large data sets can be processed.
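The pagination handling can be sketched as a generator. The `fetch_page(skip, top)` signature below is an assumption standing in for the real client call, not the actual SuperOffice API:

```python
from typing import Callable, Dict, Iterator, List

def iter_all(fetch_page: Callable[[int, int], List[Dict]],
             page_size: int = 50) -> Iterator[Dict]:
    """Yield every record by paging with $top/$skip-style parameters,
    stopping as soon as a page comes back short."""
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        yield from page
        if len(page) < page_size:
            break  # last page reached
        skip += page_size
```

Callers can then stream over thousands of contacts without ever holding the full result set in memory.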
* **`build_matrix.py`:**
    * The "content generator".
    * Reads pains/gains from Notion.
    * Iterates over the defined verticals and roles.
    * Uses Gemini to create the specific text modules (`Subject`, `Intro`, `SocialProof`).
    * Stores the result in `marketing_matrix.db`.
* **`polling_daemon_final.py`:**
    * The automation engine.
    * Runs in a loop (e.g. every 15 minutes), during business hours only (Mon-Fri, 07:00-18:00).
    * Asks SuperOffice for recently changed contacts.
    * Uses a **hash system** to check whether a *relevant* property (vertical or role) has changed, avoiding unnecessary API writes.
    * Triggers the update process when a change is detected.
* **`normalize_persona.py`:**
    * A helper script (classifier) that maps job titles onto our four archetypes (operational, infrastructure, commercial, innovation).
* **`superoffice_client.py`:**
    * Encapsulates the SuperOffice REST API (auth, GET, PUT).
    * Manages refresh tokens.
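The persistent job queue that `queue_manager.py` provides can be sketched with plain `sqlite3`. The schema and function names below are assumptions for illustration, not the actual module:

```python
import json
import sqlite3

def init_queue(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'PENDING')"
    )

def enqueue(conn: sqlite3.Connection, payload: dict) -> int:
    """Persist a webhook event; survives container restarts because it is on disk."""
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)",
                       (json.dumps(payload),))
    conn.commit()
    return cur.lastrowid

def fetch_next(conn: sqlite3.Connection):
    """Claim the oldest PENDING job (PENDING -> PROCESSING), or None."""
    row = conn.execute(
        "SELECT id, payload FROM jobs WHERE status = 'PENDING' "
        "ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE jobs SET status = 'PROCESSING' WHERE id = ?", (row[0],))
    conn.commit()
    return row[0], json.loads(row[1])

def mark(conn: sqlite3.Connection, job_id: int, status: str) -> None:
    """Finish a job: COMPLETED or FAILED."""
    conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
    conn.commit()
```

With a single worker process this select-then-update claim is sufficient; multiple workers would need an atomic `UPDATE ... RETURNING` instead.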
## 3. Setup & Configuration
### Installation
```bash
# In a Python 3.11+ environment:
pip install -r requirements.txt
```
### Docker Service
The service runs in the `connector-superoffice` container.
`start.sh` starts both the web server and the worker.
### Configuration (`.env`)
The connector requires an `.env` file in the root directory with the following content:
```ini
# Gemini API keys (DEV for tests, PROD for live tools)
GEMINI_API_KEY_DEV="AIza..."
GEMINI_API_KEY_PROD="AIza..."
GEMINI_API_KEY=${GEMINI_API_KEY_DEV} # sets the default

# Notion
NOTION_API_KEY="secret_..."

# SuperOffice client configuration
SO_CLIENT_ID="<Client ID / App ID>"
SO_CLIENT_SECRET="<Client Secret>"
SO_CONTEXT_IDENTIFIER="CustXXXXX"
SO_REFRESH_TOKEN="<Refresh Token>"
SO_ENVIRONMENT="sod" # or "online" for production
```
The connector requires the following variables (set in `docker-compose.yml`):
```yaml
environment:
  API_USER: "admin"
  API_PASSWORD: "..."
  COMPANY_EXPLORER_URL: "http://company-explorer:8000" # internal Docker address
  WEBHOOK_SECRET: "changeme" # must match the SO webhook config
  # Plus the SuperOffice credentials (client ID, secret, refresh token)
```
## 4. SuperOffice Configuration (Admin)
## 4. API Interface (Internal)
The following **User Defined Fields (UDFs)** must be created. The ProgIds are environment-specific and must be verified per system (DEV/PROD).
The connector calls the Company Explorer and supplies **live data** from the CRM for the "Double Truth" comparison:
| Object | Field Name (Label) | Type | ProgId (DEV) | ProgId (PROD) |
| :--- | :--- | :--- | :--- | :--- |
| Company | Contact Vertical | List | `SuperOffice:5` | *tbd* |
| Company | AI Challenge Satz | Text | `SuperOffice:6` | *tbd* |
| Person | Person Role | List | `SuperOffice:3` | *tbd* |
| Person | Marketing Subject | Text | `SuperOffice:5` | *tbd* |
| Person | Marketing Intro | Text | `SuperOffice:6` | *tbd* |
| Person | Marketing Proof | Text | `SuperOffice:7` | *tbd* |
| Person | AI Copy Hash | Text | `SuperOffice:8` | *tbd* |
**Request:** `POST /api/provision/superoffice-contact`
```json
{
"so_contact_id": 12345,
"so_person_id": 67890,
"crm_name": "RoboPlanet GmbH",
"crm_website": "www.roboplanet.de",
"job_title": "Geschäftsführer"
}
```
### B. List Values & IDs (DEV Environment)
**Response:**
```json
{
"status": "success",
"texts": {
"subject": "Optimierung Ihrer Logistik...",
"intro": "Als Logistikleiter kennen Sie...",
"social_proof": "Wir helfen bereits Firma X..."
}
}
```
The following IDs are used as references in the scripts. They must be re-validated for PROD.
## 5. Open To-Dos (Roadmap)
**MA Status (`x_ma_status`):**
* `Init`
* `Ready_to_Craft`
* `Ready_to_Send`
* `Sent_Week1`
* `...`
**Roles (`x_person_role` / `SuperOffice:3`):**
| ID | Name |
| :--- | :--- |
| `19` | Operativer Entscheider |
| `20` | Infrastruktur-Verantwortlicher |
| `21` | Wirtschaftlicher Entscheider |
| `22` | Innovations-Treiber |
**Verticals (`x_contact_vertical` / `SuperOffice:5`):**
| ID | Name |
| :--- | :--- |
| `23` | Logistics - Warehouse |
| `24` | Healthcare - Hospital |
| `25` | Infrastructure - Transport |
| `26` | Leisure - Indoor Active |
| `...` | *tbd* |
## 5. "Gotchas" & Lessons Learned (Update Feb 16, 2026)
* **API URL:** The `sod` tenant `Cust55774` is only reachable via `https://app-sod.superoffice.com`.
* **List IDs:** The API returns list-field IDs in the format `[I:26]`. The string must be parsed into the integer `26` before querying the DB.
* **Write-back (standard fields):**
    * **UrlAddress & Phones:** The simple root field `UrlAddress` is often read-only on `PUT`. To set the website or phone numbers, the corresponding list (`Urls` or `Phones`) must be sent as an array of objects (e.g. `{"Value": "...", "Description": "..."}`).
    * **Mandatory fields:** When updating a `Contact` object, mandatory fields such as `Name` and `Number2` (or `Number1`) must be included in the payload, otherwise server-side validation fails.
    * **Full-object PUT:** SuperOffice REST overwrites the entire object. Fields missing from the `PUT` payload are cleared in the CRM. It is advisable to `GET` the object first, apply the changes, and send the whole object back.
* **Dev-system limits:** Text fields in the DEV system are limited to 40 characters.
* **Y tables & triggers:** Direct access to extra tables and CRMScript triggers are blocked in the SOD DEV tenant.
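The full-object PUT gotcha implies a read-modify-write pattern. A minimal sketch, where the `get_contact` / `put_contact` method names are assumptions standing in for the real client methods:

```python
def safe_update(client, contact_id: int, changes: dict) -> dict:
    """Avoid clearing fields on SuperOffice's full-object PUT:
    GET the current object, merge in the changes, PUT the whole thing back."""
    current = client.get_contact(contact_id)  # full object as stored in the CRM
    merged = {**current, **changes}           # changes win; untouched fields survive
    # Mandatory fields (Name, Number1/Number2) stay present because they
    # were part of the GET response.
    client.put_contact(contact_id, merged)
    return merged
```

The merge keeps every field the GET returned, so server-side validation of mandatory fields passes and nothing gets blanked.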
## 6. Umbau-Plan für Skalierbarkeit (19.02.2026)
Die aktuelle Implementierung ist transaktionsbasiert und nicht für die Massenverarbeitung ausgelegt. Der folgende Plan beschreibt den notwendigen Umbau, um die Schnittstelle performant und robust für den Batch-Betrieb zu machen.
### 6.1. Kernproblem: Transaktion vs. Batch
* **IST-Zustand:** Ein geänderter Kontakt löst eine Kette von individuellen API-Calls aus (Lesen, Hashen, Schreiben).
* **SOLL-Zustand:** Ein periodischer Job sammelt alle Änderungen und verarbeitet sie in einem einzigen, optimierten Durchlauf.
### 6.2. Maßnahmen & Technische Umsetzung
### 6.2. Measures & Technical Implementation
1. **Batch-capable client methods in `superoffice_client.py`:**
    * The SuperOffice API supports batch operations via the `/$batch` endpoint.
    * **Task:** Implement a new `execute_batch(requests)` method in the client. It takes a list of individual API requests (e.g. several `PUT` operations), bundles them into a single `POST` request to the `/$batch` endpoint, and processes the combined response.
2. **Introduce a change-tracking mechanism:**
    * **Problem:** We need an efficient way to detect which contacts must be updated since the last run.
    * **Task:** Extend the local Company Explorer database with a `needs_sync` (boolean) or `last_so_sync` (datetime) field. Whenever a change happens in the CE, the flag is set.
    * The `polling_daemon` is reworked: instead of comparing individual contacts, it collects all contacts where `needs_sync` is `true`.
3. **New workflow in `polling_daemon_final.py`:**
    * **Phase 1 – Collect:** The daemon queries the local DB: `SELECT * FROM contacts WHERE needs_sync = true`.
    * **Phase 2 – Build batch request:** For each contact in the list, the required `PUT` request for the SuperOffice API is prepared (but not yet sent).
    * **Phase 3 – Execute:** All prepared requests are handed to the new `execute_batch()` client method.
    * **Phase 4 – Result processing & error handling:** The batch response from SuperOffice is analyzed.
        * For every successfully processed contact, `needs_sync` is set to `false` in the local DB.
        * Failed updates are logged in detail (incl. contact ID and error message) without aborting the whole batch.
4. **Define a conflict-resolution strategy:**
    * **Rule:** "SuperOffice wins."
    * **Implementation:** The polling daemon additionally runs a regular (e.g. every 6 hours) hash comparison across a larger set of contacts. If it detects a discrepancy (because a contact was changed manually in SO), the change from SuperOffice is copied into the Company Explorer and the `needs_sync` flag for that contact is *not* set. This prevents manual changes from being overwritten.

This refactoring is a prerequisite for production use and ensures that the integration remains stable and performant even with thousands of contacts.
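The collect/build/execute/mark phases above can be sketched as follows (a minimal sketch: the request shape, `BATCH_SIZE`, and the not-yet-written `execute_batch` are assumptions — the real wire format must follow SuperOffice's `/$batch` conventions):

```python
BATCH_SIZE = 50  # assumption: tune to actual API limits

def build_put_requests(contacts):
    """Phase 2: prepare (but do not send) one PUT per changed contact.

    `contacts` are rows collected in Phase 1 (WHERE needs_sync = true),
    each carrying its SuperOffice id and the prepared full-object payload.
    """
    return [
        {"method": "PUT",
         "path": f"Contact/{c['so_contact_id']}",
         "body": c["payload"]}
        for c in contacts
    ]

def chunked(requests, size=BATCH_SIZE):
    """Split into $batch-sized chunks so one huge batch cannot time out."""
    return [requests[i:i + size] for i in range(0, len(requests), size)]

# Phase 3/4 (network elided): for each chunk, hand it to the planned
# client.execute_batch(chunk); on per-item success clear needs_sync in the
# local DB, on failure log contact id + error and continue with the rest.
```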
### 6.3. Open Points
* [ ] **UDF mapping:** The `ProgId`s (e.g. `SuperOffice:5`) are currently hard-coded in `worker.py`. This must be moved into a config file.
* [ ] **Error handling:** What happens when the Company Explorer returns "404 Not Found"? (Current behavior: log a warning & skip.)
* [ ] **Redis:** Under very high load (>100 events/second), the SQLite queue should be replaced by Redis.
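For the first open point, moving the hard-coded `ProgId`s out of `worker.py` could look like this (a sketch; the `UDF_MAPPING_JSON` variable name and override mechanism are assumptions):

```python
import json
import os

# Defaults mirror the DEV mapping currently hard-coded in worker.py.
_DEFAULT_UDF_MAPPING = {
    "subject": "SuperOffice:5",
    "intro": "SuperOffice:6",
    "social_proof": "SuperOffice:7",
}

def load_udf_mapping() -> dict:
    """Read the field mapping from the environment instead of code.

    Setting e.g. UDF_MAPPING_JSON='{"subject": "SuperOffice:12"}' in .env
    overrides individual ProgIds without touching worker.py.
    """
    raw = os.getenv("UDF_MAPPING_JSON")
    if not raw:
        return dict(_DEFAULT_UDF_MAPPING)
    return {**_DEFAULT_UDF_MAPPING, **json.loads(raw)}
```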

### queue_manager.py
import sqlite3
import json
from datetime import datetime, timedelta
import os
DB_PATH = os.getenv("DB_PATH", "connector_queue.db")
class JobQueue:
def __init__(self):
self._init_db()
def _init_db(self):
with sqlite3.connect(DB_PATH) as conn:
conn.execute("""
CREATE TABLE IF NOT EXISTS jobs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
event_type TEXT,
payload TEXT,
status TEXT DEFAULT 'PENDING',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
error_msg TEXT,
next_try_at TIMESTAMP
)
""")
# Migration for existing DBs
try:
conn.execute("ALTER TABLE jobs ADD COLUMN next_try_at TIMESTAMP")
except sqlite3.OperationalError:
pass
def add_job(self, event_type: str, payload: dict):
with sqlite3.connect(DB_PATH) as conn:
conn.execute(
"INSERT INTO jobs (event_type, payload, status) VALUES (?, ?, ?)",
(event_type, json.dumps(payload), 'PENDING')
)
def get_next_job(self):
"""
Atomically fetches the next pending job where next_try_at is reached.
"""
job = None
with sqlite3.connect(DB_PATH) as conn:
conn.row_factory = sqlite3.Row
cursor = conn.cursor()
# Lock the job
cursor.execute("BEGIN EXCLUSIVE")
try:
cursor.execute("""
SELECT id, event_type, payload, created_at
FROM jobs
WHERE status = 'PENDING'
AND (next_try_at IS NULL OR next_try_at <= datetime('now'))
ORDER BY created_at ASC
LIMIT 1
""")
row = cursor.fetchone()
if row:
job = dict(row)
# Mark as processing
cursor.execute(
"UPDATE jobs SET status = 'PROCESSING', updated_at = datetime('now') WHERE id = ?",
(job['id'],)
)
conn.commit()
else:
conn.rollback() # No job found
except Exception:
conn.rollback()
raise
if job:
job['payload'] = json.loads(job['payload'])
return job
    def retry_job_later(self, job_id, delay_seconds=60):
        # Store as 'YYYY-MM-DD HH:MM:SS' so the value compares cleanly with
        # sqlite's datetime('now') in get_next_job (and avoids the deprecated
        # implicit datetime adapter).
        next_try = (datetime.utcnow() + timedelta(seconds=delay_seconds)).strftime("%Y-%m-%d %H:%M:%S")
        with sqlite3.connect(DB_PATH) as conn:
            conn.execute(
                "UPDATE jobs SET status = 'PENDING', next_try_at = ?, updated_at = datetime('now') WHERE id = ?",
                (next_try, job_id)
            )
def complete_job(self, job_id):
with sqlite3.connect(DB_PATH) as conn:
conn.execute(
"UPDATE jobs SET status = 'COMPLETED', updated_at = datetime('now') WHERE id = ?",
(job_id,)
)
def fail_job(self, job_id, error_msg):
with sqlite3.connect(DB_PATH) as conn:
conn.execute(
"UPDATE jobs SET status = 'FAILED', error_msg = ?, updated_at = datetime('now') WHERE id = ?",
(str(error_msg), job_id)
)
def get_stats(self):
with sqlite3.connect(DB_PATH) as conn:
cursor = conn.cursor()
cursor.execute("SELECT status, COUNT(*) FROM jobs GROUP BY status")
return dict(cursor.fetchall())

### Webhook registration script
import sys
import os
from superoffice_client import SuperOfficeClient
# Configuration
WEBHOOK_NAME = "Gemini Connector Hook"
TARGET_URL = "https://floke-ai.duckdns.org/connector/webhook?token=changeme"  # token must match WEBHOOK_SECRET in .env
EVENTS = [
"contact.created",
"contact.changed",
"person.created",
"person.changed"
]
def register():
print("🚀 Initializing SuperOffice Client...")
try:
client = SuperOfficeClient()
except Exception as e:
print(f"❌ Failed to connect: {e}")
return
print("🔎 Checking existing webhooks...")
webhooks = client._get("Webhook")
if webhooks and 'value' in webhooks:
for wh in webhooks['value']:
if wh['Name'] == WEBHOOK_NAME:
print(f"⚠️ Webhook '{WEBHOOK_NAME}' already exists (ID: {wh['WebhookId']}).")
# Check if URL matches
if wh['TargetUrl'] != TARGET_URL:
print(f" ⚠️ URL Mismatch! Deleting old webhook...")
# Warning: _delete is not implemented in generic client yet, skipping auto-fix
print(" Please delete it manually via API or extend client.")
return
print(f"✨ Registering new webhook: {WEBHOOK_NAME}")
payload = {
"Name": WEBHOOK_NAME,
"Events": EVENTS,
"TargetUrl": TARGET_URL,
"Secret": "changeme", # Used for signature calculation by SO
"State": "Active",
"Type": "Webhook"
}
try:
# Note: _post is defined in your client, returns JSON
res = client._post("Webhook", payload)
if res and "WebhookId" in res:
print(f"✅ SUCCESS! Webhook created with ID: {res['WebhookId']}")
print(f" Target: {res['TargetUrl']}")
else:
print(f"❌ Creation failed. Response: {res}")
except Exception as e:
print(f"❌ Error during registration: {e}")
if __name__ == "__main__":
register()

### requirements.txt (excerpt)
cryptography
pyjwt
xmltodict
holidays
fastapi
uvicorn
schedule
sqlalchemy

### Container start script
#!/bin/bash
# Start Worker in background
python worker.py &
# Start Webhook Server in foreground
uvicorn webhook_app:app --host 0.0.0.0 --port 8000

### superoffice_client.py (excerpt)
class SuperOfficeClient:
"""A client for interacting with the SuperOffice REST API."""
def __init__(self):
# Helper to strip quotes if Docker passed them literally
def get_clean_env(key, default=None):
val = os.getenv(key)
if val and val.strip(): # Check if not empty string
return val.strip('"').strip("'")
return default
self.client_id = get_clean_env("SO_CLIENT_ID") or get_clean_env("SO_SOD")
self.client_secret = get_clean_env("SO_CLIENT_SECRET")
self.refresh_token = get_clean_env("SO_REFRESH_TOKEN")
self.redirect_uri = get_clean_env("SO_REDIRECT_URI", "http://localhost")
self.env = get_clean_env("SO_ENVIRONMENT", "sod")
self.cust_id = get_clean_env("SO_CONTEXT_IDENTIFIER", "Cust55774") # Fallback for your dev
if not all([self.client_id, self.client_secret, self.refresh_token]):
raise ValueError("SuperOffice credentials missing in .env file.")
    # ... (unchanged methods omitted) ...
def _refresh_access_token(self):
"""Refreshes and returns a new access token."""
url = f"https://{self.env}.superoffice.com/login/common/oauth/tokens"
print(f"DEBUG: Refresh URL: '{url}' (Env: '{self.env}')") # DEBUG
data = {
"grant_type": "refresh_token",
"client_id": self.client_id,

### webhook_app.py
from fastapi import FastAPI, Request, HTTPException, BackgroundTasks
import logging
import os
import json
from queue_manager import JobQueue
# Logging Setup
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("connector-webhook")
app = FastAPI(title="SuperOffice Connector Webhook", version="2.0")
queue = JobQueue()
WEBHOOK_SECRET = os.getenv("WEBHOOK_SECRET", "changeme")
@app.post("/webhook")
async def receive_webhook(request: Request, background_tasks: BackgroundTasks):
"""
Endpoint for SuperOffice Webhooks.
"""
    # 1. Verify secret (basic security).
    # SuperOffice can sign payloads with the registered Secret (signature
    # header); for now we use a simpler shared token passed as a query
    # parameter: /webhook?token=...
token = request.query_params.get("token")
if token != WEBHOOK_SECRET:
logger.warning(f"Invalid webhook token attempt: {token}")
raise HTTPException(403, "Invalid Token")
try:
payload = await request.json()
logger.info(f"Received webhook payload: {payload}")
event_type = payload.get("Event", "unknown")
# Add to local Queue
queue.add_job(event_type, payload)
return {"status": "queued"}
except Exception as e:
logger.error(f"Error processing webhook: {e}", exc_info=True)
raise HTTPException(500, "Internal Server Error")
@app.get("/health")
def health():
return {"status": "ok"}
@app.get("/stats")
def stats():
return queue.get_stats()
if __name__ == "__main__":
import uvicorn
uvicorn.run("webhook_app:app", host="0.0.0.0", port=8000, reload=True)
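The token-in-URL check above is a stopgap. Since the registration script configures a `Secret`, the payload could instead be verified with an HMAC (a sketch; the exact header name and signature scheme SuperOffice uses must be confirmed against its webhook documentation):

```python
import hashlib
import hmac

def verify_signature(body: bytes, received_sig: str, secret: str) -> bool:
    """Constant-time check of an HMAC-SHA256 signature over the raw body."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received_sig)

# In the endpoint (header name is an assumption):
#   body = await request.body()
#   sig = request.headers.get("X-SuperOffice-Signature", "")
#   if not verify_signature(body, sig, WEBHOOK_SECRET):
#       raise HTTPException(403, "Invalid Signature")
```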

### worker.py
import time
import logging
import os
import requests
import json
from queue_manager import JobQueue
from superoffice_client import SuperOfficeClient
# Setup Logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger("connector-worker")
# Config
COMPANY_EXPLORER_URL = os.getenv("COMPANY_EXPLORER_URL", "http://company-explorer:8000")
POLL_INTERVAL = 5 # Seconds
# UDF Mapping (DEV) - Should be moved to config later
UDF_MAPPING = {
"subject": "SuperOffice:5",
"intro": "SuperOffice:6",
"social_proof": "SuperOffice:7"
}
def process_job(job, so_client: SuperOfficeClient):
"""
Core logic for processing a single job.
"""
logger.info(f"Processing Job {job['id']} ({job['event_type']})")
payload = job['payload']
event_low = job['event_type'].lower()
# 1. Extract IDs from Webhook Payload
person_id = None
contact_id = None
if "PersonId" in payload:
person_id = int(payload["PersonId"])
elif "PrimaryKey" in payload and "person" in event_low:
person_id = int(payload["PrimaryKey"])
if "ContactId" in payload:
contact_id = int(payload["ContactId"])
elif "PrimaryKey" in payload and "contact" in event_low:
contact_id = int(payload["PrimaryKey"])
# Fallback/Deep Lookup
if not contact_id and person_id:
person_data = so_client.get_person(person_id)
if person_data and "Contact" in person_data:
contact_id = person_data["Contact"].get("ContactId")
if not contact_id:
raise ValueError(f"Could not identify ContactId in payload: {payload}")
logger.info(f"Target: Person {person_id}, Contact {contact_id}")
# --- Cascading Logic ---
# If a company changes, we want to update all its persons eventually.
# We do this by adding "person.changed" jobs for each person to the queue.
if "contact" in event_low and not person_id:
logger.info(f"Company event detected. Triggering cascade for all persons of Contact {contact_id}.")
try:
persons = so_client.search(f"Person?$filter=contact/contactId eq {contact_id}")
if persons:
q = JobQueue()
for p in persons:
p_id = p.get("PersonId")
if p_id:
logger.info(f"Cascading: Enqueueing job for Person {p_id}")
q.add_job("person.changed", {"PersonId": p_id, "ContactId": contact_id, "Source": "Cascade"})
except Exception as e:
logger.warning(f"Failed to cascade to persons for contact {contact_id}: {e}")
# 1b. Fetch full contact details for 'Double Truth' check (Master Data Sync)
crm_name = None
crm_website = None
try:
contact_details = so_client.get_contact(contact_id)
if contact_details:
crm_name = contact_details.get("Name")
crm_website = contact_details.get("UrlAddress")
if not crm_website and "Urls" in contact_details and contact_details["Urls"]:
crm_website = contact_details["Urls"][0].get("Value")
except Exception as e:
logger.warning(f"Failed to fetch contact details for {contact_id}: {e}")
# 2. Call Company Explorer Provisioning API
ce_url = f"{COMPANY_EXPLORER_URL}/api/provision/superoffice-contact"
ce_req = {
"so_contact_id": contact_id,
"so_person_id": person_id,
"job_title": payload.get("JobTitle"),
"crm_name": crm_name,
"crm_website": crm_website
}
ce_auth = (os.getenv("API_USER", "admin"), os.getenv("API_PASSWORD", "gemini"))
try:
resp = requests.post(ce_url, json=ce_req, auth=ce_auth)
if resp.status_code == 404:
logger.warning(f"Company Explorer returned 404. Retrying later.")
return "RETRY"
resp.raise_for_status()
provisioning_data = resp.json()
if provisioning_data.get("status") == "processing":
logger.info(f"Company Explorer is processing {provisioning_data.get('company_name', 'Unknown')}. Re-queueing job.")
return "RETRY"
except requests.exceptions.RequestException as e:
raise Exception(f"Company Explorer API failed: {e}")
logger.info(f"CE Response for Contact {contact_id}: {json.dumps(provisioning_data)}") # DEBUG
# 2b. Sync Vertical to SuperOffice (Company Level)
vertical_name = provisioning_data.get("vertical_name")
if vertical_name:
# Mappings from README
VERTICAL_MAP = {
"Logistics - Warehouse": 23,
"Healthcare - Hospital": 24,
"Infrastructure - Transport": 25,
"Leisure - Indoor Active": 26
}
vertical_id = VERTICAL_MAP.get(vertical_name)
if vertical_id:
logger.info(f"Identified Vertical '{vertical_name}' -> ID {vertical_id}")
try:
# Check current value to avoid loops
current_contact = so_client.get_contact(contact_id)
current_udfs = current_contact.get("UserDefinedFields", {})
current_val = current_udfs.get("SuperOffice:5", "")
# Normalize SO list ID format (e.g., "[I:26]" -> "26")
if current_val and current_val.startswith("[I:"):
current_val = current_val.split(":")[1].strip("]")
if str(current_val) != str(vertical_id):
logger.info(f"Updating Contact {contact_id} Vertical: {current_val} -> {vertical_id}")
so_client.update_entity_udfs(contact_id, "Contact", {"SuperOffice:5": str(vertical_id)})
else:
logger.info(f"Vertical for Contact {contact_id} already in sync ({vertical_id}).")
except Exception as e:
logger.error(f"Failed to sync vertical for Contact {contact_id}: {e}")
else:
logger.warning(f"Vertical '{vertical_name}' not found in internal mapping.")
# 2c. Sync Website (Company Level)
# TEMPORARILY DISABLED TO PREVENT LOOP (SO API Read-after-Write latency or field mapping issue)
"""
website = provisioning_data.get("website")
if website and website != "k.A.":
try:
# Re-fetch contact to ensure we work on latest version (Optimistic Concurrency)
contact_data = so_client.get_contact(contact_id)
current_url = contact_data.get("UrlAddress", "")
# Normalize for comparison
def norm(u): return str(u).lower().replace("https://", "").replace("http://", "").strip("/") if u else ""
if norm(current_url) != norm(website):
logger.info(f"Updating Website for Contact {contact_id}: {current_url} -> {website}")
# Update Urls collection (Rank 1)
new_urls = []
if "Urls" in contact_data:
found = False
for u in contact_data["Urls"]:
if u.get("Rank") == 1:
u["Value"] = website
found = True
new_urls.append(u)
if not found:
new_urls.append({"Value": website, "Rank": 1, "Description": "Website"})
contact_data["Urls"] = new_urls
else:
contact_data["Urls"] = [{"Value": website, "Rank": 1, "Description": "Website"}]
# Also set main field if empty
if not current_url:
contact_data["UrlAddress"] = website
# Write back full object
so_client._put(f"Contact/{contact_id}", contact_data)
else:
logger.info(f"Website for Contact {contact_id} already in sync.")
except Exception as e:
logger.error(f"Failed to sync website for Contact {contact_id}: {e}")
"""
# 3. Update SuperOffice (Only if person_id is present)
if not person_id:
logger.info("Sync complete (Company only). No texts to write back.")
return "SUCCESS"
texts = provisioning_data.get("texts", {})
if not any(texts.values()):
logger.info("No texts returned from Matrix (yet). Skipping write-back.")
return "SUCCESS"
udf_update = {}
if texts.get("subject"): udf_update[UDF_MAPPING["subject"]] = texts["subject"]
if texts.get("intro"): udf_update[UDF_MAPPING["intro"]] = texts["intro"]
if texts.get("social_proof"): udf_update[UDF_MAPPING["social_proof"]] = texts["social_proof"]
if udf_update:
# Loop Prevention
try:
current_person = so_client.get_person(person_id)
current_udfs = current_person.get("UserDefinedFields", {})
needs_update = False
for key, new_val in udf_update.items():
if current_udfs.get(key, "") != new_val:
needs_update = True
break
if needs_update:
logger.info(f"Applying update to Person {person_id} (Changes detected).")
success = so_client.update_entity_udfs(person_id, "Person", udf_update)
if not success:
raise Exception("Failed to update SuperOffice UDFs")
else:
logger.info(f"Skipping update for Person {person_id}: Values match (Loop Prevention).")
except Exception as e:
logger.error(f"Error during pre-update check: {e}")
raise
logger.info("Job successfully processed.")
return "SUCCESS"
def run_worker():
queue = JobQueue()
# Initialize SO Client with retry
so_client = None
while not so_client:
try:
so_client = SuperOfficeClient()
except Exception as e:
logger.critical(f"Failed to initialize SuperOffice Client: {e}. Retrying in 30s...")
time.sleep(30)
logger.info("Worker started. Polling queue...")
while True:
try:
job = queue.get_next_job()
if job:
try:
result = process_job(job, so_client)
if result == "RETRY":
queue.retry_job_later(job['id'], delay_seconds=120)
else:
queue.complete_job(job['id'])
except Exception as e:
logger.error(f"Job {job['id']} failed: {e}", exc_info=True)
queue.fail_job(job['id'], str(e))
else:
time.sleep(POLL_INTERVAL)
except Exception as e:
logger.error(f"Worker loop error: {e}")
time.sleep(POLL_INTERVAL)
if __name__ == "__main__":
run_worker()

### docker-compose.yml (excerpt)
GEMINI_API_KEY_FILE: "/app/gemini_api_key.txt"
# Port 8000 is internal only
connector-superoffice:
build:
context: ./connector-superoffice
dockerfile: Dockerfile
container_name: connector-superoffice
restart: unless-stopped
ports:
- "8003:8000" # Expose internal 8000 to host 8003 (8002 was taken)
volumes:
- ./connector-superoffice:/app
- ./gemini_api_key.txt:/app/gemini_api_key.txt
- ./connector_queue.db:/app/connector_queue.db
environment:
PYTHONUNBUFFERED: "1"
API_USER: "admin"
API_PASSWORD: "gemini"
DB_PATH: "/app/connector_queue.db"
COMPANY_EXPLORER_URL: "http://company-explorer:8000"
# Pass through SO credentials from host .env
SO_CLIENT_ID: "${SO_CLIENT_ID}"
SO_CLIENT_SECRET: "${SO_CLIENT_SECRET}"
SO_REFRESH_TOKEN: "${SO_REFRESH_TOKEN}"
SO_ENVIRONMENT: "${SO_ENVIRONMENT}"
SO_CONTEXT_IDENTIFIER: "${SO_CONTEXT_IDENTIFIER}"
# --- INFRASTRUCTURE SERVICES ---
duckdns:
image: lscr.io/linuxserver/duckdns:latest

### Nginx config (excerpt)
location /ce/ {
# Company Explorer (Robotics Edition)
# Trailing Slash STRIPS the /ce/ prefix!
proxy_pass http://company-explorer:8000/;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Upgrade $http_upgrade;
    # ... (unchanged locations omitted) ...
# Explicit timeouts
proxy_read_timeout 86400; # Long timeout for stream
}
location /connector/ {
# SuperOffice Connector Webhook
# Disable Basic Auth for Webhooks as SO cannot provide it easily
auth_basic off;
# Forward to FastAPI app
# Trailing Slash STRIPS the /connector/ prefix!
proxy_pass http://connector-superoffice:8000/;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
}
}

### seed_test_data.py
import sys
import os
import requests
import json
from sqlalchemy import create_engine, select
from sqlalchemy.orm import sessionmaker
# Add paths to use backend models directly for complex seeding (Matrix/Person)
sys.path.append(os.path.join(os.getcwd(), "company-explorer"))
from backend.database import Base, Company, Contact, Industry, JobRoleMapping, MarketingMatrix
# Database Connection (Direct SQL access is easier for seeding specific IDs)
DB_PATH = "sqlite:///companies_v3_fixed_2.db" # Local relative path
engine = create_engine(DB_PATH)
Session = sessionmaker(bind=engine)
session = Session()
def seed():
print("--- Company Explorer Test Data Seeder ---")
print("This script prepares the database for the SuperOffice Connector End-to-End Test.")
# 1. User Input
so_contact_id = input("Enter SuperOffice Contact ID (Company) [e.g. 123]: ").strip()
so_person_id = input("Enter SuperOffice Person ID [e.g. 456]: ").strip()
company_name = input("Enter Company Name [e.g. Test GmbH]: ").strip() or "Test GmbH"
person_role = "Geschäftsführer" # Fixed for test simplicity
industry_name = "Logistik" # Fixed for test simplicity
if not so_contact_id or not so_person_id:
print("Error: IDs are required!")
return
print(f"\nSeeding for Company '{company_name}' (ID: {so_contact_id}) and Person (ID: {so_person_id})...")
# 2. Check/Create Industry
industry = session.query(Industry).filter_by(name=industry_name).first()
if not industry:
industry = Industry(name=industry_name, description="Test Industry")
session.add(industry)
session.commit()
print(f"✅ Created Industry '{industry_name}'")
else:
print(f" Industry '{industry_name}' exists")
# 3. Check/Create Job Role
role_map = session.query(JobRoleMapping).filter_by(role=person_role).first()
if not role_map:
role_map = JobRoleMapping(pattern=person_role, role=person_role) # Simple mapping
session.add(role_map)
session.commit()
print(f"✅ Created Role Mapping '{person_role}'")
else:
print(f" Role Mapping '{person_role}' exists")
# 4. Check/Create Company
company = session.query(Company).filter_by(crm_id=str(so_contact_id)).first()
if not company:
company = Company(
name=company_name,
crm_id=str(so_contact_id),
industry_ai=industry_name, # Link to our test industry
status="ENRICHED"
)
session.add(company)
session.commit()
print(f"✅ Created Company '{company_name}' with CRM-ID {so_contact_id}")
else:
company.industry_ai = industry_name # Ensure correct industry for test
session.commit()
print(f" Company '{company_name}' exists (Updated Industry)")
# 5. Check/Create Person
person = session.query(Contact).filter_by(so_person_id=int(so_person_id)).first()
if not person:
person = Contact(
company_id=company.id,
first_name="Max",
last_name="Mustermann",
so_person_id=int(so_person_id),
so_contact_id=int(so_contact_id),
role=person_role
)
session.add(person)
session.commit()
print(f"✅ Created Person with SO-ID {so_person_id}")
else:
person.role = person_role # Ensure role match
session.commit()
print(f" Person with SO-ID {so_person_id} exists (Updated Role)")
# 6. Check/Create Matrix Entry
matrix = session.query(MarketingMatrix).filter_by(industry_id=industry.id, role_id=role_map.id).first()
if not matrix:
matrix = MarketingMatrix(
industry_id=industry.id,
role_id=role_map.id,
subject="Test Betreff: Optimierung für {{company_name}}",
intro="Hallo, dies ist ein generierter Test-Text aus dem Company Explorer.",
social_proof="Wir arbeiten bereits erfolgreich mit anderen Logistikern zusammen."
)
session.add(matrix)
session.commit()
print(f"✅ Created Matrix Entry for {industry_name} x {person_role}")
else:
print(f" Matrix Entry exists")
print("\n🎉 Seeding Complete! The Company Explorer is ready.")
print(f"You can now trigger the Webhook for Contact {so_contact_id} / Person {so_person_id}.")
if __name__ == "__main__":
seed()