[30388f42] Infrastructure Hardening: Repaired CE/Connector DB schema, fixed frontend styling build, implemented robust echo shield in worker v2.1.1, and integrated Lead Engine into gateway.

2026-03-07 14:08:42 +00:00
parent efcaa57cf0
commit ae2303b733
404 changed files with 24100 additions and 13301 deletions


@@ -1,10 +1,31 @@
FROM python:3.9-slim
# --- STAGE 1: Builder ---
FROM python:3.11-slim AS builder
WORKDIR /app
# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Install python dependencies
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt
# --- STAGE 2: Runtime ---
FROM python:3.11-slim
WORKDIR /app
# Copy installed packages
COPY --from=builder /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH
# Copy app code
COPY . .
RUN pip install streamlit pandas
ENV PYTHONUNBUFFERED=1
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
EXPOSE 8501
EXPOSE 8004
# Start monitor in background and streamlit in foreground
CMD ["sh", "-c", "python monitor.py & streamlit run app.py --server.port=8501 --server.address=0.0.0.0 --server.baseUrlPath=/lead"]

lead-engine/README.md Normal file

@@ -0,0 +1,114 @@
# Lead Engine: Multi-Source Automation v1.3 [31988f42]
## 🚀 Overview
The **Lead Engine** is a dedicated module for the autonomous processing of B2B inquiries from multiple sources. It acts as a bridge between the e-mail inbox and the **Company Explorer**, generating highly personalized, human-expert-level reply drafts within minutes.
## 🛠 Core Features
### 1. Intelligent E-Mail Ingest
* **Multi-Source:** Watches the `info@robo-planet.de` mailbox via the **Microsoft Graph API** for several lead types.
* **Filtering & Routing:** Detects and distinguishes inquiries from **TradingTwins** and the **Roboplanet contact form**.
* **Parsing:** Dedicated HTML parsers extract structured data (company, contact, requirements, etc.) for each source.
### 2. Contact Research (LinkedIn Lookup)
* **Automation:** Searches for the contact person's professional position via **SerpAPI** and **Gemini 2.0 Flash**.
* **Result:** Identifies roles such as "CFO", "member of the clinic management", or "medical specialist" so the tone of the reply can be matched precisely.
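A rough sketch of how such a lookup can work, assuming the `SERP_API` key from the environment. The role pattern, query shape, and helper names are illustrative, not the module's actual code:

```python
import os
import re
from typing import Optional

# Illustrative set of titles worth extracting from a search snippet.
ROLE_PATTERN = re.compile(
    r"\b(CEO|CFO|CTO|Geschäftsführer|Facharzt|Klinikleitung|Leiter\w*)\b",
    re.IGNORECASE,
)

def extract_role(snippet: str) -> Optional[str]:
    """Pull a plausible job title out of a search-result snippet."""
    m = ROLE_PATTERN.search(snippet or "")
    return m.group(1) if m else None

def lookup_person_role(name: str, company: str) -> Optional[str]:
    """Query SerpAPI for '<name> <company> LinkedIn' and scan the result snippets."""
    import requests  # only needed for the live call
    res = requests.get(
        "https://serpapi.com/search",
        params={
            "engine": "google",
            "q": f"{name} {company} LinkedIn",
            "api_key": os.environ["SERP_API"],
        },
        timeout=10,
    )
    res.raise_for_status()
    for item in res.json().get("organic_results", []):
        role = extract_role(item.get("snippet", ""))
        if role:
            return role
    return None
```

In the real module, a Gemini call can replace or refine the regex step; the snippet scan above is just the cheap first pass.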
### 3. Company Explorer Sync & Monitoring
* **Integration:** Automatically creates accounts and contacts in the CE.
* **Monitor:** A background process (`monitor.py`) asynchronously tracks the status of the AI analysis in the CE.
* **Data Pull:** As soon as the analysis (industry, dossier) is finished, the data is copied into the local lead database.
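The monitor's poll-and-pull step can be sketched as follows. This is a minimal sketch: `fetch` and `on_ready` stand in for the real CE connector call and the local DB write, and the readiness fields (`industry_ai`, `research_dossier`) are assumptions based on the UI described below:

```python
import time

def analysis_ready(details: dict) -> bool:
    """CE analysis counts as finished once an industry and a dossier are present."""
    return bool(details.get("industry_ai")) and bool(details.get("research_dossier"))

def watch_company(ce_id: int, fetch, on_ready, interval: int = 30, max_polls: int = 20):
    """Poll the CE until the AI analysis lands, then hand the data to the pull step."""
    for _ in range(max_polls):
        details = fetch(ce_id)
        if analysis_ready(details):
            on_ready(ce_id, details)  # e.g. write dossier into the local lead DB
            return details
        time.sleep(interval)
    return None  # analysis did not finish within the polling budget
```

The real `monitor.py` runs this kind of loop per pending lead in the background container process.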
### 4. Expert Response Generator
* **AI Engine:** Uses Gemini 2.0 Flash to create e-mail drafts.
* **Context:** Combines lead data (area) + CE data (dossier) + matrix arguments (pains/gains).
* **Persistent Drafts:** Generated e-mail drafts are stored directly with the lead and survive restarts.
### 5. UI & Quality Control
* **Visual Source Marking:** Clear labeling of the lead source (e.g. 🌐 for website, 🤝 for partner) in the overview.
* **Status Tracking:** Visual indicator (🆕/✅) for the synchronization status with the Company Explorer.
* **Low-Quality Warning:** Visual flag (⚠️) directly in the overview for leads with free-mail addresses or missing company names.
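The low-quality flag can be derived with a simple heuristic along these lines. The domain list and field names are illustrative assumptions, not the actual implementation:

```python
# Hypothetical free-mail providers to flag; the real list lives in the ingest code.
FREE_MAIL_DOMAINS = {"gmail.com", "gmx.de", "web.de", "yahoo.com", "outlook.com", "t-online.de"}

def classify_lead_quality(email: str, company_name: str) -> dict:
    """Flag leads that use a free-mail address or lack a company name."""
    domain = email.rsplit("@", 1)[-1].lower() if email and "@" in email else ""
    is_free_mail = domain in FREE_MAIL_DOMAINS
    is_low_quality = is_free_mail or not (company_name or "").strip()
    return {"is_free_mail": is_free_mail, "is_low_quality": is_low_quality}
```

The result maps onto the `is_free_mail`/`is_low_quality` keys stored in `lead_metadata` (see the `db.py` diff below in this commit).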
### 6. Trading Twins Autopilot (PRODUCTION v2.0)
The fully automated "zero touch" workflow for Trading Twins inquiries.
* **Human-in-the-Loop:** Before sending, Elizabeta Melcer receives a Teams message ("Approve/Deny") via Adaptive Card.
* **Feedback Server:** An integrated FastAPI server (port 8004) processes the clicks from Teams and gives immediate visual feedback.
* **Direct Calendar Booking (custom service):**
    * **Problem:** The MS Bookings API cannot be driven via application permissions (creation is forbidden).
    * **Solution:** We built our own micro booking service.
    * **Flow:** The system checks genuinely free slots in `e.melcer`'s calendar (via Graph API).
    * **E-Mail:** The customer receives an e-mail with two concrete appointment proposals (links).
    * **Booking:** Clicking a link -> server confirms -> a **real Outlook calendar invitation** is sent automatically from `info@`.
* **Technology:**
    * **Teams Webhook:** For interactive Adaptive Cards.
    * **Graph API:** For sending e-mail (`info@`) and checking the calendar (`e.melcer`).
    * **Orchestrator (`manager.py`):** Drives the flow (lead -> CE -> Teams -> timer -> mail -> booking).
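The approve/deny step can be sketched as an incoming-webhook message wrapping an Adaptive Card. This is a hedged sketch: the card text and the `/approve`/`/deny` endpoint paths are illustrative, not the actual FastAPI routes:

```python
def build_approval_card(lead_id: int, company: str, base_url: str) -> dict:
    """Teams incoming-webhook payload with an Adaptive Card and two action buttons."""
    card = {
        "type": "AdaptiveCard",
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "version": "1.4",
        "body": [
            {"type": "TextBlock", "text": f"New Trading Twins lead: {company}", "weight": "Bolder"},
        ],
        "actions": [
            # Hypothetical feedback-server routes on port 8004.
            {"type": "Action.OpenUrl", "title": "Approve", "url": f"{base_url}/approve/{lead_id}"},
            {"type": "Action.OpenUrl", "title": "Deny", "url": f"{base_url}/deny/{lead_id}"},
        ],
    }
    # Incoming webhooks expect the card wrapped in a "message" attachment.
    return {
        "type": "message",
        "attachments": [
            {"contentType": "application/vnd.microsoft.card.adaptive", "content": card},
        ],
    }

def send_approval_card(webhook_url: str, lead_id: int, company: str, base_url: str) -> None:
    """POST the card to the configured TEAMS_WEBHOOK_URL."""
    import requests
    requests.post(webhook_url, json=build_approval_card(lead_id, company, base_url), timeout=10).raise_for_status()
```

Each button click lands on the FastAPI feedback server, which records the decision and returns a confirmation page.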
## 🏗 Architecture
```text
/app/lead-engine/
├── app.py                     # Streamlit web interface
├── trading_twins_ingest.py    # E-mail importer (Graph API)
├── monitor.py                 # Monitor + trigger for the orchestrator
├── trading_twins/             # [NEW] Autopilot module
│   ├── manager.py             # Orchestrator, FastAPI server, Graph API logic
│   ├── signature.html         # HTML signature for e-mails
│   └── debug_bookings_only.py # Diagnostic tool (legacy)
├── db.py                      # Local lead database
└── data/                      # DB storage
```
## 🚨 Lessons Learned & Troubleshooting (Critical)
### 1. The Microsoft Bookings API Trap
* **Problem:** We wanted to use `Bookings.Manage.All` to create a booking page for `info@`.
* **Errors:** `403 Forbidden` ("Api Business.Create does not support the token type: App") and `500 Internal Server Error` (on `GET`).
* **Takeaway:** An app (service principal) can *manage* Bookings but **cannot create the initial page**. The first page must be created manually or via a delegated user. Access also frequently requires a user license, which service principals do not have.
* **Solution:** Switch to **Direct Calendar Booking** (Graph API `Calendars.ReadWrite`). We write appointments straight into the Outlook calendar instead of going through the Bookings layer. This is more robust and fully automatable.
### 4. Exchange AppOnly AccessPolicy
* **Problem:** Despite a global `Calendars.ReadWrite` permission, creating events in `e.melcer@`'s calendar failed (`403 Forbidden: Blocked by tenant configured AppOnly AccessPolicy settings`).
* **Takeaway:** Many organizations use policies to restrict which mailboxes an app may access. Access to "foreign" mailboxes is often blocked by default.
* **Solution:** The event is created in the service account's **own calendar** (`info@robo-planet.de`), and the responsible employee (`e.melcer@`) is added as a **required attendee**. This sidesteps the policy block while ensuring the employee sees the event in their calendar and has full control of the Teams meeting.
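That workaround translates into a Graph call along these lines. The field names follow the Graph `event` resource, but the helper itself and the time values are a sketch, not the code from `manager.py`:

```python
def build_booking_event(subject: str, start_iso: str, end_iso: str,
                        customer_email: str, colleague_email: str) -> dict:
    """Event body for POST /users/info@robo-planet.de/events: the service
    account is the organizer; the employee and customer are required attendees."""
    return {
        "subject": subject,
        "start": {"dateTime": start_iso, "timeZone": "Europe/Berlin"},
        "end": {"dateTime": end_iso, "timeZone": "Europe/Berlin"},
        "attendees": [
            {"emailAddress": {"address": colleague_email}, "type": "required"},
            {"emailAddress": {"address": customer_email}, "type": "required"},
        ],
        "isOnlineMeeting": True,
        "onlineMeetingProvider": "teamsForBusiness",
    }

def create_event(token: str, body: dict) -> dict:
    """Create the event in the service account's own calendar (no AccessPolicy hit)."""
    import requests
    res = requests.post(
        "https://graph.microsoft.com/v1.0/users/info@robo-planet.de/events",
        headers={"Authorization": f"Bearer {token}"},
        json=body,
        timeout=20,
    )
    res.raise_for_status()
    return res.json()
```

Because the app only writes to its own mailbox's calendar, the tenant's AppOnly AccessPolicy never has to be widened.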
## 🚀 Getting Started (Docker)
The Lead Engine is integrated as a service in the central `docker-compose.yml`.
```bash
# Restart the service after code changes
docker-compose up -d --build --force-recreate lead-engine
```
**Access:** `https://floke-ai.duckdns.org/lead/` (password-protected)
**Feedback API:** `https://floke-ai.duckdns.org/feedback/` (public)
## 📝 Credentials (.env)
The following variables are strictly required in the central `.env`:
```env
# App 1: info mailbox (write)
INFO_Application_ID=...
INFO_Tenant_ID=...
INFO_Secret=...
# App 2: E.Melcer calendar (read)
CAL_APPID=...
CAL_TENNANT_ID=...
CAL_SECRET=...
# Teams
TEAMS_WEBHOOK_URL=...
# Public URL
FEEDBACK_SERVER_BASE_URL=https://floke-ai.duckdns.org/feedback
```
---
*Documentation status: March 5, 2026*
*Task: [31988f42]*


@@ -0,0 +1,76 @@
# Trading Twins Autopilot - Setup & Go-Live Checklist
This document describes the steps for the final go-live of the fully automated Trading Twins e-mail dispatch.
---
## 1. IT Prerequisites (waiting on IT)
Once IT has processed the request, we need the following information:
* **Teams Webhook URL:**
    * *Example:* `https://outlook.office.com/webhook/xxxxx@yyyyy/IncomingWebhook/zzzzz`
    * *Purpose:* Sending the "Approve/Deny" card to Elizabeta.
* **Azure App Registration (Graph API):**
    * **Application (Client) ID:** (GUID)
    * **Directory (Tenant) ID:** (GUID)
    * **Client Secret:** (secret string)
    * **Permissions:** `Mail.Send` (app) and `Calendars.Read` (delegated/app) for `e.melcer@robo-planet.de`.
---
## 2. Configuration (.env)
Add these values to the project's central `.env` file:
```env
# Trading Twins Autopilot
TEAMS_WEBHOOK_URL="<INSERT_URL_HERE>"
AZURE_CLIENT_ID="<INSERT_CLIENT_ID>"
AZURE_CLIENT_SECRET="<INSERT_SECRET>"
AZURE_TENANT_ID="<INSERT_TENANT_ID>"
# API reachability (so the buttons in Teams work)
API_BASE_URL="https://floke-ai.duckdns.org/api/tt"
# (Note: the nginx proxy must expose port 8004 externally or keep it reachable internally)
```
---
## 3. Check Assets
Make sure these files exist in `/app/lead-engine/trading_twins/`:
1. **Banner image:** `RoboPlanetBannerWebinarEinladung.png`
    * *Check:* `ls -l /app/lead-engine/trading_twins/RoboPlanetBannerWebinarEinladung.png`
2. **HTML signature:** `signature.html`
    * *Content:* Verify that the links and phone numbers are correct.
    * *Placeholder:* Make sure the `<img>` tag uses `cid:banner_image` so the image is rendered inline.
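For reference, `cid:banner_image` resolves against an inline attachment in the Graph `sendMail` payload. A hedged sketch of building such a payload (subject line and helper name are illustrative):

```python
import base64

def build_mail_with_inline_banner(html_body: str, png_bytes: bytes) -> dict:
    """sendMail payload where <img src="cid:banner_image"> in the HTML body
    resolves to an inline file attachment with a matching contentId."""
    return {
        "message": {
            "subject": "Ihre Trading Twins Demo",  # illustrative subject
            "body": {"contentType": "HTML", "content": html_body},
            "attachments": [{
                "@odata.type": "#microsoft.graph.fileAttachment",
                "name": "RoboPlanetBannerWebinarEinladung.png",
                "contentType": "image/png",
                "contentBytes": base64.b64encode(png_bytes).decode("ascii"),
                "contentId": "banner_image",  # must match the cid: reference
                "isInline": True,
            }],
        }
    }
```

If `contentId` and the `cid:` reference drift apart, most clients show the banner as a plain attachment instead of inline.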
---
## 4. Disable Test Mode
The system currently runs the calendar in "mock mode" (simulating free slots).
Once real access is available:
1. Open `/app/lead-engine/trading_twins/manager.py`.
2. Replace `self._mock_calendar_availability()` with the real Graph API call (the code still needs to be finalized once `Calendars.Read` is active).
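Until then, the mock's actual job, proposing two free slots, can be isolated in a small pure function that works on busy intervals from any source (mock or Graph `calendarView`). An illustrative sketch:

```python
from datetime import datetime, timedelta

def find_free_slots(busy, day_start, day_end,
                    duration=timedelta(minutes=30), limit=2):
    """Walk the working day and return the first `limit` gaps of `duration`
    that are not covered by any (start, end) interval in `busy`."""
    slots, cursor = [], day_start
    for b_start, b_end in sorted(busy):
        # Fill free time before this busy block.
        while cursor + duration <= min(b_start, day_end) and len(slots) < limit:
            slots.append((cursor, cursor + duration))
            cursor += duration
        cursor = max(cursor, b_end)  # jump past the busy block
    # Fill remaining time after the last busy block.
    while cursor + duration <= day_end and len(slots) < limit:
        slots.append((cursor, cursor + duration))
        cursor += duration
    return slots
```

Swapping the mock for the real Graph call then only changes where `busy` comes from, not the slot logic.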
---
## 5. Watch the Logs
After a restart (`docker-compose restart lead-engine`) you can follow the process live:
```bash
docker logs -f lead-engine | grep "TradingTwins"
```
* **Expected output** (log lines are emitted verbatim by the orchestrator):
    * `[ACTION] Triggering Trading Twins Orchestrator...`
    * `Job erstellt: ...`
    * `Timer abgelaufen...`
    * `🚀 E-MAIL WURDE VERSENDET...`

lead-engine/__init__.py Normal file

@@ -1,8 +1,47 @@
import streamlit as st
import pandas as pd
from db import get_leads, init_db
from db import get_leads, init_db, reset_lead, update_lead_draft
import json
from enrich import run_sync # Import our sync function
import re
import os
from enrich import run_sync, refresh_ce_data, sync_single_lead
from generate_reply import generate_email_draft
def clean_html_to_text(html_content):
"""Surgical helper to extract relevant Tradingtwins data and format it cleanly."""
if not html_content:
return ""
# 1. Strip head and style
clean = re.sub(r'<head.*?>.*?</head>', '', html_content, flags=re.DOTALL | re.IGNORECASE)
clean = re.sub(r'<style.*?>.*?</style>', '', clean, flags=re.DOTALL | re.IGNORECASE)
# 2. Extract the core data block (from 'Datum:' until the matchmaking plug)
# We look for the first 'Datum:' label
start_match = re.search(r'Datum:', clean, re.IGNORECASE)
end_match = re.search(r'Kennen Sie schon Ihr persönliches Konto', clean, re.IGNORECASE)
if start_match:
start_pos = start_match.start()
end_pos = end_match.start() if end_match else len(clean)
clean = clean[start_pos:end_pos]
# 3. Format Table Structure: </td><td> should be a space/tab, </tr> a newline
# This prevents the "Label on one line, value on next" issue
clean = re.sub(r'</td>\s*<td.*?>', ' ', clean, flags=re.IGNORECASE)
clean = re.sub(r'</tr>', '\n', clean, flags=re.IGNORECASE)
# 4. Standard Cleanup
clean = re.sub(r'<br\s*/?>', '\n', clean, flags=re.IGNORECASE)
clean = re.sub(r'</p>', '\n', clean, flags=re.IGNORECASE)
clean = re.sub(r'<.*?>', '', clean)
# 5. Entity Decoding
clean = clean.replace('&nbsp;', ' ').replace('&amp;', '&').replace('&quot;', '"').replace('&gt;', '>')
# 6. Final Polish: remove empty lines and leading/trailing whitespace
lines = [line.strip() for line in clean.split('\n') if line.strip()]
return '\n'.join(lines)
st.set_page_config(page_title="TradingTwins Lead Engine", layout="wide")
@@ -18,7 +57,20 @@ if st.sidebar.button("1. Ingest Emails (Mock)"):
st.sidebar.success(f"Ingested {count} new leads.")
st.rerun()
if st.sidebar.button("2. Sync to Company Explorer"):
if st.sidebar.button("2. Ingest Real Emails (Graph API)"):
try:
from trading_twins_ingest import process_leads
with st.spinner("Fetching emails from Microsoft Graph..."):
count = process_leads()
if count > 0:
st.sidebar.success(f"Successfully ingested {count} new leads from inbox!")
else:
st.sidebar.info("No new leads found in inbox.")
st.rerun()
except Exception as e:
st.sidebar.error(f"Ingest failed: {e}")
if st.sidebar.button("3. Sync to Company Explorer"):
with st.spinner("Syncing with Company Explorer API..."):
# Capture output for debugging
try:
@@ -37,6 +89,42 @@ if st.sidebar.button("2. Sync to Company Explorer"):
except Exception as e:
st.error(f"Sync Failed: {e}")
if st.sidebar.checkbox("Show System Debug"):
st.sidebar.subheader("System Diagnostics")
# 1. API Key Check
from lookup_role import get_gemini_key
key = get_gemini_key()
if key:
st.sidebar.success(f"Gemini Key found ({key[:5]}...)")
else:
st.sidebar.error("Gemini Key NOT found!")
# 2. SerpAPI Check
serp_key = os.getenv("SERP_API")
if serp_key:
st.sidebar.success(f"SerpAPI Key found ({serp_key[:5]}...)")
else:
st.sidebar.error("SerpAPI Key NOT found in Env!")
# 3. Network Check
try:
import requests
res = requests.get("https://generativelanguage.googleapis.com", timeout=2)
st.sidebar.success(f"Gemini API Reachable ({res.status_code})")
except Exception as e:
st.sidebar.error(f"Network Error: {e}")
# 4. Live Lookup Test
if st.sidebar.button("Test Role Lookup (Georg Stahl)"):
from lookup_role import lookup_person_role
with st.sidebar.status("Running Lookup..."):
res = lookup_person_role("Georg Stahl", "Klemm Bohrtechnik GmbH")
if res:
st.sidebar.success(f"Result: {res}")
else:
st.sidebar.error("Result: None")
# Main View
leads = get_leads()
df = pd.DataFrame(leads)
@@ -50,26 +138,123 @@ if not df.empty:
st.subheader("Lead Pipeline")
for index, row in df.iterrows():
with st.expander(f"{row['company_name']} ({row['status']})"):
c1, c2 = st.columns(2)
c1.write(f"**Contact:** {row['contact_name']}")
c1.write(f"**Email:** {row['email']}")
c1.text(row['raw_body'][:200] + "...")
# Format date for title
date_str = ""
if row.get('received_at'):
try:
dt = pd.to_datetime(row['received_at'])
date_str = dt.strftime("%d.%m. %H:%M")
except Exception:
pass
# --- DYNAMIC TITLE ---
source_icon = "🌐" if row.get('source') == 'Website-Formular' else "🤝"
status_icon = "" if row.get('status') == 'synced' else "🆕"
meta = {}
if row.get('lead_metadata'):
try: meta = json.loads(row['lead_metadata'])
except Exception: pass
quality_icon = "⚠️ " if meta.get('is_low_quality') else ""
title = f"{quality_icon}{status_icon} {source_icon} {row.get('source', 'Lead')} | {date_str} | {row['company_name']}"
with st.expander(title):
# The full warning message is still shown inside for clarity
if meta.get('is_low_quality'):
st.warning("⚠️ **Low Quality Lead detected** (Free-mail provider or missing company name). Please verify manually.")
# --- SECTION 1: LEAD INFO & INTELLIGENCE ---
col_lead, col_intel = st.columns(2)
enrichment = json.loads(row['enrichment_data']) if row['enrichment_data'] else {}
if enrichment:
c2.write("--- Integration Status ---")
if enrichment.get('ce_id'):
c2.success(f"✅ Linked to Company Explorer (ID: {enrichment['ce_id']})")
c2.write(f"**CE Status:** {enrichment.get('ce_status')}")
with col_lead:
st.markdown("### 📋 Lead Data")
st.write(f"**Salutation:** {meta.get('salutation', '-')}")
st.write(f"**Contact:** {row['contact_name']}")
st.write(f"**Email:** {row['email']}")
st.write(f"**Phone:** {meta.get('phone', row.get('phone', '-'))}")
role = meta.get('role')
if role:
st.info(f"**Role:** {role}")
else:
c2.warning("⚠️ Not yet synced or failed")
if st.button("🔍 Find Role", key=f"role_{row['id']}"):
from enrich import enrich_contact_role
with st.spinner("Searching..."):
found_role = enrich_contact_role(row)
if found_role: st.success(f"Found: {found_role}"); st.rerun()
else: st.error("No role found.")
c2.info(f"Log: {enrichment.get('message')}")
st.write(f"**Area:** {meta.get('area', '-')}")
st.write(f"**Purpose:** {meta.get('purpose', '-')}")
st.write(f"**Functions:** {meta.get('cleaning_functions', '-')}")
st.write(f"**Location:** {meta.get('zip', '')} {meta.get('city', '')}")
with col_intel:
st.markdown("### 🔍 Intelligence (CE)")
enrichment = json.loads(row['enrichment_data']) if row['enrichment_data'] else {}
ce_id = enrichment.get('ce_id')
if enrichment.get('ce_data'):
c2.json(enrichment['ce_data'])
if ce_id:
st.success(f"✅ Linked to Company Explorer (ID: {ce_id})")
ce_data = enrichment.get('ce_data', {})
vertical = ce_data.get('industry_ai') or ce_data.get('vertical')
summary = ce_data.get('research_dossier') or ce_data.get('summary')
if vertical and vertical != 'None':
st.info(f"**Industry:** {vertical}")
else:
st.warning("Industry Analysis pending...")
if summary:
with st.expander("Show AI Research Dossier", expanded=True):
st.write(summary)
if st.button("🔄 Refresh CE Data", key=f"refresh_{row['id']}"):
with st.spinner("Fetching..."):
refresh_ce_data(row['id'], ce_id)
st.rerun()
else:
st.warning("⚠️ Not synced with Company Explorer yet")
if st.button("🚀 Sync to Company Explorer", key=f"sync_single_{row['id']}"):
with st.spinner("Syncing..."):
sync_single_lead(row['id'])
st.rerun()
st.divider()
# --- SECTION 2: ORIGINAL EMAIL ---
with st.expander("✉️ View Original Email Content"):
st.text(clean_html_to_text(row['raw_body']))
if st.checkbox("Show Raw HTML", key=f"raw_{row['id']}"):
st.code(row['raw_body'], language="html")
st.divider()
# --- SECTION 3: RESPONSE DRAFT (Full Width) ---
st.markdown("### 📝 Response Draft")
if row['status'] != 'new' and ce_id:
if st.button("✨ Generate Expert Reply", key=f"gen_{row['id']}", type="primary"):
with st.spinner("Writing email..."):
ce_data = enrichment.get('ce_data', {})
draft = generate_email_draft(row.to_dict(), ce_data)
update_lead_draft(row['id'], draft) # Save to DB
st.rerun() # Rerun to display the new draft from DB
# Always display the draft from the database if it exists
if row.get('response_draft'):
st.text_area("Email Entwurf", value=row['response_draft'], height=400)
st.button("📋 Copy to Clipboard", key=f"copy_{row['id']}", on_click=lambda: st.write("Copy functionality simulated"))
else:
st.info("Sync with Company Explorer first to generate a response.")
if row['status'] != 'new':
st.markdown("---")
if st.button("🔄 Reset Lead Status", key=f"reset_{row['id']}", help="Back to 'new' status"):
reset_lead(row['id'])
st.rerun()
else:
st.info("No leads found. Click 'Ingest Emails' in the sidebar.")


@@ -0,0 +1,166 @@
import requests
import os
import base64
import json
import time
# --- Configuration ---
# Default to internal Docker service URL
BASE_URL = os.getenv("COMPANY_EXPLORER_URL", "http://company-explorer:8000/api")
API_USER = os.getenv("COMPANY_EXPLORER_API_USER", "admin")
API_PASSWORD = os.getenv("COMPANY_EXPLORER_API_PASSWORD", "gemini")
def _make_api_request(method, endpoint, params=None, json_data=None):
"""Central helper for making API requests."""
url = f"{BASE_URL}{endpoint}"
# Assumes BASE_URL has no trailing slash and endpoint has a leading slash,
# so plain concatenation yields a well-formed URL.
try:
# Auth is enforced by nginx on the public route; when hitting port 8000
# directly the backend may not require it, but we pass credentials anyway.
response = requests.request(
method,
url,
auth=(API_USER, API_PASSWORD),
params=params,
json=json_data,
timeout=20
)
response.raise_for_status()
if response.status_code == 204 or not response.content:
return {}
return response.json()
except requests.exceptions.HTTPError as http_err:
return {"error": f"HTTP error occurred: {http_err} - {response.text}"}
except requests.exceptions.ConnectionError as conn_err:
return {"error": f"Connection error: {conn_err}."}
except requests.exceptions.Timeout as timeout_err:
return {"error": f"Timeout error: {timeout_err}."}
except requests.exceptions.RequestException as req_err:
return {"error": f"An unexpected error occurred: {req_err}"}
except json.JSONDecodeError:
return {"error": f"Failed to decode JSON from response: {response.text}"}
def check_company_existence(company_name: str) -> dict:
"""Checks whether a company exists."""
response = _make_api_request("GET", "/companies", params={"search": company_name})
if "error" in response:
return {"exists": False, "error": response["error"]}
if response.get("total", 0) > 0:
for company in response.get("items", []):
if company.get("name", "").lower() == company_name.lower():
return {"exists": True, "company": company}
return {"exists": False, "message": f"Company '{company_name}' not found."}
def create_company(company_name: str) -> dict:
"""Creates a new company."""
return _make_api_request("POST", "/companies", json_data={"name": company_name, "country": "DE"})
def trigger_discovery(company_id: int) -> dict:
"""Starts the discovery process."""
return _make_api_request("POST", "/enrich/discover", json_data={"company_id": company_id})
def trigger_analysis(company_id: int) -> dict:
"""Starts the analysis process."""
return _make_api_request("POST", "/enrich/analyze", json_data={"company_id": company_id})
def get_company_details(company_id: int) -> dict:
"""Fetches the full details for a company."""
return _make_api_request("GET", f"/companies/{company_id}")
def create_contact(company_id: int, contact_data: dict) -> dict:
"""Creates a new contact for a company in the Company Explorer."""
payload = {
"company_id": company_id,
"first_name": contact_data.get("first_name"),
"last_name": contact_data.get("last_name"),
"email": contact_data.get("email"),
"job_title": contact_data.get("job_title"),
"role": contact_data.get("role"),
"is_primary": contact_data.get("is_primary", True)
}
return _make_api_request("POST", "/contacts", json_data=payload)
def handle_company_workflow(company_name: str, contact_info: dict = None) -> dict:
"""
Main workflow: checks, creates, and enriches a company.
Optionally also creates a contact.
Returns the final company data.
"""
print(f"Workflow started for: '{company_name}'")
# 1. Check whether the company exists
existence_check = check_company_existence(company_name)
company_id = None
if existence_check.get("exists"):
company_id = existence_check["company"]["id"]
print(f"Company '{company_name}' (ID: {company_id}) already exists.")
elif "error" in existence_check:
print(f"Error during existence check: {existence_check['error']}")
return {"status": "error", "message": existence_check['error']}
else:
# 2. If not, create the company
print(f"Company '{company_name}' not found. Creating it...")
creation_response = create_company(company_name)
if "error" in creation_response:
return {"status": "error", "message": creation_response['error']}
company_id = creation_response.get("id")
print(f"Company '{company_name}' created successfully with ID {company_id}.")
# 2b. Create/update the contact (if info is available)
if company_id and contact_info:
print(f"Creating contact for {contact_info.get('last_name')}...")
contact_res = create_contact(company_id, contact_info)
if "error" in contact_res:
print(f"Note: contact could not be created: {contact_res['error']}")
# 3. Trigger discovery (if status is NEW)
# Fetch details to check the status
details = get_company_details(company_id)
if details.get("status") == "NEW":
print(f"Starting discovery for ID {company_id}...")
trigger_discovery(company_id)
# 4. Wait until discovery has found a website (polling)
max_wait_time = 30
start_time = time.time()
website_found = False
print("Waiting for discovery to finish (max. 30s)...")
while time.time() - start_time < max_wait_time:
details = get_company_details(company_id)
if details.get("website") and details["website"] not in ["", "k.A."]:
print(f"Website found: {details['website']}")
website_found = True
break
time.sleep(3)
print(".")
# 5. Trigger analysis (website found but status not yet ENRICHED)
if details.get("website") and details["website"] not in ["", "k.A."] and details.get("status") != "ENRICHED":
print(f"Starting analysis for ID {company_id}...")
trigger_analysis(company_id)
# 6. Fetch and return the final data
final_company_data = get_company_details(company_id)
return {"status": "synced", "data": final_company_data}
if __name__ == "__main__":
test_company_existing = "Robo-Planet GmbH"
test_company_new = f"Zufallsfirma {int(time.time())}"
print(f"--- Scenario 1: test with an existing company: '{test_company_existing}' ---")
result_existing = handle_company_workflow(test_company_existing)
print(json.dumps(result_existing, indent=2, ensure_ascii=False))
print(f"\n--- Scenario 2: test with a new company: '{test_company_new}' ---")
result_new = handle_company_workflow(test_company_new)
print(json.dumps(result_new, indent=2, ensure_ascii=False))


@@ -27,12 +27,26 @@ def init_db():
email TEXT,
phone TEXT,
raw_body TEXT,
lead_metadata TEXT,
enrichment_data TEXT,
status TEXT DEFAULT 'new',
response_draft TEXT,
sent_at TIMESTAMP
)
''')
# Simple migration check: add 'lead_metadata' if not exists
c.execute("PRAGMA table_info(leads)")
columns = [row[1] for row in c.fetchall()]
if 'lead_metadata' not in columns:
print("Migrating DB: Adding lead_metadata column...")
c.execute('ALTER TABLE leads ADD COLUMN lead_metadata TEXT')
if 'source' not in columns:
print("Migrating DB: Adding source column...")
c.execute('ALTER TABLE leads ADD COLUMN source TEXT')
conn.commit()
conn.close()
@@ -41,21 +55,40 @@ def insert_lead(lead_data):
if not os.path.exists(DB_PATH):
init_db()
# Extract metadata fields
meta = {
'area': lead_data.get('area'),
'purpose': lead_data.get('purpose'),
'zip': lead_data.get('zip'),
'city': lead_data.get('city'),
'role': lead_data.get('role'),
'salutation': lead_data.get('salutation'),
'phone': lead_data.get('phone'),
'cleaning_functions': lead_data.get('cleaning_functions'),
'is_free_mail': lead_data.get('is_free_mail', False),
'is_low_quality': lead_data.get('is_low_quality', False)
}
# Use provided received_at or default to now
received_at = lead_data.get('received_at') or datetime.now()
conn = sqlite3.connect(DB_PATH)
c = conn.cursor()
try:
c.execute('''
INSERT INTO leads (source_id, received_at, company_name, contact_name, email, phone, raw_body, status)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
INSERT INTO leads (source_id, received_at, company_name, contact_name, email, phone, raw_body, lead_metadata, status, source)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
''', (
lead_data.get('id'),
datetime.now(),
received_at,
lead_data.get('company'),
lead_data.get('contact'),
lead_data.get('email'),
lead_data.get('phone'),
lead_data.get('raw_body'),
'new'
json.dumps(meta),
'new',
lead_data.get('source') # Added source
))
conn.commit()
return True
@@ -64,6 +97,14 @@ def insert_lead(lead_data):
finally:
conn.close()
def update_lead_metadata(lead_id, meta_data):
"""Helper to update metadata for existing leads (repair)"""
conn = sqlite3.connect(DB_PATH)
c = conn.cursor()
c.execute('UPDATE leads SET lead_metadata = ? WHERE id = ?', (json.dumps(meta_data), lead_id))
conn.commit()
conn.close()
def get_leads():
if not os.path.exists(DB_PATH):
init_db()
@@ -85,3 +126,19 @@ def update_lead_status(lead_id, status, response_draft=None):
c.execute('UPDATE leads SET status = ? WHERE id = ?', (status, lead_id))
conn.commit()
conn.close()
def update_lead_draft(lead_id, draft_text):
"""Saves a generated email draft to the database."""
conn = sqlite3.connect(DB_PATH)
c = conn.cursor()
c.execute('UPDATE leads SET response_draft = ? WHERE id = ?', (draft_text, lead_id))
conn.commit()
conn.close()
def reset_lead(lead_id):
"""Resets a lead to 'new' status and clears enrichment data."""
conn = sqlite3.connect(DB_PATH)
c = conn.cursor()
c.execute('UPDATE leads SET status = "new", enrichment_data = NULL WHERE id = ?', (lead_id,))
conn.commit()
conn.close()


@@ -5,8 +5,9 @@ import sqlite3
# Add the project root to the Python path so the connector can be found
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
from company_explorer_connector import handle_company_workflow
from db import get_leads, DB_PATH
from company_explorer_connector import handle_company_workflow, get_company_details
from db import get_leads, DB_PATH, update_lead_metadata
from lookup_role import lookup_person_role
def update_lead_enrichment(lead_id, data, status):
"""Updates a lead in the database with new enrichment data and a new status."""
@@ -18,9 +19,129 @@ def update_lead_enrichment(lead_id, data, status):
conn.close()
print(f"Lead {lead_id} updated. New status: {status}")
def refresh_ce_data(lead_id, ce_id):
"""
Fetches the latest data (incl. analysis results) from the Company Explorer
and updates the local lead.
"""
print(f"Refreshing data for CE ID {ce_id}...")
ce_data = get_company_details(ce_id)
# Fetch the existing enrichment data
leads = get_leads()
lead = next((l for l in leads if l['id'] == lead_id), None)
enrichment_data = {}
if lead and lead.get('enrichment_data'):
try:
enrichment_data = json.loads(lead['enrichment_data'])
except Exception:
pass
enrichment_data.update({
"sync_status": "refreshed",
"ce_id": ce_id,
"message": "Data refreshed from CE",
"ce_data": ce_data
})
update_lead_enrichment(lead_id, enrichment_data, status='synced')
return ce_data
def enrich_contact_role(lead):
"""
Tries to find the contact's role via SerpAPI and stores it in the metadata.
"""
meta = {}
if lead.get('lead_metadata'):
try:
meta = json.loads(lead.get('lead_metadata'))
except Exception:
pass
# Skip if we already have a role (and it's not None/Unknown)
if meta.get('role') and meta.get('role') != "Unbekannt":
return meta.get('role')
print(f"Looking up role for {lead['contact_name']} at {lead['company_name']}...")
role = lookup_person_role(lead['contact_name'], lead['company_name'])
if role:
print(f" -> Found role: {role}")
meta['role'] = role
update_lead_metadata(lead['id'], meta)
else:
print(" -> No role found.")
return role
def sync_single_lead(lead_id):
"""
Processes a single lead: look up the role, sync to CE, trigger analysis.
"""
conn = sqlite3.connect(DB_PATH)
conn.row_factory = sqlite3.Row
c = conn.cursor()
c.execute('SELECT * FROM leads WHERE id = ?', (lead_id,))
lead = c.fetchone()
conn.close()
if not lead:
return {"status": "error", "message": "Lead not found"}
lead_dict = dict(lead)
company_name = lead_dict['company_name']
print(f"\n--- Manually Syncing Lead ID: {lead_id}, Company: '{company_name}' ---")
# 1. Contact Enrichment (Role Lookup)
role = enrich_contact_role(lead_dict)
# 2. Prepare Contact Info
meta = {}
if lead_dict.get('lead_metadata'):
try: meta = json.loads(lead_dict['lead_metadata'])
except Exception: pass
# Smarter name splitting if meta is empty (for repaired leads)
full_name = lead_dict.get('contact_name', '')
first_name = meta.get('contact_first')
last_name = meta.get('contact_last')
if not first_name and full_name:
parts = full_name.strip().split(' ')
if len(parts) > 1:
first_name = parts[0]
last_name = ' '.join(parts[1:])
else:
last_name = full_name
first_name = ''
contact_info = {
"first_name": first_name,
"last_name": last_name,
"email": lead_dict['email'],
"job_title": meta.get('role', role),
"role": None, # Set to None so CE can use its RoleMappingService
"is_primary": True
}
# 3. CE Workflow
result = handle_company_workflow(company_name, contact_info=contact_info)
# 4. Save results
enrichment_data = {
"sync_status": result.get("status"),
"ce_id": result.get("data", {}).get("id") if result.get("data") else None,
"message": result.get("message", "Manual sync successful"),
"ce_data": result.get("data")
}
update_lead_enrichment(lead_id, enrichment_data, status='synced')
return result
def run_sync():
"""
Main synchronization process (batch).
Fetches all new leads and triggers the Company Explorer workflow for each one.
"""
# Fetch only the leads that are genuinely new and have not been processed yet
@@ -36,15 +157,33 @@ def run_sync():
company_name = lead['company_name']
print(f"\n--- Processing Lead ID: {lead['id']}, Company: '{company_name}' ---")
# Call the central workflow we defined in the connector
# This function takes care of everything: check, create, discover, poll, analyze
result = handle_company_workflow(company_name)
# 1. Contact Enrichment (Role Lookup via SerpAPI)
role = enrich_contact_role(lead)
# 2. Prepare Contact Info for CE
meta = {}
if lead.get('lead_metadata'):
try:
meta = json.loads(lead.get('lead_metadata'))
except (json.JSONDecodeError, TypeError):
pass
contact_info = {
"first_name": meta.get('contact_first', ''),
"last_name": meta.get('contact_last', lead['contact_name'].split(' ')[-1] if lead['contact_name'] else ''),
"email": lead['email'],
"job_title": meta.get('role', role), # The raw title or Gemini result
"role": meta.get('role', role) # Currently mapped to same field
}
# 3. Company Enrichment (CE Workflow with Contact)
result = handle_company_workflow(company_name, contact_info=contact_info)
# Prepare the data for storage in the DB
enrichment_data = {
"sync_status": result.get("status"),
"ce_id": result.get("data", {}).get("id") if result.get("data") else None,
"message": result.get("message", "Sync successful"),
"ce_data": result.get("data")
}


@@ -0,0 +1,232 @@
import os
import json
import requests
import sqlite3
import re
import datetime
# --- Helper: Get Gemini Key ---
def get_gemini_key():
candidates = [
"gemini_api_key.txt", # Current dir
"/app/gemini_api_key.txt", # Docker default
os.path.join(os.path.dirname(__file__), "gemini_api_key.txt"), # Script dir
os.path.join(os.path.dirname(os.path.dirname(__file__)), 'gemini_api_key.txt') # Parent dir
]
for path in candidates:
if os.path.exists(path):
try:
with open(path, 'r') as f:
return f.read().strip()
except OSError:
pass
return os.getenv("GEMINI_API_KEY")
def get_matrix_context(industry_name, persona_name):
"""Fetches Pains, Gains and Arguments from CE Database."""
context = {
"industry_pains": "",
"industry_gains": "",
"persona_description": "",
"persona_arguments": ""
}
db_path = os.path.join(os.path.dirname(os.path.dirname(__file__)), 'companies_v3_fixed_2.db')
if not os.path.exists(db_path):
return context
try:
conn = sqlite3.connect(db_path)
c = conn.cursor()
# Get Industry Data
c.execute('SELECT pains, gains FROM industries WHERE name = ?', (industry_name,))
ind_res = c.fetchone()
if ind_res:
context["industry_pains"], context["industry_gains"] = ind_res
# Get Persona Data
c.execute('SELECT description, convincing_arguments FROM personas WHERE name = ?', (persona_name,))
per_res = c.fetchone()
if per_res:
context["persona_description"], context["persona_arguments"] = per_res
conn.close()
except Exception as e:
print(f"DB Error in matrix lookup: {e}")
return context
def get_suggested_date():
"""Calculates a suggested meeting date (3-4 days in future, avoiding weekends)."""
now = datetime.datetime.now()
# Jump 3 days ahead
suggested = now + datetime.timedelta(days=3)
# If weekend, move to Monday
if suggested.weekday() == 5: # Saturday
suggested += datetime.timedelta(days=2)
elif suggested.weekday() == 6: # Sunday
suggested += datetime.timedelta(days=1)
days_de = ["Montag", "Dienstag", "Mittwoch", "Donnerstag", "Freitag", "Samstag", "Sonntag"]
return f"{days_de[suggested.weekday()]}, den {suggested.strftime('%d.%m.')} um 10:00 Uhr"
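The weekend skip above can be checked deterministically by factoring the shift out of `datetime.now()`; a minimal sketch (the helper name `shift_off_weekend` is illustrative, not part of this module):

```python
import datetime

def shift_off_weekend(suggested: datetime.datetime) -> datetime.datetime:
    # Mirrors the branch in get_suggested_date: Saturday -> +2 days, Sunday -> +1 day
    if suggested.weekday() == 5:    # Saturday
        suggested += datetime.timedelta(days=2)
    elif suggested.weekday() == 6:  # Sunday
        suggested += datetime.timedelta(days=1)
    return suggested

# Thursday 2026-03-05 + 3 days lands on a Sunday, which shifts to Monday 2026-03-09
candidate = datetime.datetime(2026, 3, 5) + datetime.timedelta(days=3)
print(shift_off_weekend(candidate).date())  # -> 2026-03-09
```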
def clean_company_name(name):
"""Removes legal suffixes like GmbH, AG, etc. for a more personal touch."""
if not name: return ""
# Remove common German legal forms
cleaned = re.sub(r'\s+(GmbH\s+&\s+Co\.\s+KG|GmbH|AG|KG|e\.V\.|e\.K\.|Limited|Ltd|Inc)\.?(?:\s|$)', '', name, flags=re.IGNORECASE) # longest form first so "GmbH & Co. KG" is stripped as one unit
return cleaned.strip()
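A self-contained sketch of the suffix stripping, with the longest legal form listed first in the alternation so that `GmbH & Co. KG` is removed as one unit (the helper name `strip_legal_suffix` is illustrative):

```python
import re

def strip_legal_suffix(name: str) -> str:
    # Longest alternative first, otherwise "GmbH" would match inside "GmbH & Co. KG"
    pattern = r'\s+(GmbH\s+&\s+Co\.\s+KG|GmbH|AG|KG|e\.V\.|e\.K\.|Limited|Ltd|Inc)\.?(?:\s|$)'
    return re.sub(pattern, '', name, flags=re.IGNORECASE).strip()

print(strip_legal_suffix("Klemm Bohrtechnik GmbH"))  # -> Klemm Bohrtechnik
print(strip_legal_suffix("Muster GmbH & Co. KG"))    # -> Muster
```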
def get_qualitative_area_description(area_str):
"""Converts a string with area information into a qualitative description."""
nums = re.findall(r'\d+', area_str.replace('.', '').replace(',', ''))
area_val = int(nums[0]) if nums else 0
if area_val >= 10000:
return "sehr große Flächen"
if area_val >= 5000:
return "große Flächen"
if area_val >= 1000:
return "mittlere Flächen"
if area_val > 0:
return "kleine bis mittlere Flächen"
return "Ihre Flächen" # Fallback
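The thresholds above map raw area strings to qualitative buckets; a quick standalone check of the digit extraction, which strips German thousands separators before parsing (the function name `area_bucket` is illustrative):

```python
import re

def area_bucket(area_str: str) -> str:
    # Strip German thousands separators, then take the first number found
    nums = re.findall(r'\d+', area_str.replace('.', '').replace(',', ''))
    area_val = int(nums[0]) if nums else 0
    if area_val >= 10000: return "sehr große Flächen"
    if area_val >= 5000:  return "große Flächen"
    if area_val >= 1000:  return "mittlere Flächen"
    if area_val > 0:      return "kleine bis mittlere Flächen"
    return "Ihre Flächen"  # fallback when no number is present

print(area_bucket("über 10.000 qm"))  # -> sehr große Flächen
print(area_bucket("1.500 m²"))        # -> mittlere Flächen
```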
def get_multi_solution_recommendation(area_str, purpose_str):
"""
Selects a range of robots based on surface area AND requested purposes.
"""
recommendations = []
purpose_lower = purpose_str.lower()
# 1. Cleaning Logic (Area based)
nums = re.findall(r'\d+', area_str.replace('.', '').replace(',', ''))
area_val = int(nums[0]) if nums else 0
if "reinigung" in purpose_lower:
if area_val >= 5000 or "über 10.000" in area_str:
recommendations.append("den Scrubber 75 als industrielles Kraftpaket für Ihre Großflächen")
elif area_val >= 1000:
recommendations.append("den Scrubber 50 oder Phantas für eine wendige und gründliche Bodenreinigung")
else:
recommendations.append("den Phantas oder Pudu CC1 für eine effiziente Reinigung Ihrer Räumlichkeiten")
# 2. Service/Transport Logic
if any(word in purpose_lower for word in ["servieren", "abräumen", "speisen", "getränke"]):
recommendations.append("den BellaBot zur Entlastung Ihres Teams beim Transport von Speisen und Getränken")
# 3. Marketing/Interaction Logic
if any(word in purpose_lower for word in ["marketing", "gästebetreuung", "kundenansprache"]):
recommendations.append("den KettyBot als interaktiven Begleiter für Marketing und Patienteninformation")
if not recommendations:
recommendations.append("unsere wendigen Allrounder wie den Phantas")
return {
"solution_text": " und ".join(recommendations),
"has_multi": len(recommendations) > 1
}
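The combination of area threshold and purpose keywords can be traced with a trimmed, self-contained sketch (product names taken from the recommendations above; the function name `recommend` is illustrative and omits the marketing branch):

```python
import re

def recommend(area_str: str, purpose_str: str) -> dict:
    recs = []
    p = purpose_str.lower()
    # Same digit extraction as the area logic above
    nums = re.findall(r'\d+', area_str.replace('.', '').replace(',', ''))
    area_val = int(nums[0]) if nums else 0
    if "reinigung" in p:
        # Area decides the cleaning robot class
        recs.append("Scrubber 75" if area_val >= 5000 else "Phantas")
    if any(w in p for w in ("servieren", "abräumen", "speisen", "getränke")):
        recs.append("BellaBot")
    if not recs:
        recs.append("Phantas")
    return {"solution_text": " und ".join(recs), "has_multi": len(recs) > 1}

result = recommend("5.000 qm", "Reinigung und Servieren")
print(result["solution_text"])  # -> Scrubber 75 und BellaBot
```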
def generate_email_draft(lead_data, company_data, booking_link="[IHR BUCHUNGSLINK]"):
"""
Generates a high-end, personalized sales email using Gemini API and Matrix knowledge.
"""
api_key = get_gemini_key()
if not api_key:
return "Error: Gemini API Key not found."
# Extract Data from Lead Engine
company_raw = lead_data.get('company_name', 'Interessent')
company_name = clean_company_name(company_raw)
contact_name = lead_data.get('contact_name', 'Damen und Herren')
# Metadata from Lead
meta = {}
if lead_data.get('lead_metadata'):
try: meta = json.loads(lead_data['lead_metadata'])
except (json.JSONDecodeError, TypeError): pass
area = meta.get('area', 'Unbekannte Fläche')
purpose = meta.get('purpose', 'Reinigung')
role = meta.get('role', 'Wirtschaftlicher Entscheider')
salutation = meta.get('salutation', 'Damen und Herren')
cleaning_functions = meta.get('cleaning_functions', '')
# Data from Company Explorer
ce_summary = company_data.get('research_dossier') or company_data.get('summary', '')
ce_vertical = company_data.get('industry_ai') or company_data.get('vertical', 'Healthcare')
ce_opener = company_data.get('ai_opener', '')
# Multi-Solution Logic
solution = get_multi_solution_recommendation(area, purpose)
qualitative_area = get_qualitative_area_description(area)
suggested_date = get_suggested_date()
# Fetch "Golden Records" from Matrix
matrix = get_matrix_context(ce_vertical, role)
# Prompt engineering for the "irresistible e-mail"
prompt = f"""
Du bist ein Senior Sales Executive bei Robo-Planet. Antworte auf eine Anfrage von Tradingtwins.
Schreibe eine E-Mail auf "Human Expert Level".
WICHTIGE IDENTITÄT:
- Anrede-Form: {salutation} (z.B. Herr, Frau)
- Name: {contact_name}
- Firma: {company_name}
STRATEGIE:
- STARTE DIREKT mit dem strategischen Aufhänger aus dem Company Explorer ({ce_opener}). Baue daraus den ersten Absatz.
- KEIN "mit großem Interesse verfolge ich..." oder ähnliche Phrasen. Das wirkt unnatürlich.
- Deine Mail reagiert auf die Anfrage zu: {purpose} für {qualitative_area}.
- Fasse die vorgeschlagene Lösung ({solution['solution_text']}) KOMPAKT zusammen. Wir bieten ein ganzheitliches Entlastungskonzept an, keine Detail-Auflistung von Datenblättern.
KONTEXT:
- Branche: {ce_vertical}
- Pains aus Matrix: {matrix['industry_pains']}
- Dossier/Wissen: {ce_summary}
- Strategischer Aufhänger (CE-Opener): {ce_opener}
AUFGABE:
1. ANREDE: Persönlich.
2. EINSTIEG: Nutze den inhaltlichen Kern von: "{ce_opener}".
3. DER ÜBERGANG: Verknüpfe dies mit der Anfrage zu {purpose}. Erkläre, dass manuelle Prozesse bei {qualitative_area} angesichts der Dokumentationspflichten und des Fachkräftemangels zum Risiko werden.
4. DIE LÖSUNG: Schlage die Kombination aus {solution['solution_text']} als integriertes Konzept vor, um das Team in Reinigung, Service und Patientenansprache spürbar zu entlasten.
5. ROI: Sprich kurz die Amortisation (18-24 Monate) an als Argument für den wirtschaftlichen Entscheider.
6. CTA: Schlag konkret den {suggested_date} vor. Alternativ: {booking_link}
STIL: Senior, lösungsorientiert, direkt. Keine unnötigen Füllwörter.
FORMAT:
Betreff: [Prägnant, z.B. Automatisierungskonzept für {company_name}]
[E-Mail Text]
"""
# Call Gemini API
url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key={api_key}"
headers = {'Content-Type': 'application/json'}
payload = {"contents": [{"parts": [{"text": prompt}]}]}
try:
response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
result = response.json()
return result['candidates'][0]['content']['parts'][0]['text']
except Exception as e:
return f"Error generating draft: {str(e)}"
if __name__ == "__main__":
# Test Mock
mock_lead = {
"company_name": "Klinikum Test",
"contact_name": "Dr. Müller",
"lead_metadata": json.dumps({"area": "5000 qm", "purpose": "Desinfektion und Boden", "city": "Berlin"})
}
mock_company = {
"vertical": "Healthcare / Krankenhaus",
"summary": "Ein großes Klinikum der Maximalversorgung mit Fokus auf Kardiologie."
}
print(generate_email_draft(mock_lead, mock_company))


@@ -1,4 +1,5 @@
import re
from datetime import datetime
from db import insert_lead
def parse_tradingtwins_email(body):
@@ -28,6 +29,108 @@ def parse_tradingtwins_email(body):
data['raw_body'] = body
return data
def is_free_mail(email_addr):
"""Checks if an email belongs to a known free-mail provider."""
if not email_addr: return False
free_domains = {
'gmail.com', 'googlemail.com', 'outlook.com', 'hotmail.com', 'live.com',
'msn.com', 'icloud.com', 'me.com', 'mac.com', 'yahoo.com', 'ymail.com',
'rocketmail.com', 'gmx.de', 'gmx.net', 'web.de', 't-online.de',
'freenet.de', 'mail.com', 'protonmail.com', 'proton.me', 'online.de'
}
domain = email_addr.split('@')[-1].lower()
return domain in free_domains
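Only the domain part matters for the check above; a minimal reproduction with a shortened domain set (the name `check_free_mail` is illustrative):

```python
FREE_DOMAINS = {"gmail.com", "gmx.de", "web.de", "t-online.de", "outlook.com"}

def check_free_mail(email_addr: str) -> bool:
    # Case-insensitive domain comparison; empty input counts as not free
    if not email_addr:
        return False
    return email_addr.split('@')[-1].lower() in FREE_DOMAINS

print(check_free_mail("Max.Mustermann@GMX.de"))  # -> True
print(check_free_mail("info@robo-planet.de"))    # -> False
```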
def parse_tradingtwins_html(html_body):
"""
Extracts data from the Tradingtwins HTML table structure.
Pattern: <p ...>Label:</p>...<p ...>Value</p>
"""
data = {}
# Map label names in HTML to our keys
field_map = {
'Firma': 'company',
'Vorname': 'contact_first',
'Nachname': 'contact_last',
'Anrede': 'salutation',
'E-Mail': 'email',
'Rufnummer': 'phone',
'Einsatzzweck': 'purpose',
'Reinigungs-Funktionen': 'cleaning_functions',
'Reinigungs-Fläche': 'area',
'PLZ': 'zip',
'Stadt': 'city',
'Lead-ID': 'source_id'
}
for label, key in field_map.items():
pattern = fr'>\s*{re.escape(label)}:\s*</p>.*?<p[^>]*>(.*?)</p>'
match = re.search(pattern, html_body, re.DOTALL | re.IGNORECASE)
if match:
raw_val = match.group(1).strip()
clean_val = re.sub(r'<[^>]+>', '', raw_val).strip()
data[key] = clean_val
# Composite fields
if data.get('contact_first') and data.get('contact_last'):
data['contact'] = f"{data['contact_first']} {data['contact_last']}"
# Quality Check: Free mail or missing company
email = data.get('email', '')
company = data.get('company', '-')
data['is_free_mail'] = is_free_mail(email)
data['is_low_quality'] = data['is_free_mail'] or company == '-' or not company
# Ensure source_id is present and map to 'id' for db.py compatibility
if not data.get('source_id'):
data['source_id'] = f"tt_unknown_{int(datetime.now().timestamp())}"
data['id'] = data['source_id'] # db.py expects 'id' for source_id column
return data
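The core of the parser is the label-then-value regex; a compact demonstration against a synthetic snippet (the HTML structure is assumed from the docstring, not a captured production e-mail):

```python
import re

snippet = '<p class="label">Firma:</p><td><p class="value">Acme Robotics GmbH</p></td>'

# Same pattern shape as in parse_tradingtwins_html: the label paragraph, then the next <p> holds the value
pattern = r'>\s*Firma:\s*</p>.*?<p[^>]*>(.*?)</p>'
match = re.search(pattern, snippet, re.DOTALL | re.IGNORECASE)
# Strip any leftover tags from the captured value, as the parser does
value = re.sub(r'<[^>]+>', '', match.group(1)).strip() if match else None
print(value)  # -> Acme Robotics GmbH
```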
def parse_roboplanet_form(html_body):
"""
Parses the Roboplanet website contact form (HTML format).
Example: <b>Vorname:</b> Gordana <br><b>Nachname:</b> Dumitrovic <br>...
"""
data = {}
# Map label names in HTML to our keys
field_map = {
'Vorname': 'contact_first',
'Nachname': 'contact_last',
'Email': 'email',
'Telefon': 'phone',
'Firma': 'company',
'PLZ': 'zip',
'Nachricht': 'message'
}
for label, key in field_map.items():
# Pattern: <b>Label:</b> Value <br>
pattern = fr'<b>{re.escape(label)}:</b>\s*(.*?)\s*<br>'
match = re.search(pattern, html_body, re.DOTALL | re.IGNORECASE)
if match:
raw_val = match.group(1).strip()
clean_val = re.sub(r'<[^>]+>', '', raw_val).strip() # Clean any leftover HTML tags
data[key] = clean_val
# Composite fields
if data.get('contact_first') and data.get('contact_last'):
data['contact'] = f"{data['contact_first']} {data['contact_last']}"
# For Roboplanet forms, we use the timestamp as ID or a hash if missing
# We need to ensure 'id' is present for db.py compatibility
if not data.get('source_id'):
data['source_id'] = f"rp_unknown_{int(datetime.now().timestamp())}"
data['id'] = data['source_id']
data['raw_body'] = html_body
return data
def ingest_mock_leads():
# Mock data from the session context
leads = [

lead-engine/lookup_role.py

@@ -0,0 +1,129 @@
import os
import requests
import re
from dotenv import load_dotenv
# Try loading .env only if file exists (Local Dev), otherwise rely on Docker Env
env_path = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '.env'))
if os.path.exists(env_path):
load_dotenv(dotenv_path=env_path, override=True)
SERP_API_KEY = os.getenv("SERP_API")
if not SERP_API_KEY:
print("DEBUG: SERP_API not found in environment.")
import json
# --- Helper: Get Gemini Key ---
def get_gemini_key():
candidates = [
"gemini_api_key.txt", # Current dir
"/app/gemini_api_key.txt", # Docker default
os.path.join(os.path.dirname(__file__), "gemini_api_key.txt"), # Script dir
os.path.join(os.path.dirname(os.path.dirname(__file__)), 'gemini_api_key.txt') # Parent dir
]
for path in candidates:
if os.path.exists(path):
try:
with open(path, 'r') as f:
return f.read().strip()
except OSError:
pass
return os.getenv("GEMINI_API_KEY")
def extract_role_with_llm(name, company, search_results):
"""Uses Gemini to identify the job title from search snippets."""
api_key = get_gemini_key()
if not api_key: return None
context = "\n".join([f"- {r.get('title')}: {r.get('snippet')}" for r in search_results])
prompt = f"""
Analyze these Google Search results to identify the professional role of "{name}" at "{company}".
SEARCH RESULTS:
{context}
TASK:
Extract the professional Job Title / Role.
Look for:
- Management: "Geschäftsführer", "Vorstand", "CFO", "Mitglied der Klinikleitung"
- Department Heads: "Leiter", "Bereichsleitung", "Head of", "Pflegedienstleitung"
- Specialized: "Arzt", "Ingenieur", "Einkäufer"
RULES:
1. Extract the most specific and senior current role.
2. Return ONLY the role string (e.g. "Bereichsleitung Patientenmanagement").
3. Maximum length: 60 characters.
4. If no role is found, return "Unbekannt".
"""
url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key={api_key}"
try:
response = requests.post(url, headers={'Content-Type': 'application/json'}, json={"contents": [{"parts": [{"text": prompt}]}]})
if response.status_code == 200:
role = response.json()['candidates'][0]['content']['parts'][0]['text'].strip()
# Remove markdown formatting if any
role = role.replace('**', '').replace('"', '').rstrip('.')
return None if "Unbekannt" in role else role
else:
print(f"DEBUG: Gemini API Error {response.status_code}: {response.text}")
except Exception as e:
print(f"DEBUG: Gemini API Exception: {e}")
return None
def lookup_person_role(name, company):
"""
Searches for a person's role via SerpAPI and extracts it using LLM.
Uses a multi-step search strategy to find the best snippets.
"""
if not SERP_API_KEY:
print("Error: SERP_API key not found in .env")
return None
# Step 1: Highly specific search
queries = [
f'site:linkedin.com "{name}" "{company}"',
f'"{name}" "{company}" position',
f'{name} {company}'
]
all_results = []
for query in queries:
params = {
"engine": "google",
"q": query,
"api_key": SERP_API_KEY,
"num": 3,
"hl": "de",
"gl": "de"
}
try:
response = requests.get("https://serpapi.com/search", params=params)
response.raise_for_status()
data = response.json()
results = data.get("organic_results", [])
if results:
all_results.extend(results)
# If we have good results, we don't necessarily need more searches
if len(all_results) >= 3:
break
except Exception as e:
print(f"SerpAPI lookup failed for query '{query}': {e}")
if not all_results:
return None
# Delegate extraction to LLM with the best results found
return extract_role_with_llm(name, company, all_results)
if __name__ == "__main__":
# Test cases
print(f"Markus Drees: {lookup_person_role('Markus Drees', 'Ärztehaus Rünthe')}")
print(f"Georg Stahl: {lookup_person_role('Georg Stahl', 'Klemm Bohrtechnik GmbH')}")
print(f"Steve Trüby: {lookup_person_role('Steve Trüby', 'RehaKlinikum Bad Säckingen GmbH')}")

lead-engine/monitor.py

@@ -0,0 +1,83 @@
import time
import json
import logging
import os
import sys
# Path setup to import local modules
sys.path.append(os.path.dirname(__file__))
from db import get_leads
from enrich import refresh_ce_data
# Import our new Trading Twins Orchestrator
try:
from trading_twins.orchestrator import TradingTwinsOrchestrator
except ImportError:
# Fallback for dev environment or missing dependencies
TradingTwinsOrchestrator = None
# Setup logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger("lead-monitor")
def run_monitor():
logger.info("Starting Lead Monitor (Polling CE for updates)...")
# Initialize Orchestrator once
orchestrator = TradingTwinsOrchestrator() if TradingTwinsOrchestrator else None
while True:
try:
leads = get_leads()
# Filter leads that are synced but missing analysis data
pending_leads = []
for lead in leads:
if lead['status'] == 'synced':
enrichment = json.loads(lead['enrichment_data']) if lead['enrichment_data'] else {}
ce_data = enrichment.get('ce_data', {})
ce_id = enrichment.get('ce_id')
# If we have a CE ID but no vertical/summary yet, it's pending
vertical = ce_data.get('industry_ai') or ce_data.get('vertical')
if ce_id and (not vertical or vertical == 'None'):
pending_leads.append(lead)
if pending_leads:
logger.info(f"Checking {len(pending_leads)} pending leads for analysis updates...")
for lead in pending_leads:
enrichment = json.loads(lead['enrichment_data'])
ce_id = enrichment['ce_id']
logger.info(f" -> Refreshing Lead {lead['id']} ({lead['company_name']})...")
new_data = refresh_ce_data(lead['id'], ce_id)
new_vertical = new_data.get('industry_ai') or new_data.get('vertical')
if new_vertical and new_vertical != 'None':
logger.info(f" [SUCCESS] Analysis finished for {lead['company_name']}: {new_vertical}")
# Trigger Trading Twins Process
if orchestrator:
logger.info(f" [ACTION] Triggering Trading Twins Orchestrator for {lead['company_name']}...")
try:
# Extract contact details safely
email = lead.get('email')
name = lead.get('contact_name', 'Interessent')
company = lead.get('company_name', 'Ihre Firma')
if email:
orchestrator.process_lead(email, name, company)
else:
logger.warning(f" [SKIP] No email address found for lead {lead['id']}")
except Exception as e:
logger.error(f" [ERROR] Failed to trigger orchestrator: {e}")
else:
logger.warning(" [SKIP] Orchestrator not available (Import Error)")
except Exception as e:
logger.error(f"Monitor error: {e}")
# Wait before next check
time.sleep(30) # Poll every 30 seconds
if __name__ == "__main__":
run_monitor()


@@ -0,0 +1,7 @@
streamlit
pandas
requests
python-dotenv
fastapi
uvicorn[standard]
msal


@@ -0,0 +1,58 @@
from flask import Flask, request, jsonify, render_template_string
from .manager import TradingTwinsManager
app = Flask(__name__)
manager = TradingTwinsManager()
# Simple HTML template for feedback pages
HTML_TEMPLATE = """
<!DOCTYPE html>
<html>
<head>
<title>Trading Twins Status</title>
<style>
body { font-family: sans-serif; text-align: center; padding: 50px; }
.success { color: green; font-size: 24px; }
.cancelled { color: red; font-size: 24px; }
.info { color: gray; margin-top: 20px; }
</style>
</head>
<body>
<h1>{{ title }}</h1>
<p class="{{ status_class }}">{{ message }}</p>
<p class="info">Job ID: {{ job_uuid }}</p>
</body>
</html>
"""
@app.route('/action/approve/<job_uuid>', methods=['GET'])
def approve_job(job_uuid):
current_status = manager.get_job_status(job_uuid)
if not current_status:
return render_template_string(HTML_TEMPLATE, title="Fehler", status_class="cancelled", message="Job nicht gefunden.", job_uuid=job_uuid), 404
if current_status == 'pending':
manager.update_job_status(job_uuid, 'approved')
# TODO: This is where the e-mail dispatch would be triggered immediately (Phase 3)
return render_template_string(HTML_TEMPLATE, title="Erfolg", status_class="success", message="✅ E-Mail wird jetzt versendet!", job_uuid=job_uuid)
elif current_status == 'approved':
return render_template_string(HTML_TEMPLATE, title="Info", status_class="success", message="⚠️ Job wurde bereits genehmigt.", job_uuid=job_uuid)
else:
return render_template_string(HTML_TEMPLATE, title="Info", status_class="cancelled", message=f"Job-Status ist bereits: {current_status}", job_uuid=job_uuid)
@app.route('/action/cancel/<job_uuid>', methods=['GET'])
def cancel_job(job_uuid):
current_status = manager.get_job_status(job_uuid)
if not current_status:
return render_template_string(HTML_TEMPLATE, title="Fehler", status_class="cancelled", message="Job nicht gefunden.", job_uuid=job_uuid), 404
if current_status == 'pending':
manager.update_job_status(job_uuid, 'cancelled')
return render_template_string(HTML_TEMPLATE, title="Abbruch", status_class="cancelled", message="❌ E-Mail-Versand gestoppt.", job_uuid=job_uuid)
else:
return render_template_string(HTML_TEMPLATE, title="Info", status_class="info", message=f"Job konnte nicht gestoppt werden (Status: {current_status}).", job_uuid=job_uuid)
if __name__ == '__main__':
app.run(host='0.0.0.0', port=8004)

View File

@@ -0,0 +1,85 @@
import os
import msal
import requests
import base64
# Graph API configuration (loaded from .env)
CLIENT_ID = os.getenv("AZURE_CLIENT_ID")
CLIENT_SECRET = os.getenv("AZURE_CLIENT_SECRET")
TENANT_ID = os.getenv("AZURE_TENANT_ID")
AUTHORITY = f"https://login.microsoftonline.com/{TENANT_ID}"
SCOPE = ["https://graph.microsoft.com/.default"]
SENDER_EMAIL = "info@robo-planet.de"
def get_access_token():
"""Fetches a token for the Graph API."""
app = msal.ConfidentialClientApplication(
CLIENT_ID, authority=AUTHORITY, client_credential=CLIENT_SECRET
)
result = app.acquire_token_for_client(scopes=SCOPE)
if "access_token" in result:
return result["access_token"]
else:
raise Exception(f"Error acquiring token: {result.get('error_description')}")
def send_email_via_graph(to_email, subject, body_html, banner_path=None):
"""
Sends an e-mail via the Microsoft Graph API.
"""
token = get_access_token()
headers = {
"Authorization": f"Bearer {token}",
"Content-Type": "application/json"
}
# E-mail structure for the Graph API
email_msg = {
"message": {
"subject": subject,
"body": {
"contentType": "HTML",
"content": body_html
},
"toRecipients": [
{
"emailAddress": {
"address": to_email
}
}
],
"from": {
"emailAddress": {
"address": SENDER_EMAIL
}
}
},
"saveToSentItems": "true"
}
# Optional: embed the banner image as an inline attachment
if banner_path and os.path.exists(banner_path):
with open(banner_path, "rb") as f:
content_bytes = f.read()
content_b64 = base64.b64encode(content_bytes).decode("utf-8")
email_msg["message"]["attachments"] = [
{
"@odata.type": "#microsoft.graph.fileAttachment",
"name": "banner.png",
"contentBytes": content_b64,
"isInline": True,
"contentId": "banner_image"
}
]
endpoint = f"https://graph.microsoft.com/v1.0/users/{SENDER_EMAIL}/sendMail"
response = requests.post(endpoint, headers=headers, json=email_msg)
if response.status_code == 202:
print(f"E-mail successfully sent to {to_email}.")
return True
else:
print(f"Send failed: {response.status_code} - {response.text}")
return False


@@ -0,0 +1,318 @@
# lead-engine/trading_twins/manager.py
from email.mime.text import MIMEText
import base64
import requests
import json
import os
import time
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo
from threading import Thread, Lock
import uvicorn
from fastapi import FastAPI, Response
import msal
# --- Time zone configuration ---
TZ_BERLIN = ZoneInfo("Europe/Berlin")
# --- Configuration ---
TEAMS_WEBHOOK_URL = os.getenv("TEAMS_WEBHOOK_URL", "https://wacklergroup.webhook.office.com/webhookb2/fe728cde-790c-4190-b1d3-be393ca0f9bd@6d85a9ef-3878-420b-8f43-38d6cb12b665/IncomingWebhook/e9a8ee6157594a6cab96048cf2ea2232/d26033cd-a81f-41a6-8cd2-b4a3ba0b5a01/V2WFmjcbkMzSU4f6lDSdUOM9VNm7F7n1Th4YDiu3fLZ_Y1")
# Public URL for feedback links
FEEDBACK_SERVER_BASE_URL = os.getenv("FEEDBACK_SERVER_BASE_URL", "https://floke-ai.duckdns.org/feedback")
DEFAULT_WAIT_MINUTES = 5
SENDER_EMAIL = os.getenv("SENDER_EMAIL", "info@robo-planet.de")
TEST_RECEIVER_EMAIL = "floke.com@gmail.com" # For E2E tests
SIGNATURE_FILE_PATH = "/app/trading_twins/signature.html"
# Credentials for the main app (e-mail & calendar, info@)
AZURE_CLIENT_ID = os.getenv("INFO_Application_ID")
AZURE_CLIENT_SECRET = os.getenv("INFO_Secret")
AZURE_TENANT_ID = os.getenv("INFO_Tenant_ID")
# Credentials for the calendar read-only app (e.melcer)
CAL_APPID = os.getenv("CAL_APPID")
CAL_SECRET = os.getenv("CAL_SECRET")
CAL_TENNANT_ID = os.getenv("CAL_TENNANT_ID")
GRAPH_API_ENDPOINT = "https://graph.microsoft.com/v1.0"
# --- In-memory storage ---
# We store details for each request here so we can react when a slot link is clicked.
request_status_storage = {}
_lock = Lock()
# --- Auth Helper ---
def get_access_token(client_id, client_secret, tenant_id):
if not all([client_id, client_secret, tenant_id]):
return None
authority = f"https://login.microsoftonline.com/{tenant_id}"
app = msal.ConfidentialClientApplication(client_id=client_id, authority=authority, client_credential=client_secret)
result = app.acquire_token_silent(["https://graph.microsoft.com/.default"], account=None)
if not result:
result = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])
return result.get('access_token')
# --- CALENDAR LOGIC ---
def get_availability(target_email: str, app_creds: tuple) -> tuple:
"""Fetches the availability for a mailbox via the given app registration."""
token = get_access_token(*app_creds)
if not token: return None
headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json", "Prefer": 'outlook.timezone="Europe/Berlin"'}
# Base: today at 00:00
start_time = datetime.now(TZ_BERLIN).replace(hour=0, minute=0, second=0, microsecond=0)
end_time = start_time + timedelta(days=3) # 3-day preview
payload = {
"schedules": [target_email],
"startTime": {"dateTime": start_time.strftime("%Y-%m-%dT%H:%M:%S"), "timeZone": "Europe/Berlin"},
"endTime": {"dateTime": end_time.strftime("%Y-%m-%dT%H:%M:%S"), "timeZone": "Europe/Berlin"},
"availabilityViewInterval": 60
}
try:
response = requests.post(f"{GRAPH_API_ENDPOINT}/users/{target_email}/calendar/getSchedule", headers=headers, json=payload)
if response.status_code == 200:
view = response.json()['value'][0].get('availabilityView', '')
# start_time is needed for the offset calculation in find_slots
return start_time, view, 60
except requests.RequestException: pass
return None
def round_to_next_quarter_hour(dt: datetime) -> datetime:
"""Rounds a time up to the next full quarter hour."""
minutes = (dt.minute // 15 + 1) * 15
rounded = dt.replace(minute=0, second=0, microsecond=0) + timedelta(minutes=minutes)
return rounded
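Note that the rounding always advances, even when the input already sits on a quarter-hour boundary; a quick standalone check of the arithmetic (the name `round_up_quarter` is illustrative):

```python
from datetime import datetime, timedelta

def round_up_quarter(dt: datetime) -> datetime:
    # (minute // 15 + 1) * 15 always advances, even on an exact quarter hour
    minutes = (dt.minute // 15 + 1) * 15
    return dt.replace(minute=0, second=0, microsecond=0) + timedelta(minutes=minutes)

print(round_up_quarter(datetime(2026, 3, 7, 10, 7)))   # -> 2026-03-07 10:15:00
print(round_up_quarter(datetime(2026, 3, 7, 10, 50)))  # -> 2026-03-07 11:00:00
print(round_up_quarter(datetime(2026, 3, 7, 10, 0)))   # -> 2026-03-07 10:15:00
```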
def find_slots(start_time_base: datetime, view: str, interval: int) -> list:
"""
Finds two suitable slots based on the availability view.
start_time_base: the start of the availabilityView (usually today at 00:00)
"""
suggestions = []
now = datetime.now(TZ_BERLIN)
# Earliest possible slot: now + 15 min buffer, rounded up to the quarter hour
earliest_possible = round_to_next_quarter_hour(now + timedelta(minutes=15))
def is_slot_free(dt: datetime):
"""Checks whether the 60-minute block containing this time is free."""
# Compute the index into the availability view
offset = dt - start_time_base
hours_offset = int(offset.total_seconds() // 3600)
if 0 <= hours_offset < len(view):
return view[hours_offset] == '0' # '0' means free
return False
# 1. Slot 1: next available free slot
current_search = earliest_possible
while len(suggestions) < 1 and (current_search - now).days < 3:
# Weekdays only (Mon-Fri), between 09:00 and 17:00
if current_search.weekday() < 5 and 9 <= current_search.hour < 17:
if is_slot_free(current_search):
suggestions.append(current_search)
break
# Advance the search
current_search += timedelta(minutes=15)
# Once we reach 17:00, jump to the next day at 09:00
if current_search.hour >= 17:
current_search += timedelta(days=1)
current_search = current_search.replace(hour=9, minute=0)
if not suggestions:
return []
first_slot = suggestions[0]
# 2. Slot 2: alternative (afternoon or the following day)
# Target: 2-3 hours later
target_slot_2 = first_slot + timedelta(hours=2.5)
target_slot_2 = round_to_next_quarter_hour(target_slot_2)
# Search start for slot 2
current_search = target_slot_2
while len(suggestions) < 2 and (current_search - now).days < 4:
# Criteria for slot 2:
# - must be free
# - must be a weekday
# - preferably afternoon (13:00 - 16:30), unless we are already on the next day, then from 09:00
is_working_hours = 9 <= current_search.hour < 17
is_afternoon = 13 <= current_search.hour < 17
is_next_day = current_search.date() > first_slot.date()
# We take the slot if:
# a) it is in the afternoon of the same day
# b) OR it is at a reasonable time on the following day (in case today is already too late)
valid_time = (current_search.date() == first_slot.date() and is_afternoon) or (is_next_day and is_working_hours)
if current_search.weekday() < 5 and valid_time:
if is_slot_free(current_search):
suggestions.append(current_search)
break
current_search += timedelta(minutes=15)
if current_search.hour >= 17:
current_search += timedelta(days=1)
current_search = current_search.replace(hour=9, minute=0)
return suggestions
def create_calendar_invite(lead_email: str, company_name: str, start_time: datetime):
"""Sends a real Outlook calendar invite from the info@ calendar."""
# We create the event on info@ (SENDER_EMAIL), since that is where we should have write access.
target_organizer = SENDER_EMAIL
print(f"INFO: Creating calendar invite for {lead_email} in {target_organizer}'s calendar")
token = get_access_token(AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID)
if not token: return False
headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
end_time = start_time + timedelta(minutes=15)
event_payload = {
"subject": f"Kennenlerngespräch RoboPlanet <> {company_name}",
"body": {"contentType": "HTML", "content": f"Hallo, <br><br>vielen Dank für die Terminbuchung über unsere Lead-Engine. Wir freuen uns auf das Gespräch!<br><br>Beste Grüße,<br>RoboPlanet Team"},
"start": {"dateTime": start_time.strftime("%Y-%m-%dT%H:%M:%S"), "timeZone": "Europe/Berlin"},
"end": {"dateTime": end_time.strftime("%Y-%m-%dT%H:%M:%S"), "timeZone": "Europe/Berlin"},
"location": {"displayName": "Microsoft Teams Meeting"},
"attendees": [
{"emailAddress": {"address": lead_email, "name": "Interessent"}, "type": "required"},
{"emailAddress": {"address": "e.melcer@robo-planet.de", "name": "Elizabeta Melcer"}, "type": "required"}
],
"isOnlineMeeting": True,
"onlineMeetingProvider": "teamsForBusiness"
}
# The URL points at the info@ calendar
url = f"{GRAPH_API_ENDPOINT}/users/{target_organizer}/calendar/events"
try:
resp = requests.post(url, headers=headers, json=event_payload, timeout=15)
if resp.status_code in [200, 201]:
print(f"SUCCESS: Calendar event created for {target_organizer}.")
return True
else:
print(f"ERROR: Failed to create event. HTTP {resp.status_code}: {resp.text}")
return False
except Exception as e:
print(f"EXCEPTION during event creation: {e}")
return False
# --- E-MAIL & WEB LOGIC ---
def generate_booking_html(request_id: str, suggestions: list) -> str:
html = "<p>Bitte wählen Sie einen passenden Termin für ein 15-minütiges Kennenlerngespräch:</p><ul>"
for slot in suggestions:
ts = int(slot.timestamp())
# Link to our own confirmation endpoint
link = f"{FEEDBACK_SERVER_BASE_URL}/book_slot/{request_id}/{ts}"
html += f'<li><a href="{link}" style="font-weight: bold; color: #0078d4;">{slot.strftime("%d.%m. um %H:%M Uhr")}</a></li>'
html += "</ul><p>Mit Klick auf einen Termin wird automatisch eine Kalendereinladung an Sie versendet.</p>"
return html
# --- Server & API ---
app = FastAPI()
@app.get("/stop/{request_id}")
async def stop(request_id: str):
with _lock:
if request_id in request_status_storage:
request_status_storage[request_id]["status"] = "cancelled"
return Response("<html><body><h1>Versand gestoppt.</h1></body></html>", media_type="text/html")
return Response("Ungültig.", status_code=404)
@app.get("/send_now/{request_id}")
async def send_now(request_id: str):
with _lock:
if request_id in request_status_storage:
request_status_storage[request_id]["status"] = "send_now"
return Response("<html><body><h1>E-Mail wird sofort versendet.</h1></body></html>", media_type="text/html")
return Response("Ungültig.", status_code=404)
@app.get("/book_slot/{request_id}/{ts}")
async def book_slot(request_id: str, ts: int):
slot_time = datetime.fromtimestamp(ts, tz=TZ_BERLIN)
with _lock:
data = request_status_storage.get(request_id)
if not data: return Response("Anfrage nicht gefunden.", status_code=404)
if data.get("booked"): return Response("<html><body><h1>Termin wurde bereits bestätigt.</h1></body></html>", media_type="text/html")
data["booked"] = True
# Send the invitation
success = create_calendar_invite(data['receiver'], data['company'], slot_time)
if success:
return Response(f"<html><body><h1>Vielen Dank!</h1><p>Die Einladung für den <b>{slot_time.strftime('%d.%m. um %H:%M')}</b> wurde an {data['receiver']} versendet.</p></body></html>", media_type="text/html")
return Response("Fehler beim Erstellen des Termins.", status_code=500)
# --- Main Workflow ---
def send_email(subject, body, to_email, signature):
token = get_access_token(AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID)
if not token:
print(f"ERROR: Could not acquire Graph token; e-mail to {to_email} not sent.")
return
headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
payload = {"message": {"subject": subject, "body": {"contentType": "HTML", "content": body + signature}, "toRecipients": [{"emailAddress": {"address": to_email}}]}, "saveToSentItems": "true"}
requests.post(f"{GRAPH_API_ENDPOINT}/users/{SENDER_EMAIL}/sendMail", headers=headers, json=payload, timeout=15)
def process_lead(request_id: str, company: str, opener: str, receiver: str):
# 1. Find free slots (checked against e.melcer AND info)
print(f"INFO: Searching slots for {company}...")
# We use e.melcer's calendar as the time reference here
cal_data = get_availability("e.melcer@robo-planet.de", (CAL_APPID, CAL_SECRET, CAL_TENNANT_ID))
suggestions = find_slots(*cal_data) if cal_data else []
with _lock:
request_status_storage[request_id] = {"status": "pending", "company": company, "receiver": receiver, "slots": suggestions}
# 2. Teams Notification
send_time = datetime.now(TZ_BERLIN) + timedelta(minutes=DEFAULT_WAIT_MINUTES)
card = {
"type": "message", "attachments": [{"contentType": "application/vnd.microsoft.card.adaptive", "content": {
"type": "AdaptiveCard", "version": "1.4", "body": [
{"type": "TextBlock", "text": f"🤖 E-Mail an {company} ({receiver}) geplant für {send_time.strftime('%H:%M')}", "weight": "Bolder"},
{"type": "TextBlock", "text": f"Vorgeschlagene Slots: {', '.join([s.strftime('%H:%M') for s in suggestions])}", "isSubtle": True}
],
"actions": [
{"type": "Action.OpenUrl", "title": "❌ STOP", "url": f"{FEEDBACK_SERVER_BASE_URL}/stop/{request_id}"},
{"type": "Action.OpenUrl", "title": "✅ JETZT", "url": f"{FEEDBACK_SERVER_BASE_URL}/send_now/{request_id}"}
]
}}]
}
requests.post(TEAMS_WEBHOOK_URL, json=card)
# 3. Wait
while datetime.now(TZ_BERLIN) < send_time:
with _lock:
if request_status_storage[request_id]["status"] in ["cancelled", "send_now"]:
break
time.sleep(5)
# 4. Send
with _lock:
if request_status_storage[request_id]["status"] == "cancelled": return
print(f"INFO: Sending lead email to {receiver}...")
booking_html = generate_booking_html(request_id, suggestions)
with open(SIGNATURE_FILE_PATH, 'r') as f: sig = f.read()
body = f"<p>Sehr geehrte Damen und Herren,</p><p>{opener}</p>{booking_html}"
send_email(f"Ihr Kontakt mit RoboPlanet - {company}", body, receiver, sig)
if __name__ == "__main__":
# Start the API server in the background
Thread(target=lambda: uvicorn.run(app, host="0.0.0.0", port=8004), daemon=True).start()
print("INFO: Trading Twins Feedback Server started on port 8004.")
time.sleep(2)
# Optional: trigger an E2E test lead
if os.getenv("RUN_TEST_LEAD") == "true":
print("\n--- Running E2E Test Lead ---")
process_lead(f"req_{int(time.time())}", "Testfirma GmbH", "Wir haben Ihre Anfrage erhalten.", TEST_RECEIVER_EMAIL)
print("\n[PROD] Manager is active and waiting for leads via import or API.")
try:
while True: time.sleep(1)
except KeyboardInterrupt:
print("Shutting down.")
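The booking links above round-trip each slot through a unix timestamp: `generate_booking_html` encodes it into the URL, `book_slot` decodes it back in Europe/Berlin. A minimal sketch of that round trip, assuming `TZ_BERLIN` comes from the stdlib `zoneinfo` and using a placeholder base URL:

```python
# Sketch: how generate_booking_html encodes a slot and book_slot decodes it.
# TZ_BERLIN and the base URL are stand-ins for the module-level config.
from datetime import datetime
from zoneinfo import ZoneInfo

TZ_BERLIN = ZoneInfo("Europe/Berlin")

slot = datetime(2026, 3, 9, 14, 30, tzinfo=TZ_BERLIN)
ts = int(slot.timestamp())                          # encoded into the link
link = f"https://feedback.example/book_slot/req_123/{ts}"
decoded = datetime.fromtimestamp(ts, tz=TZ_BERLIN)  # what book_slot reconstructs

print(decoded == slot)  # True: lossless for whole-second slots
```

Since slots are generated on 15-minute boundaries, truncating to whole seconds loses nothing.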


@@ -0,0 +1,45 @@
from sqlalchemy import create_engine, Column, Integer, String, DateTime, ForeignKey, Boolean
from sqlalchemy.orm import declarative_base, relationship, sessionmaker
from datetime import datetime
Base = declarative_base()
class ProposalJob(Base):
__tablename__ = 'proposal_jobs'
id = Column(Integer, primary_key=True)
job_uuid = Column(String, unique=True, nullable=False) # For the API links
customer_email = Column(String, nullable=False)
customer_name = Column(String, nullable=True)
customer_company = Column(String, nullable=True)
# Status tracking
status = Column(String, default='pending') # pending, approved, rejected, sent, failed
# Audit trail
created_at = Column(DateTime, default=datetime.now)
updated_at = Column(DateTime, default=datetime.now, onupdate=datetime.now)
approved_at = Column(DateTime, nullable=True)
# Link to the proposed slots
slots = relationship("ProposedSlot", back_populates="job")
class ProposedSlot(Base):
__tablename__ = 'proposed_slots'
id = Column(Integer, primary_key=True)
job_id = Column(Integer, ForeignKey('proposal_jobs.id'))
start_time = Column(DateTime, nullable=False)
end_time = Column(DateTime, nullable=False)
# We no longer need an 'is_blocked' flag, since we count dynamically
# how often 'start_time' has been used within the last 24 h.
job = relationship("ProposalJob", back_populates="slots")
# DB Setup Helper
def init_db(db_path='sqlite:///trading_twins/trading_twins.db'):
engine = create_engine(db_path)
Base.metadata.create_all(engine)
return sessionmaker(bind=engine)
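The dynamic-count idea in the comment above can be sketched as a standalone availability check. `ProposedSlot` here is a minimal stand-in whose column names mirror models.py, and `OVERBOOK_FACTOR = 3` is an assumption taken from the "factor 3" naming elsewhere in this commit:

```python
# Sketch: slot capacity via dynamic counting instead of an is_blocked flag.
from datetime import datetime, timedelta
from sqlalchemy import create_engine, Column, Integer, DateTime, func
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class ProposedSlot(Base):
    __tablename__ = 'proposed_slots'
    id = Column(Integer, primary_key=True)
    start_time = Column(DateTime, nullable=False)
    created_at = Column(DateTime, default=datetime.now)

OVERBOOK_FACTOR = 3  # assumption: each slot may be offered to 3 leads

def slot_is_available(session, start_time):
    # Count how often this start_time was proposed within the last 24 h.
    cutoff = datetime.now() - timedelta(hours=24)
    used = (session.query(func.count(ProposedSlot.id))
                   .filter(ProposedSlot.start_time == start_time,
                           ProposedSlot.created_at >= cutoff)
                   .scalar())
    return used < OVERBOOK_FACTOR

engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

slot = datetime(2026, 3, 9, 10, 0)
for _ in range(OVERBOOK_FACTOR):
    session.add(ProposedSlot(start_time=slot))
session.commit()

print(slot_is_available(session, slot))                         # False: offered 3 times
print(slot_is_available(session, datetime(2026, 3, 9, 11, 0)))  # True: still free
```

Counting instead of flagging keeps the table append-only, so the same slot automatically frees up again once the 24-hour window has passed.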


@@ -0,0 +1,130 @@
import time
import threading
import logging
import datetime
from .manager import TradingTwinsManager
from .teams_notification import send_approval_card
from .email_sender import send_email_via_graph
import os
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("TradingTwinsOrchestrator")
TIMEOUT_SECONDS = 300 # 5 minutes
SIGNATURE_FILE = "trading_twins/signature.html"
BANNER_IMAGE = "trading_twins/RoboPlanetBannerWebinarEinladung.png"
class TradingTwinsOrchestrator:
def __init__(self):
self.manager = TradingTwinsManager()
def process_lead(self, customer_email, customer_name, customer_company):
"""
Startet den gesamten Prozess für einen Lead.
"""
logger.info(f"Neuer Lead eingegangen: {customer_email}")
# 1. Job und Slots erstellen (mit Faktor-3 Logik)
job_uuid, slots = self.manager.create_proposal_job(
customer_email, customer_name, customer_company
)
logger.info(f"Job erstellt: {job_uuid}. Slots: {slots}")
# 2. Teams Benachrichtigung senden
# Formatieren der Uhrzeit für die Teams-Nachricht
send_time = (datetime.datetime.now() + datetime.timedelta(seconds=TIMEOUT_SECONDS)).strftime("%H:%M")
success = send_approval_card(job_uuid, customer_name, send_time)
if not success:
logger.error("Konnte Teams-Benachrichtigung nicht senden!")
# Fallback? Trotzdem Timer starten oder abbrechen?
# Wir machen weiter, da E-Mail-Versand Priorität hat.
# 3. Timer starten für automatischen Versand
timer = threading.Timer(TIMEOUT_SECONDS, self._check_timeout, args=[job_uuid])
timer.start()
return job_uuid
def _check_timeout(self, job_uuid):
"""
Wird nach Ablauf des Timers aufgerufen.
Prüft den Status und sendet ggf. automatisch.
"""
logger.info(f"Timer abgelaufen für Job {job_uuid}. Prüfe Status...")
current_status = self.manager.get_job_status(job_uuid)
if current_status == 'pending':
logger.info(f"Job {job_uuid} ist noch 'pending'. Löse automatischen Versand aus.")
self._trigger_email_send(job_uuid)
elif current_status == 'approved':
logger.info(f"Job {job_uuid} wurde bereits manuell genehmigt.")
elif current_status == 'cancelled':
logger.info(f"Job {job_uuid} wurde manuell abgebrochen.")
else:
logger.warning(f"Unbekannter Status für Job {job_uuid}: {current_status}")
def _trigger_email_send(self, job_uuid):
"""
Hier wird der tatsächliche E-Mail-Versand angestoßen.
"""
# Job Details laden
job_details = self.manager.get_job_details(job_uuid)
if not job_details:
logger.error(f"Konnte Job {job_uuid} nicht finden!")
return
# E-Mail Body zusammenbauen
try:
with open(SIGNATURE_FILE, "r") as f:
signature_html = f.read()
except FileNotFoundError:
logger.warning("Signatur-Datei nicht gefunden!")
signature_html = "<br>Viele Grüße<br>Ihr RoboPlanet Team"
# Dynamische Terminvorschläge formatieren
slots_text = ""
for slot in job_details['slots']:
# Format: "Morgen, 14:00 Uhr" oder Datum
start = slot['start']
slots_text += f"<li>{start.strftime('%d.%m.%Y um %H:%M Uhr')}</li>"
email_body = f"""
<html>
<body>
<p>Hallo {job_details['name']},</p>
<p>vielen Dank für Ihr Interesse an Trading Twins.</p>
<p>Gerne würde ich Ihnen in einem kurzen Gespräch (ca. 15-30 Min) zeigen, wie wir Sie unterstützen können.</p>
<p>Hätten Sie an einem dieser Termine Zeit?</p>
<ul>
{slots_text}
</ul>
<p>Ich freue mich auf Ihre Rückmeldung.</p>
{signature_html}
</body>
</html>
"""
# Banner check
banner_path = BANNER_IMAGE if os.path.exists(BANNER_IMAGE) else None
# Send
success = send_email_via_graph(
to_email=job_details['email'],
subject="Ihr Termin für Trading Twins",
body_html=email_body,
banner_path=banner_path
)
if success:
logger.info(f"🚀 E-mail sent for job {job_uuid}")
self.manager.update_job_status(job_uuid, 'sent')
else:
logger.error(f"❌ E-mail dispatch failed for job {job_uuid}")
self.manager.update_job_status(job_uuid, 'failed')
if __name__ == "__main__":
# Test run
orchestrator = TradingTwinsOrchestrator()
orchestrator.process_lead("test@example.com", "Max Mustermann", "Musterfirma GmbH")
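The approve/timeout race the orchestrator runs can be reduced to a few lines: a timer fires after `TIMEOUT_SECONDS` and auto-sends only if nobody changed the status in the meantime. A toy sketch with an in-memory status dict (the real code asks the manager/DB instead):

```python
# Sketch of the orchestrator's timer pattern: send-on-timeout unless cancelled.
import threading

status = {"job-1": "pending"}
sent = []

def check_timeout(job_uuid):
    # Mirrors _check_timeout: only the 'pending' branch triggers a send.
    if status[job_uuid] == "pending":
        sent.append(job_uuid)
        status[job_uuid] = "sent"

t = threading.Timer(0.2, check_timeout, args=["job-1"])
t.start()
# A human clicking STOP before the timer fires would set:
# status["job-1"] = "cancelled"
t.join()
print(status["job-1"], sent)  # sent ['job-1']
```

Note that `threading.Timer` never retries: if the process dies before the timer fires, the job stays 'pending', which is why the status lives in the database rather than in memory.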


@@ -0,0 +1,40 @@
<!DOCTYPE html>
<html lang="de">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>E-Mail Signatur</title>
</head>
<body>
<!--
NOTE:
This content is provided by the IT department.
Please paste the final HTML code here.
The image 'RoboPlanetBannerWebinarEinladung.png' must reside in the same directory.
[31988f42]
-->
<p>Freundliche Grüße</p>
<p>
<b>Elizabeta Melcer</b><br>
Inside Sales Managerin
</p>
<p>
<!-- Wackler Logo -->
<b>RoboPlanet GmbH</b><br>
Schatzbogen 39, 81829 München<br>
T: +49 89 420490-402 | M: +49 175 8334071<br>
<a href="mailto:e.melcer@robo-planet.de">e.melcer@robo-planet.de</a> | <a href="http://www.robo-planet.de">www.robo-planet.de</a>
</p>
<p>
<a href="#">LinkedIn</a> | <a href="#">Instagram</a> | <a href="#">Newsletteranmeldung</a>
</p>
<p style="font-size: smaller; color: grey;">
Sitz der Gesellschaft München | Geschäftsführung: Axel Banoth<br>
Registergericht AG München, HRB 296113 | USt.-IdNr. DE400464410<br>
<a href="#">Hinweispflichten zum Datenschutz</a>
</p>
<p>
<img src="RoboPlanetBannerWebinarEinladung.png" alt="RoboPlanet Webinar Einladung">
</p>
</body>
</html>


@@ -0,0 +1,70 @@
import requests
import os
# Default webhook (placeholder) - should live in .env
DEFAULT_WEBHOOK_URL = os.getenv("TEAMS_WEBHOOK_URL", "")
def send_approval_card(job_uuid, customer_name, time_string, webhook_url=DEFAULT_WEBHOOK_URL):
"""
Sendet eine Adaptive Card an Teams mit Approve/Deny Buttons.
"""
# Die URL unserer API (muss von außen erreichbar sein, z.B. via ngrok oder Server-IP)
api_base_url = os.getenv("API_BASE_URL", "http://localhost:8004")
card_payload = {
"type": "message",
"attachments": [
{
"contentType": "application/vnd.microsoft.card.adaptive",
"contentUrl": None,
"content": {
"$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
"type": "AdaptiveCard",
"version": "1.4",
"body": [
{
"type": "TextBlock",
"text": f"🤖 Automatisierte E-Mail an {customer_name}",
"weight": "Bolder",
"size": "Medium"
},
{
"type": "TextBlock",
"text": f"(via Trading Twins) wird um {time_string} Uhr ausgesendet.",
"isSubtle": True,
"wrap": True
},
{
"type": "TextBlock",
"text": "Wenn Du bis dahin NICHT reagierst, wird die E-Mail automatisch gesendet.",
"color": "Attention",
"wrap": True
}
],
"actions": [
{
"type": "Action.OpenUrl",
"title": "✅ JETZT Aussenden",
"url": f"{api_base_url}/action/approve/{job_uuid}"
},
{
"type": "Action.OpenUrl",
"title": "❌ STOP Aussendung",
"url": f"{api_base_url}/action/cancel/{job_uuid}"
}
]
}
}
]
}
try:
response = requests.post(webhook_url, json=card_payload, timeout=10)
response.raise_for_status()
return True
except Exception as e:
print(f"Error sending to Teams: {e}")
return False


@@ -0,0 +1,139 @@
import unittest
from unittest.mock import patch, MagicMock
import time
import os
import logging
from datetime import datetime, timedelta
# Import our modules
from trading_twins.orchestrator import TradingTwinsOrchestrator
from trading_twins.manager import TradingTwinsManager
from trading_twins.models import init_db, ProposalJob, ProposedSlot
# Reduce logging noise
logging.basicConfig(level=logging.INFO)
class TestTradingTwinsDryRun(unittest.TestCase):
@classmethod
def setUpClass(cls):
# We use a temporary test database
cls.test_db_path = 'sqlite:///trading_twins/test_dry_run.db'
if os.path.exists("trading_twins/test_dry_run.db"):
os.remove("trading_twins/test_dry_run.db")
# Initialize the manager with the test DB
cls.manager = TradingTwinsManager(db_path=cls.test_db_path)
def setUp(self):
# Create a fresh orchestrator
self.orchestrator = TradingTwinsOrchestrator()
self.orchestrator.manager = self.manager # Inject test manager
# Shorten the timer drastically for tests
self.orchestrator_timeout_patch = patch('trading_twins.orchestrator.TIMEOUT_SECONDS', 2)
self.orchestrator_timeout_patch.start()
def tearDown(self):
self.orchestrator_timeout_patch.stop()
@patch('trading_twins.orchestrator.send_email_via_graph')
@patch('trading_twins.orchestrator.send_approval_card')
def test_1_happy_path_timeout(self, mock_teams, mock_email):
"""Testet den automatischen Versand nach Timeout."""
print("\n--- TEST 1: Happy Path (Timeout -> Auto-Send) ---")
mock_teams.return_value = True
mock_email.return_value = True
# Lead verarbeiten
job_uuid = self.orchestrator.process_lead("test1@example.com", "Kunde Eins", "Firma A")
print(f"Job {job_uuid} gestartet. Warte auf Timeout (2s).")
time.sleep(3) # Warte länger als Timeout
# Prüfungen
mock_teams.assert_called_once()
mock_email.assert_called_once() # E-Mail muss versendet worden sein
status = self.manager.get_job_status(job_uuid)
self.assertEqual(status, 'sent')
print("✅ E-Mail wurde automatisch versendet.")
@patch('trading_twins.orchestrator.send_email_via_graph')
@patch('trading_twins.orchestrator.send_approval_card')
def test_2_manual_cancel(self, mock_teams, mock_email):
"""Testet den manuellen Abbruch."""
print("\n--- TEST 2: Manueller Abbruch (STOP) ---")
mock_teams.return_value = True
# Lead verarbeiten
job_uuid = self.orchestrator.process_lead("test2@example.com", "Kunde Zwei", "Firma B")
# Simuliere Klick auf "STOP" (direkter DB-Update wie API es tun würde)
print("Simuliere Klick auf 'STOP'...")
self.manager.update_job_status(job_uuid, 'cancelled')
print("Warte auf Timeout (2s).")
time.sleep(3)
# Prüfungen
mock_teams.assert_called_once()
mock_email.assert_not_called() # E-Mail darf NICHT versendet werden
status = self.manager.get_job_status(job_uuid)
self.assertEqual(status, 'cancelled')
print("✅ Abbruch erfolgreich, keine E-Mail gesendet.")
@patch('trading_twins.manager.TradingTwinsManager._mock_calendar_availability')
def test_3_overbooking_factor_3(self, mock_calendar):
"""
Testet die Faktor-3 Logik.
Wir stellen 4 Slots zur Verfügung.
Wir erzeugen 4 Leads.
Die ersten 3 sollten Slot A bekommen.
Der 4. Lead sollte Slot A NICHT mehr bekommen, sondern Slot B (oder andere).
"""
print("\n--- TEST 3: Überbuchungs-Logik (Faktor 3) ---")
# Setup Mock Calendar: Gibt immer dieselben 4 Slots zurück
tomorrow = datetime.now().date() + timedelta(days=1)
slot_a = {'start': datetime.combine(tomorrow, datetime.min.time().replace(hour=10)), 'end': datetime.combine(tomorrow, datetime.min.time().replace(hour=10, minute=45))}
slot_b = {'start': datetime.combine(tomorrow, datetime.min.time().replace(hour=11)), 'end': datetime.combine(tomorrow, datetime.min.time().replace(hour=11, minute=45))}
slot_c = {'start': datetime.combine(tomorrow, datetime.min.time().replace(hour=14)), 'end': datetime.combine(tomorrow, datetime.min.time().replace(hour=14, minute=45))}
# Mock gibt diese Liste zurück
mock_calendar.return_value = [slot_a, slot_b, slot_c]
# Wir feuern 4 Leads ab
uuids = []
for i in range(1, 5):
uuid, slots = self.manager.create_proposal_job(f"bulk{i}@test.com", f"Bulk {i}", "Bulk Corp")
uuids.append((uuid, slots))
# print(f"Lead {i} bekam Slots: {[s['start'].strftime('%H:%M') for s in slots]}")
# Analyse
# Lead 1: Bekommt A, B (A hat Count 0 -> 1)
# Lead 2: Bekommt A, B (A hat Count 1 -> 2)
# Lead 3: Bekommt A, B (A hat Count 2 -> 3 -> VOLL)
# Lead 4: Sollte A NICHT bekommen, sondern B, C
slots_lead_1 = uuids[0][1]
slots_lead_4 = uuids[3][1]
start_time_lead_1_first_slot = slots_lead_1[0]['start']
start_time_lead_4_first_slot = slots_lead_4[0]['start']
print(f"Lead 1 slot 1: {start_time_lead_1_first_slot}")
print(f"Lead 4 slot 1: {start_time_lead_4_first_slot}")
# Debug: how often was slot A handed out?
session = self.manager.Session()
count = session.query(ProposedSlot).filter(ProposedSlot.start_time == start_time_lead_1_first_slot).count()
print(f"Total entries for slot A: {count}")
session.close()
# The test must fail (not just print) if the factor-3 logic does not hold.
self.assertNotEqual(start_time_lead_1_first_slot, start_time_lead_4_first_slot, "Factor-3 logic broken: lead 4 got the same start slot as lead 1.")
print("✅ Factor-3 logic holds: lead 4 got a different start slot!")
if __name__ == '__main__':
unittest.main()


@@ -0,0 +1,50 @@
import requests
import json
import os
def send_teams_message(webhook_url, message):
"""
Sends a simple message to a Microsoft Teams channel using a webhook.
Args:
webhook_url (str): The URL of the incoming webhook.
message (str): The plain text message to send.
Returns:
bool: True if the message was sent successfully (HTTP 200), False otherwise.
"""
if not webhook_url:
print("Error: TEAMS_WEBHOOK_URL is not set.")
return False
headers = {
"Content-Type": "application/json"
}
payload = {
"text": message
}
try:
response = requests.post(webhook_url, headers=headers, data=json.dumps(payload), timeout=10)
if response.status_code == 200:
print("Message sent successfully to Teams.")
return True
else:
print(f"Failed to send message. Status code: {response.status_code}")
print(f"Response: {response.text}")
return False
except requests.exceptions.RequestException as e:
print(f"An error occurred while sending the request: {e}")
return False
if __name__ == "__main__":
# Load the webhook URL from the environment rather than hard-coding it:
# the URL embeds the channel's credentials and must not live in the repo.
webhook_url = os.getenv("TEAMS_WEBHOOK_URL", "")
test_message = "🤖 This is a test message from the Gemini Trading Twins Engine. If you see this, the webhook is working. [31988f42]"
send_teams_message(webhook_url, test_message)


@@ -0,0 +1,157 @@
import os
import sys
import re
import logging
import requests
import json
from datetime import datetime
from dotenv import load_dotenv
# Ensure we can import from root directory
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
# Import db functions and parsers
try:
from db import insert_lead, init_db
from ingest import parse_roboplanet_form, parse_tradingtwins_html, is_free_mail
except ImportError:
# Fallback for direct execution
sys.path.append(os.path.dirname(__file__))
from db import insert_lead, init_db
from ingest import parse_roboplanet_form, parse_tradingtwins_html, is_free_mail
# Configuration
env_path = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '.env'))
load_dotenv(dotenv_path=env_path, override=True)
CLIENT_ID = os.getenv("INFO_Application_ID")
TENANT_ID = os.getenv("INFO_Tenant_ID")
CLIENT_SECRET = os.getenv("INFO_Secret")
USER_EMAIL = "info@robo-planet.de"
# Setup logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
def get_access_token():
url = f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token"
data = {
"client_id": CLIENT_ID,
"scope": "https://graph.microsoft.com/.default",
"client_secret": CLIENT_SECRET,
"grant_type": "client_credentials"
}
response = requests.post(url, data=data, timeout=30)
response.raise_for_status()
return response.json().get("access_token")
def fetch_new_leads_emails(token, limit=200):
url = f"https://graph.microsoft.com/v1.0/users/{USER_EMAIL}/messages"
headers = {
"Authorization": f"Bearer {token}",
"Content-Type": "application/json"
}
params = {
"$top": limit,
"$select": "id,subject,receivedDateTime,body",
"$orderby": "receivedDateTime desc"
}
response = requests.get(url, headers=headers, params=params, timeout=30)
if response.status_code != 200:
logger.error(f"Graph API Error: {response.status_code} - {response.text}")
return []
all_msgs = response.json().get("value", [])
# Filter client-side for both TradingTwins and Roboplanet contact forms
filtered = [m for m in all_msgs if (
"Neue Anfrage zum Thema Roboter" in (m.get('subject') or '') or
"Kontaktformular Roboplanet" in (m.get('subject') or '')
)]
return filtered
def process_leads(auto_sync=False):
init_db()
new_count = 0
try:
token = get_access_token()
emails = fetch_new_leads_emails(token) # Use the new function
logger.info(f"Found {len(emails)} potential lead emails.")
for email in emails:
subject = email.get('subject') or ''
body = email.get('body', {}).get('content', '')
received_at_str = email.get('receivedDateTime')
received_at = None
if received_at_str:
try:
received_at = datetime.fromisoformat(received_at_str.replace('Z', '+00:00'))
except ValueError:
pass # keep received_at = None if the timestamp is malformed
lead_data = {}
source_prefix = "unknown"
source_display_name = "Unknown"
if "Neue Anfrage zum Thema Roboter" in subject:
lead_data = parse_tradingtwins_html(body)
source_prefix = "tt"
source_display_name = "TradingTwins"
elif "Kontaktformular Roboplanet" in subject:
lead_data = parse_roboplanet_form(body)
source_prefix = "rp"
source_display_name = "Website-Formular"
else:
# Should not happen with current filtering, but good for robustness
logger.warning(f"Skipping unknown email type: {subject}")
continue
lead_data['source'] = source_display_name # Add the new source field for the DB
lead_data['raw_body'] = body
lead_data['received_at'] = received_at
# Apply general quality checks (if not already done by parser)
if 'is_free_mail' not in lead_data:
lead_data['is_free_mail'] = is_free_mail(lead_data.get('email', ''))
if 'is_low_quality' not in lead_data:
company_name_check = lead_data.get('company', '')
# Consider company name '-' as missing/invalid
if company_name_check == '-': company_name_check = ''
lead_data['is_low_quality'] = lead_data['is_free_mail'] or not company_name_check
company_name = lead_data.get('company')
if not company_name or company_name == '-':
# Fallback: if company name is missing, use contact name as company
company_name = lead_data.get('contact')
lead_data['company'] = company_name
if not company_name:
logger.warning(f"Skipping lead due to missing company and contact name: {subject}")
continue
# Ensure source_id and 'id' for db.py compatibility
if not lead_data.get('source_id'):
lead_data['source_id'] = f"{source_prefix}_unknown_{int(datetime.now().timestamp())}"
lead_data['id'] = lead_data['source_id'] # db.py expects 'id' for source_id column
if insert_lead(lead_data):
logger.info(f" -> Ingested ({source_prefix}): {company_name}")
new_count += 1
if new_count > 0 and auto_sync:
logger.info(f"Triggering auto-sync for {new_count} new leads...")
from enrich import run_sync
run_sync()
return new_count
except Exception as e:
logger.error(f"Error in process_leads: {e}")
return 0
if __name__ == "__main__":
count = process_leads()
print(f"Ingested {count} new leads.")
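The subject-based filtering and routing in process_leads can be factored into one small pure function. A sketch, where `route_lead` is a hypothetical helper (not part of the module) that returns the same `(source_prefix, source_display_name)` pairs the worker assigns:

```python
# Hypothetical helper mirroring the subject filter/routing in process_leads.
def route_lead(subject):
    subject = subject or ''  # Graph may deliver subject as None
    if "Neue Anfrage zum Thema Roboter" in subject:
        return ("tt", "TradingTwins")
    if "Kontaktformular Roboplanet" in subject:
        return ("rp", "Website-Formular")
    return None  # anything else is ignored by the worker

print(route_lead("WG: Neue Anfrage zum Thema Roboter"))  # ('tt', 'TradingTwins')
print(route_lead("Kontaktformular Roboplanet - Neu"))    # ('rp', 'Website-Formular')
print(route_lead("Newsletter"))                          # None
```

Keeping the routing in one function would let both the client-side Graph filter and the per-email branch share a single source of truth for the subject keywords.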