feat(company-explorer): Initial Web UI & Backend with Enrichment Flow
This commit introduces the foundational elements for the new "Company Explorer" web application, marking a significant step away from the legacy Google Sheets / CLI system.

Key changes:
- Project structure: a new `company-explorer/` directory with separate backend (FastAPI) and frontend (React/Vite) components.
- Data persistence: migration from Google Sheets to a local SQLite database using SQLAlchemy.
- Core utilities: extraction and cleanup of essential helper functions (LLM wrappers, text utilities) into `backend/lib/core_utils.py`.
- Backend services: scraper, signal detector, and classifier modules for AI-powered analysis, plus deduplication logic.
- Frontend UI: basic React application with company table, import wizard, and dynamic inspector sidebar.
- Docker integration: updated build files for multi-stage builds and sideloading.
- Deployment & access: integrated into the central Nginx proxy and dashboard.

Lessons learned & fixed during development:
- Frontend asset loading: addressed issues with Vite's path configuration and FastAPI's static file serving.
- TypeScript configuration: added the missing configuration files.
- Database schema evolution: solved schema errors by forcing a fresh database file and correcting an override.
- Logging: implemented robust file-based logging.

This new foundation provides a powerful and maintainable platform for future B2B robotics lead generation.
BIN
FRITZbox7530.pdf
Normal file
Binary file not shown.
80
MIGRATION_PLAN.md
Normal file
@@ -0,0 +1,80 @@
# Migration Plan: Legacy GSheets -> Company Explorer (Robotics Edition)

**Context:** Fresh start for the **robotics & facility management** vertical.

**Goal:** Replace Google Sheets/CLI with a web app ("Company Explorer") backed by SQLite.

## 1. Strategic Realignment

| Area | Old (Legacy) | New (Robotics Edition) |
| :--- | :--- | :--- |
| **Data store** | Google Sheets | **SQLite** (local, fast, filterable). |
| **Target data** | General / customer service | **Robotics signals** (spa area? intralogistics? plant security?). |
| **Industries** | AI suggestion (free text) | **Strict mode:** mapping onto a fixed CRM list (e.g. "Hotellerie", "Maschinenbau"). |
| **Copywriting** | Pain/gain matrix (service) | **Pain/gain matrix (robotics)**. "Translating" the old knowledge to robots. |
| **Analytics** | Technician ML model | **Disabled**. Not relevant for now. |
| **Operations** | D365 sync (broken) | **Excel import & deduplication**. Focus on matching external lists against existing records. |
## 2. Architecture & Component Mapping

The system is rebuilt from scratch in `company-explorer/`. We cut the dependencies on the root-level `helpers.py`.

### A. Core Backend (`backend/`)

| Component | Task & New Logic | Prio |
| :--- | :--- | :--- |
| **Database** | Replaces `GoogleSheetHandler`. Stores companies & "enrichment blobs". | 1 |
| **Importer** | Replaces `SyncManager`. Imports Excel dumps (CRM) and event lists. | 1 |
| **Deduplicator** | Replaces `company_deduplicator.py`. **Core feature:** checks event lists against the DB. Must match "intelligently" (name + city + web). | 1 |
| **Scraper (base)** | Extracts text from websites. Foundation for all analyses. | 1 |
| **Signal Detector** | **NEW.** Analyses website text for robotics potential. <br> *Logic:* if industry = hotel & keyword = "Wellness" -> potential: cleaning robot. | 1 |
| **Classifier** | Industry classification. **Strict mode:** validates against `config/allowed_industries.json`. | 2 |
| **Marketing Engine** | Replaces `generate_marketing_text.py`. Uses the new `marketing_wissen_robotics.yaml`. | 3 |
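The Signal Detector rule from the table can be sketched as a plain keyword scan per industry. The rule set and the names `SIGNAL_RULES` / `detect_signals` below are illustrative assumptions, not the shipped module:

```python
# Illustrative sketch of the keyword-based signal detector described above.
# The rule set and function names are assumptions, not the real implementation.
SIGNAL_RULES = {
    "Hotellerie": [
        ("wellness", "has_spa", "Reinigungsroboter"),
        ("spa", "has_spa", "Reinigungsroboter"),
    ],
    "Logistik": [
        ("hochregallager", "has_large_warehouse", "Intralogistik-Roboter"),
    ],
}

def detect_signals(industry: str, website_text: str) -> list[dict]:
    """Return one signal dict per matched keyword, with a proof snippet."""
    text = website_text.lower()
    signals = []
    for keyword, signal_type, potential in SIGNAL_RULES.get(industry, []):
        pos = text.find(keyword)
        if pos != -1:
            signals.append({
                "signal_type": signal_type,
                "potential": potential,
                # proof_text: snippet around the hit, as in the planned `signals` table
                "proof_text": website_text[max(0, pos - 30):pos + 30],
            })
    return signals
```

The real module would additionally store a confidence score per hit; a keyword scan like this is only the deterministic baseline.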
### B. Frontend (`frontend/`) - React

* **View 1, the "Explorer":** data grid of all companies, filterable by robotics potential and status.
* **View 2, the "Inspector":** detail view of a single company. Shows detected signals ("has spa area") with the option of manual correction.
* **View 3, the "List Matcher":** upload an Excel list -> show duplicates -> "import new ones" button.
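The "intelligent" name + city + web matching behind the Deduplicator and the List Matcher could look roughly like the following. The helper names, the normalisation, and the 0.85 threshold are assumptions for illustration:

```python
# Illustrative sketch of name+city+website duplicate matching.
# Function names and the 0.85 threshold are assumptions.
import re
from difflib import SequenceMatcher

def _norm(text: str) -> str:
    """Lowercase and strip everything except ASCII letters/digits."""
    return re.sub(r"[^a-z0-9]", "", (text or "").lower())

def _domain(url: str) -> str:
    """Reduce a URL to its bare host for comparison."""
    host = re.sub(r"^https?://", "", (url or "").lower()).split("/")[0]
    return host.removeprefix("www.")

def is_duplicate(candidate: dict, existing: dict) -> bool:
    # An identical website domain is treated as a hard match.
    cand_domain = _domain(candidate.get("website", ""))
    if cand_domain and cand_domain == _domain(existing.get("website", "")):
        return True
    # Otherwise require a fuzzy name match plus the same city.
    name_score = SequenceMatcher(
        None, _norm(candidate["name"]), _norm(existing["name"])
    ).ratio()
    return name_score >= 0.85 and _norm(candidate.get("city", "")) == _norm(existing.get("city", ""))
```

The domain short-circuit is the cheap, high-precision path; the fuzzy name + city fallback catches entries from event lists that carry no URL.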
## 3. Handling Shared Code (`helpers.py` & Co.)

We isolate the new project completely ("fork & clean").

* **Source:** `helpers.py` (root)
* **Target:** `company-explorer/backend/lib/core_utils.py`
* **Action:** We copy only:
    * OpenAI/Gemini wrappers (retry logic).
    * Text cleaning (`clean_text`, `normalize_string`).
    * URL normalization.

* **Source:** Other Gemini apps (`duckdns`, `gtm-architect`, `market-intel`)
* **Action:** We treat these as reference material only. Useful logic (e.g. the "Grit" prompts from `market-intel`) is copied explicitly into the new service modules.
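What lands in `core_utils.py` might look like the following. Only the names `clean_text` and `normalize_string` come from the plan; the function bodies (and `normalize_url`) are plausible reconstructions, not the originals:

```python
# Plausible sketch of the text/URL helpers destined for core_utils.py.
# Function bodies are reconstructions; only the names come from the plan.
import re
from urllib.parse import urlparse

def clean_text(text: str) -> str:
    """Collapse whitespace and strip control characters from scraped text."""
    text = re.sub(r"[\x00-\x1f\x7f]", " ", text or "")
    return re.sub(r"\s+", " ", text).strip()

def normalize_string(text: str) -> str:
    """Aggressive normalisation for matching: lowercase, ASCII alphanumerics only."""
    return re.sub(r"[^a-z0-9]", "", (text or "").lower())

def normalize_url(url: str) -> str:
    """Reduce a URL to a scheme-less host, dropping 'www.'."""
    if url and "://" not in url:
        url = "http://" + url
    host = urlparse(url or "").netloc.lower()
    return host.removeprefix("www.")
```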
## 4. Data Structure (SQLite Schema)

### Table `companies` (master data)
* `id` (PK)
* `name` (String)
* `website` (String)
* `crm_id` (String, nullable - link to D365)
* `industry_crm` (String - the "allowed" industry)
* `city` (String)
* `country` (String)
* `status` (Enum: NEW, IMPORTED, ENRICHED, QUALIFIED)

### Table `signals` (robotics potential)
* `company_id` (FK)
* `signal_type` (e.g. "has_spa", "has_large_warehouse", "has_security_needs")
* `confidence` (Float)
* `proof_text` (snippet from the website)

### Table `duplicates_log`
* Stores results of list comparisons ("upload X contained 20 known companies").
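As plain SQL, the two core tables above could be sketched like this (the app itself maps them via SQLAlchemy; the concrete types, the `CHECK` constraint, and `init_db` are assumptions):

```python
# Plain-SQL sketch of the schema above; the app maps this via SQLAlchemy.
# Column names follow the plan; types and constraints are assumptions.
import sqlite3

DDL = """
CREATE TABLE companies (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    website      TEXT,
    crm_id       TEXT,                      -- nullable link to D365
    industry_crm TEXT,                      -- the "allowed" industry
    city         TEXT,
    country      TEXT,
    status       TEXT NOT NULL DEFAULT 'NEW'
        CHECK (status IN ('NEW', 'IMPORTED', 'ENRICHED', 'QUALIFIED'))
);
CREATE TABLE signals (
    company_id  INTEGER NOT NULL REFERENCES companies(id),
    signal_type TEXT NOT NULL,              -- e.g. 'has_spa'
    confidence  REAL,
    proof_text  TEXT
);
"""

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a fresh database with the sketched schema."""
    conn = sqlite3.connect(path)
    conn.executescript(DDL)
    return conn
```

Modelling the status enum as a `CHECK` constraint keeps invalid states out of the file even when rows are inserted outside the ORM.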
## 5. Implementation Phases

1. **Housekeeping:** archive the legacy code (`_legacy_gsheets_system`).
2. **Setup:** init `company-explorer` (backend + frontend skeleton).
3. **Foundation:** DB schema + "List Matcher" (deduplication is priority A for operations).
4. **Enrichment:** implement the scraper + signal detector (robotics).
5. **UI:** React interface for the data.
674
_legacy_gsheets_system/config.py
Normal file
@@ -0,0 +1,674 @@
#!/usr/bin/env python3
"""
config.py

Central configuration for the "Automated Company Evaluation" project.
Contains file paths, API key paths, the global Config class
and the column mapping for the Google Sheet.
"""

import os
import re

import logging

# ==============================================================================
# 1. GLOBAL CONSTANTS AND FILE PATHS
# ==============================================================================

# --- File paths (NEW: fixed paths for running inside Docker) ---
# In the Docker context the base directory is always /app.
BASE_DIR = "/app"

CREDENTIALS_FILE = os.path.join(BASE_DIR, "service_account.json")
API_KEY_FILE = os.path.join(BASE_DIR, "gemini_api_key.txt")
SERP_API_KEY_FILE = os.path.join(BASE_DIR, "serpapikey.txt")
GENDERIZE_API_KEY_FILE = os.path.join(BASE_DIR, "genderize_API_Key.txt")
BRANCH_MAPPING_FILE = None
LOG_DIR = os.path.join(BASE_DIR, "Log_from_docker")  # write logs into the mounted folder

# --- ML model artefacts ---
MODEL_FILE = os.path.join(BASE_DIR, "technician_decision_tree_model.pkl")
IMPUTER_FILE = os.path.join(BASE_DIR, "median_imputer.pkl")
PATTERNS_FILE_TXT = os.path.join(BASE_DIR, "technician_patterns.txt")  # old (optionally kept)
PATTERNS_FILE_JSON = os.path.join(BASE_DIR, "technician_patterns.json")  # new (recommended)

# Marker for URLs that should be re-searched via SERP
URL_CHECK_MARKER = "URL_CHECK_NEEDED"

# --- User agents for rotation ---
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 13_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:108.0) Gecko/20100101 Firefox/108.0',
    'Mozilla/5.0 (X11; Linux i686; rv:108.0) Gecko/20100101 Firefox/108.0',
    'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:108.0) Gecko/20100101 Firefox/108.0',
]

# ==============================================================================
# 2. EARLY HELPER FUNCTION (needed by the Config class)
# ==============================================================================

def normalize_for_mapping(text):
    """
    Aggressively normalizes a string for mapping purposes.
    Must be defined BEFORE the Config class because it is used there.
    """
    if not isinstance(text, str):
        return ""
    text = text.lower()
    text = text.strip()
    text = re.sub(r'[^a-z0-9]', '', text)
    return text

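A quick, self-contained demo of how aggressive this normalisation is (the function is restated from above). Note the caveat: the `[^a-z0-9]` pattern drops umlauts rather than transliterating them, so "Müller" and "Mueller" do not normalise to the same key:

```python
# Self-contained demo of normalize_for_mapping (restated from config.py above).
import re

def normalize_for_mapping(text):
    if not isinstance(text, str):
        return ""
    return re.sub(r'[^a-z0-9]', '', text.lower().strip())

# Punctuation, spaces, and case all disappear:
assert normalize_for_mapping("Gebäudetechnik Heizung / Lüftung / Klima") == "gebudetechnikheizunglftungklima"
# Caveat: umlauts are dropped, not transliterated, so these two do NOT match:
assert normalize_for_mapping("Müller") == "mller"
assert normalize_for_mapping("Mueller") == "mueller"
# Non-strings normalise to the empty string:
assert normalize_for_mapping(None) == ""
```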
# ==============================================================================
# 3. CENTRAL CONFIGURATION CLASS
# ==============================================================================

class Config:
    """Central configuration settings."""
    VERSION = "v2.0.0"  # version bumped after refactoring
    LANG = "de"  # language for Wikipedia etc.
    # ATTENTION: SHEET_URL is a placeholder. Replace it with your actual URL.
    SHEET_URL = "https://docs.google.com/spreadsheets/d/1u_gHr9JUfmV1-iviRzbSe3575QEp7KLhK5jFV_gJcgo"  # <<< REPLACE THIS!
    MAX_RETRIES = 5
    RETRY_DELAY = 10
    REQUEST_TIMEOUT = 20
    SIMILARITY_THRESHOLD = 0.65
    DEBUG = True
    WIKIPEDIA_SEARCH_RESULTS = 5
    HTML_PARSER = "html.parser"
    TOKEN_MODEL = "gpt-3.5-turbo"
    USER_AGENT = 'Mozilla/5.0 (compatible; UnternehmenSkript/1.0; +https://www.example.com/bot)'

    # --- Batching & parallelisation settings ---
    PROCESSING_BATCH_SIZE = 20
    OPENAI_BATCH_SIZE_LIMIT = 4
    MAX_SCRAPING_WORKERS = 10
    UPDATE_BATCH_ROW_LIMIT = 50
    MAX_BRANCH_WORKERS = 10
    OPENAI_CONCURRENCY_LIMIT = 3
    PROCESSING_BRANCH_BATCH_SIZE = 20
    SERPAPI_DELAY = 1.5

    # --- (NEW) GTM Architect: style guide for image generation ---
    CORPORATE_DESIGN_PROMPT = (
        "cinematic industrial photography, sleek high-tech aesthetic, futuristic but grounded reality, "
        "volumetric lighting, sharp focus on modern technology, 8k resolution, photorealistic, "
        "highly detailed textures, cool steel-blue color grading with subtle safety-yellow accents, "
        "wide angle lens, shallow depth of field."
    )

    # --- Plausibility thresholds ---
    PLAUSI_UMSATZ_MIN_WARNUNG = 50000
    PLAUSI_UMSATZ_MAX_WARNUNG = 200000000000
    PLAUSI_MA_MIN_WARNUNG_ABS = 1
    PLAUSI_MA_MIN_WARNUNG_BEI_UMSATZ = 3
    PLAUSI_UMSATZ_MIN_SCHWELLE_FUER_MA_CHECK = 1000000
    PLAUSI_MA_MAX_WARNUNG = 1000000
    PLAUSI_RATIO_UMSATZ_PRO_MA_MIN = 25000
    PLAUSI_RATIO_UMSATZ_PRO_MA_MAX = 1500000
    PLAUSI_ABWEICHUNG_CRM_WIKI_PROZENT = 30

    # --- Country code mapping ---
    # Translates D365 country codes into the long form used in the GSheet.
    # IMPORTANT: the keys (codes) should be lowercase for a robust comparison.
    COUNTRY_CODE_MAP = {
        'de': 'Deutschland',
        'gb': 'Vereinigtes Königreich',
        'ch': 'Schweiz',
        'at': 'Österreich',
        'it': 'Italien',
        'es': 'Spanien',
        'dk': 'Dänemark',
        'hu': 'Ungarn',
        'se': 'Schweden',
        'fr': 'Frankreich',
        'us': 'USA',
        'br': 'Brasilien',
        'cz': 'Tschechien',
        'au': 'Australien',
        'mx': 'Mexiko',
        'nl': 'Niederlande',
        'pl': 'Polen',
        'be': 'Belgien',
        'sk': 'Slowakei',
        'nz': 'Neuseeland',
        'in': 'Indien',
        'li': 'Liechtenstein',
        'ae': 'Vereinigte Arabische Emirate',
        'ru': 'Russland',
        'jp': 'Japan',
        'ro': 'Rumänien',
        'is': 'Island',
        'lu': 'Luxemburg',
        'me': 'Montenegro',
        'ph': 'Philippinen',
        'fi': 'Finnland',
        'no': 'Norwegen',
        'ma': 'Marokko',
        'hr': 'Kroatien',
        'ca': 'Kanada',
        'ua': 'Ukraine',
        'sb': 'Salomonen',
        'za': 'Südafrika',
        'ee': 'Estland',
        'cn': 'China',
        'si': 'Slowenien',
        'lt': 'Litauen',
    }

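The lowercase-key convention matters because D365 may deliver codes in any case. A minimal sketch of a case-insensitive lookup (subset of the real map; `resolve_country` is an illustrative helper, not part of config.py):

```python
# Minimal demo of the lowercase-key convention for COUNTRY_CODE_MAP.
# Subset of the real map; resolve_country is an illustrative helper.
COUNTRY_CODE_MAP = {'de': 'Deutschland', 'gb': 'Vereinigtes Königreich', 'us': 'USA'}

def resolve_country(code: str) -> str:
    """Look up a D365 country code case-insensitively; fall back to the raw code."""
    return COUNTRY_CODE_MAP.get((code or "").strip().lower(), code)

assert resolve_country("DE") == "Deutschland"   # D365 may deliver uppercase codes
assert resolve_country("xx") == "xx"            # unknown codes pass through unchanged
```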

    # --- Industry group mapping (v2.0 - enriched with definitions & examples) ---
    # Single source of truth for all industries.
    BRANCH_GROUP_MAPPING = {
        "Maschinenbau": {
            "gruppe": "Hersteller / Produzenten",
            "definition": "Herstellung von zumeist größeren und komplexen Maschinen. Abgrenzung: Keine Anlagen wie z.B. Aufzüge, Rolltreppen oder komplette Produktionsstraßen.",
            "beispiele": "EBM Papst, Kärcher, Winterhalter, Testo, ZwickRoell, Koch Pac, Uhlmann, BHS, Schlie, Kasto, Chiron",
            "d365_branch_detail": "Maschinenbau"
        },
        "Automobil": {
            "gruppe": "Hersteller / Produzenten",
            "definition": "Hersteller von (Spezial)-Fahrzeugen, die meist in ihrer Bewegung eingeschränkt sind (z.B. Mähdrescher, Pistenraupen). Abgrenzung: Keine Autohändler oder Service an PKWs.",
            "beispiele": "Kässbohrer, Aebi Schmidt, Pesko, Nova, PV Automotive",
            "d365_branch_detail": "Automobil"
        },
        "Anlagenbau": {
            "gruppe": "Hersteller / Produzenten",
            "definition": "Hersteller von komplexen Anlagen, die fest beim Kunden installiert werden (z.B. Fertigungsanlagen) und oft der Herstellung nachgelagerter Erzeugnisse dienen. Abgrenzung: Keine Aufzugsanlagen, keine Rolltreppen.",
            "beispiele": "Yaskawa, Good Mills, Jungheinrich, Abus, BWT",
            "d365_branch_detail": "Anlagenbau"
        },
        "Medizintechnik": {
            "gruppe": "Hersteller / Produzenten",
            "definition": "Hersteller von medizinischen Geräten für Krankenhäuser, (Zahn-)Arztpraxen oder den Privatbereich. Abgrenzung: Keine reinen Dienstleister/Pflegedienste.",
            "beispiele": "Carl Zeiss, MMM, Olympus, Sysmex, Henry Schein, Dental Bauer, Vitalaire",
            "d365_branch_detail": "Medizintechnik"
        },
        "Chemie & Pharma": {
            "gruppe": "Hersteller / Produzenten",
            "definition": "Unternehmen, die chemische oder pharmazeutische Erzeugnisse herstellen. Abgrenzung: Keine Lebensmittel.",
            "beispiele": "Brillux",
            "d365_branch_detail": "Chemie & Pharma"
        },
        "Elektrotechnik": {
            "gruppe": "Hersteller / Produzenten",
            "definition": "Hersteller von Maschinen und Geräten, die sich hauptsächlich durch elektrische Komponenten auszeichnen.",
            "beispiele": "Triathlon, SBS BatterieSystem",
            "d365_branch_detail": "Elektrotechnik"
        },
        "Lebensmittelproduktion": {
            "gruppe": "Hersteller / Produzenten",
            "definition": "Unternehmen, die Lebensmittel im industriellen Maßstab produzieren.",
            "beispiele": "Ferrero, Lohmann, Mars, Fuchs, Teekanne, Frischli",
            "d365_branch_detail": "Lebensmittelproduktion"
        },
        "IT / Telekommunikation": {
            "gruppe": "Hersteller / Produzenten",
            "definition": "Hersteller von Telekommunikations-Hardware und -Equipment. Abgrenzung: Keine Telekommunikations-Netzbetreiber.",
            "beispiele": "NDI Nordisk Daek Import Danmark",
            "d365_branch_detail": "IT / Telekommunikation"
        },
        "Bürotechnik": {
            "gruppe": "Hersteller / Produzenten",
            "definition": "Hersteller von Geräten für die Büro-Infrastruktur wie Drucker, Kopierer oder Aktenvernichter.",
            "beispiele": "Ricoh, Rosskopf",
            "d365_branch_detail": "Bürotechnik"
        },
        "Automaten (Vending / Slot)": {
            "gruppe": "Hersteller / Produzenten",
            "definition": "Reine Hersteller von Verkaufs-, Service- oder Spielautomaten, die mitunter einen eigenen Kundenservice haben.",
            "beispiele": "Coffema, Melitta, Tchibo, Selecta",
            "d365_branch_detail": "Automaten (Vending, Slot)"
        },
        "Gebäudetechnik Heizung / Lüftung / Klima": {
            "gruppe": "Hersteller / Produzenten",
            "definition": "Reine Hersteller von Heizungs-, Lüftungs- und Klimaanlagen (HLK), die mitunter einen eigenen Kundenservice haben.",
            "beispiele": "Wolf, ETA, Fröling, Ochsner, Windhager, DKA",
            "d365_branch_detail": "Gebäudetechnik Heizung, Lüftung, Klima"
        },
        "Gebäudetechnik Allgemein": {
            "gruppe": "Hersteller / Produzenten",
            "definition": "Hersteller von Produkten, die fest in Gebäuden installiert werden (z.B. Sicherheitstechnik, Türen, Sonnenschutz).",
            "beispiele": "Geze, Bothe Hild, Warema, Hagleitner",
            "d365_branch_detail": "Gebäudetechnik Allgemein"
        },
        "Schädlingsbekämpfung": {
            "gruppe": "Hersteller / Produzenten",
            "definition": "Hersteller von Systemen und Produkten zur Schädlingsbekämpfung.",
            "beispiele": "BioTec, RSD Systems",
            "d365_branch_detail": "Schädlingsbekämpfung"
        },
        "Braune & Weiße Ware": {
            "gruppe": "Hersteller / Produzenten",
            "definition": "Hersteller von Haushaltsgroßgeräten (Weiße Ware) und Unterhaltungselektronik (Braune Ware).",
            "beispiele": "BSH",
            "d365_branch_detail": "Braune & Weiße Ware"
        },
        "Fenster / Glas": {
            "gruppe": "Hersteller / Produzenten",
            "definition": "Hersteller von Fenstern, Türen oder Glaselementen.",
            "beispiele": "",
            "d365_branch_detail": "Fenster / Glas"
        },
        "Getränke": {
            "gruppe": "Hersteller / Produzenten",
            "definition": "Industrielle Hersteller von Getränken.",
            "beispiele": "Wesergold, Schlossquelle, Winkels",
            "d365_branch_detail": "Getränke"
        },
        "Möbel": {
            "gruppe": "Hersteller / Produzenten",
            "definition": "Industrielle Hersteller von Möbeln.",
            "beispiele": "mycs",
            "d365_branch_detail": "Möbel"
        },
        "Agrar / Pellets": {
            "gruppe": "Hersteller / Produzenten",
            "definition": "Hersteller von landwirtschaftlichen Produkten, Maschinen oder Brennstoffen wie Holzpellets.",
            "beispiele": "KWB Energiesysteme",
            "d365_branch_detail": "Agrar, Pellets"
        },
        "Stadtwerke": {
            "gruppe": "Versorger",
            "definition": "Lokale Stadtwerke, die die lokale Infrastruktur für die Energieversorgung (Strom, Gas, Wasser) betreiben.",
            "beispiele": "Badenova, Drewag, Stadtwerke Leipzig, Stadtwerke Kiel",
            "d365_branch_detail": "Stadtwerke"
        },
        "Verteilnetzbetreiber": {
            "gruppe": "Versorger",
            "definition": "Überregionale Betreiber von Verteilnetzen (Strom, Gas), die oft keine direkten Endkundenversorger sind.",
            "beispiele": "Rheinenergie, Open Grid, ENBW",
            "d365_branch_detail": "Verteilnetzbetreiber"
        },
        "Telekommunikation": {
            "gruppe": "Versorger",
            "definition": "Betreiber von Telekommunikations-Infrastruktur und Netzen (z.B. Telefon, Internet, Mobilfunk).",
            "beispiele": "M-Net, NetCologne, Thiele, Willy.tel",
            "d365_branch_detail": "Telekommunikation"
        },
        "Gase & Mineralöl": {
            "gruppe": "Versorger",
            "definition": "Unternehmen, die Gas- oder Mineralölprodukte an Endkunden oder Unternehmen liefern.",
            "beispiele": "Westfalen AG, GasCom",
            "d365_branch_detail": "Gase & Mineralöl"
        },
        "Messdienstleister": {
            "gruppe": "Service provider (Dienstleister)",
            "definition": "Unternehmen, die sich auf die Ablesung und Abrechnung von Verbrauchszählern (Heizung, Wasser) spezialisiert haben. Abgrenzung: Kein Versorger.",
            "beispiele": "Brunata, Ista, Telent",
            "d365_branch_detail": "Messdienstleister"
        },
        "Facility Management": {
            "gruppe": "Service provider (Dienstleister)",
            "definition": "Anbieter von Dienstleistungen rund um Immobilien, von der technischen Instandhaltung bis zur Reinigung.",
            "beispiele": "Wisag, Vonovia, Infraserv, Gewofag, B&O, Sprint Sanierungen, BWTS",
            "d365_branch_detail": "Facility Management"
        },
        "Healthcare/Pflegedienste": {
            "gruppe": "Service provider (Dienstleister)",
            "definition": "Erbringen von reinen Dienstleistungen an medizinischen Geräten (z.B. Wartung, Lieferung) oder direkt an Menschen (Pflege). Abgrenzung: Keine Hersteller.",
            "beispiele": "Sanimed, Fuchs+Möller, Strehlow, Healthcare at Home",
            "d365_branch_detail": "Healthcare/Pflegedienste"
        },
        "Servicedienstleister / Reparatur ohne Produktion": {
            "gruppe": "Service provider (Dienstleister)",
            "definition": "Reine Service-Organisationen, die technische Geräte warten und reparieren, aber nicht selbst herstellen.",
            "beispiele": "HSR, FFB",
            "d365_branch_detail": "Servicedienstleister / Reparatur ohne Produktion"
        },
        "Aufzüge und Rolltreppen": {
            "gruppe": "Service provider (Dienstleister)",
            "definition": "Hersteller und Unternehmen, die Service, Wartung und Installation von Aufzügen und Rolltreppen anbieten.",
            "beispiele": "TKE, Liftstar, Lifta",
            "d365_branch_detail": "Aufzüge und Rolltreppen"
        },
        "Feuer- und Sicherheitssysteme": {
            "gruppe": "Service provider (Dienstleister)",
            "definition": "Dienstleister für die Wartung, Installation und Überprüfung von Brandmelde- und Sicherheitssystemen.",
            "beispiele": "Minimax, Securiton",
            "d365_branch_detail": "Feuer- und Sicherheitssysteme"
        },
        "Personentransport": {
            "gruppe": "Service provider (Dienstleister)",
            "definition": "Unternehmen, die Personen befördern (z.B. Busunternehmen, Taxi-Zentralen) und eine eigene Fahrzeugflotte warten.",
            "beispiele": "Rhein-Sieg-Verkehrsgesellschaft",
            "d365_branch_detail": "Personentransport"
        },
        "Entsorgung": {
            "gruppe": "Service provider (Dienstleister)",
            "definition": "Unternehmen der Abfall- und Entsorgungswirtschaft mit komplexer Logistik und Fahrzeugmanagement.",
            "beispiele": "",
            "d365_branch_detail": "Entsorgung"
        },
        "Catering Services": {
            "gruppe": "Service provider (Dienstleister)",
            "definition": "Anbieter von Verpflegungsdienstleistungen, oft mit komplexer Logistik und Wartung von Küchengeräten.",
            "beispiele": "Café+Co International",
            "d365_branch_detail": "Catering Services"
        },
        "Auslieferdienste": {
            "gruppe": "Handel & Logistik",
            "definition": "Unternehmen, deren Kerngeschäft der Transport und die Logistik von Waren zum Endkunden ist (Lieferdienste). Abgrenzung: Keine reinen Logistik-Dienstleister.",
            "beispiele": "Edeka, Rewe, Saturn, Gamma Reifen",
            "d365_branch_detail": "Auslieferdienste"
        },
        "Energie (Brennstoffe)": {
            "gruppe": "Handel & Logistik",
            "definition": "Unternehmen, deren Kerngeschäft der Transport und die Logistik von Brennstoffen wie Heizöl zum Endkunden ist.",
            "beispiele": "Eckert & Ziegler",
            "d365_branch_detail": "Energie (Brennstoffe)"
        },
        "Großhandel": {
            "gruppe": "Handel & Logistik",
            "definition": "Großhandelsunternehmen, bei denen der Transport und die Logistik eine zentrale Rolle spielen.",
            "beispiele": "Hairhaus, NDI Nordisk",
            "d365_branch_detail": "Großhandel"
        },
        "Einzelhandel": {
            "gruppe": "Handel & Logistik",
            "definition": "Einzelhandelsunternehmen, oft mit eigener Lieferlogistik zum Endkunden.",
            "beispiele": "Cactus, mertens, Teuto",
            "d365_branch_detail": "Einzelhandel"
        },
        "Logistik": {
            "gruppe": "Handel & Logistik",
            "definition": "Allgemeine Logistikdienstleister, die nicht in eine der spezifischeren Kategorien passen.",
            "beispiele": "Gerdes + Landwehr, Rüdebusch, Winner",
            "d365_branch_detail": "Logistik - Sonstige"
        },
        "Baustoffhandel": {
            "gruppe": "Baubranche",
            "definition": "Großhandel mit Baustoffen wie Zement, Kies, Holz oder Fliesen – oft mit eigenen Fuhrparks und komplexer Filiallogistik.",
            "beispiele": "Kemmler Baustoffe, Henri Benthack",
            "d365_branch_detail": "Baustoffhandel"
        },
        "Baustoffindustrie": {
            "gruppe": "Baubranche",
            "definition": "Produktion von Baustoffen wie Beton, Ziegeln, Gips oder Dämmmaterial – häufig mit werkseigener Logistik.",
            "beispiele": "Heidelberg Materials, Saint Gobain Weber",
            "d365_branch_detail": "Baustoffindustrie"
        },
        "Logistiker Baustoffe": {
            "gruppe": "Baubranche",
            "definition": "Spezialisierte Transportdienstleister für Baustoffe – häufig im Nahverkehr, mit engen Zeitfenstern und Baustellenbelieferung.",
            "beispiele": "C.Bergmann, HENGE Baustoff GmbH",
            "d365_branch_detail": "Logistiker Baustoffe"
        },
        "Bauunternehmen": {
            "gruppe": "Baubranche",
            "definition": "Ausführung von Bauprojekten, oft mit eigenem Materialtransport – hoher Koordinationsaufwand bei Fahrzeugen, Maschinen und Baustellen.",
            "beispiele": "Max Bögl, Leonhard Weiss",
            "d365_branch_detail": "Bauunternehmen"
        },
        "Versicherungsgutachten": {
            "gruppe": "Gutachter / Versicherungen",
            "definition": "Gutachter, die im Auftrag von Versicherungen Schäden prüfen und bewerten.",
            "beispiele": "DEVK, Allianz",
            "d365_branch_detail": "Versicherungsgutachten"
        },
        "Technische Gutachten": {
            "gruppe": "Gutachter / Versicherungen",
            "definition": "Sachverständige und Organisationen, die technische Prüfungen, Inspektionen und Gutachten durchführen.",
            "beispiele": "TÜV, Audatex, Value, MDK",
            "d365_branch_detail": "Technische Gutachten"
        },
        "Medizinische Gutachten": {
            "gruppe": "Gutachter / Versicherungen",
            "definition": "Sachverständige und Organisationen (z.B. MDK), die medizinische Gutachten erstellen.",
            "beispiele": "MDK",
            "d365_branch_detail": "Medizinische Gutachten"
        },
        "Baugutachter": {
            "gruppe": "Gutachter / Versicherungen",
            "definition": "Sachverständige, die Bauschäden oder den Wert von Immobilien begutachten.",
            "beispiele": "",
            "d365_branch_detail": "Baugutachter"
        },
        "Wohnungswirtschaft": {
            "gruppe": "Housing",
            "definition": "Wohnungsbaugesellschaften oder -genossenschaften, die ihre Immobilien instand halten.",
            "beispiele": "GEWOFAG",
            "d365_branch_detail": "Wohnungswirtschaft"
        },
        "Renovierungsunternehmen": {
            "gruppe": "Housing",
            "definition": "Dienstleister, die auf die Renovierung und Sanierung von Wohnimmobilien spezialisiert sind.",
            "beispiele": "",
            "d365_branch_detail": "Renovierungsunternehmen"
        },
        "Sozialbau Unternehmen": {
            "gruppe": "Housing",
            "definition": "Unternehmen, die im Bereich des sozialen Wohnungsbaus tätig sind.",
            "beispiele": "",
            "d365_branch_detail": "Anbieter für Soziales Wohnen"
        },
        "IT Beratung": {
            "gruppe": "Sonstige",
            "definition": "Beratungsunternehmen mit Fokus auf IT-Strategie und -Implementierung. Abgrenzung: Keine Systemhäuser mit eigenem Außendienst.",
            "beispiele": "",
            "d365_branch_detail": "IT Beratung"
        },
        "Unternehmensberatung": {
            "gruppe": "Sonstige",
            "definition": "Klassische Management- und Strategieberatungen.",
            "beispiele": "",
            "d365_branch_detail": "Unternehmensberatung (old)"
        },
        "Engineering": {
            "gruppe": "Sonstige",
            "definition": "Ingenieurbüros und technische Planungsdienstleister.",
            "beispiele": "",
            "d365_branch_detail": "Engineering"
        },
        "Öffentliche Verwaltung": {
            "gruppe": "Sonstige",
            "definition": "Behörden und öffentliche Einrichtungen, oft mit eigenen technischen Abteilungen (z.B. Bauhöfe).",
            "beispiele": "",
            "d365_branch_detail": "Öffentliche Verwaltung"
        },
        "Sonstiger Service": {
            "gruppe": "Sonstige",
            "definition": "Auffangkategorie für Dienstleistungen, die keiner anderen Kategorie zugeordnet werden können.",
            "beispiele": "",
            "d365_branch_detail": "Sonstiger Service (old)"
        }
    }
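Since BRANCH_GROUP_MAPPING is the single source of truth, the classifier's "strict mode" reduces to a normalised key lookup. The sketch below restates `normalize_for_mapping` and uses a two-entry subset of the map so it is self-contained; the helper `resolve_branch` is illustrative, not part of config.py:

```python
# Illustrative strict-mode lookup against BRANCH_GROUP_MAPPING.
# resolve_branch is a sketch; normalize_for_mapping mirrors config.py.
import re

def normalize_for_mapping(text):
    if not isinstance(text, str):
        return ""
    return re.sub(r'[^a-z0-9]', '', text.lower().strip())

BRANCH_GROUP_MAPPING = {  # two entries from the real map, for the demo
    "Maschinenbau": {"gruppe": "Hersteller / Produzenten"},
    "Facility Management": {"gruppe": "Service provider (Dienstleister)"},
}

# Pre-compute a lookup from normalised name to canonical name once.
_NORMALIZED = {normalize_for_mapping(k): k for k in BRANCH_GROUP_MAPPING}

def resolve_branch(llm_suggestion):
    """Map a free-text industry suggestion onto the fixed list, or None."""
    return _NORMALIZED.get(normalize_for_mapping(llm_suggestion))

assert resolve_branch("  maschinen-bau ") == "Maschinenbau"
assert resolve_branch("Robotics") is None
```

Anything the LLM proposes that does not resolve to a canonical key is rejected rather than written through, which is exactly the strict-mode behaviour the migration plan calls for.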
|
||||
# Branchenübergreifende Top-Referenzen als Fallback
|
||||
FALLBACK_REFERENCES = [
|
||||
"Jungheinrich (weltweit >4.000 Techniker)",
|
||||
"Vivawest (Kundenzufriedenheit > 95%)",
|
||||
"TK Elevators (1.500 Techniker)",
|
||||
"NetCologne"
|
||||
]
|
||||
|
||||
# --- API Schlüssel Speicherung (werden in main() geladen) ---
|
||||
API_KEYS = {}
|
||||
|
||||
    @classmethod
    def load_api_keys(cls):
        """Loads the API keys from the configured files."""
        logger = logging.getLogger(__name__)
        logger.info("Lade API-Schluessel...")
        cls.API_KEYS['openai'] = cls._load_key_from_file(API_KEY_FILE)
        cls.API_KEYS['serpapi'] = cls._load_key_from_file(SERP_API_KEY_FILE)
        cls.API_KEYS['genderize'] = cls._load_key_from_file(GENDERIZE_API_KEY_FILE)

        if cls.API_KEYS.get('openai'):
            # Legacy quirk: the 'openai' slot actually holds the Gemini key.
            # If helpers.py ever reads a dedicated 'gemini' slot, set it here as well.
            logger.info("Gemini API Key (via 'openai' slot) erfolgreich geladen.")
        else:
            logger.warning("Gemini API Key konnte nicht geladen werden. KI-Funktionen sind deaktiviert.")

        if not cls.API_KEYS.get('serpapi'):
            logger.warning("SerpAPI Key konnte nicht geladen werden. Suchfunktionen sind deaktiviert.")
        if not cls.API_KEYS.get('genderize'):
            logger.warning("Genderize API Key konnte nicht geladen werden. Geschlechtserkennung ist eingeschraenkt.")

    @staticmethod
    def _load_key_from_file(filepath):
        """Helper for loading a single key from a file."""
        logger = logging.getLogger(__name__)
        abs_path = os.path.abspath(filepath)
        try:
            with open(abs_path, "r", encoding="utf-8") as f:
                key = f.read().strip()
                if key:
                    return key
                logger.warning(f"API key file is empty: '{abs_path}'")
                return None
        except FileNotFoundError:
            logger.warning(f"API key file not found at path: '{abs_path}'")
            return None
        except Exception as e:
            logger.error(f"Error reading key file '{abs_path}': {e}")
            return None

# ==============================================================================
# 4. GLOBAL DATA-STRUCTURE VARIABLES
# ==============================================================================

# NEW: defines the exact, guaranteed column order.
# This is the new single source of truth for all index calculations.
COLUMN_ORDER = [
    "ReEval Flag", "CRM Name", "CRM Kurzform", "Parent Account Name", "CRM Website", "CRM Ort", "CRM Land",
    "CRM Beschreibung", "CRM Branche", "CRM Beschreibung Branche extern", "CRM Anzahl Techniker", "CRM Umsatz",
    "CRM Anzahl Mitarbeiter", "CRM Vorschlag Wiki URL", "System Vorschlag Parent Account", "Parent Vorschlag Status",
    "Parent Vorschlag Timestamp", "Wiki URL", "Wiki Sitz Stadt", "Wiki Sitz Land", "Wiki Absatz", "Wiki Branche",
    "Wiki Umsatz", "Wiki Mitarbeiter", "Wiki Kategorien", "Wikipedia Timestamp", "Wiki Verif. Timestamp",
    "SerpAPI Wiki Search Timestamp", "Chat Wiki Konsistenzpruefung", "Chat Begründung Wiki Inkonsistenz",
    "Chat Vorschlag Wiki Artikel", "Begründung bei Abweichung", "Website Rohtext", "Website Zusammenfassung",
    "Website Meta-Details", "Website Scrape Timestamp", "URL Prüfstatus", "Chat Vorschlag Branche",
    "Chat Branche Konfidenz", "Chat Konsistenz Branche", "Chat Begruendung Abweichung Branche",
    "Chat Prüfung FSM Relevanz", "Chat Begründung für FSM Relevanz", "Chat Schätzung Anzahl Mitarbeiter",
    "Chat Konsistenzprüfung Mitarbeiterzahl", "Chat Begruendung Abweichung Mitarbeiterzahl",
    "Chat Einschätzung Anzahl Servicetechniker", "Chat Begründung Abweichung Anzahl Servicetechniker",
    "Chat Schätzung Umsatz", "Chat Begründung Abweichung Umsatz", "FSM Pitch", "FSM Pitch Timestamp",
    "Linked Serviceleiter gefunden", "Linked It-Leiter gefunden", "Linked Management gefunden",
    "Linked Disponent gefunden", "Contact Search Timestamp", "Finaler Umsatz (Wiki>CRM)",
    "Finaler Mitarbeiter (Wiki>CRM)", "Geschaetzter Techniker Bucket", "Plausibilität Umsatz",
    "Plausibilität Mitarbeiter", "Plausibilität Umsatz/MA Ratio", "Abweichung Umsatz CRM/Wiki",
    "Abweichung MA CRM/Wiki", "Plausibilität Begründung", "Plausibilität Prüfdatum",
    "Archiviert", "SyncConflict", "Timestamp letzte Pruefung", "Version", "Tokens", "CRM ID"
]

# --- Column mapping (single source of truth) ---
# Version 1.8.0 - 73 columns (A-BU)
COLUMN_MAP = {
    # A-E: Master data & process control
    "ReEval Flag": {"Titel": "A", "index": 0},
    "CRM Name": {"Titel": "B", "index": 1},
    "CRM Kurzform": {"Titel": "C", "index": 2},
    "Parent Account Name": {"Titel": "D", "index": 3},
    "CRM Website": {"Titel": "E", "index": 4},
    # F-M: CRM data
    "CRM Ort": {"Titel": "F", "index": 5},
    "CRM Land": {"Titel": "G", "index": 6},
    "CRM Beschreibung": {"Titel": "H", "index": 7},
    "CRM Branche": {"Titel": "I", "index": 8},
    "CRM Beschreibung Branche extern": {"Titel": "J", "index": 9},
    "CRM Anzahl Techniker": {"Titel": "K", "index": 10},
    "CRM Umsatz": {"Titel": "L", "index": 11},
    "CRM Anzahl Mitarbeiter": {"Titel": "M", "index": 12},
    # N-Q: System & parent suggestions
    "CRM Vorschlag Wiki URL": {"Titel": "N", "index": 13},
    "System Vorschlag Parent Account": {"Titel": "O", "index": 14},
    "Parent Vorschlag Status": {"Titel": "P", "index": 15},
    "Parent Vorschlag Timestamp": {"Titel": "Q", "index": 16},
    # R-AB: Wikipedia extraction
    "Wiki URL": {"Titel": "R", "index": 17},
    "Wiki Sitz Stadt": {"Titel": "S", "index": 18},
    "Wiki Sitz Land": {"Titel": "T", "index": 19},
    "Wiki Absatz": {"Titel": "U", "index": 20},
    "Wiki Branche": {"Titel": "V", "index": 21},
    "Wiki Umsatz": {"Titel": "W", "index": 22},
    "Wiki Mitarbeiter": {"Titel": "X", "index": 23},
    "Wiki Kategorien": {"Titel": "Y", "index": 24},
    "Wikipedia Timestamp": {"Titel": "Z", "index": 25},
    "Wiki Verif. Timestamp": {"Titel": "AA", "index": 26},
    "SerpAPI Wiki Search Timestamp": {"Titel": "AB", "index": 27},
    # AC-AF: ChatGPT wiki verification
    "Chat Wiki Konsistenzpruefung": {"Titel": "AC", "index": 28},
    "Chat Begründung Wiki Inkonsistenz": {"Titel": "AD", "index": 29},
    "Chat Vorschlag Wiki Artikel": {"Titel": "AE", "index": 30},
    "Begründung bei Abweichung": {"Titel": "AF", "index": 31},
    # AG-AK: Website scraping
    "Website Rohtext": {"Titel": "AG", "index": 32},
    "Website Zusammenfassung": {"Titel": "AH", "index": 33},
    "Website Meta-Details": {"Titel": "AI", "index": 34},
    "Website Scrape Timestamp": {"Titel": "AJ", "index": 35},
    "URL Prüfstatus": {"Titel": "AK", "index": 36},
    # AL-AU: ChatGPT industry & FSM analysis
    "Chat Vorschlag Branche": {"Titel": "AL", "index": 37},
    "Chat Branche Konfidenz": {"Titel": "AM", "index": 38},
    "Chat Konsistenz Branche": {"Titel": "AN", "index": 39},
    "Chat Begruendung Abweichung Branche": {"Titel": "AO", "index": 40},
    "Chat Prüfung FSM Relevanz": {"Titel": "AP", "index": 41},
    "Chat Begründung für FSM Relevanz": {"Titel": "AQ", "index": 42},
    "Chat Schätzung Anzahl Mitarbeiter": {"Titel": "AR", "index": 43},
    "Chat Konsistenzprüfung Mitarbeiterzahl": {"Titel": "AS", "index": 44},
    "Chat Begruendung Abweichung Mitarbeiterzahl": {"Titel": "AT", "index": 45},
    "Chat Einschätzung Anzahl Servicetechniker": {"Titel": "AU", "index": 46},
    # AV-AZ: ChatGPT continuation & FSM pitch
    "Chat Begründung Abweichung Anzahl Servicetechniker": {"Titel": "AV", "index": 47},
    "Chat Schätzung Umsatz": {"Titel": "AW", "index": 48},
    "Chat Begründung Abweichung Umsatz": {"Titel": "AX", "index": 49},
    "FSM Pitch": {"Titel": "AY", "index": 50},
    "FSM Pitch Timestamp": {"Titel": "AZ", "index": 51},
    # BA-BE: LinkedIn contact search
    "Linked Serviceleiter gefunden": {"Titel": "BA", "index": 52},
    "Linked It-Leiter gefunden": {"Titel": "BB", "index": 53},
    "Linked Management gefunden": {"Titel": "BC", "index": 54},
    "Linked Disponent gefunden": {"Titel": "BD", "index": 55},
    "Contact Search Timestamp": {"Titel": "BE", "index": 56},
    # BF-BH: Consolidated data & ML
    "Finaler Umsatz (Wiki>CRM)": {"Titel": "BF", "index": 57},
    "Finaler Mitarbeiter (Wiki>CRM)": {"Titel": "BG", "index": 58},
    "Geschaetzter Techniker Bucket": {"Titel": "BH", "index": 59},
    # BI-BQ: Plausibility checks & flags
    "Plausibilität Umsatz": {"Titel": "BI", "index": 60},
    "Plausibilität Mitarbeiter": {"Titel": "BJ", "index": 61},
    "Plausibilität Umsatz/MA Ratio": {"Titel": "BK", "index": 62},
    "Abweichung Umsatz CRM/Wiki": {"Titel": "BL", "index": 63},
    "Abweichung MA CRM/Wiki": {"Titel": "BM", "index": 64},
    "Plausibilität Begründung": {"Titel": "BN", "index": 65},
    "Plausibilität Prüfdatum": {"Titel": "BO", "index": 66},
    "Archiviert": {"Titel": "BP", "index": 67},
    "SyncConflict": {"Titel": "BQ", "index": 68},
    # BR-BU: Metadata (indices shifted)
    "Timestamp letzte Pruefung": {"Titel": "BR", "index": 69},
    "Version": {"Titel": "BS", "index": 70},
    "Tokens": {"Titel": "BT", "index": 71},
    "CRM ID": {"Titel": "BU", "index": 72}
}

# ==============================================================================
# 5. DEALFRONT AUTOMATION CONFIGURATION
# ==============================================================================
DEALFRONT_CREDENTIALS_FILE = os.path.join(BASE_DIR, "dealfront_credentials.json")
DEALFRONT_LOGIN_URL = "https://app.dealfront.com/login"

# The direct URL to the 'Target' area. This has proven to be the most robust route.
DEALFRONT_TARGET_URL = "https://app.dealfront.com/t/prospector/companies"

# IMPORTANT: the exact name of the predefined search to load after navigation.
TARGET_SEARCH_NAME = "Facility Management"  # <-- adjust this name to your target list


# --- END OF FILE config.py ---

412 _legacy_gsheets_system/helpers.py Normal file
@@ -0,0 +1,412 @@
#!/usr/bin/env python3
"""
helpers.py

Collection of global, reusable helper functions for the
"Automatisierte Unternehmensbewertung" (automated company evaluation) project.
Contains decorators, text normalization, API wrappers, and other utilities.
"""

__version__ = "v2.4.0_Final_Fix"

ALLOWED_TARGET_BRANCHES = []

# ==============================================================================
# 1. IMPORTS
# ==============================================================================
# Standard library
import os
import time
import re
import csv
import json
import random
import logging
import traceback
import unicodedata
from datetime import datetime
from urllib.parse import urlparse, unquote
from difflib import SequenceMatcher
import base64
import sys

# Third-party libraries
try:
    import gspread
    GSPREAD_AVAILABLE = True
except ImportError:
    GSPREAD_AVAILABLE = False
    gspread = None
try:
    import wikipedia
    WIKIPEDIA_AVAILABLE = True
except ImportError:
    WIKIPEDIA_AVAILABLE = False
    wikipedia = None
import requests
from bs4 import BeautifulSoup
try:
    import pandas as pd
    PANDAS_AVAILABLE = True
except Exception as e:
    logging.warning(f"Pandas import failed: {e}")
    PANDAS_AVAILABLE = False
    pd = None

# --- AI switchover: Google Generative AI (dual support) ---
HAS_NEW_GENAI = False
HAS_OLD_GENAI = False

# 1. New library (google-genai)
try:
    from google import genai
    from google.genai import types
    HAS_NEW_GENAI = True
    logging.info("Bibliothek 'google.genai' (v1.0+) geladen.")
except ImportError:
    logging.warning("Bibliothek 'google.genai' nicht gefunden. Versuche Fallback.")

# 2. Legacy library (google-generativeai)
try:
    import google.generativeai as old_genai
    HAS_OLD_GENAI = True
    logging.info("Bibliothek 'google.generativeai' (Legacy) geladen.")
except ImportError:
    logging.warning("Bibliothek 'google.generativeai' nicht gefunden.")

HAS_GEMINI = HAS_NEW_GENAI or HAS_OLD_GENAI

# OpenAI imports (legacy)
try:
    import openai
    from openai.error import AuthenticationError, OpenAIError, RateLimitError, APIError, Timeout, InvalidRequestError, ServiceUnavailableError
    OPENAI_AVAILABLE = True
except ImportError:
    OPENAI_AVAILABLE = False
    # Placeholder exception types so the rest of the module can reference them.
    class AuthenticationError(Exception): pass
    class OpenAIError(Exception): pass
    class RateLimitError(Exception): pass
    class APIError(Exception): pass
    class Timeout(Exception): pass
    class InvalidRequestError(Exception): pass
    class ServiceUnavailableError(Exception): pass

from config import (Config, BRANCH_MAPPING_FILE, URL_CHECK_MARKER, USER_AGENTS, LOG_DIR,
                    COLUMN_MAP, COLUMN_ORDER)

# Optional libraries
try:
    import tiktoken
except ImportError:
    tiktoken = None

gender = None
gender_detector = None

def get_col_idx(key):
    try:
        return COLUMN_ORDER.index(key)
    except ValueError:
        return None


# ==============================================================================
# 2. RETRY DECORATOR
# ==============================================================================
decorator_logger = logging.getLogger(__name__ + ".Retry")


def retry_on_failure(func):
    def wrapper(*args, **kwargs):
        func_name = func.__name__
        self_arg = args[0] if args and hasattr(args[0], func_name) and isinstance(args[0], object) else None
        effective_func_name = f"{self_arg.__class__.__name__}.{func_name}" if self_arg else func_name

        max_retries_config = getattr(Config, 'MAX_RETRIES', 3)
        base_delay = getattr(Config, 'RETRY_DELAY', 5)

        if max_retries_config <= 0:
            return func(*args, **kwargs)

        for attempt in range(max_retries_config):
            try:
                if attempt > 0:
                    decorator_logger.warning(f"Wiederhole Versuch {attempt + 1}/{max_retries_config} fuer '{effective_func_name}'...")
                return func(*args, **kwargs)

            except Exception as e:
                # Permanent errors are re-raised immediately, without retrying.
                permanent_errors = [ValueError]
                if GSPREAD_AVAILABLE:
                    permanent_errors.append(gspread.exceptions.SpreadsheetNotFound)

                if any(isinstance(e, error_type) for error_type in permanent_errors):
                    raise e

                if attempt < max_retries_config - 1:
                    # Exponential backoff with jitter.
                    wait_time = base_delay * (2 ** attempt) + random.uniform(0, 1)
                    time.sleep(wait_time)
                else:
                    raise e
        raise RuntimeError(f"Retry loop error for {effective_func_name}")

    return wrapper
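`retry_on_failure` reads its limits from `Config`; the schedule it implements is `base_delay * 2**attempt` plus up to one second of jitter. A self-contained sketch of the same schedule, with the parameters passed explicitly instead of read from `Config` (names here are illustrative):

```python
import random
import time


def retry_with_backoff(func, max_retries=3, base_delay=5, sleep=time.sleep):
    """Call func(); on failure, wait base_delay * 2**attempt + jitter, then retry."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception:
            if attempt == max_retries - 1:
                raise  # retries exhausted: propagate the last error
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
```

Injecting `sleep` keeps the sketch testable: a recording stub can verify that the waits grow as 1x, 2x, 4x the base delay without actually sleeping.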

# ==============================================================================
# 3. LOGGING & UTILS
# ==============================================================================

def token_count(text, model=None):
    if not text or not isinstance(text, str): return 0
    return len(str(text).split())

def log_module_versions(modules_to_log):
    pass

def create_log_filename(mode):
    try:
        now = datetime.now().strftime("%Y-%m-%d_%H-%M")
        ver_short = getattr(Config, 'VERSION', 'unknown').replace(".", "")
        return os.path.join(LOG_DIR, f"{now}_{ver_short}_Modus-{mode}.txt")
    except Exception:
        return None

# ==============================================================================
# 4. TEXT, STRING & URL UTILITIES
# ==============================================================================
# Stubbed helpers: placeholder behavior only in this extracted version.
def simple_normalize_url(url): return url if url else "k.A."
def normalize_string(s): return s
def clean_text(text): return str(text).strip() if text else "k.A."
def normalize_company_name(name): return name.lower().strip() if name else ""
def _get_col_letter(col_num): return ""
def fuzzy_similarity(str1, str2): return 0.0
def extract_numeric_value(raw_value, is_umsatz=False): return "k.A."
def get_numeric_filter_value(value_str, is_umsatz=False): return 0.0

@retry_on_failure
def _call_genderize_api(name, api_key): return {}

def get_gender(firstname): return "unknown"
def get_email_address(firstname, lastname, website): return ""

# ==============================================================================
# 8. GEMINI API WRAPPERS
# ==============================================================================

def _get_gemini_api_key():
    api_key = Config.API_KEYS.get('gemini') or Config.API_KEYS.get('openai')
    if api_key: return api_key
    api_key = os.environ.get("GEMINI_API_KEY") or os.environ.get("OPENAI_API_KEY")
    if api_key: return api_key
    raise ValueError("API Key missing.")


@retry_on_failure
def call_gemini_flash(prompt, system_instruction=None, temperature=0.3, json_mode=False):
    """
    Calls Gemini (text). Uses gemini-2.0-flash by default.
    """
    logger = logging.getLogger(__name__)
    api_key = _get_gemini_api_key()

    # Priority 1: legacy library (proven for text in this setup)
    if HAS_OLD_GENAI:
        try:
            old_genai.configure(api_key=api_key)
            generation_config = {
                "temperature": temperature,
                "top_p": 0.95,
                "top_k": 40,
                "max_output_tokens": 8192,
            }
            if json_mode:
                generation_config["response_mime_type"] = "application/json"

            # IMPORTANT: use 2.0, since 1.5 was not available
            model = old_genai.GenerativeModel(
                model_name="gemini-2.0-flash",
                generation_config=generation_config,
                system_instruction=system_instruction
            )
            contents = [prompt] if isinstance(prompt, str) else prompt
            response = model.generate_content(contents)
            return response.text.strip()
        except Exception as e:
            logger.error(f"Fehler mit alter GenAI Lib: {e}")
            if not HAS_NEW_GENAI: raise e
            # Fall through to the new library

    # Priority 2: new library
    if HAS_NEW_GENAI:
        try:
            client = genai.Client(api_key=api_key)
            config = {
                "temperature": temperature,
                "top_p": 0.95,
                "top_k": 40,
                "max_output_tokens": 8192,
            }
            if json_mode:
                config["response_mime_type"] = "application/json"

            response = client.models.generate_content(
                model="gemini-2.0-flash",
                contents=[prompt] if isinstance(prompt, str) else prompt,
                config=config
            )
            return response.text.strip()
        except Exception as e:
            logger.error(f"Fehler mit neuer GenAI Lib: {e}")
            raise e

    raise ImportError("Keine Gemini Bibliothek verfügbar.")


@retry_on_failure
def call_gemini_image(prompt, reference_image_b64=None, aspect_ratio=None):
    """
    Generates an image.
    - With a reference image: Gemini 2.5 Flash Image.
    - Without a reference image: Imagen 4.0.
    - NEW: accepts `aspect_ratio` (e.g. "16:9").
    - NEW: applies a central corporate-design prompt.
    """
    logger = logging.getLogger(__name__)
    api_key = _get_gemini_api_key()

    if HAS_NEW_GENAI:
        try:
            client = genai.Client(api_key=api_key)

            # --- CASE A: reference image provided (Gemini 2.5) ---
            if reference_image_b64:
                try:
                    from PIL import Image
                    import io
                except ImportError:
                    raise ImportError("Pillow (PIL) fehlt. Bitte 'pip install Pillow' ausführen.")

                logger.info(f"Start Image-to-Image Generation mit gemini-2.5-flash-image. Seitenverhältnis: {aspect_ratio or 'default'}")

                # Base64 to PIL image
                try:
                    if "," in reference_image_b64:
                        reference_image_b64 = reference_image_b64.split(",")[1]
                    image_data = base64.b64decode(reference_image_b64)
                    raw_image = Image.open(io.BytesIO(image_data))
                except Exception as e:
                    logger.error(f"Fehler beim Laden des Referenzbildes: {e}")
                    raise ValueError("Ungültiges Referenzbild.")

                # Stricter prompt
                full_prompt = (
                    "Use the provided reference image as the absolute truth. "
                    f"Place EXACTLY this product into the scene: {prompt}. "
                    "Do NOT alter the product's design, shape, or colors. "
                    "Keep the product 100% identical to the reference. "
                    "Only adjust lighting and perspective to match the scene."
                )

                # The aspect ratio cannot be controlled directly here, since it
                # depends on the reference image; we can only hint at it in the prompt.
                if aspect_ratio:
                    full_prompt += f" The final image composition should have an aspect ratio of {aspect_ratio}."

                response = client.models.generate_content(
                    model='gemini-2.5-flash-image',
                    contents=[raw_image, full_prompt]
                )

                if response.candidates and response.candidates[0].content.parts:
                    for part in response.candidates[0].content.parts:
                        if part.inline_data:
                            return base64.b64encode(part.inline_data.data).decode('utf-8')

                raise ValueError("Gemini 2.5 hat kein Bild zurückgeliefert.")

            # --- CASE B: no reference image (Imagen 4) ---
            else:
                img_config = {
                    "number_of_images": 1,
                    "output_mime_type": "image/jpeg",
                }
                # Add the aspect ratio, if provided
                if aspect_ratio in ["16:9", "9:16", "1:1", "4:3"]:
                    img_config["aspect_ratio"] = aspect_ratio
                    logger.info(f"Seitenverhältnis auf {aspect_ratio} gesetzt.")

                # Apply the central corporate-design style
                final_prompt = f"{Config.CORPORATE_DESIGN_PROMPT}\n\nTask: {prompt}"

                method = getattr(client.models, 'generate_images', None)
                if not method:
                    available_methods = [m for m in dir(client.models) if not m.startswith('_')]
                    raise AttributeError(f"Client hat keine Image-Methode. Verfügbar: {available_methods}")

                candidates = [
                    'imagen-4.0-generate-001',
                    'imagen-4.0-fast-generate-001',
                    'imagen-4.0-ultra-generate-001'
                ]

                last_error = None
                for model_name in candidates:
                    try:
                        logger.info(f"Versuche Text-zu-Bild mit Modell: {model_name}")
                        response = method(
                            model=model_name,
                            prompt=final_prompt,
                            config=img_config
                        )

                        if response.generated_images:
                            image_bytes = response.generated_images[0].image.image_bytes
                            return base64.b64encode(image_bytes).decode('utf-8')
                    except Exception as e:
                        logger.warning(f"Modell {model_name} fehlgeschlagen: {e}")
                        last_error = e

                if last_error: raise last_error
                raise ValueError("Kein Modell konnte Bilder generieren.")

        except Exception as e:
            logger.error(f"Fehler bei Image Gen: {e}")
            raise e
    else:
        logger.error("Image Generation erfordert die neue 'google-genai' Bibliothek.")
        raise ImportError("Installieren Sie 'google-genai' für Bildgenerierung.")

@retry_on_failure
def call_openai_chat(prompt, temperature=0.3, model=None, response_format_json=False):
    # Legacy shim: routes old OpenAI-style calls to Gemini.
    return call_gemini_flash(
        prompt=prompt,
        temperature=temperature,
        json_mode=response_format_json,
        system_instruction=None
    )

# Stubbed analysis helpers: placeholder behavior only in this extracted version.
def summarize_website_content(raw_text, company_name): return "k.A."
def summarize_wikipedia_article(full_text, company_name): return "k.A."
def evaluate_branche_chatgpt(company_name, website_summary, wiki_absatz): return {}
def evaluate_branches_batch(companies_data): return []
def verify_wiki_article_chatgpt(company_name, parent_name, website, wiki_title, wiki_summary): return {}
def generate_fsm_pitch(company_name, company_short_name, ki_branche, website_summary, wiki_absatz, anzahl_ma, anzahl_techniker, techniker_bucket_ml): return ""
def serp_website_lookup(company_name): return "k.A."
def search_linkedin_contacts(company_name, website, position_query, crm_kurzform, num_results=10): return []
def get_website_raw(url, max_length=30000, verify_cert=False): return "k.A."


def scrape_website_details(url):
    logger = logging.getLogger(__name__)
    if not url or not isinstance(url, str) or not url.startswith('http'):
        return "Keine gültige URL angegeben."
    try:
        headers = {'User-Agent': random.choice(USER_AGENTS)}
        response = requests.get(url, headers=headers, timeout=getattr(Config, 'REQUEST_TIMEOUT', 15), verify=False)
        response.raise_for_status()
        if 'text/html' not in response.headers.get('Content-Type', ''): return "Kein HTML."
        soup = BeautifulSoup(response.content, 'html.parser')
        # Strip non-content elements before extracting text.
        for element in soup(['script', 'style', 'noscript', 'iframe', 'svg', 'header', 'footer', 'nav', 'aside', 'form', 'button', 'a']):
            element.decompose()
        body = soup.find('body')
        text = body.get_text(separator=' ', strip=True) if body else soup.get_text(separator=' ', strip=True)
        text = re.sub(r'\s+', ' ', text).strip()
        return text[:25000] if text else "Leer."
    except Exception as e:
        logger.error(f"Fehler URL {url}: {e}")
        return "Fehler beim Scraping."


def is_valid_wikipedia_article_url(url): return False

def alignment_demo(sheet_handler): pass
7 cat_log.py Normal file
@@ -0,0 +1,7 @@
import sys
try:
    file_path = sys.argv[1] if len(sys.argv) > 1 else 'company-explorer/logs_debug/company_explorer_debug.log'
    with open(file_path, 'r') as f:
        print(f.read())
except Exception as e:
    print(f"Error reading {file_path}: {e}")
36 company-explorer/Dockerfile Normal file
@@ -0,0 +1,36 @@
# --- STAGE 1: Build Frontend ---
FROM node:20-slim AS frontend-builder
WORKDIR /build
COPY frontend/package*.json ./
RUN npm install
COPY frontend/ ./
RUN npm run build

# --- STAGE 2: Backend & Runtime ---
FROM python:3.11-slim
WORKDIR /app

# System dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements & install
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy built frontend from stage 1 (to a safe location outside /app)
COPY --from=frontend-builder /build/dist /frontend_static

# Copy backend source
COPY backend ./backend

# Environment variables
ENV PYTHONPATH=/app
ENV PYTHONUNBUFFERED=1

# Expose port
EXPOSE 8000

# Start FastAPI
CMD ["uvicorn", "backend.app:app", "--host", "0.0.0.0", "--port", "8000"]
314 company-explorer/backend/app.py Normal file
@@ -0,0 +1,314 @@
from fastapi import FastAPI, Depends, HTTPException, Query, BackgroundTasks
from fastapi.middleware.cors import CORSMiddleware
from fastapi.staticfiles import StaticFiles
from fastapi.responses import FileResponse
from sqlalchemy.orm import Session, joinedload
from typing import List, Optional, Dict, Any
from pydantic import BaseModel
from datetime import datetime
import os
import sys

from .config import settings
from .lib.logging_setup import setup_logging

# Set up logging first
setup_logging()
import logging
logger = logging.getLogger(__name__)

from .database import init_db, get_db, Company, Signal, EnrichmentData
from .services.deduplication import Deduplicator
from .services.discovery import DiscoveryService
from .services.scraping import ScraperService
from .services.classification import ClassificationService

# Initialize app
app = FastAPI(
    title=settings.APP_NAME,
    version=settings.VERSION,
    description="Backend for Company Explorer (Robotics Edition)",
    root_path="/ce"
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Service singletons
scraper = ScraperService()
classifier = ClassificationService()
discovery = DiscoveryService()

# --- Pydantic models ---
class CompanyCreate(BaseModel):
    name: str
    city: Optional[str] = None
    country: str = "DE"
    website: Optional[str] = None

class BulkImportRequest(BaseModel):
    names: List[str]

class AnalysisRequest(BaseModel):
    company_id: int
    force_scrape: bool = False

# --- Events ---
@app.on_event("startup")
def on_startup():
    logger.info("Startup Event: Initializing Database...")
    try:
        init_db()
        logger.info("Database initialized successfully.")
    except Exception as e:
        logger.critical(f"Database init failed: {e}", exc_info=True)

# --- Routes ---

@app.get("/api/health")
def health_check():
    return {"status": "ok", "version": settings.VERSION, "db": settings.DATABASE_URL}

@app.get("/api/companies")
def list_companies(
    skip: int = 0,
    limit: int = 50,
    search: Optional[str] = None,
    db: Session = Depends(get_db)
):
    try:
        query = db.query(Company)
        if search:
            query = query.filter(Company.name.ilike(f"%{search}%"))

        total = query.count()
        # Sort by ID desc (newest first)
        items = query.order_by(Company.id.desc()).offset(skip).limit(limit).all()

        return {"total": total, "items": items}
    except Exception as e:
        logger.error(f"List Companies Error: {e}", exc_info=True)
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/api/companies/{company_id}")
def get_company(company_id: int, db: Session = Depends(get_db)):
    company = db.query(Company).options(joinedload(Company.signals)).filter(Company.id == company_id).first()
    if not company:
        raise HTTPException(status_code=404, detail="Company not found")
    return company

@app.post("/api/companies/bulk")
def bulk_import_names(req: BulkImportRequest, db: Session = Depends(get_db)):
    """
    Quick import for testing: just a list of names.
    """
    logger.info(f"Starting bulk import of {len(req.names)} names.")
    try:
        added = 0
        skipped = 0

        # Deduplicator init
        try:
            dedup = Deduplicator(db)
            logger.info("Deduplicator initialized.")
        except Exception as e:
            logger.warning(f"Deduplicator init failed: {e}")
            dedup = None

        for name in req.names:
            clean_name = name.strip()
            if not clean_name: continue

            # 1. Simple deduplication (exact name)
            exists = db.query(Company).filter(Company.name == clean_name).first()
            if exists:
                skipped += 1
                continue

            # 2. Smart deduplication (if available)
            if dedup:
                matches = dedup.find_duplicates({"name": clean_name})
                if matches and matches[0]['score'] > 95:
                    logger.info(f"Duplicate found for {clean_name}: {matches[0]['name']}")
                    skipped += 1
                    continue

            # 3. Create
            new_comp = Company(
                name=clean_name,
                status="NEW"  # This triggered the schema error before
            )
            db.add(new_comp)
            added += 1

        db.commit()
        logger.info(f"Import success. Added: {added}, Skipped: {skipped}")
        return {"added": added, "skipped": skipped}
    except Exception as e:
        logger.error(f"Bulk Import Failed: {e}", exc_info=True)
        db.rollback()
        raise HTTPException(status_code=500, detail=str(e))
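The `Deduplicator` internals are not part of this diff; the endpoint only assumes it returns match scores on a 0-100 scale, with 95 as the cut-off. A minimal sketch of such a scorer using `difflib` (illustrative only -- `name_score` and `is_duplicate` are hypothetical names, and the real service may normalize differently):

```python
from difflib import SequenceMatcher


def name_score(a: str, b: str) -> float:
    """Similarity of two company names on a 0-100 scale."""
    norm = lambda s: s.lower().strip()
    return SequenceMatcher(None, norm(a), norm(b)).ratio() * 100


def is_duplicate(a: str, b: str, threshold: float = 95.0) -> bool:
    # Mirrors the endpoint's check: matches[0]['score'] > 95
    return name_score(a, b) > threshold
```

With this scoring, case and surrounding whitespace are ignored, so "Jungheinrich AG" and "jungheinrich ag" score 100, while unrelated names fall well below the threshold.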

@app.post("/api/enrich/discover")
def discover_company(req: AnalysisRequest, background_tasks: BackgroundTasks, db: Session = Depends(get_db)):
    """
    Triggers stage 1: discovery (website search + Wikipedia search).
    """
    try:
        company = db.query(Company).filter(Company.id == req.company_id).first()
        if not company:
            raise HTTPException(404, "Company not found")

        # Run in background
        background_tasks.add_task(run_discovery_task, company.id)

        return {"status": "queued", "message": f"Discovery started for {company.name}"}
    except HTTPException:
        # Re-raise as-is so the 404 is not converted into a 500 below.
        raise
    except Exception as e:
        logger.error(f"Discovery Error: {e}")
        raise HTTPException(status_code=500, detail=str(e))

def run_discovery_task(company_id: int):
    # New session for the background task
    from .database import SessionLocal
    db = SessionLocal()
    try:
        company = db.query(Company).filter(Company.id == company_id).first()
        if not company: return

        logger.info(f"Running Discovery Task for {company.name}")

        # 1. Website search
        if not company.website or company.website == "k.A.":
            found_url = discovery.find_company_website(company.name, company.city)
            if found_url and found_url != "k.A.":
                company.website = found_url
                logger.info(f"-> Found URL: {found_url}")

        # 2. Wikipedia search
        wiki_url = discovery.find_wikipedia_url(company.name)
        company.last_wiki_search_at = datetime.utcnow()

        existing_wiki = db.query(EnrichmentData).filter(
            EnrichmentData.company_id == company.id,
            EnrichmentData.source_type == "wikipedia_url"
        ).first()

        if not existing_wiki:
            db.add(EnrichmentData(company_id=company.id, source_type="wikipedia_url", content={"url": wiki_url}))
        else:
            existing_wiki.content = {"url": wiki_url}
            existing_wiki.updated_at = datetime.utcnow()

        if company.status == "NEW" and company.website and company.website != "k.A.":
            company.status = "DISCOVERED"

        db.commit()
        logger.info(f"Discovery finished for {company.id}")
    except Exception as e:
        logger.error(f"Background Task Error: {e}", exc_info=True)
        db.rollback()
    finally:
        db.close()
|
||||
|
||||
@app.post("/api/enrich/analyze")
|
||||
def analyze_company(req: AnalysisRequest, background_tasks: BackgroundTasks, db: Session = Depends(get_db)):
|
||||
company = db.query(Company).filter(Company.id == req.company_id).first()
|
||||
if not company:
|
||||
raise HTTPException(404, "Company not found")
|
||||
|
||||
if not company.website or company.website == "k.A.":
|
||||
return {"error": "No website to analyze. Run Discovery first."}
|
||||
|
||||
background_tasks.add_task(run_analysis_task, company.id, company.website)
|
||||
return {"status": "queued"}
|
||||
|
||||
def run_analysis_task(company_id: int, url: str):
|
||||
from .database import SessionLocal
|
||||
db = SessionLocal()
|
||||
try:
|
||||
company = db.query(Company).filter(Company.id == company_id).first()
|
||||
if not company: return
|
||||
|
||||
logger.info(f"Running Analysis Task for {company.name}")
|
||||
|
||||
# 1. Scrape Website
|
||||
scrape_result = scraper.scrape_url(url)
|
||||
|
||||
# Save Scrape Data
|
||||
existing_scrape_data = db.query(EnrichmentData).filter(
|
||||
EnrichmentData.company_id == company.id,
|
||||
EnrichmentData.source_type == "website_scrape"
|
||||
).first()
|
||||
|
||||
if "text" in scrape_result and scrape_result["text"]:
|
||||
if not existing_scrape_data:
|
||||
db.add(EnrichmentData(company_id=company.id, source_type="website_scrape", content=scrape_result))
|
||||
else:
|
||||
existing_scrape_data.content = scrape_result
|
||||
existing_scrape_data.updated_at = datetime.utcnow()
|
||||
elif "error" in scrape_result:
|
||||
logger.warning(f"Scraping failed for {company.name}: {scrape_result['error']}")
|
||||
|
||||
# 2. Classify Robotics Potential
|
||||
if "text" in scrape_result and scrape_result["text"]:
|
||||
analysis = classifier.analyze_robotics_potential(
|
||||
company_name=company.name,
|
||||
website_text=scrape_result["text"]
|
||||
)
|
||||
|
||||
if "error" in analysis:
|
||||
logger.error(f"Robotics classification failed for {company.name}: {analysis['error']}")
|
||||
else:
|
||||
industry = analysis.get("industry")
|
||||
if industry:
|
||||
company.industry_ai = industry
|
||||
|
||||
# Delete old signals
|
||||
db.query(Signal).filter(Signal.company_id == company.id).delete()
|
||||
|
||||
# Save new signals
|
||||
potentials = analysis.get("potentials", {})
|
||||
for signal_type, data in potentials.items():
|
||||
new_signal = Signal(
|
||||
company_id=company.id,
|
||||
signal_type=f"robotics_{signal_type}_potential",
|
||||
confidence=data.get("score", 0),
|
||||
value="High" if data.get("score", 0) > 70 else "Medium" if data.get("score", 0) > 30 else "Low",
|
||||
proof_text=data.get("reason")
|
||||
)
|
||||
db.add(new_signal)
|
||||
|
||||
company.status = "ENRICHED"
|
||||
company.last_classification_at = datetime.utcnow()
|
||||
logger.info(f"Robotics analysis complete for {company.name}.")
|
||||
|
||||
db.commit()
|
||||
logger.info(f"Analysis finished for {company.id}")
|
||||
except Exception as e:
|
||||
logger.error(f"Analyze Task Error: {e}", exc_info=True)
|
||||
db.rollback()
|
||||
finally:
|
||||
db.close()
|
||||
|
||||
# --- Serve Frontend ---
|
||||
# Priority 1: Container Path (outside of /app volume)
|
||||
static_path = "/frontend_static"
|
||||
|
||||
# Priority 2: Local Dev Path (relative to this file)
|
||||
if not os.path.exists(static_path):
|
||||
static_path = os.path.join(os.path.dirname(__file__), "../static")
|
||||
|
||||
if os.path.exists(static_path):
|
||||
logger.info(f"Serving frontend from {static_path}")
|
||||
app.mount("/", StaticFiles(directory=static_path, html=True), name="static")
|
||||
else:
|
||||
logger.warning(f"Frontend static files not found at {static_path} or local fallback.")
|
||||
|
||||
if __name__ == "__main__":
|
||||
import uvicorn
|
||||
uvicorn.run("backend.app:app", host="0.0.0.0", port=8000, reload=True)
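The discovery task above only promotes a company from NEW to DISCOVERED once a usable website is known ("k.A." is the placeholder for "not available"). That guard reads most clearly as a pure function; `next_status` is an illustrative name, not part of the codebase:

```python
from typing import Optional

def next_status(current: str, website: Optional[str]) -> str:
    """Promote NEW -> DISCOVERED only once a real website URL is known."""
    if current == "NEW" and website and website != "k.A.":
        return "DISCOVERED"
    return current

# The analysis task later moves DISCOVERED companies on to ENRICHED.
```

Keeping the transition in one place would make the NEW → DISCOVERED → ENRICHED workflow easy to unit-test without a database.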
63
company-explorer/backend/config.py
Normal file
@@ -0,0 +1,63 @@
import os
import logging
from typing import Optional

# Try to use Pydantic; fall back to os.environ
try:
    from pydantic_settings import BaseSettings

    class Settings(BaseSettings):
        # App info
        APP_NAME: str = "Company Explorer"
        VERSION: str = "0.2.2"
        DEBUG: bool = True

        # Database (stored in the app dir for simplicity)
        DATABASE_URL: str = "sqlite:////app/companies_v3_final.db"

        # API keys
        GEMINI_API_KEY: Optional[str] = None
        OPENAI_API_KEY: Optional[str] = None
        SERP_API_KEY: Optional[str] = None

        # Paths
        LOG_DIR: str = "/app/logs_debug"

        class Config:
            env_file = ".env"

    settings = Settings()

except ImportError:
    # Fallback if pydantic-settings is not installed
    class Settings:
        APP_NAME = "Company Explorer"
        VERSION = "0.2.2"
        DEBUG = True
        DATABASE_URL = "sqlite:////app/logs_debug/companies_debug.db"
        GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
        OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
        SERP_API_KEY = os.getenv("SERP_API_KEY")
        LOG_DIR = "/app/logs_debug"

    settings = Settings()

# Ensure the log dir exists
os.makedirs(settings.LOG_DIR, exist_ok=True)

# API key loading helper (from file if env missing)
def load_api_key_from_file(filename: str) -> Optional[str]:
    try:
        if os.path.exists(filename):
            with open(filename, 'r') as f:
                return f.read().strip()
    except Exception as e:
        print(f"Could not load key from {filename}: {e}")  # print because logging might not be configured yet
    return None

# Auto-load keys if not in env
if not settings.GEMINI_API_KEY:
    settings.GEMINI_API_KEY = load_api_key_from_file("/app/gemini_api_key.txt")

if not settings.SERP_API_KEY:
    settings.SERP_API_KEY = load_api_key_from_file("/app/serpapikey.txt")
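The file-based key fallback can be exercised on its own. Below is a self-contained copy of `load_api_key_from_file` plus a quick round-trip through a temp file (the dummy key value is invented for the demo):

```python
import os
import tempfile
from typing import Optional

def load_api_key_from_file(filename: str) -> Optional[str]:
    # Read and strip a single-line key file; None if missing or unreadable.
    try:
        if os.path.exists(filename):
            with open(filename, 'r') as f:
                return f.read().strip()
    except Exception:
        pass
    return None

# Round-trip: write a dummy key, read it back, clean up.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("dummy-key-123\n")
key = load_api_key_from_file(tmp.name)
os.unlink(tmp.name)
```

Note that the trailing newline is stripped, which is what makes pasting keys into files with an editor safe.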
113
company-explorer/backend/database.py
Normal file
@@ -0,0 +1,113 @@
from sqlalchemy import create_engine, Column, Integer, String, Text, DateTime, ForeignKey, Float, Boolean, JSON
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relationship
from datetime import datetime
from .config import settings

# Setup
engine = create_engine(settings.DATABASE_URL, connect_args={"check_same_thread": False})
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
Base = declarative_base()

# ==============================================================================
# MODELS
# ==============================================================================

class Company(Base):
    __tablename__ = "companies"

    id = Column(Integer, primary_key=True, index=True)

    # Core identity
    name = Column(String, index=True)
    website = Column(String, index=True)  # Normalized domain preferred
    crm_id = Column(String, unique=True, index=True, nullable=True)  # Link to D365

    # Classification
    industry_crm = Column(String, nullable=True)  # The "allowed" industry
    industry_ai = Column(String, nullable=True)   # The AI-suggested industry

    # Location
    city = Column(String, nullable=True)
    country = Column(String, default="DE")

    # Workflow status
    status = Column(String, default="NEW", index=True)

    # Granular process tracking (timestamps)
    created_at = Column(DateTime, default=datetime.utcnow)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)

    last_scraped_at = Column(DateTime, nullable=True)
    last_wiki_search_at = Column(DateTime, nullable=True)
    last_classification_at = Column(DateTime, nullable=True)
    last_signal_check_at = Column(DateTime, nullable=True)

    # Relationships
    signals = relationship("Signal", back_populates="company", cascade="all, delete-orphan")
    enrichment_data = relationship("EnrichmentData", back_populates="company", cascade="all, delete-orphan")


class Signal(Base):
    """
    Represents a specific sales signal or potential.
    Example: type='has_spa', value='true', proof='Wellnessbereich mit 2000qm'
    """
    __tablename__ = "signals"

    id = Column(Integer, primary_key=True, index=True)
    company_id = Column(Integer, ForeignKey("companies.id"))

    signal_type = Column(String, index=True)  # e.g. "robotics_cleaning_potential"
    confidence = Column(Float, default=0.0)   # 0-100 score as written by the analysis task
    value = Column(String)                    # "High", "Medium", "Yes", "No"
    proof_text = Column(Text, nullable=True)  # Snippet from website/source

    created_at = Column(DateTime, default=datetime.utcnow)

    company = relationship("Company", back_populates="signals")


class EnrichmentData(Base):
    """
    Stores raw data blobs (HTML, API responses) to allow re-processing.
    """
    __tablename__ = "enrichment_data"

    id = Column(Integer, primary_key=True, index=True)
    company_id = Column(Integer, ForeignKey("companies.id"))

    source_type = Column(String)  # "website_scrape", "wikipedia_api", "google_serp"
    content = Column(JSON)        # The raw data

    created_at = Column(DateTime, default=datetime.utcnow)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)  # Set by the enrichment tasks on re-runs

    company = relationship("Company", back_populates="enrichment_data")


class ImportLog(Base):
    """
    Logs bulk imports (e.g. from Excel lists).
    """
    __tablename__ = "import_logs"

    id = Column(Integer, primary_key=True)
    filename = Column(String)
    import_type = Column(String)  # "crm_dump" or "event_list"
    total_rows = Column(Integer)
    imported_rows = Column(Integer)
    duplicate_rows = Column(Integer)
    created_at = Column(DateTime, default=datetime.utcnow)


# ==============================================================================
# UTILS
# ==============================================================================

def init_db():
    Base.metadata.create_all(bind=engine)


def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()
56
company-explorer/backend/interfaces.py
Normal file
@@ -0,0 +1,56 @@
from abc import ABC, abstractmethod
from typing import List, Optional, Dict, Any
from pydantic import BaseModel

# --- Generic data model ---
# Keeps the app independent of how SuperOffice names its fields.
class LeadData(BaseModel):
    name: str
    website: Optional[str] = None
    city: Optional[str] = None
    country: str = "DE"
    industry: Optional[str] = None

    # Enrichment data
    robotics_potential_score: int = 0
    robotics_potential_reason: Optional[str] = None

    # Meta
    source_id: Optional[str] = None  # ID in the source system (e.g. the SuperOffice ID)


class TaskData(BaseModel):
    subject: str
    description: str
    deadline: Optional[str] = None


# --- The contract (repository interface) ---
class CRMRepository(ABC):
    """
    Abstract base class for all CRM integrations.
    Whether Notion, SuperOffice, or Odoo - every backend must implement these methods.
    """

    @abstractmethod
    def get_name(self) -> str:
        """Returns the name of the system (e.g. 'SuperOffice')."""
        pass

    @abstractmethod
    def find_company(self, name: str, email: Optional[str] = None) -> Optional[str]:
        """Searches for a company and returns its external ID if found."""
        pass

    @abstractmethod
    def create_lead(self, lead: LeadData) -> str:
        """Creates a new lead and returns its external ID."""
        pass

    @abstractmethod
    def update_lead(self, external_id: str, lead: LeadData) -> bool:
        """Updates an existing lead with new enrichment data."""
        pass

    @abstractmethod
    def create_task(self, external_id: str, task: TaskData) -> bool:
        """Creates a follow-up task for the sales rep on the given lead."""
        pass
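To see the repository contract in action, here is a trimmed-down in-memory implementation. It is deliberately simplified (plain strings instead of `LeadData`, fewer methods) so it runs standalone; the class names are illustrative:

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional

class CRMRepositorySketch(ABC):
    """Simplified version of the CRMRepository contract."""
    @abstractmethod
    def find_company(self, name: str) -> Optional[str]: ...
    @abstractmethod
    def create_lead(self, name: str) -> str: ...

class InMemoryRepo(CRMRepositorySketch):
    def __init__(self):
        self._store: Dict[str, str] = {}

    def find_company(self, name: str) -> Optional[str]:
        # Case-insensitive exact match, like the mock repository.
        for lead_id, stored in self._store.items():
            if stored.lower() == name.lower():
                return lead_id
        return None

    def create_lead(self, name: str) -> str:
        new_id = f"MEM_{len(self._store) + 1}"
        self._store[new_id] = name
        return new_id

repo: CRMRepositorySketch = InMemoryRepo()
lead_id = repo.create_lead("Acme Robotics GmbH")
```

With this shape, swapping the mock for the SuperOffice implementation is a one-line change at the call site.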
144
company-explorer/backend/lib/core_utils.py
Normal file
@@ -0,0 +1,144 @@
import time
import logging
import random
import os
import re
from functools import wraps
from typing import Optional, Union, List

# Try the new Google GenAI library (v1.0+)
try:
    from google import genai
    from google.genai import types
    HAS_NEW_GENAI = True
except ImportError:
    HAS_NEW_GENAI = False

# Fall back to the old library
try:
    import google.generativeai as old_genai
    HAS_OLD_GENAI = True
except ImportError:
    HAS_OLD_GENAI = False

from ..config import settings

logger = logging.getLogger(__name__)

# ==============================================================================
# 1. DECORATORS
# ==============================================================================

def retry_on_failure(max_retries: int = 3, delay: float = 2.0):
    """
    Decorator for retrying functions with exponential backoff.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            last_exception = None
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    last_exception = e
                    # Don't retry on certain fatal errors (can be extended).
                    # Match both spellings so the missing-GEMINI_API_KEY error is caught too.
                    if isinstance(e, ValueError) and ("API Key" in str(e) or "API_KEY" in str(e)):
                        raise e

                    wait_time = delay * (2 ** attempt) + random.uniform(0, 1)
                    logger.warning(f"Retry {attempt + 1}/{max_retries} for '{func.__name__}' after error: {e}. Waiting {wait_time:.1f}s")
                    time.sleep(wait_time)

            logger.error(f"Function '{func.__name__}' failed after {max_retries} attempts.")
            raise last_exception
        return wrapper
    return decorator

# ==============================================================================
# 2. TEXT TOOLS
# ==============================================================================

def clean_text(text: str) -> str:
    """Removes excess whitespace and control characters."""
    if not text:
        return ""
    text = str(text).strip()
    text = re.sub(r'\s+', ' ', text)
    return text


def normalize_string(s: str) -> str:
    """Basic normalization (lowercase, stripped)."""
    return s.lower().strip() if s else ""

# ==============================================================================
# 3. LLM WRAPPER (GEMINI)
# ==============================================================================

@retry_on_failure(max_retries=3)
def call_gemini(
    prompt: Union[str, List[str]],
    model_name: str = "gemini-2.0-flash",
    temperature: float = 0.3,
    json_mode: bool = False,
    system_instruction: Optional[str] = None
) -> str:
    """
    Unified caller for the Gemini API. Prefers the new `google.genai` library.
    """
    api_key = settings.GEMINI_API_KEY
    if not api_key:
        raise ValueError("GEMINI_API_KEY is missing in configuration.")

    # Option A: new library (google-genai)
    if HAS_NEW_GENAI:
        try:
            client = genai.Client(api_key=api_key)
            config = {
                "temperature": temperature,
                "top_p": 0.95,
                "top_k": 40,
                "max_output_tokens": 8192,
            }
            if json_mode:
                config["response_mime_type"] = "application/json"
            if system_instruction:
                config["system_instruction"] = system_instruction  # Was silently dropped on this path before

            response = client.models.generate_content(
                model=model_name,
                contents=[prompt] if isinstance(prompt, str) else prompt,
                config=config,
            )
            if not response.text:
                raise ValueError("Empty response from Gemini")
            return response.text.strip()
        except Exception as e:
            logger.error(f"Error with google-genai lib: {e}")
            if not HAS_OLD_GENAI:
                raise e
            # Fall through to Option B

    # Option B: old library (google-generativeai)
    if HAS_OLD_GENAI:
        try:
            old_genai.configure(api_key=api_key)
            generation_config = {
                "temperature": temperature,
                "top_p": 0.95,
                "top_k": 40,
                "max_output_tokens": 8192,
            }
            if json_mode:
                generation_config["response_mime_type"] = "application/json"

            model = old_genai.GenerativeModel(
                model_name=model_name,
                generation_config=generation_config,
                system_instruction=system_instruction
            )
            response = model.generate_content(prompt)
            return response.text.strip()
        except Exception as e:
            logger.error(f"Error with google-generativeai lib: {e}")
            raise e

    raise ImportError("No Google GenAI library installed (neither google-genai nor google-generativeai).")
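The backoff schedule used above is `delay * 2^attempt` plus jitter. The pattern can be sketched standalone (tiny delays so it runs instantly; the `flaky` function is a made-up test subject):

```python
import random
import time
from functools import wraps

def retry_on_failure(max_retries: int = 3, delay: float = 0.001):
    """Retry with exponential backoff plus jitter (standalone sketch)."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            last_exception = None
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    last_exception = e
                    # delay * 2^attempt, plus up to `delay` of jitter
                    time.sleep(delay * (2 ** attempt) + random.uniform(0, delay))
            raise last_exception
        return wrapper
    return decorator

calls = {"n": 0}

@retry_on_failure(max_retries=3, delay=0.001)
def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = flaky()  # fails twice, then succeeds on the third attempt
```

The jitter matters when many workers retry the same upstream API at once, since it spreads the retries out instead of hammering the service in lockstep.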
39
company-explorer/backend/lib/logging_setup.py
Normal file
@@ -0,0 +1,39 @@
import logging
import sys
import os
from logging.handlers import RotatingFileHandler
from ..config import settings


def setup_logging():
    log_file = os.path.join(settings.LOG_DIR, "company_explorer_debug.log")

    # Create formatter
    formatter = logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    )

    # File handler
    try:
        file_handler = RotatingFileHandler(log_file, maxBytes=10*1024*1024, backupCount=5)
        file_handler.setFormatter(formatter)
        file_handler.setLevel(logging.DEBUG)
    except Exception as e:
        print(f"FATAL: Could not create log file at {log_file}: {e}")
        return

    # Console handler
    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setFormatter(formatter)
    console_handler.setLevel(logging.INFO)  # Keep the console clean

    # Root logger config
    root_logger = logging.getLogger()
    root_logger.setLevel(logging.DEBUG)  # Catch ALL
    root_logger.handlers.clear()  # Avoid duplicate handlers when uvicorn reloads
    root_logger.addHandler(file_handler)
    root_logger.addHandler(console_handler)

    # Partially silence noisy libs
    logging.getLogger("uvicorn.access").setLevel(logging.INFO)
    logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)  # Set to DEBUG to see SQL queries!

    logging.info(f"Logging initialized. Writing to {log_file}")
42
company-explorer/backend/repositories/mock.py
Normal file
@@ -0,0 +1,42 @@
import logging
import uuid
from typing import Optional
from ..interfaces import CRMRepository, LeadData, TaskData

logger = logging.getLogger(__name__)


class MockRepository(CRMRepository):
    """
    Simulates a CRM. Use this for local dev or tests.
    Stores data in memory (lost on restart).
    """
    def __init__(self):
        self._store = {}

    def get_name(self) -> str:
        return "Local Mock CRM"

    def find_company(self, name: str, email: Optional[str] = None) -> Optional[str]:
        # Simple exact-match simulation
        for lead_id, lead in self._store.items():
            if lead.name.lower() == name.lower():
                logger.info(f"[MockCRM] Found existing company '{name}' with ID {lead_id}")
                return lead_id
        return None

    def create_lead(self, lead: LeadData) -> str:
        new_id = f"MOCK_{uuid.uuid4().hex[:8]}"
        self._store[new_id] = lead
        logger.info(f"[MockCRM] Created company '{lead.name}' (ID: {new_id}). Total records: {len(self._store)}")
        return new_id

    def update_lead(self, external_id: str, lead: LeadData) -> bool:
        if external_id in self._store:
            self._store[external_id] = lead
            logger.info(f"[MockCRM] Updated company {external_id} with robotics score: {lead.robotics_potential_score}")
            return True
        return False

    def create_task(self, external_id: str, task: TaskData) -> bool:
        logger.info(f"[MockCRM] 🔔 TASK CREATED for {external_id}: '{task.subject}'")
        return True
40
company-explorer/backend/repositories/superoffice.py
Normal file
@@ -0,0 +1,40 @@
import logging
import requests
from typing import Optional
from ..interfaces import CRMRepository, LeadData, TaskData
from ..config import settings

logger = logging.getLogger(__name__)


class SuperOfficeRepository(CRMRepository):
    def __init__(self, tenant_id: str, api_token: str):
        self.base_url = f"https://{tenant_id}.superoffice.com/api/v1"
        self.headers = {
            "Authorization": f"Bearer {api_token}",
            "Accept": "application/json"
        }

    def get_name(self) -> str:
        return "SuperOffice"

    def find_company(self, name: str, email: Optional[str] = None) -> Optional[str]:
        # TODO: Implement actual OData query
        # Example: GET /Contact?$filter=Name eq '{name}'
        logger.info(f"[SuperOffice] Searching for '{name}'...")
        return None

    def create_lead(self, lead: LeadData) -> str:
        logger.info(f"[SuperOffice] Creating Lead: {lead.name}")
        # TODO: POST /Contact
        # Payload mapping: lead.industry -> SuperOffice BusinessId
        return "SO_DUMMY_ID_123"

    def update_lead(self, external_id: str, lead: LeadData) -> bool:
        logger.info(f"[SuperOffice] Updating Lead {external_id} with Score {lead.robotics_potential_score}")
        # TODO: PUT /Contact/{id}
        # Write the robotics potential into a custom field (UserDefinedField), for example
        return True

    def create_task(self, external_id: str, task: TaskData) -> bool:
        logger.info(f"[SuperOffice] Creating Task for {external_id}: {task.subject}")
        return True
91
company-explorer/backend/scripts/import_legacy.py
Normal file
@@ -0,0 +1,91 @@
import sys
import os
import logging
from sqlalchemy.orm import Session

# Add paths to access legacy and new modules
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "../../../")))  # Root for legacy
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "../../")))     # Company Explorer root

# Legacy import
try:
    from _legacy_gsheets_system.google_sheet_handler import GoogleSheetHandler
    from _legacy_gsheets_system.config import Config as LegacyConfig
except ImportError as e:
    print(f"Failed to import legacy modules: {e}")
    sys.exit(1)

# New DB
from backend.database import SessionLocal, Company, init_db

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("LegacyImporter")


def migrate():
    logger.info("Starting migration from Google Sheets...")

    # 1. Connect to GSheets
    LegacyConfig.load_api_keys()  # Ensure keys are loaded
    try:
        handler = GoogleSheetHandler()
        df = handler.get_sheet_as_dataframe("CRM_Accounts")  # Assuming standard sheet name
    except Exception as e:
        logger.error(f"GSheet Connection failed: {e}")
        return

    if df is None or df.empty:
        logger.warning("No data found in sheet.")
        return

    logger.info(f"Found {len(df)} rows. Transforming...")

    # 2. Connect to new DB
    init_db()  # Ensure tables exist
    db = SessionLocal()

    count = 0
    skipped = 0

    try:
        for _, row in df.iterrows():
            name = str(row.get('CRM Name', '')).strip()
            if not name or name.lower() in ['nan', 'none', '']:
                continue

            # Check duplicate (simple check by name for migration)
            exists = db.query(Company).filter(Company.name == name).first()
            if exists:
                skipped += 1
                continue

            # Create company
            comp = Company(
                name=name,
                website=str(row.get('CRM Website', '')).strip() or None,
                crm_id=str(row.get('CRM ID', '')).strip() or None,
                city=str(row.get('CRM Ort', '')).strip() or None,
                country=str(row.get('CRM Land', 'DE')).strip(),
                status="IMPORTED"  # Mark as imported so we know to enrich them
            )

            # Map old industry if useful, otherwise leave blank for re-classification
            # comp.industry_ai = str(row.get('Chat Vorschlag Branche', ''))

            db.add(comp)
            count += 1

            if count % 100 == 0:
                logger.info(f"Committed {count}...")
                db.commit()

        db.commit()
        logger.info(f"Migration finished. Imported: {count}, Skipped: {skipped}")

    except Exception as e:
        logger.error(f"Migration error: {e}")
        db.rollback()
    finally:
        db.close()


if __name__ == "__main__":
    migrate()
77
company-explorer/backend/services/classification.py
Normal file
@@ -0,0 +1,77 @@
import json
import logging
import os
from typing import Dict, Any, List
from ..lib.core_utils import call_gemini
from ..config import settings

logger = logging.getLogger(__name__)

ALLOWED_INDUSTRIES_FILE = os.path.join(os.path.dirname(__file__), "../data/allowed_industries.json")


class ClassificationService:
    def __init__(self):
        self.allowed_industries = self._load_allowed_industries()

    def _load_allowed_industries(self) -> List[str]:
        try:
            with open(ALLOWED_INDUSTRIES_FILE, 'r', encoding='utf-8') as f:
                return json.load(f)
        except Exception as e:
            logger.error(f"Failed to load allowed industries: {e}")
            return ["Sonstige"]

    def analyze_robotics_potential(self, company_name: str, website_text: str) -> Dict[str, Any]:
        """
        Analyzes the company for robotics potential based on website content.
        Returns strict JSON.
        """
        if not website_text or len(website_text) < 100:
            return {"error": "Insufficient text content"}

        prompt = f"""
You are a Senior B2B Market Analyst for 'Roboplanet', a robotics distributor.
Your job is to analyze a target company based on their website text and determine their potential for using robots.

--- TARGET COMPANY ---
Name: {company_name}
Website Content (Excerpt):
{website_text[:15000]}

--- ALLOWED INDUSTRIES (STRICT) ---
You MUST assign the company to exactly ONE of these industries. If unsure, choose the closest match or "Sonstige".
{json.dumps(self.allowed_industries, ensure_ascii=False)}

--- ANALYSIS TASKS ---
1. **Industry Classification:** Pick one from the list.
2. **Robotics Potential Scoring (0-100):**
   - **Cleaning:** Does the company manage large floors, hospitals, hotels, or public spaces? (Keywords: Hygiene, Cleaning, SPA, Facility Management)
   - **Transport/Logistics:** Do they move goods internally? (Keywords: Warehouse, Intralogistics, Production line, Hospital logistics)
   - **Security:** Do they have large perimeters or night patrols? (Keywords: Werkschutz, Security, Monitoring)
   - **Service:** Do they interact with guests/patients? (Keywords: Reception, Restaurant, Nursing)
3. **Explanation:** A short, strategic reason for the scoring (German).

--- OUTPUT FORMAT (JSON ONLY) ---
{{
    "industry": "String (from list)",
    "summary": "Short business summary (German)",
    "potentials": {{
        "cleaning": {{ "score": 0-100, "reason": "..." }},
        "transport": {{ "score": 0-100, "reason": "..." }},
        "security": {{ "score": 0-100, "reason": "..." }},
        "service": {{ "score": 0-100, "reason": "..." }}
    }}
}}
"""

        try:
            response_text = call_gemini(
                prompt=prompt,
                json_mode=True,
                temperature=0.2  # Low temp for consistency
            )
            return json.loads(response_text)
        except Exception as e:
            logger.error(f"Classification failed: {e}")
            return {"error": str(e)}
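When the analysis task turns these 0-100 potential scores into `Signal` rows, it collapses each score into a coarse label. The thresholds are easiest to check as a pure function (`score_to_label` is an illustrative name, not a function in the codebase):

```python
def score_to_label(score: int) -> str:
    # Same thresholds the analysis task uses when writing Signal.value.
    if score > 70:
        return "High"
    if score > 30:
        return "Medium"
    return "Low"
```

Note the boundaries are exclusive: a score of exactly 70 is "Medium" and exactly 30 is "Low".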
209
company-explorer/backend/services/deduplication.py
Normal file
@@ -0,0 +1,209 @@
import logging
import re
from collections import Counter
from typing import List, Tuple, Dict, Any, Optional
from sqlalchemy.orm import Session
from sqlalchemy import select

# External libs (must be in requirements.txt)
from thefuzz import fuzz
from ..database import Company
from ..lib.core_utils import clean_text, normalize_string

logger = logging.getLogger(__name__)

# --- Configuration (Ported from Legacy) ---
SCORE_THRESHOLD = 80
SCORE_THRESHOLD_WEAK = 95
MIN_NAME_FOR_DOMAIN = 70
CITY_MISMATCH_PENALTY = 30
COUNTRY_MISMATCH_PENALTY = 40

STOP_TOKENS_BASE = {
    'gmbh', 'mbh', 'ag', 'kg', 'ug', 'ohg', 'se', 'co', 'kgaa', 'inc', 'llc', 'ltd', 'sarl',
    'holding', 'gruppe', 'group', 'international', 'solutions', 'solution', 'service', 'services',
    'deutschland', 'austria', 'germany', 'technik', 'technology', 'technologies', 'systems', 'systeme',
    'logistik', 'logistics', 'industries', 'industrie', 'management', 'consulting', 'vertrieb', 'handel',
    'company', 'gesellschaft', 'mbh&co', 'mbhco', 'werke', 'werk'
}

# ==============================================================================
# Helpers
# ==============================================================================

def _tokenize(s: str) -> List[str]:
    if not s:
        return []
    return re.split(r"[^a-z0-9]+", str(s).lower())


def split_tokens(name: str) -> List[str]:
    if not name:
        return []
    tokens = [t for t in _tokenize(name) if len(t) >= 3]
    return [t for t in tokens if t not in STOP_TOKENS_BASE]


def clean_name_for_scoring(norm_name: str) -> Tuple[str, set]:
    toks = split_tokens(norm_name)
    return " ".join(toks), set(toks)


# ==============================================================================
# Core Deduplication Logic
# ==============================================================================

class Deduplicator:
    def __init__(self, db: Session):
        self.db = db
        self.reference_data = []  # Cache for DB records
        self.domain_index = {}
        self.token_freq = Counter()
        self.token_index = {}
        self._load_reference_data()

    def _load_reference_data(self):
        """
        Loads a minimal dataset from the DB into RAM for fast fuzzy matching.
        Optimized for 10k-50k records.
        """
        logger.info("Loading reference data for deduplication...")
        query = self.db.query(Company.id, Company.name, Company.website, Company.city, Company.country)
        companies = query.all()

        for c in companies:
            norm_name = normalize_string(c.name)
            norm_domain = normalize_string(c.website)  # Simplified; should extract the bare domain

            record = {
                'id': c.id,
                'name': c.name,
                'normalized_name': norm_name,
                'normalized_domain': norm_domain,
                'city': normalize_string(c.city),
                'country': normalize_string(c.country)
            }
            self.reference_data.append(record)

            # Build indexes
            if norm_domain:
                self.domain_index.setdefault(norm_domain, []).append(record)

            # Token frequency
            _, toks = clean_name_for_scoring(norm_name)
            for t in toks:
                self.token_freq[t] += 1
                self.token_index.setdefault(t, []).append(record)

        logger.info(f"Loaded {len(self.reference_data)} records for deduplication.")

    def _choose_rarest_token(self, norm_name: str) -> Optional[str]:
        _, toks = clean_name_for_scoring(norm_name)
        if not toks:
            return None
        # Sort by frequency (asc), then length (desc)
        lst = sorted(toks, key=lambda x: (self.token_freq.get(x, 10**9), -len(x)))
        return lst[0] if lst else None

    def find_duplicates(self, candidate: Dict[str, Any]) -> List[Dict[str, Any]]:
        """
        Checks a single candidate against the loaded index.
        Returns a list of matches with score >= threshold.
        """
        # Prepare candidate
        c_norm_name = normalize_string(candidate.get('name', ''))
        c_norm_domain = normalize_string(candidate.get('website', ''))
        c_city = normalize_string(candidate.get('city', ''))
        c_country = normalize_string(candidate.get('country', ''))

        candidates_to_check = {}  # Map ID -> Record

        # 1. Domain match (fastest)
        if c_norm_domain and c_norm_domain in self.domain_index:
            for r in self.domain_index[c_norm_domain]:
                candidates_to_check[r['id']] = r

        # 2. Rarest-token match (blocking)
        rtok = self._choose_rarest_token(c_norm_name)
        if rtok and rtok in self.token_index:
            for r in self.token_index[rtok]:
                candidates_to_check[r['id']] = r

        if not candidates_to_check:
            return []

        # 3. Scoring
        matches = []
        for db_rec in candidates_to_check.values():
            score, details = self._calculate_similarity(
                cand={'n': c_norm_name, 'd': c_norm_domain, 'c': c_city, 'ct': c_country},
                ref=db_rec
            )

            # Threshold logic: demand a higher score when neither domain nor location corroborates
            is_weak = (details['domain_match'] == 0 and not details['loc_match'])
            threshold = SCORE_THRESHOLD_WEAK if is_weak else SCORE_THRESHOLD

            if score >= threshold:
                matches.append({
                    'company_id': db_rec['id'],
                    'name': db_rec['name'],
                    'score': score,
                    'details': details
                })

        matches.sort(key=lambda x: x['score'], reverse=True)
        return matches

    def _calculate_similarity(self, cand, ref):
        # Data prep
        n1, n2 = cand['n'], ref['normalized_name']

        # Exact-name shortcut
        if n1 and n1 == n2:
            return 100, {'exact': True, 'domain_match': 0, 'loc_match': 0}

        # Domain
        d1, d2 = cand['d'], ref['normalized_domain']
        domain_match = 1 if (d1 and d2 and d1 == d2) else 0

        # Location
        city_match = 1 if (cand['c'] and ref['city'] and cand['c'] == ref['city']) else 0
        country_match = 1 if (cand['ct'] and ref['country'] and cand['ct'] == ref['country']) else 0
        loc_match = city_match and country_match

        # Name fuzzy score
        clean1, _ = clean_name_for_scoring(n1)
        clean2, _ = clean_name_for_scoring(n2)

        if clean1 and clean2:
            ts = fuzz.token_set_ratio(clean1, clean2)
            pr = fuzz.partial_ratio(clean1, clean2)
            ss = fuzz.token_sort_ratio(clean1, clean2)
            name_score = max(ts, pr, ss)
        else:
            name_score = 0

        # Penalties
        penalties = 0
        if cand['ct'] and ref['country'] and not country_match:
            penalties += COUNTRY_MISMATCH_PENALTY
        if cand['c'] and ref['city'] and not city_match:
            penalties += CITY_MISMATCH_PENALTY

        # Final calculation
        # Base weights: a domain match is decisive (100); otherwise the name score carries
        if domain_match:
            total = 100
        else:
            total = name_score

        if loc_match:
            total += 10  # Bonus

        total -= penalties

        # Cap to [0, 100]
        total = min(100, max(0, total))

        return total, {
            'name_score': name_score,
            'domain_match': domain_match,
            'loc_match': loc_match,
            'penalties': penalties
        }
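The key trick in `find_duplicates` is blocking: instead of fuzzy-matching every candidate against all 10k-50k records, it only scores records that share the candidate's rarest name token. A self-contained sketch of that idea, using stdlib `difflib` as a stand-in for `thefuzz` (the names `tokens`, `find_candidates`, and the sample data are illustrative, not part of the service):

```python
import re
from collections import Counter
from difflib import SequenceMatcher  # stand-in for thefuzz in this sketch

STOP = {'gmbh', 'ag', 'group', 'holding'}

def tokens(name: str) -> set:
    toks = re.split(r"[^a-z0-9]+", name.lower())
    return {t for t in toks if len(t) >= 3 and t not in STOP}

reference = ["Acme Robotics GmbH", "Beta Cleaning Group", "Gamma Logistics AG"]

# Build the token index and frequency table, as Deduplicator._load_reference_data does
freq = Counter()
index = {}
for ref_name in reference:
    for t in tokens(ref_name):
        freq[t] += 1
        index.setdefault(t, []).append(ref_name)

def find_candidates(name: str):
    """Blocking: fuzzy-score only the records sharing the rarest token."""
    toks = sorted(tokens(name), key=lambda t: (freq.get(t, 10**9), -len(t)))
    if not toks:
        return []
    block = index.get(toks[0], [])
    return sorted(
        ((ref, round(SequenceMatcher(None, name.lower(), ref.lower()).ratio() * 100)) for ref in block),
        key=lambda x: x[1], reverse=True
    )

print(find_candidates("ACME Robotics"))
```

With a roughly uniform token distribution, the blocked set stays small, so the expensive fuzzy comparison runs on a handful of records rather than the full table.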
126
company-explorer/backend/services/discovery.py
Normal file
@@ -0,0 +1,126 @@
import logging
import requests
import re
from typing import Optional, Dict, Tuple
from urllib.parse import urlparse
from ..config import settings
from ..lib.core_utils import retry_on_failure, normalize_string

logger = logging.getLogger(__name__)

# Domains to ignore when looking for the official company homepage
BLACKLIST_DOMAINS = {
    "linkedin.com", "xing.com", "facebook.com", "instagram.com", "twitter.com",
    "northdata.de", "northdata.com", "firmenwissen.de", "creditreform.de",
    "dnb.com", "kompass.com", "wer-zu-wem.de", "kununu.com", "glassdoor.com",
    "stepstone.de", "indeed.com", "monster.de", "youtube.com", "wikipedia.org"
}


class DiscoveryService:
    def __init__(self):
        self.api_key = settings.SERP_API_KEY
        if not self.api_key:
            logger.warning("SERP_API_KEY not set. Discovery features will fail.")

    @retry_on_failure(max_retries=2)
    def find_company_website(self, company_name: str, city: Optional[str] = None) -> str:
        """
        Uses Google Search via SerpAPI to find the most likely official homepage.
        Returns "k.A." if nothing credible is found.
        """
        if not self.api_key:
            return "k.A."

        query = f"{company_name} offizielle Website"
        if city:
            query += f" {city}"

        logger.info(f"Searching website for: {query}")

        try:
            params = {
                "engine": "google",
                "q": query,
                "api_key": self.api_key,
                "num": 5,
                "gl": "de",
                "hl": "de"
            }
            response = requests.get("https://serpapi.com/search", params=params, timeout=15)
            response.raise_for_status()
            data = response.json()

            if "organic_results" not in data:
                return "k.A."

            for result in data["organic_results"]:
                link = result.get("link", "")
                if self._is_credible_url(link):
                    # Simple heuristic: if the company name is part of the domain, high confidence.
                    # Otherwise, take the first credible result.
                    return link

            return "k.A."

        except Exception as e:
            logger.error(f"SerpAPI error: {e}")
            return "k.A."

    @retry_on_failure(max_retries=2)
    def find_wikipedia_url(self, company_name: str) -> str:
        """
        Searches for a matching German Wikipedia article.
        """
        if not self.api_key:
            return "k.A."

        query = f"{company_name} Wikipedia"

        try:
            params = {
                "engine": "google",
                "q": query,
                "api_key": self.api_key,
                "num": 3,
                "gl": "de",
                "hl": "de"
            }
            response = requests.get("https://serpapi.com/search", params=params, timeout=15)
            response.raise_for_status()
            data = response.json()

            for result in data.get("organic_results", []):
                link = result.get("link", "")
                if "de.wikipedia.org/wiki/" in link:
                    # Basic validation: is the title roughly the company name?
                    title = result.get("title", "").replace(" – Wikipedia", "")
                    if self._check_name_similarity(company_name, title):
                        return link

            return "k.A."

        except Exception as e:
            logger.error(f"Wiki search error: {e}")
            return "k.A."

    def _is_credible_url(self, url: str) -> bool:
        """Filters out social media, directories, and junk."""
        if not url:
            return False
        try:
            domain = urlparse(url).netloc.lower().replace("www.", "")
            if domain in BLACKLIST_DOMAINS:
                return False
            # Also reject subdomains of blacklisted hosts (e.g. de.linkedin.com)
            for bad in BLACKLIST_DOMAINS:
                if domain.endswith("." + bad):
                    return False
            return True
        except Exception:
            return False

    def _check_name_similarity(self, name1: str, name2: str) -> bool:
        """Simple fuzzy check for validation."""
        n1 = normalize_string(name1)
        n2 = normalize_string(name2)
        # Very permissive: accept if one string is contained in the other
        return n1 in n2 or n2 in n1
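The credibility filter is pure stdlib and easy to exercise in isolation. A minimal standalone sketch of the same host-plus-subdomain check (with a trimmed blacklist for brevity):

```python
from urllib.parse import urlparse

BLACKLIST = {"linkedin.com", "wikipedia.org"}

def is_credible(url: str) -> bool:
    """Reject blacklisted hosts and any of their subdomains."""
    if not url:
        return False
    domain = urlparse(url).netloc.lower().replace("www.", "")
    if domain in BLACKLIST:
        return False
    return not any(domain.endswith("." + bad) for bad in BLACKLIST)

print(is_credible("https://www.acme-robotics.de/about"))    # True
print(is_credible("https://de.linkedin.com/company/acme"))  # False
```

The `endswith("." + bad)` suffix check is what catches regional mirrors like `de.linkedin.com` that a plain set lookup would miss.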
82
company-explorer/backend/services/scraping.py
Normal file
@@ -0,0 +1,82 @@
import logging
import requests
import random
import re
from bs4 import BeautifulSoup
from typing import Optional, Dict
from ..lib.core_utils import clean_text, retry_on_failure

logger = logging.getLogger(__name__)

USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0'
]


class ScraperService:
    def __init__(self, timeout: int = 15):
        self.timeout = timeout

    @retry_on_failure(max_retries=2)
    def scrape_url(self, url: str) -> Dict[str, str]:
        """
        Fetches a URL and returns cleaned text content plus meta info.
        """
        if not url.startswith("http"):
            url = "https://" + url

        try:
            headers = {'User-Agent': random.choice(USER_AGENTS)}
            # verify=False is risky but often needed for poorly configured corporate sites
            response = requests.get(url, headers=headers, timeout=self.timeout, verify=False)
            response.raise_for_status()

            # Check content type
            content_type = response.headers.get('Content-Type', '').lower()
            if 'text/html' not in content_type:
                logger.warning(f"Skipping non-HTML content for {url}: {content_type}")
                return {"error": "Not HTML"}

            return self._parse_html(response.content)

        except requests.exceptions.SSLError:
            # Fall back to plain HTTP if HTTPS fails
            if url.startswith("https://"):
                logger.info(f"SSL failed for {url}, retrying with http://...")
                return self.scrape_url(url.replace("https://", "http://"))
            raise
        except Exception as e:
            logger.error(f"Scraping failed for {url}: {e}")
            return {"error": str(e)}

    def _parse_html(self, html_content: bytes) -> Dict[str, str]:
        soup = BeautifulSoup(html_content, 'html.parser')

        # 1. Remove junk elements
        for element in soup(['script', 'style', 'noscript', 'iframe', 'svg', 'header', 'footer', 'nav', 'aside', 'form', 'button']):
            element.decompose()

        # 2. Extract title & meta description
        title = soup.title.string if soup.title else ""
        meta_desc = ""
        meta_tag = soup.find('meta', attrs={'name': 'description'})
        if meta_tag:
            meta_desc = meta_tag.get('content', '')

        # 3. Extract main text (prefer body, fall back to full soup)
        body = soup.find('body')
        raw_text = body.get_text(separator=' ', strip=True) if body else soup.get_text(separator=' ', strip=True)

        cleaned_text = clean_text(raw_text)

        # 4. Extract emails (basic regex)
        emails = set(re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', raw_text))

        return {
            "title": clean_text(title),
            "description": clean_text(meta_desc),
            "text": cleaned_text[:25000],  # Limit to avoid context overflow
            "emails": list(emails)[:5]  # Keep at most 5
        }
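The email extraction in `_parse_html` is a single regex pass over the raw page text. Run in isolation, the same pattern behaves like this (sample text is illustrative):

```python
import re

# Same pattern as in ScraperService._parse_html
EMAIL_RE = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

page_text = "Kontakt: info@acme-robotics.de oder vertrieb@acme-robotics.de. Impressum folgt."
emails = sorted(set(re.findall(EMAIL_RE, page_text)))
print(emails)  # ['info@acme-robotics.de', 'vertrieb@acme-robotics.de']
```

The `set` deduplicates repeated addresses (footers often repeat the contact email on every page), and the service then truncates the result to five entries.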
103
company-explorer/backend/services/sync.py
Normal file
@@ -0,0 +1,103 @@
import os
import logging
from typing import Optional
from sqlalchemy.orm import Session
from ..database import Company
from ..interfaces import LeadData, TaskData, CRMRepository
from ..repositories.mock import MockRepository
from ..repositories.superoffice import SuperOfficeRepository
from ..config import settings

logger = logging.getLogger(__name__)


class CRMFactory:
    _instance: Optional[CRMRepository] = None

    @classmethod
    def get_repository(cls) -> CRMRepository:
        if cls._instance:
            return cls._instance

        crm_type = os.getenv("CRM_TYPE", "MOCK").upper()

        if crm_type == "SUPEROFFICE":
            # Load credentials from settings/env
            tenant = os.getenv("SO_TENANT_ID", "")
            token = os.getenv("SO_API_TOKEN", "")
            logger.info("Initializing SuperOffice repository...")
            cls._instance = SuperOfficeRepository(tenant, token)
        else:
            logger.info("Initializing mock repository (default)...")
            cls._instance = MockRepository()

        return cls._instance


class SyncService:
    def __init__(self, db: Session):
        self.db = db
        self.repo = CRMFactory.get_repository()

    def sync_company(self, company_id: int) -> dict:
        """
        Pushes a local company to the external CRM.
        """
        local_company = self.db.query(Company).filter(Company.id == company_id).first()
        if not local_company:
            return {"error": "Company not found"}

        # 1. Map data: extract the highest robotics potential score
        max_score = 0
        reason = ""
        for sig in local_company.signals:
            if sig.confidence > max_score:
                max_score = int(sig.confidence)
                reason = f"{sig.signal_type} ({sig.value})"

        lead_data = LeadData(
            name=local_company.name,
            website=local_company.website,
            city=local_company.city,
            country=local_company.country,
            industry=local_company.industry_ai,  # Suggest the AI-classified industry
            robotics_potential_score=max_score,
            robotics_potential_reason=reason
        )

        # 2. Check if already linked
        external_id = local_company.crm_id

        # 3. If not linked yet, search the CRM by name
        if not external_id:
            external_id = self.repo.find_company(local_company.name)

        action = "none"
        if external_id:
            # Update
            success = self.repo.update_lead(external_id, lead_data)
            if success:
                action = "updated"
                # If found via search, link it locally
                if not local_company.crm_id:
                    local_company.crm_id = external_id
                    self.db.commit()
        else:
            # Create
            new_id = self.repo.create_lead(lead_data)
            if new_id:
                action = "created"
                local_company.crm_id = new_id
                self.db.commit()

                # Create a task for the sales rep if the potential is high
                if max_score > 70:
                    self.repo.create_task(new_id, TaskData(
                        subject="🔥 Hot Robotics Lead",
                        description=f"AI detected high potential ({max_score}%). Reason: {reason}. Please check website."
                    ))

        return {
            "status": "success",
            "action": action,
            "crm": self.repo.get_name(),
            "external_id": local_company.crm_id
        }
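Backend selection in `CRMFactory` is an env-driven factory with a process-lifetime singleton. A stripped-down, dependency-free sketch of the same pattern (the repository classes here are dummies standing in for the real ones):

```python
import os

class MockRepository:
    def get_name(self):
        return "MOCK"

class SuperOfficeRepository:
    def __init__(self, tenant, token):
        self.tenant, self.token = tenant, token
    def get_name(self):
        return "SUPEROFFICE"

class CRMFactory:
    _instance = None

    @classmethod
    def get_repository(cls):
        # Cached singleton: the first call decides the backend for the process lifetime
        if cls._instance:
            return cls._instance
        if os.getenv("CRM_TYPE", "MOCK").upper() == "SUPEROFFICE":
            cls._instance = SuperOfficeRepository(
                os.getenv("SO_TENANT_ID", ""), os.getenv("SO_API_TOKEN", "")
            )
        else:
            cls._instance = MockRepository()
        return cls._instance

os.environ.pop("CRM_TYPE", None)  # unset -> default backend
repo = CRMFactory.get_repository()
print(repo.get_name())  # MOCK
```

One consequence of the caching: changing `CRM_TYPE` after the first `get_repository()` call has no effect until the process restarts.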
12
company-explorer/frontend/index.html
Normal file
@@ -0,0 +1,12 @@
<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Company Explorer (Robotics)</title>
  </head>
  <body class="bg-slate-950 text-slate-100">
    <div id="root"></div>
    <script type="module" src="/src/main.tsx"></script>
  </body>
</html>
31
company-explorer/frontend/package.json
Normal file
@@ -0,0 +1,31 @@
{
  "name": "company-explorer-frontend",
  "private": true,
  "version": "0.1.0",
  "type": "module",
  "scripts": {
    "dev": "vite",
    "build": "tsc && vite build",
    "preview": "vite preview"
  },
  "dependencies": {
    "@tanstack/react-table": "^8.10.7",
    "axios": "^1.6.2",
    "clsx": "^2.0.0",
    "lucide-react": "^0.294.0",
    "react": "^18.2.0",
    "react-dom": "^18.2.0",
    "tailwind-merge": "^2.1.0"
  },
  "devDependencies": {
    "@types/node": "^20.10.4",
    "@types/react": "^18.2.43",
    "@types/react-dom": "^18.2.17",
    "@vitejs/plugin-react": "^4.2.1",
    "autoprefixer": "^10.4.16",
    "postcss": "^8.4.32",
    "tailwindcss": "^3.3.6",
    "typescript": "^5.3.3",
    "vite": "^5.0.8"
  }
}
6
company-explorer/frontend/postcss.config.js
Normal file
@@ -0,0 +1,6 @@
export default {
  plugins: {
    tailwindcss: {},
    autoprefixer: {},
  },
}
116
company-explorer/frontend/src/App.tsx
Normal file
@@ -0,0 +1,116 @@
import { useState, useEffect } from 'react'
import axios from 'axios'
import { CompanyTable } from './components/CompanyTable'
import { ImportWizard } from './components/ImportWizard'
import { Inspector } from './components/Inspector' // NEW
import { LayoutDashboard, UploadCloud, Search, RefreshCw } from 'lucide-react'

// Base URL detection (production vs. dev)
const API_BASE = import.meta.env.BASE_URL === '/ce/' ? '/ce/api' : '/api';

interface Stats {
  total: number;
}

function App() {
  const [stats, setStats] = useState<Stats>({ total: 0 })
  const [refreshKey, setRefreshKey] = useState(0)
  const [isImportOpen, setIsImportOpen] = useState(false)
  const [selectedCompanyId, setSelectedCompanyId] = useState<number | null>(null) // NEW

  const fetchStats = async () => {
    try {
      const res = await axios.get(`${API_BASE}/companies?limit=1`)
      setStats({ total: res.data.total })
    } catch (e) {
      console.error("Failed to fetch stats", e)
    }
  }

  useEffect(() => {
    fetchStats()
  }, [refreshKey])

  const handleCompanySelect = (id: number) => {
    setSelectedCompanyId(id)
  }

  const handleCloseInspector = () => {
    setSelectedCompanyId(null)
  }

  return (
    <div className="min-h-screen bg-slate-950 text-slate-200 font-sans">
      <ImportWizard
        isOpen={isImportOpen}
        onClose={() => setIsImportOpen(false)}
        apiBase={API_BASE}
        onSuccess={() => setRefreshKey(k => k + 1)}
      />

      {/* Inspector sidebar */}
      <Inspector
        companyId={selectedCompanyId}
        onClose={handleCloseInspector}
        apiBase={API_BASE}
      />

      {/* Header */}
      <header className="border-b border-slate-800 bg-slate-900/50 sticky top-0 z-10 backdrop-blur-md">
        <div className="max-w-7xl mx-auto px-4 sm:px-6 lg:px-8 h-16 flex items-center justify-between">
          <div className="flex items-center gap-3">
            <div className="p-2 bg-blue-600 rounded-lg">
              <LayoutDashboard className="h-6 w-6 text-white" />
            </div>
            <div>
              <h1 className="text-xl font-bold text-white tracking-tight">Company Explorer</h1>
              <p className="text-xs text-blue-400 font-medium">ROBOTICS EDITION <span className="text-slate-600 ml-2">v0.2.2 (New DB Path)</span></p>
            </div>
          </div>

          <div className="flex items-center gap-4">
            <div className="text-sm text-slate-400">
              <span className="text-white font-bold">{stats.total}</span> Companies
            </div>

            <button
              onClick={() => setRefreshKey(k => k + 1)}
              className="p-2 hover:bg-slate-800 rounded-full transition-colors text-slate-400 hover:text-white"
              title="Refresh Data"
            >
              <RefreshCw className="h-5 w-5" />
            </button>

            <button
              className="flex items-center gap-2 bg-blue-600 hover:bg-blue-500 text-white px-4 py-2 rounded-md font-medium text-sm transition-all shadow-lg shadow-blue-900/20"
              onClick={() => setIsImportOpen(true)}
            >
              <UploadCloud className="h-4 w-4" />
              Import List
            </button>
          </div>
        </div>
      </header>

      {/* Main content */}
      <main className="max-w-7xl mx-auto px-4 sm:px-6 lg:px-8 py-8">
        <div className="mb-6 flex gap-4">
          <div className="relative flex-1 max-w-md">
            <Search className="absolute left-3 top-2.5 h-5 w-5 text-slate-500" />
            <input
              type="text"
              placeholder="Search companies..."
              className="w-full bg-slate-900 border border-slate-700 text-slate-200 rounded-md pl-10 pr-4 py-2 focus:ring-2 focus:ring-blue-500 focus:border-transparent outline-none"
            />
          </div>
        </div>

        <div className="bg-slate-900 border border-slate-800 rounded-xl overflow-hidden shadow-xl">
          <CompanyTable key={refreshKey} apiBase={API_BASE} onRowClick={handleCompanySelect} /> {/* NEW PROP */}
        </div>
      </main>
    </div>
  )
}

export default App
205
company-explorer/frontend/src/components/CompanyTable.tsx
Normal file
@@ -0,0 +1,205 @@
import { useState, useEffect, useMemo } from 'react'
import {
  useReactTable,
  getCoreRowModel,
  flexRender,
  createColumnHelper,
} from '@tanstack/react-table'
import axios from 'axios'
import { Play, Globe, AlertCircle, Search as SearchIcon, Loader2 } from 'lucide-react'
import clsx from 'clsx'

type Company = {
  id: number
  name: string
  city: string | null
  country: string
  website: string | null
  status: string
  industry_ai: string | null
}

const columnHelper = createColumnHelper<Company>()

interface CompanyTableProps {
  apiBase: string
  onRowClick: (companyId: number) => void // NEW PROP
}

export function CompanyTable({ apiBase, onRowClick }: CompanyTableProps) {
  const [data, setData] = useState<Company[]>([])
  const [loading, setLoading] = useState(true)
  const [processingId, setProcessingId] = useState<number | null>(null)

  const fetchData = async () => {
    setLoading(true)
    try {
      const res = await axios.get(`${apiBase}/companies?limit=100`)
      setData(res.data.items)
    } catch (e) {
      console.error(e)
    } finally {
      setLoading(false)
    }
  }

  useEffect(() => {
    fetchData()
  }, [])

  const triggerDiscovery = async (id: number) => {
    setProcessingId(id)
    try {
      await axios.post(`${apiBase}/enrich/discover`, { company_id: id })
      // Refresh shortly afterwards so the discovered data shows up
      setTimeout(fetchData, 2000)
    } catch (e) {
      alert("Discovery Error")
      setProcessingId(null)
    }
  }

  const triggerAnalysis = async (id: number) => {
    setProcessingId(id)
    try {
      await axios.post(`${apiBase}/enrich/analyze`, { company_id: id })
      setTimeout(fetchData, 2000)
    } catch (e) {
      alert("Analysis Error")
      setProcessingId(null)
    }
  }

  const columns = useMemo(() => [
    columnHelper.accessor('name', {
      header: 'Company',
      cell: info => <span className="font-semibold text-white">{info.getValue()}</span>,
    }),
    columnHelper.accessor('city', {
      header: 'Location',
      cell: info => (
        <div className="text-slate-400 text-sm">
          {info.getValue() || '-'} <span className="text-slate-600">({info.row.original.country})</span>
        </div>
      ),
    }),
    columnHelper.accessor('website', {
      header: 'Website',
      cell: info => {
        const url = info.getValue()
        if (url && url !== "k.A.") {
          return (
            <a href={url} target="_blank" rel="noreferrer" className="flex items-center gap-1 text-blue-400 hover:underline text-sm">
              <Globe className="h-3 w-3" /> {new URL(url).hostname.replace('www.', '')}
            </a>
          )
        }
        return <span className="text-slate-600 text-sm italic">Not found</span>
      },
    }),
    columnHelper.accessor('status', {
      header: 'Status',
      cell: info => {
        const s = info.getValue()
        return (
          <span className={clsx(
            "px-2 py-0.5 rounded-full text-[10px] font-bold uppercase tracking-wider",
            s === 'NEW' && "bg-slate-800 text-slate-400 border border-slate-700",
            s === 'DISCOVERED' && "bg-blue-500/10 text-blue-400 border border-blue-500/20",
            s === 'ENRICHED' && "bg-green-500/10 text-green-400 border border-green-500/20",
          )}>
            {s}
          </span>
        )
      }
    }),
    columnHelper.display({
      id: 'actions',
      header: '',
      cell: info => {
        const c = info.row.original
        const isProcessing = processingId === c.id

        if (isProcessing) {
          return <Loader2 className="h-4 w-4 animate-spin text-blue-500" />
        }

        // Action logic: no website yet -> offer discovery
        if (c.status === 'NEW' || !c.website || c.website === "k.A.") {
          return (
            <button
              onClick={(e) => { e.stopPropagation(); triggerDiscovery(c.id); }}
              className="flex items-center gap-1 px-2 py-1 bg-slate-800 hover:bg-slate-700 text-xs font-medium text-slate-300 rounded border border-slate-700 transition-colors"
              title="Search Website & Wiki"
            >
              <SearchIcon className="h-3 w-3" /> Find
            </button>
          )
        }

        // Ready for analysis
        return (
          <button
            onClick={(e) => { e.stopPropagation(); triggerAnalysis(c.id); }}
            className="flex items-center gap-1 px-2 py-1 bg-blue-600/10 hover:bg-blue-600/20 text-blue-400 text-xs font-medium rounded border border-blue-500/20 transition-colors"
            title="Run AI Analysis"
          >
            <Play className="h-3 w-3 fill-current" /> Analyze
          </button>
        )
      }
    })
  ], [processingId])

  const table = useReactTable({
    data,
    columns,
    getCoreRowModel: getCoreRowModel(),
  })

  if (loading && data.length === 0) return <div className="p-8 text-center text-slate-500">Loading companies...</div>

  if (data.length === 0) return (
    <div className="p-12 text-center">
      <div className="inline-block p-4 bg-slate-800 rounded-full mb-4">
        <AlertCircle className="h-8 w-8 text-slate-500" />
      </div>
      <h3 className="text-lg font-medium text-white">No companies found</h3>
      <p className="text-slate-400 mt-2">Import a list to get started.</p>
    </div>
  )

  return (
    <div className="overflow-x-auto">
      <table className="w-full text-left border-collapse">
        <thead>
          {table.getHeaderGroups().map(headerGroup => (
            <tr key={headerGroup.id} className="border-b border-slate-800 bg-slate-900/50">
              {headerGroup.headers.map(header => (
                <th key={header.id} className="p-4 text-xs font-medium text-slate-500 uppercase tracking-wider">
                  {flexRender(header.column.columnDef.header, header.getContext())}
                </th>
              ))}
            </tr>
          ))}
        </thead>
        <tbody className="divide-y divide-slate-800/50">
          {table.getRowModel().rows.map(row => (
            <tr
              key={row.id}
              onClick={() => onRowClick(row.original.id)} // NEW: clicking a row opens the Inspector
              className="hover:bg-slate-800/30 transition-colors cursor-pointer"
            >
              {row.getVisibleCells().map(cell => (
                <td key={cell.id} className="p-4 align-middle">
                  {flexRender(cell.column.columnDef.cell, cell.getContext())}
                </td>
              ))}
            </tr>
          ))}
        </tbody>
      </table>
    </div>
  )
}
85
company-explorer/frontend/src/components/ImportWizard.tsx
Normal file
@@ -0,0 +1,85 @@
import { useState } from 'react'
import axios from 'axios'
import { X, UploadCloud } from 'lucide-react'

interface ImportWizardProps {
  isOpen: boolean
  onClose: () => void
  onSuccess: () => void
  apiBase: string
}

export function ImportWizard({ isOpen, onClose, onSuccess, apiBase }: ImportWizardProps) {
  const [text, setText] = useState("")
  const [loading, setLoading] = useState(false)

  if (!isOpen) return null

  const handleImport = async () => {
    const lines = text.split('\n').map(l => l.trim()).filter(l => l.length > 0)
    if (lines.length === 0) return

    setLoading(true)
    try {
      await axios.post(`${apiBase}/companies/bulk`, { names: lines })
      setText("")
      onSuccess()
      onClose()
    } catch (e: any) {
      console.error(e)
      const msg = e.response?.data?.detail || e.message || "Unknown Error"
      alert(`Import failed: ${msg}`)
    } finally {
      setLoading(false)
    }
  }

  return (
    <div className="fixed inset-0 bg-black/70 backdrop-blur-sm z-50 flex items-center justify-center p-4">
      <div className="bg-slate-900 border border-slate-700 rounded-xl w-full max-w-lg shadow-2xl">

        {/* Header */}
        <div className="flex items-center justify-between p-4 border-b border-slate-800">
          <h3 className="text-lg font-semibold text-white flex items-center gap-2">
            <UploadCloud className="h-5 w-5 text-blue-400" />
            Quick Import
          </h3>
          <button onClick={onClose} className="text-slate-400 hover:text-white">
            <X className="h-5 w-5" />
          </button>
        </div>

        {/* Body */}
        <div className="p-4 space-y-4">
          <p className="text-sm text-slate-400">
            Paste company names below (one per line). Duplicates in the database will be skipped automatically.
          </p>
          <textarea
            className="w-full h-64 bg-slate-950 border border-slate-700 rounded-lg p-3 text-sm text-slate-200 focus:ring-2 focus:ring-blue-600 outline-none font-mono"
            placeholder={"Company A\nCompany B\nCompany C..."}
            value={text}
            onChange={e => setText(e.target.value)}
          />
        </div>

        {/* Footer */}
        <div className="p-4 border-t border-slate-800 flex justify-end gap-3">
          <button
            onClick={onClose}
            className="px-4 py-2 text-sm font-medium text-slate-400 hover:text-white"
          >
            Cancel
          </button>
          <button
            onClick={handleImport}
            disabled={loading || !text.trim()}
            className="px-4 py-2 bg-blue-600 hover:bg-blue-500 text-white rounded-md text-sm font-medium disabled:opacity-50 disabled:cursor-not-allowed"
          >
            {loading ? "Importing..." : "Import Companies"}
          </button>
        </div>

      </div>
    </div>
  )
}
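The wizard above posts the pasted names to `/companies/bulk` and relies on the backend to skip duplicates. The actual endpoint lives in the backend sources, which are not part of this hunk; as a hedged sketch, assuming hypothetical helpers `normalize_name` and `select_new_names`, the skip logic amounts to a case-insensitive, whitespace-normalized comparison against existing rows plus in-batch dedup:

```python
def normalize_name(name: str) -> str:
    """Collapse whitespace and case so 'ACME  GmbH' matches 'Acme GmbH'."""
    return " ".join(name.split()).lower()

def select_new_names(candidates: list[str], existing: set[str]) -> list[str]:
    """Return candidates not yet present (by normalized name), deduped within the batch."""
    seen = {normalize_name(n) for n in existing}
    fresh = []
    for name in candidates:
        key = normalize_name(name)
        if key and key not in seen:
            seen.add(key)
            fresh.append(name.strip())
    return fresh
```

With SQLAlchemy, `existing` would be the set of names already in the `companies` table; only the `fresh` list gets inserted.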
123
company-explorer/frontend/src/components/Inspector.tsx
Normal file
@@ -0,0 +1,123 @@
import { useEffect, useState } from 'react'
import axios from 'axios'
import { X, ExternalLink, Bot, Briefcase, Calendar } from 'lucide-react'
import clsx from 'clsx'

interface InspectorProps {
  companyId: number | null
  onClose: () => void
  apiBase: string
}

type Signal = {
  signal_type: string
  confidence: number
  value: string
  proof_text: string
}

type CompanyDetail = {
  id: number
  name: string
  website: string | null
  industry_ai: string | null
  status: string
  created_at: string
  signals: Signal[]
}

export function Inspector({ companyId, onClose, apiBase }: InspectorProps) {
  const [data, setData] = useState<CompanyDetail | null>(null)
  const [loading, setLoading] = useState(false)

  useEffect(() => {
    if (!companyId) return
    setLoading(true)
    axios.get(`${apiBase}/companies/${companyId}`)
      .then(res => setData(res.data))
      .catch(console.error)
      .finally(() => setLoading(false))
  }, [companyId])

  if (!companyId) return null

  return (
    <div className="fixed inset-y-0 right-0 w-[500px] bg-slate-900 border-l border-slate-800 shadow-2xl transform transition-transform duration-300 ease-in-out z-40 overflow-y-auto">
      {loading ? (
        <div className="p-8 text-slate-500">Loading details...</div>
      ) : !data ? (
        <div className="p-8 text-red-400">Failed to load data.</div>
      ) : (
        <div className="flex flex-col h-full">
          {/* Header */}
          <div className="p-6 border-b border-slate-800 bg-slate-950/50">
            <div className="flex justify-between items-start mb-4">
              <h2 className="text-xl font-bold text-white leading-tight">{data.name}</h2>
              <button onClick={onClose} className="text-slate-400 hover:text-white">
                <X className="h-6 w-6" />
              </button>
            </div>

            <div className="flex flex-wrap gap-2 text-sm">
              {data.website && (
                <a href={data.website} target="_blank" rel="noreferrer" className="flex items-center gap-1 text-blue-400 hover:underline">
                  <ExternalLink className="h-3 w-3" /> {new URL(data.website).hostname.replace('www.', '')}
                </a>
              )}
              {data.industry_ai && (
                <span className="flex items-center gap-1 px-2 py-0.5 bg-slate-800 text-slate-300 rounded border border-slate-700">
                  <Briefcase className="h-3 w-3" /> {data.industry_ai}
                </span>
              )}
            </div>
          </div>

          {/* Robotics Scorecard */}
          <div className="p-6 space-y-6">
            <div>
              <h3 className="text-sm font-semibold text-slate-400 uppercase tracking-wider mb-3 flex items-center gap-2">
                <Bot className="h-4 w-4" /> Robotics Potential
              </h3>

              <div className="grid grid-cols-2 gap-4">
                {['cleaning', 'transport', 'security', 'service'].map(type => {
                  const sig = data.signals.find(s => s.signal_type.includes(type))
                  const score = sig ? sig.confidence : 0

                  return (
                    <div key={type} className="bg-slate-800/50 p-3 rounded-lg border border-slate-700">
                      <div className="flex justify-between mb-1">
                        <span className="text-sm text-slate-300 capitalize">{type}</span>
                        <span className={clsx("text-sm font-bold", score > 70 ? "text-green-400" : score > 30 ? "text-yellow-400" : "text-slate-500")}>
                          {score}%
                        </span>
                      </div>
                      <div className="w-full bg-slate-700 h-1.5 rounded-full overflow-hidden">
                        <div
                          className={clsx("h-full rounded-full", score > 70 ? "bg-green-500" : score > 30 ? "bg-yellow-500" : "bg-slate-600")}
                          style={{ width: `${score}%` }}
                        />
                      </div>
                      {sig?.proof_text && (
                        <p className="text-xs text-slate-500 mt-2 line-clamp-2" title={sig.proof_text}>
                          "{sig.proof_text}"
                        </p>
                      )}
                    </div>
                  )
                })}
              </div>
            </div>

            {/* Meta Info */}
            <div className="pt-6 border-t border-slate-800">
              <div className="text-xs text-slate-500 flex items-center gap-2">
                <Calendar className="h-3 w-3" /> Added: {new Date(data.created_at).toLocaleDateString()}
              </div>
            </div>
          </div>
        </div>
      )}
    </div>
  )
}
19
company-explorer/frontend/src/index.css
Normal file
@@ -0,0 +1,19 @@
@tailwind base;
@tailwind components;
@tailwind utilities;

/* Custom Scrollbar for dark theme */
::-webkit-scrollbar {
  width: 8px;
  height: 8px;
}
::-webkit-scrollbar-track {
  background: #1e293b;
}
::-webkit-scrollbar-thumb {
  background: #475569;
  border-radius: 4px;
}
::-webkit-scrollbar-thumb:hover {
  background: #64748b;
}
10
company-explorer/frontend/src/main.tsx
Normal file
@@ -0,0 +1,10 @@
import React from 'react'
import ReactDOM from 'react-dom/client'
import App from './App.tsx'
import './index.css'

ReactDOM.createRoot(document.getElementById('root')!).render(
  <React.StrictMode>
    <App />
  </React.StrictMode>,
)
1
company-explorer/frontend/src/vite-env.d.ts
vendored
Normal file
@@ -0,0 +1 @@
/// <reference types="vite/client" />
11
company-explorer/frontend/tailwind.config.js
Normal file
@@ -0,0 +1,11 @@
/** @type {import('tailwindcss').Config} */
export default {
  content: [
    "./index.html",
    "./src/**/*.{js,ts,jsx,tsx}",
  ],
  theme: {
    extend: {},
  },
  plugins: [],
}
16
company-explorer/frontend/vite.config.ts
Normal file
@@ -0,0 +1,16 @@
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'

// https://vitejs.dev/config/
export default defineConfig({
  plugins: [react()],
  base: '/ce/', // Critical for Nginx reverse proxy
  server: {
    proxy: {
      '/api': {
        target: 'http://localhost:8000', // Forward API calls to FastAPI during dev
        changeOrigin: true
      }
    }
  }
})
15
company-explorer/requirements.txt
Normal file
@@ -0,0 +1,15 @@
fastapi
uvicorn
sqlalchemy
pydantic
pydantic-settings
requests
beautifulsoup4
pandas
openpyxl
thefuzz
python-Levenshtein
google-genai
pillow
python-multipart
python-dotenv
39
create_dashboard.py
Normal file
@@ -0,0 +1,39 @@
import os
import time
from notion_client import Client

def final_push():
    # --- CONFIGURATION DIRECTLY IN THE FUNCTION ---
    # Token is read from the environment; never commit credentials to source.
    token = os.environ["NOTION_TOKEN"]
    database_id = "acf0e7e1-fff2-425b-81a1-00fbc76085b8"

    notion = Client(auth=token)

    print(f"🚀 Starting injection into DB: {database_id}")

    sectors = [
        {"name": "Hotellerie", "desc": "Relevant für Empfang, Reinigung Zimmer, Parkplatz & Spa. Fokus auf Wellness vs. Business."},
        {"name": "Pflege & Kliniken", "desc": "Hohe Hygienestandards, Desinfektion, Transport von Mahlzeiten/Wäsche."},
        {"name": "Lager & Produktion", "desc": "Großflächenreinigung, Objektschutz (Security), Intralogistik-Transport."},
        {"name": "Einzelhandel", "desc": "Frequenzorientierte Reinigung, interaktive Verkaufsförderung (Ads), Nachtreinigung."}
    ]

    for s in sectors:
        try:
            notion.pages.create(
                parent={"database_id": database_id},
                properties={
                    "Name": {"title": [{"text": {"content": s["name"]}}]},
                    "Beschreibung": {"rich_text": [{"text": {"content": s["desc"]}}]},
                    "Art": {"select": {"name": "Sector"}}
                }
            )
            print(f" ✅ {s['name']} created successfully.")
            time.sleep(0.5)

        except Exception as e:
            print(f" ❌ Error for {s['name']}: {e}")

    print("\n🏁 DONE. Now check your Notion dashboard!")

if __name__ == "__main__":
    final_push()
@@ -153,6 +153,17 @@
            <a href="/gtm/" class="btn">Starten →</a>
        </div>

        <!-- Company Explorer (Robotics) -->
        <div class="card">
            <span class="card-icon">🤖</span>
            <h2>Company Explorer</h2>
            <p>
                Das zentrale CRM-Data-Mining Tool. Importieren, Deduplizieren und Anreichern von Firmenlisten mit Fokus auf Robotik-Potential.
            </p>
            <!-- Now links directly to the frontend -->
            <a href="/ce/" class="btn">Starten →</a>
        </div>

    </div>

    <footer>
@@ -16,6 +16,7 @@ services:
      - dashboard
      - b2b-app
      - market-frontend
      - company-explorer # NEW

  # --- DASHBOARD (Landing Page) ---
  dashboard:
@@ -25,6 +26,25 @@ services:
    container_name: gemini-dashboard
    restart: unless-stopped

  # --- COMPANY EXPLORER (Robotics Edition) ---
  company-explorer:
    build:
      context: ./company-explorer
      dockerfile: Dockerfile
    container_name: company-explorer
    restart: unless-stopped
    volumes:
      # Sideloading: source code (hot reload)
      - ./company-explorer:/app
      # Keys
      - ./gemini_api_key.txt:/app/gemini_api_key.txt
      - ./serpapikey.txt:/app/serpapikey.txt
      # Logs (debug)
      - ./Log_from_docker:/app/logs_debug
    environment:
      - PYTHONUNBUFFERED=1
    # Port 8000 is internal only

  # --- B2B MARKETING ASSISTANT ---
  b2b-app:
    build:
@@ -124,6 +144,9 @@ services:
  dns-monitor:
    image: alpine
    container_name: dns-monitor
    dns:
      - 8.8.8.8
      - 1.1.1.1
    environment:
      - SUBDOMAINS=floke,floke-ai,floke-gitea,floke-ha,floke-n8n
      - TZ=Europe/Berlin
@@ -131,4 +154,3 @@ services:
      - ./dns-monitor:/app
    command: /app/monitor.sh
    restart: unless-stopped
@@ -72,5 +72,20 @@ http {
            proxy_connect_timeout 1200s;
            proxy_send_timeout 1200s;
        }

        location /ce/ {
            # Company Explorer (Robotics Edition)
            # The trailing slash at the end is important!
            proxy_pass http://company-explorer:8000/;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";

            # Explicit timeouts
            proxy_read_timeout 1200s;
            proxy_connect_timeout 1200s;
            proxy_send_timeout 1200s;
        }
    }
}
28
readme.md
@@ -861,3 +861,31 @@ To ensure stable reachability of the services, a Docker-

* **[DuckDNS & Monitoring Setup](duckdns_setup.md):** Documentation on setting up the `duckdns` updater and the `dns-monitor` sidecar to resolve connectivity problems and caching errors.

---

## 13. How It Works in Detail

### Analysis Depth of the "Digital Signals" in the Market Intelligence Tool

Digital signals at target companies are identified via a pragmatic two-stage process that balances analytical depth against performance:

1. **Full parse of the company homepage:**
   * The main URL of the company under analysis is fully crawled once. The extracted text (`homepage_text`) gives the language model the basic context it needs to understand the company's business model and core messaging.

2. **Snippet analysis of search results for signals:**
   * The targeted search for specific signals (e.g. competitor products in use, open positions, strategic initiatives) uses the SerpAPI (Google search).
   * **Important:** The target URLs found in the search results are **not** visited and parsed again. Instead, only the **title (`title`)** and the search engine's generated **text snippet (`snippet`)** are passed to the language model as **evidence** (`evidence`).
   * This approach is a deliberate trade-off: it is extremely fast and cost-efficient because it avoids expensive crawling. The snippets are usually expressive enough to validate the presence of a signal with high probability.

### Future Extension: Detailed Analysis of Job Postings

A detailed, automated analysis of job postings is earmarked as a future, highly valuable extension.

* **Strategic value:**
  * **Insight into economic health:** The type and number of open positions (e.g. sales vs. engineering vs. administration) can indicate whether a company is currently in a growth, consolidation, or crisis phase.
  * **IT landscape & tech stack:** IT job ads in particular are a gold mine for technographic data. They often explicitly list the programming languages, frameworks, databases, ERP systems (e.g. SAP, D365), and cloud providers in use. This enables a uniquely detailed "IT map" of a target company.
* **Challenge:**
  * The engineering effort for a robust system that finds career pages, parses the various job portals, and extracts the relevant information is immense.
* **Status:**
  * This extension is earmarked for a later development phase and, given its complexity, should be implemented within a clearly delimited, manageable scope.
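The second stage described above can be sketched in code. This is an illustrative reconstruction rather than the shipped backend code: the `organic_results` list with `title`/`snippet` keys is SerpAPI's documented response shape, but the `build_evidence` helper name and the evidence format are assumptions.

```python
def build_evidence(serp_results: list[dict], max_items: int = 5) -> str:
    """Turn SERP hits into the evidence block handed to the language model.

    Only title and snippet are used as evidence; the target URLs are
    deliberately never re-crawled (speed/cost trade-off described above).
    """
    lines = []
    for hit in serp_results[:max_items]:
        title = hit.get("title", "").strip()
        snippet = hit.get("snippet", "").strip()
        if title or snippet:
            lines.append(f"- {title}: {snippet}")
    return "\n".join(lines)
```

The resulting string would be interpolated into the signal-validation prompt alongside the `homepage_text` context from stage one.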