feat(company-explorer): Initial Web UI & Backend with Enrichment Flow
This commit introduces the foundational elements of the new "Company Explorer" web application, marking a significant step away from the legacy Google Sheets / CLI system. Key changes include:

- Project structure: a new directory with separate backend (FastAPI) and frontend (React/Vite) components.
- Data persistence: migration from Google Sheets to a local SQLite database using SQLAlchemy.
- Core utilities: essential helper functions (LLM wrappers, text utilities) extracted and cleaned up into a shared module.
- Backend services: services for AI-powered analysis and the enrichment logic.
- Frontend UI: basic React application with a company table, an import wizard, and a dynamic inspector sidebar.
- Docker integration: multi-stage builds and sideloading.
- Deployment & access: integrated into the central Nginx proxy and dashboard.

Lessons learned & fixed during development:

- Frontend asset loading: resolved mismatches between Vite's path configuration and FastAPI's static file serving.
- TypeScript configuration: added the missing compiler settings.
- Database schema evolution: fixed schema errors by forcing a new database file and correcting an override.
- Logging: implemented robust file-based logging.

This new foundation provides a powerful and maintainable platform for future B2B robotics lead generation.
MIGRATION_PLAN.md (new file, 80 lines)
@@ -0,0 +1,80 @@
# Migration Plan: Legacy GSheets -> Company Explorer (Robotics Edition)

**Context:** Fresh start for the **robotics & facility management** vertical.

**Goal:** Replace Google Sheets/CLI with a web app ("Company Explorer") backed by SQLite.
## 1. Strategic Realignment

| Area | Old (Legacy) | New (Robotics Edition) |
| :--- | :--- | :--- |
| **Data store** | Google Sheets | **SQLite** (local, fast, filterable). |
| **Target data** | General / customer service | **Robotics signals** (spa area? intralogistics? on-site security?). |
| **Industries** | AI suggestion (free text) | **Strict mode:** mapped to a fixed CRM list (e.g. "Hotellerie", "Maschinenbau"). |
| **Copywriting** | Pain/gain matrix (service) | **Pain/gain matrix (robotics)**: "translating" the existing knowledge to robots. |
| **Analytics** | Technician ML model | **Disabled**. Not relevant for now. |
| **Operations** | D365 sync (broken) | **Excel import & deduplication**. Focus on matching external lists against existing records. |
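The "intelligent" matching of external lists against existing records can be sketched roughly as follows. This is a minimal standard-library sketch; the function names, field names, and threshold are illustrative assumptions, not the actual implementation:

```python
# Sketch of duplicate matching on name + city + web.
# All names and thresholds here are illustrative, not the real code.
from difflib import SequenceMatcher
from urllib.parse import urlparse


def normalize_domain(url: str) -> str:
    """Reduce a website URL to its bare lowercase domain for comparison."""
    netloc = urlparse(url if "://" in url else f"https://{url}").netloc
    return netloc.lower().removeprefix("www.")


def is_probable_duplicate(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Match on website domain first, then on fuzzy name + same city."""
    if a.get("website") and b.get("website"):
        if normalize_domain(a["website"]) == normalize_domain(b["website"]):
            return True
    same_city = (a.get("city") or "").strip().lower() == (b.get("city") or "").strip().lower()
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return same_city and name_sim >= threshold
```

A domain match is treated as decisive because two records pointing at the same website are almost certainly the same company; the fuzzy name comparison only kicks in as a fallback.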
## 2. Architecture & Component Mapping

The system is rebuilt from scratch in `company-explorer/`. Dependencies on the root-level `helpers.py` are removed.

### A. Core Backend (`backend/`)

| Component | Task & New Logic | Prio |
| :--- | :--- | :--- |
| **Database** | Replaces `GoogleSheetHandler`. Stores companies & "enrichment blobs". | 1 |
| **Importer** | Replaces `SyncManager`. Imports Excel dumps (CRM) and event lists. | 1 |
| **Deduplicator** | Replaces `company_deduplicator.py`. **Core feature:** checks event lists against the DB. Must match "intelligently" (name + city + web). | 1 |
| **Scraper (Base)** | Extracts text from websites. Foundation for all analyses. | 1 |
| **Signal Detector** | **NEW.** Analyzes website text for robot potential. <br> *Logic:* if industry = hotel & keyword = "Wellness" -> potential: cleaning robot. | 1 |
| **Classifier** | Industry classification. **Strict mode:** validates against `config/allowed_industries.json`. | 2 |
| **Marketing Engine** | Replaces `generate_marketing_text.py`. Uses the new `marketing_wissen_robotics.yaml`. | 3 |
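The Signal Detector's rule idea (industry + keyword in the scraped website text -> robot potential) can be sketched like this. The rule table, function name, and product labels are illustrative assumptions; only the `signal_type` values follow the schema in section 4:

```python
# Minimal sketch of the Signal Detector rule idea.
# Rules and labels are illustrative; signal_type values match the schema.
RULES = [
    # (industry, keyword, (signal_type, suggested robot category))
    ("Hotellerie", "wellness", ("has_spa", "cleaning robot")),
    ("Hotellerie", "spa", ("has_spa", "cleaning robot")),
    ("Logistik", "hochregallager", ("has_large_warehouse", "intralogistics robot")),
]


def detect_signals(industry: str, website_text: str) -> list[tuple[str, str]]:
    """Return unique (signal_type, suggestion) pairs matched by the rules."""
    text = website_text.lower()
    hits: list[tuple[str, str]] = []
    for rule_industry, keyword, signal in RULES:
        if industry == rule_industry and keyword in text and signal not in hits:
            hits.append(signal)
    return hits
```

In the real service the hits would additionally carry a `confidence` and the matched `proof_text` snippet, as defined by the `signals` table.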
### B. Frontend (`frontend/`) - React

* **View 1: The "Explorer":** DataGrid of all companies. Filterable by "robot potential" and status.
* **View 2: The "Inspector":** Detail view of a single company. Shows detected signals ("has spa area") with the option to correct them manually.
* **View 3: "List Matcher":** Upload an Excel list -> show duplicates -> button "Import new ones".
## 3. Handling Shared Code (`helpers.py` & Co.)

We fully decouple the new project ("fork & clean").

* **Source:** `helpers.py` (root)
* **Target:** `company-explorer/backend/lib/core_utils.py`
* **Action:** We copy only:
  * the OpenAI/Gemini wrappers (retry logic),
  * text cleaning (`clean_text`, `normalize_string`),
  * URL normalization.

* **Source:** Other Gemini apps (`duckdns`, `gtm-architect`, `market-intel`)
* **Action:** Treated as reference only. Useful logic (e.g. the "Grit" prompts from `market-intel`) is copied explicitly into the new service modules.
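The retry logic mentioned for the LLM wrappers typically has the following shape. This is a generic sketch with assumed names and backoff values, not the actual `core_utils.py` code:

```python
# Generic retry decorator sketch (exponential backoff).
# Names and default values are assumptions, not the real wrapper.
import time
from functools import wraps


def with_retries(max_attempts: int = 3, base_delay: float = 1.0):
    """Retry a flaky call, doubling the delay after each failure."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the error
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator
```

Such a wrapper would sit around the OpenAI/Gemini calls so that transient API errors do not abort an entire enrichment run.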
## 4. Data Structure (SQLite Schema)

### Table `companies` (master data)

* `id` (PK)
* `name` (String)
* `website` (String)
* `crm_id` (String, nullable - link to D365)
* `industry_crm` (String - the "allowed" industry)
* `city` (String)
* `country` (String)
* `status` (Enum: NEW, IMPORTED, ENRICHED, QUALIFIED)

### Table `signals` (robot potential)

* `company_id` (FK)
* `signal_type` (e.g. "has_spa", "has_large_warehouse", "has_security_needs")
* `confidence` (Float)
* `proof_text` (snippet from the website)

### Table `duplicates_log`

* Stores the results of list matching runs ("upload X contained 20 known companies").
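The schema above, expressed as plain SQLite DDL for illustration. The backend itself defines these as SQLAlchemy models, and the `duplicates_log` columns here are assumptions, since only that table's purpose is specified:

```python
# Illustrative DDL for the section 4 schema; the backend uses SQLAlchemy
# models instead. duplicates_log columns are assumed, not specified.
import sqlite3

SCHEMA = """
CREATE TABLE companies (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    website      TEXT,
    crm_id       TEXT,               -- nullable link to D365
    industry_crm TEXT,               -- the "allowed" industry
    city         TEXT,
    country      TEXT,
    status       TEXT CHECK (status IN ('NEW', 'IMPORTED', 'ENRICHED', 'QUALIFIED'))
);

CREATE TABLE signals (
    company_id  INTEGER REFERENCES companies(id),
    signal_type TEXT,                -- e.g. 'has_spa', 'has_large_warehouse'
    confidence  REAL,
    proof_text  TEXT                 -- snippet from the website
);

CREATE TABLE duplicates_log (       -- columns assumed for illustration
    upload_name TEXT,
    known_count INTEGER             -- "upload X contained 20 known companies"
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

The `CHECK` constraint stands in for the status enum; with SQLAlchemy this would typically be an `Enum` column instead.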
## 5. Implementation Phases

1. **Housekeeping:** Archive the legacy code (`_legacy_gsheets_system`).
2. **Setup:** Initialize `company-explorer` (backend + frontend skeleton).
3. **Foundation:** DB schema + "List Matcher" (deduplication is top priority for operations).
4. **Enrichment:** Implement the scraper + signal detector (robotics).
5. **UI:** React interface for the data.