# Gemini Code Assistant Context

## Important Notes

- **Project documentation:** The primary and most comprehensive documentation for this project lives in `readme.md`. Please consult that file for a detailed understanding of the architecture and the individual modules.
- **Git repository:** This project is managed in a Git repository; all code changes are versioned. See the "Git Workflow & Conventions" section for our working rules.

## Project Overview

This project is a Python-based system for automated company data enrichment and lead generation. It uses a variety of data sources, including web scraping, Wikipedia, and the OpenAI API, to enrich company data from a CRM system. The project is designed to run in a Docker container and can be controlled via a Flask API.

The system is modular and consists of the following key components:

* **`brancheneinstufung_167.py`:** The core module for data enrichment, including web scraping, Wikipedia lookups, and AI-based analysis.
* **`company_deduplicator.py`:** A module for intelligent duplicate checking, both for external lists and internal CRM data.
* **`generate_marketing_text.py`:** An engine for creating personalized marketing texts.
* **`app.py`:** A Flask application that provides an API to run the different modules.
* **`company-explorer/`:** A new React/FastAPI-based application (v2.x) replacing the legacy CLI tools. It focuses on identifying robotics potential in companies.

## Git Workflow & Conventions

- **Commit messages:** Commits must follow a clear format:
  - Title: a concise summary under 100 characters.
  - Description: detailed changes as a list with `- ` at the start of each line (no bullet symbols).
- **File renames:** To preserve a file's Git history, it must always be renamed with `git mv old_name.py new_name.py`.
- **Commit & push process:** Changes are committed locally first.
Pushing to the remote server happens only after explicit confirmation from you.

## Current Status (Jan 08, 2026) - Company Explorer (Robotics Edition)

* **Robotics Potential Analysis (v2.3):**
  * **Logic Overhaul:** Switched from keyword-based scanning to a **"Chain-of-Thought" Infrastructure Analysis**. The AI now evaluates physical assets (factories, warehouses, solar parks) to determine robotics needs.
  * **Provider vs. User:** Implemented strict reasoning to distinguish between companies *selling* cleaning products (providers) and those *operating* factories (users/potential clients).
  * **Configurable Logic:** Added a database-backed configuration system for robotics categories (`cleaning`, `transport`, `security`, `service`). Users can now define the "Trigger Logic" and "Scoring Guide" directly in the frontend settings.
* **Wikipedia Integration (v2.1):**
  * **Deep Extraction:** Implemented the "Legacy" extraction logic (`WikipediaService`). It now pulls the **first paragraph** (cleaned of references), **categories** (filtered for relevance), revenue, employees, and HQ location.
  * **Google-First Discovery:** Uses SerpAPI to find the correct Wikipedia article, validating via domain match and city.
  * **Visual Inspector:** The frontend `Inspector` now displays a comprehensive Wikipedia profile including category tags.
* **Manual Overrides & Control:**
  * **Wikipedia Override:** Added a UI to manually correct the Wikipedia URL. This triggers a re-scan and **locks** the record (`is_locked` flag) to prevent auto-overwrite.
  * **Website Override:** Added a UI to manually correct the company website. This automatically clears old scraping data to force a fresh analysis on the next run.
* **Architecture & DB:**
  * **Database:** Updated the `companies_v3_final.db` schema to include `RoboticsCategory` and `EnrichmentData.is_locked`.
  * **Services:** Refactored `ClassificationService` and `DiscoveryService` for better modularity and robustness.
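The `is_locked` override mechanism can be sketched as a guarded `UPDATE`: automatic enrichment runs only touch rows whose lock flag is unset. This is a minimal in-memory illustration; the table and column names are assumptions based on the description above, not the actual `companies_v3_final.db` schema.

```python
import sqlite3

# In-memory sketch of the EnrichmentData lock pattern (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrichment_data (
        company_id    INTEGER PRIMARY KEY,
        wikipedia_url TEXT,
        is_locked     INTEGER NOT NULL DEFAULT 0
    )
""")
# A record that was manually corrected via the UI and therefore locked.
conn.execute(
    "INSERT INTO enrichment_data VALUES (1, 'https://de.wikipedia.org/wiki/Beispiel', 1)"
)

def auto_update_wikipedia(company_id: int, new_url: str) -> bool:
    """Automatic enrichment may only overwrite records that are not locked."""
    cur = conn.execute(
        "UPDATE enrichment_data SET wikipedia_url = ? "
        "WHERE company_id = ? AND is_locked = 0",
        (new_url, company_id),
    )
    conn.commit()
    return cur.rowcount > 0  # False means the lock prevented the overwrite

print(auto_update_wikipedia(1, "https://de.wikipedia.org/wiki/Anders"))  # False: record is locked
```

A manual override would set `is_locked = 1` in the same transaction that writes the corrected URL, so subsequent auto-runs skip the row.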
## Next Steps

* **Quality Assurance:** Implement a dedicated "Review Mode" to validate high-potential leads.
* **Data Import:** Finalize the "List Matcher" to import and deduplicate Excel lists against the new DB.
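Deduplicating imported lists against the DB (as the List Matcher and `company_deduplicator.py` do) typically comes down to fuzzy matching on normalized company names. The following is a minimal stdlib-only sketch of that idea; the suffix list, threshold, and function names are illustrative, not taken from the actual module.

```python
from difflib import SequenceMatcher

# Legal-form suffixes stripped before comparison (illustrative, not exhaustive).
LEGAL_SUFFIXES = ("gmbh", "ag", "kg", "se", "inc", "ltd")

def normalize(name: str) -> str:
    """Lowercase a company name and strip common legal-form suffixes."""
    n = name.lower().strip()
    for suffix in LEGAL_SUFFIXES:
        if n.endswith(suffix):
            n = n[: -len(suffix)].rstrip(" .,&")
    return n

def is_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two names as duplicates if their normalized similarity exceeds the threshold."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

print(is_duplicate("Musterfirma GmbH", "Musterfirma AG"))  # True
```

In practice the real module would likely combine this with additional signals (domain, city, CRM IDs) before flagging a row as a duplicate.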