# Gemini Code Assistant Context
## Important Notes
- **Project documentation:** The primary and most comprehensive documentation for this project is in `readme.md`. Please consult that file for a detailed understanding of the architecture and the individual modules.
- **Git repository:** This project is managed in a Git repository, and all code changes are versioned. See the "Git Workflow & Conventions" section for our working rules.
- **IMPORTANT:** The AI agent can commit changes but, for security reasons, often cannot run `git push`. Please run `git push` manually when the agent reports this.
## Project Overview
This project is a Python-based system for automated company data enrichment and lead generation. It focuses on identifying B2B companies with high potential for robotics automation (Cleaning, Transport, Security, Service).
The system architecture has evolved from a CLI-based toolset to a modern web application (`company-explorer`) backed by Docker containers.
## Current Status (Jan 15, 2026) - Company Explorer (Robotics Edition v0.5.0)
### 1. Contacts Management (v0.5)
* **Full CRUD:** Integrated Contact Management system with direct editing capabilities.
* **Global List View:** Dedicated view for all contacts across all companies with search and filter.
* **Data Model:** Supports advanced fields like Academic Title, Role Interpretation (Decision Maker vs. User), and Marketing Automation Status.
* **Bulk Import:** CSV-based bulk import for contacts that automatically creates missing companies and prevents duplicates via email matching.
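The bulk-import rules above (create missing companies on the fly, prevent duplicates via email matching) can be sketched as follows. This is a minimal illustration, not the project's importer: the CSV column names (`email`, `first_name`, `last_name`, `company`), the in-memory dictionaries standing in for database tables, and the function name `import_contacts` are all hypothetical.

```python
import csv
import io
from typing import Dict, List


def import_contacts(
    csv_text: str,
    companies: Dict[str, dict],
    contacts: Dict[str, dict],
) -> List[str]:
    """Import contacts from CSV text and return the emails of newly created contacts.

    Hypothetical sketch: `companies` and `contacts` are in-memory stand-ins for
    database tables, keyed by company name and by normalized email respectively.
    """
    created: List[str] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Normalize the email so "Max@Example.de" and "max@example.de" count as duplicates.
        email = (row.get("email") or "").strip().lower()
        if not email or email in contacts:
            continue  # no email, or contact already known: skip to avoid duplicates

        # Create the company if it does not exist yet.
        company_name = (row.get("company") or "").strip()
        if company_name and company_name not in companies:
            companies[company_name] = {"name": company_name}

        contacts[email] = {
            "email": email,
            "first_name": (row.get("first_name") or "").strip(),
            "last_name": (row.get("last_name") or "").strip(),
            "company": company_name or None,
        }
        created.append(email)
    return created
```

Normalizing the email before the lookup is what makes the duplicate check reliable; comparing raw strings would treat case variants of the same address as different people.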
### 2. UI/UX Modernization
* **Light/Dark Mode:** Full theme support with toggle.
* **Grid Layout:** Unified card-based layout for both Company and Contact lists.
* **Mobile Responsiveness:** Optimized Inspector overlay and navigation for mobile devices.
* **Tabbed Inspector:** Clean separation between Company Overview and Contact Management within the details pane.
### 3. Advanced Configuration (Settings)
* **Industry Verticals:** Database-backed configuration for target industries (Description, Focus Flag, Primary Product).
* **Job Role Mapping:** Configurable patterns (Regex/Text) to map job titles on business cards to internal roles (e.g., "CTO" -> "Innovation Driver").
* **Robotics Categories:** Existing AI reasoning logic remains configurable via the UI.
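A job-role mapper driven by such configurable patterns could look like the sketch below. How the configured patterns are actually applied is an assumption here (first match wins, case-insensitive, plain-text patterns matched as whole words); the `(pattern, is_regex, role)` tuple format, the extra example patterns, and `map_job_role` are hypothetical, and only the "CTO" -> "Innovation Driver" mapping comes from this document.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern format: (pattern, is_regex, internal_role).
RolePattern = Tuple[str, bool, str]

PATTERNS: List[RolePattern] = [
    ("CTO", False, "Innovation Driver"),                         # example from the settings description
    (r"head of (facility|operations)", True, "Decision Maker"),  # hypothetical regex pattern
    ("Geschäftsführer", False, "Decision Maker"),                # hypothetical text pattern
]


def map_job_role(title: str, patterns: List[RolePattern]) -> Optional[str]:
    """Return the internal role of the first pattern that matches `title`, else None."""
    for pattern, is_regex, role in patterns:
        # Text patterns are escaped and wrapped in \b so "CTO" does not match "Director".
        regex = pattern if is_regex else rf"\b{re.escape(pattern)}\b"
        if re.search(regex, title, re.IGNORECASE):
            return role
    return None
```

The word boundaries matter in practice: a naive substring test for "cto" would also hit "Director". Because the first match wins, more specific patterns should be listed before general ones.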
### 4. Robotics Potential Analysis (v2.3)
* **Chain-of-Thought Logic:** The AI analysis (`ClassificationService`) uses multi-step reasoning to evaluate physical infrastructure.
* **Provider vs. User:** Strict differentiation logic implemented.
### 5. Web Scraping & Legal Data (v2.2)
* **Impressum Scraping:** 2-hop strategy (via the "Kontakt" page) with a fallback to the root domain.
* **Manual Overrides:** Users can manually correct Wikipedia, Website, and Impressum URLs directly in the UI.
## Lessons Learned & Best Practices
1. **Numeric Extraction (German Locale):**
   * **Problem:** "1.005 Mitarbeiter" (1,005 employees) was extracted as "1" because the dot was treated as a decimal point.
   * **Solution:** Implemented context-aware logic: if a number has a dot followed by exactly 3 digits (and no comma), the dot is treated as a thousands separator.
   * **Revenue:** For revenue (`is_umsatz=True`), dots are generally treated as decimal points (e.g., "375.6 Mio"), unless multiple dots make the thousands grouping unambiguous. Billion/Mrd values are converted to millions (1 Mrd = 1,000 Mio).
2. **LLM JSON Stability:**
   * **Problem:** LLMs often wrap JSON in Markdown blocks (` ```json `), causing `json.loads()` to fail.
   * **Solution:** ALWAYS use a `clean_json_response` helper that strips markers before parsing. Never trust raw LLM output.
3. **LLM Structure Inconsistency:**
   * **Problem:** Even with `json_mode=True`, models sometimes wrap the result in a list `[...]` instead of a flat object `{...}`, breaking frontend property access.
   * **Solution:** Normalize the parsed result before use: `if isinstance(result, list) and result: result = result[0]` (the emptiness check avoids an `IndexError` on `[]`).
4. **Scraping Navigation:**
   * **Problem:** Searching for "Impressum" only on the *scraped* URL (which might be a subpage found via Google) often fails.
   * **Solution:** Always implement a fallback to the **Root Domain** AND a **2-Hop check** via the "Kontakt" page.
5. **Frontend State Management:**
   * **Problem:** Users could not see when a background job had finished.
   * **Solution:** A polling mechanism (`setInterval`) tied to an `isProcessing` state works better than static timeouts for long-running AI tasks: the UI updates when the job actually completes instead of after a fixed guess.
6. **Notion API - Schema First:**
   * **Problem:** Scripts failed when trying to write data to a Notion database property (column) that did not exist.
   * **Solution:** ALWAYS ensure the database schema is correct *before* attempting to import or update data. Use the `databases.update` endpoint to add the required properties (e.g., "Key Features", "Constraints") programmatically as a preliminary step. The API will not create them on the fly.
7. **Notion API - Character Limits:**
   * **Problem:** API calls failed with a `400 Bad Request` error when a rich text field exceeded the maximum length.
   * **Solution:** Be aware of the **2000-character limit** on the content of a rich text object. Truncate text content before sending the payload to the Notion API to prevent validation errors.
8. **Notion API - Response Structures:**
   * **Problem:** Parsing functions failed with `TypeError` or `AttributeError` because the JSON structure for a property differed depending on how it was requested.
   * **Solution:** Write robust helper functions that can handle multiple possible JSON structures. A property object retrieved via a direct property endpoint (`/pages/{id}/properties/{prop_id}`) is structured differently from the same property when it is part of a full page object (`/pages/{id}`). The parsing logic must account for these variations.
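For lesson 1, the German-locale rules can be sketched as a small parser. This illustrates the rules as described above and is not the project's actual extractor; `parse_german_number` is a hypothetical name, and treating a comma as the decimal separator (German notation, e.g. "2,5 Mrd.") is an added assumption.

```python
import re
from typing import Optional

# First run of digits that may contain separators, e.g. "1.005" or "2,5".
_NUMBER = re.compile(r"\d[\d.,]*")
_BILLION = re.compile(r"\b(mrd|milliarde|billion)", re.IGNORECASE)


def parse_german_number(text: str, is_umsatz: bool = False) -> Optional[float]:
    """Extract the first number from `text` using the German-locale rules.

    For revenue (`is_umsatz=True`), Billion/Mrd values are scaled by 1000 (to millions).
    """
    match = _NUMBER.search(text)
    if match is None:
        return None
    raw = match.group(0).rstrip(".,")  # drop trailing punctuation such as "250."

    if "," in raw:
        # Assumption: comma present -> dots group thousands, comma is the decimal separator.
        value = float(raw.replace(".", "").replace(",", "."))
    elif raw.count(".") > 1:
        # Several dots can only be thousands separators ("1.234.567").
        value = float(raw.replace(".", ""))
    elif "." in raw:
        integer_part, fraction = raw.split(".")
        if not is_umsatz and len(fraction) == 3:
            # "1.005 Mitarbeiter": dot + exactly three digits -> thousands separator.
            value = float(integer_part + fraction)
        else:
            # Revenue ("375.6 Mio") or other fractions: the dot is a decimal point.
            value = float(raw)
    else:
        value = float(raw)

    if is_umsatz and _BILLION.search(text):
        value *= 1000  # normalize billions (Mrd) to millions
    return value
```

Note that with these rules a revenue string such as "1.500 Mio" yields 1.5, which is exactly the documented trade-off for `is_umsatz=True`.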
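Lessons 2 and 3 combine naturally into one defensive parsing step. The sketch below is one possible implementation: the body of `clean_json_response` (extracting the content of a fenced block if one is present) is an assumption, and `parse_llm_json` is a hypothetical wrapper that also applies the list-unwrapping check from lesson 3.

```python
import json
import re
from typing import Any, Dict

# Captures the content of a ```json ... ``` (or bare ``` ... ```) block.
_FENCED_BLOCK = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL | re.IGNORECASE)


def clean_json_response(raw: str) -> str:
    """Strip Markdown code fences around LLM output and return the bare JSON text."""
    match = _FENCED_BLOCK.search(raw)
    return (match.group(1) if match else raw).strip()


def parse_llm_json(raw: str) -> Dict[str, Any]:
    """Parse LLM output into a flat object, unwrapping a top-level list if needed."""
    result = json.loads(clean_json_response(raw))
    if isinstance(result, list):
        if not result:
            raise ValueError("LLM returned an empty JSON list")
        result = result[0]  # models sometimes wrap the object in [...]
    if not isinstance(result, dict):
        raise ValueError(f"expected a JSON object, got {type(result).__name__}")
    return result
```

Raising on anything that is not an object keeps the failure at the parsing boundary instead of surfacing later as a broken property access in the frontend.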
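For lesson 4, the Impressum search with root-domain fallback and 2-hop check could be sketched as below. The search order (scraped page, then root domain; on each, a direct link first and then a hop via a "Kontakt" link), the keyword matching on both link text and URL, and the injected `fetch` callable (returning HTML or `None`) are assumptions that let the logic be shown without a real HTTP client; all function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# Hypothetical fetcher: returns the HTML of a URL, or None if it cannot be loaded.
Fetch = Callable[[str], Optional[str]]


class _LinkCollector(HTMLParser):
    """Collects (absolute URL, link text) pairs for every <a href> on a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") if tag == "a" else None
        if href:
            self._href, self._text = urljoin(self.base_url, href), []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def _links(url: str, fetch: Fetch) -> List[Tuple[str, str]]:
    html = fetch(url)
    if html is None:
        return []
    parser = _LinkCollector(url)
    parser.feed(html)
    parser.close()
    return parser.links


def _find(links: List[Tuple[str, str]], keyword: str) -> Optional[str]:
    # Match the keyword in either the URL or the visible link text.
    for href, text in links:
        if keyword in href.lower() or keyword in text.lower():
            return href
    return None


def find_impressum_url(scraped_url: str, fetch: Fetch) -> Optional[str]:
    parts = urlsplit(scraped_url)
    root_url = f"{parts.scheme}://{parts.netloc}/"
    # Try the scraped (possibly deep) page first, then fall back to the root domain.
    for page in (scraped_url, root_url):
        links = _links(page, fetch)
        direct = _find(links, "impressum")
        if direct:
            return direct
        # 2-hop: follow a "Kontakt" link and look for the Impressum there.
        kontakt = _find(links, "kontakt")
        if kontakt:
            via_kontakt = _find(_links(kontakt, fetch), "impressum")
            if via_kontakt:
                return via_kontakt
    return None
```

Injecting `fetch` keeps the navigation logic testable with a plain dictionary of pages (e.g. `find_impressum_url(url, pages.get)`), while production code can pass a real HTTP fetcher.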
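For lesson 6, a schema-first step might compare the database's existing properties with the required ones and add only what is missing via `databases.update`. The sketch calls the REST endpoints directly with the standard library; the helper names, the `rich_text` type for the example columns, and the pinned `Notion-Version` value are assumptions. The request shapes follow the `2022-06-28` API version, so verify them against the version your integration actually uses.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

NOTION_API = "https://api.notion.com/v1"
NOTION_VERSION = "2022-06-28"  # assumption: pin the API version your integration targets


def missing_properties_payload(
    existing: Dict[str, Any], required: Dict[str, str]
) -> Dict[str, Any]:
    """Build a databases.update body that adds only the properties not yet present.

    `required` maps a property name to a Notion property type, e.g. "rich_text".
    """
    missing = {
        name: {prop_type: {}}
        for name, prop_type in required.items()
        if name not in existing
    }
    return {"properties": missing} if missing else {}


def _notion_request(method: str, path: str, token: str, body: Optional[dict] = None) -> dict:
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        f"{NOTION_API}{path}",
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Notion-Version": NOTION_VERSION,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def ensure_schema(database_id: str, token: str, required: Dict[str, str]) -> None:
    """Run before any import: add missing columns, since writes will not create them."""
    database = _notion_request("GET", f"/databases/{database_id}", token)
    payload = missing_properties_payload(database["properties"], required)
    if payload:
        _notion_request("PATCH", f"/databases/{database_id}", token, payload)
```

Usage, before an import run: `ensure_schema(database_id, token, {"Key Features": "rich_text", "Constraints": "rich_text"})`. Sending only the missing properties avoids touching columns that already exist.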
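For lesson 7, a truncation helper applied while building the payload could look like this. The helper name `to_rich_text` and the trailing ellipsis marker are assumptions; the sketch measures length with Python's `len()`.

```python
from typing import Any, Dict, List

NOTION_TEXT_LIMIT = 2000  # maximum length of one rich text object's content


def to_rich_text(text: str, limit: int = NOTION_TEXT_LIMIT) -> List[Dict[str, Any]]:
    """Return a single-element rich_text array whose content fits within `limit`."""
    if len(text) > limit:
        # Keep room for the ellipsis so the final string is exactly `limit` characters.
        text = text[: limit - 1] + "…"
    return [{"type": "text", "text": {"content": text}}]
```

Usage: `properties = {"Constraints": {"rich_text": to_rich_text(long_text)}}`. The ellipsis makes the truncation visible to anyone reading the Notion page.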
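For lesson 8, a tolerant extractor can accept both shapes. The sketch assumes the shapes seen for text properties: inside a full page object, a `rich_text` or `title` property holds a list of rich text objects, while the property endpoint returns a paginated `list` of `property_item` objects that each hold a single rich text object. `extract_plain_text` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def extract_plain_text(prop: Dict[str, Any]) -> str:
    """Return the plain-text value of a Notion property, whichever endpoint produced it."""
    # Shape A: paginated list from /pages/{id}/properties/{prop_id}.
    # Shape B: a single property (or property_item) object.
    items: List[Dict[str, Any]] = prop.get("results", []) if prop.get("object") == "list" else [prop]

    parts: List[str] = []
    for item in items:
        value = item.get(item.get("type"))
        if isinstance(value, list):
            # Page-object form: a list of rich text objects.
            parts.extend(rt.get("plain_text", "") for rt in value)
        elif isinstance(value, dict):
            # Property-item form: one rich text object (or e.g. a select option with "name").
            parts.append(value.get("plain_text", value.get("name", "")))
        elif value is not None:
            # Scalar types such as number or checkbox.
            parts.append(str(value))
    return "".join(parts)
```

Dispatching on the value's Python type rather than on the endpoint keeps a single helper usable for both request paths.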
## Next Steps
* **Marketing Automation:** Implement the actual sending logic (or export) based on the contact status.
* **Job Role Mapping Engine:** Connect the configured patterns to the contact import/creation process to auto-assign roles.
* **Industry Classification Engine:** Connect the configured industries to the AI Analysis prompt to enforce the "Strict Mode" mapping.
* **Export:** Generate enriched Excel/CSV reports (already partially implemented via JSON export).