feat(ce): upgrade to v0.5.0 with contacts management, advanced settings and ui modernization

2026-01-15 09:23:58 +00:00
parent 2b7c072ddc
commit 4b815c6510
16 changed files with 2794 additions and 828 deletions


@@ -12,31 +12,34 @@ This project is a Python-based system for automated company data enrichment and
The system architecture has evolved from a CLI-based toolset to a modern web application (`company-explorer`) backed by Docker containers.
-## Current Status (Jan 08, 2026) - Company Explorer (Robotics Edition v0.3.0)
+## Current Status (Jan 15, 2026) - Company Explorer (Robotics Edition v0.5.0)
-### 1. Robotics Potential Analysis (v2.3)
-* **Chain-of-Thought Logic:** The AI analysis (`ClassificationService`) now uses a multi-step reasoning process to evaluate companies based on their **physical infrastructure** (factories, warehouses) rather than just keywords.
-* **Provider vs. User:** Strict logic implemented to distinguish between companies *selling* automation products and those *needing* them for their own operations.
-* **Configurable Settings:** A database-driven configuration (`RoboticsCategory`) allows users to edit the definition and scoring logic for each robotics category directly via the frontend settings menu.
+### 1. Contacts Management (v0.5)
+* **Full CRUD:** Integrated Contact Management system with direct editing capabilities.
+* **Global List View:** Dedicated view for all contacts across all companies with search and filter.
+* **Data Model:** Supports advanced fields like Academic Title, Role Interpretation (Decision Maker vs. User), and Marketing Automation Status.
+* **Bulk Import:** CSV-based bulk import for contacts that automatically creates missing companies and prevents duplicates via email matching.
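The duplicate-prevention flow of the bulk import can be sketched roughly as follows. The column names (`email`, `name`, `company`) and the in-memory stand-ins for the contact and company tables are assumptions for illustration, not the actual schema:

```python
import csv
import io

def bulk_import_contacts(csv_text, companies, contacts):
    """Import contacts from CSV text, creating missing companies and
    skipping rows whose email already exists (duplicate prevention)."""
    created, skipped = [], []
    existing_emails = {c["email"].lower() for c in contacts}
    for row in csv.DictReader(io.StringIO(csv_text)):
        email = row["email"].strip().lower()
        if email in existing_emails:
            skipped.append(email)          # duplicate: email already known
            continue
        company = row["company"].strip()
        companies.setdefault(company, {"name": company})  # auto-create missing company
        contacts.append({"email": email, "name": row["name"], "company": company})
        existing_emails.add(email)
        created.append(email)
    return created, skipped
```

Matching on the lower-cased email keeps re-imports idempotent, and rows referencing an unknown company create it on the fly instead of failing the import.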
-### 2. Deep Wikipedia Integration (v2.1)
-* **Extraction:** The system extracts the first paragraph (cleaned of artifacts), industry, revenue (normalized to Mio €), employee count, and Wikipedia categories.
-* **Validation:** Uses a "Google-First" strategy via SerpAPI, validating candidates by checking for domain matches and city/HQ location in the article.
-* **UI:** The Inspector displays a dedicated Wikipedia profile section with visual tags.
+### 2. UI/UX Modernization
+* **Light/Dark Mode:** Full theme support with toggle.
+* **Grid Layout:** Unified card-based layout for both Company and Contact lists.
+* **Mobile Responsiveness:** Optimized Inspector overlay and navigation for mobile devices.
+* **Tabbed Inspector:** Clean separation between Company Overview and Contact Management within the details pane.
-### 3. Web Scraping & Legal Data (v2.2)
-* **Impressum Scraping:**
-  * **2-Hop Strategy:** If no "Impressum" link is found on the landing page, the scraper automatically searches for a "Kontakt" page and checks for the link there.
-  * **Root Fallback:** If deep links (e.g. `/about-us`) fail, the scraper checks the root domain (`/`).
-* **LLM Extraction:** Unstructured legal text is parsed by Gemini to extract structured JSON (Legal Name, Address, CEO, VAT ID).
-* **Robustness:**
-  * **JSON Cleaning:** A helper (`clean_json_response`) strips Markdown code blocks from LLM responses to prevent parsing errors.
-  * **Schema Enforcement:** Added logic to handle inconsistent LLM responses (e.g., returning a list `[{...}]` instead of a flat object `{...}`).
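The two robustness fixes might look roughly like this. `clean_json_response` is named in the notes above, while `enforce_object` and both signatures are assumptions, not the project's actual code:

```python
import json
import re

def clean_json_response(raw: str):
    """Strip Markdown code fences (```json ... ```) that LLMs often
    wrap around JSON payloads, then parse the remainder."""
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    return json.loads(cleaned)

def enforce_object(data):
    """Normalize inconsistent LLM output: unwrap a single-element
    list [{...}] into the flat object {...} the schema expects."""
    if isinstance(data, list):
        if len(data) == 1 and isinstance(data[0], dict):
            return data[0]
        raise ValueError("Unexpected list response from LLM")
    return data
```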
+### 3. Advanced Configuration (Settings)
+* **Industry Verticals:** Database-backed configuration for target industries (Description, Focus Flag, Primary Product).
+* **Job Role Mapping:** Configurable patterns (Regex/Text) to map job titles on business cards to internal roles (e.g., "CTO" -> "Innovation Driver").
+* **Robotics Categories:** Existing AI reasoning logic remains configurable via the UI.
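Pattern-based role mapping of the kind described could be sketched like this. The patterns and role names below are illustrative stand-ins for the database-backed, UI-editable rows:

```python
import re

# Hypothetical pattern table; in the app these rows live in the
# database and are editable via the settings UI.
ROLE_PATTERNS = [
    (r"\b(CTO|Chief Technology Officer)\b", "Innovation Driver"),
    (r"\b(CEO|Gesch(ä|ae)ftsf(ü|ue)hrer)\b", "Decision Maker"),
    (r"\b(Produktion|Production|Plant Manager)\b", "Operations"),
]

def map_job_title(title: str) -> str:
    """Map a raw job title (e.g. from a business card) to an internal
    role via the first matching configured pattern."""
    for pattern, role in ROLE_PATTERNS:
        if re.search(pattern, title, flags=re.IGNORECASE):
            return role
    return "Unknown"
```

First-match-wins keeps the rule order meaningful, so more specific patterns should sit above generic ones in the table.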
-### 4. User Control & Ops
-* **Manual Overrides:** Users can manually correct the Wikipedia URL (locking the data) and the Company Website (triggering a fresh re-scrape).
-* **Polling UI:** The frontend uses intelligent polling to auto-refresh data when background jobs (Discovery/Analysis) complete.
-* **Forced Refresh:** The "Analyze" endpoint now clears old cache data to ensure a fresh scrape on every user request.
+### 4. Robotics Potential Analysis (v2.3)
+* **Chain-of-Thought Logic:** The AI analysis (`ClassificationService`) uses multi-step reasoning to evaluate physical infrastructure.
+* **Provider vs. User:** Strict differentiation logic implemented.
+### 5. Web Scraping & Legal Data (v2.2)
+* **Impressum Scraping:** 2-Hop Strategy and Root Fallback logic.
+* **Manual Overrides:** Users can manually correct Wikipedia, Website, and Impressum URLs directly in the UI.
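A minimal sketch of the 2-Hop and Root-Fallback flow, assuming a pluggable `fetch` callable and a naive regex link finder (the real scraper's heuristics may well differ):

```python
import re

def find_link(html: str, keyword: str):
    """Return the href of the first <a> whose href or text contains keyword."""
    for href, text in re.findall(r'<a[^>]+href="([^"]+)"[^>]*>(.*?)</a>', html, re.I | re.S):
        if keyword.lower() in href.lower() or keyword.lower() in text.lower():
            return href
    return None

def find_impressum(fetch, base_url: str, start_path: str = "/"):
    """2-Hop Strategy: look for an Impressum link on the landing page;
    if absent, hop to the Kontakt page and look there. Falls back to
    the root domain if a deep start_path yields nothing."""
    html = fetch(base_url + start_path)
    link = find_link(html, "impressum")
    if link:
        return link
    kontakt = find_link(html, "kontakt")   # hop 2: check the contact page
    if kontakt:
        link = find_link(fetch(base_url + kontakt), "impressum")
        if link:
            return link
    if start_path != "/":                   # root fallback
        return find_impressum(fetch, base_url, "/")
    return None
```

Passing `fetch` in as a callable keeps the strategy testable without network access.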
## Lessons Learned & Best Practices
@@ -74,5 +77,7 @@ The system architecture has evolved from a CLI-based toolset to a modern web app
* **Solution:** Write robust helper functions that can handle multiple possible JSON structures. A property object retrieved via a direct property endpoint (`/pages/{id}/properties/{prop_id}`) is structured differently from the same property when it's part of a full page object (`/pages/{id}`). The parsing logic must account for these variations.
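A shape-tolerant parser for a rich-text property, in the spirit of the solution above; the two shapes shown are simplified approximations of the Notion-style responses, not exact copies:

```python
def extract_rich_text(prop: dict) -> str:
    """Read a rich_text property whether it came from the direct
    property endpoint (paginated "results" list) or from a full page
    object (inline "rich_text" list). Shapes are illustrative."""
    if "results" in prop:                     # direct property endpoint
        items = [r["rich_text"] for r in prop["results"]]
    else:                                     # property inside a full page object
        items = prop.get("rich_text", [])
    return "".join(part.get("plain_text", "") for part in items)
```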
## Next Steps
* **Quality Assurance:** Implement a dedicated "Review Mode" to validate high-potential leads.
-* **Export:** Generate Excel/CSV enriched reports.
+* **Marketing Automation:** Implement the actual sending logic (or export) based on the contact status.
+* **Job Role Mapping Engine:** Connect the configured patterns to the contact import/creation process to auto-assign roles.
+* **Industry Classification Engine:** Connect the configured industries to the AI Analysis prompt to enforce the "Strict Mode" mapping.
+* **Export:** Generate Excel/CSV enriched reports (already partially implemented via JSON export).