feat(ce): upgrade to v0.5.0 with contacts management, advanced settings and ui modernization

2026-01-15 09:23:58 +00:00
parent 2b7c072ddc
commit 4b815c6510
16 changed files with 2794 additions and 828 deletions


@@ -12,31 +12,34 @@ This project is a Python-based system for automated company data enrichment and
The system architecture has evolved from a CLI-based toolset to a modern web application (`company-explorer`) backed by Docker containers.
-## Current Status (Jan 08, 2026) - Company Explorer (Robotics Edition v0.3.0)
+## Current Status (Jan 15, 2026) - Company Explorer (Robotics Edition v0.5.0)
-### 1. Robotics Potential Analysis (v2.3)
-* **Chain-of-Thought Logic:** The AI analysis (`ClassificationService`) now uses a multi-step reasoning process to evaluate companies based on their **physical infrastructure** (factories, warehouses) rather than just keywords.
-* **Provider vs. User:** Strict logic implemented to distinguish between companies *selling* automation products and those *needing* them for their own operations.
-* **Configurable Settings:** A database-driven configuration (`RoboticsCategory`) allows users to edit the definition and scoring logic for each robotics category directly via the frontend settings menu.
+### 1. Contacts Management (v0.5)
+* **Full CRUD:** Integrated Contact Management system with direct editing capabilities.
+* **Global List View:** Dedicated view for all contacts across all companies with search and filter.
+* **Data Model:** Supports advanced fields like Academic Title, Role Interpretation (Decision Maker vs. User), and Marketing Automation Status.
+* **Bulk Import:** CSV-based bulk import for contacts that automatically creates missing companies and prevents duplicates via email matching.
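The duplicate-prevention flow of the bulk import can be sketched roughly as follows. The column names (`email`, `name`, `company`) and the in-memory stand-ins for the contact and company tables are assumptions for illustration, not the actual schema:

```python
import csv
import io

def bulk_import_contacts(csv_text, companies, contacts):
    """Import contacts from CSV text, creating missing companies and
    skipping rows whose email already exists (duplicate prevention)."""
    created, skipped = [], []
    existing_emails = {c["email"].lower() for c in contacts}
    for row in csv.DictReader(io.StringIO(csv_text)):
        email = row["email"].strip().lower()
        if email in existing_emails:
            skipped.append(email)          # duplicate: email already known
            continue
        company = row["company"].strip()
        companies.setdefault(company, {"name": company})  # auto-create missing company
        contacts.append({"email": email, "name": row["name"], "company": company})
        existing_emails.add(email)
        created.append(email)
    return created, skipped
```

Matching on the lower-cased email keeps re-imports idempotent, and rows referencing an unknown company create it on the fly instead of failing the import.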
-### 2. Deep Wikipedia Integration (v2.1)
-* **Extraction:** The system extracts the first paragraph (cleaned of artifacts), industry, revenue (normalized to Mio €), employee count, and Wikipedia categories.
-* **Validation:** Uses a "Google-First" strategy via SerpAPI, validating candidates by checking for domain matches and city/HQ location in the article.
-* **UI:** The Inspector displays a dedicated Wikipedia profile section with visual tags.
+### 2. UI/UX Modernization
+* **Light/Dark Mode:** Full theme support with toggle.
+* **Grid Layout:** Unified card-based layout for both Company and Contact lists.
+* **Mobile Responsiveness:** Optimized Inspector overlay and navigation for mobile devices.
+* **Tabbed Inspector:** Clean separation between Company Overview and Contact Management within the details pane.
-### 3. Web Scraping & Legal Data (v2.2)
-* **Impressum Scraping:**
-  * **2-Hop Strategy:** If no "Impressum" link is found on the landing page, the scraper automatically searches for a "Kontakt" page and checks for the link there.
-  * **Root Fallback:** If deep links (e.g. `/about-us`) fail, the scraper checks the root domain (`/`).
-* **LLM Extraction:** Unstructured legal text is parsed by Gemini to extract structured JSON (Legal Name, Address, CEO, VAT ID).
-* **Robustness:**
-  * **JSON Cleaning:** A helper (`clean_json_response`) strips Markdown code blocks from LLM responses to prevent parsing errors.
-  * **Schema Enforcement:** Added logic to handle inconsistent LLM responses (e.g., returning a list `[{...}]` instead of a flat object `{...}`).
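The two robustness fixes might look roughly like this. `clean_json_response` is named in the notes above, while `enforce_object` and both signatures are assumptions, not the project's actual code:

```python
import json
import re

def clean_json_response(raw: str):
    """Strip Markdown code fences (```json ... ```) that LLMs often
    wrap around JSON payloads, then parse the remainder."""
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    return json.loads(cleaned)

def enforce_object(data):
    """Normalize inconsistent LLM output: unwrap a single-element
    list [{...}] into the flat object {...} the schema expects."""
    if isinstance(data, list):
        if len(data) == 1 and isinstance(data[0], dict):
            return data[0]
        raise ValueError("Unexpected list response from LLM")
    return data
```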
+### 3. Advanced Configuration (Settings)
+* **Industry Verticals:** Database-backed configuration for target industries (Description, Focus Flag, Primary Product).
+* **Job Role Mapping:** Configurable patterns (Regex/Text) to map job titles on business cards to internal roles (e.g., "CTO" -> "Innovation Driver").
+* **Robotics Categories:** Existing AI reasoning logic remains configurable via the UI.
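Pattern-based role mapping of the kind described could be sketched like this. The patterns and role names below are illustrative stand-ins for the database-backed, UI-editable rows:

```python
import re

# Hypothetical pattern table; in the app these rows live in the
# database and are editable via the settings UI.
ROLE_PATTERNS = [
    (r"\b(CTO|Chief Technology Officer)\b", "Innovation Driver"),
    (r"\b(CEO|Gesch(ä|ae)ftsf(ü|ue)hrer)\b", "Decision Maker"),
    (r"\b(Produktion|Production|Plant Manager)\b", "Operations"),
]

def map_job_title(title: str) -> str:
    """Map a raw job title (e.g. from a business card) to an internal
    role via the first matching configured pattern."""
    for pattern, role in ROLE_PATTERNS:
        if re.search(pattern, title, flags=re.IGNORECASE):
            return role
    return "Unknown"
```

First-match-wins keeps the rule order meaningful, so more specific patterns should sit above generic ones in the table.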
-### 4. User Control & Ops
-* **Manual Overrides:** Users can manually correct the Wikipedia URL (locking the data) and the Company Website (triggering a fresh re-scrape).
-* **Polling UI:** The frontend uses intelligent polling to auto-refresh data when background jobs (Discovery/Analysis) complete.
-* **Forced Refresh:** The "Analyze" endpoint now clears old cache data to ensure a fresh scrape on every user request.
+### 4. Robotics Potential Analysis (v2.3)
+* **Chain-of-Thought Logic:** The AI analysis (`ClassificationService`) uses multi-step reasoning to evaluate physical infrastructure.
+* **Provider vs. User:** Strict differentiation logic implemented.
+### 5. Web Scraping & Legal Data (v2.2)
+* **Impressum Scraping:** 2-Hop Strategy and Root Fallback logic.
+* **Manual Overrides:** Users can manually correct Wikipedia, Website, and Impressum URLs directly in the UI.
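A minimal sketch of the 2-Hop and Root-Fallback flow, assuming a pluggable `fetch` callable and a naive regex link finder (the real scraper's heuristics may well differ):

```python
import re

def find_link(html: str, keyword: str):
    """Return the href of the first <a> whose href or text contains keyword."""
    for href, text in re.findall(r'<a[^>]+href="([^"]+)"[^>]*>(.*?)</a>', html, re.I | re.S):
        if keyword.lower() in href.lower() or keyword.lower() in text.lower():
            return href
    return None

def find_impressum(fetch, base_url: str, start_path: str = "/"):
    """2-Hop Strategy: look for an Impressum link on the landing page;
    if absent, hop to the Kontakt page and look there. Falls back to
    the root domain if a deep start_path yields nothing."""
    html = fetch(base_url + start_path)
    link = find_link(html, "impressum")
    if link:
        return link
    kontakt = find_link(html, "kontakt")   # hop 2: check the contact page
    if kontakt:
        link = find_link(fetch(base_url + kontakt), "impressum")
        if link:
            return link
    if start_path != "/":                   # root fallback
        return find_impressum(fetch, base_url, "/")
    return None
```

Passing `fetch` in as a callable keeps the strategy testable without network access.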
## Lessons Learned & Best Practices
@@ -74,5 +77,7 @@ The system architecture has evolved from a CLI-based toolset to a modern web app
* **Solution:** Write robust helper functions that can handle multiple possible JSON structures. A property object retrieved via a direct property endpoint (`/pages/{id}/properties/{prop_id}`) is structured differently from the same property when it's part of a full page object (`/pages/{id}`). The parsing logic must account for these variations.
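A shape-tolerant parser for a rich-text property, in the spirit of the solution above; the two shapes shown are simplified approximations of the Notion-style responses, not exact copies:

```python
def extract_rich_text(prop: dict) -> str:
    """Read a rich_text property whether it came from the direct
    property endpoint (paginated "results" list) or from a full page
    object (inline "rich_text" list). Shapes are illustrative."""
    if "results" in prop:                     # direct property endpoint
        items = [r["rich_text"] for r in prop["results"]]
    else:                                     # property inside a full page object
        items = prop.get("rich_text", [])
    return "".join(part.get("plain_text", "") for part in items)
```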
## Next Steps
* **Quality Assurance:** Implement a dedicated "Review Mode" to validate high-potential leads.
-* **Export:** Generate Excel/CSV enriched reports.
+* **Marketing Automation:** Implement the actual sending logic (or export) based on the contact status.
+* **Job Role Mapping Engine:** Connect the configured patterns to the contact import/creation process to auto-assign roles.
+* **Industry Classification Engine:** Connect the configured industries to the AI Analysis prompt to enforce the "Strict Mode" mapping.
+* **Export:** Generate Excel/CSV enriched reports (already partially implemented via JSON export).