feat(ce): upgrade to v0.5.0 with contacts management, advanced settings and ui modernization
51
GEMINI.md
@@ -12,31 +12,34 @@ This project is a Python-based system for automated company data enrichment and

The system architecture has evolved from a CLI-based toolset to a modern web application (`company-explorer`) backed by Docker containers.

## Current Status (Jan 08, 2026) - Company Explorer (Robotics Edition v0.3.0)
## Current Status (Jan 15, 2026) - Company Explorer (Robotics Edition v0.5.0)

### 1. Robotics Potential Analysis (v2.3)
* **Chain-of-Thought Logic:** The AI analysis (`ClassificationService`) now uses a multi-step reasoning process to evaluate companies based on their **physical infrastructure** (factories, warehouses) rather than just keywords.
* **Provider vs. User:** Strict logic implemented to distinguish between companies *selling* automation products and those *needing* them for their own operations.
* **Configurable Settings:** A database-driven configuration (`RoboticsCategory`) allows users to edit the definition and scoring logic for each robotics category directly via the frontend settings menu.
### 1. Contacts Management (v0.5)
* **Full CRUD:** Integrated Contact Management system with direct editing capabilities.
* **Global List View:** Dedicated view for all contacts across all companies with search and filter.
* **Data Model:** Supports advanced fields like Academic Title, Role Interpretation (Decision Maker vs. User), and Marketing Automation Status.
* **Bulk Import:** CSV-based bulk import for contacts that automatically creates missing companies and prevents duplicates via email matching.
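
The bulk import behaviour described above (create missing companies, skip contacts whose email is already known) could look roughly like the sketch below. It is not the project's implementation: the function name `import_contacts`, the CSV column names (`name`, `email`, `company`), and the in-memory dictionaries standing in for database tables are all assumptions made for illustration.

```python
import csv
import io


def import_contacts(csv_text, companies, contacts):
    """Import contacts from CSV text into two in-memory stores.

    `companies` maps a lowercased company name to a company record and
    `contacts` maps a lowercased email to a contact record. Both layouts,
    like the column names, are hypothetical stand-ins for database tables.
    Returns a (created, skipped) tuple.
    """
    created, skipped = 0, 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        email = (row.get("email") or "").strip().lower()
        company_name = (row.get("company") or "").strip()
        if not email or not company_name:
            skipped += 1  # incomplete row, nothing to match on
            continue
        if email in contacts:
            skipped += 1  # duplicate detected via case-insensitive email match
            continue
        # Create the company on first sight, otherwise reuse the existing record.
        company = companies.setdefault(company_name.lower(), {"name": company_name})
        contacts[email] = {
            "email": email,
            "name": (row.get("name") or "").strip(),
            "company": company["name"],
        }
        created += 1
    return created, skipped
```

Normalizing the email to lowercase before the lookup is a design choice of this sketch; whether the real import compares emails case-insensitively is not stated in the source.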

### 2. Deep Wikipedia Integration (v2.1)
* **Extraction:** The system extracts the first paragraph (cleaned of artifacts), industry, revenue (normalized to Mio €), employee count, and Wikipedia categories.
* **Validation:** Uses a "Google-First" strategy via SerpAPI, validating candidates by checking for domain matches and city/HQ location in the article.
* **UI:** The Inspector displays a dedicated Wikipedia profile section with visual tags.
### 2. UI/UX Modernization
* **Light/Dark Mode:** Full theme support with toggle.
* **Grid Layout:** Unified card-based layout for both Company and Contact lists.
* **Mobile Responsiveness:** Optimized Inspector overlay and navigation for mobile devices.
* **Tabbed Inspector:** Clean separation between Company Overview and Contact Management within the details pane.

### 3. Web Scraping & Legal Data (v2.2)
* **Impressum Scraping:**
    * **2-Hop Strategy:** If no "Impressum" link is found on the landing page, the scraper automatically searches for a "Kontakt" page and checks for the link there.
    * **Root Fallback:** If deep links (e.g. `/about-us`) fail, the scraper checks the root domain (`/`).
    * **LLM Extraction:** Unstructured legal text is parsed by Gemini to extract structured JSON (Legal Name, Address, CEO, VAT ID).
* **Robustness:**
    * **JSON Cleaning:** A helper (`clean_json_response`) strips Markdown code blocks from LLM responses to prevent parsing errors.
    * **Schema Enforcement:** Added logic to handle inconsistent LLM responses (e.g., returning a list `[{...}]` instead of a flat object `{...}`).
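
The two robustness measures above can be sketched as follows. Only the helper name `clean_json_response` comes from the text; its body here is a guess at the behaviour, and `parse_legal_json` is a hypothetical wrapper added to show the list-versus-object handling.

```python
import json

# Three backticks, built indirectly so this example does not contain a literal fence.
FENCE = "`" * 3


def clean_json_response(text):
    """Strip a surrounding Markdown code fence from an LLM response, if present."""
    text = text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line, which may carry a language tag such as "json".
        first_newline = text.find("\n")
        text = text[first_newline + 1:] if first_newline != -1 else ""
        if text.rstrip().endswith(FENCE):
            text = text.rstrip()[: -len(FENCE)]
    return text.strip()


def parse_legal_json(text):
    """Parse the cleaned response and coerce it to a single flat object.

    Hypothetical helper: accepts either `{...}` or `[{...}]` and always
    returns a dict, raising ValueError for anything else.
    """
    data = json.loads(clean_json_response(text))
    if isinstance(data, list):
        # Some responses wrap the object in a list; take the first element.
        data = data[0] if data else {}
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object in the LLM response")
    return data
```

Taking the first element of a list is one possible policy; a stricter variant could reject lists with more than one entry instead of silently dropping the rest.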
### 3. Advanced Configuration (Settings)
* **Industry Verticals:** Database-backed configuration for target industries (Description, Focus Flag, Primary Product).
* **Job Role Mapping:** Configurable patterns (Regex/Text) to map job titles on business cards to internal roles (e.g., "CTO" -> "Innovation Driver").
* **Robotics Categories:** Existing AI reasoning logic remains configurable via the UI.
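
As an illustration of the job role mapping idea, the sketch below matches a job title against an ordered list of regex patterns. The only mapping taken from the text is "CTO" -> "Innovation Driver"; the `ROLE_PATTERNS` table and the `map_job_title` function are hypothetical, and in the application the patterns would presumably be loaded from the database-backed settings rather than hard-coded.

```python
import re

# Hypothetical pattern table: (regex, internal role). First match wins.
ROLE_PATTERNS = [
    (r"\b(cto|chief technology officer)\b", "Innovation Driver"),
]


def map_job_title(title, patterns=ROLE_PATTERNS, default=None):
    """Return the internal role for a job title, or `default` if nothing matches."""
    for pattern, role in patterns:
        if re.search(pattern, title, flags=re.IGNORECASE):
            return role
    return default
```

The word boundaries (`\b`) matter here: without them, the pattern `cto` would also match inside "Director".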

### 4. User Control & Ops
* **Manual Overrides:** Users can manually correct the Wikipedia URL (locking the data) and the Company Website (triggering a fresh re-scrape).
* **Polling UI:** The frontend uses intelligent polling to auto-refresh data when background jobs (Discovery/Analysis) complete.
* **Forced Refresh:** The "Analyze" endpoint now clears old cache data to ensure a fresh scrape on every user request.
### 4. Robotics Potential Analysis (v2.3)
* **Chain-of-Thought Logic:** The AI analysis (`ClassificationService`) uses multi-step reasoning to evaluate physical infrastructure.
* **Provider vs. User:** Strict differentiation logic implemented.

### 5. Web Scraping & Legal Data (v2.2)
* **Impressum Scraping:** 2-Hop Strategy and Root Fallback logic.
* **Manual Overrides:** Users can manually correct Wikipedia, Website, and Impressum URLs directly in the UI.

## Lessons Learned & Best Practices

## Lessons Learned & Best Practices

@@ -74,5 +77,7 @@ The system architecture has evolved from a CLI-based toolset to a modern web app
* **Solution:** Write robust helper functions that can handle multiple possible JSON structures. A property object retrieved via a direct property endpoint (`/pages/{id}/properties/{prop_id}`) is structured differently from the same property when it's part of a full page object (`/pages/{id}`). The parsing logic must account for these variations.
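
A helper of the kind described in this solution might look like the sketch below. The two payload shapes are assumptions for illustration: the page-object shape is taken to store text fragments as a list under `rich_text`, and the property-endpoint shape is taken to wrap one fragment per item under `results`. The function name `extract_plain_text` is hypothetical, and the real responses should be checked against the API documentation before relying on these shapes.

```python
def extract_plain_text(prop):
    """Return the concatenated plain text of a rich-text property.

    Handles two assumed shapes:
      * page object:       {"rich_text": [{"plain_text": ...}, ...]}
      * property endpoint: {"object": "list",
                            "results": [{"rich_text": {"plain_text": ...}}, ...]}
    """
    if prop.get("object") == "list":
        # Property-endpoint shape: one item per text fragment under "results".
        fragments = [item.get("rich_text", {}) for item in prop.get("results", [])]
    else:
        # Page-object shape: fragments are stored directly as a list.
        fragments = prop.get("rich_text", [])
    return "".join(fragment.get("plain_text", "") for fragment in fragments)
```

Keeping the shape detection in one function means callers never need to know which endpoint produced the property.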

## Next Steps
* **Quality Assurance:** Implement a dedicated "Review Mode" to validate high-potential leads.
* **Export:** Generate Excel/CSV enriched reports.
* **Marketing Automation:** Implement the actual sending logic (or export) based on the contact status.
* **Job Role Mapping Engine:** Connect the configured patterns to the contact import/creation process to auto-assign roles.
* **Industry Classification Engine:** Connect the configured industries to the AI Analysis prompt to enforce the "Strict Mode" mapping.
* **Export:** Generate Excel/CSV enriched reports (already partially implemented via JSON export).