feat(company-explorer): add wikipedia integration, robotics settings, and manual overrides
- Ported robust Wikipedia extraction logic (categories, first paragraph) from legacy system.
- Implemented database-driven Robotics Category configuration with frontend settings UI.
- Updated Robotics Potential analysis to use Chain-of-Thought infrastructure reasoning.
- Added Manual Override features for Wikipedia URL (with locking) and Website URL (with re-scrape trigger).
- Enhanced Inspector UI with Wikipedia profile, category tags, and action buttons.
GEMINI.md
@@ -15,6 +15,7 @@ The system is modular and consists of the following key components:
* **`company_deduplicator.py`:** A module for intelligent duplicate checking, both for external lists and internal CRM data.
* **`generate_marketing_text.py`:** An engine for creating personalized marketing texts.
* **`app.py`:** A Flask application that provides an API to run the different modules.
* **`company-explorer/`:** A new React/FastAPI-based application (v2.x) replacing the legacy CLI tools. It focuses on identifying robotics potential in companies.

## Git Workflow & Conventions
@@ -23,61 +24,27 @@ The system is modular and consists of the following key components:
- Description: Detailed changes as a list with `- ` at the start of each line (no bullet points).
- **File renames:** To preserve a file's Git history, it must be renamed with `git mv alter_name.py neuer_name.py`.
- **Commit & push process:** Changes are first committed locally. Pushing to the remote server happens only after your explicit confirmation.
- **Viewing history:** Web interfaces such as Gitea may not show the complete history of a renamed file. The correct, complete history can be viewed on the command line with `git log --follow <dateiname>`.

## Building and Running
## Current Status (Jan 08, 2026) - Company Explorer (Robotics Edition)

The project is designed to be run in a Docker container. The `Dockerfile` contains the instructions to build the container.
* **Robotics Potential Analysis (v2.3):**
    * **Logic Overhaul:** Switched from keyword-based scanning to a **"Chain-of-Thought" Infrastructure Analysis**. The AI now evaluates physical assets (factories, warehouses, solar parks) to determine robotics needs.
    * **Provider vs. User:** Implemented strict reasoning to distinguish between companies *selling* cleaning products (providers) and those *operating* factories (users/potential clients).
    * **Configurable Logic:** Added a database-backed configuration system for robotics categories (`cleaning`, `transport`, `security`, `service`). Users can now define the "Trigger Logic" and "Scoring Guide" directly in the frontend settings.
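
The database-backed category configuration described under Configurable Logic could be sketched as follows. This is an illustration only: the table name `robotics_category`, its columns, the helper functions, and the empty default texts are assumptions, not the actual `RoboticsCategory` schema.

```python
import sqlite3

# The four category keys come from the notes above; the empty default texts
# are placeholders, since the real defaults are not documented here.
DEFAULT_CATEGORIES = [
    ("cleaning", "", ""),
    ("transport", "", ""),
    ("security", "", ""),
    ("service", "", ""),
]


def init_categories(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) config table and seed the four categories."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS robotics_category ("
        "name TEXT PRIMARY KEY, "
        "trigger_logic TEXT NOT NULL, "
        "scoring_guide TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO robotics_category VALUES (?, ?, ?)",
        DEFAULT_CATEGORIES,
    )
    conn.commit()


def update_category(conn: sqlite3.Connection, name: str,
                    trigger_logic: str, scoring_guide: str) -> bool:
    """Store settings edited in the frontend; False if the category is unknown."""
    cursor = conn.execute(
        "UPDATE robotics_category SET trigger_logic = ?, scoring_guide = ? "
        "WHERE name = ?",
        (trigger_logic, scoring_guide, name),
    )
    conn.commit()
    return cursor.rowcount == 1


def build_prompt_section(conn: sqlite3.Connection) -> str:
    """Render all categories as one text block that a prompt could embed."""
    rows = conn.execute(
        "SELECT name, trigger_logic, scoring_guide "
        "FROM robotics_category ORDER BY name"
    ).fetchall()
    return "\n".join(
        f"[{name}] trigger: {trigger} | scoring: {guide}"
        for name, trigger, guide in rows
    )
```

Keeping the texts in a table rather than in code means a settings change takes effect on the next analysis run without a redeploy, which appears to be the intent of the frontend settings UI.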

**To build the Docker container:**
* **Wikipedia Integration (v2.1):**
    * **Deep Extraction:** Implemented the "Legacy" extraction logic (`WikipediaService`). It now pulls the **first paragraph** (cleaned of references), **categories** (filtered for relevance), revenue, employees, and HQ location.
    * **Google-First Discovery:** Uses SerpAPI to find the correct Wikipedia article, validating via domain match and city.
    * **Visual Inspector:** The frontend `Inspector` now displays a comprehensive Wikipedia profile including category tags.
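
The reference cleanup and category filtering mentioned under Deep Extraction might look roughly like the sketch below. It is not the actual `WikipediaService` code: the function names, the regular expression, and the list of blocked keywords are assumptions made for illustration.

```python
import re
from typing import Iterable, List

# Matches footnote markers such as [1], [23], or [a]. Which markers the real
# service strips is an assumption.
_REFERENCE_MARKER = re.compile(r"\[(?:\d+|[a-z])\]", re.IGNORECASE)

# Hypothetical keywords that identify maintenance categories, not topical ones.
_BLOCKED_KEYWORDS = ("articles", "pages", "wikipedia", "cs1")


def clean_first_paragraph(text: str) -> str:
    """Return the first non-empty paragraph with reference markers removed."""
    for paragraph in text.split("\n"):
        cleaned = _REFERENCE_MARKER.sub("", paragraph)
        cleaned = re.sub(r"\s+", " ", cleaned).strip()
        if cleaned:
            return cleaned
    return ""


def filter_categories(categories: Iterable[str]) -> List[str]:
    """Keep only categories that contain none of the blocked keywords."""
    return [
        category
        for category in categories
        if not any(word in category.lower() for word in _BLOCKED_KEYWORDS)
    ]
```

For example, `clean_first_paragraph("Acme GmbH is a manufacturer.[1] It has 500 employees.[2]")` yields the sentence pair without the bracketed markers.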

```bash
docker build -t company-enrichment .
```
* **Manual Overrides & Control:**
    * **Wikipedia Override:** Added a UI to manually correct the Wikipedia URL. This triggers a re-scan and **locks** the record (`is_locked` flag) to prevent auto-overwrite.
    * **Website Override:** Added a UI to manually correct the company website. This automatically clears old scraping data to force a fresh analysis on the next run.
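
The two override behaviours described above can be illustrated with a small in-memory sketch. The `CompanyRecord` class and the three functions are hypothetical stand-ins; only the `is_locked` flag is named in the notes, and the re-scan that follows a Wikipedia override is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CompanyRecord:
    """Simplified, hypothetical stand-in for the stored enrichment data."""
    website: Optional[str] = None
    scrape_data: dict = field(default_factory=dict)
    wikipedia_url: Optional[str] = None
    is_locked: bool = False


def override_wikipedia_url(record: CompanyRecord, url: str) -> None:
    """Manual correction: store the URL and lock it against auto-overwrite."""
    record.wikipedia_url = url
    record.is_locked = True


def auto_update_wikipedia_url(record: CompanyRecord, url: str) -> bool:
    """Automatic discovery: write only if the record is not locked."""
    if record.is_locked:
        return False
    record.wikipedia_url = url
    return True


def override_website(record: CompanyRecord, url: str) -> None:
    """Manual correction: store the website and drop stale scraping data."""
    record.website = url
    record.scrape_data.clear()
```

Checking the lock inside the automatic path, instead of in each caller, keeps a later batch run from silently undoing a manual correction.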

**To run the Docker container:**

```bash
docker run -p 8080:8080 company-enrichment
```

The application will be available at `http://localhost:8080`.

## Development Conventions

* **Configuration:** The project uses a `config.py` file to manage configuration settings.
* **Dependencies:** Python dependencies are listed in the `requirements.txt` file.
* **Modularity:** The code is modular and well-structured, with helper functions and classes to handle specific tasks.
* **API:** The Flask application in `app.py` provides an API to interact with the system.
* **Logging:** The project uses the `logging` module to log information and errors.
* **Error Handling:** The `readme.md` indicates a critical error related to the `openai` library. The next step is to downgrade the library to a compatible version.

## Current Status (Jan 05, 2026) - GTM & Market Intel Fixes

* **GTM Architect (v2.4) - UI/UX Refinement:**
    * **Corporate Design Integration:** A central, customizable `CORPORATE_DESIGN_PROMPT` was introduced in `config.py` to ensure all generated images strictly follow a "clean, professional, photorealistic" B2B style, avoiding comic aesthetics.
    * **Aspect Ratio Control:** Implemented user-selectable aspect ratios (16:9, 9:16, 1:1, 4:3) in the frontend (Phase 6), passing through to the Google Imagen/Gemini 2.5 API.
    * **Frontend Fix:** Resolved a double-declaration bug in `App.tsx` that prevented the build.

* **Market Intelligence Tool (v1.2) - Backend Hardening:**
    * **"Failed to fetch" Resolved:** Fixed a critical Nginx routing issue by forcing the frontend to use relative API paths (`./api`) instead of absolute ports, ensuring requests correctly pass through the reverse proxy in Docker.
    * **Large Payload Fix:** Increased `client_max_body_size` to 50M in both Nginx configurations (`nginx-proxy.conf` and frontend `nginx.conf`) to prevent 413 Errors when uploading large knowledge base files during campaign generation.
    * **JSON Stability:** The Python Orchestrator and Node.js bridge were hardened against invalid JSON output. The system now robustly handles stdout noise and logs full raw output to `/app/Log/server_dump.txt` in case of errors.
    * **Language Support:** Implemented a `--language` flag. The tool now correctly respects the frontend language selection (defaulting to German) and forces the LLM to output German text for signals, ICPs, and outreach campaigns.
    * **Logging:** Fixed log volume mounting paths to ensure debug logs are persisted and accessible.
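
One way to tolerate stdout noise around a JSON payload, as described under JSON Stability, is to scan for the first position where a JSON object decodes cleanly. The sketch below is an assumption about how such hardening could work, not the orchestrator's actual code. `extract_json_object` is a hypothetical helper, and it assumes the payload is a JSON object, not a bare list or scalar.

```python
import json
from typing import Any, Dict


def extract_json_object(raw_stdout: str) -> Dict[str, Any]:
    """Return the first JSON object embedded in noisy process output.

    Raises ValueError if no decodable object is found, so the caller can
    dump the raw output to a log file and report the failure.
    """
    decoder = json.JSONDecoder()
    for index, char in enumerate(raw_stdout):
        if char != "{":
            continue
        try:
            # raw_decode parses one value starting at `index` and ignores
            # whatever trails it.
            value, _end = decoder.raw_decode(raw_stdout, index)
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value
    raise ValueError("no JSON object found in output")
```

A caller would wrap this in `try`/`except ValueError` and write the raw text to its dump file on failure. Log lines that themselves contain a valid JSON object before the real payload would be picked up first, so this only suits output where the payload is the first such object.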

## Current Status (Jan 2026) - GTM Architect & Core Updates

* **GTM Architect (v2.2) - FULLY OPERATIONAL:**
    * **Image Generation Fixed:** Successfully implemented a hybrid image generation pipeline.
        * **Text-to-Image:** Uses `imagen-4.0-generate-001` for generic scenes.
        * **Image-to-Image:** Uses `gemini-2.5-flash-image` with reference image upload for product-consistent visuals.
        * **Prompt Engineering:** Strict prompts ensure the product design remains unaltered.
    * **Library Upgrade:** Migrated core AI logic to `google-genai` (v1.x) to resolve deprecation warnings and access newer models. `Pillow` added for image processing.
    * **Model Update:** Switched text generation to `gemini-2.0-flash` due to regional unavailability of 1.5.
    * **Frontend Stability:** Fixed a critical React crash in Phase 3 by handling object-based role descriptions robustly.
    * **Infrastructure:** Updated Docker configurations (`gtm-architect/requirements.txt`) to support new dependencies.
* **Architecture & DB:**
    * **Database:** Updated `companies_v3_final.db` schema to include `RoboticsCategory` and `EnrichmentData.is_locked`.
    * **Services:** Refactored `ClassificationService` and `DiscoveryService` for better modularity and robustness.

## Next Steps
* **Monitor Logs:** Check `Log_from_docker/` for detailed execution traces of the GTM Architect.
* **Feedback Loop:** Verify the quality of the generated GTM strategies and adjust prompts in `gtm_architect_orchestrator.py` if necessary.
* **Quality Assurance:** Implement a dedicated "Review Mode" to validate high-potential leads.
* **Data Import:** Finalize the "List Matcher" to import and deduplicate Excel lists against the new DB.