feat(company-explorer): add impressum scraping, robust json parsing, and enhanced ui polling
- Implemented Impressum scraping with Root-URL fallback and enhanced keyword detection.
- Added 'clean_json_response' helper to strip Markdown from LLM outputs, preventing JSONDecodeErrors.
- Improved numeric extraction for German formatting (thousands separators vs decimals).
- Updated Inspector UI with Polling logic for auto-refresh and display of AI Dossier and Legal Data.
- Added Manual Override for Website URL.
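The numeric-extraction change is not part of the diff shown here, so the following is only a minimal sketch of how German-formatted numbers could be told apart, not the code from this commit. The function name `parse_german_number` and the rule "a dot followed by exactly three digits is a thousands separator" are assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical helper for illustration; not the function changed by this commit.
_NUMBER_RE = re.compile(r"\d[\d.,]*")
_THOUSANDS_RE = re.compile(r"\d{1,3}(\.\d{3})+")


def parse_german_number(text: str) -> Optional[float]:
    """Extract the first number from text, assuming German formatting."""
    match = _NUMBER_RE.search(text)
    if match is None:
        return None
    # Drop trailing sentence punctuation such as the dot in "2005."
    token = match.group(0).rstrip(".,")
    if "," in token:
        # A comma is the decimal separator, so any dots are thousands separators.
        token = token.replace(".", "").replace(",", ".")
    elif _THOUSANDS_RE.fullmatch(token):
        # Dots that group exactly three digits are thousands separators.
        token = token.replace(".", "")
    # Anything else (e.g. "12.5") is passed to float() unchanged.
    try:
        return float(token)
    except ValueError:
        return None
```

Under these assumptions, `parse_german_number("ca. 1.234 Mitarbeiter")` yields `1234.0`, while `parse_german_number("1,5 Mio")` yields `1.5`. A value such as "1.234" is inherently ambiguous, so a real implementation may need context (for example, the field being extracted) to decide.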
@@ -2,7 +2,7 @@ import json
 import logging
 import os
 from typing import Dict, Any, List
-from ..lib.core_utils import call_gemini
+from ..lib.core_utils import call_gemini, clean_json_response
 from ..config import settings
 from ..database import SessionLocal, RoboticsCategory

@@ -55,7 +55,7 @@ class ClassificationService:

 prompt = f"""
 You are a Senior B2B Market Analyst for 'Roboplanet', a specialized robotics distributor.
-Your task is to analyze a target company based on their website text to determine their **operational need** for service robotics.
+Your task is to analyze the target company based on their website text and create a concise **Dossier**.

 --- TARGET COMPANY ---
 Name: {company_name}
@@ -66,36 +66,33 @@ class ClassificationService:
 You MUST assign the company to exactly ONE of these industries. If unsure, choose the closest match or "Sonstige".
 {json.dumps(self.allowed_industries, ensure_ascii=False)}

---- ANALYSIS GUIDELINES (CHAIN OF THOUGHT) ---
-1. **Infrastructure Analysis:** What physical assets does this company likely operate based on their business model?
-- Factories / Production Plants? (-> Needs Cleaning, Security, Intralogistics)
-- Large Warehouses? (-> Needs Intralogistics, Security, Floor Washing)
-- Offices / Headquarters? (-> Needs Vacuuming, Window Cleaning)
-- Critical Infrastructure (Solar Parks, Wind Farms)? (-> Needs Perimeter Security, Inspection)
-- Hotels / Hospitals? (-> Needs Service, Cleaning, Transport)
-
-2. **Provider vs. User Distinction (CRITICAL):**
-- If a company SELLS cleaning products (e.g., 3M, Henkel), they do NOT necessarily have a higher need for cleaning robots than any other manufacturer. Do not score them high just because the word "cleaning" appears. Score them based on their *factories*.
-- If a company SELLS security services, they might be a potential PARTNER, but check if they *manage* sites.
-
-3. **Scale Assessment:**
-- 5 locations implies more need than 1.
-- "Global player" implies large facilities.
+--- ANALYSIS PART 1: BUSINESS MODEL ---
+1. Identify the core products/services.
+2. Summarize in 2-3 German sentences: What do they do and for whom? (Target: "business_model")

---- SCORING CATEGORIES (0-100) ---
-Based on the current strategic focus of Roboplanet:
+--- ANALYSIS PART 2: INFRASTRUCTURE & POTENTIAL (Chain of Thought) ---
+1. **Infrastructure Scan:** Look for evidence of physical assets like *Factories, Large Warehouses, Production Lines, Campuses, Hospitals*.
+2. **Provider vs. User Check:**
+- Does the company USE this infrastructure (Potential Customer)?
+- Or do they SELL products for it (Competitor/Partner)?
+- *Example:* "Cleaning" -> Do they sell soap (Provider) or do they have a 50,000sqm factory (User)?
+3. **Evidence Extraction:** Extract 1-2 key sentences from the text proving this infrastructure. (Target: "infrastructure_evidence")
+
+--- ANALYSIS PART 3: SCORING (0-100) ---
+Based on the identified infrastructure, score the potential for these categories:

 {category_guidance}

 --- OUTPUT FORMAT (JSON ONLY) ---
 {{
 "industry": "String (from list)",
-"summary": "Concise analysis of their infrastructure and business model (German)",
+"business_model": "2-3 sentences summary (German)",
+"infrastructure_evidence": "1-2 key sentences proving physical assets (German)",
 "potentials": {{
-"cleaning": {{ "score": 0-100, "reason": "Specific reasoning based on infrastructure (e.g. 'Operates 5 production plants in DE')." }},
-"transport": {{ "score": 0-100, "reason": "..." }},
-"security": {{ "score": 0-100, "reason": "..." }},
-"service": {{ "score": 0-100, "reason": "..." }}
+"cleaning": {{ "score": 0-100, "reason": "Reasoning based on infrastructure." }},
+"transport": {{ "score": 0-100, "reason": "Reasoning based on logistics volume." }},
+"security": {{ "score": 0-100, "reason": "Reasoning based on perimeter/assets." }},
+"service": {{ "score": 0-100, "reason": "Reasoning based on guest interaction." }}
 }}
 }}
 """
@@ -106,7 +103,7 @@ class ClassificationService:
 json_mode=True,
 temperature=0.1 # Very low temp for analytical reasoning
 )
-return json.loads(response_text)
+return json.loads(clean_json_response(response_text))
 except Exception as e:
 logger.error(f"Classification failed: {e}")
 return {"error": str(e)}
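The body of `clean_json_response` lives in `core_utils` and is not included in this diff. As a rough sketch of what such a helper might do, assuming the model sometimes wraps its JSON in a Markdown code fence, the hypothetical `strip_markdown_fences` below unwraps the first fenced block before parsing. The names `strip_markdown_fences`, `parse_llm_json`, and `FENCE` are illustrative and not taken from the repository.

```python
import json
import re
from typing import Any

# Hypothetical sketch; the real clean_json_response is not shown in this diff.
FENCE = "`" * 3  # built programmatically so the example contains no literal fence
_FENCED_BLOCK = re.compile(
    FENCE + r"(?:json)?\s*(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def strip_markdown_fences(text: str) -> str:
    """Return the content of the first fenced block, or the stripped text if none."""
    match = _FENCED_BLOCK.search(text)
    return match.group(1).strip() if match else text.strip()


def parse_llm_json(text: str) -> Any:
    """Parse model output as JSON after removing an optional Markdown fence."""
    return json.loads(strip_markdown_fences(text))


# Usage: a fenced response that plain json.loads would reject.
raw = FENCE + 'json\n{"industry": "Sonstige"}\n' + FENCE
result = parse_llm_json(raw)
```

Note that the surrounding `except Exception` still catches any remaining `json.JSONDecodeError`, so a helper like this reduces parse failures but does not change the error path.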