From 6ad948e522a70e185f2d39a86d22a8209782e06f Mon Sep 17 00:00:00 2001
From: Floke
Date: Mon, 19 Jan 2026 07:44:23 +0000
Subject: [PATCH] docs: Add core analysis prompts to migration plan

---
 MIGRATION_PLAN.md | 86 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 86 insertions(+)

diff --git a/MIGRATION_PLAN.md b/MIGRATION_PLAN.md
index 55638b16..8bc06231 100644
--- a/MIGRATION_PLAN.md
+++ b/MIGRATION_PLAN.md
@@ -221,3 +221,89 @@ Contacts have a 1:n relationship to Accounts. Accounts can have a "Primary Co
 * Implementation of Markdown cleaning (stripping of code blocks).
 * Prompt optimization for tabular Markdown output in Phase 5.
 * Markdown file import feature.
+
+## 8. Prompts in Use (Account Analysis)
+
+This section documents the prompts that the **Company Explorer** backend uses for the automated analysis of company data.
+
+### 8.1 Impressum Extraction (from `services/scraping.py`)
+
+Extracts structured master data from the raw text of the Impressum (legal notice) page.
+
+**Prompt:**
+
+```python
+prompt = f"""
+Extract the official company details from this German 'Impressum' text.
+Return JSON ONLY. Keys: 'legal_name', 'street', 'zip', 'city', 'country_code', 'email', 'phone', 'ceo_name', 'vat_id'.
+'country_code' should be the two-letter ISO code (e.g., "DE", "CH", "AT").
+If a field is missing, use null.
+
+Text:
+{raw_text}
+"""
+```
+
+**Variables:**
+* **`raw_text`**: The cleaned HTML text of the discovered Impressum URL (max. 10,000 characters).
+
+---
+
+### 8.2 Robotics Potential Analysis (from `services/classification.py`)
+
+The core prompt for assessing automation potential. It summarizes the business model, checks for physical infrastructure, and scores specific robotics use cases.
+
+**Prompt:**
+
+```python
+prompt = f"""
+You are a Senior B2B Market Analyst for 'Roboplanet', a specialized robotics distributor.
+Your task is to analyze the target company based on their website text and create a concise **Dossier**.
+
+--- TARGET COMPANY ---
+Name: {company_name}
+Website Content (Excerpt):
+{website_text[:20000]}
+
+--- ALLOWED INDUSTRIES (STRICT) ---
+You MUST assign the company to exactly ONE of these industries. If unsure, choose the closest match or "Sonstige".
+{json.dumps(self.allowed_industries, ensure_ascii=False)}
+
+--- ANALYSIS PART 1: BUSINESS MODEL ---
+1. Identify the core products/services.
+2. Summarize in 2-3 German sentences: What do they do and for whom? (Target: "business_model")
+
+--- ANALYSIS PART 2: INFRASTRUCTURE & POTENTIAL (Chain of Thought) ---
+1. **Infrastructure Scan:** Look for evidence of physical assets like *Factories, Large Warehouses, Production Lines, Campuses, Hospitals*.
+2. **Provider vs. User Check:**
+   - Does the company USE this infrastructure (Potential Customer)?
+   - Or do they SELL products for it (Competitor/Partner)?
+   - *Example:* "Cleaning" -> Do they sell soap (Provider) or do they have a 50,000sqm factory (User)?
+3. **Evidence Extraction:** Extract 1-2 key sentences from the text proving this infrastructure. (Target: "infrastructure_evidence")
+
+--- ANALYSIS PART 3: SCORING (0-100) ---
+Based on the identified infrastructure, score the potential for these categories:
+
+{category_guidance}
+
+--- OUTPUT FORMAT (JSON ONLY) ---
+{{
+  "industry": "String (from list)",
+  "business_model": "2-3 sentences summary (German)",
+  "infrastructure_evidence": "1-2 key sentences proving physical assets (German)",
+  "potentials": {{
+    "cleaning": {{ "score": 0-100, "reason": "Reasoning based on infrastructure." }},\
+    "transport": {{ "score": 0-100, "reason": "Reasoning based on logistics volume." }},\
+    "security": {{ "score": 0-100, "reason": "Reasoning based on perimeter/assets." }},\
+    "service": {{ "score": 0-100, "reason": "Reasoning based on guest interaction." }}\
+  }}\
+}}
+"""
+```
+
+**Variables:**
+* **`company_name`**: Name of the company.
+* **`website_text`**: The scraped text of the home page (max. 20,000 characters).
+* **`allowed_industries`**: JSON list of the allowed industries (strict mode).
+* **`category_guidance`**: Dynamically generated definitions and scoring rules for the robotics categories (from the database).
+
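
Note on the 8.2 output format: the prompt asks for JSON only, an industry from a fixed list, and scores between 0 and 100, but a model response may still arrive wrapped in a Markdown code fence or with out-of-range values. The following is a minimal sketch of how such a response could be parsed and normalised. It is not taken from `services/classification.py`; the names `parse_llm_json`, `validate_dossier`, and `CATEGORIES` are hypothetical, and falling back to "Sonstige" for unknown industries is an assumption based on the prompt wording.

```python
import json
import re

# Hypothetical helpers for illustration; not part of the patched repository.
CATEGORIES = ("cleaning", "transport", "security", "service")

# Matches a response that consists of one fenced block, e.g. ```json ... ```
_FENCE_RE = re.compile(r"^```[a-zA-Z0-9_-]*\s*\n(.*?)\n?```\s*$", re.DOTALL)


def parse_llm_json(raw: str) -> dict:
    """Strip an optional Markdown code fence, then parse a JSON object."""
    text = raw.strip()
    match = _FENCE_RE.match(text)
    if match:
        text = match.group(1)
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


def validate_dossier(data: dict, allowed_industries: list[str]) -> dict:
    """Normalise an 8.2-style response: known industry, integer scores in 0..100."""
    industry = data.get("industry")
    if industry not in allowed_industries:
        # Assumption: "Sonstige" is the fallback bucket named in the prompt.
        industry = "Sonstige"

    raw_potentials = data.get("potentials")
    if not isinstance(raw_potentials, dict):
        raw_potentials = {}

    potentials = {}
    for category in CATEGORIES:
        entry = raw_potentials.get(category)
        if not isinstance(entry, dict):
            entry = {}
        try:
            score = int(entry.get("score", 0))
        except (TypeError, ValueError):
            score = 0  # Missing or non-numeric scores are treated as 0.
        potentials[category] = {
            "score": max(0, min(100, score)),  # Clamp into the 0..100 range.
            "reason": str(entry.get("reason") or ""),
        }

    return {
        "industry": industry,
        "business_model": data.get("business_model"),
        "infrastructure_evidence": data.get("infrastructure_evidence"),
        "potentials": potentials,
    }
```

A call such as `validate_dossier(parse_llm_json(response_text), allowed_industries)` would then yield a dictionary with all four categories present, which may simplify downstream storage. Clamping silently is a design choice. Rejecting the response and retrying the prompt is an equally reasonable alternative.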