Fix GTM Architect: Robust prompt syntax, Gemini API migration, Docker logging

2026-01-01 22:39:43 +00:00
parent 05753edbb1
commit 1fe17d88bc
6 changed files with 598 additions and 1651 deletions


@@ -233,3 +233,21 @@ Careful with routing: if the app is supposed to run under `/app/`, the trailing
- [ ] Does `docker-compose.yml` also mount `helpers.py` and `config.py`?
- [ ] Empty `.db` file created on the host?
- [ ] Does the Dockerfile use a multi-stage build?
---
## Appendix A: GTM Architect Fixes & Gemini Migration (Jan 2026)
### A.1 Problem Statement
- **SyntaxError on large prompts:** The Python 3.11 parser had massive problems with f-strings that ran to 100+ lines and contained special characters.
- **Library deprecation:** Has `google.generativeai` been discontinued? No, but the error message in the log pointed to a conflict between the old `openai` wrappers and the new Gemini packages.
- **Solution:**
1. **Prompts externalized:** System prompts now live in `gtm_prompts.json` and are loaded at runtime. No more code parsing required.
2. **Native Gemini lib:** Instead of the OpenAI wrapper, we now use `google.generativeai` directly via `helpers.call_gemini_flash`.
3. **Config:** `gtm-architect/Dockerfile` now explicitly copies `gtm_prompts.json`.
### A.2 New Standard for AI Apps
For future apps:
1. **Prompts in JSON/text files:** Never hardcode huge strings in Python code.
2. **Use `helpers.call_gemini_flash`:** This function is now the gold standard for simple, stateless calls.
3. **JSON in the Dockerfile:** Don't forget to `COPY` the external prompt files into the container!
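The runtime-loading pattern from point 1 can be sketched as follows. This is a minimal sketch, not the project's actual loader: the key layout of `gtm_prompts.json` (one list of lines per language code) is an assumption made for illustration.

```python
import json

def load_system_instruction(path: str, lang: str) -> str:
    """Load the prompt lines for `lang` from a JSON file and join them."""
    with open(path, encoding="utf-8") as f:
        prompts = json.load(f)
    # Fall back to German, the project default, if the language is missing.
    lines = prompts.get(lang, prompts["de"])
    return "\n".join(lines)

# Demo: write a tiny prompt file and load it back.
sample = {
    "de": ["# IDENTITY", "Antworte IMMER auf DEUTSCH."],
    "en": ["# IDENTITY", "ALWAYS respond in ENGLISH."],
}
with open("gtm_prompts_demo.json", "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False)

print(load_system_instruction("gtm_prompts_demo.json", "en"))
```

Because the prompt text is data rather than source code, no amount of embedded quotes or multi-line content can ever break the Python parser again.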


@@ -51,3 +51,17 @@ The application will be available at `http://localhost:8080`.
* **API:** The Flask application in `app.py` provides an API to interact with the system.
* **Logging:** The project uses the `logging` module to log information and errors.
* **Error Handling:** The `readme.md` indicates a critical error related to the `openai` library. The next step is to downgrade the library to a compatible version.
## Current Status (Jan 2026) - GTM Architect & Core Updates
* **GTM Architect Fixed & Stabilized:** The "GTM Architect" service is now fully operational.
* **Prompt Engineering:** Switched from fragile f-strings to robust `"\n".join([...])` list construction for all system prompts. This eliminates syntax errors caused by mixed quotes and multi-line strings in Python.
* **AI Backend:** Migrated to `google.generativeai` via `helpers.call_gemini_flash` with JSON mode support.
* **Scraping:** Implemented robust URL scraping using `requests` and `BeautifulSoup` (with `html.parser` to avoid dependency issues).
* **Docker:** Container (`gtm-app`) updated with correct volume mounts for logs and config.
* **Gemini Migration:** The codebase is moving towards `google-generativeai` as the primary driver. `helpers.py` now supports this natively.
* **Deployment:** To apply these fixes, a rebuild of the `gtm-app` container is required.
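The `"\n".join([...])` prompt construction mentioned above can be illustrated with a minimal standalone sketch (the prompt text is shortened for brevity):

```python
# Build a multi-line system prompt as a list of single-line strings.
# Each line uses whichever quote style its content does NOT contain,
# so embedded quotes never need escaping and the file always parses.
system_prompt = "\n".join([
    "# IDENTITY & PURPOSE",
    'You are the "GTM Architect Engine" for Roboplanet.',
    "Your top goal is product-market fit and operational feasibility.",
])
print(system_prompt)
```

Unlike one giant f-string, each list element is an independent literal, so a stray quote or brace breaks only a single line instead of the whole block.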
## Next Steps
* **Monitor Logs:** Check `Log_from_docker/` for detailed execution traces of the GTM Architect.
* **Feedback Loop:** Verify the quality of the generated GTM strategies and adjust prompts in `gtm_architect_orchestrator.py` if necessary.


@@ -104,6 +104,9 @@ services:
      - ./gtm_projects.db:/app/gtm_projects.db
      # Mount the API key to the location expected by config.py
      - ./gemini_api_key.txt:/app/api_key.txt
      # Mount Logs
      - ./Log_from_docker:/app/Log_from_docker
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - SERPAPI_API_KEY=${SERPAPI_API_KEY}
      - PYTHONUNBUFFERED=1


@@ -33,6 +33,9 @@ RUN apt-get update && \
COPY gtm-architect/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Install the Google Generative AI library
RUN pip install --no-cache-dir google-generativeai
# Copy the Node.js server script and package.json for runtime deps
COPY gtm-architect/server.cjs .
COPY gtm-architect/package.json .
@@ -45,6 +48,7 @@ COPY --from=frontend-builder /app/dist ./dist
# Copy the Python Orchestrator and shared modules
COPY gtm_architect_orchestrator.py .
COPY gtm_prompts.json .
COPY helpers.py .
COPY config.py .
COPY market_db_manager.py .
@@ -56,4 +60,4 @@ COPY market_db_manager.py .
EXPOSE 3005
# Start the server
CMD ["node", "server.cjs"]


@@ -1,259 +1,431 @@
import argparse
import json
import logging
import re
import sys
import os
import requests
from bs4 import BeautifulSoup
from datetime import datetime
from config import Config

# Append the current directory to sys.path
sys.path.append(os.path.dirname(os.path.abspath(__file__)))

from helpers import call_gemini_flash

# Configure logging to file
LOG_DIR = "Log_from_docker"
if not os.path.exists(LOG_DIR):
    os.makedirs(LOG_DIR)

timestamp = datetime.now().strftime("%Y-%m-%d")
log_file = os.path.join(LOG_DIR, f"{timestamp}_gtm_architect.log")

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler(log_file, mode='a', encoding='utf-8'),
        logging.StreamHandler(sys.stderr)
    ]
)

def log_to_stderr(msg):
    sys.stderr.write(f"[GTM-ORCHESTRATOR] {msg}\n")
    sys.stderr.flush()

# --- SCRAPING HELPER ---
def get_text_from_url(url):
    try:
        log_to_stderr(f"Scraping URL: {url}")
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}
        response = requests.get(url, headers=headers, timeout=15)
        response.raise_for_status()
        # Using html.parser to avoid extra dependencies
        soup = BeautifulSoup(response.content, 'html.parser')
        # Remove noise
        for element in soup(['script', 'style', 'noscript', 'iframe', 'svg', 'header', 'footer', 'nav', 'aside']):
            element.decompose()
        # Get text
        text = soup.get_text(separator=' ', strip=True)
        log_to_stderr(f"Scraping success. Length: {len(text)}")
        return text[:30000]  # Limit length
    except Exception as e:
        log_to_stderr(f"Scraping failed: {e}")
        logging.warning(f"Could not scrape URL {url}: {e}")
        return ""

# --- SYSTEM PROMPTS (Constructed reliably) ---
def get_system_instruction(lang):
    if lang == 'de':
        return "\n".join([
            "# IDENTITY & PURPOSE",
            'Du bist die "GTM Architect Engine" für Roboplanet. Deine Aufgabe ist es, für neue technische Produkte (Roboter) eine präzise Go-to-Market-Strategie zu entwickeln.',
            "Du handelst nicht als kreativer Werbetexter, sondern als strategischer Analyst. Dein oberstes Ziel ist Product-Market-Fit und operative Umsetzbarkeit.",
            "Antworte IMMER auf DEUTSCH.",
            "",
            "# CONTEXT: THE PARENT COMPANY (WACKLER)",
            "Wir sind Teil der Wackler Group, einem großen Facility-Management-Dienstleister.",
            'Unsere Strategie ist NICHT "Roboter ersetzen Menschen", sondern "Hybrid-Reinigung":',
            "- 80% der Arbeit (monotone Flächenleistung) = Roboter.",
            "- 20% der Arbeit (Edge Cases, Winterdienst, Treppen, Grobschmutz) = Manuelle Reinigung durch Wackler.",
            "",
            "# STRICT ANALYSIS RULES (MUST FOLLOW):",
            "1. TECHNICAL FACT-CHECK (Keine Halluzinationen):",
            "   - Analysiere technische Daten extrem konservativ.",
            '   - Vakuumsystem = Kein "Winterdienst" (Schnee) und keine "Schwerindustrie" (Metallspäne), außer explizit genannt.',
            "   - Erfinde keine Features, nur um eine Zielgruppe passend zu machen.",
            "",
            "2. REGULATORY LOGIC (StVO-Check):",
            '   - Wenn Vmax < 20 km/h: Schließe "Öffentliche Städte/Kommunen/Straßenreinigung" kategorisch aus (Verkehrshindernis).',
            '   - Fokusänderung: Konzentriere dich stattdessen ausschließlich auf "Große, zusammenhängende Privatflächen" (Gated Areas).',
            "",
            "3. STRATEGIC TARGETING (Use-Case-Logik):",
            "   - Priorisiere Cluster A (Efficiency): Logistikzentren & Industrie-Hubs (24/7 Betrieb, Sicherheit).",
            "   - Priorisiere Cluster B (Experience): Shopping Center, Outlets & Freizeitparks (Sauberkeit als Visitenkarte).",
            "   - Entferne reine E-Commerce-Händler ohne physische Kundenfläche.",
            "",
            '4. THE "HYBRID SERVICE" LOGIC (RULE 5):',
            'Wann immer du ein "Hartes Constraint" oder eine technische Limitierung identifizierst (z.B. "Kein Winterdienst" oder "Kommt nicht in Ecken"), darfst du dies niemals als reines "Nein" stehen lassen.',
            'Wende stattdessen die **"Yes, and..." Logik** an:',
            '  1. **Identifiziere die Lücke:** (z.B. "Roboter kann bei Schnee nicht fahren").',
            '  2. **Fülle die Lücke mit Service:** Schlage explizit vor, diesen Teil durch "Wackler Human Manpower" abzudecken.',
            '  3. **Formuliere den USP:** Positioniere das Gesamtpaket als "100% Coverage" (Roboter + Mensch aus einer Hand).'
        ])
    else:
        return "\n".join([
            "# IDENTITY & PURPOSE",
            'You are the "GTM Architect Engine" for Roboplanet. Your task is to develop a precise Go-to-Market strategy for new technical products (robots).',
            "You do not act as a creative copywriter, but as a strategic analyst. Your top goal is product-market fit and operational feasibility.",
            "ALWAYS respond in ENGLISH.",
            "",
            "# CONTEXT: THE PARENT COMPANY (WACKLER)",
            "We are part of the Wackler Group, a major facility management service provider.",
            'Our strategy is NOT "Robots replace humans", but "Hybrid Cleaning":',
            "- 80% of work (monotonous area coverage) = Robots.",
            "- 20% of work (Edge cases, winter service, stairs, heavy debris) = Manual cleaning by Wackler.",
            "",
            "# STRICT ANALYSIS RULES (MUST FOLLOW):",
            "1. TECHNICAL FACT-CHECK (No Hallucinations):",
            "   - Analyze technical data extremely conservatively.",
            '   - Vacuum System = No "Winter Service" (snow) and no "Heavy Industry" (metal shavings), unless explicitly stated.',
            "   - Do not invent features just to fit a target audience.",
            "",
            "2. REGULATORY LOGIC (Traffic Regs):",
            '   - If Vmax < 20 km/h: Categorically exclude "Public Cities/Streets" (traffic obstruction).',
            '   - Change Focus: Concentrate exclusively on "Large, contiguous private areas" (Gated Areas).',
            "",
            "3. STRATEGIC TARGETING (Use Case Logic):",
            "   - Prioritize Cluster A (Efficiency): Logistics Centers & Industrial Hubs (24/7 ops, safety).",
            "   - Prioritize Cluster B (Experience): Shopping Centers, Outlets & Theme Parks (Cleanliness as a calling card).",
            "   - Remove pure E-commerce retailers without physical customer areas.",
            "",
            '4. THE "HYBRID SERVICE" LOGIC (RULE 5):',
            'Whenever you identify a "Hard Constraint" or technical limitation (e.g., "No winter service" or "Cannot reach corners"), never let this stand as a simple "No".',
            'Instead, apply the **"Yes, and..." logic**:',
            '  1. **Identify the gap:** (e.g., "Robot cannot operate in snow").',
            '  2. **Fill the gap with service:** Explicitly suggest covering this part with "Wackler Human Manpower".',
            '  3. **Formulate the USP:** Position the total package as "100% Coverage" (Robot + Human from a single source).'
        ])

# --- ORCHESTRATOR LOGIC ---

def analyze_product(product_input, lang):
    # 1. Scraping if the input is a URL
    content = product_input
    if re.match(r'^https?://', product_input.strip()):
        logging.info(f"Detected URL: {product_input}. Scraping...")
        scraped_text = get_text_from_url(product_input.strip())
        if scraped_text:
            content = scraped_text
            logging.info(f"Scraped {len(content)} chars.")
        else:
            logging.warning("Scraping failed, using URL as input.")

    sys_instr = get_system_instruction(lang)

    # 1-A. Extraction
    prompt_extract = "\n".join([
        "PHASE 1-A: TECHNICAL EXTRACTION",
        f'Input Product Description: "{content[:25000]}"',
        "",
        "Task:",
        "1. Extract key technical features (specs, capabilities).",
        '2. Derive "Hard Constraints". IMPORTANT: Check Vmax (<20km/h = Private Grounds) and Cleaning Type (Vacuum != Heavy Debris/Snow).',
        "3. Create a short raw analysis summary.",
        "",
        "Output JSON format ONLY:",
        "{",
        '  "features": ["feature1", "feature2"],',
        '  "constraints": ["constraint1", "constraint2"],',
        '  "rawAnalysis": "summary text"',
        "}"
    ])
    log_to_stderr("Starting Phase 1-A: Technical Extraction...")
    raw_response = call_gemini_flash(prompt_extract, system_instruction=sys_instr, json_mode=True)
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        logging.error(f"Failed to parse Phase 1 JSON: {raw_response}")
        return {"features": [], "constraints": [], "rawAnalysis": "Error parsing AI response."}

    # 1-B. Conflict Check
    prompt_conflict = "\n".join([
        "PHASE 1-B: PORTFOLIO CONFLICT CHECK",
        "",
        f"New Product Features: {json.dumps(data.get('features'))}",
        f"New Product Constraints: {json.dumps(data.get('constraints'))}",
        "",
        "Existing Portfolio:",
        '1. "Indoor Scrubber 50": Indoor cleaning, hard floor, supermarkets.',
        '2. "Service Bot Bella": Service/Gastro, indoor, restaurants.',
        "",
        "Task:",
        "Check if the new product overlaps significantly with existing ones (is it just a clone?).",
        "",
        "Output JSON format ONLY:",
        "{",
        '  "conflictCheck": {',
        '    "hasConflict": true/false,',
        '    "details": "explanation",',
        '    "relatedProduct": "name or null"',
        "  }",
        "}"
    ])

    log_to_stderr("Starting Phase 1-B: Conflict Check...")
    conflict_response = call_gemini_flash(prompt_conflict, system_instruction=sys_instr, json_mode=True)
    try:
        conflict_data = json.loads(conflict_response)
        data.update(conflict_data)
    except json.JSONDecodeError:
        pass  # Ignore conflict check errors
    return data

def discover_icps(phase1_result, lang):
    sys_instr = get_system_instruction(lang)
    prompt = "\n".join([
        "PHASE 2: ICP DISCOVERY & DATA PROXIES",
        f"Based on the product features: {json.dumps(phase1_result.get('features'))}",
        f"And constraints: {json.dumps(phase1_result.get('constraints'))}",
        "",
        "Task:",
        "1. Negative Selection: Which industries are impossible? (Remember Vmax & Vacuum rules!)",
        "2. High Pain: Identify Cluster A (Logistics/Industry) and Cluster B (Shopping/Outlets).",
        "3. Data Proxy Generation: How to find them digitally via data traces (e.g. satellite, registries).",
        "",
        "Output JSON format ONLY:",
        "{",
        '  "icps": [',
        '    { "name": "Industry Name", "rationale": "Why this is a good fit" }',
        "  ],",
        '  "dataProxies": [',
        '    { "target": "Specific criteria", "method": "How to find" }',
        "  ]",
        "}"
    ])
    log_to_stderr("Starting Phase 2: ICP Discovery...")
    response = call_gemini_flash(prompt, system_instruction=sys_instr, json_mode=True)
    return json.loads(response)

def hunt_whales(phase2_result, lang):
    sys_instr = get_system_instruction(lang)
    prompt = "\n".join([
        "PHASE 3: WHALE HUNTING",
        f"Target ICPs (Industries): {json.dumps(phase2_result.get('icps'))}",
        "",
        "Task:",
        "1. Group 'Whales' (Key Accounts) strictly by the identified ICP industries.",
        "2. Identify 3-5 concrete top companies in the DACH market per industry.",
        "3. Define Buying Center Roles.",
        "",
        "Output JSON format ONLY:",
        "{",
        '  "whales": [',
        '    { "industry": "Name of ICP Industry", "accounts": ["Company A", "Company B"] }',
        "  ],",
        '  "roles": ["Job Title 1", "Job Title 2"]',
        "}"
    ])
    log_to_stderr("Starting Phase 3: Whale Hunting...")
    response = call_gemini_flash(prompt, system_instruction=sys_instr, json_mode=True)
    return json.loads(response)

def develop_strategy(phase3_result, phase1_result, lang):
    sys_instr = get_system_instruction(lang)

    all_accounts = []
    for w in phase3_result.get('whales', []):
        all_accounts.extend(w.get('accounts', []))

    prompt = "\n".join([
        "PHASE 4: STRATEGY & ANGLE DEVELOPMENT",
        f"Accounts: {json.dumps(all_accounts)}",
        f"Product Features: {json.dumps(phase1_result.get('features'))}",
        "",
        "Task:",
        "1. Develop a specific 'Angle' per target/industry.",
        "2. Consistency Check against the Product Matrix.",
        '3. **IMPORTANT:** Apply the "Hybrid Service Logic" if technical constraints exist!',
        "",
        "Output JSON format ONLY:",
        "{",
        '  "strategyMatrix": [',
        "    {",
        '      "segment": "Target Segment",',
        '      "painPoint": "Specific Pain",',
        '      "angle": "Our Marketing Angle",',
        '      "differentiation": "How it differs"',
        "    }",
        "  ]",
        "}"
    ])
    log_to_stderr("Starting Phase 4: Strategy...")
    response = call_gemini_flash(prompt, system_instruction=sys_instr, json_mode=True)
    return json.loads(response)

def generate_assets(phase4_result, phase3_result, phase2_result, phase1_result, lang):
    sys_instr = get_system_instruction(lang)
    prompt = "\n".join([
        "PHASE 5: ASSET GENERATION & FINAL REPORT",
        "",
        "CONTEXT DATA:",
        f"- Technical: {json.dumps(phase1_result)}",
        f"- ICPs: {json.dumps(phase2_result)}",
        f"- Targets (Whales): {json.dumps(phase3_result)}",
        f"- Strategy: {json.dumps(phase4_result)}",
        "",
        "TASK:",
        '1. Create a "GTM STRATEGY REPORT" in Markdown.',
        "2. Report Structure: Executive Summary, Product Analysis, Target Audience, Target Accounts, Strategy Matrix, Assets.",
        '3. Hybrid-Check: Ensure the "Hybrid Service Logic" is visible.',
        "",
        "Output:",
        'Return strictly MARKDOWN formatted text. Start with "# GTM STRATEGY REPORT".'
    ])
    # For Phase 5 we expect TEXT (Markdown), not JSON, so json_mode=False.
    log_to_stderr("Starting Phase 5: Asset Generation...")
    response = call_gemini_flash(prompt, system_instruction=sys_instr, json_mode=False)
    # The frontend expects a plain string here, not a wrapping JSON object.
    return response

def generate_sales_enablement(phase4_result, phase3_result, phase1_result, lang):
    sys_instr = get_system_instruction(lang)
    prompt = "\n".join([
        "PHASE 6: SALES ENABLEMENT & VISUALS",
        "",
        "CONTEXT:",
        f"- Product Features: {json.dumps(phase1_result.get('features'))}",
        f"- Accounts (Personas): {json.dumps(phase3_result.get('roles'))}",
        f"- Strategy: {json.dumps(phase4_result.get('strategyMatrix'))}",
        "",
        "TASK:",
        "1. Anticipate Friction & Objections.",
        "2. Formulate Battlecards.",
        "3. Create Visual Prompts.",
        "",
        "Output JSON format ONLY:",
        "{",
        '  "battlecards": [',
        "    {",
        '      "persona": "Role",',
        '      "objection": "Objection quote",',
        '      "responseScript": "Response"',
        "    }",
        "  ],",
        '  "visualPrompts": [',
        "    {",
        '      "title": "Title",',
        '      "context": "Context",',
        '      "prompt": "Prompt Code"',
        "    }",
        "  ]",
        "}"
    ])
    log_to_stderr("Starting Phase 6: Sales Enablement...")
    response = call_gemini_flash(prompt, system_instruction=sys_instr, json_mode=True)
    return json.loads(response)

# --- MAIN ---
def main():
    log_to_stderr("--- GTM Orchestrator Starting ---")

    # --- CRITICAL FIXES FOR API KEY & SCRAPING ---
    # 1. Load API keys manually because helpers.py relies on Config class state
    try:
        Config.load_api_keys()
        log_to_stderr("API Keys loaded.")
        logging.info("Config.load_api_keys() called successfully.")
    except Exception as e:
        log_to_stderr(f"CRITICAL: Failed to load API keys: {e}")
        logging.critical(f"Failed to load API keys: {e}")
    # ---------------------------------------------

    parser = argparse.ArgumentParser()
    parser.add_argument('--mode', required=True)
    parser.add_argument('--data', required=True)

    try:
        args = parser.parse_args()
        data_in = json.loads(args.data)
        mode = args.mode
        lang = data_in.get('language', 'de')

        log_to_stderr(f"Processing mode: {mode} in language: {lang}")
        logging.info(f"Processing mode: {mode} in language: {lang}")

        result = {}
        if mode == 'analyze_product':
            product_input = data_in.get('productInput')
            result = analyze_product(product_input, lang)
        elif mode == 'discover_icps':
            phase1_result = data_in.get('phase1Result')
            result = discover_icps(phase1_result, lang)
        elif mode == 'hunt_whales':
            phase2_result = data_in.get('phase2Result')
            result = hunt_whales(phase2_result, lang)
        elif mode == 'develop_strategy':
            phase3_result = data_in.get('phase3Result')
            phase1_result = data_in.get('phase1Result')
            result = develop_strategy(phase3_result, phase1_result, lang)
        elif mode == 'generate_assets':
            phase4_result = data_in.get('phase4Result')
            phase3_result = data_in.get('phase3Result')
            phase2_result = data_in.get('phase2Result')
            phase1_result = data_in.get('phase1Result')
            # Returns a string (Markdown)
            markdown_report = generate_assets(phase4_result, phase3_result, phase2_result, phase1_result, lang)
            print(json.dumps(markdown_report))
            log_to_stderr("Finished Phase 5. Output sent to stdout.")
            return
        elif mode == 'generate_sales_enablement':
            phase4_result = data_in.get('phase4Result')
            phase3_result = data_in.get('phase3Result')
            phase1_result = data_in.get('phase1Result')
            result = generate_sales_enablement(phase4_result, phase3_result, phase1_result, lang)
        else:
            logging.error(f"Unknown mode: {mode}")
            result = {"error": f"Unknown mode: {mode}"}

        print(json.dumps(result))
        log_to_stderr("Finished. Output sent to stdout.")
    except Exception as e:
        log_to_stderr(f"CRITICAL ERROR: {e}")
        logging.error(f"Error in orchestrator: {e}", exc_info=True)
        # Return error as JSON so server.cjs can handle it gracefully
        print(json.dumps({"error": str(e)}))
        sys.exit(1)

if __name__ == "__main__":
    main()
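Even with JSON mode enabled, model output occasionally arrives wrapped in Markdown code fences, in which case the `json.loads` calls above raise `JSONDecodeError`. A small defensive parser can strip such fences before parsing; this is a sketch of the general technique, not part of the current codebase:

```python
import json
import re

def parse_model_json(raw: str):
    """Parse a model response as JSON, tolerating ```json ... ``` fences."""
    text = raw.strip()
    # Strip an enclosing ```json / ``` fence pair if present.
    fence = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if fence:
        text = fence.group(1)
    return json.loads(text)

print(parse_model_json('```json\n{"features": ["autonomous"]}\n```'))
```

Dropping such a helper in place of the bare `json.loads` calls would make each phase tolerant of the most common formatting slip without changing the happy path.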

helpers.py (1572)
File diff suppressed because one or more lines are too long