feat(ca): Finalize v5 pipeline - Hybrid Matrix, CoT Enrichment & User Repair Mode

2026-01-12 15:29:43 +00:00
parent b5e6c415c7
commit 70e689384e
15 changed files with 547 additions and 1224 deletions
--- a/MIGRATION_REPORT_COMPETITOR_ANALYSIS.md
+++ b/MIGRATION_REPORT_COMPETITOR_ANALYSIS.md
@@ -81,6 +81,59 @@ Um die Analyse-Ergebnisse optimal nutzbar zu machen, wurde ein intelligenter Imp
        - *Cleaning (Indoor/Outdoor), Transport/Logistics, Service/Gastro, Security, Software*.
 *   **Dual-Way Relations:** Alle Datenbanken sind bidirektional verknüpft. Auf einer Produktkarte sieht man sofort den Hersteller; auf einer Herstellerkarte sieht man das gesamte (kategorisierte) Portfolio.
---
+### 🚀 Next-Gen Analyse-Strategie (v5) - Beschluss vom 12. Jan. 2026
-*Dokumentation aktualisiert am 11.01.2026 nach Implementierung der semantischen Klassifizierung (Level 4).*
+
 Die Analyse der v4-Ergebnisse zeigte fundamentale Schwächen im monolithischen Analyse-Ansatz. Probleme wie die falsche Gruppierung von Produktlisten (z.B. "A, B, C" als ein Produkt) und die fehlende Differenzierung desselben Produkts über verschiedene Anbieter hinweg (z.B. "BellaBot" vs. "Pudu Bella Bot") machten eine strategische Neuausrichtung notwendig.
 Der neue Prozess basiert auf einer **"Extract-Normalize-Enrich" Pipeline in drei Phasen:**
 **Phase 1: Datensammlung (pro Wettbewerber)**
 1.  **Gezieltes Scraping:** Fokussierte Suche nach Seiten mit Keywords wie "Produkte", "Portfolio", "Lösungen". Der Rohtext dieser Seiten wird extrahiert.
 2.  **Rohe Produkt-Extraktion (1. LLM-Call):** Ein einfacher, isolierter Prompt extrahiert nur eine unstrukturierte Liste potenzieller Produktnamen.
 3.  **Deterministische Säuberung (Python):** Die rohe Liste wird per Code aufbereitet (Trennung bei Kommas, "und", etc.), um einzelne Produkte zu isolieren.
 4.  **Allgemeine Analyse (2. LLM-Call):** Ein paralleler Prompt analysiert den Rohtext auf allgemeine, nicht-produktbezogene Merkmale (`delivery_model`, `target_industries`, `differentiators`).
 **Phase 2: Marktweite Produkt-Kanonisierung (Global & Einmalig)**
 5.  **Master-Liste erstellen:** Die bereinigten Produktlisten aller Wettbewerber werden zu einer globalen Master-Liste zusammengefasst.
 6.  **Grounded Truth (Hersteller-Daten):** Eine vordefinierte, kanonische Produktliste der wichtigsten Hersteller (Pudu, Gausium etc.) wird als Referenz geladen.
 7.  **Kanonisierungs-Prompt (3. LLM-Call):** Ein einziger, globaler Call gleicht die Master-Liste mit der "Grounded Truth" der Hersteller ab. Er gruppiert alle Namensvarianten ("Bella Bot", "Pudu BellaBot") unter einem kanonischen Namen ("BellaBot").
 **Phase 3: Gezielte Anreicherung & Assemblierung (pro Wettbewerber)**
 8.  **Portfolio zusammenstellen:** Das System iteriert durch die kanonische Produkt-Map aus Phase 2.
 9.  **Anreicherungs-Prompt (Serie von Micro-LLM-Calls):** Für jedes kanonische Produkt, das ein Wettbewerber anbietet, wird ein kleiner, gezielter Prompt abgesetzt, um spezifische Details (`purpose`, `category`) aus dem ursprünglichen Rohtext zu extrahieren.
 10. **Finale Assemblierung:** Die Ergebnisse der allgemeinen Analyse (Schritt 4) und des angereicherten Portfolios (Schritt 9) werden zum finalen, sauberen und normalisierten JSON-Objekt für den Wettbewerber zusammengefügt.
 Dieser Ansatz trennt klar zwischen Datensammlung, marktweiter Normalisierung und spezifischer Anreicherung, was zu deutlich präziseren, konsistenteren und wertvolleren Analyse-Ergebnissen führt.
 ### ✅ Status: Jan 12, 2026 - v5 STABLE & PRODUCTION READY
 Die Implementierung der v5-Pipeline ist abgeschlossen. Die initialen Kinderkrankheiten (SDK-Konflikte, Token-Limits bei Matrizen) wurden durch radikale Architektur-Entscheidungen behoben.
 **Die 3 Säulen der finalen Lösung:**
 1.  **Hybrid Intelligence (The "Matrix Fix"):**
    *   **Problem:** LLMs scheiterten daran, große Matrizen (Produkte x Wettbewerber) konsistent als JSON zu generieren (Abbruch wegen Token-Limit).
    *   **Lösung:** **Python übernimmt die Struktur, KI den Inhalt.** Die Matrizen (`product_matrix`, `industry_matrix`) werden nun **deterministisch im Python-Code berechnet**, basierend auf den in Phase 3 gesammelten Daten. Die KI wird nur noch für die *textliche* Zusammenfassung (`summary`, `opportunities`) gerufen. Das Ergebnis ist 100% fehlerfrei und stabil.
 2.  **User-In-The-Loop (Data Rescue):**
    *   **Feature:** In Schritt 4 wurde ein "Repair"-Modus implementiert. Nutzer können nun für jeden Wettbewerber **manuelle Produkt-URLs** nachreichen und eine **isolierte Neu-Analyse** (Re-Scrape & Re-Enrich) für diesen einen Wettbewerber auslösen, ohne den gesamten Prozess neu zu starten. Dies löst das Problem von "versteckten" Produktseiten (z.B. Giobotics).
 3.  **Quality Boost (CoT):**
    *   **Feature:** Der Enrichment-Prompt (Phase 3) nutzt nun **Chain-of-Thought (CoT)**. Anstatt stumpf Daten zu extrahieren, wird die KI angewiesen: *"Suche alle Erwähnungen -> Synthetisiere eine Beschreibung -> Bestimme die Kategorie"*. Dies hat die inhaltliche Qualität der Produktbeschreibungen wieder auf das hohe Niveau von v2 gehoben, bei gleichzeitiger Beibehaltung der sauberen Struktur von v3.
 **Fazit:** Der Competitor Analysis Agent ist nun ein robustes, professionelles BI-Tool, das bereit für den produktiven Einsatz und den Import nach Notion ist.
 ### 🏆 Final Validation (Jan 12, 2026 - 14:00)
 Der Validierungslauf mit `analysis_robo-planet.de-4.json` bestätigt den Erfolg aller Maßnahmen:
 *   **Lückenlose Erfassung:** Dank des "Repair Mode" (Manuelle URLs) konnte das Portfolio von *Giobotics*, das zuvor leer war, vollständig mit 9 Produkten (PuduBot 2, KettyBot, etc.) erfasst werden.
 *   **Hohe Datentiefe:** Die Produktbeschreibungen sind dank **Chain-of-Thought** detailliert und wertstiftend (z.B. genaue Funktionsweise des *Phantas* Roboters bei TCO Robotics), statt nur generische Einzeiler zu sein.
 *   **Perfekte Matrizen:** Die Python-generierten Matrizen sind vollständig und fehlerfrei. Das Token-Limit-Problem gehört der Vergangenheit an.
 *   **Stabilität:** Der gesamte Prozess lief ohne Abstürze oder API-Fehler durch.
 **Status:** ✅ **MIGRATION COMPLETE & VERIFIED.**
 ---
 *Dokumentation finalisiert am 12.01.2026.*
--- a/check_syntax.py
+++ b/check_syntax.py
@@ -0,0 +1,12 @@
 import py_compile
 import sys
 try:
    py_compile.compile('/app/competitor-analysis-app/competitor_analysis_orchestrator.py', doraise=True)
    print("Syntax OK")
 except py_compile.PyCompileError as e:
    print(f"Syntax Error: {e}")
    sys.exit(1)
 except Exception as e:
    print(f"General Error: {e}")
    sys.exit(1)
--- a/commit.sh
+++ b/commit.sh
@@ -0,0 +1,5 @@
 #!/bin/bash
 git status
 git add .
 git commit -m "fix(competitor-analysis): final migration fixes and documentation updates"
 git push origin main
--- a/competitor-analysis-app/App.tsx
+++ b/competitor-analysis-app/App.tsx
@@ -1,6 +1,6 @@
 import React, { useState, useCallback, useEffect, useRef } from 'react';
 import type { AppState, CompetitorCandidate, Product, TargetIndustry, Keyword, SilverBullet, Battlecard, ReferenceAnalysis } from './types';
-import { fetchStep1Data, fetchStep2Data, fetchStep3Data, fetchStep4Data, fetchStep5Data_SilverBullets, fetchStep6Data_Conclusion, fetchStep7Data_Battlecards, fetchStep8Data_ReferenceAnalysis } from './services/geminiService';
+import { fetchStep1Data, fetchStep2Data, fetchStep3Data, fetchStep4Data, fetchStep5Data_SilverBullets, fetchStep6Data_Conclusion, fetchStep7Data_Battlecards, fetchStep8Data_ReferenceAnalysis, reanalyzeCompetitor } from './services/geminiService';
 import { generatePdfReport } from './services/pdfService';
 import InputForm from './components/InputForm';
 import StepIndicator from './components/StepIndicator';
@@ -91,6 +91,15 @@ const App: React.FC = () => {
        }
    }, []);
    const handleUpdateAnalysis = useCallback((index: number, updatedAnalysis: any) => {
        setAppState(prevState => {
            if (!prevState) return null;
            const newAnalyses = [...prevState.analyses];
            newAnalyses[index] = updatedAnalysis;
            return { ...prevState, analyses: newAnalyses };
        });
    }, []);
    const handleConfirmStep = useCallback(async () => {
        if (!appState) return;
@@ -112,7 +121,11 @@ const App: React.FC = () => {
                case 3:
                    const shortlist = [...appState.competitor_candidates]
                        .sort((a, b) => b.confidence - a.confidence)
-                        .slice(0, appState.initial_params.max_competitors);
+                        .slice(0, appState.initial_params.max_competitors)
                        .map(c => ({
                            ...c,
                            manual_urls: c.manual_urls ? c.manual_urls.split('\n').map(u => u.trim()).filter(u => u) : []
                        }));
                    const { analyses } = await fetchStep4Data(appState.company, shortlist, lang);
                    newState = { competitors_shortlist: shortlist, analyses, step: 4 };
                    break;
@@ -158,7 +171,7 @@ const App: React.FC = () => {
            case 1: return <Step1Extraction products={appState.products} industries={appState.target_industries} onProductsChange={(p) => handleUpdateState('products', p)} onIndustriesChange={(i) => handleUpdateState('target_industries', i)} t={t.step1} lang={appState.initial_params.language} />;
            case 2: return <Step2Keywords keywords={appState.keywords} onKeywordsChange={(k) => handleUpdateState('keywords', k)} t={t.step2} />;
            case 3: return <Step3Competitors candidates={appState.competitor_candidates} onCandidatesChange={(c) => handleUpdateState('competitor_candidates', c)} maxCompetitors={appState.initial_params.max_competitors} t={t.step3} />;
-            case 4: return <Step4Analysis analyses={appState.analyses} t={t.step4} />;
+            case 4: return <Step4Analysis analyses={appState.analyses} company={appState.company} onAnalysisUpdate={handleUpdateAnalysis} t={t.step4} />;
            case 5: return <Step5SilverBullets silver_bullets={appState.silver_bullets} t={t.step5} />;
            case 6: return <Step6Conclusion appState={appState} t={t.step6} />;
            case 7: return <Step7_Battlecards appState={appState} t={t.step7} />;
--- a/competitor-analysis-app/competitor_analysis_orchestrator.py
+++ b/competitor-analysis-app/competitor_analysis_orchestrator.py
--- a/competitor-analysis-app/components/Step3_Competitors.tsx
+++ b/competitor-analysis-app/components/Step3_Competitors.tsx
@@ -27,9 +27,10 @@ const Step3Competitors: React.FC<Step3CompetitorsProps> = ({ candidates, onCandi
                fieldConfigs={[
                    { key: 'name', label: t.nameLabel, type: 'text' },
                    { key: 'url', label: 'URL', type: 'text' },
                    { key: 'manual_urls', label: t.manualUrlsLabel, type: 'textarea' },
                    { key: 'why', label: t.whyLabel, type: 'textarea' },
                ]}
-                newItemTemplate={{ name: '', url: '', confidence: 0.8, why: '', evidence: [] }}
+                newItemTemplate={{ name: '', url: '', confidence: 0.8, why: '', evidence: [], manual_urls: '' }}
                renderDisplay={(item, index) => (
                    <div>
                        <div className="flex items-center justify-between">
@@ -39,6 +40,11 @@ const Step3Competitors: React.FC<Step3CompetitorsProps> = ({ candidates, onCandi
                                    {t.visitButton}
                                </a>
                                <EvidencePopover evidence={item.evidence} />
                                {item.manual_urls && item.manual_urls.trim() && (
                                    <span className="ml-2 inline-flex items-center px-2 py-0.5 rounded text-xs font-medium bg-blue-100 text-blue-800 dark:bg-blue-900 dark:text-blue-200">
                                        {item.manual_urls.split('\n').filter(u => u.trim()).length} Manual URLs
                                    </span>
                                )}
                            </div>
                            <span className="text-xs font-mono bg-light-accent dark:bg-brand-accent text-light-text dark:text-white py-1 px-2 rounded-full">
                                {(item.confidence * 100).toFixed(0)}%
--- a/competitor-analysis-app/components/Step4_Analysis.tsx
+++ b/competitor-analysis-app/components/Step4_Analysis.tsx
@@ -1,13 +1,17 @@
-import React from 'react';
+import React, { useState } from 'react';
-import type { Analysis } from '../types';
+import type { Analysis, AppState } from '../types';
 import EvidencePopover from './EvidencePopover';
 import { reanalyzeCompetitor } from '../services/geminiService';
 interface Step4AnalysisProps {
    analyses: Analysis[];
    company: AppState['company'];
    onAnalysisUpdate: (index: number, analysis: Analysis) => void;
    t: any;
 }
 const DownloadIcon = () => (<svg xmlns="http://www.w3.org/2000/svg" className="h-4 w-4" fill="none" viewBox="0 0 24 24" stroke="currentColor"><path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M4 16v1a3 3 0 003 3h10a3 3 0 003-3v-1m-4-4l-4 4m0 0l-4-4m4 4V4" /></svg>);
 const RefreshIcon = () => (<svg xmlns="http://www.w3.org/2000/svg" className="h-4 w-4" fill="none" viewBox="0 0 24 24" stroke="currentColor"><path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M4 4v5h.582m15.356 2A8.001 8.001 0 004.582 9m0 0H9m11 11v-5h-.581m0 0a8.003 8.003 0 01-15.357-2m15.357 2H15" /></svg>);
 const downloadJSON = (data: any, filename: string) => {
    const jsonStr = JSON.stringify(data, null, 2);
@@ -28,8 +32,46 @@ const OverlapBar: React.FC<{ score: number }> = ({ score }) => (
    </div>
 );
-const Step4Analysis: React.FC<Step4AnalysisProps> = ({ analyses, t }) => {
+const Step4Analysis: React.FC<Step4AnalysisProps> = ({ analyses, company, onAnalysisUpdate, t }) => {
    const sortedAnalyses = [...analyses].sort((a, b) => b.overlap_score - a.overlap_score);
    const [editingIndex, setEditingIndex] = useState<number | null>(null);
    const [manualUrls, setManualUrls] = useState<string>("");
    const [isReanalyzing, setIsReanalyzing] = useState<boolean>(false);
    const handleEditStart = (index: number) => {
        setEditingIndex(index);
        setManualUrls(""); // Reset
    };
    const handleReanalyze = async (index: number, competitor: Analysis['competitor']) => {
        setIsReanalyzing(true);
        try {
            const urls = manualUrls.split('\n').map(u => u.trim()).filter(u => u);
            // Construct a partial CompetitorCandidate object as expected by the service
            const candidate = {
                name: competitor.name,
                url: competitor.url,
                confidence: 0, // Not needed for re-analysis
                why: "",
                evidence: []
            };
            const updatedAnalysis = await reanalyzeCompetitor(company, candidate, urls);
            // Find the original index in the unsorted 'analyses' array to update correctly
            const originalIndex = analyses.findIndex(a => a.competitor.name === competitor.name);
            if (originalIndex !== -1) {
                onAnalysisUpdate(originalIndex, updatedAnalysis);
            }
            setEditingIndex(null);
        } catch (error) {
            console.error("Re-analysis failed:", error);
            alert("Fehler bei der Re-Analyse. Bitte Logs prüfen.");
        } finally {
            setIsReanalyzing(false);
        }
    };
    return (
        <div>
@@ -49,6 +91,13 @@ const Step4Analysis: React.FC<Step4AnalysisProps> = ({ analyses, t }) => {
                                </a>
                            </div>
                            <div className="flex items-center space-x-2">
                                <button 
                                    onClick={() => handleEditStart(index)}
                                    title="Daten anreichern / Korrigieren"
                                    className="text-light-subtle dark:text-brand-light hover:text-blue-500 p-1 rounded-full"
                                >
                                    <RefreshIcon />
                                </button>
                                <button 
                                    onClick={() => downloadJSON(analysis, `analysis_${analysis.competitor.name.replace(/ /g, '_')}.json`)}
                                    title={t.downloadJson_title}
@@ -60,6 +109,39 @@ const Step4Analysis: React.FC<Step4AnalysisProps> = ({ analyses, t }) => {
                            </div>
                        </div>
                        {/* Re-Analysis UI */}
                        {editingIndex === index && (
                            <div className="mb-6 bg-light-primary dark:bg-brand-primary p-4 rounded-md border border-brand-highlight">
                                <h4 className="font-bold text-sm mb-2">Manuelle Produkt-URLs ergänzen (optional)</h4>
                                <p className="text-xs text-light-subtle dark:text-brand-light mb-2">
                                    Falls Produkte fehlen, fügen Sie hier direkte Links zu den Produktseiten ein (eine pro Zeile).
                                </p>
                                <textarea
                                    className="w-full bg-light-secondary dark:bg-brand-secondary text-light-text dark:text-brand-text border border-light-accent dark:border-brand-accent rounded-md px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-brand-highlight mb-2"
                                    rows={3}
                                    placeholder="https://example.com/product-a&#10;https://example.com/product-b"
                                    value={manualUrls}
                                    onChange={(e) => setManualUrls(e.target.value)}
                                />
                                <div className="flex justify-end space-x-2">
                                    <button 
                                        onClick={() => setEditingIndex(null)}
                                        className="px-3 py-1 text-sm bg-gray-500 hover:bg-gray-600 text-white rounded"
                                        disabled={isReanalyzing}
                                    >
                                        Abbrechen
                                    </button>
                                    <button 
                                        onClick={() => handleReanalyze(index, analysis.competitor)}
                                        className="px-3 py-1 text-sm bg-brand-highlight hover:bg-blue-600 text-white rounded flex items-center"
                                        disabled={isReanalyzing}
                                    >
                                        {isReanalyzing ? "Analysiere..." : "Neu scrapen & Analysieren"}
                                    </button>
                                </div>
                            </div>
                        )}
                        <div className="grid grid-cols-1 md:grid-cols-2 gap-6">
                            <div>
                                <h4 className="font-semibold mb-2">{t.portfolio}</h4>
--- a/competitor-analysis-app/services/geminiService.ts
+++ b/competitor-analysis-app/services/geminiService.ts
@@ -72,6 +72,20 @@ export const fetchStep4Data = async (company: AppState['company'], competitors:
    return response.json();
 };
 export const reanalyzeCompetitor = async (company: AppState['company'], competitor: CompetitorCandidate, manualUrls: string[]): Promise<Analysis> => {
    const response = await fetch(`${API_BASE_URL}api/reanalyzeCompetitor`, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
        },
        body: JSON.stringify({ company, competitor, manual_urls: manualUrls }),
    });
    if (!response.ok) {
        throw new Error(`HTTP error! status: ${response.status}`);
    }
    return response.json();
 };
 export const fetchStep5Data_SilverBullets = async (company: AppState['company'], analyses: Analysis[], language: 'de' | 'en'): Promise<{ silver_bullets: SilverBullet[] }> => {
    const response = await fetch(`${API_BASE_URL}api/fetchStep5Data_SilverBullets`, {
        method: 'POST',
--- a/competitor-analysis-app/translations.ts
+++ b/competitor-analysis-app/translations.ts
@@ -58,6 +58,7 @@ export const translations = {
            cardTitle: "Wettbewerber-Kandidaten",
            nameLabel: "Name",
            whyLabel: "Begründung",
            manualUrlsLabel: "Zusätzliche Produkt-URLs (eine pro Zeile)",
            visitButton: "Besuchen",
            shortlistBoundary: "SHORTLIST-GRENZE",
            editableCard: { add: "Hinzufügen", cancel: "Abbrechen", save: "Speichern" }
@@ -200,6 +201,7 @@ export const translations = {
            cardTitle: "Competitor Candidates",
            nameLabel: "Name",
            whyLabel: "Justification",
            manualUrlsLabel: "Additional Product URLs (one per line)",
            visitButton: "Visit",
            shortlistBoundary: "SHORTLIST BOUNDARY",
            editableCard: { add: "Add", cancel: "Cancel", save: "Save" }
--- a/competitor-analysis-app/types.ts
+++ b/competitor-analysis-app/types.ts
@@ -27,6 +27,7 @@ export interface CompetitorCandidate {
    confidence: number;
    why: string;
    evidence: Evidence[];
    manual_urls?: string;
 }
 export interface ShortlistedCompetitor {
--- a/import_competitive_radar.py
+++ b/import_competitive_radar.py
@@ -4,15 +4,15 @@ import requests
 import sys
 # Configuration
-JSON_FILE = 'analysis_robo-planet.de.json'
+JSON_FILE = 'analysis_robo-planet.de-2.json'
 TOKEN_FILE = 'notion_token.txt'
 PARENT_PAGE_ID = "2e088f42-8544-8024-8289-deb383da3818" 
 # Database Titles
-DB_TITLE_HUB = "📦 Competitive Radar (Companies) v4"
+DB_TITLE_HUB = "📦 Competitive Radar (Companies) v5"
-DB_TITLE_LANDMINES = "💣 Competitive Radar (Landmines) v4"
+DB_TITLE_LANDMINES = "💣 Competitive Radar (Landmines) v5"
-DB_TITLE_REFS = "🏆 Competitive Radar (References) v4"
+DB_TITLE_REFS = "🏆 Competitive Radar (References) v5"
-DB_TITLE_PRODUCTS = "🤖 Competitive Radar (Products) v4"
+DB_TITLE_PRODUCTS = "🤖 Competitive Radar (Products) v5"
 def load_json_data(filepath):
    with open(filepath, 'r') as f:
@@ -80,7 +80,7 @@ def main():
            for prod in analysis.get('portfolio', []):
                p_props = {
                    "Product": {"title": [{"text": {"content": prod['product'][:100]}}]},
-                    "Category": {"select": {"name": prod.get('category', 'Other')}},
+                    "Category": {"select": {"name": prod.get('purpose', 'Other')[:100]}},
                    "Related Competitor": {"relation": [{"id": pid}]}
                }
                create_page(token, prod_id, p_props)
--- a/import_competitors_to_notion.py
+++ b/import_competitors_to_notion.py
@@ -1,200 +0,0 @@
 import json
 import os
 import requests
 import sys
 # Configuration
 JSON_FILE = 'analysis_robo-planet.de.json'
 TOKEN_FILE = 'notion_token.txt'
 # Root Page ID from notion_integration.md
 PARENT_PAGE_ID = "2e088f42-8544-8024-8289-deb383da3818" 
 DB_TITLE = "Competitive Radar 🎯"
 def load_json_data(filepath):
    try:
        with open(filepath, 'r') as f:
            return json.load(f)
    except Exception as e:
        print(f"Error loading JSON: {e}")
        sys.exit(1)
 def load_notion_token(filepath):
    try:
        with open(filepath, 'r') as f:
            return f.read().strip()
    except Exception as e:
        print(f"Error loading token: {e}")
        sys.exit(1)
 def create_competitor_database(token, parent_page_id):
    url = "https://api.notion.com/v1/databases"
    headers = {
        "Authorization": f"Bearer {token}",
        "Notion-Version": "2022-06-28",
        "Content-Type": "application/json"
    }
    payload = {
        "parent": {"type": "page_id", "page_id": parent_page_id},
        "title": [{"type": "text", "text": {"content": DB_TITLE}}],
        "properties": {
            "Competitor Name": {"title": {}},
            "Website": {"url": {}},
            "Target Industries": {"multi_select": {}},
            "USPs / Differentiators": {"rich_text": {}},
            "Silver Bullet": {"rich_text": {}},
            "Landmines": {"rich_text": {}},
            "Strengths vs Weaknesses": {"rich_text": {}},
            "Portfolio": {"rich_text": {}},
            "Known References": {"rich_text": {}}
        }
    }
    print(f"Creating database '{DB_TITLE}'...")
    response = requests.post(url, headers=headers, json=payload)
    if response.status_code != 200:
        print(f"Error creating database: {response.status_code}")
        print(response.text)
        sys.exit(1)
    db_data = response.json()
    print(f"Database created successfully! ID: {db_data['id']}")
    return db_data['id']
 def format_list_as_bullets(items):
    """Converts a python list of strings into a Notion rich_text text block with bullets."""
    if not items:
        return ""
    text_content = ""
    for item in items:
        text_content += f"• {item}\n"
    return text_content.strip()
 def get_competitor_data(data, comp_name):
    """Aggregates data from different sections of the JSON for a single competitor."""
    # Init structure
    comp_data = {
        "name": comp_name,
        "url": "",
        "industries": [],
        "differentiators": [],
        "portfolio": [],
        "silver_bullet": "",
        "landmines": [],
        "strengths_weaknesses": [],
        "references": []
    }
    # 1. Basic Info & Portfolio (from 'analyses')
    for analysis in data.get('analyses', []):
        c = analysis.get('competitor', {})
        if c.get('name') == comp_name:
            comp_data['url'] = c.get('url', '')
            comp_data['industries'] = analysis.get('target_industries', [])
            comp_data['differentiators'] = analysis.get('differentiators', [])
            # Format Portfolio
            for prod in analysis.get('portfolio', []):
                p_name = prod.get('product', '')
                p_purpose = prod.get('purpose', '')
                comp_data['portfolio'].append(f"{p_name}: {p_purpose}")
            break
    # 2. Battlecards
    for card in data.get('battlecards', []):
        if card.get('competitor_name') == comp_name:
            comp_data['silver_bullet'] = card.get('silver_bullet', '')
            comp_data['landmines'] = card.get('landmine_questions', [])
            comp_data['strengths_weaknesses'] = card.get('strengths_vs_weaknesses', [])
            break
    # 3. References
    for ref_entry in data.get('reference_analysis', []):
        if ref_entry.get('competitor_name') == comp_name:
            for ref in ref_entry.get('references', []):
                r_name = ref.get('name', 'Unknown')
                r_ind = ref.get('industry', '')
                entry = r_name
                if r_ind:
                    entry += f" ({r_ind})"
                comp_data['references'].append(entry)
            break
    return comp_data
 def add_competitor_entry(token, db_id, c_data):
    url = "https://api.notion.com/v1/pages"
    headers = {
        "Authorization": f"Bearer {token}",
        "Notion-Version": "2022-06-28",
        "Content-Type": "application/json"
    }
    # Prepare properties
    props = {
        "Competitor Name": {"title": [{"text": {"content": c_data['name']}}]},
        "USPs / Differentiators": {"rich_text": [{"text": {"content": format_list_as_bullets(c_data['differentiators'])}}]},
        "Silver Bullet": {"rich_text": [{"text": {"content": c_data['silver_bullet']}}]},
        "Landmines": {"rich_text": [{"text": {"content": format_list_as_bullets(c_data['landmines'])}}]},
        "Strengths vs Weaknesses": {"rich_text": [{"text": {"content": format_list_as_bullets(c_data['strengths_weaknesses'])}}]},
        "Portfolio": {"rich_text": [{"text": {"content": format_list_as_bullets(c_data['portfolio'])}}]},
        "Known References": {"rich_text": [{"text": {"content": format_list_as_bullets(c_data['references'])}}]}
    }
    if c_data['url']:
        props["Website"] = {"url": c_data['url']}
    # Multi-select for industries
    # Note: Notion options are auto-created, but we must ensure no commas or weird chars break it
    ms_options = []
    for ind in c_data['industries']:
        # Simple cleanup
        clean_ind = ind.replace(',', '') 
        ms_options.append({"name": clean_ind})
    props["Target Industries"] = {"multi_select": ms_options}
    payload = {
        "parent": {"database_id": db_id},
        "properties": props
    }
    try:
        response = requests.post(url, headers=headers, json=payload)
        response.raise_for_status()
        print(f"  - Added: {c_data['name']}")
    except requests.exceptions.HTTPError as e:
        print(f"  - Failed to add {c_data['name']}: {e}")
        # print(response.text)
 def main():
    token = load_notion_token(TOKEN_FILE)
    data = load_json_data(JSON_FILE)
    # 1. Create DB
    db_id = create_competitor_database(token, PARENT_PAGE_ID)
    # 2. Collect List of Competitors
    # We use the shortlist or candidates list to drive the iteration
    competitor_list = data.get('competitors_shortlist', [])
    if not competitor_list:
        competitor_list = data.get('competitor_candidates', [])
    print(f"Importing {len(competitor_list)} competitors...")
    for comp in competitor_list:
        c_name = comp.get('name')
        if not c_name: continue
        # Aggregate Data
        c_data = get_competitor_data(data, c_name)
        # Push to Notion
        add_competitor_entry(token, db_id, c_data)
    print("Import complete.")
 if __name__ == "__main__":
    main()
--- a/import_references_to_notion.py
+++ b/import_references_to_notion.py
@@ -1,143 +0,0 @@
 import json
 import os
 import requests
 import sys
 # Configuration
 JSON_FILE = 'analysis_robo-planet.de.json'
 TOKEN_FILE = 'notion_token.txt'
 # Root Page ID from notion_integration.md
 PARENT_PAGE_ID = "2e088f42-8544-8024-8289-deb383da3818" 
 DB_TITLE = "Competitive Radar - References"
 def load_json_data(filepath):
    """Loads the analysis JSON data."""
    try:
        with open(filepath, 'r') as f:
            return json.load(f)
    except FileNotFoundError:
        print(f"Error: File '{filepath}' not found.")
        sys.exit(1)
    except json.JSONDecodeError:
        print(f"Error: Failed to decode JSON from '{filepath}'.")
        sys.exit(1)
 def load_notion_token(filepath):
    """Loads the Notion API token."""
    try:
        with open(filepath, 'r') as f:
            return f.read().strip()
    except FileNotFoundError:
        print(f"Error: Token file '{filepath}' not found.")
        sys.exit(1)
 def create_database(token, parent_page_id):
    """Creates the References database in Notion."""
    url = "https://api.notion.com/v1/databases"
    headers = {
        "Authorization": f"Bearer {token}",
        "Notion-Version": "2022-06-28",
        "Content-Type": "application/json"
    }
    payload = {
        "parent": {"type": "page_id", "page_id": parent_page_id},
        "title": [{"type": "text", "text": {"content": DB_TITLE}}],
        "properties": {
            "Reference Name": {"title": {}},
            "Competitor": {"select": {}},
            "Industry": {"select": {}},
            "Snippet": {"rich_text": {}},
            "Case Study URL": {"url": {}}
        }
    }
    print(f"Creating database '{DB_TITLE}'...")
    response = requests.post(url, headers=headers, json=payload)
    if response.status_code != 200:
        print(f"Error creating database: {response.status_code}")
        print(response.text)
        sys.exit(1)
    db_data = response.json()
    print(f"Database created successfully! ID: {db_data['id']}")
    return db_data['id']
 def add_reference(token, db_id, competitor, reference):
    """Adds a single reference entry to the database."""
    url = "https://api.notion.com/v1/pages"
    headers = {
        "Authorization": f"Bearer {token}",
        "Notion-Version": "2022-06-28",
        "Content-Type": "application/json"
    }
    ref_name = reference.get("name", "Unknown Reference")
    industry = reference.get("industry", "Unknown")
    snippet = reference.get("testimonial_snippet", "")
    case_url = reference.get("case_study_url")
    # Truncate snippet if too long (Notion limit is 2000 chars for text block, but good practice anyway)
    if len(snippet) > 2000:
        snippet = snippet[:1997] + "..."
    properties = {
        "Reference Name": {"title": [{"text": {"content": ref_name}}]},
        "Competitor": {"select": {"name": competitor}},
        "Industry": {"select": {"name": industry}},
        "Snippet": {"rich_text": [{"text": {"content": snippet}}]}
    }
    if case_url and case_url.startswith("http"):
        properties["Case Study URL"] = {"url": case_url}
    payload = {
        "parent": {"database_id": db_id},
        "properties": properties
    }
    try:
        response = requests.post(url, headers=headers, json=payload)
        response.raise_for_status()
        print(f"  - Added: {ref_name}")
    except requests.exceptions.HTTPError as e:
        print(f"  - Failed to add {ref_name}: {e}")
        # print(response.text) # Uncomment for debugging
 def main():
    token = load_notion_token(TOKEN_FILE)
    data = load_json_data(JSON_FILE)
    # Check if we have reference analysis data
    ref_analysis = data.get("reference_analysis", [])
    if not ref_analysis:
        print("Warning: No reference analysis data found in JSON.")
        # We proceed anyway to create the DB and show it works, 
        # but in a real scenario we might want to exit or use dummy data.
    # 1. Create Database (You could also check if it exists, but for this script we create new)
    # Ideally, we would search for an existing DB first to avoid duplicates.
    # For this task, we create it.
    db_id = create_database(token, PARENT_PAGE_ID)
    # 2. Iterate and Import
    print("Importing references...")
    count = 0
    for entry in ref_analysis:
        competitor = entry.get("competitor_name", "Unknown Competitor")
        refs = entry.get("references", [])
        if not refs:
            print(f"Skipping {competitor} (No references found)")
            continue
        print(f"Processing {competitor} ({len(refs)} references)...")
        for ref in refs:
            add_reference(token, db_id, competitor, ref)
            count += 1
    print(f"\nImport complete! {count} references added.")
 if __name__ == "__main__":
    main()
--- a/import_relational_radar.py
+++ b/import_relational_radar.py
@@ -1,265 +0,0 @@
 import json
 import os
 import requests
 import sys
 # Configuration
 JSON_FILE = 'analysis_robo-planet.de.json'
 TOKEN_FILE = 'notion_token.txt'
 PARENT_PAGE_ID = "2e088f42-8544-8024-8289-deb383da3818"
 # Database Titles
 DB_TITLE_HUB = "📦 Competitive Radar (Companies)"
 DB_TITLE_LANDMINES = "💣 Competitive Radar (Landmines & Intel)"
 DB_TITLE_REFS = "🏆 Competitive Radar (References)"
 DB_TITLE_PRODUCTS = "🤖 Competitive Radar (Products)"
 def load_json_data(filepath):
    try:
        with open(filepath, 'r') as f:
            return json.load(f)
    except Exception as e:
        print(f"Error loading JSON: {e}")
        sys.exit(1)
 def load_notion_token(filepath):
    try:
        with open(filepath, 'r') as f:
            return f.read().strip()
    except Exception as e:
        print(f"Error loading token: {e}")
        sys.exit(1)
 def create_database(token, parent_page_id, title, properties):
    url = "https://api.notion.com/v1/databases"
    headers = {
        "Authorization": f"Bearer {token}",
        "Notion-Version": "2022-06-28",
        "Content-Type": "application/json"
    }
    payload = {
        "parent": {"type": "page_id", "page_id": parent_page_id},
        "title": [{"type": "text", "text": {"content": title}}],
        "properties": properties
    }
    response = requests.post(url, headers=headers, json=payload)
    if response.status_code != 200:
        print(f"Error creating DB '{title}': {response.status_code}")
        print(response.text)
        sys.exit(1)
    db_data = response.json()
    print(f"✅ Created DB '{title}' (ID: {db_data['id']})")
    return db_data['id']
 def create_page(token, db_id, properties):
    url = "https://api.notion.com/v1/pages"
    headers = {
        "Authorization": f"Bearer {token}",
        "Notion-Version": "2022-06-28",
        "Content-Type": "application/json"
    }
    payload = {
        "parent": {"database_id": db_id},
        "properties": properties
    }
    response = requests.post(url, headers=headers, json=payload)
    if response.status_code != 200:
        print(f"Error creating page: {response.status_code}")
        # print(response.text)
        return None
    return response.json()['id']
 def format_list_as_bullets(items):
    if not items: return ""
    return "\n".join([f"• {item}" for item in items])
 def main():
    token = load_notion_token(TOKEN_FILE)
    data = load_json_data(JSON_FILE)
    print("🚀 Starting Relational Import (Level 3 - Full Radar)...")
    # --- STEP 1: Define & Create Competitors Hub DB ---
    props_hub = {
        "Name": {"title": {}},
        "Website": {"url": {}},
        "Target Industries": {"multi_select": {}},
        "Portfolio Summary": {"rich_text": {}},
        "Silver Bullet": {"rich_text": {}},
        "USPs": {"rich_text": {}}
    }
    hub_db_id = create_database(token, PARENT_PAGE_ID, DB_TITLE_HUB, props_hub)
    # --- STEP 2: Define & Create Satellite DBs (Linked to Hub) ---
    # Landmines DB
    props_landmines = {
        "Statement / Question": {"title": {}},
        "Type": {"select": {
            "options": [
                {"name": "Landmine Question", "color": "red"},
                {"name": "Competitor Weakness", "color": "green"},
                {"name": "Competitor Strength", "color": "orange"}
            ]
        }},
        "Related Competitor": {
            "relation": {
                "database_id": hub_db_id, 
                "dual_property": {"synced_property_name": "Related Landmines & Intel"}
            }
        }
    }
    landmines_db_id = create_database(token, PARENT_PAGE_ID, DB_TITLE_LANDMINES, props_landmines)
    # References DB
    props_refs = {
        "Customer Name": {"title": {}},
        "Industry": {"select": {}},
        "Snippet": {"rich_text": {}},
        "Case Study URL": {"url": {}},
        "Related Competitor": {
            "relation": {
                "database_id": hub_db_id, 
                "dual_property": {"synced_property_name": "Related References"}
            }
        }
    }
    refs_db_id = create_database(token, PARENT_PAGE_ID, DB_TITLE_REFS, props_refs)
    # Products DB
    props_products = {
        "Product Name": {"title": {}},
        "Purpose / Description": {"rich_text": {}},
        "Related Competitor": {
            "relation": {
                "database_id": hub_db_id, 
                "dual_property": {"synced_property_name": "Related Products"}
            }
        }
    }
    products_db_id = create_database(token, PARENT_PAGE_ID, DB_TITLE_PRODUCTS, props_products)
    # --- STEP 3: Import Competitors (and store IDs) ---
    competitor_map = {} # Maps Name -> Notion Page ID
    competitors = data.get('competitors_shortlist', []) or data.get('competitor_candidates', [])
    print(f"\nImporting {len(competitors)} Competitors...")
    for comp in competitors:
        c_name = comp.get('name')
        if not c_name: continue
        # Gather Data
        c_url = comp.get('url', '')
        # Find extended analysis data
        analysis_data = next((a for a in data.get('analyses', []) if a.get('competitor', {}).get('name') == c_name), {})
        battlecard_data = next((b for b in data.get('battlecards', []) if b.get('competitor_name') == c_name), {})
        industries = analysis_data.get('target_industries', [])
        portfolio = analysis_data.get('portfolio', [])
        portfolio_text = "\n".join([f"{p.get('product')}: {p.get('purpose')}" for p in portfolio])
        usps = format_list_as_bullets(analysis_data.get('differentiators', []))
        silver_bullet = battlecard_data.get('silver_bullet', '')
        # Create Page
        props = {
            "Name": {"title": [{"text": {"content": c_name}}]},
            "Portfolio Summary": {"rich_text": [{"text": {"content": portfolio_text[:2000]}}]}, # Limit length
            "USPs": {"rich_text": [{"text": {"content": usps[:2000]}}]},
            "Silver Bullet": {"rich_text": [{"text": {"content": silver_bullet[:2000]}}]},
            "Target Industries": {"multi_select": [{"name": i.replace(',', '')} for i in industries]}
        }
        if c_url: props["Website"] = {"url": c_url}
        page_id = create_page(token, hub_db_id, props)
        if page_id:
            competitor_map[c_name] = page_id
            print(f"  - Created: {c_name}")
    # --- STEP 4: Import Landmines & Intel ---
    print("\nImporting Landmines & Intel...")
    for card in data.get('battlecards', []):
        c_name = card.get('competitor_name')
        comp_page_id = competitor_map.get(c_name)
        if not comp_page_id: continue
        # 1. Landmines
        for q in card.get('landmine_questions', []):
            props = {
                "Statement / Question": {"title": [{"text": {"content": q}}]},
                "Type": {"select": {"name": "Landmine Question"}},
                "Related Competitor": {"relation": [{"id": comp_page_id}]}
            }
            create_page(token, landmines_db_id, props)
        # 2. Weaknesses
        for point in card.get('strengths_vs_weaknesses', []):
            p_type = "Competitor Weakness" 
            props = {
                "Statement / Question": {"title": [{"text": {"content": point}}]},
                "Type": {"select": {"name": p_type}},
                "Related Competitor": {"relation": [{"id": comp_page_id}]}
            }
            create_page(token, landmines_db_id, props)
    print("  - Landmines imported.")
    # --- STEP 5: Import References ---
    print("\nImporting References...")
    count_refs = 0
    for ref_group in data.get('reference_analysis', []):
        c_name = ref_group.get('competitor_name')
        comp_page_id = competitor_map.get(c_name)
        if not comp_page_id: continue
        for ref in ref_group.get('references', []):
            r_name = ref.get('name', 'Unknown')
            r_industry = ref.get('industry', 'Unknown')
            r_snippet = ref.get('testimonial_snippet', '')
            r_url = ref.get('case_study_url', '')
            props = {
                "Customer Name": {"title": [{"text": {"content": r_name}}]},
                "Industry": {"select": {"name": r_industry}},
                "Snippet": {"rich_text": [{"text": {"content": r_snippet[:2000]}}]},
                "Related Competitor": {"relation": [{"id": comp_page_id}]}
            }
            if r_url and r_url.startswith('http'):
                props["Case Study URL"] = {"url": r_url}
            create_page(token, refs_db_id, props)
            count_refs += 1
    print(f"  - {count_refs} References imported.")
    # --- STEP 6: Import Products (Portfolio) ---
    print("\nImporting Products...")
    count_prods = 0
    for analysis in data.get('analyses', []):
        c_name = analysis.get('competitor', {}).get('name')
        comp_page_id = competitor_map.get(c_name)
        if not comp_page_id: continue
        for prod in analysis.get('portfolio', []):
            p_name = prod.get('product', 'Unknown Product')
            p_purpose = prod.get('purpose', '')
            props = {
                "Product Name": {"title": [{"text": {"content": p_name}}]},
                "Purpose / Description": {"rich_text": [{"text": {"content": p_purpose[:2000]}}]},
                "Related Competitor": {"relation": [{"id": comp_page_id}]}
            }
            create_page(token, products_db_id, props)
            count_prods += 1
    print(f"  - {count_prods} Products imported.")
    print("\n✅ Relational Import Complete (Level 3)!")
 if __name__ == "__main__":
    main()
--- a/refresh_references.py
+++ b/refresh_references.py
@@ -1,35 +0,0 @@
 import asyncio
 import json
 import os
 import sys
 # Path to the orchestrator
 sys.path.append(os.path.join(os.getcwd(), 'competitor-analysis-app'))
 from competitor_analysis_orchestrator import analyze_single_competitor_references
 async def refresh_references():
    json_path = 'analysis_robo-planet.de.json'
    with open(json_path, 'r') as f:
        data = json.load(f)
    competitors = data.get('competitors_shortlist', [])
    if not competitors:
        competitors = data.get('competitor_candidates', [])
    print(f"Refreshing references for {len(competitors)} competitors...")
    tasks = [analyze_single_competitor_references(c) for c in competitors]
    results = await asyncio.gather(*tasks)
    # Filter and update
    data['reference_analysis'] = [r for r in results if r is not None]
    with open(json_path, 'w') as f:
        json.dump(data, f, indent=2)
    print(f"Successfully updated {json_path} with grounded reference data.")
 if __name__ == "__main__":
    asyncio.run(refresh_references())