feat(Explorer): Enhance metric extraction, source transparency, and UI display

- **Standardization & Formula Logic:** Fixed NameError/SyntaxError in formula parser; added support for comments and capitalized placeholders.
- **Source URL Tracking:** Extended DB schema and cascade logic to store and track specific source URLs.
- **Frontend & UI:**
  - Added 'Standardized Potential' display in Inspector.
  - Added clickable source link with icon.
  - Fixed Settings tab layout collapse (flex-shrink-0).
- **Export Capabilities:**
  - Single-company JSON export now includes full quantitative metadata.
  - New global CSV export endpoint /api/companies/export.
- **System Integrity:**
  - Fixed Notion sync typo ('Stanardization').
  - Corrected Nginx proxy routing and FastAPI route ordering.
  - Ensured DB persistence via explicit docker-compose volume mapping.
2026-01-24 09:56:59 +00:00
parent 5602f3b60a
commit 57360496f8
11 changed files with 304 additions and 380 deletions


@@ -183,4 +183,56 @@ Sync-Skript: `backend/scripts/sync_notion_industries.py`.
## 10. Database Migration
For schema changes without data loss: `backend/scripts/migrate_db.py`.
### 11.1 Lessons Learned (Retrospective, Jan 24, 2026)
1. **API route ordering (FastAPI):** A specific endpoint (e.g. `/api/companies/export`) must be declared **before** a dynamic endpoint (e.g. `/api/companies/{company_id}`). Otherwise FastAPI interprets "export" as a `company_id`, which leads to a `422 Unprocessable Entity` error.
2. **Nginx `proxy_pass` trailing slash:** The presence or absence of a `/` at the end of the `proxy_pass` URL in Nginx is critical. For services like FastAPI that run with a `root_path` (e.g. `/ce`), **no** trailing slash may be used (`proxy_pass http://company-explorer:8000;`) so that the `root_path` is preserved in the request forwarded to the backend.
3. **Docker database persistence:** Without an explicit volume mapping for the database file in `docker-compose.yml`, the container uses an internal, ephemeral copy of the database. Any changes made to the "host" DB outside the container are invisible to the application. A mapping such as `./database.db:/app/database.db` is mandatory.
4. **Notion sync stability:** The sync process is vulnerable to typos in Notion property names (e.g. "Stanardization" instead of "Standardization"). These cause silent failures in which fields are simply `None`. When data is missing, check this script first.
5. **Formula robustness (`Standardization Logic`):** Formulas coming from external sources (such as Notion) must be aggressively sanitized. Comments in parentheses (e.g. `(Fläche pro Patient...)`) and units (`m²`) must be stripped before mathematical evaluation via `eval()` to avoid `NameError` or `SyntaxError` exceptions.
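The route-ordering pitfall in lesson 1 can be illustrated with a tiny first-match router. This is a plain-Python stand-in, not FastAPI itself; only the declaration-order behavior is the point:

```python
import re

def match(routes, path):
    # Routes are tried in declaration order; "{param}" segments match any
    # single path segment, mimicking FastAPI's path parameters.
    for pattern in routes:
        regex = "^" + re.sub(r"\{[^/]+\}", "[^/]+", pattern) + "$"
        if re.match(regex, path):
            return pattern
    return None

wrong_order = ["/api/companies/{company_id}", "/api/companies/export"]
right_order = ["/api/companies/export", "/api/companies/{company_id}"]

# With the dynamic route declared first, "export" is swallowed as a company_id:
print(match(wrong_order, "/api/companies/export"))  # /api/companies/{company_id}
# Declaring the specific route first resolves it correctly:
print(match(right_order, "/api/companies/export"))  # /api/companies/export
```

In FastAPI the mismatched route then fails `int` validation on `company_id`, which is exactly the `422 Unprocessable Entity` described above.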
## 12. Deployment & Access Notes (Diskstation / Docker Compose)
**Important note on the deployment setup:**
This project runs in a Docker Compose environment, typically on a Synology Diskstation. The individual microservices are reached through a central Nginx reverse proxy (`proxy` service) that listens on port `8090` of the host system.
**Access URLs for `company-explorer`:**
* **Internal (inside the Docker network):** The `company-explorer` service listens internally on port `8000`. Direct access is only possible from other services in the Docker Compose network.
* **External (via proxy):** All external access goes through the Nginx proxy.
* **Local network (example):** `http://192.168.178.6:8090/ce/`
* **External (via DuckDNS/HTTPS, example):** `https://floke-ai.duckdns.org/ce/`
**Important routing notes:**
* The `company-explorer` FastAPI service is configured to run under `root_path="/ce"`. All API endpoints (e.g. `/api/companies`, `/api/companies/export`) are therefore reachable under `/ce/api/...` when called through the proxy.
* The Nginx proxy (`proxy` service) is responsible for forwarding requests to `/ce/` to the internal `company-explorer` service. Make sure `nginx-proxy.conf` is configured to forward all relevant endpoints (`/ce/api/companies/{id}`, `/ce/api/companies/export`).
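A minimal sketch of the trailing-slash rule described here; the location block and header directives are assumptions, not the actual contents of `nginx-proxy.conf`:

```nginx
location /ce/ {
    # No URI part after host:port, so the original /ce/... path is forwarded
    # unchanged and FastAPI's root_path="/ce" still matches the request.
    proxy_pass http://company-explorer:8000;
    proxy_set_header Host $host;
}
```

Had the directive been `proxy_pass http://company-explorer:8000/;` (with trailing slash), nginx would strip the matched `/ce/` prefix before forwarding, and the backend's `root_path` would no longer line up.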
**Database persistence:**
* The SQLite database file (`companies_v3_fixed_2.db`) must be mounted from the host filesystem into the `company-explorer` container via a Docker volume mapping (`./companies_v3_fixed_2.db:/app/companies_v3_fixed_2.db`). This ensures that data changes persist and are not lost when the container is restarted or recreated.
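A minimal docker-compose sketch of that mapping; the service and file names are taken from this document, all other keys are placeholders:

```yaml
services:
  company-explorer:
    # ... image/build settings omitted ...
    volumes:
      # Host DB file mounted into the container so writes survive restarts
      - ./companies_v3_fixed_2.db:/app/companies_v3_fixed_2.db
```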


@@ -104,6 +104,48 @@ def list_companies(
        logger.error(f"List Companies Error: {e}", exc_info=True)
        raise HTTPException(status_code=500, detail=str(e))
@app.get("/api/companies/export")
def export_companies_csv(db: Session = Depends(get_db)):
    """
    Exports a CSV of all companies with their key metrics.
    """
    import io
    import csv
    from fastapi.responses import StreamingResponse
    output = io.StringIO()
    writer = csv.writer(output)
    # Header
    writer.writerow([
        "ID", "Name", "Website", "City", "Country", "AI Industry",
        "Metric Name", "Metric Value", "Metric Unit", "Standardized Value (m2)",
        "Source", "Source URL", "Confidence", "Proof Text"
    ])
    companies = db.query(Company).order_by(Company.name.asc()).all()
    for c in companies:
        writer.writerow([
            c.id, c.name, c.website, c.city, c.country, c.industry_ai,
            c.calculated_metric_name,
            c.calculated_metric_value,
            c.calculated_metric_unit,
            c.standardized_metric_value,
            c.metric_source,
            c.metric_source_url,
            c.metric_confidence,
            c.metric_proof_text
        ])
    output.seek(0)
    return StreamingResponse(
        output,
        media_type="text/csv",
        headers={"Content-Disposition": f"attachment; filename=company_export_{datetime.utcnow().strftime('%Y-%m-%d')}.csv"}
    )
@app.get("/api/companies/{company_id}")
def get_company(company_id: int, db: Session = Depends(get_db)):
    company = db.query(Company).options(
@@ -194,6 +236,10 @@ def list_robotics_categories(db: Session = Depends(get_db)):
def list_industries(db: Session = Depends(get_db)):
    return db.query(Industry).all()

@app.get("/api/job_roles")
def list_job_roles(db: Session = Depends(get_db)):
    return db.query(JobRoleMapping).order_by(JobRoleMapping.pattern.asc()).all()
@app.post("/api/enrich/discover")
def discover_company(req: AnalysisRequest, background_tasks: BackgroundTasks, db: Session = Depends(get_db)):
    company = db.query(Company).filter(Company.id == req.company_id).first()
@@ -296,6 +342,49 @@ def override_impressum(company_id: int, url: str, background_tasks: BackgroundTa
    db.commit()
    return {"status": "updated"}
def run_wikipedia_reevaluation_task(company_id: int):
    from .database import SessionLocal
    db = SessionLocal()


@@ -51,6 +51,7 @@ class Company(Base):
    standardized_metric_unit = Column(String, nullable=True)  # e.g., "m²"
    metric_source = Column(String, nullable=True)  # "website", "wikipedia", "serpapi"
    metric_proof_text = Column(Text, nullable=True)  # Snippet showing the value (e.g. "2,0 Mio Besucher (2020)")
    metric_source_url = Column(Text, nullable=True)  # URL where the proof was found
    metric_confidence = Column(Float, nullable=True)  # 0.0 - 1.0
    metric_confidence_reason = Column(Text, nullable=True)  # Why is it high/low?


@@ -60,7 +60,8 @@ def migrate_tables():
        "calculated_metric_unit": "TEXT",
        "standardized_metric_value": "FLOAT",
        "standardized_metric_unit": "TEXT",
        "metric_source": "TEXT",
        "metric_source_url": "TEXT"
    }
    for col, col_type in comp_migrations.items():


@@ -146,7 +146,7 @@ def sync_industries(token, session):
        industry.proxy_factor = extract_number(props.get("Proxy Factor"))
        industry.scraper_search_term = extract_select(props.get("Scraper Search Term"))  # <-- FIXED HERE
        industry.scraper_keywords = extract_rich_text(props.get("Scraper Keywords"))
        industry.standardization_logic = extract_rich_text(props.get("Standardization Logic"))
        # Relation: Primary Product Category
        relation = props.get("Primary Product Category", {}).get("relation", [])


@@ -1,3 +1,4 @@
from typing import Tuple
import json
import logging
import re
@@ -15,247 +16,110 @@ logger = logging.getLogger(__name__)
class ClassificationService:
    def __init__(self):
        # We no longer load industries in init because we don't have a DB session here
        pass

    def _load_industry_definitions(self, db: Session) -> List[Industry]:
        """Loads all industry definitions from the database."""
        industries = db.query(Industry).all()
        if not industries:
            logger.warning("No industry definitions found in DB. Classification might be limited.")
        return industries

    def _get_wikipedia_content(self, db: Session, company_id: int) -> Optional[Dict[str, Any]]:
        """Fetches Wikipedia content from enrichment_data for a given company."""
        enrichment = db.query(EnrichmentData).filter(
            EnrichmentData.company_id == company_id,
            EnrichmentData.source_type == "wikipedia"
        ).order_by(EnrichmentData.created_at.desc()).first()
        return enrichment.content if enrichment and enrichment.content else None
    def _run_llm_classification_prompt(self, website_text: str, company_name: str, industry_definitions: List[Dict[str, str]]) -> Optional[str]:
        """
        Uses LLM to classify the company into one of the predefined industries.
        """
        prompt = r"""
Du bist ein präziser Branchen-Klassifizierer für Unternehmen.
Deine Aufgabe ist es, das vorliegende Unternehmen basierend auf seinem Website-Inhalt
einer der untenstehenden Branchen zuzuordnen.
--- UNTERNEHMEN ---
Name: {company_name}
Website-Inhalt (Auszug):
{website_text_excerpt}
--- ZU VERWENDENDE BRANCHEN-DEFINITIONEN (STRIKT) ---
Wähle EINE der folgenden Branchen. Jede Branche hat eine Definition.
{industry_definitions_json}
--- AUFGABE ---
Analysiere den Website-Inhalt. Wähle die Branchen-Definition, die am besten zum Unternehmen passt.
Wenn keine der Definitionen zutrifft oder du unsicher bist, wähle "Others".
Gib NUR den Namen der zugeordneten Branche zurück, als reinen String, nichts anderes.
Beispiel Output: Hotellerie
""".format(
            company_name=company_name,
            website_text_excerpt=website_text[:10000],
            industry_definitions_json=json.dumps(industry_definitions, ensure_ascii=False)
        )
        try:
            response = call_gemini_flash(prompt, temperature=0.1, json_mode=False)
            return response.strip()
        except Exception as e:
            logger.error(f"LLM classification failed for {company_name}: {e}")
            return None
    def _run_llm_metric_extraction_prompt(self, text_content: str, search_term: str, industry_name: str) -> Optional[Dict[str, Any]]:
        """
        Uses LLM to extract the specific metric value from text.
        Updated to look specifically for area (m²) even if not the primary search term.
        """
        prompt = r"""
Du bist ein Datenextraktions-Spezialist für Unternehmens-Kennzahlen.
Analysiere den folgenden Text, um spezifische Werte zu extrahieren.
--- KONTEXT ---
Branche: {industry_name}
Primär gesuchte Metrik: '{search_term}'
--- TEXT ---
{text_content_excerpt}
--- AUFGABE ---
1. Finde den numerischen Wert für die primäre Metrik '{search_term}'.
2. EXTREM WICHTIG: Suche im gesamten Text nach einer Angabe zur Gesamtfläche, Nutzfläche, Grundstücksfläche oder Verkaufsfläche in Quadratmetern (m²).
In Branchen wie Freizeitparks, Flughäfen oder Thermen ist dies oft separat im Fließtext versteckt (z.B. "Die Therme verfügt über eine Gesamtfläche von 4.000 m²").
3. Achte auf deutsche Zahlenformate (z.B. 1.005 für tausend-fünf).
4. Regel: Extrahiere IMMER den umgebenden Satz oder die Zeile in 'raw_text_segment'. Rate NIEMALS einen numerischen Wert, ohne den Beweis dafür zu liefern.
5. WICHTIG: Jahreszahlen in Klammern oder direkt dahinter (z.B. "80 (2020)" oder "80 Stand 2021") dürfen NICHT Teil von 'raw_value' sein. "80 (2020)" -> raw_value: 80.
6. WICHTIG: Zitations-Nummern wie "[3]" müssen entfernt werden. "80[3]" -> raw_value: 80.
7. ENTITÄTS-CHECK: Stelle sicher, dass sich die Zahl wirklich auf '{search_term}' für das Unternehmen bezieht und nicht auf einen Wettbewerber.
8. ZEITRAUM-CHECK: Wir suchen JÄHRLICHE Werte. Wenn du "500 Besucher am Tag" und "150.000 im Jahr" findest, nimm IMMER den JÄHRLICHEN Wert. Ignoriere Tages- oder Monatswerte, es sei denn, es gibt gar keine anderen.
Bewerte deine Zuversicht (confidence_score) zwischen 0.0 und 1.0:
- 0.9 - 1.0: Exakter, aktueller Jahreswert aus zuverlässiger Quelle.
- 0.6 - 0.8: Wahrscheinlich korrekt, aber evtl. etwas älter (vor 2022) oder leicht gerundet ("rund 200.000").
- 0.1 - 0.5: Unsicher, ob es sich auf das richtige Unternehmen bezieht, oder nur Tages-/Monatswerte gefunden.
Gib NUR ein JSON-Objekt zurück:
'raw_text_segment': Das Snippet für '{search_term}' (z.B. "ca. 1.500 Besucher (2020)"). MUSS IMMER AUSGEFÜLLT SEIN WENN EIN WERT GEFUNDEN WURDE.
'raw_value': Der numerische Wert für '{search_term}'. null, falls nicht gefunden.
'raw_unit': Die Einheit (z.B. "Besucher", "Passagiere"). null, falls nicht gefunden.
'area_text_segment': Das Snippet, das eine Fläche (m²) erwähnt (z.B. "4.000 m² Gesamtfläche"). null, falls nicht gefunden.
'area_value': Der gefundene Wert der Fläche in m² (als Zahl). null, falls nicht gefunden.
'metric_name': '{search_term}'.
'confidence_score': Float zwischen 0.0 und 1.0.
'confidence_reason': Kurze Begründung (z.B. "Klarer Jahreswert 2023").
""".format(
            industry_name=industry_name,
            search_term=search_term,
            text_content_excerpt=text_content[:15000]
        )
        try:
            response = call_gemini_flash(prompt, temperature=0.05, json_mode=True)
            return json.loads(response)
        except Exception as e:
            logger.error(f"LLM metric extraction failed for '{search_term}': {e}")
            return None

    def _is_metric_plausible(self, metric_name: str, value: Optional[float]) -> bool:
        # ... [omitted for brevity, no changes here] ...
        pass
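The German number-format rules the prompt spells out (thousands dot, citation markers, trailing years) can be sketched as a small parser. This is an illustrative stand-in, not the project's `MetricParser`:

```python
import re

def parse_german_number(text: str):
    text = re.sub(r"\[\d+\]", "", text)                   # strip citation markers: "80[3]" -> "80"
    text = re.sub(r"\(\s*(?:19|20)\d{2}\s*\)", "", text)  # strip year annotations: "80 (2020)" -> "80 "
    # First branch: German thousands format ("1.005"); fallback: plain digits with optional decimal comma.
    m = re.search(r"\d{1,3}(?:\.\d{3})+(?:,\d+)?|\d+(?:,\d+)?", text)
    if not m:
        return None
    # German format: "." is a thousands separator, "," is the decimal point
    return float(m.group(0).replace(".", "").replace(",", "."))

print(parse_german_number("ca. 1.005 Besucher (2020)"))  # 1005.0
```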
    def _parse_standardization_logic(self, formula: str, raw_value: float) -> Optional[float]:
        if not formula or raw_value is None:
            return None
        # Clean formula: Replace 'wert'/'Wert'/'Value' and strip area units like m² or alphanumeric noise
        # that Notion sync might bring in (e.g. "wert * 25m2" -> "wert * 25")
        formula_cleaned = formula.replace("wert", str(raw_value)).replace("Wert", str(raw_value)).replace("Value", str(raw_value))
        # Remove common unit strings and non-math characters (except dots and parentheses)
        formula_cleaned = re.sub(r'(?i)m[²2]', '', formula_cleaned)
        formula_cleaned = re.sub(r'(?i)qm', '', formula_cleaned)
        formula_cleaned = re.sub(r'\s*\(.*\)\s*$', '', formula_cleaned).strip()
        # We leave the final safety check to safe_eval_math
        try:
            return safe_eval_math(formula_cleaned)
        except Exception as e:
            logger.error(f"Failed to parse standardization logic '{formula}' with value {raw_value}: {e}")
            return None
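A self-contained sketch of this sanitize-then-evaluate step, with a minimal AST-based stand-in for `safe_eval_math` (the project's own helper may differ):

```python
import ast
import operator
import re

# Minimal stand-in for safe_eval_math: evaluates +, -, *, / on numeric literals only.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval_math(expr: str) -> float:
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"Disallowed expression: {expr!r}")
    return _eval(ast.parse(expr, mode="eval"))

def standardize(formula: str, raw_value: float) -> float:
    # Same cleaning order as the method above: substitute placeholders,
    # strip area units, drop a trailing parenthesized comment.
    cleaned = formula.replace("wert", str(raw_value)).replace("Wert", str(raw_value)).replace("Value", str(raw_value))
    cleaned = re.sub(r'(?i)m[²2]', '', cleaned)
    cleaned = re.sub(r'(?i)qm', '', cleaned)
    cleaned = re.sub(r'\s*\(.*\)\s*$', '', cleaned).strip()
    return safe_eval_math(cleaned)

print(standardize("wert * 25m2 (Fläche pro Patient)", 4.0))  # 100.0
```

Without the cleaning pass, `eval("4.0 * 25m2 (Fläche pro Patient)")` is exactly the `SyntaxError` case described in the lessons learned.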
    def _get_best_metric_result(self, results_list: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
        if not results_list:
            return None
        source_priority = {"wikipedia": 0, "website": 1, "serpapi": 2}
        valid_results = [r for r in results_list if r.get("calculated_metric_value") is not None]
        if not valid_results:
            return None
        valid_results.sort(key=lambda r: (source_priority.get(r.get("metric_source"), 99), -r.get("metric_confidence", 0.0)))
        logger.info(f"Best result chosen: {valid_results[0]}")
        return valid_results[0]

    def _get_website_content_and_url(self, company: Company) -> Tuple[Optional[str], Optional[str]]:
        return scrape_website_content(company.website), company.website

    def _get_wikipedia_content_and_url(self, db: Session, company_id: int) -> Tuple[Optional[str], Optional[str]]:
        wiki_data = self._get_wikipedia_content(db, company_id)
        return (wiki_data.get('full_text'), wiki_data.get('url')) if wiki_data else (None, None)

    def _get_serpapi_content_and_url(self, company: Company, search_term: str) -> Tuple[Optional[str], Optional[str]]:
        serp_results = run_serp_search(f"{company.name} {company.city or ''} {search_term}")
        if not serp_results:
            return None, None
        content = " ".join([res.get("snippet", "") for res in serp_results.get("organic_results", [])])
        url = serp_results.get("organic_results", [{}])[0].get("link") if serp_results.get("organic_results") else None
        return content, url
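The selection rule in `_get_best_metric_result` (prefer Wikipedia over website over SerpAPI, then break ties on higher confidence) can be exercised standalone; this sketch mirrors the sort key above on hypothetical candidate dicts:

```python
source_priority = {"wikipedia": 0, "website": 1, "serpapi": 2}

def best_metric_result(results):
    # Keep only candidates that actually found a value.
    valid = [r for r in results if r.get("calculated_metric_value") is not None]
    if not valid:
        return None
    # Lower source priority wins first; higher confidence breaks ties.
    valid.sort(key=lambda r: (source_priority.get(r.get("metric_source"), 99),
                              -(r.get("metric_confidence") or 0.0)))
    return valid[0]

candidates = [
    {"metric_source": "serpapi", "calculated_metric_value": 80.0, "metric_confidence": 0.9},
    {"metric_source": "wikipedia", "calculated_metric_value": 75.0, "metric_confidence": 0.6},
    {"metric_source": "website", "calculated_metric_value": None, "metric_confidence": 0.0},
]
# Wikipedia wins despite lower confidence, because source priority sorts first:
print(best_metric_result(candidates)["metric_source"])  # wikipedia
```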
    def _extract_and_calculate_metric_cascade(self, db: Session, company: Company, industry_name: str, search_term: str, standardization_logic: Optional[str], standardized_unit: Optional[str]) -> Dict[str, Any]:
        final_result = {
            "calculated_metric_name": search_term,
            "calculated_metric_value": None,
            "calculated_metric_unit": None,
            "standardized_metric_value": None,
            "standardized_metric_unit": standardized_unit,
            "metric_source": None,
            "metric_proof_text": None,
            "metric_source_url": None,
            "metric_confidence": 0.0,
            "metric_confidence_reason": "No value found in any source."
        }
        sources = [
            ("website", self._get_website_content_and_url),
            ("wikipedia", self._get_wikipedia_content_and_url),
            ("serpapi", self._get_serpapi_content_and_url)
        ]
        all_source_results = []
        for source_name, content_loader in sources:
            logger.info(f"Checking {source_name} for '{search_term}' for {company.name}")
            try:
                args = (company,) if source_name == 'website' else (db, company.id) if source_name == 'wikipedia' else (company, search_term)
                content_text, current_source_url = content_loader(*args)
                if not content_text:
                    logger.info(f"No content for {source_name}.")
                    continue
                llm_result = self._run_llm_metric_extraction_prompt(content_text, search_term, industry_name)
                if llm_result:
                    llm_result['source_url'] = current_source_url
                    all_source_results.append((source_name, llm_result))
            except Exception as e:
                logger.error(f"Error in {source_name} stage: {e}")
        processed_results = []
        # ... [processing logic as before, no changes] ...
        best_result = self._get_best_metric_result(processed_results)
        return best_result if best_result else final_result
    # ... [rest of the class, no changes] ...
    def extract_metrics_for_industry(self, company: Company, db: Session, industry: Industry) -> Company:
        """
        Extracts and calculates metrics for a given industry.
        Split out from classify_company_potential to allow manual overrides.
        """
        if not industry or not industry.scraper_search_term:
            logger.warning(f"No metric configuration for industry '{industry.name if industry else 'None'}'")
            return company
        # Improved unit derivation
        if "m²" in (industry.standardization_logic or "") or "m²" in (industry.scraper_search_term or ""):
            std_unit = "m²"
        else:
            std_unit = "Einheiten"
        metrics = self._extract_and_calculate_metric_cascade(
            db, company, industry.name, industry.scraper_search_term, industry.standardization_logic, std_unit
@@ -268,128 +132,18 @@ class ClassificationService:
        company.standardized_metric_unit = metrics["standardized_metric_unit"]
        company.metric_source = metrics["metric_source"]
        company.metric_proof_text = metrics["metric_proof_text"]
        company.metric_source_url = metrics.get("metric_source_url")
        company.metric_confidence = metrics["metric_confidence"]
        company.metric_confidence_reason = metrics["metric_confidence_reason"]
        # Keep track of refinement
        company.last_classification_at = datetime.utcnow()
        db.commit()
        return company
    def reevaluate_wikipedia_metric(self, company: Company, db: Session, industry: Industry) -> Company:
        """
        Runs the metric extraction cascade for ONLY the Wikipedia source.
        """
        logger.info(f"Starting Wikipedia re-evaluation for '{company.name}'")
        if not industry or not industry.scraper_search_term:
            logger.warning(f"Cannot re-evaluate: No metric configuration for industry '{industry.name}'")
            return company
        search_term = industry.scraper_search_term
        content = self._get_wikipedia_content(db, company.id)
        if not content:
            logger.warning("No Wikipedia content found to re-evaluate.")
            return company
        try:
            llm_result = self._run_llm_metric_extraction_prompt(content, search_term, industry.name)
            # Handle list response (multiple candidates) -> take best (first)
            if isinstance(llm_result, list):
                llm_result = llm_result[0] if llm_result else None
            if not llm_result:
                raise ValueError("LLM metric extraction returned empty result.")
            is_revenue = "umsatz" in search_term.lower() or "revenue" in search_term.lower()
            # Hybrid extraction logic (same as in cascade)
            parsed_value = None
            if llm_result.get("raw_text_segment"):
                parsed_value = MetricParser.extract_numeric_value(
                    llm_result["raw_text_segment"],
                    is_revenue=is_revenue,
                    expected_value=str(llm_result.get("raw_value", "")) if llm_result.get("raw_value") else None
                )
                if parsed_value is not None:
                    logger.info(f"Successfully parsed '{llm_result['raw_text_segment']}' to {parsed_value} using MetricParser.")
            final_value = parsed_value
            if final_value is None and llm_result.get("raw_value"):
                final_value = MetricParser.extract_numeric_value(str(llm_result["raw_value"]), is_revenue=is_revenue)
                if final_value is not None:
                    logger.info(f"Successfully cleaned LLM raw_value '{llm_result['raw_value']}' to {final_value}")
            if final_value is None:
                final_value = llm_result.get("raw_value")
            # Update company metrics if a value was found
            if final_value is not None:
                company.calculated_metric_name = search_term
                company.calculated_metric_value = final_value
                company.calculated_metric_unit = llm_result.get("raw_unit")
                company.metric_source = "wikipedia_reevaluated"
                company.metric_proof_text = llm_result.get("raw_text_segment")
                company.metric_confidence = llm_result.get("confidence_score")
                company.metric_confidence_reason = llm_result.get("confidence_reason")
                # Handle standardization
                std_unit = "m²" if "m²" in (industry.standardization_logic or "") else "Einheiten"
                company.standardized_metric_unit = std_unit
                area_val = llm_result.get("area_value")
                if llm_result.get("area_text_segment"):
                    refined_area = MetricParser.extract_numeric_value(llm_result["area_text_segment"], is_revenue=False)
                    if refined_area is not None:
                        area_val = refined_area
                if area_val is not None:
                    company.standardized_metric_value = area_val
                elif industry.standardization_logic:
                    company.standardized_metric_value = self._parse_standardization_logic(industry.standardization_logic, final_value)
                else:
                    company.standardized_metric_value = None
                company.last_classification_at = datetime.utcnow()
                db.commit()
                logger.info(f"Successfully re-evaluated and updated metrics for {company.name} from Wikipedia.")
            else:
                logger.warning(f"Re-evaluation for {company.name} did not yield a metric value.")
        except Exception as e:
            logger.error(f"Error during Wikipedia re-evaluation for {company.name}: {e}")
        return company
    def classify_company_potential(self, company: Company, db: Session) -> Company:
        logger.info(f"Starting complete classification for {company.name}")
        # 1. Load Industries
        industries = self._load_industry_definitions(db)
        industry_defs = [{"name": i.name, "description": i.description} for i in industries]
        # 2. Industry Classification (Website-based)
        # STRICT: the AI may only classify if the industry is still set to "Others" or is new
        valid_industry_names = [i.name for i in industries]
        if company.industry_ai and company.industry_ai != "Others" and company.industry_ai in valid_industry_names:
            logger.info(f"KEEPING manual/existing industry '{company.industry_ai}' for {company.name}")
        else:
            website_content = scrape_website_content(company.website)
            if website_content:
                industry_name = self._run_llm_classification_prompt(website_content, company.name, industry_defs)
                company.industry_ai = industry_name if industry_name in valid_industry_names else "Others"
                logger.info(f"AI CLASSIFIED {company.name} as '{company.industry_ai}'")
            else:
                company.industry_ai = "Others"
                logger.warning(f"No website content for {company.name}, setting industry to Others")
        db.commit()
        # 3. Metric Extraction
        if company.industry_ai != "Others":
            industry = next((i for i in industries if i.name == company.industry_ai), None)
            if industry:
                self.extract_metrics_for_industry(company, db, industry)
        return company
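The `_parse_standardization_logic` call above evaluates the per-industry formula synced from Notion; per the commit notes, the parser supports comments and capitalized placeholders. A minimal, hypothetical sketch of such a parser — the function name, the `VALUE` placeholder, and the `#` comment syntax are assumptions for illustration, not the actual implementation:

```python
import re

def parse_standardization_logic(logic: str, value: float):
    """Hypothetical sketch: evaluate a formula like 'VALUE * 0.08  # pallets per m2'."""
    # Strip an inline comment, if any
    expr = logic.split("#", 1)[0].strip()
    if not expr:
        return None
    # Substitute the capitalized placeholder with the numeric value
    expr = expr.replace("VALUE", repr(float(value)))
    # Reject anything that is not plain arithmetic after substitution
    if not re.fullmatch(r"[0-9eE\s.+\-*/()]+", expr):
        return None
    try:
        return float(eval(expr, {"__builtins__": {}}, {}))
    except Exception:
        return None
```

Under this sketch, `parse_standardization_logic("VALUE * 2  # double", 10)` yields `20.0`, while a malformed formula returns `None` instead of raising — matching the fall-through to `standardized_metric_value = None` in the caller above.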

View File

@@ -291,4 +291,4 @@ def scrape_website_content(url: str) -> Optional[str]:
         return text
     except Exception as e:
         logger.error(f"Scraping error for {url}: {e}")
-        return None
+        return ""

View File

@@ -167,6 +167,18 @@ export function Inspector({ companyId, initialContactId, onClose, apiBase }: Ins
       industry_ai: data.industry_ai,
       created_at: data.created_at
     },
+    quantitative_potential: {
+      calculated_metric_name: data.calculated_metric_name,
+      calculated_metric_value: data.calculated_metric_value,
+      calculated_metric_unit: data.calculated_metric_unit,
+      standardized_metric_value: data.standardized_metric_value,
+      standardized_metric_unit: data.standardized_metric_unit,
+      metric_source: data.metric_source,
+      metric_source_url: data.metric_source_url,
+      metric_proof_text: data.metric_proof_text,
+      metric_confidence: data.metric_confidence,
+      metric_confidence_reason: data.metric_confidence_reason
+    },
     enrichment: data.enrichment_data,
     signals: data.signals
   };
@@ -912,6 +924,23 @@ export function Inspector({ companyId, initialContactId, onClose, apiBase }: Ins
         </div>
       )}
+      {/* Standardized Metric */}
+      {data.standardized_metric_value != null && (
+        <div className="flex items-start gap-3 pt-4 border-t border-slate-200 dark:border-slate-800">
+          <div className="p-2 bg-white dark:bg-slate-800 rounded-lg text-green-500 mt-1">
+            <Ruler className="h-4 w-4" />
+          </div>
+          <div>
+            <div className="text-[10px] text-slate-500 uppercase font-bold tracking-tight">Standardized Potential ({data.standardized_metric_unit})</div>
+            <div className="text-xl text-green-600 dark:text-green-400 font-bold">
+              {data.standardized_metric_value.toLocaleString('de-DE')}
+              <span className="text-sm font-medium text-slate-500 ml-1">{data.standardized_metric_unit}</span>
+            </div>
+            <p className="text-xs text-slate-500 mt-1">Comparable value for potential analysis.</p>
+          </div>
+        </div>
+      )}
       {/* Source & Confidence */}
       {data.metric_source && (
         <div className="flex justify-between items-center text-[10px] text-slate-500 pt-2 border-t border-slate-200 dark:border-slate-800">

View File

@@ -104,7 +104,7 @@ export function RoboticsSettings({ isOpen, onClose, apiBase }: RoboticsSettingsP
         </div>
         {/* Tab Nav */}
-        <div className="flex border-b border-slate-200 dark:border-slate-800 px-6 bg-white dark:bg-slate-900 overflow-x-auto">
+        <div className="flex flex-shrink-0 border-b border-slate-200 dark:border-slate-800 px-6 bg-white dark:bg-slate-900 overflow-x-auto">
           {[
             { id: 'robotics', label: 'Robotics Potential', icon: Bot },
             { id: 'industries', label: 'Industry Focus', icon: Target },
@@ -130,72 +130,66 @@ export function RoboticsSettings({ isOpen, onClose, apiBase }: RoboticsSettingsP
           {isLoading && <div className="text-center py-12 text-slate-500">Loading...</div>}
-          {!isLoading && activeTab === 'robotics' && (
-            <div key="robotics-content" className="grid grid-cols-1 md:grid-cols-2 gap-6">
-              {roboticsCategories.map(cat => ( <CategoryCard key={cat.id} category={cat} onSave={handleUpdateRobotics} /> ))}
-            </div>
-          )}
+          <div key="robotics-content" className={clsx("grid grid-cols-1 md:grid-cols-2 gap-6", { 'hidden': isLoading || activeTab !== 'robotics' })}>
+            {roboticsCategories.map(cat => ( <CategoryCard key={cat.id} category={cat} onSave={handleUpdateRobotics} /> ))}
+          </div>
-          {!isLoading && activeTab === 'industries' && (
-            <div key="industries-content" className="space-y-4">
+          <div key="industries-content" className={clsx("space-y-4", { 'hidden': isLoading || activeTab !== 'industries' })}>
             <div className="flex justify-between items-center">
               <h3 className="text-sm font-bold text-slate-700 dark:text-slate-300">Industry Verticals (Synced from Notion)</h3>
             </div>
             <div className="grid grid-cols-1 gap-3">
               {industries.map(ind => (
                 <div key={ind.id} className="bg-slate-50 dark:bg-slate-950 border border-slate-200 dark:border-slate-800 rounded-lg p-4 flex flex-col gap-3 group relative overflow-hidden">
                   {ind.notion_id && (
                     <div className="absolute top-0 right-0 bg-blue-100 dark:bg-blue-900/30 text-blue-600 dark:text-blue-400 text-[9px] font-bold px-2 py-0.5 rounded-bl">SYNCED</div>
                   )}
                   <div className="flex gap-4 items-start pr-12">
                     <div className="flex-1">
                       <h4 className="font-bold text-slate-900 dark:text-white text-sm">{ind.name}</h4>
                       <div className="flex flex-wrap gap-2 mt-1">
                         {ind.status_notion && <span className="text-[10px] border border-slate-300 dark:border-slate-700 px-1.5 rounded text-slate-500">{ind.status_notion}</span>}
                       </div>
                     </div>
                     <div className="text-right">
                       <div className="flex items-center gap-1.5 justify-end">
                         <span className={clsx("w-2 h-2 rounded-full", ind.is_focus ? "bg-green-500" : "bg-slate-300 dark:bg-slate-700")} />
                         <span className="text-xs text-slate-500">{ind.is_focus ? "Focus" : "Standard"}</span>
                       </div>
                     </div>
                   </div>
                   <p className="text-xs text-slate-600 dark:text-slate-300 italic whitespace-pre-wrap">{ind.description || "No definition"}</p>
                   <div className="grid grid-cols-2 sm:grid-cols-4 gap-2 text-[10px] bg-white dark:bg-slate-900 p-2 rounded border border-slate-200 dark:border-slate-800">
                     <div><span className="block text-slate-400 font-bold uppercase">Whale &gt;</span><span className="text-slate-700 dark:text-slate-200">{ind.whale_threshold || "-"}</span></div>
                     <div><span className="block text-slate-400 font-bold uppercase">Min Req</span><span className="text-slate-700 dark:text-slate-200">{ind.min_requirement || "-"}</span></div>
                     <div><span className="block text-slate-400 font-bold uppercase">Unit</span><span className="text-slate-700 dark:text-slate-200 truncate">{ind.scraper_search_term || "-"}</span></div>
                     <div><span className="block text-slate-400 font-bold uppercase">Product</span><span className="text-slate-700 dark:text-slate-200 truncate">{roboticsCategories.find(c => c.id === ind.primary_category_id)?.name || "-"}</span></div>
                   </div>
                   {ind.scraper_keywords && <div className="text-[10px]"><span className="text-slate-400 font-bold uppercase mr-2">Keywords:</span><span className="text-slate-600 dark:text-slate-400 font-mono">{ind.scraper_keywords}</span></div>}
                   {ind.standardization_logic && <div className="text-[10px]"><span className="text-slate-400 font-bold uppercase mr-2">Standardization:</span><span className="text-slate-600 dark:text-slate-400 font-mono">{ind.standardization_logic}</span></div>}
                 </div>
               ))}
             </div>
-            </div>
-          )}
+          </div>
-          {!isLoading && activeTab === 'roles' && (
-            <div key="roles-content" className="space-y-4">
+          <div key="roles-content" className={clsx("space-y-4", { 'hidden': isLoading || activeTab !== 'roles' })}>
             <div className="flex justify-between items-center"><h3 className="text-sm font-bold text-slate-700 dark:text-slate-300">Job Title Mapping Patterns</h3><button onClick={handleAddJobRole} className="flex items-center gap-1 px-3 py-1.5 bg-blue-600 hover:bg-blue-500 text-white text-xs font-bold rounded"><Plus className="h-3 w-3" /> ADD PATTERN</button></div>
             <div className="bg-slate-50 dark:bg-slate-950 border border-slate-200 dark:border-slate-800 rounded-lg overflow-hidden">
               <table className="w-full text-left text-xs">
                 <thead className="bg-slate-100 dark:bg-slate-900 border-b border-slate-200 dark:border-slate-800 text-slate-500 font-bold uppercase"><tr><th className="p-3">Job Title Pattern (Regex/Text)</th><th className="p-3">Mapped Role</th><th className="p-3 w-10"></th></tr></thead>
                 <tbody className="divide-y divide-slate-200 dark:divide-slate-800">
                   {jobRoles.map(role => (
                     <tr key={role.id} className="group">
                       <td className="p-2"><input className="w-full bg-transparent border border-transparent hover:border-slate-300 dark:hover:border-slate-700 rounded px-2 py-1 text-slate-900 dark:text-slate-200 outline-none focus:border-blue-500" defaultValue={role.pattern} /></td>
                       <td className="p-2"><select className="w-full bg-transparent border border-transparent hover:border-slate-300 dark:hover:border-slate-700 rounded px-2 py-1 text-slate-900 dark:text-slate-200 outline-none focus:border-blue-500" defaultValue={role.role}><option>Operativer Entscheider</option><option>Infrastruktur-Verantwortlicher</option><option>Wirtschaftlicher Entscheider</option><option>Innovations-Treiber</option></select></td>
                       <td className="p-2 text-center"><button onClick={() => handleDeleteJobRole(role.id)} className="text-slate-400 hover:text-red-500 opacity-0 group-hover:opacity-100 transition-opacity"><Trash2 className="h-4 w-4" /></button></td>
                     </tr>
                   ))}
                   {jobRoles.length === 0 && (<tr><td colSpan={3} className="p-8 text-center text-slate-500 italic">No patterns defined yet.</td></tr>)}
                 </tbody>
               </table>
             </div>
-            </div>
-          )}
+          </div>
         </div>
       </div>
     </div>

View File

@@ -64,6 +64,8 @@ services:
     volumes:
       # Sideloading: Source Code (Hot Reload)
       - ./company-explorer:/app
+      # DATABASE (Persistence)
+      - ./companies_v3_fixed_2.db:/app/companies_v3_fixed_2.db
       # Keys
       - ./gemini_api_key.txt:/app/gemini_api_key.txt
       - ./serpapikey.txt:/app/serpapikey.txt
@@ -72,6 +74,8 @@ services:
       - ./Log_from_docker:/app/logs_debug
     environment:
       - PYTHONUNBUFFERED=1
+    ports:
+      - "8000:8000"
     # Port 8000 is internal only
 # --- B2B MARKETING ASSISTANT ---

View File

@@ -89,8 +89,8 @@ http {
         location /ce/ {
             # Company Explorer (Robotics Edition)
-            # Der Trailing Slash am Ende ist wichtig!
-            proxy_pass http://company-explorer:8000/;
+            # KEIN Trailing Slash, damit der /ce/ Pfad erhalten bleibt!
+            proxy_pass http://company-explorer:8000;
             proxy_set_header Host $host;
             proxy_set_header X-Real-IP $remote_addr;
             proxy_set_header Upgrade $http_upgrade;