feat(company-explorer): force-refresh analysis and refine extraction logic

- Enforced fresh scrape on 'Analyze' request to bypass stale cache.
- Implemented 2-Hop Impressum scraping strategy (via Kontakt page).
- Refined numeric extraction for German locale (thousands separators).
- Updated documentation with Lessons Learned.
2026-01-08 12:20:11 +00:00
parent 601593c65c
commit b3fa036809
3 changed files with 48 additions and 41 deletions

@@ -324,6 +324,19 @@ def analyze_company(req: AnalysisRequest, background_tasks: BackgroundTasks, db:
    if not company.website or company.website == "k.A.":
        return {"error": "No website to analyze. Run Discovery first."}

    # FORCE SCRAPE LOGIC
    # Every manual "Analyze" request currently deletes the stored website scrape,
    # so the background task has to scrape the site again instead of reusing old data.
    # This fixes the "stuck cache" issue reported by a user, where a previous
    # (possibly failed) scrape kept being served.
    # If the frontend later sends an explicit force_scrape flag, respect it here
    # instead of clearing unconditionally.
    db.query(EnrichmentData).filter(
        EnrichmentData.company_id == company.id,
        EnrichmentData.source_type == "website_scrape"
    ).delete()
    db.commit()

    background_tasks.add_task(run_analysis_task, company.id, company.website)
    return {"status": "queued"}
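
The commit message also mentions refined numeric extraction for the German locale, but that part of the change is not visible in the hunk above. As a rough illustration only, here is a minimal sketch of how such parsing could work, assuming "." is a thousands separator and "," is the decimal separator. The name `parse_german_number` and the regular expression are hypothetical and are not taken from this commit.

```python
import re
from typing import Optional

# Hypothetical pattern, not the one used in the commit.
# First alternative: digits grouped by "." in blocks of three, e.g. "1.234.567,89".
# Second alternative: plain digits with an optional decimal comma, e.g. "3,5".
_NUMBER_RE = re.compile(r"\d{1,3}(?:\.\d{3})+(?:,\d+)?|\d+(?:,\d+)?")


def parse_german_number(text: str) -> Optional[float]:
    """Return the first number found in `text`, read with German conventions.

    Returns None when the text contains no digits.
    """
    match = _NUMBER_RE.search(text)
    if match is None:
        return None
    raw = match.group(0)
    # Drop thousands separators, then turn the decimal comma into a point.
    normalized = raw.replace(".", "").replace(",", ".")
    return float(normalized)
```

With this sketch, `parse_german_number("1.200 Mitarbeiter")` yields `1200.0` and `parse_german_number("Umsatz: 3,5 Mio. EUR")` yields `3.5`. Inputs written with English conventions, such as `1.5`, are ambiguous under these rules and the sketch does not try to resolve them.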