- FEATURE: Industry classification (Brancheneinstufung) now runs in batches (e.g. 20 companies per API call) to drastically reduce token costs.
- REFACTOR: Added a new function `evaluate_branches_batch` in `helpers.py` that generates the complex batch prompt.
- REFACTOR: Reworked `reclassify_all_branches` in `data_processor.py` to drive the batch processing and the result mapping.
- FEATURE: Implemented industry classification 2.0 (Brancheneinstufung 2.0); it now uses the rich definitions and examples from `config.py` for high-precision, contextual matching.
- REFACTOR: Completely rewrote `evaluate_branche_chatgpt` in `helpers.py`; it now returns a detailed justification for the assignment.
- FEATURE: Added a new batch mode `reclassify_branches` in `data_processor.py` to enable a complete re-evaluation of all accounts.
- Enhances the `_scrape_website_task_batch` worker to improve data quality assessment.
- Implements a "Thin Content" check: if the extracted text is shorter than 200 characters, the URL status is set to `URL_SCRAPE_THIN_CONTENT`.
- Adds a heuristic for detecting cookie banners: If the text is short (< 500 chars) and contains a high density of cookie-related keywords, the status is set to `URL_SCRAPE_COOKIE_BANNER`.
- These new statuses provide more granular insights into scraping issues, allowing for better-targeted reprocessing and quality control.
- Refactors the website scraping batch process to fix critical stability issues.
- Replaces multiple redundant and conflicting scraping functions (`_scrape_website_task`, `_scrape_raw_text_task`, `_scrape_and_summarize_task`) with a single, robust worker function: `_scrape_website_task_batch`.
- The new worker function now consistently returns a structured dictionary, resolving the `TypeError` that prevented results from being written to the sheet.
- The main batch function `process_website_scraping_batch` is updated to correctly handle this new dictionary structure, including error states.
- The batch process now matches the single-row processing mode: it also fetches meta-details instead of only the raw text.
- The two large, duplicated, and now obsolete `process_website_scraping` functions have been removed to improve code clarity and maintainability.
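
The "Thin Content" and cookie-banner checks listed above can be illustrated with a minimal sketch. Only the 200- and 500-character thresholds and the two status names come from the entries above. The keyword list, the hit count standing in for "high density", the order of the two checks, the `URL_SCRAPE_OK` status, and the dictionary fields are assumptions made for illustration and may differ from the actual worker.

```python
from typing import Optional

THIN_CONTENT_MAX_CHARS = 200    # threshold named in the changelog
COOKIE_BANNER_MAX_CHARS = 500   # threshold named in the changelog

# Hypothetical keyword list and density cutoff; the real values are not documented here.
COOKIE_KEYWORDS = ("cookie", "consent", "privacy", "datenschutz", "akzeptieren")
COOKIE_KEYWORD_MIN_HITS = 3


def classify_scraped_text(text: str) -> str:
    """Return a scrape status for the extracted page text."""
    stripped = text.strip()
    # Assumption: the thin-content check runs first, so very short banners count as thin.
    if len(stripped) < THIN_CONTENT_MAX_CHARS:
        return "URL_SCRAPE_THIN_CONTENT"
    if len(stripped) < COOKIE_BANNER_MAX_CHARS:
        lowered = stripped.lower()
        hits = sum(lowered.count(keyword) for keyword in COOKIE_KEYWORDS)
        if hits >= COOKIE_KEYWORD_MIN_HITS:
            return "URL_SCRAPE_COOKIE_BANNER"
    return "URL_SCRAPE_OK"  # hypothetical name for the success status


def build_scrape_result(row: int, text: str, error: Optional[str] = None) -> dict:
    """Always return the same dictionary shape, including for error states."""
    if error is not None:
        return {"row": row, "status": "URL_SCRAPE_ERROR", "text": "", "error": error}
    return {"row": row, "status": classify_scraped_text(text), "text": text, "error": None}
```

Returning one fixed dictionary shape for both success and failure is what lets the caller write results to the sheet without type checks on the worker's return value.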
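
The batched industry classification described in the first entries can be sketched in the same spirit. The batch size of 20 is the example figure from the changelog. The helper names, the `id` key, the `classify_batch` callable (a stand-in for the single API call per batch), and the `UNCLASSIFIED` fallback are hypothetical and are not the signatures of `evaluate_branches_batch` or `reclassify_all_branches`.

```python
from typing import Callable, Dict, Iterator, List


def chunked(items: List[dict], batch_size: int = 20) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def reclassify_in_batches(
    companies: List[dict],
    classify_batch: Callable[[List[dict]], Dict[int, str]],
    batch_size: int = 20,
) -> Dict[int, str]:
    """Call classify_batch once per batch and map results back by company id."""
    assigned: Dict[int, str] = {}
    for batch in chunked(companies, batch_size):
        results = classify_batch(batch)  # one (hypothetical) API call per batch
        for company in batch:
            # Companies missing from the response get an explicit fallback value.
            assigned[company["id"]] = results.get(company["id"], "UNCLASSIFIED")
    return assigned
```

Mapping results back by id rather than by position keeps the assignment correct even when the model omits or reorders entries in its response.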