From 5602f3b60aec2ac7f64c6c9b701d8c3972d23f99 Mon Sep 17 00:00:00 2001 From: Floke Date: Fri, 23 Jan 2026 21:46:04 +0000 Subject: [PATCH] docs: consolidate metric parser fixes in migration plan --- MIGRATION_PLAN.md | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/MIGRATION_PLAN.md b/MIGRATION_PLAN.md index 18b6ff93..92eb9068 100644 --- a/MIGRATION_PLAN.md +++ b/MIGRATION_PLAN.md @@ -94,17 +94,16 @@ Wir kapseln das neue Projekt vollständig ab ("Fork & Clean"). ## 7. Historie & Fixes (Jan 2026) - * **[STABILITY] v0.7.2: Robust Metric Parsing (Jan 23, 2026) [RESOLVED]** - * **Legacy Logic Restored:** Re-implemented the robust, regex-based number parsing logic (formerly in legacy helpers) as `MetricParser`. - * **German Formats:** Correctly handles "1.000" (thousands) vs "1,5" (decimal) and mixed formats. - * **Citation & Year Cleaning:** Filters out Wikipedia citations like `[3]` and years in parentheses. - * **Wolfra Fix:** Specifically detects and fixes the "802020" bug by stripping concatenated years from the end of numeric strings. - * **Hybrid Extraction:** The ClassificationService now asks the LLM for the *text segment* and parses the number deterministically, fixing the LLM hallucinations. - - * **[SUCCESS] v0.6.4: Wolfra Metric Extraction Bug (Jan 23, 2026)** - * **Problem:** Mitarbeiterzahl für "Wolfra" wurde fälschlicherweise als "802020" anstatt "80" ausgelesen. - * **Gelöst durch:** Hybrid-Extraktion (LLM Segment + Python Cleanup) und Wiederherstellung der `MetricParser` Logik. - * **Status:** Problem **vollständig gelöst**. Alle Core-API Endpunkte (Import, Override, Create) wurden ebenfalls wiederhergestellt. + * **[STABILITY] v0.7.3: Hardening Metric Parser & Regression Testing (Jan 23, 2026) [RESOLVED]** + * **Summary:** A series of critical fixes were applied to the `MetricParser` to handle complex real-world scenarios, and a regression test suite was created to prevent future issues. + * **Fixes Implemented:** + * **Greilmeier Bug:** Removed aggressive sentence splitting on hyphens that was truncating text and causing the parser to miss numbers at the end of a phrase. + * **Expected Value Cleaning:** The parser now aggressively strips units (like "m²") from the LLM's `expected_value` to ensure it can find the correct numeric target even if the source text contains multiple numbers. + * **Wolfra Bug:** Logic to detect and remove trailing years from concatenated numbers (e.g., "802020" -> "80") was confirmed and added to tests. + * **Erding Bug:** Logic to ignore year-like prefixes (e.g., "2022 ... 200.000") was confirmed and added to tests. + * **Regression Test Suite (`/backend/tests/test_metric_parser.py`):** + * Created to lock in the fixes for the three critical bugs (Wolfra, Erding, Greilmeier) and ensure long-term stability of the number extraction logic. + * **Status:** The parser is now significantly more robust. All known major bugs are resolved and protected by automated tests. Core API endpoints (Import, Override, Create) were also restored during this process. * **[STABILITY] v0.7.1: AI Robustness & UI Fixes (Jan 21, 2026)** * **SDK Stabilität:** Umstellung auf `gemini-2.0-flash` im Legacy-SDK zur Behebung von `404 Not Found` Fehlern bei `1.5-flash-latest`.