docs: consolidate metric parser fixes in migration plan

2026-01-23 21:46:04 +00:00
parent 5de7d38fcb
commit 5602f3b60a
1 changed files with 10 additions and 11 deletions
--- a/MIGRATION_PLAN.md
+++ b/MIGRATION_PLAN.md
@@ -94,17 +94,16 @@ Wir kapseln das neue Projekt vollständig ab ("Fork & Clean").

 ## 7. Historie & Fixes (Jan 2026)

-    *   **[STABILITY] v0.7.2: Robust Metric Parsing (Jan 23, 2026) [RESOLVED]**
-        *   **Legacy Logic Restored:** Re-implemented the robust, regex-based number parsing logic (formerly in legacy helpers) as `MetricParser`.
-        *   **German Formats:** Correctly handles "1.000" (thousands) vs "1,5" (decimal) and mixed formats.
-        *   **Citation & Year Cleaning:** Filters out Wikipedia citations like `[3]` and years in parentheses.
-        *   **Wolfra Fix:** Specifically detects and fixes the "802020" bug by stripping concatenated years from the end of numeric strings.
-        *   **Hybrid Extraction:** The ClassificationService now asks the LLM for the *text segment* and parses the number deterministically, fixing the LLM hallucinations.
-
-    *   **[SUCCESS] v0.6.4: Wolfra Metric Extraction Bug (Jan 23, 2026)**
-        *   **Problem:** Mitarbeiterzahl für "Wolfra" wurde fälschlicherweise als "802020" anstatt "80" ausgelesen.
-        *   **Gelöst durch:** Hybrid-Extraktion (LLM Segment + Python Cleanup) und Wiederherstellung der `MetricParser` Logik.
-        *   **Status:** Problem **vollständig gelöst**. Alle Core-API Endpunkte (Import, Override, Create) wurden ebenfalls wiederhergestellt.
+    *   **[STABILITY] v0.7.3: Hardening Metric Parser & Regression Testing (Jan 23, 2026) [RESOLVED]**
+        *   **Summary:** A series of critical fixes were applied to the `MetricParser` to handle complex real-world scenarios, and a regression test suite was created to prevent future issues.
+        *   **Fixes Implemented:**
+            *   **Greilmeier Bug:** Removed aggressive sentence splitting on hyphens that was truncating text and causing the parser to miss numbers at the end of a phrase.
+            *   **Expected Value Cleaning:** The parser now aggressively strips units (like "m²") from the LLM's `expected_value` to ensure it can find the correct numeric target even if the source text contains multiple numbers.
+            *   **Wolfra Bug:** Logic to detect and remove trailing years from concatenated numbers (e.g., "802020" -> "80") was confirmed and added to tests.
+            *   **Erding Bug:** Logic to ignore year-like prefixes (e.g., "2022 ... 200.000") was confirmed and added to tests.
+        *   **Regression Test Suite (`/backend/tests/test_metric_parser.py`):**
+            *   Created to lock in the fixes for the three critical bugs (Wolfra, Erding, Greilmeier) and ensure long-term stability of the number extraction logic.
+        *   **Status:** The parser is now significantly more robust. All known major bugs are resolved and protected by automated tests. Core API endpoints (Import, Override, Create) were also restored during this process.

    *   **[STABILITY] v0.7.1: AI Robustness & UI Fixes (Jan 21, 2026)**
        *   **SDK Stabilität:** Umstellung auf `gemini-2.0-flash` im Legacy-SDK zur Behebung von `404 Not Found` Fehlern bei `1.5-flash-latest`.