docs: consolidate metric parser fixes in migration plan
This commit is contained in:
@@ -94,17 +94,16 @@ Wir kapseln das neue Projekt vollständig ab ("Fork & Clean").
|
||||
|
||||
## 7. Historie & Fixes (Jan 2026)
|
||||
|
||||
* **[STABILITY] v0.7.2: Robust Metric Parsing (Jan 23, 2026) [RESOLVED]**
|
||||
* **Legacy Logic Restored:** Re-implemented the robust, regex-based number parsing logic (formerly in legacy helpers) as `MetricParser`.
|
||||
* **German Formats:** Correctly handles "1.000" (thousands) vs "1,5" (decimal) and mixed formats.
|
||||
* **Citation & Year Cleaning:** Filters out Wikipedia citations like `[3]` and years in parentheses.
|
||||
* **Wolfra Fix:** Specifically detects and fixes the "802020" bug by stripping concatenated years from the end of numeric strings.
|
||||
* **Hybrid Extraction:** The ClassificationService now asks the LLM for the *text segment* and parses the number deterministically, fixing the LLM hallucinations.
|
||||
|
||||
* **[SUCCESS] v0.6.4: Wolfra Metric Extraction Bug (Jan 23, 2026)**
|
||||
* **Problem:** Mitarbeiterzahl für "Wolfra" wurde fälschlicherweise als "802020" anstatt "80" ausgelesen.
|
||||
* **Gelöst durch:** Hybrid-Extraktion (LLM Segment + Python Cleanup) und Wiederherstellung der `MetricParser` Logik.
|
||||
* **Status:** Problem **vollständig gelöst**. Alle Core-API Endpunkte (Import, Override, Create) wurden ebenfalls wiederhergestellt.
|
||||
* **[STABILITY] v0.7.3: Hardening Metric Parser & Regression Testing (Jan 23, 2026) [RESOLVED]**
|
||||
* **Summary:** A series of critical fixes were applied to the `MetricParser` to handle complex real-world scenarios, and a regression test suite was created to prevent future issues.
|
||||
* **Fixes Implemented:**
|
||||
* **Greilmeier Bug:** Removed aggressive sentence splitting on hyphens that was truncating text and causing the parser to miss numbers at the end of a phrase.
|
||||
* **Expected Value Cleaning:** The parser now aggressively strips units (like "m²") from the LLM's `expected_value` to ensure it can find the correct numeric target even if the source text contains multiple numbers.
|
||||
* **Wolfra Bug:** Logic to detect and remove trailing years from concatenated numbers (e.g., "802020" -> "80") was confirmed and added to tests.
|
||||
* **Erding Bug:** Logic to ignore year-like prefixes (e.g., "2022 ... 200.000") was confirmed and added to tests.
|
||||
* **Regression Test Suite (`/backend/tests/test_metric_parser.py`):**
|
||||
* Created to lock in the fixes for the three critical bugs (Wolfra, Erding, Greilmeier) and ensure long-term stability of the number extraction logic.
|
||||
* **Status:** The parser is now significantly more robust. All known major bugs are resolved and protected by automated tests. Core API endpoints (Import, Override, Create) were also restored during this process.
|
||||
|
||||
* **[STABILITY] v0.7.1: AI Robustness & UI Fixes (Jan 21, 2026)**
|
||||
* **SDK Stabilität:** Umstellung auf `gemini-2.0-flash` im Legacy-SDK zur Behebung von `404 Not Found` Fehlern bei `1.5-flash-latest`.
|
||||
|
||||
Reference in New Issue
Block a user