This version introduces significant structural changes that improve maintainability and user flexibility: the core processing logic is centralized in the `DataProcessor` class, and a new menu-driven user interface provides granular control over processing steps and row selection.
- Increment version number to v1.7.0.
- Major Structural Refactoring:
  - DataProcessor Centralization: Move the core processing logic for sequential runs, re-evaluation, batch modes, and specific data lookups/updates into the `DataProcessor` class as methods.
  - Resolve AttributeErrors: Correct the indentation for all methods belonging to the `DataProcessor` class to ensure they are correctly defined within the class scope.
  - Fix DataProcessor Initialization: Update the `DataProcessor.__init__` signature and implementation to accept and store required handler instances (e.g., `GoogleSheetHandler`, `WikipediaScraper`).
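The refactored constructor can be sketched as follows. The exact parameter names are assumptions; the pattern the change describes is injecting the handler instances so methods reach them via `self.*` instead of globals:

```python
import logging

class DataProcessor:
    """Central owner of the row-processing logic (sketch; the real class
    carries many more methods, moved in from module scope)."""

    def __init__(self, config, sheet_handler, wiki_scraper):
        # Store the injected handler instances (e.g. a GoogleSheetHandler
        # and a WikipediaScraper) so every method can use them via self.*.
        self.config = config
        self.sheet_handler = sheet_handler
        self.wiki_scraper = wiki_scraper
        self.logger = logging.getLogger(self.__class__.__name__)
```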
- New User Interface:
  - Menu-Driven Dispatcher: Implement a new `run_user_interface` function to replace the old `main` logic block. This function provides an interactive, multi-level numeric menu for selecting processing modes and parameters. It can also process direct CLI arguments.
  - Simplified Main: The `main` function is reduced to handling initial setup (Config, Logging, Handlers, DataProcessor instantiation) and then calling `run_user_interface`.
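A minimal sketch of the dispatcher shape described above. The `dispatch` and `input_fn` parameters are assumptions for illustration; in the real script the callables would be bound `DataProcessor` methods and the menu has multiple levels:

```python
def run_user_interface(dispatch, argv=None, input_fn=input):
    """Route to a processing mode from direct CLI args or a numeric menu.

    `dispatch` maps mode names to zero-argument callables (hypothetical
    stand-ins for bound DataProcessor methods).
    """
    # Direct CLI invocation, e.g. argv == ["--mode", "sequential"]
    if argv and "--mode" in argv:
        return dispatch[argv[argv.index("--mode") + 1]]()

    # Interactive fallback: a flat numeric menu over the available modes
    options = dict(enumerate(sorted(dispatch), start=1))
    for number, name in options.items():
        print(f"{number}) {name}")
    choice = int(input_fn("Select mode: "))
    return dispatch[options[choice]]()
```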
- Granular Processing Control:
  - Step Selection: Implement the ability for users to select specific processing steps (grouped logically, e.g., 'website', 'wiki', 'chatgpt') for execution within sequential, re-evaluation, and criteria-based modes.
  - Flags for Steps: Adapt the `_process_single_row` method and the methods that call it (`process_reevaluation_rows`, `process_sequential`, `process_rows_matching_criteria`) to accept and utilize flags (e.g., `process_wiki`, `process_chatgpt`) to control which processing blocks are attempted for a given row.
  - Refined Step Logic: Ensure processing blocks within `_process_single_row` correctly check their corresponding step flag *and* the necessary timestamp/status conditions (unless `force_reeval` is active).
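The flag-and-timestamp gating above can be sketched as a plain function (the real code is the `_process_single_row` method; the field names and defaults here are assumptions):

```python
def process_single_row(row, process_website=True, process_wiki=True,
                       process_chatgpt=True, force_reeval=False):
    """Run only the requested step groups on one row (sketch).

    Each block checks its own flag first, then the timestamp condition;
    force_reeval overrides the timestamp check but not the flag.
    """
    executed = []
    if process_website and (force_reeval or not row.get("website_ts")):
        executed.append("website")      # website lookup/details block
    if process_wiki and (force_reeval or not row.get("wiki_ts")):
        executed.append("wiki")         # Wikipedia extraction block
    if process_chatgpt and (force_reeval or not row.get("chatgpt_ts")):
        executed.append("chatgpt")      # ChatGPT enrichment block
    return executed
```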
- New Processing Modes:
  - Criteria Mode: Implement the `process_rows_matching_criteria` method and its UI integration, allowing users to select a predefined criterion function (e.g., 'M filled and AN empty') to filter rows for processing.
  - Wiki Re-Extraction (Criteria-based): Integrate the logic for processing rows where Wiki URL (M) is filled and Wiki Timestamp (AN) is empty, likely as a specific option within the new Criteria mode.
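A minimal sketch of the criteria mode, assuming rows are dicts keyed by column letter; the predicate mirrors the 'M filled and AN empty' example, and returning indices instead of processing in place is an illustration choice:

```python
def m_filled_and_an_empty(row):
    """Example criterion: Wiki URL (M) filled, Wiki Timestamp (AN) empty."""
    m = (row.get("M") or "").strip()
    return bool(m) and m.lower() != "k.a." and not (row.get("AN") or "").strip()

def process_rows_matching_criteria(rows, criterion):
    """Return the indices of rows the selected criterion picks out (sketch;
    the real method would run the processing steps on each match)."""
    return [i for i, row in enumerate(rows) if criterion(row)]
```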
- Fixes and Improvements:
  - SyntaxError Resolution: Resolve persistent `SyntaxError`s related to complex f-string formatting in logging calls by constructing message parts separately.
  - `find_wiki_serp` Filter Logic: Ensure the `process_find_wiki_serp` method correctly uses the `get_numeric_filter_value` helper to apply the Umsatz OR Mitarbeiter threshold filter logic based on the correct data units.
  - Timestamp/Status Logic: Consolidate and clarify the logic for checking process necessity based on timestamps, status flags (like S='X'), and the `force_reeval` parameter in helper methods like `_is_step_processing_needed`.
  - ML Integration: Ensure `prepare_data_for_modeling` and `train_technician_model` are correctly integrated as `DataProcessor` methods and function within the new structure.
  - Consistency: Address inconsistencies in timestamp setting (e.g., ensuring AP is set by batch modes) and parameter handling across different methods where identified during the refactoring.
  - Helper Functions: Define or confirm the global scope of necessary helper functions (`get_numeric_filter_value`, criteria functions, `_process_batch`, etc.).
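The consolidated necessity check could look like the following free function; the precedence order assumed here (force_reeval wins, then the S='X' skip status, then the timestamp) is an interpretation of the rules listed above, and the real helper is `_is_step_processing_needed`:

```python
def is_step_processing_needed(timestamp, status, force_reeval=False):
    """Decide whether a processing step should run for a row (sketch)."""
    if force_reeval:
        return True            # explicit re-evaluation overrides everything
    if status == "X":
        return False           # status flag (e.g. column S) marks the row as done/skipped
    return not timestamp       # otherwise run only if the step never completed
```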
This version marks a significant milestone in making the script more modular, maintainable, and user-controllable, laying the groundwork for further enhancements like the ML estimation mode.
- Increment version number to v1.6.7.
- Fix critical AttributeError: Correct the indentation of several processing methods (`_process_single_row`, `process_reevaluation_rows`, `process_serp_website_lookup_for_empty`, `process_website_details_for_marked_rows`, `prepare_data_for_modeling`, `process_rows_sequentially`, `process_find_wiki_with_serp`) so they are correctly defined as methods within the `DataProcessor` class.
- Fix SyntaxError: Resolve the issue with complex f-strings in `_process_single_row` (and potentially elsewhere) by constructing expression strings separately from the f-string syntax.
- Adjust filter logic for mode 'find_wiki_serp': The SerpAPI search for missing Wiki URLs (M = 'k.A.'/empty) is now triggered when CRM Umsatz (J) > 200 million OR CRM employee count (K) > 500. Implement robust numeric extraction for J and K within the filter logic.
- Ensure the SerpAPI Wiki Search Timestamp (AY) is always set after a search attempt in 'find_wiki_serp' mode, regardless of the result.
- Various logging adjustments for clarity and debugging (e.g., in the Wikipedia processing step).
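The robust numeric extraction for the J/K threshold filter could look like this; the German number formats and unit suffixes handled here ('Mio', 'Mrd') are assumptions about the CRM data, and `matches_find_wiki_serp_filter` is a hypothetical name for the combined check:

```python
import re

def get_numeric_filter_value(raw):
    """Extract a plain number from a CRM cell like '250 Mio. EUR' (sketch).

    Assumed conventions: German thousands dots are dropped, decimal
    commas become dots, 'Mio' scales by 1e6 and 'Mrd' by 1e9.
    """
    if not raw:
        return None
    text = str(raw)
    match = re.search(r"\d+(?:\.\d{3})*(?:,\d+)?", text)
    if not match:
        return None
    number = float(match.group(0).replace(".", "").replace(",", "."))
    lowered = text.lower()
    if "mrd" in lowered:
        number *= 1e9
    elif "mio" in lowered:
        number *= 1e6
    return number

def matches_find_wiki_serp_filter(umsatz_raw, mitarbeiter_raw):
    """CRM Umsatz (J) > 200 million OR CRM employee count (K) > 500."""
    umsatz = get_numeric_filter_value(umsatz_raw)
    mitarbeiter = get_numeric_filter_value(mitarbeiter_raw)
    return (umsatz is not None and umsatz > 200e6) or \
           (mitarbeiter is not None and mitarbeiter > 500)
```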
- Add new operating mode `--mode find_wiki_serp`.
- Implement new function `serp_wikipedia_lookup`, which uses SerpAPI to search specifically for Wikipedia articles matching a company name.
- Implement new function `process_find_wiki_with_serp`:
  - Loads the current sheet data.
  - Filters rows where column M (Wiki URL) is empty/'k.A.' AND column K (CRM employees) exceeds a threshold (default: 500).
  - Calls `serp_wikipedia_lookup` for the filtered rows.
  - On a successful URL find:
    - Writes the found URL to column M.
    - Sets flag 'x' in column A (ReEval flag).
    - Clears the timestamps in columns AN (Wikipedia timestamp) and AO (last-check timestamp).
  - Performs bundled sheet updates at the end.
- Integrate the new mode `find_wiki_serp` into the argument handling and execution logic of the `main` function.
- Add the necessary imports and ensure the new functions use logging.
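The `process_find_wiki_with_serp` flow can be sketched as follows. Rows as dicts keyed by column letter and the returned update list are illustration assumptions; `lookup` stands in for `serp_wikipedia_lookup`, and the real function performs the bundled sheet write itself:

```python
def process_find_wiki_with_serp(rows, lookup, employee_threshold=500):
    """Sketch of the find_wiki_serp flow over already-loaded sheet rows.

    Returns the cell updates instead of writing them, so the write can
    stay bundled in a single sheet update at the end.
    """
    updates = []
    for idx, row in enumerate(rows):
        wiki_url = (row.get("M") or "").strip()
        if wiki_url and wiki_url.lower() != "k.a.":
            continue                      # column M already filled
        try:
            employees = float(str(row.get("K", "0")).replace(".", "").replace(",", "."))
        except ValueError:
            continue                      # unparseable employee count
        if employees <= employee_threshold:
            continue                      # below the CRM employee threshold
        found = lookup(row.get("name", ""))
        if found:
            updates += [(idx, "M", found),   # write the discovered URL
                        (idx, "A", "x"),     # set the ReEval flag
                        (idx, "AN", ""),     # clear Wikipedia timestamp
                        (idx, "AO", "")]     # clear last-check timestamp
    return updates
```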
- Aktualisiere Versionsnummer in `Config.VERSION` auf v1.6.6.
- Replace custom `debug_print` function with standard Python `logging` module calls throughout the codebase.
- Use appropriate logging levels (DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION).
- Refactor logging setup in `main` for clarity and proper handler initialization.
- Integrate updated `WikipediaScraper` class (previously developed as v1.6.5 logic):
  - Implement more robust infobox parsing (`_extract_infobox_value`) using flexible selectors, keyword checking (`in`), and improved value cleaning (incl. `sup` removal).
  - Remove old infobox fallback functions.
  - Enhance article validation (`_validate_article`) with better link checking via `_get_page_soup`.
  - Improve reliability of article search (`search_company_article`) with direct match attempt and better error handling.
  - Apply `@retry_on_failure` decorator to network-dependent scraper methods (`_get_page_soup`, `search_company_article`).
- Ensure `Config.VERSION` reflects the logical state (v1.6.5 for this commit).
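The infobox-parsing improvements (keyword matching via `in`, `sup` removal, value cleaning) can be illustrated with a stdlib-only sketch; the real `_extract_infobox_value` operates on BeautifulSoup nodes, so these helper names and the regex-based approach are hypothetical:

```python
import re

def clean_infobox_value(cell_html):
    """Sketch of the value cleaning: drop <sup> footnote markers, strip
    the remaining tags, and collapse whitespace."""
    without_sup = re.sub(r"<sup\b[^>]*>.*?</sup>", "", cell_html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", without_sup)
    return re.sub(r"\s+", " ", text).strip()

def find_infobox_row(label_to_value, keywords):
    """Keyword matching via `in`, so a keyword like 'Mitarbeiter' also
    matches an infobox label such as 'Mitarbeiterzahl'."""
    for label, value in label_to_value.items():
        if any(keyword.lower() in label.lower() for keyword in keywords):
            return value
    return None
```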