diff --git a/fotograf-de-scraper/README.md b/fotograf-de-scraper/README.md index 4745c27b6..654bfa8c7 100644 --- a/fotograf-de-scraper/README.md +++ b/fotograf-de-scraper/README.md @@ -1,6 +1,6 @@ # Fotograf.de Scraper & Management UI -**Status:** Production-Ready Microservice (Core Feature: PDF List Generation, QR Cards, Shooting Schedule, **Siblings List**, **Gmail API Integration** & **Automated Release Requests**) +**Status:** Production-Ready Microservice (Core Feature: PDF List Generation, QR Cards, Shooting Schedule, **SQLite Data Sync**, **Gmail API Integration** & **Automated Release Requests**) Dieser Service modernisiert die alten `Fotograf.de` Skripte, indem er eine robuste, web-basierte UI zur Verwaltung und Automatisierung von Foto-Aufträgen bereitstellt. Er ist als eigenständiger Microservice konzipiert, der unabhängig vom Haupt-Stack läuft. @@ -10,16 +10,22 @@ Der Service besteht aus zwei Hauptkomponenten: 1. **Backend (Python / FastAPI / Selenium / SQLAlchemy):** * **Automatisierung:** Nutzt Selenium für das Scraping von `fotograf.de`. - * **Persistenz:** Eine SQLite-Datenbank (`fotograf_jobs.db`) speichert die Auftragsliste, OAuth-Tokens (`GmailToken`), Gutscheincodes (`DiscountCode`) und Teilnehmerdaten (`ReleaseParticipant`). + * **Persistenz:** Eine SQLite-Datenbank (`fotograf_jobs.db`) speichert die Auftragsliste, OAuth-Tokens (`GmailToken`), Gutscheincodes (`DiscountCode`), Teilnehmerdaten (`ReleaseParticipant`), **Auftragsteilnehmer (`JobParticipant`)** und die **Versand-Historie (`ReleaseHistory`)**. * **PDF-Engine:** Nutzt WeasyPrint für Teilnehmerlisten und ReportLab/PyPDF2 für präzise PDF-Overlays (QR-Karten). * **API-Integration:** Direkte Anbindung an die **Calendly API (v2)** sowie an die **Gmail API** für direkten E-Mail-Versand und automatisierte Webhook-Antworten. 2. 
**Frontend (TypeScript / React / Vite / TailwindCSS):** * **Modernes UI:** Ein vollständig responsives Dashboard mit Tailwind CSS (Kachel-Layout, Tabs für Kiga/Schule). - * **Arbeitsfluss:** Tools sind direkt in der Detailansicht des jeweiligen Auftrags integriert. + * **Arbeitsfluss:** Tools sind in der Detailansicht eines Auftrags in logische Phasen (Vorbereitung, Follow-Up, Statistik) unterteilt. ## ✨ Core Features +### 🚀 Performance-Optimierung (SQLite Sync) +Statt wie früher jedes Mal mühsam durch alle Foto-Alben zu "crawlen", nutzt das System nun eine intelligente Synchronisierung: +* **One-Click Sync:** Über den Button "Daten von Fotograf.de abgleichen" lädt das System die detaillierte Namensliste (CSV) herunter. +* **Lokale Datenbank:** Alle relevanten Infos (E-Mail der Eltern, Login-Zahlen, Bestellstatus, Zugangscodes) werden in der Tabelle `job_participants` gespeichert. +* **Blitzschnelle Analyse:** Nachfass-Mails und Statistiken werden nun in Sekunden (statt Minuten) direkt aus der Datenbank generiert. + ### Feature 1: Teilnehmerlisten (Vollständig) Automatisierter Workflow zum Download und Formatieren der Anmeldelisten von `fotograf.de` als sortiertes PDF inkl. "Kinderfotos Erding" Branding. @@ -28,43 +34,37 @@ Spezielles Modul für Familien-Mini-Shootings: * **QR-Karten-Andruck:** Präzises Overlay von Name, Kinderanzahl und Uhrzeit inkl. automatischer **Einwilligungs-Checkbox (☑)** aus Calendly-Daten. * **Termin-Übersichtsliste:** Generiert eine A4-Tabelle für den Shooting-Tag im 6-Minuten-Takt inkl. Lückenfüller. -### Feature 3: Nachfass-E-Mails & Gmail Direkt-Versand (Vollständig) -Identifizierung von Nicht-Käufern und automatisierter Massenversand personalisierter E-Mails via Gmail API. +### Feature 3: Nachfass-E-Mails & Gmail Direkt-Versand (Optimiert) +Identifizierung von Nicht-Käufern (0-1 Logins, keine Bestellung) basierend auf den synchronisierten Datenbank-Daten. 
+* **Vorschau-Modus:** Ermöglicht das Durchklicken der personalisierten E-Mails an jeden Empfänger vor dem eigentlichen Versand. +* **Quick-Login Automation:** Die Login-Links (`https://www.kinderfotos-erding.de/a/{code}`) werden automatisch generiert. -### Feature 4: Verkaufs-Statistiken (Vollständig) -Detaillierte Analyse des Kaufverhaltens pro Album mit Echtzeit-Fortschrittsanzeige. +### Feature 4: Verkaufs-Statistiken (Optimiert) +Detaillierte Analyse des Kaufverhaltens pro Gruppe/Klasse basierend auf den lokalen Datenbank-Einträgen. ### Feature 5: Geschwisterliste (Einrichtungsintern) (Vollständig) Tool zur Identifizierung von Geschwistergruppen innerhalb einer Einrichtung inkl. Cross-Check mit Calendly-Buchungen und speziellen Geschwister-QR-Karten. -### Feature 6: Freigabeanfragen & Gutschein-Automation (Vollständig - Neu April 2026) +### Feature 6: Freigabeanfragen & Gutschein-Automation (Vollständig) Vollautomatisierter DSGVO-Workflow zur Einholung von Veröffentlichungsgenehmigungen: -* **Schlanker Versand:** Manuelle Eingabe von Empfängern (E-Mail, Vorname, Kindernamen) für gezielte Anfragen. -* **Intelligente Personalisierung:** Automatische Bereinigung von Einrichtungsnamen (entfernt "Kindergarten" und Jahreszahlen). +* **Schlanker Versand:** Manuelle Eingabe von Empfängern (E-Mail, Vorname, Kindernamen) mit **E-Mail-Vorschau**. * **Versand-Planung:** Einstellbare Versandzeit (Berlin Timezone) via Hintergrund-Tasks. -* **Webhook-Integration:** Direkte Anbindung an **Google Forms**. Bei Absenden des Freigabe-Formulars wird automatisch: - 1. Ein freier Gutscheincode aus der DB reserviert. - 2. Eine personalisierte Dankes-E-Mail mit dem Code und einer bebilderten Einlöse-Anleitung versendet. -* **Gutschein-Management:** UI zum Hochladen und Überwachen des Gutschein-Pools. +* **Webhook-Integration:** Direkte Anbindung an **Google Forms**. Bei Absenden des Freigabe-Formulars wird automatisch ein Gutscheincode reserviert und eine Dankes-E-Mail versendet. 
+* **Antwort-Übersicht:** Tabelle aller eingegangenen Freigaben inkl. zugewiesenem Code und Zeitstempel. --- ## 🛠️ Technische Details & Sicherheit -* **Sicherer Test-Modus:** Über die Umgebungsvariable `DEV_MODE_EMAIL_RECIPIENT` können alle ausgehenden E-Mails (Anfragen & Gutscheine) global an eine Test-Adresse umgeleitet werden. -* **Zeitzonen:** Durchgängige Verwendung von `Europe/Berlin` für alle zeitgesteuerten Operationen. -* **E-Mail Signatur:** Die offizielle HTML-Signatur von "Kinderfotos Erding" wird automatisch an alle ausgehenden E-Mails (auch vom Backend) angehängt. -* **Gmail OAuth:** Persistente Speicherung der Refresh-Tokens in der Datenbank ermöglicht dauerhaften Betrieb ohne erneutes Einloggen. +* **BCC-Kontrolle:** Jede vom System versendete E-Mail sendet automatisch eine Blindkopie (BCC) an `kontakt@kinderfotos-erding.de`. +* **Versand-Historie:** Alle Aussendungen (Anzahl Empfänger, Zeitpunkt) werden in der Tabelle `release_history` protokolliert. +* **Sicherer Test-Modus:** Über `DEV_MODE_EMAIL_RECIPIENT` können alle E-Mails global an eine Test-Adresse umgeleitet werden. +* **Zeitzonen:** Durchgängige Verwendung von `Europe/Berlin`. +* **Gmail OAuth:** Persistente Speicherung der Refresh-Tokens in der Datenbank. ## 🚀 Deployment & Konfiguration Der Service wird über die Haupt-`docker-compose.yml` des Projekts verwaltet. -### Umgebungsvariablen (`.env`) -Wichtige neue Variablen in `/fotograf-de-scraper/.env`: -* `DEV_MODE_EMAIL_RECIPIENT`: (Optional) E-Mail für Umleitung im Testbetrieb. -* `google_fotograf_client_id` / `google_fotograf_secret`: OAuth Credentials. -* `CALENDLY_TOKEN`: API Zugriff. 
- ### URLs * **Frontend:** `https://floke-ai.duckdns.org/fotograf-de/` -* **Webhook für Google Forms:** `https://floke-ai.duckdns.org/fotograf-de-api/api/publish-request/webhook` +* **Webhook für Google Forms:** `https://floke-ai.duckdns.org/fotograf-de-api/api/publish-request/webhook` \ No newline at end of file diff --git a/fotograf-de-scraper/backend/database.py b/fotograf-de-scraper/backend/database.py index d692cb375..7725de85f 100644 --- a/fotograf-de-scraper/backend/database.py +++ b/fotograf-de-scraper/backend/database.py @@ -49,6 +49,22 @@ class ReleaseHistory(Base): recipient_count = Column(Integer) scheduled_time = Column(String, nullable=True) +class JobParticipant(Base): + __tablename__ = "job_participants" + id = Column(Integer, primary_key=True) + job_id = Column(String, index=True) + child_id = Column(String, nullable=True) + vorname_kind = Column(String, nullable=True) + nachname_kind = Column(String, nullable=True) + vorname_eltern = Column(String, nullable=True) + nachname_eltern = Column(String, nullable=True) + email_eltern = Column(String, nullable=True) + zugangscode = Column(String, index=True) + gruppe = Column(String, nullable=True) + logins = Column(Integer, default=0) + has_orders = Column(Integer, default=0) # 0 for false, 1 for true + last_synced = Column(DateTime, default=datetime.datetime.utcnow) + Base.metadata.create_all(bind=engine) def get_db(): diff --git a/fotograf-de-scraper/backend/main.py b/fotograf-de-scraper/backend/main.py index 79b91e4c3..019524141 100644 --- a/fotograf-de-scraper/backend/main.py +++ b/fotograf-de-scraper/backend/main.py @@ -16,7 +16,7 @@ from fastapi.middleware.cors import CORSMiddleware from fastapi.responses import FileResponse from typing import List, Dict, Any, Optional from sqlalchemy.orm import Session -from database import get_db, Job as DBJob, engine, Base +from database import get_db, Job as DBJob, engine, Base, JobParticipant, SessionLocal import math import uuid @@ -141,6 +141,120 @@ def 
get_logo_base64(): logger.warning(f"Logo file not found at {logo_path}") return None +def sync_job_participants(job_id: str, account_type: str, db: Session): + logger.info(f"Syncing participants for job {job_id} ({account_type})") + username = os.getenv(f"{account_type.upper()}_USER") + password = os.getenv(f"{account_type.upper()}_PW") + + with tempfile.TemporaryDirectory() as temp_dir: + driver = setup_driver(download_path=temp_dir) + try: + if not login(driver, username, password): + raise Exception("Login failed during sync.") + + # Navigate to job names list + job_url = f"https://app.fotograf.de/config_jobs_settings/index/{job_id}" + driver.get(job_url) + wait = WebDriverWait(driver, 20) + + personen_tab = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "[data-qa-id='link:photo-jobs-tabs-names_list']"))) + driver.execute_script("arguments[0].click();", personen_tab) + + export_btn = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, SELECTORS["export_dropdown"]))) + driver.execute_script("arguments[0].click();", export_btn) + time.sleep(1) + + csv_btn = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, SELECTORS["export_csv_link"]))) + driver.execute_script("arguments[0].click();", csv_btn) + + # Wait for download + csv_file = None + for _ in range(30): + files = [f for f in os.listdir(temp_dir) if f.endswith('.csv')] + if files: + csv_file = os.path.join(temp_dir, files[0]) + break + time.sleep(1) + + if not csv_file: + raise Exception("CSV download timeout during sync.") + + # Parse CSV + df = None + for sep in [";", ","]: + try: + df = pd.read_csv(csv_file, sep=sep, encoding="utf-8-sig") + if len(df.columns) > 1: break + except: continue + + if df is None: + df = pd.read_csv(csv_file, sep=";", encoding="latin1") + + # Clean columns + df.columns = df.columns.str.strip().str.replace("\"", "") + + # Map columns - based on user feedback + # Expected columns: Child ID, Email der Eltern (1), Vorname Eltern (1), Nachname Eltern (1), 
Vorname Kind, Zugangscode (1), Logins (1), Bestellungen + + def get_col(df, patterns): + for p in patterns: + for col in df.columns: + if p.lower() in col.lower(): + return col + return None + + col_child_id = get_col(df, ["Child ID"]) + col_email = get_col(df, ["Email der Eltern", "E-Mail der Eltern"]) + col_parent_vn = get_col(df, ["Vorname Eltern", "Parent First Name"]) + col_parent_nn = get_col(df, ["Nachname Eltern", "Parent Last Name"]) + col_child_vn = get_col(df, ["Vorname Kind", "Child First Name"]) + col_child_nn = get_col(df, ["Nachname Kind", "Child Last Name"]) + col_code = get_col(df, ["Zugangscode", "Access Code"]) + col_group = get_col(df, ["Gruppe", "Klasse", "Group", "Class"]) + col_logins = get_col(df, ["Logins"]) + col_orders = get_col(df, ["Bestellungen", "Orders"]) + + # Delete old entries for this job + db.query(JobParticipant).filter(JobParticipant.job_id == job_id).delete() + + added = 0 + for _, row in df.iterrows(): + try: + logins_val = 0 + try: logins_val = int(row[col_logins]) if col_logins and pd.notna(row[col_logins]) else 0 + except: pass + + orders_val = 0 + if col_orders and pd.notna(row[col_orders]): + val = str(row[col_orders]).lower() + if val and val != "0" and val != "nein" and val != "false": + orders_val = 1 + + participant = JobParticipant( + job_id=job_id, + child_id=str(row[col_child_id]) if col_child_id and pd.notna(row[col_child_id]) else None, + vorname_kind=str(row[col_child_vn]) if col_child_vn and pd.notna(row[col_child_vn]) else None, + nachname_kind=str(row[col_child_nn]) if col_child_nn and pd.notna(row[col_child_nn]) else None, + vorname_eltern=str(row[col_parent_vn]) if col_parent_vn and pd.notna(row[col_parent_vn]) else None, + nachname_eltern=str(row[col_parent_nn]) if col_parent_nn and pd.notna(row[col_parent_nn]) else None, + email_eltern=str(row[col_email]).strip().lower() if col_email and pd.notna(row[col_email]) else None, + zugangscode=str(row[col_code]) if col_code and pd.notna(row[col_code]) else 
None, + gruppe=str(row[col_group]) if col_group and pd.notna(row[col_group]) else None, + logins=logins_val, + has_orders=orders_val + ) + db.add(participant) + added += 1 + except Exception as e: + logger.warning(f"Error adding participant row: {e}") + + db.commit() + logger.info(f"Sync complete. {added} participants stored for job {job_id}") + return added + + finally: + driver.quit() + def generate_pdf_from_csv(csv_path: str, institution: str, date_info: str, list_type: str, output_path: str): logger.info(f"Generating PDF for {institution} from {csv_path}") df = None @@ -488,264 +602,142 @@ def get_jobs_list(driver) -> List[Dict[str, Any]]: task_store: Dict[str, Dict[str, Any]] = {} def process_statistics(task_id: str, job_id: str, account_type: str): - logger.info(f"Task {task_id}: Starting statistics calculation for job {job_id}") - task_store[task_id] = {"status": "running", "progress": "Initialisiere Browser...", "result": None} - - username = os.getenv(f"{account_type.upper()}_USER") - password = os.getenv(f"{account_type.upper()}_PW") - driver = None + logger.info(f"Task {task_id}: Starting fast statistics calculation for job {job_id}") + task_store[task_id] = {"status": "running", "progress": "Synchronisiere Daten von Fotograf.de...", "result": None} + db = SessionLocal() try: - driver = setup_driver() - if not driver or not login(driver, username, password): - task_store[task_id] = {"status": "error", "progress": "Login fehlgeschlagen. Überprüfe die Zugangsdaten."} - return - - task_store[task_id]["progress"] = f"Lade Alben-Übersicht für Auftrag..." - - albums_overview_url = f"https://app.fotograf.de/config_jobs_photos/index/{job_id}" - logger.info(f"Navigating to albums: {albums_overview_url}") - driver.get(albums_overview_url) - wait = WebDriverWait(driver, 15) - - albums_to_visit = [] + # 1. 
Sync data from CSV try: - album_rows = wait.until(EC.presence_of_all_elements_located((By.XPATH, SELECTORS["album_overview_rows"]))) - for row in album_rows: - try: - album_link = row.find_element(By.XPATH, SELECTORS["album_overview_link"]) - albums_to_visit.append({"name": album_link.text, "url": album_link.get_attribute('href')}) - except NoSuchElementException: - continue - except TimeoutException: - task_store[task_id] = {"status": "error", "progress": "Konnte die Album-Liste nicht finden."} - return - - total_albums = len(albums_to_visit) - task_store[task_id]["progress"] = f"{total_albums} Alben gefunden. Starte Auswertung..." - - statistics = [] - - for index, album in enumerate(albums_to_visit): - album_name = album['name'] - task_store[task_id]["progress"] = f"Bearbeite Album {index + 1}/{total_albums}: '{album_name}'..." - driver.get(album['url']) - - try: - total_codes_text = wait.until(EC.visibility_of_element_located((By.XPATH, SELECTORS["access_code_count"]))).text - num_pages = math.ceil(int(total_codes_text) / 20) - - total_children_in_album = 0 - children_with_purchase = 0 - children_with_all_purchased = 0 + sync_participants(job_id, account_type, db) + except Exception as sync_err: + logger.error(f"Sync failed during statistics: {sync_err}") + count = db.query(JobParticipant).filter(JobParticipant.job_id == job_id).count() + if count == 0: + task_store[task_id] = {"status": "error", "progress": f"Synchronisierung fehlgeschlagen: {str(sync_err)}"} + return - for page_num in range(1, num_pages + 1): - task_store[task_id]["progress"] = f"Bearbeite Album {index + 1}/{total_albums}: '{album_name}' (Seite {page_num}/{num_pages})..." 
- - if page_num > 1: - driver.get(album['url'] + f"?page_guest_accesses={page_num}") - - person_rows = wait.until(EC.presence_of_all_elements_located((By.XPATH, SELECTORS["person_rows"]))) - - for person_row in person_rows: - total_children_in_album += 1 - try: - photo_container = person_row.find_element(By.XPATH, "./following-sibling::div[1]") - - num_total_photos = len(photo_container.find_elements(By.XPATH, SELECTORS["person_all_photos"])) - num_purchased_photos = len(photo_container.find_elements(By.XPATH, SELECTORS["person_purchased_photos"])) - num_access_cards = len(photo_container.find_elements(By.XPATH, SELECTORS["person_access_card_photo"])) - - buyable_photos = num_total_photos - num_access_cards - - if num_purchased_photos > 0: - children_with_purchase += 1 - - if buyable_photos > 0 and buyable_photos == num_purchased_photos: - children_with_all_purchased += 1 - except NoSuchElementException: - continue - - statistics.append({ - "Album": album_name, - "Kinder_insgesamt": total_children_in_album, - "Kinder_mit_Käufen": children_with_purchase, - "Kinder_Alle_Bilder_gekauft": children_with_all_purchased - }) + # 2. Query DB and group by 'gruppe' + task_store[task_id]["progress"] = "Berechne Statistiken..." 
+ + # Get all participants for this job + participants = db.query(JobParticipant).filter(JobParticipant.job_id == job_id).all() + + # Group by group + groups = {} + for p in participants: + g_name = p.gruppe or "Unbekannt" + if g_name not in groups: + groups[g_name] = { + "Album": g_name, + "Kinder_insgesamt": 0, + "Kinder_mit_Käufen": 0, + "Kinder_Alle_Bilder_gekauft": 0 # Not available in CSV, setting to 0 or estimates + } + groups[g_name]["Kinder_insgesamt"] += 1 + if p.has_orders: + groups[g_name]["Kinder_mit_Käufen"] += 1 - except Exception as e: - logger.error(f"Fehler bei Auswertung von Album '{album_name}': {e}") - continue + statistics = list(groups.values()) + statistics.sort(key=lambda x: x["Album"]) task_store[task_id] = { "status": "completed", - "progress": "Auswertung erfolgreich abgeschlossen!", + "progress": "Statistik erfolgreich berechnet!", "result": statistics } except Exception as e: - logger.exception(f"Unexpected error in task {task_id}") + logger.exception(f"Unexpected error in statistics task {task_id}") task_store[task_id] = {"status": "error", "progress": f"Unerwarteter Fehler: {str(e)}"} finally: - if driver: - logger.debug(f"Task {task_id}: Closing driver.") - driver.quit() + db.close() def process_reminder_analysis(task_id: str, job_id: str, account_type: str): - logger.info(f"Task {task_id}: Starting reminder analysis for job {job_id}") - task_store[task_id] = {"status": "running", "progress": "Initialisiere Browser...", "result": None} - - username = os.getenv(f"{account_type.upper()}_USER") - password = os.getenv(f"{account_type.upper()}_PW") - driver = None + logger.info(f"Task {task_id}: Starting fast reminder analysis for job {job_id}") + task_store[task_id] = {"status": "running", "progress": "Synchronisiere Daten von Fotograf.de...", "result": None} + db = SessionLocal() try: - driver = setup_driver() - if not driver or not login(driver, username, password): - task_store[task_id] = {"status": "error", "progress": "Login 
fehlgeschlagen."} - return - - wait = WebDriverWait(driver, 15) - - # 1. Navigate to albums overview - albums_overview_url = f"https://app.fotograf.de/config_jobs_photos/index/{job_id}" - task_store[task_id]["progress"] = "Lade Alben-Übersicht..." - driver.get(albums_overview_url) - - albums_to_visit = [] + # 1. Sync data from CSV (This takes ~20s and gets all parent emails, logins and orders) try: - album_rows = wait.until(EC.presence_of_all_elements_located((By.XPATH, SELECTORS["album_overview_rows"]))) - for row in album_rows: - try: - album_link = row.find_element(By.XPATH, SELECTORS["album_overview_link"]) - albums_to_visit.append({"name": album_link.text, "url": album_link.get_attribute('href')}) - except NoSuchElementException: - continue - except TimeoutException: - task_store[task_id] = {"status": "error", "progress": "Konnte die Album-Liste nicht finden."} + sync_participants(job_id, account_type, db) + except Exception as sync_err: + logger.error(f"Sync failed during reminder analysis: {sync_err}") + # Continue anyway if we have some data, or fail if we have none + count = db.query(JobParticipant).filter(JobParticipant.job_id == job_id).count() + if count == 0: + task_store[task_id] = {"status": "error", "progress": f"Synchronisierung fehlgeschlagen: {str(sync_err)}"} + return + + # 2. Query DB for potential candidates (Logins <= 1 and No Orders) + task_store[task_id]["progress"] = "Analysiere Datenbank-Einträge..." 
+ + candidates = db.query(JobParticipant).filter( + JobParticipant.job_id == job_id, + JobParticipant.has_orders == 0, + JobParticipant.logins <= 1, + JobParticipant.email_eltern != "", + JobParticipant.email_eltern != None + ).all() + + if not candidates: + task_store[task_id] = { + "status": "completed", + "progress": "Keine passenden Empfänger (0-1 Logins, keine Bestellung) gefunden.", + "result": [] + } return - raw_results = [] - total_albums = len(albums_to_visit) - - for index, album in enumerate(albums_to_visit): - album_name = album['name'] - task_store[task_id]["progress"] = f"Album {index+1}/{total_albums}: '{album_name}'..." - driver.get(album['url']) - - try: - total_codes_text = wait.until(EC.visibility_of_element_located((By.XPATH, SELECTORS["access_code_count"]))).text - num_pages = math.ceil(int(total_codes_text) / 20) - - for page_num in range(1, num_pages + 1): - task_store[task_id]["progress"] = f"Album {index+1}/{total_albums}: '{album_name}' (Seite {page_num}/{num_pages})..." - if page_num > 1: - driver.get(album['url'] + f"?page_guest_accesses={page_num}") - - person_rows = wait.until(EC.presence_of_all_elements_located((By.XPATH, SELECTORS["person_rows"]))) - num_persons = len(person_rows) - - for i in range(num_persons): - # Re-locate rows to avoid stale element reference - person_rows = wait.until(EC.presence_of_all_elements_located((By.XPATH, SELECTORS["person_rows"]))) - person_row = person_rows[i] - - login_count_text = person_row.find_element(By.XPATH, ".//span[text()='Logins']/following-sibling::strong").text - - # Only interested in people with 0 or 1 logins (potential reminders) - # Actually, if they haven't bought yet, they might need a reminder regardless of logins, - # but the legacy logic uses login_count <= 1. - # Let's stick to the legacy logic for now. 
- if int(login_count_text) <= 1: - vorname = person_row.find_element(By.XPATH, ".//span[text()='Vorname']/following-sibling::strong").text - - try: - photo_container = person_row.find_element(By.XPATH, "./following-sibling::div[1]") - purchase_icons = photo_container.find_elements(By.XPATH, ".//img[@alt='Bestellungen mit diesem Foto']") - if len(purchase_icons) > 0: - continue - except NoSuchElementException: - pass - - # Potential candidate - access_code_page_url = person_row.find_element(By.XPATH, ".//a[contains(@data-qa-id, 'guest-access-banner-access-code')]").get_attribute('href') - - # Open in new tab or navigate back and forth? - # Scraper.py navigates back and forth. - driver.get(access_code_page_url) - - try: - wait.until(EC.visibility_of_element_located((By.XPATH, "//a[@id='quick-login-url']"))) - quick_login_url = driver.find_element(By.XPATH, "//a[@id='quick-login-url']").get_attribute('href') - potential_buyer_element = driver.find_element(By.XPATH, "//a[contains(@href, '/config_customers/view_customer')]") - buyer_name = potential_buyer_element.text - - potential_buyer_element.click() - email = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[contains(., '@')]"))).text - - raw_results.append({ - "child_name": vorname, - "buyer_name": buyer_name, - "email": email, - "quick_login": quick_login_url - }) - except Exception as e: - logger.warning(f"Error getting details for {vorname}: {e}") - - # Go back to the album page - driver.get(album['url'] + (f"?page_guest_accesses={page_num}" if page_num > 1 else "")) - wait.until(EC.presence_of_element_located((By.XPATH, SELECTORS["person_rows"]))) - - except Exception as e: - logger.error(f"Fehler bei Album '{album_name}': {e}") - continue - - # Aggregate Results - task_store[task_id]["progress"] = "Aggregiere Ergebnisse..." 
-        aggregated_data = {}
-        for res in raw_results:
-            email = res['email']
-            child_name = "Familienbilder" if res['child_name'] == "Familie" else res['child_name']
-            html_link = f'Fotos von {child_name}'
-
-            if email not in aggregated_data:
-                aggregated_data[email] = {
-                    'buyer_first_name': res['buyer_name'].split(' ')[0],
-                    'email': email,
-                    'children': [child_name],
-                    'links': [html_link]
+        # 3. Aggregate results by Email
+        aggregation = {}
+        for c in candidates:
+            email = c.email_eltern
+            if email not in aggregation:
+                aggregation[email] = {
+                    "email": email,
+                    "parent_name": c.vorname_eltern if c.vorname_eltern else "Liebe Eltern",
+                    "children": [],
+                    "links": []
                 }
-            else:
-                if child_name not in aggregated_data[email]['children']:
-                    aggregated_data[email]['children'].append(child_name)
-                    aggregated_data[email]['links'].append(html_link)
-
-        final_list = []
-        for email, data in aggregated_data.items():
-            names = data['children']
-            if len(names) > 2:
-                names_str = ', '.join(names[:-1]) + ' und ' + names[-1]
-            else:
-                names_str = ' und '.join(names)
-
-            final_list.append({
-                'Name Käufer': data['buyer_first_name'],
-                'E-Mail-Adresse Käufer': email,
-                'Kindernamen': names_str,
-                'LinksHTML': ''.join(data['links'])
-            })
+            # Add child name
+            child_name = c.vorname_kind or ""
+            child_label = "Familienbilder" if child_name.lower() == "familie" else child_name
+            if child_label and child_label not in aggregation[email]["children"]:
+                aggregation[email]["children"].append(child_label)
+
+            # Add Quick Login Link
+            link = f"https://www.kinderfotos-erding.de/a/{c.zugangscode}"
+            html_link = f'<a href="{link}">Fotos von {child_label}</a>'
+            if html_link not in aggregation[email]["links"]:
+                aggregation[email]["links"].append(html_link)
+
+        # 4.
Format for Supermailer/Gmail + final_result = [] + for email, data in aggregation.items(): + children_str = " und ".join(data["children"]) if len(data["children"]) > 1 else (data["children"][0] if data["children"] else "Eurem Kind") + links_html = "".join([f"{l}" for l in data["links"]]) + + final_result.append({ + "E-Mail-Adresse Käufer": email, + "Name Käufer": data["parent_name"], + "Kindernamen": children_str, + "Anzahl Kinder": len(data["children"]), + "LinksHTML": links_html + }) + task_store[task_id] = { - "status": "completed", - "progress": "Analyse abgeschlossen!", - "result": final_list + "status": "completed", + "progress": f"Analyse fertig! {len(final_result)} Empfänger identifiziert.", + "result": final_result } except Exception as e: logger.exception(f"Error in task {task_id}") task_store[task_id] = {"status": "error", "progress": f"Fehler: {str(e)}"} finally: - if driver: driver.quit() + db.close() from fastapi import FastAPI, HTTPException, Depends, BackgroundTasks, UploadFile, File, Form from fastapi.middleware.cors import CORSMiddleware @@ -1092,6 +1084,124 @@ async def send_bulk_emails(request: BulkEmailRequest, db: Session = Depends(get_ "failed": failed_emails } +def sync_participants(job_id: str, account_type: str, db: Session): + logger.info(f"Syncing participants for job {job_id} ({account_type})") + username = os.getenv(f"{account_type.upper()}_USER") + password = os.getenv(f"{account_type.upper()}_PW") + + with tempfile.TemporaryDirectory() as temp_dir: + driver = setup_driver(download_path=temp_dir) + try: + if not login(driver, username, password): + raise Exception("Login failed.") + + # Navigate to the Persons tab + job_url = f"https://app.fotograf.de/config_jobs_settings/index/{job_id}" + driver.get(job_url) + wait = WebDriverWait(driver, 30) + + personen_tab = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "[data-qa-id='link:photo-jobs-tabs-names_list']"))) + driver.execute_script("arguments[0].click();", personen_tab) 
+ + # Click Export -> CSV + export_btn = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, SELECTORS["export_dropdown"]))) + driver.execute_script("arguments[0].click();", export_btn) + time.sleep(1) + csv_btn = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, SELECTORS["export_csv_link"]))) + driver.execute_script("arguments[0].click();", csv_btn) + + # Wait for download + csv_file = None + for _ in range(45): + files = os.listdir(temp_dir) + csv_files = [f for f in files if f.endswith('.csv')] + if csv_files: + csv_file = os.path.join(temp_dir, csv_files[0]) + break + time.sleep(1) + + if not csv_file: + raise Exception("CSV download timed out.") + + # Read CSV with pandas + df = None + for sep in [";", ","]: + try: + df = pd.read_csv(csv_file, sep=sep, encoding="utf-8-sig") + if len(df.columns) > 1: break + except: continue + + if df is None: raise Exception("Could not parse CSV.") + + # Clean columns + df.columns = df.columns.str.strip().str.replace("\"", "") + logger.debug(f"Sync CSV Columns: {list(df.columns)}") + + # Column Mapping + mapping = { + "Child ID": "child_id", + "Email der Eltern (1)": "email_eltern", + "Vorname Eltern (1)": "vorname_eltern", + "Nachname Eltern (1)": "nachname_eltern", + "Vorname Kind": "vorname_kind", + "Nachname Kind": "nachname_kind", + "Zugangscode (1)": "zugangscode", + "Logins (1)": "logins", + "Bestellungen": "has_orders", + "Gruppe": "gruppe", + "Klasse": "gruppe" + } + + # Upsert into database + for _, row in df.iterrows(): + code = str(row.get("Zugangscode (1)", "")).strip() + if not code or code == "nan": continue + + def clean_val(val): + v = str(val).strip() + return "" if v.lower() == "nan" else v + + # Determine order status + orders_val = str(row.get("Bestellungen", "0")).lower() + has_orders = 1 if (orders_val != "0" and orders_val != "nan" and orders_val != "") else 0 + + # Determine logins + logins_val = row.get("Logins (1)", 0) + try: logins = int(float(logins_val)) + except: logins = 0 + + 
participant = db.query(JobParticipant).filter(JobParticipant.job_id == job_id, JobParticipant.zugangscode == code).first() + if not participant: + participant = JobParticipant(job_id=job_id, zugangscode=code) + db.add(participant) + + participant.child_id = clean_val(row.get("Child ID")) + participant.vorname_kind = clean_val(row.get("Vorname Kind")) + participant.nachname_kind = clean_val(row.get("Nachname Kind")) + participant.vorname_eltern = clean_val(row.get("Vorname Eltern (1)")) + participant.nachname_eltern = clean_val(row.get("Nachname Eltern (1)")) + participant.email_eltern = clean_val(row.get("Email der Eltern (1)")).lower() + participant.gruppe = clean_val(row.get("Gruppe", row.get("Klasse"))) + participant.logins = logins + participant.has_orders = has_orders + participant.last_synced = datetime.datetime.utcnow() + + db.commit() + logger.info(f"Successfully synced {len(df)} participants for job {job_id}") + return len(df) + + finally: + driver.quit() + +@app.post("/api/jobs/{job_id}/sync-participants") +async def sync_participants_api(job_id: str, account_type: str, db: Session = Depends(get_db)): + try: + count = sync_participants(job_id, account_type, db) + return {"status": "success", "count": count} + except Exception as e: + logger.exception("Sync failed") + raise HTTPException(status_code=500, detail=str(e)) + @app.get("/api/jobs/{job_id}/generate-pdf") async def generate_pdf(job_id: str, account_type: str, db: Session = Depends(get_db)): logger.info(f"API Request: Generate PDF for job {job_id} ({account_type})") diff --git a/fotograf-de-scraper/frontend/src/App.tsx b/fotograf-de-scraper/frontend/src/App.tsx index be16d29b5..57a32d4d0 100644 --- a/fotograf-de-scraper/frontend/src/App.tsx +++ b/fotograf-de-scraper/frontend/src/App.tsx @@ -45,6 +45,24 @@ function App() { const [isReminderRunning, setIsReminderRunning] = useState(false); const [latestFile, setLatestFile] = useState(null); const [isGmailAuthenticated, setIsGmailAuthenticated] = 
useState(false);
+  const [isSyncing, setIsSyncing] = useState(false);
+
+  const handleSyncParticipants = async (job: Job) => {
+    setIsSyncing(true);
+    try {
+      const response = await fetch(`${API_BASE_URL}/api/jobs/${job.id}/sync-participants?account_type=${activeTab}`, {
+        method: 'POST'
+      });
+      if (response.ok) {
+        alert("Daten erfolgreich mit Fotograf.de synchronisiert!");
+      } else {
+        alert("Synchronisierung fehlgeschlagen.");
+      }
+    } catch (e) {
+      alert("Netzwerkfehler.");
+    }
+    setIsSyncing(false);
+  };

   // Email States
   const [reminderResult, setReminderResult] = useState(null);
@@ -958,7 +976,25 @@ function App() {
                       {et.name}
                     ))}
-                  <p>Wird für QR-Karten und die Terminübersicht benötigt.</p>
+                  <p>Wird für QR-Karten und die Terminübersicht benötigt.</p>
+
+                  <button
+                    onClick={() => handleSyncParticipants(selectedJob)}
+                    disabled={isSyncing}
+                    className="w-full px-3 py-2 bg-white border border-indigo-200 text-indigo-600 text-xs font-bold rounded-lg hover:bg-indigo-50 transition-colors flex items-center justify-center gap-2"
+                  >
+                    {isSyncing ? (
+                      <>
+                        Sync läuft...
+                      </>
+                    ) : (
+                      <>🔄 Daten von Fotograf.de abgleichen</>
+                    )}
+                  </button>
+                  <p>Aktualisiert E-Mails, Logins & Bestellstatus.</p>

                   {/* Actions */}
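Die tolerante Spaltensuche aus `sync_job_participants` lässt sich isoliert skizzieren. Das folgende Snippet übernimmt `get_col` aus dem Diff (hier auf eine einfache Namensliste statt eines DataFrames angepasst); die Beispiel-Header sind nur illustrativ, nicht der vollständige fotograf.de-Export:

```python
def get_col(columns, patterns):
    """Liefert die erste Spalte, deren Name eines der Muster enthält
    (case-insensitive), sonst None."""
    for p in patterns:
        for col in columns:
            if p.lower() in col.lower():
                return col
    return None

headers = ["Child ID", "Email der Eltern (1)", "Vorname Kind", "Zugangscode (1)", "Logins (1)"]

# Mehrere Schreibweisen werden probiert, weil die Export-Header variieren.
print(get_col(headers, ["Email der Eltern", "E-Mail der Eltern"]))  # Email der Eltern (1)
print(get_col(headers, ["Gruppe", "Klasse"]))                       # None
```

Der Teilstring-Abgleich fängt damit auch Suffixe wie `(1)` ab, ohne dass jede Header-Variante explizit gemappt werden muss.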
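Beim CSV-Upsert wird "Bestellungen" auf ein 0/1-Flag und "Logins (1)" auf einen Integer normalisiert. Eine minimale Skizze, die die Prüfungen beider Sync-Helfer aus dem Diff zusammenfasst (die genaue Menge der als "keine Bestellung" gewerteten Werte ist hier eine Annahme):

```python
def parse_has_orders(val):
    # Alles außer leer/"0"/"nan"/"nein"/"false" zählt als Bestellung.
    v = str(val).strip().lower()
    return 0 if v in ("", "0", "nan", "nein", "false") else 1

def parse_logins(val):
    # "Logins (1)" kann als "3", "3.0" oder leer/NaN ankommen.
    try:
        return int(float(val))
    except (TypeError, ValueError):
        return 0

print(parse_has_orders("2"))   # 1
print(parse_logins("3.0"))     # 3
```

So landen auch unsaubere Exportwerte deterministisch in den Spalten `has_orders` und `logins` von `job_participants`.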
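Die Aggregation der Nachfass-Kandidaten (Schritt 3/4 in `process_reminder_analysis`) in vereinfachter Form: pro Eltern-E-Mail ein Eintrag, pro Kind ein Quick-Login-Link. Die Dict-Felder stehen hier stellvertretend für die `JobParticipant`-Spalten; die HTML-Anker-Erzeugung aus dem Original ist bewusst weggelassen:

```python
def aggregate_reminders(candidates):
    agg = {}
    for c in candidates:
        entry = agg.setdefault(c["email_eltern"], {"children": [], "links": []})
        name = c.get("vorname_kind") or ""
        # "Familie" wird wie im Original zu "Familienbilder" umbenannt.
        label = "Familienbilder" if name.lower() == "familie" else name
        if label and label not in entry["children"]:
            entry["children"].append(label)
            entry["links"].append(f"https://www.kinderfotos-erding.de/a/{c['zugangscode']}")
    return agg

rows = [
    {"email_eltern": "eltern@example.com", "vorname_kind": "Mia", "zugangscode": "ABC1"},
    {"email_eltern": "eltern@example.com", "vorname_kind": "Ben", "zugangscode": "ABC2"},
]
result = aggregate_reminders(rows)
print(result["eltern@example.com"]["children"])  # ['Mia', 'Ben']
```

Geschwister mit derselben Eltern-Adresse erhalten so eine einzige E-Mail mit allen Links statt mehrerer Einzelmails.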