Brancheneinstufung2/debug_igepa_dump.py
Floke e684f33bed feat(company-explorer): bump version to 0.3.0, add VAT ID extraction, and fix deep-link scraping
- Updated version to v0.3.0 (UI & Backend) to clear potential caching confusion.
- Enhanced Impressum scraper to extract VAT ID (Umsatzsteuer-ID).
- Implemented 2-Hop scraping strategy: Looks for 'Kontakt' page if Impressum isn't on the start page.
- Added VAT ID display to the Legal Data block in Inspector.
2026-01-08 16:14:01 +01:00

import requests
from bs4 import BeautifulSoup

url = "https://www.igepa.de/"
print(f"Fetching {url}...")

try:
    # Browser-like User-Agent so the site does not reject the request as a bot.
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}
    # verify=False skips TLS certificate checks; acceptable for a one-off debug dump only.
    response = requests.get(url, headers=headers, verify=False, timeout=15)
    soup = BeautifulSoup(response.content, 'html.parser')
    print(f"Page Title: {soup.title.string if soup.title else 'No Title'}")

    print("\n--- All Links (First 50) ---")
    count = 0
    for a in soup.find_all('a', href=True):
        # Collapse newlines so each link prints on a single line.
        text = a.get_text().strip().replace('\n', ' ')
        href = a['href']
        print(f"[{count}] {text[:30]}... -> {href}")
        count += 1
        # Stop after exactly 50 links (the original `count > 50` printed 51).
        if count >= 50:
            break
except Exception as e:
    print(f"Error: {e}")
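
The commit message mentions a 2-hop strategy (fall back to the 'Kontakt' page when no Impressum link is on the start page) and VAT ID extraction. The scraper itself is not shown in this file, so the following is only a minimal sketch of how those two steps might look. The names `LinkCollector`, `find_links`, `extract_vat_id` and both keyword tuples are hypothetical, and the sketch uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra dependencies.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical keyword lists; the real scraper's keywords are not shown in this file.
IMPRESSUM_HINTS = ("impressum", "imprint")
KONTAKT_HINTS = ("kontakt", "contact")

# German VAT IDs are "DE" plus nine digits; optional spaces between groups are tolerated.
VAT_ID_RE = re.compile(r"\bDE\s?(\d{3})\s?(\d{3})\s?(\d{3})\b")


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((text, self._href))
            self._href = None


def find_links(html, base_url, hints):
    """Return absolute URLs of links whose text or href contains one of the hints."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    urls = []
    for text, href in parser.links:
        haystack = f"{text} {href}".lower()
        if any(hint in haystack for hint in hints):
            url = urljoin(base_url, href)
            if url not in urls:
                urls.append(url)
    return urls


def extract_vat_id(text):
    """Return the first German VAT ID in text, normalized to 'DE#########', or None."""
    match = VAT_ID_RE.search(text)
    return "DE" + "".join(match.groups()) if match else None
```

A caller would first try `find_links(html, url, IMPRESSUM_HINTS)` on the start page; if that returns nothing, it would fetch the first result of `find_links(html, url, KONTAKT_HINTS)` and search that page for an Impressum link (the second hop), then run `extract_vat_id` on the Impressum text.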