""" Pure Python metadata extractor - no lxml, no memory leaks. This module provides a fast, memory-efficient alternative to extruct for common e-commerce metadata extraction. It handles: - JSON-LD (covers 80%+ of modern sites) - OpenGraph meta tags - Basic microdata attributes Uses Python's built-in html.parser instead of lxml/libxml2, avoiding C-level memory allocation issues. For edge cases, the main processor can fall back to extruct (with subprocess isolation on Linux). """ from html.parser import HTMLParser import json import re from loguru import logger class JSONLDExtractor(HTMLParser): """ Extract JSON-LD structured data from HTML. Finds all