A PDF processing pipeline that uses Large Language Models (LLMs) to extract structured data from complex PDFs, including OCR text, tables, images, and context-aware metadata.