Intelligent Document Processing

  • 2025
  • Completed

This was a close collaboration with a client in which we developed a system that compares large volumes of documents and automatically detects changes between versions. We designed it so that the client's team can quickly review the important differences, reducing a process that used to take weeks to just minutes.

How did we do it?

During the project, we adopted several strategies:

  1. Data extraction and normalization: We used pdfplumber and Azure Document Intelligence to extract text and tables from PDFs, applying OCR to scanned documents and converting tables to CSV.

  2. Intelligent refinement: We normalized and organized the extracted information before comparing it, avoiding incorrect comparisons and ensuring accurate results.

  3. LLM enhancement: We incorporated OpenAI language models to label and compare complex sections where traditional methods fail, detecting similarities and differences in a broader context.
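
To make the normalize-then-compare idea concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not the project's actual code: the function names (normalize_text, table_to_csv, changed_lines) are hypothetical, difflib stands in for whatever comparison logic the real system uses, and the input is assumed to be text and table rows that have already been extracted from the PDFs.

```python
from __future__ import annotations

import csv
import difflib
import io
import re
import unicodedata


def normalize_line(line: str) -> str:
    """Apply Unicode NFKC normalization and collapse runs of whitespace."""
    line = unicodedata.normalize("NFKC", line)
    return re.sub(r"\s+", " ", line).strip()


def normalize_text(text: str) -> list[str]:
    """Split text into normalized, non-empty lines."""
    lines = (normalize_line(raw) for raw in text.splitlines())
    return [line for line in lines if line]


def table_to_csv(rows: list[list[str | None]]) -> str:
    """Serialize an extracted table (a list of rows) to CSV text.

    Empty cells are assumed to arrive as None and are written as "".
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([normalize_line(cell or "") for cell in row])
    return buffer.getvalue()


def changed_lines(old_text: str, new_text: str) -> list[str]:
    """Return only the removed ('- ') and added ('+ ') lines between versions.

    Hypothetical helper: difflib is a stand-in for the real comparison step.
    """
    diff = difflib.ndiff(normalize_text(old_text), normalize_text(new_text))
    return [entry for entry in diff if entry.startswith(("+ ", "- "))]


if __name__ == "__main__":
    old = "Payment  terms:\u00a030 days\nGoverning law: Spain\n"
    new = "Payment terms: 45 days\n\nGoverning law: Spain\n"
    for entry in changed_lines(old, new):
        print(entry)
```

Normalizing before diffing means that spacing, blank lines, and non-breaking spaces introduced by extraction are not reported as changes, so only genuine edits reach the reviewer. Sections that a line-based diff handles poorly would be the ones handed to the language-model step.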

Results

Combining these tools allowed us to build a robust program that runs monthly and lets our client quickly see the differences between their documents, streamlining a process that previously could take weeks.