Shipping 100,000 construction PDFs a month: what actually breaks
After a year running a document processing pipeline through hundreds of thousands of construction documents (tender packs, permit applications, site surveys, BIM exports, drawing sets at A0 and larger), I can tell you what actually breaks. It is not the PDFs. That is the thing most people get wrong about systems like this. Every PDF tutorial focuses on the parser: which library, which model, which extraction service. After a year, the PDFs themselves rank third on the list of things that brea...