philmcmahon/data-pipeline: Repository for workshop at data harvest 2026 on rapidly analysing documents using the public cloud
The aim of this workshop is to build a scalable pipeline for processing large datasets. We'll focus on the following tasks: OCR (extracting text from images), transcription, and running LLM prompts on a document.
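As a rough illustration of how the three tasks could fit together, here is a minimal Python sketch. It is not taken from the repository: the stage functions `ocr`, `transcribe` and `run_prompt` are hypothetical placeholders that return dummy strings, and the file-extension sets are assumptions. In a real pipeline each stage would call whichever cloud service the workshop uses.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical stage functions: placeholders only, not the workshop's code.
def ocr(path: Path) -> str:
    return f"[ocr text from {path.name}]"

def transcribe(path: Path) -> str:
    return f"[transcript of {path.name}]"

def run_prompt(text: str, prompt: str) -> str:
    return f"[answer to {prompt!r} given {len(text)} chars]"

# Assumed mapping from file extension to extraction stage.
IMAGE_EXT = {".png", ".jpg", ".jpeg", ".tiff"}
AUDIO_EXT = {".mp3", ".wav", ".m4a"}

def extract_text(path: Path) -> str:
    """Route a file to OCR or transcription based on its extension."""
    ext = path.suffix.lower()
    if ext in IMAGE_EXT:
        return ocr(path)
    if ext in AUDIO_EXT:
        return transcribe(path)
    raise ValueError(f"unsupported file type: {path}")

def process(path: Path, prompt: str) -> dict:
    """Extract text from one file, then run the prompt on that text."""
    text = extract_text(path)
    return {"file": path.name, "text": text, "answer": run_prompt(text, prompt)}

def run_pipeline(paths, prompt, workers=4):
    """Process files concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: process(p, prompt), paths))

if __name__ == "__main__":
    files = [Path("scan.png"), Path("interview.mp3")]
    for row in run_pipeline(files, "Summarise this document"):
        print(row)
```

A thread pool is used here because stages like these would mostly wait on network calls. Scaling to large datasets would more likely rely on queues and managed cloud workers, which this sketch does not attempt to show.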