philmcmahon/data-pipeline: Repository for workshop at data harvest 2026 on rapidly analysing documents using the public cloud


The aim of this workshop is to build a scalable pipeline for processing large datasets. We'll focus on the following tasks: OCR (extracting text from images), transcription, and running LLM prompts on a document.
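As a rough illustration of how the three tasks could fit together, here is a minimal Python sketch. It is not taken from the repository: the stage functions `ocr`, `transcribe` and `run_prompt` are hypothetical placeholders that return dummy strings, and routing by file extension and the thread pool are assumptions made only for this example. A real pipeline would replace the placeholders with calls to an OCR engine, a speech-to-text service and an LLM API.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePath

# Hypothetical placeholder stages (assumptions for illustration only).
# Each returns a dummy string instead of calling a real service.
def ocr(path: str) -> str:
    return f"[text extracted from image {path}]"

def transcribe(path: str) -> str:
    return f"[transcript of {path}]"

def run_prompt(text: str, prompt: str) -> str:
    return f"{prompt}: {text}"

# Assumed routing rule: choose the text-extraction stage by file extension.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tiff"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a"}

def process(path: str, prompt: str = "Summarise") -> str:
    """Turn one file into text, then run a prompt over that text."""
    ext = PurePath(path).suffix.lower()
    if ext in IMAGE_EXTS:
        text = ocr(path)
    elif ext in AUDIO_EXTS:
        text = transcribe(path)
    else:
        raise ValueError(f"unsupported file type: {path}")
    return run_prompt(text, prompt)

def run_pipeline(paths, workers: int = 4):
    """Process many files concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, paths))

if __name__ == "__main__":
    for result in run_pipeline(["scan.png", "interview.mp3"]):
        print(result)
```

Because each file is handled independently, the same `process` function could in principle be run by many workers at once, which is the property that lets a pipeline like this scale out on cloud infrastructure.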
