How to Build a "Chat with Website" App using Next.js, LangChain, and Cheerio 🦜🔗
dev.to · 3d

Building RAG (Retrieval-Augmented Generation) apps usually starts with PDFs. 📄 But let's be honest: users really want to chat with live URLs such as documentation, wikis, and blogs.

I spent this weekend adding a Web Scraper to my RAG Starter Kit. Here is the technical breakdown of how I built it, so you can do it too. 👇

🛑 The Problem with Scraping for LLMs

You can't just fetch(url) and pass the HTML to GPT-4.

  1. Too much noise: Navbars, footers, and ads waste tokens. 💸
  2. Context window: Raw HTML is huge, so it eats into the context window and buries the content the model actually needs.
  3. Headless browsers: Tools like Puppeteer are heavy and often time out on serverless functions (like Vercel). ⏳
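
To make the noise problem concrete, here is a minimal, dependency-free sketch that compares the size of a raw page with the text left after boilerplate elements are dropped. Everything in it is an assumption made for illustration: `stripNoise`, `estimateTokens`, the tag list, and the sample page are hypothetical, the regex stripping is only a stand-in for a real HTML parser such as Cheerio, and the "4 characters per token" figure is a rough heuristic rather than a real tokenizer.

```typescript
// Hypothetical tag list: elements that rarely carry content worth embedding.
const NOISE_TAGS = ["script", "style", "nav", "header", "footer", "aside"];

// Stand-in for a real HTML parser. Regexes are NOT a robust way to parse HTML;
// this only illustrates how much of a page is noise.
function stripNoise(html: string): string {
  let cleaned = html;
  for (const tag of NOISE_TAGS) {
    // Remove the element together with everything inside it.
    const pattern = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}>`, "gi");
    cleaned = cleaned.replace(pattern, " ");
  }
  return cleaned
    .replace(/<[^>]+>/g, " ") // drop the remaining tags but keep their text
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}

// Rough heuristic (an assumption, not a tokenizer): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical sample page with typical boilerplate around a small main section.
const sampleHtml = `
<html><head><style>body { margin: 0 }</style></head>
<body>
  <nav><a href="/">Home</a><a href="/docs">Docs</a></nav>
  <main><h1>Install</h1><p>Run npm install to get started.</p></main>
  <footer>Copyright Example Inc.</footer>
  <script>console.log("analytics")</script>
</body></html>`;

const cleanText = stripNoise(sampleHtml);
console.log(cleanText); // "Install Run npm install to get started."
console.log(`~${estimateTokens(sampleHtml)} tokens raw, ~${estimateTokens(cleanText)} tokens cleaned`);
```

Even on this tiny page, most of the characters are markup and boilerplate rather than content, which is why cleaning has to happen before the text reaches the model.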

🛠 The Stack

  • Framework: Next.js 14
  • Scraper: Cheerio (via LangChain). It parses HTML like jQuery…
