*This post was co-authored with Matt Loskamp.*
Sales teams thrive on information — the right insight at the right time can make or break a deal. Yet, that information often lives across countless documents, presentations, and guides. Finding what you need can feel like searching for a needle in a haystack.
To solve this, we built the GTM AI Assistant — a Generative AI–powered chatbot designed to bring clarity and speed to the selling process. Behind the scenes, the assistant searches across our internal sales enablement material, Snowflake’s public website, and official product documentation to deliver accurate, contextual insights. It’s like having every pitch deck, product FAQ, and enablement guide sitting one question away.
And it’s working: our internal NPS for the assistant is over 90%, and more than 70% of users report using the GTM Assistant for knowledge discovery around product expertise, process definitions, customer success stories, and more. Sellers rely on it daily to prep for customer meetings, validate messaging, and stay aligned — turning what used to be a scavenger hunt into a single, conversational query.
Instant Answers, Powered by Generative AI
The GTM AI Assistant transforms how our teams access and use information. Rather than navigating repositories, users can simply ask:
- What are the main benefits of Priority Support?
- Share with me some impactful customer stories highlighting Snowflake Cortex.
- What’s the best workflow to engage with our legal team?
Behind the scenes, the assistant retrieves and synthesizes the most relevant guidance from Compass, our internal knowledge platform, returning answers that are both accurate and conversational.
The Engine: Why We Chose Snowflake Cortex Search
When we set out to build this tool, we wanted a solution that was simple, scalable, and secure. Instead of maintaining a separate vector database — with its complex pipelines for embedding, monitoring, and updating — we built directly on Snowflake Cortex Search Service (CSS).
Cortex Search simplifies everything. It allows us to query, update, and control access to our knowledge base with a few lines of SQL, while automatically keeping the data fresh as source materials evolve.
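As a rough illustration, standing up a search service takes just a few SQL statements. The sketch below uses hypothetical service, warehouse, table, and column names — it is not our exact DDL:

```sql
-- A minimal sketch with hypothetical names, not our production DDL.
CREATE OR REPLACE CORTEX SEARCH SERVICE gtm_knowledge_search
  ON chunk_text                    -- the text column to index and search
  ATTRIBUTES source_system, topic  -- columns available for filtering
  WAREHOUSE = gtm_wh
  TARGET_LAG = '1 hour'            -- how stale the index is allowed to get
  AS (
    SELECT chunk_text, source_system, topic, file_path
    FROM gtm.enablement.knowledge_chunks
  );
```

The TARGET_LAG parameter is what makes freshness a platform concern rather than a custom sync job: Snowflake keeps the index within the stated lag of the source table.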
Here’s a quick comparison of the old versus the new approach:
*(Figure: side-by-side comparison of maintaining a separate vector database versus building on Cortex Search Service.)*
Our Architecture: A Retrieval Augmented Generation (RAG) Pipeline
The assistant runs on a Retrieval-Augmented Generation (RAG) pipeline built entirely within Snowflake. It retrieves relevant information from our knowledge base and uses a Cortex LLM Function to generate clear, contextual answers — all within Snowflake’s secure data environment.
*(Diagram: the core RAG pipeline built in Snowflake.)*
- **Ingest & Process:** Files are synced from our internal knowledge platform via the Google Drive API. From there, we extract text from various file types while preserving structure and formatting.
- **Augment:** The extracted content is divided into logical chunks. Each chunk is converted into an embedding and loaded into our internal knowledge base, powered by CSS.
- **Retrieve:** When a user asks a question, Cortex Search automatically finds the most relevant chunks across all materials — no manual embedding or pipeline required.
- **Generate:** The retrieved context and user query are passed to a Cortex LLM Function, which synthesizes a comprehensive, contextual response. The system even retains conversation history, enabling natural follow-ups. A sketch of the retrieve and generate steps follows this list.
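To make those last two steps concrete, here is a minimal single-turn sketch (no conversation history), assuming the hypothetical service from earlier. SEARCH_PREVIEW is the SQL entry point for querying a search service, and the retrieved chunks are stitched into a prompt for COMPLETE:

```sql
-- Retrieve: query the search service (names are hypothetical).
WITH retrieved AS (
  SELECT PARSE_JSON(
    SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
      'gtm.enablement.gtm_knowledge_search',
      '{"query": "What are the main benefits of Priority Support?",
        "columns": ["chunk_text", "file_path"],
        "limit": 5}'
    )
  )['results'] AS results
)
-- Generate: pass the retrieved context plus the question to an LLM.
SELECT SNOWFLAKE.CORTEX.COMPLETE(
  'mistral-large2',
  'Answer using only this context:\n' || results::STRING ||
  '\n\nQuestion: What are the main benefits of Priority Support?'
) AS answer
FROM retrieved;
```

A production assistant calls the search service from an application layer and carries chat history into the prompt, but the retrieve-then-generate shape is the same.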
Lessons from the Trenches: Tips for Handling Unstructured Data
Building this system taught us one thing fast — unstructured data never behaves the way you expect it to. Every file type, layout, and format comes with its own quirks, and what looks clean in Compass or Google Drive can turn into chaos once you start parsing it at scale. From tables disguised as text boxes to 100-page docs without page numbers, we’ve seen it all.
Here are some of the biggest lessons we learned turning that chaos into a reliable, structured knowledge base that powers our GTM AI Assistant.
1. Extraction and Parsing
Extracting text from documents sounds simple — until you realize how unpredictable real-world layouts can be.
- **The “Invisible Table” Problem:** Slides often contain tables that are actually stacks of text boxes — sometimes overlapping or layered inconsistently.
- **Maintaining Layouts:** Flattened text from tables can lose row/column relationships. Preserving layout metadata proved essential for maintaining meaning.
- **Precise Referencing:** Large Google Docs often lack page numbers, so we built custom reference links for more accurate source citations.
👉 Tip: Budget time for trial and error when defining your extraction approach. No single method fits every document type.
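Our extraction runs outside Snowflake, after the Google Drive sync. That said, as one illustration of layout-preserving extraction, Cortex offers PARSE_DOCUMENT for files staged in Snowflake; the stage name and file path below are hypothetical:

```sql
-- LAYOUT mode returns tables and headings as structured, markdown-like text,
-- which helps with the "invisible table" problem described above.
-- Stage name and file path are hypothetical.
SELECT SNOWFLAKE.CORTEX.PARSE_DOCUMENT(
  @gtm.enablement.docs_stage,
  'compass/priority_support_guide.pdf',
  {'mode': 'LAYOUT'}
):content::STRING AS extracted_text;
```

Whichever extractor you use, compare its output against the rendered document early — that is where the quirks show up.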
2. Chunking and Embedding
How you structure and embed text directly affects retrieval quality.
- **Chunking Strategy:** Finding the right balance between context size and specificity is crucial. CSS’s hybrid semantic + lexical search means chunk design matters more than ever. Well-structured chunks ensure that information is neither too fragmented nor too diluted, allowing the model to surface highly relevant results efficiently.
- **Embedding Model Choice:** Closely related to chunking strategy, choosing the right model is equally critical. Models differ in how they handle context and in how long a text window they support. A well-matched model ensures that the chunking and metadata strategies are fully leveraged — maximizing retrieval quality while keeping latency reasonable. Cost also varies by embedding model and is worth weighing.
- **Tagging to Boost Relevance:** We prepend each chunk with metadata — subject, topic, and file path tags — giving the retrieval model stronger signals. This structured context helps the model better understand document hierarchy and relationships, resulting in more precise and context-aware responses (see the sketch after this section).
- **Handling Visuals:** For flowcharts, architecture diagrams, or other visual assets, we incorporate multimodal models that interpret both text and imagery. This ensures that visual context is preserved and can be meaningfully referenced during retrieval and generation.
Our experiments show that, for most knowledge documents, around 400 words per chunk provides a coherent, self-contained unit of meaning. We use 400-word chunks with a 100-word overlap as our baseline, occasionally extending a chunk to 700–1,000 words when that preserves completeness and contextual continuity. While 400 words roughly aligns with the typical token window of the embedding model we use, lexical search combined with proper tags lets the system surface relevant content even when chunks extend beyond that range, maintaining strong retrieval accuracy. In practice, this 400/100 scheme with a slack boundary of up to 1,000 words has proven the most effective balance between accuracy and cost efficiency.
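Here is a minimal sketch of that chunk-and-tag step using Cortex’s SPLIT_TEXT_RECURSIVE_CHARACTER. Its sizes are measured in characters, so we approximate 400 words as roughly 2,500 characters and the 100-word overlap as roughly 600; the table and column names are hypothetical:

```sql
-- Split each document into overlapping chunks, then prepend metadata tags.
-- Sizes are in characters (~2,500 ≈ 400 words; ~600 ≈ 100-word overlap).
-- Table and column names are hypothetical.
INSERT INTO knowledge_chunks (chunk_text, source_system, topic, file_path)
SELECT
  'subject: ' || d.subject || ' | topic: ' || d.topic ||
  ' | path: ' || d.file_path || '\n' || c.value::STRING AS chunk_text,
  d.source_system,
  d.topic,
  d.file_path
FROM documents d,
LATERAL FLATTEN(
  input => SNOWFLAKE.CORTEX.SPLIT_TEXT_RECURSIVE_CHARACTER(
    d.extracted_text, 'none', 2500, 600
  )
) c;
```

Prepending the tags inline means both the semantic and lexical sides of CSS’s hybrid search can pick up on them without any extra index configuration.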
Partnering with Content Creators: The Human Side of “AI-Ready” Data
One of the biggest lessons we learned didn’t come from code — it came from collaboration. Many technical challenges in building AI-ready knowledge systems can be simplified through close partnership with the teams that create the content in the first place.
At Snowflake, we regularly meet with our enablement, marketing, and product documentation teams to share insights on how small adjustments — like consistent table formatting, clear section headers, or using text over images for key data — can dramatically improve how well the AI understands and retrieves information.
This cross-functional alignment has been key to our success. We view this shift toward an “AI-ready content policy” not as a one-time fix, but as an evolving partnership. When creators know how their materials will be consumed by AI, the entire system — from ingestion to generation — becomes cleaner, faster, and more reliable.
The Full Data Pipeline
Our end-to-end pipeline, orchestrated with Airflow, processes files from multiple sources, including Compass, website blog posts, product documentation, and partner profiles. It extracts content, chunks and tags it, then loads everything into the Cortex Search Service, automatically refreshing the data as new or updated materials arrive — ensuring every search result is current.
From Knowledge Chaos to Competitive Clarity
By leveraging Snowflake Cortex Search Service, we built a powerful, dynamic Sales Knowledge Assistant — without the overhead of managing embeddings, sync jobs, or external vector databases.
The result? A seamless experience where sales teams get accurate, up-to-date answers instantly, allowing them to focus on customers instead of content hunting.
This project shows what’s possible when modern data platforms meet unstructured knowledge. And we’re just getting started — the same foundation can empower teams across marketing, support, and beyond to turn scattered content into a true competitive advantage.