Large language models (LLMs) like Gemini have revolutionised what’s possible in software development. Their ability to understand, generate, and reason about text is remarkable. However, they have a fundamental limitation: they only know what they were trained on. They’re unaware of your company’s internal documentation, your project’s specific codebase, or the latest research paper published yesterday.
To build intelligent and practical applications, we need to bridge this gap and ground the model’s vast reasoning capabilities in your own specific, private data. This is the domain of Retrieval-Augmented Generation (RAG). This powerful technique retrieves relevant information from, typically, an external knowledge base. Then it provides it to the LLM as context to generate a more accurate, appropriate, and verifiable response to questions.
While highly effective, building a robust RAG pipeline from scratch is a significant engineering challenge. It involves a complex sequence of steps:
- Data Ingestion and Chunking. Parsing various file formats (PDFs, DOCX, etc.) and intelligently splitting them into smaller, semantically meaningful chunks.
- Embedding Generation. Using an embedding model to convert these text chunks into numerical vector representations.
- Vector Storage. Setting up, managing, and scaling a dedicated vector database to store these embeddings for efficient searching.
- Retrieval Logic. Implementing a system to take a user’s query, embed it, and perform a similarity search against the vector database to find the most relevant chunks.
- Context Injection. Dynamically inserting the retrieved chunks into a prompt for the LLM in a way that it can effectively use the information.
Each of these steps requires careful consideration, infrastructure management, and ongoing maintenance.
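To make the scale of that work concrete, here is a deliberately minimal sketch of a hand-rolled pipeline in plain Python. Everything in it is a stand-in: the embed function is a toy hashed bag-of-words rather than a real embedding model, the “vector database” is a list held in memory, and the document is a hard-coded string.

import math
from collections import Counter

def embed(text, dims=256):
    # Toy stand-in for a real embedding model: a hashed bag-of-words
    vec = [0.0] * dims
    for word, count in Counter(text.lower().split()).items():
        vec[hash(word) % dims] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def chunk(text, size=50):
    # Naive fixed-size chunking by word count; real pipelines are smarter
    words = text.split()
    return [' '.join(words[i:i + size]) for i in range(0, len(words), size)]

# A stand-in document; in practice you would parse PDFs, DOCX, etc.
document = ('To restart the device, press and hold the side button and '
            'volume down button simultaneously, then tap Restart. ') * 20

# The "vector database": a list of (chunk, vector) pairs in memory
index = [(c, embed(c)) for c in chunk(document)]

def retrieve(query, k=3):
    # Vectors are unit length, so the dot product is cosine similarity
    qv = embed(query)
    ranked = sorted(index, key=lambda pair: -sum(a * b for a, b in zip(qv, pair[1])))
    return [c for c, _ in ranked[:k]]

# Context injection: paste the retrieved chunks into the LLM prompt
question = 'How do I reboot my phone?'
context = '\n---\n'.join(retrieve(question))
prompt = f'Answer using only this context:\n{context}\n\nQuestion: {question}'
print(prompt)

Even this toy version hints at the real decisions involved: chunk size, overlap, embedding quality, and how much retrieved context to inject.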
Recently, continuing its effort to bring an end to traditional RAG as we know it, Google has brought out yet another product targeting this space. Google’s new File Search tool completely obviates the need for you to chunk, embed, and index your documents before carrying out semantic searches on them.
What is the Google File Search tool?
At its core, the File Search Tool is a powerful abstraction layer over a complete RAG pipeline. It handles the entire lifecycle of your data, from ingestion to retrieval, providing a simple yet powerful way to ground Gemini’s responses in your documents.
Let’s break down its core components and the problems they solve.
1) Simple, Integrated Developer Experience
File Search is not a separate API or a complex external service you need to orchestrate. It’s implemented as a Tool directly within the existing Gemini API. This seamless integration enables you to add powerful RAG capabilities to your application with just a few additional lines of code. The tool automatically:
- Securely stores your uploaded documents.
- Applies sophisticated strategies to break down your documents into appropriately sized, coherent chunks for the best retrieval results.
- Processes your files, generates embeddings using Google’s state-of-the-art models, and indexes them for fast retrieval.
- Handles the retrieval and injects the relevant context into the prompt sent to Gemini.
2) Powerful Vector Search at its Core
The retrieval engine is powered by the gemini-embedding-001 model, designed for high-performance semantic search. Unlike traditional keyword searching, which only finds exact matches, vector search understands the meaning and context of a query. This allows it to surface relevant information from your documents even if the user’s query uses entirely different wording.
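You can see this distinction directly with the underlying embedding model. The sketch below, which assumes a client created with your API key as in the examples later in this article, embeds a query and two candidate sentences via the SDK’s embed_content call and compares them with cosine similarity. The semantically related sentence should score higher even though it shares almost no keywords with the query.

import math
from google import genai

client = genai.Client(api_key='YOUR_API_KEY')

query = 'How do I stop the display switching off so quickly?'
candidates = [
    'Set the screen timeout to control when the screen turns off.',   # same meaning, different words
    'The battery is charged via the USB-C port on the bottom edge.',  # unrelated
]

# Embed the query and both candidates in a single call
resp = client.models.embed_content(
    model='gemini-embedding-001',
    contents=[query] + candidates,
)
vectors = [e.values for e in resp.embeddings]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

for text, vec in zip(candidates, vectors[1:]):
    print(f'{cosine(vectors[0], vec):.3f}  {text}')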
3) Built-in Citations for Verifiability
Trust and transparency are critical for enterprise-grade AI applications. The File Search Tool automatically includes grounding metadata in the model’s response. This metadata contains citations that specify exactly which parts of which source documents were used to generate the answer.
This is an important feature that allows you to:
- Verify Accuracy. Easily check the model’s sources to confirm the correctness of its response.
- Build User Trust. Show users where the information is coming from, increasing their confidence in the system.
- Enable Deeper Exploration. Provide links to the source documents so users can explore topics of interest in greater depth.
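As a sketch of what consuming those citations can look like, the snippet below assumes a response object returned by a File Search generate_content call like the ones shown later in this article. The field names follow Google’s documented grounding metadata structure, but check the current API reference, as the exact schema may evolve.

# Inspect which source passages the model drew on
candidate = response.candidates[0]
metadata = candidate.grounding_metadata

if metadata and metadata.grounding_chunks:
    for i, gchunk in enumerate(metadata.grounding_chunks):
        ctx = gchunk.retrieved_context  # populated for File Search results
        if ctx:
            print(f'[{i}] source: {ctx.title}')
            print(f'    text:   {ctx.text[:120]}...')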
4) Support for a Wide Range of Formats
A knowledge base is rarely composed of simple text files. The File Search Tool supports a wide range of standard file formats out of the box, including PDF, DOCX, TXT, JSON, and various programming language and application file formats. This flexibility means you can build a comprehensive knowledge base from your existing documents without needing to perform cumbersome pre-processing or data conversion steps.
5) Affordability
Google has made using its File Search tool extremely cost-effective. Storage and query-time embeddings are free of charge. You pay only for the one-off embedding of your document contents when they are first indexed, at $0.15 per 1 million tokens (with, for example, the gemini-embedding-001 embedding model). To put that in perspective, indexing a document of around 200,000 tokens would cost roughly $0.03.
Using File Search
Now that we have a better idea of what the File Search tool is, it’s time to see how we can use it in our workflows. For that, I’ll be showcasing some example Python code that shows you how to call and use File Search.
However, before that, it is best practice to set up a separate development environment to keep our various projects isolated from each other.
I’ll be using the uv tool for this and will run my code in a Jupyter notebook under WSL2 Ubuntu on Windows. However, feel free to use whichever package manager suits you best.
$ cd projects
$ uv init gfs
$ cd gfs
$ uv venv
$ source .venv/bin/activate
(gfs) $ uv pip install google-genai jupyter
You’ll also need a Gemini API key, which you can get from Google’s AI Studio home page using the link below.
Look for a Get API Key link near the bottom left of the screen after you’ve logged in.
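Rather than hard-coding the key into your notebook, you can export it as an environment variable named GEMINI_API_KEY, which the google-genai client picks up automatically when no explicit key is passed:

import os
from google import genai

# Reads GEMINI_API_KEY from the environment automatically
client = genai.Client()

# Equivalent explicit form, if you prefer to be deliberate about it
client = genai.Client(api_key=os.environ['GEMINI_API_KEY'])

In the examples below, I pass the key explicitly for clarity, but either approach works.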
Example code — a simple search on a PDF document
For testing purposes, I downloaded the user manual for the Samsung S25 mobile phone from their website to my local desktop PC. It’s over 180 pages long. You can get it using this link.
Start up Jupyter Notebook and type the following code into a cell.
import time
from google import genai
from google.genai import types

client = genai.Client(api_key='YOUR_API_KEY')

# Create a file search store: the container that will hold the
# chunks, embeddings, and indexes for your uploaded files
store = client.file_search_stores.create()

# Upload and import the PDF into the store
upload_op = client.file_search_stores.upload_to_file_search_store(
    file_search_store_name=store.name,
    file='SM-S93X_UG_EU_15_Eng_Rev.2.0_250514.pdf'
)

# Uploading and indexing is asynchronous, so poll until it completes
while not upload_op.done:
    time.sleep(5)
    upload_op = client.operations.get(upload_op)

# Use the file search store as a tool in your generation call
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents='What models of phone does this document apply to ...',
    config=types.GenerateContentConfig(
        tools=[types.Tool(
            file_search=types.FileSearch(
                file_search_store_names=[store.name]
            )
        )]
    )
)

print(response.text)
After importing the required libraries, we create a “file search store”, which is a container for the data and indexes of your uploaded files. We then upload our input file to the store and poll until the upload and indexing have completed.
Finally, we call the generate_content function, which answers the question we posed to our chosen model (Gemini 2.5 Flash in our example) about our input file, before printing out the model’s response.
Here is the response I got when running the above code.
This document applies to the following phone models: SM-S931B,
SM-S931B/DS, SM-S936B, SM-S936B/DS, SM-S937B, SM-S937B/DS,
SM-S938B, and SM-S938B/DS. It also specifically mentions
Galaxy S25 Ultra, Galaxy S25 Edge, Galaxy S25+, and Galaxy S25.
We can verify the accuracy of this information by examining the first page of the PDF, where the list of relevant models is provided.
Image by Author
Let’s dig a little deeper with a trickier question. Say you want to find out how to turn off your screen automatically when it’s not in use. On page 156 of the PDF, it says this:
You can set the screen to turn off automatically when you are not using it. Open Settings, tap Display → Screen timeout, and then select the length of time you want the device to wait before turning off the screen.
Can the File Search tool identify this?
...
...
...
# Use the file search store as a tool in your generation call
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents='How do I set the screen to turn off automatically when not in use',
    config=types.GenerateContentConfig(
        tools=[types.Tool(
            file_search=types.FileSearch(
                file_search_store_names=[store.name]
            )
        )]
    )
)

print(response.text)
And the response?
To set your device's screen to turn off automatically when not in use,
you can adjust the "Screen timeout" setting. This setting determines how
long the device waits before turning off the display's backlight.
For a Samsung device (as indicated by the provided user guide), you can
typically find this option by navigating to:
Settings → Display → Screen timeout.
There, you can select the desired length of time before the screen turns off.
Be aware that File Search can also draw on the model’s own internal knowledge when answering questions, without necessarily consulting the document store to find an answer.
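If you’d rather the answers come strictly from your documents, one pragmatic option (a prompt-level nudge rather than an official File Search setting) is to add a system instruction to the generation config:

response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents='How do I set the screen to turn off automatically when not in use',
    config=types.GenerateContentConfig(
        # Steers the model away from its internal knowledge; this is
        # guidance, not a hard guarantee
        system_instruction=(
            'Answer only using information retrieved from the attached '
            'file search store. If the answer is not in the documents, say so.'
        ),
        tools=[types.Tool(
            file_search=types.FileSearch(
                file_search_store_names=[store.name]
            )
        )]
    )
)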
Dealing with multiple input files
If your document corpus consists of multiple files, it’s easy to incorporate them all using a simple for loop (see the sketch after the limits below), but you should be aware of some of File Search’s limitations. From Google’s own documentation, these limits are:
The File Search API has the following limits to enforce service stability:
- Maximum file size / per-document limit: 100 MB
- Total size of project File Search stores (based on user tier):
  - Free: 1 GB
  - Tier 1: 10 GB
  - Tier 2: 100 GB
  - Tier 3: 1 TB
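Here is the for-loop sketch mentioned above. It assumes the client and store objects from the earlier example, plus a local docs folder (a placeholder name) containing your PDFs.

import time
from pathlib import Path

# Upload every PDF in the folder to the same store, polling each
# indexing operation until it completes
for path in sorted(Path('docs').glob('*.pdf')):
    op = client.file_search_stores.upload_to_file_search_store(
        file_search_store_name=store.name,
        file=str(path),
    )
    while not op.done:
        time.sleep(5)
        op = client.operations.get(op)
    print(f'Indexed {path.name}')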
Controlling the chunking
When a file is added to a File Search store, the system automatically splits it into smaller chunks, then embeds and indexes the content. If you want to fine-tune how this segmentation happens, you can use the chunking_config option to set limits on chunk size and specify how many tokens should overlap between chunks. Here’s a code snippet showing how you would do that.
...
...
operation = client.file_search_stores.upload_to_file_search_store(
    file_search_store_name=store.name,
    file='SM-S93X_UG_EU_15_Eng_Rev.2.0_250514.pdf',
    # Tune how documents are split: smaller chunks give more precise
    # retrieval, while overlap preserves context across chunk boundaries
    config={
        'chunking_config': {
            'white_space_config': {
                'max_tokens_per_chunk': 200,
                'max_overlap_tokens': 20
            }
        }
    }
)
...
...
How does File Search differ from Google’s other RAG-related tools, such as Context Grounding and LangExtract?
I’ve recently written articles on two similar products from Google in this space: Context Grounding and LangExtract. On the surface, they do similar things to File Search, and that’s true up to a point.
The main difference is that File Search is a true RAG product in that it stores your document embeddings persistently, whereas the other two tools don’t. Once your embeddings are in a File Search store, they remain there until you choose to delete them, so you don’t have to re-upload your files every time you want to ask a question about them.
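Persistence is easy to demonstrate. In a brand-new session, you can list the stores in your project, pick up an existing one, and query it immediately, with no re-upload step. A sketch, reusing the calls shown earlier:

from google import genai
from google.genai import types

client = genai.Client(api_key='YOUR_API_KEY')

# Pick up a store created in an earlier session
store = next(iter(client.file_search_stores.list()))

response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents='Does this phone support wireless charging?',
    config=types.GenerateContentConfig(
        tools=[types.Tool(
            file_search=types.FileSearch(
                file_search_store_names=[store.name]
            )
        )]
    )
)
print(response.text)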
Here’s a handy table of the differences for reference.
+--------------------+--------------------------------------+---------------------------------------+--------------------------------------+
| Feature | Google File Search | Google Context Grounding | LangExtract |
+--------------------+--------------------------------------+---------------------------------------+--------------------------------------+
| Primary Goal | To answer questions and generate | Connects model responses to verified | Extract specific, structured data |
| | content from private documents. | sources to improve accuracy and | (like JSON) from unstructured text. |
| | | reduce hallucinations. | |
+--------------------+--------------------------------------+---------------------------------------+--------------------------------------+
| Input | User prompt and uploaded files | User prompt and configured data | Unstructured text plus schema or |
| | (PDFs, DOCX, etc.). | source (e.g., Google Search, URL). | prompt describing what to extract. |
+--------------------+--------------------------------------+---------------------------------------+--------------------------------------+
| Output | Conversational answer grounded in | Fact-checked natural language answer | Structured data (e.g., JSON) mapping |
| | provided files with citations. | with links or references. | info to original text. |
+--------------------+--------------------------------------+---------------------------------------+--------------------------------------+
| Underlying Process | Managed RAG system that chunks, | Connects model to info source; uses | LLM-based library for targeted info |
| | embeds, and indexes files. | File Search, Google Search, etc. | extraction via examples. |
+--------------------+--------------------------------------+---------------------------------------+--------------------------------------+
| Typical Use Case | Chatbot for company knowledge base | Answering recent events using live | Extracting names, meds, dosages from |
| | or manuals. | Google Search results. | clinical notes for a database. |
+--------------------+--------------------------------------+---------------------------------------+--------------------------------------+
Deleting a file search store
Google automatically deletes your raw file contents from its file store after 48 hours, but it retains the document embeddings, allowing you to continue querying your document contents. If you decide a store is no longer needed, you can delete it programmatically, as shown in the code snippet below.
...
...
...
# List all your file search stores
for file_search_store in client.file_search_stores.list():
    print(file_search_store.name)

# Get a specific file search store by name
my_file_search_store = client.file_search_stores.get(
    name='your_file_search_store_name'
)

# Delete a file search store; force=True removes it even if it
# still contains documents
client.file_search_stores.delete(
    name=my_file_search_store.name,
    config={'force': True}
)
Summary
Traditionally, building a RAG pipeline required complex steps — ingesting data, splitting it into chunks, generating embeddings, setting up vector databases, and injecting retrieved context into prompts. Google’s new File Search tool abstracts all these tasks away, offering a fully managed, end-to-end RAG solution integrated directly into the Gemini API via the generateContent call.
In this article, I outlined some of the key features and advantages of File Search before providing a fully working Python code example of its use. My example demonstrated uploading a large PDF file (a Samsung phone manual) into a File Search store and querying it through the Gemini model and API to accurately extract specific information. I also showed code you can use to fine-tune your documents’ chunking strategy if the default employed by File Search doesn’t meet your needs. Finally, to keep costs to a minimum, I provided a code snippet showing how to delete unwanted stores when you’re done with them.
As I was writing this, it occurred to me that, on the face of it, this tool shares many similarities with other Google products in this space that I’ve written about before, i.e. LangExtract and Context Grounding. However, as I went on to explain, there are key differentiators between them, with File Search being the only true RAG system of the three, and I highlighted the differences in an easy-to-read table format.
There is much more to Google’s File Search tool than I was able to cover in this article, including the use of File Metadata and Citations. I encourage you to explore Google’s API documentation online using the link below for a comprehensive description of all File Search’s capabilities.
https://ai.google.dev/gemini-api/docs/file-search#file-search-stores