The festive holidays are upon us. This is the time for deep reflections, catching up with family and friends, and enjoying life in general as the year comes to a close. This period is also a great time to rewatch old movie classics as we slide into the new year.
Today, we will build a retrieval-augmented generation (RAG) app to give us holiday movie recommendations. We will go through it step by step so that you fully understand how you can create one on your own.
What we will build
A RAG app in 30 minutes using:
Django MongoDB Backend. Django MongoDB Backend is a Django database backend that uses PyMongo to connect to MongoDB.
Voyage AI. Voyage AI (part of MongoDB) is a platform that provides the tools to build state-of-the-art AI models, semantic search, RAG, and agentic applications.
LangChain. LangChain is an orchestration framework that streamlines the development of agentic and RAG apps by providing pre-built tooling for each stage of the pipeline: data ingestion (in this case, from a JSON file), preprocessing, creation of embeddings, retrieval, and generation.
What you need
- An IDE (this tutorial uses VSCode)
- Python 3.10 or later
- A MongoDB Atlas account (if you haven’t already, create an account)
- A MongoDB Atlas cluster—the free tier is okay
- A MongoDB connection string (with the username, password, and database included in the string)
- A Voyage AI API key—create an account, and create and save an API key
- Dataset: a JSON file with twenty (20) classic holiday movies. Please download the file and add it to the root of your project directory. I created it by analyzing the structure of the sample_mflix dataset and writing JSON documents that follow it.
Dependencies
Create a requirements.txt file containing the following packages:
- python-dotenv (version at the time of this tutorial is 1.1.1)
- voyageai (version at the time of this tutorial is 0.3.5)
- langchain-voyageai (version at the time of this tutorial is 0.1.7)
- langchain-mongodb (version at the time of this tutorial is 0.7.1)
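Since the versions above are only the ones that were current when this tutorial was written, it can help to see what is actually installed in your virtual environment. The following is a minimal sketch using only the standard library; the `installed_versions` helper and the `TUTORIAL_VERSIONS` dictionary are hypothetical names introduced here for illustration.

```python
from importlib import metadata

# Versions quoted in this tutorial; adjust them if you pin something newer.
TUTORIAL_VERSIONS = {
    "python-dotenv": "1.1.1",
    "voyageai": "0.3.5",
    "langchain-voyageai": "0.1.7",
    "langchain-mongodb": "0.7.1",
}


def installed_versions(packages):
    """Return {package name: installed version, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions(TUTORIAL_VERSIONS).items():
        print(f"{name}: installed={version}, tutorial used {TUTORIAL_VERSIONS[name]}")
```

A value of None simply means the package has not been installed yet.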
Next, we will split the building process into the following steps:
- Download the JSON file.
- Follow the Django MongoDB Backend Quickstart.
- Add dependencies and set environment variables.
- Create models.
- Embed with Voyage AI.
- Insert data into the MongoDB Atlas cluster.
- Integrate MongoDB Atlas Vector Search with LangChain.
- Render data.
Download the JSON file
Based on the sample_mflix model, this is how we will model the objects in the JSON file.
The JSON file, holiday_movies.json, has twenty (20) classic holiday movie objects.
Example of an object in document format:
{
  "_id": 1,
  "title": "It’s a Wonderful Life",
  "plot": "An angel is sent from Heaven to help a desperately frustrated businessman by showing him what life would have been like if he had never existed.",
  "runtime": 130,
  "released": "1946-12-20T00:00:00Z",
  "genres": ["Drama", "Family", "Fantasy"],
  "awards": {
    "wins": 0,
    "nominations": 5,
    "text": "Nominated for 5 Oscars"
  },
  "plot_embedding_voyage_3_large": null,
  "plot_embedding": null
}
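Before embedding anything, it may be worth confirming that every movie object has the fields shown above. The sketch below assumes the file wraps the objects in a top-level "movies" list (which is how the scripts later in this tutorial read it); the `find_problems` helper and `REQUIRED_KEYS` set are hypothetical names, not part of any library.

```python
# Keys taken from the example document above.
REQUIRED_KEYS = {"title", "plot", "runtime", "released", "genres", "awards"}


def find_problems(data):
    """Return a list of readable problems; an empty list means the data looks fine."""
    movies = data.get("movies") if isinstance(data, dict) else None
    if not isinstance(movies, list):
        return ["expected a top-level 'movies' list"]
    problems = []
    for position, movie in enumerate(movies):
        if not isinstance(movie, dict):
            problems.append(f"movie {position}: not a JSON object")
            continue
        missing = REQUIRED_KEYS - movie.keys()
        if missing:
            problems.append(f"movie {position}: missing {sorted(missing)}")
        elif not str(movie["plot"]).strip():
            # An empty plot would produce a meaningless embedding later on.
            problems.append(f"movie {position}: empty plot")
    return problems


if __name__ == "__main__":
    sample = {"movies": [{"title": "Untitled", "plot": ""}]}
    print(find_problems(sample))
```

To check the real file, load it with `json.load` and pass the result to `find_problems`.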
Follow the Django MongoDB Backend Quickstart
This will lay the foundation and ensure you have all the necessary utilities bootstrapped to create a scaffold Django project. After that, you’ll be able to:
- Create a Django project. In this case, I called it django_rag_app.
- Ensure the database settings in your settings.py file are accurate. They should look like this:
DATABASES = {
    "default": {
        "ENGINE": "django_mongodb_backend",
        "HOST": "<connection string URI>",
        "NAME": "sample_mflix",
    },
}
Afterward, test that everything works by running your server with the command
python manage.py runserver
Then, open this URL: http://127.0.0.1:8000/.
You should see a page displaying a "Congratulations!" message and an image of a rocket.
Set environment variables
To set environment variables, create a .env file at the root of your project directory to hold the MONGODB_URI and VOYAGE_API_KEY values.
This is what the MONGODB_URI looks like:
MONGODB_URI="mongodb+srv://[username]:[password]@[cluster_name].hpljptt.mongodb.net/?retryWrites=true&w=majority&appName=[cluster_name]"
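A missing or misspelled variable name is an easy mistake to make at this step, so a small check can save debugging time later. This is a minimal sketch that only inspects the process environment; it assumes the .env file has already been loaded (for example with `load_dotenv()`, as the scripts in this tutorial do). The `missing_env_vars` helper is a hypothetical name used for illustration.

```python
import os

# The two variable names this tutorial stores in the .env file.
REQUIRED_VARS = ("MONGODB_URI", "VOYAGE_API_KEY")


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print(f"Missing from environment: {', '.join(missing)}")
    else:
        print("All required environment variables are set")
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try out without touching your real environment.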
Install dependencies
Run pip install -r requirements.txt to install the packages.
Create Django app
To create a new Django app (in this case, festive_flix), run
python manage.py startapp festive_flix --template https://github.com/mongodb-labs/django-mongodb-app/archive/refs/heads/5.2.x.zip
Add the festive_flix app to INSTALLED_APPS in your settings.py file.
Create models
In our models.py file, we will add the fields we need as seen in the sample_mflix dataset.
from django.db import models
from django_mongodb_backend.fields import EmbeddedModelField, ArrayField
from django_mongodb_backend.models import EmbeddedModel


class Award(EmbeddedModel):
    wins = models.IntegerField(default=0)
    nominations = models.IntegerField(default=0)
    text = models.CharField(max_length=100)


class Movie(models.Model):
    title = models.CharField(max_length=200)
    plot = models.TextField(blank=True)
    runtime = models.IntegerField(default=0)
    released = models.DateTimeField("release date", null=True, blank=True)
    awards = EmbeddedModelField(Award, null=True, blank=True)
    genres = ArrayField(models.CharField(max_length=100), null=True, blank=True)

    def __str__(self):
        return self.title
Embed with Voyage AI
Create a new file called plot_embedding.py to hold the script we will use to embed the plot field of each movie in our holiday_movies.json file.
import json
from dotenv import load_dotenv
import voyageai
import os

load_dotenv()

# Use the API key from the environment variables in the .env file
VOYAGE_API_KEY = os.environ.get("VOYAGE_API_KEY")
vo = voyageai.Client(VOYAGE_API_KEY)

with open("holiday_movies.json", "r") as f:
    data = json.load(f)

# Focus on the "plot" field since that's what we are embedding
movies_list = [movie.get("plot", "") for movie in data.get("movies", [])]

# Get embeddings for the plots using "voyage-3-lite"
result = vo.embed(movies_list, model="voyage-3-lite", input_type="document")

# New field to hold the embeddings
for movie, embedding in zip(data.get("movies", []), result.embeddings):
    movie["embedding"] = embedding

# Write embeddings to a new file: embeddings_holiday_movies.json
with open("embeddings_holiday_movies.json", "w") as f:
    json.dump(data, f, indent=2)
In the plot_embedding.py script, we are:
- Setting our key, which is stored in the .env file at the root of the Django project.
- Opening the JSON file that’s already in our project.
- Embedding the plot field using Voyage AI’s “voyage-3-lite” embedding model.
- Creating a new field (embedding) on each movie to hold its embedding. This is the path we will point the vector search index at later.
- Creating a new JSON file to hold the movies together with their embeddings.
Next, run the script using python plot_embedding.py. Once it’s done, we see a new JSON file with our generated embeddings.
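A quick sanity check on the generated file can catch problems before anything is uploaded. The sketch below assumes every movie should carry an "embedding" list of 512 numbers, which is the dimension this tutorial uses for voyage-3-lite when defining the vector index; the `check_embeddings` helper is a hypothetical name introduced for illustration.

```python
def check_embeddings(data, expected_dim=512):
    """Return titles of movies whose 'embedding' is missing or has the wrong length."""
    bad = []
    for movie in data.get("movies", []):
        vector = movie.get("embedding")
        if not isinstance(vector, list) or len(vector) != expected_dim:
            bad.append(movie.get("title", "<untitled>"))
    return bad


if __name__ == "__main__":
    # Tiny in-memory stand-in for embeddings_holiday_movies.json.
    sample = {
        "movies": [
            {"title": "Complete", "embedding": [0.0] * 512},
            {"title": "Missing", "embedding": None},
        ]
    }
    print(check_embeddings(sample))
```

To check the real output, load embeddings_holiday_movies.json with `json.load` and pass the result to `check_embeddings`; an empty list means every movie has a vector of the expected size.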
At this point, we can go ahead and insert our documents into our MongoDB cluster.
Insert data into the MongoDB cluster
Next, let’s create a new file at the root of the project named json_upload.py.
#json_upload.py
import json
import os
from pymongo import MongoClient
from dotenv import load_dotenv

load_dotenv()

# Get the environment variable and connect
connection_string = os.getenv("MONGODB_URI")
client = MongoClient(connection_string)

# Specify the database and collection
database = client["festive_flix_db"]
collection = database["holiday_movies_collection"]

# Load the JSON file with embeddings
with open("embeddings_holiday_movies.json", "r") as file:
    data = json.load(file)

if isinstance(data, dict) and "movies" in data:
    movies = data["movies"]
else:
    # Stop early instead of failing later with an undefined variable
    raise SystemExit("Expected a top-level 'movies' list in embeddings_holiday_movies.json")

# Clear the existing collection and insert new data with embeddings
collection.delete_many({})

# Use insert_many to insert multiple documents
result = collection.insert_many(movies)
print(f"Successfully inserted {len(result.inserted_ids)} documents")
After running the json_upload.py file, you will see a new collection of movies in your MongoDB Atlas cluster.
As we can see, our data is saved in our cluster with our Voyage AI embeddings.
Now that we have successfully stored our data in the MongoDB Atlas cluster, we will write a script using the langchain-mongodb package to do semantic search.
Integrate MongoDB Atlas Vector Search with LangChain
At this point, we need to integrate MongoDB Atlas Vector Search with LangChain. We will create a vector search index, perform a semantic search against the vector store, and retrieve the most relevant documents based on embedding similarity. Those documents can then be passed as context to LangChain’s prompts to generate responses.
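The scripts in this tutorial concentrate on retrieval, so here is a rough idea of what the last step, turning retrieved movies into context for a prompt, could look like. This is a plain-Python sketch rather than a LangChain API: the `build_prompt` helper and the prompt wording are assumptions made for illustration, and no model is called.

```python
def build_prompt(question, movies):
    """Combine retrieved movies and the user's question into one prompt string."""
    context_lines = [f"- {movie['title']}: {movie['plot']}" for movie in movies]
    context = "\n".join(context_lines) if context_lines else "(no movies retrieved)"
    return (
        "You recommend holiday movies using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    retrieved = [
        {
            "title": "It's a Wonderful Life",
            "plot": "An angel is sent from Heaven to help a desperately frustrated businessman.",
        }
    ]
    print(build_prompt("What should I watch with my family?", retrieved))
```

The resulting string is what you would hand to a chat or completion model of your choice; grounding the answer in the retrieved plots is what makes the app retrieval-augmented rather than a plain search.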
Create a vector search index
To enable semantic search, we first need to create an Atlas Vector Search index. To do this, click on the “Search & Vector Search” tab on the left sidebar. In the Atlas Search tab, click on the green “Create Search Index” button.
Next, select “Vector Search,” fill in the details, and select the database and collection that the vector search will be based on. The scripts later in this tutorial expect the index to be named vector_index.
Then, choose the path that contains the array of vector embeddings. It should be embedding.
The number of dimensions is 512 because we used Voyage AI’s voyage-3-lite. If choosing a different model, please read up on Voyage AI’s text embeddings documentation to ensure you’re choosing the correct dimensions. For the similarity method, we are using cosine.
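To make the similarity method concrete, here is a small pure-Python sketch of cosine similarity between two vectors. It is only an illustration of the formula, not how Atlas computes it internally; the `cosine_similarity` function is written here for demonstration, and Atlas reports a normalized score, so the values returned by a search will not necessarily equal this raw cosine value.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real voyage-3-lite vectors have 512 dimensions.
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
    print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```

Vectors pointing in the same direction score close to 1 regardless of their length, which is why cosine works well for comparing a short query such as "Santa Claus" with much longer plot descriptions. The length check also shows why the index dimensions must match the embedding model.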
Click “Next,” review all the entered information, and click “Create Vector Search.”
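If you prefer the JSON editor over the visual form, the settings chosen above correspond to an index definition along these lines. This is a sketch that builds the definition as a Python dictionary and prints it as JSON for pasting into Atlas; the variable name is arbitrary, and the index itself should be named vector_index to match the scripts in this tutorial.

```python
import json

# Mirrors the choices made in the UI: the "embedding" path, 512 dimensions
# (voyage-3-lite), and cosine similarity.
vector_index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",
            "numDimensions": 512,
            "similarity": "cosine",
        }
    ]
}

print(json.dumps(vector_index_definition, indent=2))
```

If you switch to a different embedding model, numDimensions is the value that has to change with it.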
Now, let’s create a new file in your project’s root and name it langchain_integration.py. We are using an embeddings object from the Voyage AI documentation.
NB: VectorSearchIndex is now supported as a Django Index.
Query from the MongoDB vector store, retrieval, and answer generation to prompts
#langchain_integration.py
from langchain_voyageai import VoyageAIEmbeddings
from langchain_mongodb.vectorstores import MongoDBAtlasVectorSearch
from pymongo import MongoClient
from dotenv import load_dotenv
import os

load_dotenv()

voyage_api_key = os.getenv("VOYAGE_API_KEY")
# Same variable name as in the .env file and json_upload.py
connection_string = os.getenv("MONGODB_URI")

# Check if environment variables are loaded
assert voyage_api_key, "ERROR: VOYAGE_API_KEY not found in .env file"
assert connection_string, "ERROR: MONGODB_URI not found in .env file"

# Connect to the Atlas cluster
try:
    client = MongoClient(connection_string)
    collection = client["festive_flix_db"]["holiday_movies_collection"]
except Exception as e:
    print(f"ERROR: Failed to connect to MongoDB: {e}")
    exit(1)

embedding = VoyageAIEmbeddings(
    voyage_api_key=voyage_api_key,
    model="voyage-3-lite"
)

# Instantiate the vector store with LangChain configuration
vector_store = MongoDBAtlasVectorSearch(
    collection=collection,
    embedding=embedding,
    index_name="vector_index",
    text_key="plot",
    embedding_key="embedding"
)

# Similarity search
query = "Santa Claus"
print(f"\nSearching for: '{query}'")
try:
    # LangChain handles embedding the query
    results = vector_store.similarity_search_with_score(query, k=3)
    print(f"Found {len(results)} results")
except Exception as e:
    print(f"Error occurred: {e}")
    results = []

# Display results
if results:
    print("\n=== Search Results for the query ===")
    for i, (doc, score) in enumerate(results, 1):
        # Extract metadata from the document
        title = doc.metadata.get("title", "Unknown")
        runtime = doc.metadata.get("runtime", "Unknown")
        genres = doc.metadata.get("genres", [])
        released = doc.metadata.get("released", "Unknown")
        awards = doc.metadata.get("awards", {})
        plot = doc.page_content
        print(f"\nResult {i} (Score: {score:.4f}):")
        print(f"Title: {title}")
        print(f"Plot: {plot}")
        print(f"Runtime: {runtime} minutes")
        print(f"Genres: {', '.join(genres) if isinstance(genres, list) else genres}")
        print(f"Released: {released[:4] if released != 'Unknown' else 'Unknown'}")
        if awards and awards.get("text"):
            print(f"Awards: {awards['text']}")
        print("-" * 50)
else:
    print("No results found.")
Now, when we run the script with the query = "Santa Claus", a similarity search is executed and we see the results in the terminal.
But we need to render it in our browser and make it prettier, so let’s do that.
Render data
#views.py
from django.shortcuts import render
import os
from dotenv import load_dotenv
from langchain_mongodb.vectorstores import MongoDBAtlasVectorSearch
from langchain_voyageai import VoyageAIEmbeddings
from pymongo import MongoClient

load_dotenv()


def search_holiday_movies(request):
    query = request.GET.get("q", "")
    results = []
    if query:
        # Read the API key and connection string (same names as in the .env file)
        voyage_api_key = os.getenv("VOYAGE_API_KEY")
        connection_string = os.getenv("MONGODB_URI")

        # Check if environment variables are loaded
        assert voyage_api_key, "ERROR: VOYAGE_API_KEY not found in .env file"
        assert connection_string, "ERROR: MONGODB_URI not found in .env file"

        # This is the embedding object.
        embeddings = VoyageAIEmbeddings(
            voyage_api_key=voyage_api_key,
            model="voyage-3-lite"
        )

        # Specify our database and collection
        client = MongoClient(connection_string)
        database = client["festive_flix_db"]
        collection = database["holiday_movies_collection"]

        # Vector store with the embeddings model
        vector_store = MongoDBAtlasVectorSearch(
            collection=collection,
            embedding_key="embedding",
            index_name="vector_index",
            text_key="plot",
            embedding=embeddings
        )

        # Similarity search with the user's actual query
        print(f"\nSearching for: '{query}'")
        try:
            search_results = vector_store.similarity_search_with_score(query, k=3)
            print(f"Found {len(search_results)} results")

            # Format results for the template
            results = []
            for doc, score in search_results:
                result = {
                    'title': doc.metadata.get("title", "Unknown"),
                    'plot': doc.page_content,
                    'runtime': doc.metadata.get("runtime", "Unknown"),
                    'genres': doc.metadata.get("genres", []),
                    'released': doc.metadata.get("released", "Unknown"),
                    'awards': doc.metadata.get("awards", {}),
                    'score': score
                }
                results.append(result)
        except Exception as e:
            print(f"Error occurred: {e}")
            results = []

        # Print output to console
        if results:
            print(f"\n=== Search Results for '{query}' ===")
            for i, result in enumerate(results, 1):
                print(f"\nResult {i} (Score: {result['score']:.4f}):")
                print(f"Title: {result['title']}")
                print(f"Plot: {result['plot'][:100]}...")
                print("-" * 50)
        else:
            print(f"No results found for query: '{query}'")

    return render(request, "search_results.html", {"results": results, "query": query})
#festive_flix/templates/search_results.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Holiday Movie Search 🎄</title>
<style>
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
background: #f8f9fa;
color: #333;
line-height: 1.6;
}
.container {
max-width: 800px;
margin: 0 auto;
padding: 40px 20px;
}
.header {
text-align: center;
margin-bottom: 40px;
}
.title {
font-size: 2rem;
color: #2c3e50;
margin-bottom: 10px;
}
.subtitle {
color: #6c757d;
font-size: 1rem;
}
.search-form {
background: white;
padding: 30px;
border-radius: 8px;
box-shadow: 0 2px 10px rgba(156, 151, 151, 0.1);
margin-bottom: 30px;
}
.search-input {
width: 100%;
padding: 12px 16px;
font-size: 16px;
border: 2px solid #e9ecef;
border-radius: 6px;
margin-bottom: 15px;
transition: border-color 0.3s;
}
.search-input:focus {
outline: none;
border-color: #66986d;
}
.search-button {
background: #66986d;
color: white;
padding: 12px 24px;
border: none;
border-radius: 6px;
font-size: 16px;
cursor: pointer;
transition: background-color 0.3s;
}
.search-button:hover {
background: #9da29e;
}
.results-section {
margin-top: 30px;
}
.results-info {
margin-bottom: 20px;
padding: 15px;
background: #d4e6d4;
border-radius: 6px;
border-left: 4px solid #66986d;
}
.movie-card {
background: white;
border-radius: 8px;
padding: 20px;
margin-bottom: 20px;
box-shadow: 0 2px 8px rgba(0,0,0,0.1);
border: 1px solid #e9ecef;
}
.movie-title {
font-size: 1.3rem;
color: #2c3e50;
margin-bottom: 10px;
font-weight: 600;
}
.movie-plot {
color: #555;
margin-bottom: 15px;
line-height: 1.6;
}
.movie-details {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
gap: 10px;
padding-top: 15px;
border-top: 1px solid #e9ecef;
font-size: 0.9rem;
color: #6c757d;
}
.detail-item {
display: flex;
align-items: center;
gap: 5px;
}
.genres {
grid-column: span 2;
margin-top: 10px;
}
.genre-tag {
display: inline-block;
background: #f8f9fa;
color: #495057;
padding: 3px 8px;
border-radius: 12px;
font-size: 0.8rem;
margin: 2px;
border: 1px solid #dee2e6;
}
.similarity-score {
background: #28a745;
color: white;
padding: 4px 8px;
border-radius: 12px;
font-size: 0.8rem;
font-weight: 500;
margin-left: auto;
}
.no-results {
text-align: center;
background: white;
padding: 40px;
border-radius: 8px;
box-shadow: 0 2px 8px rgba(0,0,0,0.1);
}
.no-results-text {
color: #6c757d;
font-size: 1.1rem;
}
</style>
</head>
<body>
<div class="container">
<div class="header">
<h1 class="title">Holiday Movie Search 🎄</h1>
<p class="subtitle">Search for holiday movies by plot, themes, or keywords</p>
</div>
<div class="search-form">
<form method="get" action="">
<input
type="text"
name="q"
class="search-input"
placeholder="Enter keywords like 'elf', 'Santa Claus', 'Christmas magic'..."
value="{{ query }}"
>
<button type="submit" class="search-button">Search Movies</button>
</form>
</div>
{% if query %}
<div class="results-section">
{% if results %}
<div class="results-info">
<strong>Search Query:</strong> "{{ query }}"<br>
<strong>Results Found:</strong> {{ results|length }} movie(s)
</div>
{% for result in results %}
<div class="movie-card">
<div style="display: flex; align-items: center; justify-content: space-between;">
<h3 class="movie-title">{{ result.title }}</h3>
<span class="similarity-score">{{ result.score|floatformat:3 }}</span>
</div>
<p class="movie-plot">{{ result.plot }}</p>
<div class="movie-details">
<div class="detail-item">
<strong>Runtime:</strong> {{ result.runtime }} min
</div>
<div class="detail-item">
<strong>Released:</strong> {{ result.released|slice:":4" }}
</div>
{% if result.genres %}
<div class="genres">
<strong>Genres:</strong>
{% if result.genres|length > 0 %}
{% for genre in result.genres %}
<span class="genre-tag">{{ genre }}</span>
{% endfor %}
{% else %}
<span class="genre-tag">{{ result.genres }}</span>
{% endif %}
</div>
{% endif %}
</div>
</div>
{% endfor %}
{% else %}
<div class="no-results">
<h3 class="no-results-text">No movies found for "{{ query }}"</h3>
<p>Try different keywords like "Santa", "Christmas", "holiday", or "winter"</p>
</div>
{% endif %}
</div>
{% endif %}
</div>
</body>
</html>
Run python manage.py runserver.
Open http://127.0.0.1:8000/ and search for movies by keyword. This assumes your project's urls.py routes that path to the search_holiday_movies view.
Conclusion
Great job following along with this tutorial on building a RAG app that recommends holiday movies based on a similarity search with Django MongoDB Backend, Voyage AI, and LangChain.
In this tutorial, we started off by getting our movies dataset (a JSON file) and creating a scaffold Django project and virtual environment. We then created models based on our sample data, added embeddings with Voyage AI, uploaded the data into the MongoDB Atlas cluster, and integrated MongoDB Atlas Vector Search with LangChain. Finally, we rendered and visualized query results with a simple template.
You can clone the GitHub repo for this project.
Check out another use case of Django MongoDB Backend, Voyage AI, and LangChain.
If you liked this tutorial or have questions or feedback, kindly leave a comment and give this tutorial a reaction at the top! Thank you and happy holidays in advance!