If you’re building an AI-powered semantic search or recommendation system, choosing the right vector database can make or break your performance — and your budget.
MongoDB Atlas now offers vector search as part of its document database, which sounds convenient if you’re already using MongoDB. But there’s a catch: it’s a proprietary service tied to cloud pricing, which can quickly become expensive at scale.
Qdrant, on the other hand, is a purpose-built, open-source vector database designed for blazing-fast similarity search. It gives you the freedom to self-host, fine-tune, and control your infrastructure, without the lock-in or high costs.
For developers working on recommendation engines or semantic search pipelines, the decision isn’t just about features. It’s about choosing between a closed system with predictable ease and an open, flexible platform built for long-term performance and cost efficiency.
MongoDB Atlas makes it easy to get started with vector search, but costs can add up fast once you move beyond small experiments. Until recently, Atlas was the only option for advanced vector search use-cases on MongoDB data.
Atlas vector search runs as an add-on service on top of your regular MongoDB cluster. Even the smallest plan (M10) starts around $60/month for just 2 GB RAM and 10 GB storage. That’s before you add anything else.
If you’re building a vector-heavy app like a recommendation engine or semantic search, you’ll be paying for two layers of infrastructure: MongoDB itself and the search layer on top of it. And since Atlas is fully managed, you can’t tune the hardware for vector workloads or use cheaper compute options like spot instances.
In short: MongoDB Atlas Vector Search is great for convenience, but not for cost optimization or fine-grained performance control.
To compare evenly, we will assume each vector has 1536 dimensions and is stored in FP16 precision (16 bits = 2 bytes per component).

For 100,000 vectors × 1536 dimensions × 2 bytes, roughly 300 MB of memory is needed just to hold the raw vector array.
- Assuming ~2× overhead for the index, metadata, and caching, you might need ~600–1,000 MB (~0.6–1 GB) just for this workload.
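The estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch: the 2× overhead factor is an assumption, and actual usage depends on the index configuration and payload size.

```python
# Back-of-envelope memory estimate for the workload described above.
# Assumptions: 100,000 vectors, 1536 dimensions, FP16 (2 bytes/component).
NUM_VECTORS = 100_000
DIMENSIONS = 1536
BYTES_PER_COMPONENT = 2  # FP16

raw_bytes = NUM_VECTORS * DIMENSIONS * BYTES_PER_COMPONENT
raw_mb = raw_bytes / (1024 ** 2)
print(f"Raw vectors: {raw_mb:.0f} MB")  # ~293 MB

# Assumed ~2x overhead for the HNSW index, metadata, and caching:
estimated_mb = raw_mb * 2
print(f"With ~2x overhead: {estimated_mb:.0f} MB")
```

Swapping in your own vector count and dimension gives a quick first-order sizing check before provisioning hardware.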
Using the Official Migration Tool
Qdrant provides an official migration tool that supports direct MongoDB → Qdrant data transfers.
First, create the target collection in Qdrant with matching vector dimensions:
```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient("http://localhost:6333")
client.create_collection(
    collection_name="migrated_products",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
```
Then, execute the migration using the official Docker image:
```shell
docker run --net=host --rm -it registry.cloud.qdrant.io/library/qdrant-migration mongodb \
    --mongodb.url 'mongodb://localhost:27017' \
    --mongodb.database 'ecommerce' \
    --mongodb.collection 'products' \
    --qdrant.url 'http://localhost:6334' \
    --qdrant.collection 'migrated_products' \
    --migration.batch-size 100
```
Custom Migration Script
For more complex scenarios that require data transformation or field mapping, you can write a custom Python script.
(Ensure that your MongoDB and Qdrant Docker containers are already running as shown in the previous steps.)
1. Set up the environment and install dependencies
```shell
python3 -m venv migration-env
source migration-env/bin/activate  # On Windows: migration-env\Scripts\activate
pip install qdrant-client==1.11.0 pymongo==4.6.0 sentence-transformers==2.7.0 \
    pandas==2.1.4 numpy==1.24.3 python-dotenv==1.0.1
```
2. Option 1: Run the complete migration script at once
```python
# complete_migration.py
import sys

from load_mongodb_data import load_data_to_mongodb
from setup_qdrant import setup_qdrant_collection
from custom_migration_script import custom_migration
from verify_migration import verify_migration


def run_complete_migration():
    """Run the complete end-to-end migration process."""
    print("=== MongoDB to Qdrant Migration ===\n")

    try:
        # Step 1: Set up the Qdrant collection
        print("Step 1: Setting up Qdrant collection...")
        setup_qdrant_collection()
        print("✓ Qdrant collection ready\n")

        # Step 2: Load sample data into MongoDB
        print("Step 2: Loading sample data into MongoDB...")
        doc_count = load_data_to_mongodb()
        print(f"✓ Loaded {doc_count} documents into MongoDB\n")

        # Step 3: Run the migration
        print("Step 3: Running migration...")
        migrated_count = custom_migration()
        print(f"✓ Migrated {migrated_count} documents to Qdrant\n")

        # Step 4: Verify the migration
        print("Step 4: Verifying migration...")
        verification_success = verify_migration()

        if verification_success:
            print("\n🎉 Migration completed successfully!")
        else:
            print("\n❌ Migration verification failed!")

    except Exception as e:
        print(f"❌ Migration failed: {e}")
        return False

    return True


if __name__ == "__main__":
    success = run_complete_migration()
    sys.exit(0 if success else 1)
```
Option 2: Step-by-step migration
If you prefer to run the migration step by step, follow this sequence when executing the Python scripts or notebooks.
Step 1: Create a sample dataset and load it into MongoDB
Run the following script to insert sample data into MongoDB:
```python
# load_mongodb_data.py
import pymongo

from create_sample_data import create_sample_ecommerce_data


def load_data_to_mongodb():
    """Load sample data into MongoDB."""
    # Connect to MongoDB
    client = pymongo.MongoClient("mongodb://localhost:27017/")

    # Create the database and collection
    db = client.ecommerce_db
    collection = db.products

    # Clear existing data
    collection.drop()

    # Generate sample data
    print("Creating sample e-commerce data...")
    products = create_sample_ecommerce_data()

    # Insert the data into MongoDB
    print("Inserting data into MongoDB...")
    result = collection.insert_many(products)
    print(f"Inserted {len(result.inserted_ids)} products into MongoDB")

    # Create indexes for better performance
    collection.create_index("product_id", unique=True)
    collection.create_index("category")
    collection.create_index("price")
    print("Created indexes on MongoDB collection")

    # Verify the data
    total_docs = collection.count_documents({})
    print(f"Total documents in MongoDB: {total_docs}")

    # Show a sample document
    sample_doc = collection.find_one()
    if sample_doc:
        print("\nSample document structure:")
        print(f"Product ID: {sample_doc['product_id']}")
        print(f"Title: {sample_doc['title']}")
        print(f"Category: {sample_doc['category']}")
        print(f"Price: ${sample_doc['price']}")
        print(f"Embedding dimensions: {len(sample_doc['embedding'])}")

    client.close()
    return len(products)


if __name__ == "__main__":
    count = load_data_to_mongodb()
    print(f"Successfully loaded {count} products into MongoDB")
```
Step 2: Create a new collection in Qdrant
Run the following script to initialize the Qdrant collection:
```python
# setup_qdrant.py
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams


def setup_qdrant_collection():
    """Create a Qdrant collection with the proper configuration."""
    # Connect to Qdrant
    client = QdrantClient("http://localhost:6333")

    collection_name = "ecommerce_products"

    # Delete the collection if it already exists
    try:
        client.delete_collection(collection_name)
        print(f"Deleted existing collection: {collection_name}")
    except Exception as e:
        print(f"Collection didn't exist or error deleting: {e}")

    # Create a new collection
    # Note: 384 dimensions to match the all-MiniLM-L6-v2 model
    client.create_collection(
        collection_name=collection_name,
        vectors_config=VectorParams(
            size=384,                  # Dimension of all-MiniLM-L6-v2 embeddings
            distance=Distance.COSINE,  # Cosine similarity
        ),
    )
    print(f"Created Qdrant collection: {collection_name}")

    # Verify the collection was created
    collections = client.get_collections()
    print("Available collections:", [col.name for col in collections.collections])

    # Get collection info
    collection_info = client.get_collection(collection_name)
    print(f"Collection info: {collection_info}")

    return collection_name


if __name__ == "__main__":
    collection_name = setup_qdrant_collection()
    print(f"Qdrant collection '{collection_name}' is ready for migration")
```
Step 3: Run the custom migration script
Use the following script to migrate your data from MongoDB to Qdrant. You can customize the payload, load only required fields, and adjust the batch size as needed.
```python
# custom_migration_script.py
import time

import pymongo
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct


def custom_migration():
    """Custom migration script with data transformation."""
    # Connect to MongoDB
    print("Connecting to MongoDB...")
    mongo_client = pymongo.MongoClient("mongodb://localhost:27017/")
    mongo_db = mongo_client.ecommerce_db
    mongo_collection = mongo_db.products

    # Connect to Qdrant
    print("Connecting to Qdrant...")
    qdrant_client = QdrantClient("http://localhost:6333")
    collection_name = "ecommerce_products"

    # Get the total count for progress tracking
    total_docs = mongo_collection.count_documents({})
    print(f"Total documents to migrate: {total_docs}")

    # Migration settings
    batch_size = 50
    migrated_count = 0

    # Process documents in batches
    cursor = mongo_collection.find({})
    batch_points = []

    print("Starting migration...")
    start_time = time.time()

    for doc in cursor:
        try:
            # Transform the MongoDB document into a Qdrant point
            point_id = int(doc['product_id'])  # Use product_id as the point ID

            # Prepare the payload (metadata)
            payload = {
                'title': doc['title'],
                'description': doc['description'],
                'category': doc['category'],
                'subcategory': doc['subcategory'],
                'price': doc['price'],
                'brand': doc['brand'],
                'rating': doc['rating'],
                'stock': doc['stock'],
                'mongo_id': str(doc['_id']),  # Preserve the original MongoDB ID
            }

            # Create the Qdrant point
            point = PointStruct(id=point_id, vector=doc['embedding'], payload=payload)
            batch_points.append(point)

            # Upsert a batch once it reaches batch_size
            if len(batch_points) >= batch_size:
                qdrant_client.upsert(collection_name=collection_name, points=batch_points)
                migrated_count += len(batch_points)
                progress = (migrated_count / total_docs) * 100
                print(f"Migrated {migrated_count}/{total_docs} documents ({progress:.1f}%)")
                batch_points = []  # Reset the batch

        except Exception as e:
            print(f"Error processing document {doc.get('product_id', 'unknown')}: {e}")
            continue

    # Process any remaining points in the last batch
    if batch_points:
        qdrant_client.upsert(collection_name=collection_name, points=batch_points)
        migrated_count += len(batch_points)

    # Migration summary
    duration = time.time() - start_time
    print("\n=== Migration Summary ===")
    print(f"Total documents migrated: {migrated_count}")
    print(f"Migration duration: {duration:.2f} seconds")
    print(f"Average speed: {migrated_count / duration:.2f} docs/second")

    # Close connections
    mongo_client.close()
    return migrated_count


if __name__ == "__main__":
    migrated = custom_migration()
    print(f"Custom migration completed! Migrated {migrated} documents.")
```
Step 4: Verify the Migration
Finally, run the following script to verify that all functionalities are working as expected:
```python
# verify_migration.py
import numpy as np
import pymongo
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range


def verify_migration():
    """Verify that the migration was successful."""
    # Connect to both databases
    mongo_client = pymongo.MongoClient("mongodb://localhost:27017/")
    mongo_db = mongo_client.ecommerce_db
    mongo_collection = mongo_db.products

    qdrant_client = QdrantClient("http://localhost:6333")
    collection_name = "ecommerce_products"

    print("=== Migration Verification ===\n")

    # 1. Count verification
    mongo_count = mongo_collection.count_documents({})
    qdrant_info = qdrant_client.get_collection(collection_name)
    qdrant_count = qdrant_info.points_count

    print(f"MongoDB documents: {mongo_count}")
    print(f"Qdrant points: {qdrant_count}")
    print(f"Count match: {'✓' if mongo_count == qdrant_count else '✗'}\n")

    # 2. Data integrity check: compare a few documents
    print("=== Data Integrity Check ===")
    sample_docs = list(mongo_collection.find({}).limit(3))

    for doc in sample_docs:
        product_id = int(doc['product_id'])

        # Get the corresponding point from Qdrant
        qdrant_points = qdrant_client.retrieve(
            collection_name=collection_name,
            ids=[product_id],
            with_vectors=True,
            with_payload=True,
        )

        if qdrant_points:
            qdrant_point = qdrant_points[0]
            print(f"Product ID {product_id}:")
            print(f"  MongoDB title: {doc['title']}")
            print(f"  Qdrant title: {qdrant_point.payload['title']}")
            print(f"  Title match: {'✓' if doc['title'] == qdrant_point.payload['title'] else '✗'}")

            # Check embedding similarity
            mongo_vector = np.array(doc['embedding'])
            qdrant_vector = np.array(qdrant_point.vector)
            similarity = np.dot(mongo_vector, qdrant_vector)
            print(f"  Vector similarity: {similarity:.6f} {'✓' if similarity > 0.99 else '✗'}")
            print()

    # 3. Search functionality test
    print("=== Search Functionality Test ===")

    # Use a sample embedding as the query
    sample_doc = mongo_collection.find_one()
    query_vector = sample_doc['embedding']

    # Perform a search in Qdrant
    search_results = qdrant_client.search(
        collection_name=collection_name,
        query_vector=query_vector,
        limit=3,
    )

    print("Search results:")
    for i, result in enumerate(search_results, 1):
        print(f"  {i}. {result.payload['title']} (score: {result.score:.4f})")

    # 4. Filtering test
    print("\n=== Filtering Test ===")

    # Search for electronics with a price filter
    filtered_results = qdrant_client.search(
        collection_name=collection_name,
        query_vector=query_vector,
        query_filter=Filter(
            must=[
                FieldCondition(key="category", match=MatchValue(value="electronics")),
                FieldCondition(key="price", range=Range(gte=50.0)),
            ]
        ),
        limit=3,
    )

    print("Filtered search (electronics, price >= $50):")
    for i, result in enumerate(filtered_results, 1):
        print(f"  {i}. {result.payload['title']} - ${result.payload['price']} (score: {result.score:.4f})")

    mongo_client.close()
    print("\n=== Verification Complete ===")
    return mongo_count == qdrant_count


if __name__ == "__main__":
    success = verify_migration()
    if success:
        print("✓ Migration verification passed!")
    else:
        print("✗ Migration verification failed!")
```
All code files and detailed migration steps are available here: 📂 GitHub: MongoDB-to-Qdrant Migration Repo
Once migration is complete, open Qdrant’s web dashboard to confirm successful data transfer and index creation: 🔗 http://localhost:6333/dashboard
Query Format Comparison
Creating Collections
MongoDB Atlas requires defining a vector search index on a collection, which is then queried through the Aggregation Framework.

The Aggregation Framework is MongoDB's pipeline-based query engine: it processes data through a sequence of stages such as $match, $group, and, in this case, $vectorSearch. Think of it as MongoDB's data transformation and orchestration layer, similar to SQL's GROUP BY, WHERE, and ORDER BY, but expressed as JSON pipeline stages.
```javascript
db.products.createSearchIndex({
  name: "vector_index",
  definition: {
    fields: [{
      type: "knnVector",
      path: "embedding",
      numDimensions: 1536,
      similarity: "cosine"
    }]
  }
})
```
Unlike MongoDB, where vector indexing is tied to Lucene and the aggregation pipeline, Qdrant collections are purpose-built for vector operations and can be optimized independently.
```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient("http://localhost:6333")
client.create_collection(
    collection_name="products",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
```
Filtering and Search Operations
MongoDB Atlas performs vector similarity search using the $vectorSearch stage within the Aggregation Framework.
This allows combining vector search with traditional filters in a single query pipeline.
```javascript
db.products.aggregate([
  {
    "$vectorSearch": {
      "index": "vector_index",
      "path": "embedding",
      "queryVector": queryEmbedding,
      "numCandidates": 100,
      "limit": 10,
      "filter": {
        "category": {"$eq": "electronics"},
        "price": {"$gte": 100}
      }
    }
  }
])
```
Qdrant provides a more modular and flexible approach to search and filtering.
Filters are handled through payload indexes, allowing them to be evaluated before vector similarity is computed, which improves efficiency and scalability on large datasets.
```python
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

search_result = client.search(
    collection_name="products",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="category", match=MatchValue(value="electronics")),
            FieldCondition(key="price", range=Range(gte=100)),
        ]
    ),
    limit=10,
)
```
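For the payload-index-based filtering described above to be efficient, the filtered fields should be indexed explicitly. A minimal sketch, assuming a running Qdrant instance and the same `products` collection with `category` and `price` fields used in this article's examples:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import PayloadSchemaType

client = QdrantClient("http://localhost:6333")

# Index the fields used in filters so Qdrant can evaluate
# filter conditions efficiently alongside the vector search.
client.create_payload_index(
    collection_name="products",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD,  # exact-match filters
)
client.create_payload_index(
    collection_name="products",
    field_name="price",
    field_schema=PayloadSchemaType.FLOAT,    # range filters (gte/lte)
)
```

Without payload indexes the filters still work, but Qdrant may have to scan payloads instead of using an index, which hurts latency on large collections.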
Advanced Features: Multi-Vector Support
Qdrant's multi-vector capability enables storing multiple embeddings per document, which is ideal for late-interaction models such as ColBERT.
You can define named vectors (multiple embeddings per point for different modalities) within the same collection. This supports text, image, or token-level embeddings for more context-aware retrieval.
```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient("http://localhost:6333")

# Create a collection with two named vectors: "text" and "image"
client.create_collection(
    collection_name="docs_multimodal",
    vectors_config={
        "text": VectorParams(size=768, distance=Distance.COSINE),
        "image": VectorParams(size=512, distance=Distance.COSINE),
    },
)

# Insert a point carrying both text and image embeddings
point = PointStruct(
    id=1,
    vector={
        "text": text_embedding,    # e.g. list[float] of length 768
        "image": image_embedding,  # e.g. list[float] of length 512
    },
    payload={"title": "A cat on a sofa", "tags": ["pet", "sofa"]},
)
client.upsert(collection_name="docs_multimodal", points=[point])

# Query against a single modality (here: "text")
response = client.search(
    collection_name="docs_multimodal",
    query_vector=("text", query_text_embedding),
    limit=5,
)
```
MongoDB Atlas, on the other hand, lacks native multi-vector support. Developers must rely on workarounds that increase complexity and degrade performance.
Qdrant’s multimodal embeddings let you store text, image, and audio vectors together — enabling comprehensive, cross-media search.
Sparse Vectors
MongoDB currently supports only a single dense vector field per index. It does not natively handle multi-vector or sparse vector representations.
```javascript
db.documents.createSearchIndex({
  name: "vector_index",
  definition: {
    fields: [
      {
        type: "knnVector",
        path: "embedding",
        numDimensions: 1536,
        similarity: "cosine"
      }
    ]
  }
});
```
Qdrant supports dense, sparse, and multi-modal vectors within the same collection.
You can define named vectors for different modalities, plus sparse vectors for lexical and learned sparse models such as BM25 or miniCOIL.
```python
from qdrant_client.models import Distance, SparseVectorParams, VectorParams

client.create_collection(
    collection_name="documents",
    vectors_config={
        "text": VectorParams(size=768, distance=Distance.COSINE),
        "image": VectorParams(size=512, distance=Distance.DOT),
    },
    sparse_vectors_config={
        "bm25": SparseVectorParams(),
    },
)
```
This architecture allows Qdrant to handle dense, sparse, and multimodal embeddings in one optimized collection — far more flexible than MongoDB’s single-vector design.
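To make this concrete, here is a sketch of upserting a single point that carries both dense named vectors and a BM25-style sparse vector into the `documents` collection defined above. The embedding values and sparse indices/weights are placeholders; in practice they would come from your dense and sparse encoders.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct, SparseVector

client = QdrantClient("http://localhost:6333")

point = PointStruct(
    id=1,
    vector={
        # Dense named vectors (dimensions must match the collection config)
        "text": [0.01] * 768,
        "image": [0.02] * 512,
        # Sparse vector: only non-zero term ids and their weights are stored
        "bm25": SparseVector(indices=[17, 102, 4031], values=[1.2, 0.7, 0.3]),
    },
    payload={"title": "Sample document"},
)
client.upsert(collection_name="documents", points=[point])
```

Because sparse vectors store only non-zero entries, they scale to vocabulary-sized dimensions without the memory cost of dense embeddings.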
To Summarize
Migrating from MongoDB Atlas to Qdrant isn’t just a database change; it’s a strategic upgrade that can:
- Cut operational costs by 60–90%
- Improve vector search performance and scalability
- Unlock multi-vector and multimodal search capabilities
- Avoid vendor lock-in with a truly open-source system
Qdrant's HNSW indexing, vector quantization, and horizontal scaling make it a purpose-built engine for AI and recommendation workloads. Its flexible API and modular filtering give developers precise control over retrieval, something MongoDB's document-centric design can't match.
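As one example of that control, HNSW parameters and quantization are configured per collection at creation time. A hedged sketch follows; the collection name and parameter values here are illustrative, not tuned recommendations:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, HnswConfigDiff, ScalarQuantization,
    ScalarQuantizationConfig, ScalarType, VectorParams,
)

client = QdrantClient("http://localhost:6333")

client.create_collection(
    collection_name="products_tuned",  # hypothetical collection name
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    # HNSW graph parameters: higher m / ef_construct improve recall
    # at the cost of more memory and slower index builds
    hnsw_config=HnswConfigDiff(m=16, ef_construct=100),
    # int8 scalar quantization cuts vector memory roughly 4x vs float32
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(type=ScalarType.INT8, always_ram=True),
    ),
)
```

This per-collection tuning is exactly what a fully managed service like Atlas does not expose.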
For organizations serious about AI-driven search, Qdrant provides a scalable, cost-effective, and open foundation that grows with your needs.
Thank you for reading!
GitHub: https://github.com/harshchan/MongoDB-to-qdrant-migration
Video demonstration of the workflow: https://drive.google.com/file/d/1ll-AmKD0HJMjRBnZPmljELra-R56rHEa/view
More Articles:
- VectorDB Internals for Engineers: What You Need to Know
- Agentic Frameworks for Beginners: The blueprint to build smart agents
- Retrieval-Augmented Generation (RAG): Optimizing Pipelines and Experiments for Better AI Responses (A Deep Dive)
- Why Does Your LLM Application Hallucinate?
- The Silent Threats: How LLMs Are Leaking Your Sensitive Data