
Befriending Vector Databases · Part 4
Part 4: How Vector Indexes Work Internally: HNSW, IVF, PQ, And The Algorithms Behind Them
May 7, 2026 · 11 min read
A vector database has one brutal problem: given a query vector, find the closest vectors among millions or billions of stored vectors.
The naive solution is simple:
For every vector:
compute distance(query, vector)
return the top k closestThat is exact search. It works beautifully for small datasets. But if you have 100 million vectors, each with 768 dimensions, brute force means a huge number of floating-point operations per query.
Vector indexes exist to avoid scanning everything.
They do this by being a little clever and a little imperfect. Most production vector indexes are approximate nearest neighbor indexes. They trade a small amount of accuracy for a large gain in speed.
The Core Idea
Every vector index answers the same question:
How can we avoid comparing the query against every vector?Different indexes use different data structures:
| Index Type | Data Structure Idea |
|---|---|
| HNSW | Multi-layer proximity graph |
| IVF | K-means clusters plus inverted lists |
| IVFPQ | IVF plus compressed vector codes |
| LSH | Hash tables that preserve locality |
| DiskANN | Graph index optimized for SSD storage |
Let’s start with the two big ones: HNSW and IVF.
HNSW: Search As Graph Navigation
HNSW stands for Hierarchical Navigable Small World.
The data structure is a graph. Each vector is a node. Edges connect nearby vectors.
At the bottom layer, almost every node exists. This layer contains many short-range connections between nearby points.
Above that, HNSW creates sparse layers. Only some nodes appear in higher layers. The higher the layer, the fewer nodes there are. This is similar in spirit to a skip list: upper layers let you move quickly across the space, while lower layers refine the search.
Visually:
Layer 3: A ----------- M
Layer 2: A ----- F ---- M
Layer 1: A -- C -- F -- H -- M
Layer 0: A-B-C-D-E-F-G-H-I-J-K-L-MNot literally a line, of course. In real vector space, this is a graph over high-dimensional points.
HNSW Search
HNSW search starts from an entry point, usually one of the nodes in the top layer.
At upper layers, it performs greedy graph search:
current = entry_point
for each layer from top to bottom:
while there is a neighbor closer to the query:
move to that neighborThis gives a fast rough location.
At the bottom layer, HNSW does a wider search. It maintains candidate nodes using priority queues. One heap tracks nodes to explore. Another tracks the best results found so far.
The parameter efSearch controls how wide this bottom-layer search is.
Higher efSearch = better recall, more work
Lower efSearch = faster search, more missesSo HNSW is not just “walk to the nearest neighbor.” It is closer to a best-first graph traversal with a bounded search budget.
HNSW Insertion
When inserting a new vector, HNSW randomly assigns it a maximum layer. Most vectors only live at the bottom. A few are promoted to higher layers. Very few reach the top.
Then it searches for good neighbors at each layer and connects the new node to them.
The main parameter is M, the maximum number of neighbors per node. Larger M gives more routes through the graph, which usually improves recall, but increases memory and build cost.
During construction, efConstruction controls how carefully neighbors are selected.
Higher efConstruction = better graph, slower indexing
Lower efConstruction = faster indexing, weaker graphFrom a data structures perspective, HNSW is:
Adjacency lists + random levels + greedy search + priority queuesIts strength is fast, high-recall search when the graph fits in memory.
IVF: Search By Partitioning Space
IVF stands for Inverted File.
The name comes from classic search engines. In text search, an inverted index maps a word to a posting list of documents:
"database" -> [doc1, doc7, doc19]IVF does something similar for vectors, but instead of words, it uses cluster centroids.
First, the system runs k-means over training vectors. Suppose we choose nlist = 1000. That means we create 1000 centroids.
Each vector is assigned to its nearest centroid.
centroid_17 -> [vector ids near centroid_17]
centroid_42 -> [vector ids near centroid_42]
centroid_913 -> [vector ids near centroid_913]Each centroid owns an inverted list.
IVF Search
At query time, IVF does not search every vector. It first compares the query to the centroids.
Then it chooses the closest nprobe centroids and scans only those lists.
query -> find nearest centroids
-> scan vectors in those selected lists
-> return top kThe parameter nprobe controls the search width.
Higher nprobe = search more clusters, better recall
Lower nprobe = search fewer clusters, lower latencyA rough cost model looks like this:
Brute force: O(N * d)
IVF: O(nlist * d) + O((N / nlist) * nprobe * d)Where:
N = number of vectors
d = vector dimensions
nlist = number of clusters
nprobe = number of clusters searchedThe approximation comes from cluster boundaries. A true nearest neighbor may live in a nearby cluster that was not probed.
IVF is basically:
k-means partitioning + hash-table-like buckets + partial scanningIt shines when datasets are large, relatively stable, and batch/GPU search matters.
IVFFlat Versus IVFPQ
In IVFFlat, the vectors inside each inverted list are stored as full-precision vectors. Once IVF selects lists to scan, distances are computed against real vectors.
That means:
IVFFlat = approximate cluster selection + exact distance inside selected clustersIn IVFPQ, vectors are compressed using Product Quantization.
PQ splits a vector into smaller subvectors. Each subvector is replaced by the ID of its nearest learned codeword.
For example, a 768-dimensional vector might be split into 96 chunks of 8 dimensions. Each chunk is quantized separately.
Instead of storing:
[0.13, -0.82, 0.44, ...]PQ stores compact codes:
[12, 201, 44, 8, ...]This massively reduces memory. It also speeds up distance approximation using lookup tables.
The tradeoff is accuracy. You are no longer comparing against the original vector, but against a compressed approximation.
IVFFlat = more memory, better accuracy
IVFPQ = less memory, faster scans, more approximationWhy HNSW And IVF Feel So Different
HNSW is graph-first.
It asks:
Can I navigate from a known point to better and better neighbors?IVF is partition-first.
It asks:
Can I divide space into regions and search only the most promising regions?HNSW usually has excellent recall-latency behavior for in-memory search. IVF is often easier to scale to huge datasets, especially when combined with compression and GPU-friendly scanning.
Quick Comparison
| Index | Main Data Structure | Best For | Tradeoff |
|---|---|---|---|
| HNSW | Layered graph | Low-latency, high-recall in-memory search | Higher memory use |
| IVF | Centroids + inverted lists | Large-scale approximate search | Can miss boundary neighbors |
| IVFPQ | IVF + compressed codes | Billion-scale search with limited memory | Lower precision |
| DiskANN | SSD-aware graph | Very large datasets beyond RAM | More complex storage/search design |
| LSH | Locality-preserving hash tables | Some special similarity workloads | Often weaker recall/space tradeoffs for modern embeddings |
Configuring Vector Indexes In Real Databases
The same algorithmic ideas show up across products, but the knobs are named differently. Some systems expose HNSW/IVF directly. Others, like Pinecone, manage the internals and ask you to configure higher-level choices like dimension, metric, cloud, and index type.
| Concept | pgvector | MongoDB Atlas | Milvus | Qdrant | Pinecone |
|---|---|---|---|---|---|
| Distance metric | opclass like vector_cosine_ops | similarity | metric_type | distance | metric |
| HNSW connections | m | maxEdges | M | m | managed |
| HNSW build quality | ef_construction | numEdgeCandidates | efConstruction | ef_construct | managed |
| HNSW query quality | hnsw.ef_search | numCandidates at query time | ef | query/search params | managed/top-k/filtering |
| IVF clusters | lists | not exposed as IVF | nlist | not IVF-style | managed |
| IVF probes | ivfflat.probes | not exposed as IVF | nprobe | not IVF-style | managed |
pgvector
pgvector exposes both HNSW and IVFFlat directly.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE items (
id bigserial PRIMARY KEY,
content text,
embedding vector(1536)
);For HNSW with cosine distance:
CREATE INDEX items_embedding_hnsw_idx
ON items
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);At query time:
SET hnsw.ef_search = 100;
SELECT id, content
FROM items
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;For IVFFlat:
CREATE INDEX items_embedding_ivfflat_idx
ON items
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);At query time:
SET ivfflat.probes = 10;
SELECT id, content
FROM items
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;Rule of thumb: use HNSW for better recall-latency behavior; use IVFFlat when you want faster builds and lower memory, especially after bulk loading data.
Pinecone
Pinecone generally does not ask you to choose HNSW or IVF directly. It manages the index internals. You configure the vector shape, similarity metric, deployment type, and region.
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="YOUR_API_KEY")
pc.create_index(
name="docs-index",
vector_type="dense",
dimension=1536,
metric="cosine",
spec=ServerlessSpec(
cloud="aws",
region="us-east-1"
),
deletion_protection="disabled"
)Querying is configured through things like top_k, metadata filters, namespace, and whether you use dense, sparse, or hybrid retrieval.
index = pc.Index("docs-index")
results = index.query(
vector=query_embedding,
top_k=10,
include_metadata=True,
filter={"tenant_id": {"$eq": "acme"}}
)So with Pinecone, think less “tune HNSW parameters” and more “choose metric, dimension, deployment, metadata model, and retrieval pipeline.”
MongoDB Atlas Vector Search
MongoDB Atlas exposes HNSW-style configuration through Vector Search index definitions.
{
"fields": [
{
"type": "vector",
"path": "embedding",
"numDimensions": 1536,
"similarity": "cosine",
"quantization": "scalar",
"indexingMethod": "hnsw",
"hnswOptions": {
"maxEdges": 16,
"numEdgeCandidates": 100
}
},
{
"type": "filter",
"path": "tenant_id"
}
]
}Query example:
db.docs.aggregate([
{
$vectorSearch: {
index: "docs_vector_index",
path: "embedding",
queryVector: queryEmbedding,
numCandidates: 100,
limit: 10,
filter: { tenant_id: "acme" }
}
}
])In MongoDB terms:
maxEdgesis similar to HNSWMnumEdgeCandidatesis similar toefConstructionnumCandidatescontrols how many candidates are explored at query timequantizationcan benone,scalar, orbinary
Milvus
Milvus exposes many index types directly, including HNSW, IVF_FLAT, IVF_PQ, and DISKANN.
HNSW:
index_params = MilvusClient.prepare_index_params()
index_params.add_index(
field_name="embedding",
index_type="HNSW",
index_name="embedding_hnsw_idx",
metric_type="COSINE",
params={
"M": 16,
"efConstruction": 100
}
)Search:
search_params = {
"params": {
"ef": 100
}
}IVF_FLAT:
index_params.add_index(
field_name="embedding",
index_type="IVF_FLAT",
index_name="embedding_ivf_idx",
metric_type="COSINE",
params={
"nlist": 128
}
)Search:
search_params = {
"params": {
"nprobe": 10
}
}Qdrant
Qdrant uses HNSW as its core vector index and lets you tune it at collection level.
from qdrant_client import QdrantClient, models
client = QdrantClient(url="http://localhost:6333")
client.create_collection(
collection_name="docs",
vectors_config=models.VectorParams(
size=1536,
distance=models.Distance.COSINE
),
hnsw_config=models.HnswConfigDiff(
m=16,
ef_construct=100,
full_scan_threshold=10000
)
)You can also configure quantization:
client.update_collection(
collection_name="docs",
quantization_config=models.ScalarQuantization(
scalar=models.ScalarQuantizationConfig(
type=models.ScalarType.INT8,
quantile=0.99,
always_ram=True
)
)
)The Practical Rule
If the database exposes HNSW knobs:
Increase m / maxEdges / M -> better graph, more memory.
Increase efConstruction / numEdgeCandidates -> better build quality, slower indexing.
Increase efSearch / numCandidates / ef -> better recall, slower queries.If the database exposes IVF knobs:
Increase lists / nlist -> more clusters, smaller scan regions.
Increase probes / nprobe -> search more clusters, better recall, slower queries.If the database is fully managed:
Tune metric, dimensions, metadata filters, top_k, namespaces, reranking, and evaluation.The best configuration is not universal. Start with defaults, build a small retrieval evaluation set, and tune for your actual recall, latency, memory, and cost targets.
The Big Picture
Vector indexes are not magic. They are classic algorithms wearing modern AI clothing.
HNSW borrows from graphs, greedy search, skip-list-like layering, and priority queues.
IVF borrows from clustering, inverted indexes, partitioning, and partial scanning.
PQ borrows from compression, quantization, and lookup-table acceleration.
All of them are trying to spend distance computations where they matter most.
That is the heart of vector indexing:
Do not compare against everything.
Build a data structure that points search toward likely neighbors.
Accept a controlled amount of approximation.
Tune recall, latency, memory, and build cost.Once you see the data structures underneath, vector databases feel much less mysterious. They are search engines built on graphs, clusters, heaps, arrays, and clever shortcuts.