Part 3: Vectors And Similarity Search: How Machines Find “Close Enough”

Databases are usually good at exact answers.

Find user where id = 123
Find orders where status = 'paid'
Find products where category = 'laptops'

But many useful questions are not exact.

Which documents are similar to this question?
Which products look like this product?
Which support tickets describe the same bug?
Which users have similar behavior?
Which image matches this description?

These are “close enough” questions. They are not about exact matches. They are about similarity.

That is where vectors and similarity search come in.

What Is A Vector?

A vector is a list of numbers.

For example:

[0.12, -0.44, 0.89, 0.03]

In real AI systems, vectors are usually much longer. They may have 384, 768, 1536, 3072, or more dimensions.

Each number is a coordinate in a high-dimensional space. That sounds abstract, but the intuition is simple: every piece of data becomes a point on a map.

A sentence becomes a point.
A document chunk becomes a point.
A product becomes a point.
An image becomes a point.

Similar items should land near each other. Different items should land far apart.

So if we embed these sentences:

“How do I reset my password?”
“I forgot my password.”
“Can I change my login credentials?”

their vectors should be close together.

But this sentence:

“What is your annual billing policy?”

should be farther away.

Vectors are the machine-readable form. Similarity search is how we find nearby points.

What Is Similarity Search?

Similarity search means finding the items closest to a query.

The query might be text:

“How do I recover my account?”

The system turns it into a vector, then searches a database of stored vectors.

Instead of asking:

Which documents contain these exact words?

it asks:

Which vectors are closest to this query vector?

The closest vectors are returned as the most semantically relevant results.

This is why vector search can find a document about “account recovery” even if the user typed “forgot password.”
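
To make the idea concrete, here is a minimal sketch of that lookup. The three-dimensional vectors are hand-written stand-ins for real embeddings (an assumption for illustration; a real model would produce hundreds of dimensions), and the names `documents` and `search` are hypothetical.

```python
import math

# Hand-made 3-D vectors standing in for real embeddings (hypothetical values).
documents = {
    "Account recovery steps": [0.9, 0.1, 0.0],
    "Resetting a forgotten password": [0.8, 0.2, 0.1],
    "Annual billing policy": [0.1, 0.9, 0.2],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vector, k=2):
    """Compare the query against every stored vector and keep the k closest."""
    scored = [
        (cosine_similarity(query_vector, vector), title)
        for title, vector in documents.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:k]]

# Pretend this is the embedding of "forgot password".
query_vector = [0.85, 0.15, 0.05]
results = search(query_vector)
```

With these made-up numbers, the two account-related documents come back and the billing policy does not, even though no words were compared at all.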

A Simple Mental Model

Imagine a giant map.

On this map:

Password reset articles are clustered together.
Billing articles are clustered somewhere else.
API authentication docs form another neighborhood.
Refund policy docs sit in their own area.

When a user asks a question, we place the question on the same map.

Then we ask:

What are the nearest points?

Those nearest points are the search results.

The map has many more dimensions than a normal 2D map, but the idea is the same.

Distance Metrics

To find nearby vectors, the database needs a way to measure distance or similarity.

The most common metrics are:

Metric	What It Measures	Common Use
Cosine similarity	Direction/alignment between vectors	Text embeddings, semantic search
Dot product	Alignment and magnitude	Some embedding models and recommendation systems
Euclidean distance	Straight-line distance	Geometry-style similarity, some ML systems

Cosine similarity is especially common for text. It cares about the angle between vectors, not their raw size. If two vectors point in a similar direction, they are considered similar.

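Dot product is also widely used, especially when the embedding model was trained for it. Unlike cosine similarity, it is sensitive to vector length; for unit-normalized vectors the two produce the same ranking. It is cheap to compute, but you should still use the metric recommended by your embedding model.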

Euclidean distance measures the straight-line distance between points. It is intuitive, but it is sensitive to vector length, so it is not always the best choice for embeddings that are not normalized.

The rule of thumb: do not randomly choose a metric. Use the one your embedding model expects.
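
The three metrics are short enough to write out by hand. The sketch below uses plain Python lists and two made-up vectors that point in the same direction but differ in length, which is exactly the case where the metrics disagree.

```python
import math

def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot_product(a, a))

def cosine_similarity(a, b):
    return dot_product(a, b) / (norm(a) * norm(b))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(a):
    """Scale a vector to length 1."""
    n = norm(a)
    return [x / n for x in a]

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, twice the length

cos_ab = cosine_similarity(a, b)    # about 1.0: identical direction
dot_ab = dot_product(a, b)          # 28.0: grows with vector length
dist_ab = euclidean_distance(a, b)  # sqrt(14): the points are still far apart

# Once both vectors are unit-normalized, dot product equals cosine similarity.
dot_unit = dot_product(normalize(a), normalize(b))
```

Cosine similarity calls these two vectors a perfect match, while Euclidean distance does not. That disagreement is why the metric has to match what the embedding model was trained with.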

Exact Search Versus Approximate Search

In a small dataset, similarity search can be exact.

If you have 1,000 vectors, the database can compare the query vector against every stored vector and return the nearest ones.

But what if you have 10 million vectors?

Comparing against every vector becomes expensive. That is why many vector databases use approximate nearest neighbor search, often called ANN.

ANN does not guarantee the mathematically exact nearest results. Instead, it tries to return very good results much faster.

This tradeoff matters:

Exact search = highest accuracy, slower at scale
Approximate search = much faster, slightly imperfect

For most large-scale applications, approximate search is the right choice. Users usually care that the results are useful, not that the database found the theoretically perfect nearest vector.

Indexes: Making Search Fast

A vector index is a data structure that makes similarity search efficient.

Without an index, the database may need to scan every vector. With an index, it can navigate toward likely neighbors quickly.

Common index families include:

HNSW: graph-based, popular for fast and accurate ANN search
IVF: partitions vectors into clusters, then searches likely clusters
ScaNN: quantization-based approach from Google, optimized for scalable nearest-neighbor search
Disk-based indexes: designed for very large datasets where not everything fits in memory

You do not always need to understand every indexing detail to build a good application. But you do need to understand the tradeoff: indexes usually balance speed, memory, recall, and update cost.
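
As a rough illustration of the IVF idea, here is a toy index that buckets vectors by their nearest centroid and searches only the closest buckets. It is a sketch under simplifying assumptions: the centroids are hand-picked (real IVF indexes typically learn them with k-means), and `TinyIVFIndex` and `nprobe` are illustrative names, not any particular library's API.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class TinyIVFIndex:
    """Toy inverted-file index: each vector is stored in its nearest centroid's bucket."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.buckets = [[] for _ in centroids]

    def _ranked_centroids(self, vector):
        # Bucket numbers ordered from nearest centroid to farthest.
        return sorted(
            range(len(self.centroids)),
            key=lambda i: euclidean(vector, self.centroids[i]),
        )

    def add(self, item_id, vector):
        nearest = self._ranked_centroids(vector)[0]
        self.buckets[nearest].append((item_id, vector))

    def search(self, query, k=1, nprobe=1):
        # Scan only the nprobe buckets whose centroids are closest to the query.
        candidates = []
        for i in self._ranked_centroids(query)[:nprobe]:
            candidates.extend(self.buckets[i])
        candidates.sort(key=lambda item: euclidean(query, item[1]))
        return [item_id for item_id, _ in candidates[:k]]

index = TinyIVFIndex(centroids=[[0.0, 0.0], [10.0, 10.0]])
index.add("a", [1.0, 1.0])
index.add("b", [2.0, 0.5])
index.add("c", [9.0, 9.5])
index.add("d", [4.9, 4.9])  # lands in the first bucket, close to the boundary

fast = index.search([5.5, 5.5], k=1, nprobe=1)      # scans one bucket
thorough = index.search([5.5, 5.5], k=1, nprobe=2)  # scans both buckets
```

Here `fast` returns "c" while `thorough` returns "d", the true nearest neighbor. Probing fewer buckets is quicker but can miss a neighbor that sits just across a cluster boundary, which is the speed versus recall tradeoff in miniature.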

Recall And Latency

Two important terms show up again and again in vector search: recall and latency.

Recall asks:

Of the truly relevant results, how many did we retrieve?

Latency asks:

How long did the search take?

Higher recall usually costs more time, memory, or compute. Lower latency often means accepting some approximation.

A search system is always a balancing act:

Fast enough for users
Accurate enough to be useful
Cheap enough to operate
Fresh enough to trust

There is no single perfect setting. You tune based on the product.

A customer support chatbot may need strong recall because missing the right answer hurts quality. A recommendation feed may accept more approximation because there are many acceptable results.
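
Both numbers are easy to measure. One common approach for approximate indexes, sketched below, is to treat the exact search results as ground truth and check how many of them the approximate search also returned. The document IDs and the helper names `recall_at_k` and `timed` are made up for illustration.

```python
import time

def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k that the approximate search also returned.

    Assumes k >= 1 and that exact_ids is not empty.
    """
    exact_top = set(exact_ids[:k])
    found = exact_top & set(approx_ids[:k])
    return len(found) / len(exact_top)

def timed(search_fn, *args):
    """Run a search function and report its latency in milliseconds."""
    start = time.perf_counter()
    result = search_fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# Hypothetical results for one test query.
exact_ids = ["doc7", "doc2", "doc9", "doc4", "doc1"]
approx_ids = ["doc7", "doc9", "doc3", "doc4", "doc8"]

recall = recall_at_k(approx_ids, exact_ids, k=5)  # 3 of 5 found -> 0.6
```

Averaging these two measurements over a set of test queries gives you a concrete way to compare index settings instead of guessing.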

Metadata Filtering

Similarity alone is not enough.

Suppose a user asks:

“How do I configure billing?”

The closest document might be from another customer’s workspace. That would be a security problem.

So vector databases usually support metadata filters:

{
  "tenant_id": "customer_123",
  "doc_type": "billing",
  "language": "en"
}

This lets the system search only within allowed or relevant records.

Metadata filtering is essential for:

Multitenant SaaS apps
Enterprise search
Permissions
Freshness rules
Product categories
Regions and languages
Document types

Good vector search is rarely just “nearest vector wins.” It is usually nearest vector plus filters, ranking rules, and sometimes reranking.
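
A minimal sketch of filter-then-rank is shown below. The records, field names, and `filtered_search` helper are assumptions for illustration. Real vector databases apply filters inside the index rather than with a Python list comprehension, but the ordering of concerns is the same: decide what the caller may see, then rank by similarity.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matches(metadata, filters):
    """True when every filter key has exactly the requested value."""
    return all(metadata.get(key) == value for key, value in filters.items())

def filtered_search(records, query_vector, filters, k=3):
    # Step 1: keep only the records the caller is allowed to see.
    allowed = [r for r in records if matches(r["metadata"], filters)]
    # Step 2: rank the survivors by similarity to the query.
    allowed.sort(key=lambda r: dot(query_vector, r["vector"]), reverse=True)
    return [r["id"] for r in allowed[:k]]

# Hypothetical records; vectors are tiny stand-ins for real embeddings.
records = [
    {"id": "a1", "vector": [0.8, 0.6],
     "metadata": {"tenant_id": "customer_123", "doc_type": "billing"}},
    {"id": "a2", "vector": [0.0, 1.0],
     "metadata": {"tenant_id": "customer_123", "doc_type": "api"}},
    {"id": "b1", "vector": [1.0, 0.0],
     "metadata": {"tenant_id": "customer_456", "doc_type": "billing"}},
]

query_vector = [1.0, 0.0]
unfiltered = filtered_search(records, query_vector, filters={}, k=1)
scoped = filtered_search(
    records,
    query_vector,
    filters={"tenant_id": "customer_123", "doc_type": "billing"},
    k=1,
)
```

Without a filter, the top hit is "b1", which belongs to a different tenant. With the filter, the search returns "a1", the best match the caller is actually allowed to see.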

Hybrid Search

Vector search is powerful, but keyword search still matters.

Imagine a user searches for:

ERR_CONN_504

An embedding model may not represent this exact error code well. Keyword search will match it literally.

Or a user searches for:

Invoice INV-2026-00482

Again, exact matching matters.

That is why many production systems use hybrid search, combining:

Keyword search + vector search

Vector search captures meaning. Keyword search captures exact terms. Together, they are often better than either alone.
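
One way to combine the two result lists is reciprocal rank fusion (RRF), which scores each document by its rank in every list rather than by raw scores. This is just one fusion technique among several, chosen here because it is simple; the document IDs are made up, and k=60 is a commonly used default rather than a required value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each appearance adds 1 / (k + rank) to a document's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results for the query "ERR_CONN_504".
keyword_hits = ["err_504_runbook", "gateway_timeouts", "status_page"]
vector_hits = ["gateway_timeouts", "network_troubleshooting", "err_504_runbook"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear in both lists rise to the top of `fused`, while documents found by only one method still make the list further down.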

Reranking

A common retrieval pipeline looks like this:

Retrieve 50 candidates using vector or hybrid search.
Rerank those 50 candidates using a stronger, slower model.
Return the best 5.

The first search step is fast and broad. The reranker is slower but more precise.

Reranking is especially useful when the initial vector search brings back results that are roughly related but not perfectly ordered.

In RAG systems, reranking can significantly improve answer quality because the LLM sees better context.
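
The two-stage shape can be sketched in a few lines. Everything here is a stand-in: `vector_score` plays the fast first stage, and `overlap_score` is a simple word-overlap function pretending to be a stronger reranking model. A real system would plug in an actual reranker at that point.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def vector_score(query, doc):
    """Fast first-stage score: similarity between precomputed vectors."""
    return dot(query["vector"], doc["vector"])

def overlap_score(query, doc):
    """Stand-in for a stronger reranker: share of query words found in the text."""
    query_words = set(query["text"].lower().split())
    doc_words = set(doc["text"].lower().split())
    return len(query_words & doc_words) / len(query_words)

def retrieve_then_rerank(query, corpus, n_candidates=50, n_final=5):
    # Stage 1: broad, cheap retrieval.
    candidates = sorted(
        corpus, key=lambda d: vector_score(query, d), reverse=True
    )[:n_candidates]
    # Stage 2: slower, more precise reordering of the small candidate set.
    reranked = sorted(
        candidates, key=lambda d: overlap_score(query, d), reverse=True
    )
    return [d["id"] for d in reranked[:n_final]]

# Hypothetical corpus with made-up vectors.
corpus = [
    {"id": "d1", "text": "how to reset a forgotten password", "vector": [0.7, 0.3]},
    {"id": "d2", "text": "password strength requirements", "vector": [0.9, 0.1]},
    {"id": "d3", "text": "annual billing policy", "vector": [0.1, 0.9]},
]
query = {"text": "reset forgotten password", "vector": [1.0, 0.0]}

best = retrieve_then_rerank(query, corpus, n_candidates=2, n_final=1)
```

In this toy data the first stage ranks "d2" above "d1", but the reranker promotes "d1" because it matches the whole query. The expensive scoring only ever runs on the small candidate set, never on the full corpus.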

Where Similarity Search Is Used

Similarity search appears in many AI systems:

RAG: find document chunks relevant to a user question
Recommendations: find similar products, posts, songs, or videos
Duplicate detection: find similar support tickets or documents
Image search: search by image or natural language description
Code search: find relevant functions, files, or examples
Personalization: match users to content or offers
Fraud and anomaly detection: find unusual behavior patterns
Clustering: group similar documents or users

Any time you ask “what is similar to this?”, vectors may be involved.

Common Mistakes

One mistake is assuming vector search replaces keyword search. It does not. It complements it.

Another mistake is ignoring the embedding model’s recommended distance metric. If a model expects cosine similarity and you use Euclidean distance on unnormalized vectors, the ranking can change and results may suffer.

A third mistake is skipping evaluation. You need test queries and expected results. Otherwise, tuning index parameters becomes guesswork.

A fourth mistake is retrieving too few candidates. Sometimes the best final answer appears only after retrieving a larger candidate set and reranking it.

A fifth mistake is forgetting metadata filters. In real systems, access control and data boundaries matter as much as semantic relevance.

The Big Picture

Vectors let machines represent data as points in mathematical space. Similarity search lets machines find nearby points.

That simple idea powers a surprising amount of modern AI:

Embed the data.
Store the vectors.
Embed the query.
Find the nearest vectors.
Return the most relevant content.
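
Those five steps fit in a short script. In the sketch below, `embed` is a toy stand-in that counts words from a tiny fixed vocabulary, so it only captures word overlap. A real embedding model is what lets "forgot password" land near "account recovery". `VectorStore` is likewise a hypothetical in-memory list, not a real database.

```python
import math

# Tiny fixed vocabulary for the toy embedding (an assumption for illustration).
VOCABULARY = ["password", "reset", "login", "account", "billing", "invoice", "refund"]

def embed(text):
    """Toy stand-in for an embedding model: unit-normalized word counts."""
    words = text.lower().replace("?", " ").replace(".", " ").split()
    counts = [float(words.count(term)) for term in VOCABULARY]
    length = math.sqrt(sum(c * c for c in counts))
    return [c / length for c in counts] if length else counts

class VectorStore:
    """Minimal in-memory store that scans every row on each query."""

    def __init__(self):
        self.rows = []

    def add(self, text):
        # Embed the data, then store the vector next to the original text.
        self.rows.append((embed(text), text))

    def nearest(self, query_text, k=1):
        # Embed the query, find the nearest vectors, return their content.
        query_vector = embed(query_text)
        ranked = sorted(
            self.rows,
            key=lambda row: sum(x * y for x, y in zip(query_vector, row[0])),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]

store = VectorStore()
for doc in [
    "Reset your password from the login page.",
    "Billing and invoice schedule.",
    "Refund policy for annual plans.",
]:
    store.add(doc)

top = store.nearest("How do I reset my password?")
```

Swapping the toy `embed` for a real model and the list for an indexed store changes the quality and the speed, but not the shape of the pipeline.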

The magic is not that the database “understands” like a person. The magic is that meaning can be approximated well enough to make search, recommendations, and AI retrieval feel dramatically better than exact matching alone.

Vectors give us the map.
Similarity search gives us the route to the nearest useful thing.