Part 2: Vector Database Concepts - Embeddings: Turning Meaning Into Something Machines Can Search

Search used to be mostly about matching words. If you searched for "reset password," a system looked for pages containing those exact words. That works for simple keyword search — but it struggles with meaning. A document titled "Recover access to your account" might be exactly what you need, even though it never says "reset password."

Embeddings solve this problem.

An embedding is a list of numbers that represents the meaning of a piece of data. That data could be a sentence, paragraph, product, image, audio clip, code snippet, or user profile. The numbers themselves are not meaningful to humans, but they place related items close together in mathematical space.

For example:

"How do I reset my password?"
"I forgot my login credentials."
"Change my account password."

These sentences use different words, but they mean similar things. A good embedding model will place their vectors near each other.

Meanwhile:

"What is your refund policy?"

should land somewhere else.

That is the core idea: embeddings let software compare meaning, not just text.

How Embeddings Work

Embeddings are created by an embedding model. You send the model some input text, and it returns a vector:

"How do I reset my password?"

becomes something like:

[0.021, -0.184, 0.772, 0.039, ...]

Real embeddings usually contain hundreds or thousands of numbers. The exact values do not matter to us directly. What matters is the relationship between vectors.

If two vectors are close together, usually measured by cosine similarity or a related distance metric, the original pieces of content are likely similar. If they are far apart, they are probably unrelated.
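
To make "close together" concrete, here is a minimal sketch of cosine similarity, one common way to compare two embeddings. The three short vectors are made up for illustration; they did not come from a real model, and real embeddings have far more dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Made-up 4-dimensional vectors, for illustration only.
reset_password = [0.9, 0.1, 0.0, 0.2]
forgot_login = [0.8, 0.2, 0.1, 0.3]
refund_policy = [0.0, 0.9, 0.8, 0.1]

# The two password-related vectors score much higher than the unrelated pair.
print(cosine_similarity(reset_password, forgot_login))
print(cosine_similarity(reset_password, refund_policy))
```

A score near 1.0 means the vectors point in almost the same direction. A score near 0 means they have little in common.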

This makes embeddings useful for:

Semantic search
Recommendation systems
Duplicate detection
Document clustering
RAG systems
Image search
Code search
Personalization
Anomaly detection

Why Embeddings Matter For AI Applications

Large language models are powerful, but they do not automatically know your private data. If you want an AI assistant to answer questions about your company documents, product catalog, support tickets, or codebase, you need retrieval.

That is where embeddings come in.

A common RAG workflow looks like this:

Split documents into chunks.
Create embeddings for each chunk.
Store those embeddings in a vector database.
Embed the user’s question.
Search for the closest stored chunks.
Send those chunks to the LLM as context.
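
The six steps above can be sketched end to end in a few dozen lines. This is a toy, not a production pipeline: toy_embed is a hypothetical stand-in that only hashes words into buckets, so it captures word overlap rather than meaning. In a real system you would swap in a call to an actual embedding model and a real vector database. All function names here are assumptions made for the example.

```python
import math
import re
import zlib

DIM = 1024  # toy dimensionality, chosen only to keep hash collisions rare

def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def split_into_chunks(document: str) -> list[str]:
    """Step 1: split on blank lines (a deliberately simple chunker)."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]

def build_index(chunks: list[str]) -> list[tuple[list[float], str]]:
    """Steps 2-3: embed each chunk and keep (vector, text) pairs in memory."""
    return [(toy_embed(c), c) for c in chunks]

def retrieve(index: list[tuple[list[float], str]], question: str, k: int = 2) -> list[str]:
    """Steps 4-5: embed the question and return the k closest chunks."""
    q = toy_embed(question)
    # Vectors are normalized, so the dot product equals cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, vec)), text) for vec, text in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

def build_prompt(question: str, context: list[str]) -> str:
    """Step 6: assemble the context that would be sent to an LLM."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

docs = (
    "To configure SSO, open the admin console and choose an identity provider.\n\n"
    "Refunds are issued within 14 days of purchase.\n\n"
    "Password resets are sent to the email address on file."
)
index = build_index(split_into_chunks(docs))
question = "How do I configure SSO?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The shape of the pipeline is the point here. Each function maps to one or two steps, so any of them can be replaced independently.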

So when a user asks:

"How do I configure SSO for enterprise accounts?"

the system can retrieve the most semantically relevant chunks even if the documentation says:

"Setting up single sign-on for organization workspaces"

This is the magic of embeddings: they bridge the gap between human phrasing and machine retrieval.

Embeddings Are Not Understanding, But They Are Useful

It is tempting to say embeddings "understand" meaning. That is a little too generous. Embeddings are statistical representations learned from data. They capture patterns: which words, concepts, images, or code structures tend to appear in similar contexts.

But even if they are not human understanding, they are extremely useful approximations.

They allow applications to answer questions like:

Which documents are related to this question?
Which products are similar to this product?
Which support tickets describe the same issue?
Which code files are relevant to this bug?

Traditional databases are good at exact lookups. Embeddings are good at fuzzy, meaning-based matching.

Choosing An Embedding Model

Different embedding models are optimized for different jobs. Some are general-purpose. Some are multilingual. Some specialize in code. Some handle images + text. Some are smaller and cheaper. Others are larger and more accurate.

Here’s a practical map of model types, with common examples and what they’re best at.

Embedding Model Types

Type	Examples	Best For
General text embeddings	OpenAI text-embedding-3-small, text-embedding-3-large; Google gemini-embedding-001; Cohere embed-v4.0; Voyage voyage-4	RAG, semantic search, support docs, FAQs
Multilingual embeddings	Cohere embed-v4.0, Google gemini-embedding-001, BGE-M3, multilingual E5, Jina embeddings	Search across many languages
Code embeddings	Mistral codestral-embed, Voyage voyage-code-3, BGE-Code	Code search, coding assistants, repo Q&A
Multimodal embeddings	Cohere embed-v4.0, Jina jina-embeddings-v4, CLIP/SigLIP-style models	Text + image search, PDF search
Open-source embeddings	Sentence Transformers, BGE, E5, Nomic, Jina, GTE/Qwen embeddings	Self-hosting, privacy, more infrastructure control
Domain-specific embeddings	Voyage finance/law/code models, medical-tuned models	Specialized retrieval where vocabulary matters

For many RAG apps, I’d start with a strong general model like OpenAI text-embedding-3-small, Google gemini-embedding-001, Cohere embed-v4.0, or Voyage voyage-4, then evaluate retrieval quality on your actual data.

The right model depends on your use case:

For support docs or internal knowledge bases, use a strong text embedding model.
For global products, use a multilingual model.
For code search, use a code-specific embedding model.
For image or PDF-heavy workflows, consider multimodal embeddings.
For privacy-sensitive systems, consider open-source or self-hosted models.

The important thing is to evaluate the model on your real data. Benchmarks are helpful, but retrieval quality is highly domain-specific.

Chunking: The Hidden Partner Of Embeddings

Embeddings are only as useful as the content you embed.

If you embed an entire 100-page PDF as one vector, the result becomes too broad. The embedding might represent the general topic, but it will not help retrieve one specific paragraph.

That is why documents are usually split into chunks first.

A common default is something like:

500–800 tokens, with ~10–20% overlap

Good chunks are large enough to preserve meaning but small enough to be specific. Tiny chunks can lose context. Huge chunks can retrieve too much irrelevant material.
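
Here is a minimal sketch of fixed-size chunking with overlap. It assumes the text has already been turned into a list of tokens. The example uses placeholder words instead of real model tokens, which is only an approximation, and the 600/90 defaults are just one point inside the range above (90 is 15% of 600).

```python
def chunk_with_overlap(tokens: list[str], chunk_size: int = 600, overlap: int = 90) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks

# Placeholder "tokens" standing in for a 1,500-token document.
words = [f"w{i}" for i in range(1500)]
chunks = chunk_with_overlap(words, chunk_size=600, overlap=90)
print([len(c) for c in chunks])  # [600, 600, 480]
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some tokens twice.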

Here are the most common chunking approaches and how they work.

Chunking Approaches

Chunking Type	Uses A Model?	How It Works
Fixed-size chunking	No	Split every N characters/tokens, often with overlap
Recursive chunking	No	Try paragraphs first, then sentences, then words until chunks fit
Sentence/paragraph chunking	Light NLP/tokenizer	Split on sentence or paragraph boundaries
Structure-aware chunking	Sometimes	Use headings, Markdown, HTML, PDF layout, sections
Semantic chunking	Yes, embedding model	Embed sentences/paragraphs and split when meaning changes
LLM-based chunking	Yes, LLM	Ask an LLM to split document into meaningful units
Code-aware chunking	Parser/model optional	Split by functions, classes, modules, symbols
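
As an illustration of the recursive row in the table, here is a rough sketch that tries paragraph breaks first, then sentence breaks, then spaces, and only falls back to a hard cut when nothing else fits. The function name, the separator list, and the character-based limit are assumptions for this example. Libraries that offer recursive splitting differ in the details.

```python
def recursive_chunk(text: str, max_chars: int = 200,
                    separators: tuple[str, ...] = ("\n\n", ". ", " ")) -> list[str]:
    """Split on the coarsest separator first; use finer ones only for pieces that are still too long."""
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    head, rest = separators[0], separators[1:]
    chunks: list[str] = []
    buffer = ""
    for piece in text.split(head):
        candidate = f"{buffer}{head}{piece}" if buffer else piece
        if len(candidate) <= max_chars:
            buffer = candidate  # keep merging small neighbouring pieces
            continue
        if buffer:
            chunks.append(buffer)  # the separator at this boundary is dropped
            buffer = ""
        if len(piece) <= max_chars:
            buffer = piece
        else:
            chunks.extend(recursive_chunk(piece, max_chars, rest))
    if buffer:
        chunks.append(buffer)
    return chunks

sample = "Short intro paragraph.\n\n" + "This is a sentence. " * 20
pieces = recursive_chunk(sample, max_chars=100)
print(len(pieces), [len(p) for p in pieces])
```

The short first paragraph stays intact as its own chunk, while the long second paragraph is broken at sentence boundaries instead of mid-word.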

Chunking is one of the most underrated parts of building a good vector search system. If retrieval feels "random," the cause is often chunking (or metadata), not the vector database.

The Role Of Vector Databases

Once you have embeddings, you need somewhere to store and search them. That is what vector databases do.

A vector database stores:

The embedding vector
The original text or content
Metadata like document ID, title, page number, author, tenant, date, or category

When a query comes in, the database finds the nearest vectors using similarity search.

This allows you to ask:

Find the 5 chunks most similar to this question.

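and get results back in milliseconds, typically because the database uses an approximate nearest-neighbor (ANN) index instead of comparing the query against every stored vector.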
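
To show what "find the nearest vectors" means, here is a small in-memory sketch. It stores the three things listed above (vector, text, metadata) and does an exact, brute-force scan. The Record class, the top_k function, and the sample data are all hypothetical. A real vector database would normally replace the scan with an index so it scales to millions of vectors.

```python
import heapq
import math
from dataclasses import dataclass, field

@dataclass
class Record:
    """One stored item: the vector, the original text, and its metadata."""
    vector: list[float]
    text: str
    metadata: dict = field(default_factory=dict)

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def top_k(records: list[Record], query_vector: list[float], k: int = 5) -> list[tuple[float, Record]]:
    """Exact nearest-neighbor search: score every record, keep the k best."""
    scored = ((cosine(query_vector, r.vector), r) for r in records)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])

# Made-up 3-dimensional vectors and metadata, for illustration only.
store = [
    Record([0.9, 0.1, 0.0], "Reset your password", {"doc_id": "kb-1", "page": 3}),
    Record([0.0, 1.0, 0.0], "Refund policy", {"doc_id": "kb-2", "page": 1}),
    Record([0.7, 0.7, 0.0], "Account settings", {"doc_id": "kb-3", "page": 8}),
]
results = top_k(store, query_vector=[1.0, 0.0, 0.0], k=2)
for score, record in results:
    print(f"{score:.3f}  {record.text}  {record.metadata}")
```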

Common Mistakes With Embeddings

The first mistake is assuming embeddings are magic. They are not. You still need good data, clean parsing, sensible chunking, metadata, evaluation, and sometimes reranking.

The second mistake is using only vector search. Keyword search is still valuable, especially for exact terms like product IDs, error codes, names, dates, and acronyms. Many strong systems use hybrid search, combining keyword search with embedding-based search.
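
One common way to combine the two result lists is reciprocal rank fusion (RRF), sketched below. It is only one option among several, and the document IDs are invented for the example. The constant k=60 is the value commonly used with RRF; treat it as a starting point rather than a tuned setting.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs: each list adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results for the same query from two different retrievers.
keyword_hits = ["doc_err42", "doc_faq", "doc_sso"]
vector_hits = ["doc_sso", "doc_err42", "doc_billing"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['doc_err42', 'doc_sso', 'doc_faq', 'doc_billing']
```

Documents that rank well in both lists rise to the top, and RRF only needs rank positions, so the keyword and vector scores never have to be put on the same scale.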

The third mistake is ignoring metadata. If a user should only see documents from their workspace, you need filtering. If freshness matters, you may need to boost recent documents. If source quality differs, you may need ranking rules.
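
A minimal sketch of the workspace case: filter by metadata first, then rank only what the user is allowed to see. The "tenant" field name, the record layout, and the sample data are assumptions for illustration, and the vectors are assumed to be unit length so a dot product can stand in for cosine similarity.

```python
def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def search_for_tenant(records: list[dict], query_vector: list[float],
                      tenant: str, k: int = 5) -> list[dict]:
    """Drop records the caller may not see, then rank what is left by similarity."""
    visible = [r for r in records if r["metadata"].get("tenant") == tenant]
    return sorted(visible, key=lambda r: dot(query_vector, r["vector"]), reverse=True)[:k]

# Made-up unit-length vectors; the closest match belongs to a different tenant.
records = [
    {"vector": [1.0, 0.0], "text": "Globex SSO runbook", "metadata": {"tenant": "globex"}},
    {"vector": [0.8, 0.6], "text": "Acme SSO setup guide", "metadata": {"tenant": "acme"}},
    {"vector": [0.0, 1.0], "text": "Acme holiday schedule", "metadata": {"tenant": "acme"}},
]
hits = search_for_tenant(records, query_vector=[1.0, 0.0], tenant="acme", k=2)
print([h["text"] for h in hits])  # ['Acme SSO setup guide', 'Acme holiday schedule']
```

The Globex document is the best semantic match, but it never reaches the ranking step. Filtering before ranking is what keeps one workspace's content out of another's results.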

The fourth mistake is never testing retrieval. Create a small evaluation set of real user questions and expected documents. Without evaluation, you are flying by vibes, which is fun until production starts asking questions back.
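
A small evaluation harness does not need to be elaborate. The sketch below computes a hit rate at k: the fraction of questions where at least one expected document shows up in the top k results. The questions, document IDs, and fake_retrieve function are hypothetical placeholders; in practice you would pass in your real retriever.

```python
from typing import Callable

def hit_rate_at_k(eval_set: list[tuple[str, set[str]]],
                  retrieve: Callable[[str, int], list[str]],
                  k: int = 5) -> float:
    """Fraction of questions for which at least one expected document appears in the top k."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    hits = 0
    for question, expected_ids in eval_set:
        retrieved = retrieve(question, k)[:k]
        if expected_ids.intersection(retrieved):
            hits += 1
    return hits / len(eval_set)

# Hypothetical questions, expected document IDs, and canned retriever output.
eval_set = [
    ("How do I reset my password?", {"kb-1"}),
    ("What is your refund policy?", {"kb-2"}),
    ("How do I configure SSO?", {"kb-7", "kb-9"}),
]
canned_results = {
    "How do I reset my password?": ["kb-1", "kb-4"],
    "What is your refund policy?": ["kb-5", "kb-2"],
    "How do I configure SSO?": ["kb-3", "kb-4"],
}

def fake_retrieve(question: str, k: int) -> list[str]:
    """Stand-in for a real retriever: returns canned document IDs."""
    return canned_results[question][:k]

print(hit_rate_at_k(eval_set, fake_retrieve, k=2))  # two of three questions hit
```

Rerunning the same set after every change to chunking, the embedding model, or the filters turns "it feels better" into a number you can compare.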

The Big Picture

Embeddings are one of the foundational ideas behind modern AI applications. They let us turn messy, human data into searchable mathematical representations.

They do not replace databases. They do not replace search engines. They do not replace LLMs. Instead, they connect them.

In simple terms:

Embeddings turn meaning into numbers.
Vector databases search those numbers.
LLMs use the retrieved context to produce useful answers.

That is why embeddings matter. They are the quiet layer that makes AI systems feel like they know where to look.