
Befriending Vector Databases · Part 2
Part 2: Vector Database Concepts - Embeddings: Turning Meaning Into Something Machines Can Search
May 7, 2026 · 8 min read
Search used to be mostly about matching words. If you searched for "reset password," a system looked for pages containing those exact words. That works for simple keyword search — but it struggles with meaning. A document titled "Recover access to your account" might be exactly what you need, even though it never says "reset password."
Embeddings solve this problem.
An embedding is a list of numbers that represents the meaning of a piece of data. That data could be a sentence, paragraph, product, image, audio clip, code snippet, or user profile. The numbers themselves are not meaningful to humans, but they place related items close together in mathematical space.
For example:
"How do I reset my password?"
"I forgot my login credentials."
"Change my account password."
These sentences use different words, but they mean similar things. A good embedding model will place their vectors near each other.
Meanwhile:
"What is your refund policy?"
should land somewhere else.
That is the core idea: embeddings let software compare meaning, not just text.
How Embeddings Work
Embeddings are created by an embedding model. You send the model some input text, and it returns a vector:
"How do I reset my password?"
becomes something like:
[0.021, -0.184, 0.772, 0.039, ...]
Real embeddings usually contain hundreds or thousands of numbers. The exact values do not matter to us directly. What matters is the relationship between vectors.
If two vectors are close together, the original pieces of content are likely similar. If they are far apart, they are probably unrelated.
This makes embeddings useful for:
- Semantic search
- Recommendation systems
- Duplicate detection
- Document clustering
- RAG systems
- Image search
- Code search
- Personalization
- Anomaly detection
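To make "close together" concrete: one common way to compare two embeddings is cosine similarity, which measures the angle between them (1.0 means same direction, values near 0 mean unrelated). Here is a minimal sketch. The three vectors are made up for illustration (real ones come from an embedding model and are far longer), and `cosine_similarity` is just a name picked for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 4-dimensional vectors, for illustration only.
reset_password = [0.9, 0.1, 0.0, 0.2]   # "How do I reset my password?"
forgot_login = [0.8, 0.2, 0.1, 0.3]     # "I forgot my login credentials."
refund_policy = [0.1, 0.9, 0.7, 0.0]    # "What is your refund policy?"

similar = cosine_similarity(reset_password, forgot_login)
unrelated = cosine_similarity(reset_password, refund_policy)
print(f"password vs. login:  {similar:.3f}")
print(f"password vs. refund: {unrelated:.3f}")
```

The two password-related vectors score close to 1, while the refund vector scores much lower. That gap is what every use case in the list above relies on.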
Why Embeddings Matter For AI Applications
Large language models are powerful, but they do not automatically know your private data. If you want an AI assistant to answer questions about your company documents, product catalog, support tickets, or codebase, you need retrieval.
That is where embeddings usually come in.
A common RAG workflow looks like this:
- Split documents into chunks.
- Create embeddings for each chunk.
- Store those embeddings in a vector database.
- Embed the user’s question.
- Search for the closest stored chunks.
- Send those chunks to the LLM as context.
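The steps above can be sketched end to end in a few lines. Treat this as a toy under loud assumptions: `toy_embed` is a hypothetical stand-in that only counts words, so unlike a real embedding model it will not match paraphrases; the "vector database" is a plain Python list; and the final prompt is printed instead of being sent to an LLM. All function names are invented for this sketch.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: plain word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def split_into_chunks(document):
    """Step 1: one chunk per paragraph (split on blank lines)."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]

def build_index(chunks):
    """Steps 2 and 3: embed each chunk and store (vector, chunk) pairs."""
    return [(toy_embed(chunk), chunk) for chunk in chunks]

def retrieve(index, question, k=2):
    """Steps 4 and 5: embed the question, return the k closest chunks."""
    query_vec = toy_embed(question)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

def build_prompt(question, chunks):
    """Step 6: pack the retrieved chunks into the context for an LLM."""
    context = "\n---\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

document = (
    "Setting up single sign-on for organization workspaces requires an admin account.\n\n"
    "Refunds are processed within five business days.\n\n"
    "Password resets are sent to the email address on file."
)
question = "How do I set up single sign-on?"

index = build_index(split_into_chunks(document))
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Swapping `toy_embed` for a real embedding model and the list for a vector database changes the quality and the scale, not the shape of the pipeline.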
So when a user asks:
"How do I configure SSO for enterprise accounts?"
the system can retrieve the most semantically relevant chunks even if the documentation says:
"Setting up single sign-on for organization workspaces"
This is the magic of embeddings: they bridge the gap between human phrasing and machine retrieval.
Embeddings Are Not Understanding, But They Are Useful
It is tempting to say embeddings "understand" meaning. That is a little too generous. Embeddings are statistical representations learned from data. They capture patterns: which words, concepts, images, or code structures tend to appear in similar contexts.
But even if they are not human understanding, they are extremely useful approximations.
They allow applications to answer questions like:
Which documents are related to this question?
Which products are similar to this product?
Which support tickets describe the same issue?
Which code files are relevant to this bug?
Traditional databases are good at exact lookups. Embeddings are good at fuzzy meaning.
Choosing An Embedding Model
Different embedding models are optimized for different jobs. Some are general-purpose. Some are multilingual. Some specialize in code. Some handle images + text. Some are smaller and cheaper. Others are larger and more accurate.
Here’s a practical map of model types, with common examples and what they’re best at.
| Type | Examples | Best For |
|---|---|---|
| General text embeddings | OpenAI text-embedding-3-small, text-embedding-3-large; Google gemini-embedding-001; Cohere embed-v4.0; Voyage voyage-4 | RAG, semantic search, support docs, FAQs |
| Multilingual embeddings | Cohere embed-v4.0, Google gemini-embedding-001, BGE-M3, multilingual E5, Jina embeddings | Search across many languages |
| Code embeddings | Mistral codestral-embed, Voyage voyage-code-3, BGE-Code | Code search, coding assistants, repo Q&A |
| Multimodal embeddings | Cohere embed-v4.0, Jina jina-embeddings-v4, CLIP/SigLIP-style models | Mixed text and image search, including PDFs |
| Open-source embeddings | Sentence Transformers, BGE, E5, Nomic, Jina, GTE/Qwen embeddings | Self-hosting, privacy, more control over infrastructure |
| Domain-specific embeddings | Voyage finance/law/code models, medical-tuned models | Specialized retrieval where vocabulary matters |
For many RAG apps, I’d start with a strong general model like OpenAI text-embedding-3-small, Google gemini-embedding-001, Cohere embed-v4.0, or Voyage voyage-4, then evaluate retrieval quality on your actual data.
The right model depends on your use case:
- For support docs or internal knowledge bases, use a strong text embedding model.
- For global products, use a multilingual model.
- For code search, use a code-specific embedding model.
- For image or PDF-heavy workflows, consider multimodal embeddings.
- For privacy-sensitive systems, consider open-source or self-hosted models.
The important thing is to evaluate the model on your real data. Benchmarks are helpful, but retrieval quality is highly domain-specific.
Chunking: The Hidden Partner Of Embeddings
Embeddings are only as useful as the content you embed.
If you embed an entire 100-page PDF as one vector, the result becomes too broad. The embedding might represent the general topic, but it will not help retrieve one specific paragraph.
That is why documents are usually split into chunks first.
A common default is something like:
500–800 tokens, with ~10–20% overlap
Good chunks are large enough to preserve meaning but small enough to be specific. Tiny chunks can lose context. Huge chunks can retrieve too much irrelevant material.
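Here is roughly what that default looks like as fixed-size chunking with overlap. This is a sketch under assumptions: the tokens are placeholder strings (a real pipeline would use the tokenizer that matches its embedding model), and the 600/90 settings are simply one point inside the range above, about 15% overlap. `chunk_tokens` is a name made up for this example.

```python
def chunk_tokens(tokens, chunk_size=600, overlap=90):
    """Split a token list into fixed-size chunks that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this chunk already reaches the end of the document
    return chunks

# Placeholder tokens; assumed stand-in for real tokenizer output.
tokens = [f"t{i}" for i in range(1500)]
chunks = chunk_tokens(tokens)

for i, chunk in enumerate(chunks):
    print(i, chunk[0], chunk[-1], len(chunk))
```

Each chunk starts 510 tokens after the previous one, so the last 90 tokens of one chunk are repeated at the start of the next. That repetition is what keeps a sentence that straddles a boundary retrievable from at least one chunk.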
Here are the most common chunking approaches and how they work.
| Chunking Type | Uses A Model? | How It Works |
|---|---|---|
| Fixed-size chunking | No | Split every N characters/tokens, often with overlap |
| Recursive chunking | No | Try paragraphs first, then sentences, then words until chunks fit |
| Sentence/paragraph chunking | Light NLP/tokenizer | Split on sentence or paragraph boundaries |
| Structure-aware chunking | Sometimes | Use headings, Markdown, HTML, PDF layout, sections |
| Semantic chunking | Yes, embedding model | Embed sentences/paragraphs and split when meaning changes |
| LLM-based chunking | Yes, LLM | Ask an LLM to split document into meaningful units |
| Code-aware chunking | Parser/model optional | Split by functions, classes, modules, symbols |
Chunking is one of the most underrated parts of building a good vector search system. If retrieval feels "random," it is often chunking (or metadata) — not the vector database.
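Of the approaches in the table, recursive chunking is a popular default because it needs no model and still respects natural boundaries. A minimal sketch of the idea follows, assuming character counts instead of tokens and three hand-picked split levels (paragraphs, sentences, words). `recursive_split` and `LEVELS` are hypothetical names; this is not any particular library's implementation.

```python
import re

# (split pattern, string used to re-join pieces), from coarse to fine.
LEVELS = [
    (r"\n\s*\n", "\n\n"),      # paragraphs
    (r"(?<=[.!?])\s+", " "),   # sentences
    (r"\s+", " "),             # words
]

def recursive_split(text, max_chars, levels=LEVELS):
    """Split text into chunks of at most max_chars, preferring coarse boundaries."""
    text = text.strip()
    if not text:
        return []
    if len(text) <= max_chars:
        return [text]
    if not levels:
        # Nothing left to split on: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    (pattern, joiner), finer = levels[0], levels[1:]
    chunks, current = [], ""
    for piece in re.split(pattern, text):
        candidate = piece if not current else current + joiner + piece
        if len(candidate) <= max_chars:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= max_chars:
            current = piece
        else:
            current = ""
            # The piece alone is too big: retry it at the next finer level.
            chunks.extend(recursive_split(piece, max_chars, finer))
    if current:
        chunks.append(current)
    return chunks

text = (
    "Embeddings map text to vectors. Similar text lands nearby.\n\n"
    "Chunking matters too. "
    + "Long sentences get split at word boundaries when nothing else fits " * 3
)
chunks = recursive_split(text, max_chars=60)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

The first paragraph fits in one chunk and stays whole; only the oversized second paragraph is broken down further, first by sentence and then by word.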
The Role Of Vector Databases
Once you have embeddings, you need somewhere to store and search them. That is what vector databases do.
A vector database stores:
- The embedding vector
- The original text or content
- Metadata like document ID, title, page number, author, tenant, date, or category
When a query comes in, the database finds the nearest vectors using similarity search.
This allows you to ask:
Find the 5 chunks most similar to this question.
and get results back in milliseconds.
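As a mental model, a vector database can be pictured as the tiny in-memory store below. It keeps the vector, the text, and the metadata together, and answers "top k most similar" queries, optionally restricted by metadata. This is a brute-force sketch with invented names (`TinyVectorStore`, `where`) and hand-written 2-dimensional vectors. Real vector databases typically avoid comparing against every record by using approximate nearest-neighbor indexes.

```python
import heapq
import math
from dataclasses import dataclass, field

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

@dataclass
class Record:
    vector: list
    text: str
    metadata: dict = field(default_factory=dict)

class TinyVectorStore:
    """Brute-force store: every search scans all records."""

    def __init__(self):
        self.records = []

    def add(self, vector, text, **metadata):
        self.records.append(Record(vector, text, metadata))

    def search(self, query, k=5, where=None):
        """Return the k records most similar to `query` that match `where`."""
        where = where or {}
        candidates = [
            r for r in self.records
            if all(r.metadata.get(key) == value for key, value in where.items())
        ]
        return heapq.nlargest(k, candidates, key=lambda r: cosine(query, r.vector))

store = TinyVectorStore()
# Hand-written vectors, for illustration only.
store.add([1.0, 0.0], "Reset your password from the login page.", tenant="acme")
store.add([0.8, 0.6], "Recover access to your account.", tenant="acme")
store.add([0.0, 1.0], "Refund policy overview.", tenant="acme")
store.add([1.0, 0.0], "Globex password rules.", tenant="globex")

hits = store.search([1.0, 0.05], k=2, where={"tenant": "acme"})
for hit in hits:
    print(hit.text, hit.metadata)
```

Note that the Globex record has a vector identical to the best match, yet it never appears in the results because the `tenant` filter removes it before ranking.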
Common Mistakes With Embeddings
The first mistake is assuming embeddings are magic. They are not. You still need good data, clean parsing, sensible chunking, metadata, evaluation, and sometimes reranking.
The second mistake is using only vector search. Keyword search is still valuable, especially for exact terms like product IDs, error codes, names, dates, and acronyms. Many strong systems use hybrid search, combining keyword search with embedding-based search.
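How the two result lists get combined is a design choice this article does not prescribe. One widely used option is reciprocal rank fusion (RRF), which only needs each document's rank in each list, not comparable scores. A small sketch, with made-up document IDs and the commonly used constant `k=60`:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document earns 1 / (k + rank) from every list it appears in.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Made-up results for a query like "error 4012 when logging in with SSO".
keyword_hits = ["err-4012-guide", "billing-faq", "sso-setup"]
vector_hits = ["sso-setup", "err-4012-guide", "login-help"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear in both lists rise to the top, so the exact match on the error code and the semantically related SSO page both outrank results that only one method found.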
The third mistake is ignoring metadata. If a user should only see documents from their workspace, you need filtering. If freshness matters, you may need to boost recent documents. If source quality differs, you may need ranking rules.
The fourth mistake is never testing retrieval. Create a small evaluation set of real user questions and expected documents. Without evaluation, you are flying by vibes — which is fun until production starts asking questions back.
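A retrieval evaluation does not need to be elaborate to be useful. The sketch below computes a hit rate at k: the fraction of questions whose expected document shows up in the top k results. Everything here is assumed for illustration: the questions, the document IDs, and `canned_search`, which stands in for a call to your real retrieval system.

```python
def hit_rate_at_k(eval_set, search_fn, k=5):
    """Fraction of questions whose expected document is in the top k results."""
    hits = 0
    for question, expected_id in eval_set:
        if expected_id in search_fn(question)[:k]:
            hits += 1
    return hits / len(eval_set)

# Canned results standing in for a real search call (hypothetical data).
CANNED_RESULTS = {
    "How do I reset my password?": ["doc-password", "doc-login", "doc-refund"],
    "What is your refund policy?": ["doc-billing", "doc-refund", "doc-login"],
    "How do I configure SSO?": ["doc-login", "doc-password", "doc-billing"],
}

def canned_search(question):
    return CANNED_RESULTS[question]

eval_set = [
    ("How do I reset my password?", "doc-password"),
    ("What is your refund policy?", "doc-refund"),
    ("How do I configure SSO?", "doc-sso"),
]

for k in (1, 2, 3):
    print(f"hit rate @ {k}: {hit_rate_at_k(eval_set, canned_search, k):.2f}")
```

In this toy data the SSO question never finds its document at any k, which is exactly the kind of gap worth spotting before users do. Rerunning the same check after changing the model, the chunking, or the filters shows whether the change helped.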
The Big Picture
Embeddings are one of the foundational ideas behind modern AI applications. They let us turn messy, human data into searchable mathematical representations.
They do not replace databases. They do not replace search engines. They do not replace LLMs. Instead, they connect them.
In simple terms:
Embeddings turn meaning into numbers.
Vector databases search those numbers.
LLMs use the retrieved context to produce useful answers.
That is why embeddings matter. They are the quiet layer that makes AI systems feel like they know where to look.