Foundations · Day 7
Embeddings: How AI Understands What Words Actually Mean
Meaning, turned into numbers. Distance, used as similarity. One of the most elegant ideas in modern AI.
May 1, 2026 · 16 min read
✦ The Library That Was Not Arranged by Alphabet
Imagine a vast library.
But this library is strange.
The books are not arranged alphabetically. They are not arranged by author, or year, or genre.
Instead, every book is placed near other books that are similar in meaning.
A book about meditation sits close to books on breath, attention, nervous system regulation, and deep focus.
A book about React hooks sits close to books on state management, component lifecycle, and frontend architecture.
A book about password reset sits close to books on login issues, account recovery, and authentication flows.
Now imagine you walk in and say:
I cannot access my account.
The librarian does not scan for the exact words "cannot" and "access."
Instead, the librarian understands the meaning of your question, and walks toward the shelf where account recovery articles live.
That is the world embeddings create.
They arrange text by meaning, not by letters.
Not perfectly. Not with human understanding. But well enough that software can search language the way a thoughtful librarian would.
✦ What an Embedding Actually Is
An embedding is a list of numbers that represents the meaning of a piece of text.
The sentence:
I forgot my password.
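becomes a long list of decimal numbers, some positive and some negative.
The real list contains more than a thousand of them (1,536 for the model used later in this lesson). At first glance, they look meaningless.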
But each number is not random. Together, they place the sentence at a specific location inside a large mathematical space. And the rule governing that space is simple:
Texts with similar meaning are placed closer together. Texts with different meaning are placed farther apart.
So these sentences land near each other:
I forgot my password.
How do I reset my login?
I cannot access my account.
Where is the recovery link?
But these are far away:
How do I cook rice?
What is the capital of Japan?
Explain CSS specificity.
An embedding does not store the sentence as language.
It stores the sentence as position.
✦ Meaning as a Location
This is the shift that makes embeddings powerful.
Words become coordinates.
Think of a map. Mumbai and Pune are closer than Mumbai and London. Not because they share letters. Not because they sound similar. Because their physical positions are near each other.
Embeddings do something similar for meaning.
In embedding space:
"forgot password"
is closer to:
"reset login credentials"
than it is to:
"make chocolate cake"
Even though "forgot password" and "reset login credentials" share almost no words, they live in the same neighborhood of meaning.
That is the core insight.
It is not magic in any mystical sense. It is pattern compressed into geometry. The model learned, from billions of examples, which ideas tend to appear together, which questions point toward the same answers, which concepts share a neighborhood.
Now those neighborhoods have coordinates.
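To make "meaning as a location" concrete, here is a toy sketch. The three 2D coordinates are invented for illustration; real embeddings come from a model and have far more dimensions. Only the distance calculation is real.

```javascript
// Hypothetical 2D "meaning coordinates", invented for illustration only.
const points = {
  "forgot password": [0.9, 0.8],
  "reset login credentials": [0.85, 0.75],
  "make chocolate cake": [-0.7, 0.2],
};

// Straight-line (Euclidean) distance between two points.
function distance(a, b) {
  return Math.hypot(...a.map((value, i) => value - b[i]));
}

const near = distance(points["forgot password"], points["reset login credentials"]);
const far = distance(points["forgot password"], points["make chocolate cake"]);

console.log(near < far); // true: the two account phrases are neighbors
```

The phrases share almost no letters, yet the numbers place them side by side. That is all "nearness" means here.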
Figure (interactive in the original lesson): each dot is a sentence. Similar meanings cluster together, and sentences with different meanings sit far apart.
Positions are illustrative. Real embeddings have hundreds or thousands of dimensions (1,536 for the model used in this lesson) and cannot be drawn directly. This 2D projection preserves the clustering idea, not the exact distances.
You are searching a library of customer support articles. A user types: 'The app keeps crashing after I update my card details.' No article contains the word 'crashing.' No article mentions 'card details' in the context of crashes. But there is an article about app instability after payment flow changes. Would keyword search find it? What would an embedding model need to understand about both sentences to connect them?
✦ How the Comparison Works
Once two pieces of text have been converted into vectors, we can measure the distance between them.
Not emotionally. Mathematically.
The common technique is called cosine similarity. You do not need the formula. Just the intuition:
Are these two vectors pointing in a similar direction?
If two vectors point in nearly the same direction, the text is likely similar in meaning. If they point in very different directions, the text is likely unrelated.
This gives your system a clean, numeric signal: one score per pair of texts, where a higher score means closer meaning.
The highest score is not a guarantee of a perfect result. But it gives your application a strong signal.
And AI systems are built from signals.
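You do not need the formula, but seeing it once removes the mystery. The sketch below is a plain implementation of cosine similarity. The function name and the tiny 3D vectors are illustrative assumptions, not output from a real model.

```javascript
// Cosine similarity: the dot product divided by the product of the vector lengths.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3D vectors, invented for illustration.
const sameDirection = cosineSimilarity([1, 2, 3], [2, 4, 6]); // close to 1
const perpendicular = cosineSimilarity([1, 0, 0], [0, 1, 0]); // 0
const opposite = cosineSimilarity([1, 2, 3], [-1, -2, -3]); // close to -1

console.log({ sameDirection, perpendicular, opposite });
```

Notice that [2, 4, 6] is twice as long as [1, 2, 3] and still scores as a match. Cosine similarity looks at direction, not length.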
✦ Chat Models and Embedding Models Are Different Things
This distinction matters more than it first appears.
A chat model answers. An embedding model represents.
When you send a message to Claude or ChatGPT, the model produces text: a response. That is a chat model.
An embedding model does something entirely different. It does not answer. It does not speak. It silently converts a piece of text into a numerical form that software can compare.
| Model type | Input | Output | Used for |
|---|---|---|---|
| Chat model | Prompt | Text response | Answering, writing, reasoning |
| Embedding model | Text | Vector of numbers | Search, similarity, retrieval |
Many beginners assume embeddings are another kind of chatbot.
They are not.
Embeddings do not answer the user. They help your system find what the answer should be based on.
Think of a research assistant who, before the expert opens their mouth, has already pulled the three most relevant documents from the archive. The expert answers. The assistant retrieved.
That retrieval step is what embeddings do.
✦ Why Keyword Search Is Not Enough
Keyword search has its place.
If you search for invoice number INV-2026-771, you want an exact match. If you search for user ID 98231, you want the precise record. For those cases, keyword search is exactly right.
But many human questions are not keyword-shaped. They are intention-shaped.
A learner may ask:
Why does my page jump when the image loads?
But your article is titled:
Understanding Cumulative Layout Shift
A user may ask:
How do I stop the AI from making things up?
But your lesson is titled:
Reducing Hallucinations with Retrieval-Augmented Generation
A developer may ask:
How do I make the model answer in JSON?
But your article is titled:
Structured Outputs: Teaching AI to Answer in the Shape You Need
Keyword search misses these connections. Embeddings can find them.
Because embeddings do not search for matching words.
They search for the neighborhood of meaning.
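A quick way to feel the gap is to count shared words. The sketch below is a deliberately naive keyword matcher (the helper names are made up for this example), run against two of the question and title pairs above.

```javascript
// Lowercase a sentence and split it into a set of words.
function words(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Count how many distinct words two texts share.
function sharedWordCount(query, title) {
  const titleWords = words(title);
  return [...words(query)].filter((word) => titleWords.has(word)).length;
}

const pairs = [
  ["Why does my page jump when the image loads?", "Understanding Cumulative Layout Shift"],
  ["How do I stop the AI from making things up?", "Reducing Hallucinations with Retrieval-Augmented Generation"],
];

for (const [query, title] of pairs) {
  console.log(sharedWordCount(query, title), "shared words:", title);
}
// Both lines report 0 shared words.
```

Zero overlap means a pure keyword matcher has nothing to rank on. The connection lives in meaning, which is exactly what an embedding captures.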
Pick a query, rank the documents by meaning similarity, and watch which one rises to the top. Notice that the winning match shares almost no words with the query.
| Document | Category |
|---|---|
| Cumulative Layout Shift happens when visible elements move unexpectedly. | Frontend |
| React useState lets a component remember values between renders. | Frontend |
| To reset your password, click the recovery link on the login page. | Auth |
| RAG helps an AI answer using external documents instead of only training data. | AI |
| You can update your billing address from the account settings page. | Billing |
In the interactive version of this lesson, scores are pre-computed from OpenAI text-embedding-3-small and rounded for clarity.
✦ What This Looks Like in Code
We will build this in layers.
First the setup. Then the function that creates an embedding. Then the function that compares two embeddings. Then the full search. Each piece is small enough to understand on its own. Together, they produce your first semantic search.
Step 1: Setup
Install the SDK and create your environment file:
npm install openai dotenv
# .env
OPENAI_API_KEY=your_api_key_here
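Create a file called embeddings-demo.js. The file uses import statements and top-level await, so it has to run as an ES module: add "type": "module" to your package.json if your Node version does not detect this on its own. Start with just two lines:
import "dotenv/config";
import OpenAI from "openai";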
What these do:
import "dotenv/config" reads your .env file and loads the API key into the environment. Keep it first, so the key is in place before any other module looks for it.
import OpenAI from "openai" brings in OpenAI's SDK. Without this, you would be writing raw HTTP requests. The SDK handles authentication, retries, and response parsing for you.
Now create the client:
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});
new OpenAI(...) creates a configured connection to OpenAI's servers. Every call you make will flow through this object. The API key is read from your environment variable, not hardcoded.
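A missing key is the most common first-run failure, and the error usually appears far from its cause. One option is a small guard that fails early. The helper name requireEnv is an assumption made for this sketch, not part of the OpenAI SDK or dotenv.

```javascript
// Return the named environment variable, or fail early with a clear message.
// `env` defaults to process.env; passing it in makes the helper easy to test.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing ${name}. Add it to your .env file before running.`);
  }
  return value;
}

// Usage in the lesson's file:
// const openai = new OpenAI({ apiKey: requireEnv("OPENAI_API_KEY") });
```

This changes nothing about how the client works. It only moves the failure to the first line that depends on the key.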
Step 2: The Embedding Function
This is the heart of the file. Add this function:
async function getEmbedding(text) {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
    encoding_format: "float",
  });
  return response.data[0].embedding;
}
What this does, piece by piece:
openai.embeddings.create(...) calls the embedding endpoint. This is different from openai.chat.completions.create. No message, no system prompt, no response to read. You are not asking for an answer. You are asking for a representation.
model: "text-embedding-3-small" is OpenAI's fast, capable embedding model. It converts text into 1,536 numbers. That is the vector.
input: text is the sentence you want to embed. It can be a word, a sentence, a paragraph, or a full document chunk.
encoding_format: "float" tells the API to return the vector as plain floating-point numbers, which is what you need for comparison.
response.data[0].embedding digs into the response to return just the array of numbers. The [0] is because the API can embed multiple inputs at once and returns an array of results. We are sending one, so we take the first.
The function is async because the API call takes time. await pauses execution until the response arrives.
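Because the endpoint accepts several inputs in one request, a batch version is a natural next step. The sketch below is one way to write it. The name getEmbeddings and the choice to pass the client in as a parameter are assumptions for this example; the client is expected to behave like the openai object created earlier.

```javascript
// Embed several texts in one request instead of one request per text.
// `client` is an OpenAI SDK client like the `openai` object created earlier.
async function getEmbeddings(client, texts) {
  const response = await client.embeddings.create({
    model: "text-embedding-3-small",
    input: texts, // an array of strings
    encoding_format: "float",
  });
  // Each result carries the index of the input it belongs to;
  // sorting by it keeps the output aligned with `texts`.
  return [...response.data]
    .sort((a, b) => a.index - b.index)
    .map((item) => item.embedding);
}

// Usage: const vectors = await getEmbeddings(openai, ["first text", "second text"]);
```

One request for many texts means fewer round trips. The result is an array of vectors in the same order as the inputs.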
Step 3: See What Comes Back
Now call the function and log the result:
const embedding = await getEmbedding("I forgot my password and cannot login.");
console.log("Vector length:", embedding.length);
console.log("First 10 values:", embedding.slice(0, 10));
Run it:
node embeddings-demo.js
You will not see a paragraph. You will see the vector length, 1536, followed by the first ten of its 1,536 numbers.
That is the point.
Your sentence has been converted into a position in meaning space. The text is gone. The meaning is encoded.
Step 4: The Similarity Function
Now add the function that compares two embeddings:
function dotProduct(a, b) {
  return a.reduce((sum, val, i) => sum + val * b[i], 0);
}
This looks more mathematical than the idea it carries. Set aside the .reduce for a moment.
The only question this function answers is:
Are these two vectors pointing in a similar direction?
When two sentences carry similar meaning, their embedding vectors lean toward the same region of that vast numerical space. When the meanings diverge, the vectors diverge too.
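dotProduct measures that lean. It returns one number. High: the directions are close, the meanings are likely related. Low: the directions differ, the meanings are probably not. Strictly speaking, cosine similarity also divides by the lengths of the two vectors. OpenAI's embeddings are normalized to length 1, so for them the plain dot product gives the same number as cosine similarity.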
You do not need to understand the arithmetic to use this well. What matters is the shape of the question it asks: not "do these sentences share words?" but "do these vectors travel in the same direction?"
That shift in question is the whole idea of semantic search.
One more thing worth knowing: in a real application, you would not write this function at all. Vector databases compute similarity automatically, at scale, across millions of stored vectors in milliseconds. We are writing it here so you can see the logic with nothing hidden.
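One detail is worth checking by hand: the dot product only stands in for cosine similarity when both vectors have length 1. The sketch below repeats dotProduct from this step and adds two hypothetical helpers, magnitude and normalize, to show the two measures agreeing. The vectors are toy values, not real embeddings.

```javascript
function dotProduct(a, b) {
  return a.reduce((sum, val, i) => sum + val * b[i], 0);
}

// Length (magnitude) of a vector.
function magnitude(v) {
  return Math.sqrt(dotProduct(v, v));
}

// Scale a vector so its length becomes 1.
function normalize(v) {
  const length = magnitude(v);
  return v.map((val) => val / length);
}

// Toy vectors, invented for illustration.
const a = [3, 4];
const b = [4, 3];

const cosine = dotProduct(a, b) / (magnitude(a) * magnitude(b));
const dotOfUnitVectors = dotProduct(normalize(a), normalize(b));

console.log(cosine.toFixed(4), dotOfUnitVectors.toFixed(4)); // both print 0.9600
```

If you ever switch to an embedding model that does not return unit-length vectors, normalize them first or divide by the magnitudes as shown.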
Step 5: The Full Search
Now put it all together. Add a list of documents and a query:
const documents = [
  "To reset your password, click the recovery link on the login page.",
  "You can update your billing address from the account settings page.",
  "React useState lets a component remember values between renders.",
  "Cumulative Layout Shift happens when visible elements move unexpectedly.",
  "RAG helps an AI answer using external documents instead of only training data.",
];
const query = "My page jumps when content loads. What is happening?";
Embed the query, then embed each document and score it:
const queryEmbedding = await getEmbedding(query);
const results = [];
for (const doc of documents) {
  const docEmbedding = await getEmbedding(doc);
  results.push({ doc, score: dotProduct(queryEmbedding, docEmbedding) });
}
What is happening here:
The query is embedded once. Then each document is embedded one by one. For each document, we compute the dot product between the query vector and the document vector. The result is a score: how similar is this document to the query, in meaning.
Now sort by score and print:
results.sort((a, b) => b.score - a.score);
for (const { doc, score } of results) {
  console.log(`${score.toFixed(4)} ${doc}`);
}
results.sort(...) arranges the documents from highest similarity to lowest. The most relevant document rises to the top.
The layout shift document rises to the top with a score of 0.87. The query said "page jumps." The document says "visible elements move unexpectedly." Almost no words in common. The meaning is the same.
That is your first semantic search.
Small. Local. Imperfect. But real.
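The loop above waits for each document before starting the next. A possible variation sends the requests concurrently and wraps the whole search in one function. This is a sketch, not the lesson's method: rankDocuments is a made-up name, and embed stands for any async text-to-vector function, such as getEmbedding.

```javascript
// Embed the query and all documents concurrently, then score and sort.
// `embed` is any async function that maps text to a vector,
// such as the getEmbedding function from this lesson.
async function rankDocuments(embed, query, documents) {
  const [queryEmbedding, ...docEmbeddings] = await Promise.all(
    [query, ...documents].map((text) => embed(text)),
  );
  const dot = (a, b) => a.reduce((sum, val, i) => sum + val * b[i], 0);
  return documents
    .map((doc, i) => ({ doc, score: dot(queryEmbedding, docEmbeddings[i]) }))
    .sort((a, b) => b.score - a.score);
}

// Usage: const results = await rankDocuments(getEmbedding, query, documents);
```

Passing embed in as a parameter also makes the function easy to test with fake vectors, without calling the API.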
The Complete File
Here is everything together, ready to copy and run:
import "dotenv/config";
import OpenAI from "openai";

// Client configured from the OPENAI_API_KEY environment variable.
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Convert one piece of text into its embedding vector.
async function getEmbedding(text) {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
    encoding_format: "float",
  });
  return response.data[0].embedding;
}

// Similarity score for two vectors of equal length.
function dotProduct(a, b) {
  return a.reduce((sum, val, i) => sum + val * b[i], 0);
}

const documents = [
  "To reset your password, click the recovery link on the login page.",
  "You can update your billing address from the account settings page.",
  "React useState lets a component remember values between renders.",
  "Cumulative Layout Shift happens when visible elements move unexpectedly.",
  "RAG helps an AI answer using external documents instead of only training data.",
];

const query = "My page jumps when content loads. What is happening?";

// Embed the query once, then embed and score each document.
const queryEmbedding = await getEmbedding(query);
const results = [];
for (const doc of documents) {
  const docEmbedding = await getEmbedding(doc);
  results.push({ doc, score: dotProduct(queryEmbedding, docEmbedding) });
}

// Print the documents from most similar to least similar.
results.sort((a, b) => b.score - a.score);
for (const { doc, score } of results) {
  console.log(`${score.toFixed(4)} ${doc}`);
}
A Note on Production
The code above embeds every document on every run. A real application would not do this.
You embed documents once, store them in a database, and only embed the user's question at query time. That is where vector databases enter the picture.
But the core logic is identical:
Convert text into vectors. Compare vectors. Retrieve the closest meaning.
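Before reaching for a database, the "embed once" idea can be sketched with a cache. In the example below, withCache is a hypothetical helper and the Map stands in for real storage. It is an illustration of the principle, not a production design.

```javascript
// Wrap an embedding function so repeated texts are not embedded again.
// `embed` is any async text-to-vector function; the Map stands in for a database.
function withCache(embed, cache = new Map()) {
  return async function cachedEmbed(text) {
    if (!cache.has(text)) {
      cache.set(text, await embed(text));
    }
    // Later calls with the same text reuse the stored vector.
    return cache.get(text);
  };
}

// Usage: const cachedGetEmbedding = withCache(getEmbedding);
```

The cache lives in memory, so it is lost when the process exits. A real system persists the vectors, which is the job a vector database takes over.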
✦ Two Phases: Indexing and Retrieval
A real application separates the work into two phases that happen at different times: indexing and retrieval.
Indexing happens once, before any user asks anything:
- Start with your knowledge: articles, support docs, notes, codebases.
- Cut long documents into smaller, focused pieces. Each chunk covers one idea.
- Give each chunk its own embedding. A document split into 50 chunks produces 50 separate vectors: not one vector per document, but one per chunk.
- Store each chunk with its vector and its origin: which document it came from, which section, which page. This metadata is what lets you cite a source later.
One thing worth making precise here, because it is easy to imagine it wrong.
The embedding is not created once per document. It is created once per chunk.
Split 10 documents into 100 chunks each and you have 1,000 chunk vectors sitting in your database. When a user asks a question, that question gets embedded and compared against all 1,000 chunk vectors simultaneously. The closest chunks rise to the top.
What comes back is not the document. It is the specific passage that matched, three or five of the most relevant chunks, each carrying source metadata: which document it came from, which section, which page. That metadata is what lets the model say "according to the support article on password recovery..." rather than making something up.
Each stored chunk record looks like this:
{
  "text": "Structured outputs help AI responses become predictable objects.",
  "embedding": [0.021, -0.088, 0.134, ...],
  "source": "structured-outputs",
  "section": "JSON Is Not the Goal"
}
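Here is a minimal sketch of how such records might be prepared before embedding. It assumes paragraphs are separated by blank lines and uses a character limit as a stand-in for a token limit. The name chunkDocument, the chunkIndex field, and the default of 500 characters are choices made for this example.

```javascript
// Split a document into paragraph-based chunks of at most `maxChars` characters.
// Paragraphs are assumed to be separated by blank lines. A single paragraph
// longer than `maxChars` is kept whole as one oversized chunk.
function chunkDocument(text, { source, maxChars = 500 }) {
  const paragraphs = text.split(/\n\s*\n/).map((p) => p.trim()).filter(Boolean);
  const chunks = [];
  let current = "";
  for (const paragraph of paragraphs) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (current && candidate.length > maxChars) {
      chunks.push(current);
      current = paragraph;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  // Attach origin metadata; the embedding is added later, at indexing time.
  return chunks.map((chunkText, index) => ({ text: chunkText, source, chunkIndex: index }));
}
```

Each returned record is ready to be embedded and stored. Splitting on paragraphs is only one strategy, and the right chunk size depends on your content.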
The document embeddings are created ahead of time. The question embedding is created at query time. The two phases run at different moments, for different reasons.
This separation is what makes the system fast and cost-effective in production.
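The retrieval phase can be sketched without a database at all. The example below scores stored records against a query vector and keeps the best few, with their source metadata intact. The function name searchChunks and the two toy records are invented for illustration, and the vectors are assumed to be unit length.

```javascript
// Score every stored chunk against the query vector and keep the best `k`.
// Assumes unit-length vectors, so the dot product acts as cosine similarity.
function searchChunks(queryEmbedding, records, k = 3) {
  const dot = (a, b) => a.reduce((sum, val, i) => sum + val * b[i], 0);
  return records
    .map(({ embedding, ...rest }) => ({ ...rest, score: dot(queryEmbedding, embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Toy records with invented 2D vectors.
const records = [
  { text: "Reset your password from the login page.", source: "auth-guide", embedding: [1, 0] },
  { text: "Update your billing address in settings.", source: "billing-guide", embedding: [0, 1] },
];

const top = searchChunks([0.8, 0.6], records, 1);
console.log(top[0].source, top[0].score); // auth-guide 0.8
```

This scan compares the query against every record, which is fine for a few thousand chunks. Vector databases exist to make the same lookup fast at much larger scale.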
Tomorrow, when we study RAG, we will take the retrieved chunks and pass them into the model so the answer is grounded in your own content.
That is the bridge: embeddings help you find. RAG helps you answer using what you found.
✦ Where Embeddings Fall Short
Embeddings are powerful. But a good builder must also understand the failure modes.
Similarity is a signal, not a guarantee.
An embedding can retrieve something related but not sufficient. A question about cancelling after renewal may retrieve an article titled "How to cancel your subscription" without addressing the renewal policy specifically. Close enough to retrieve. Not complete enough to answer.
Short queries carry less context.
The query "apple support" could mean Apple the company, apple fruit farming, or support for an apple-themed school project. The shorter the query, the less the embedding model has to work with. Ambiguity at query time produces ambiguous retrieval.
Exact identifiers need exact search.
Embeddings are not the right tool for invoice IDs, error codes, user IDs, or legal clause numbers. For these, keyword search or filters are more reliable. A strong production system often combines both: semantic search for meaning, keyword search for precision, filters for hard boundaries.
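One simple way to combine the two is to route each query before searching. The sketch below is an assumption-heavy illustration: the identifier patterns are modeled on the invoice and user ID examples in this lesson, and chooseSearchMode is a made-up name. Real systems often run both searches and merge the results instead.

```javascript
// Hypothetical identifier formats, modeled on the examples in this lesson.
const ID_PATTERNS = [
  /\bINV-\d{4}-\d+\b/i, // invoice numbers such as INV-2026-771
  /^\d{4,}$/, // bare numeric IDs such as 98231
];

// Decide which search path a query should take.
function chooseSearchMode(query) {
  const trimmed = query.trim();
  return ID_PATTERNS.some((pattern) => pattern.test(trimmed)) ? "exact" : "semantic";
}

console.log(chooseSearchMode("INV-2026-771")); // exact
console.log(chooseSearchMode("I cannot access my account.")); // semantic
```

The router is cheap and predictable. Its weakness is that every identifier format has to be listed by hand.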
Bad chunks create bad retrieval.
If a document is split poorly, a single chunk might contain password reset steps, billing policy, CSS layout notes, and deployment instructions all together. The embedding for that chunk represents a confused average of unrelated ideas. Good retrieval begins with good chunking. We will go deeper on this in RAG.
Embeddings do not update themselves.
If a document changes, the old embedding still represents the old text. Every content change requires regenerating the embedding and updating the stored vector. Without a re-indexing strategy, your search layer slowly becomes stale, silently.
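A common way to catch stale vectors is to store a fingerprint of the text next to each embedding and compare it when content changes. The sketch below uses a small FNV-1a style hash so it runs with no dependencies. The record shape and function names are assumptions, and a real system would more likely use a cryptographic hash such as SHA-256.

```javascript
// Short fingerprint of a string (32-bit FNV-1a over UTF-16 code units).
// Good enough to illustrate change detection; not a cryptographic hash.
function fingerprint(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Return the records whose current text no longer matches the stored fingerprint.
// `currentTextById` maps a record id to the latest version of its text.
function findStale(records, currentTextById) {
  return records.filter(
    (record) => fingerprint(currentTextById[record.id] ?? "") !== record.fingerprint,
  );
}
```

Only the records returned by findStale need a new embedding. Everything else can keep its stored vector.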
| Embeddings work well for | Exact search works better for |
|---|---|
| Meaning-based search | Invoice numbers |
| Similar question matching | Error codes |
| Recommendations | User IDs |
| RAG context retrieval | Order identifiers |
| Concept clustering | Legal clause numbers |
✦ Embeddings and the Context Window
Day 3 taught us something important: the context window is a fixed container with a hard limit. Everything the model reasons from must fit inside it.
One naive way to answer from documents is to paste everything into the prompt and hope the model finds what it needs. This works for very small knowledge bases. It breaks quickly as the knowledge grows: too many tokens, too much noise, too much cost. And as we learned, accuracy can drop for information buried in the middle of a long context.
Embeddings solve this by helping you choose what enters the context window.
Instead of sending the whole library, you send only the most relevant pages.
The goal is not to give the model more context. The goal is to give it better context.
This is one of the most important architectural shifts in AI application development. The model's intelligence is fixed. What you can improve is the quality of what you place in front of it.
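Choosing what enters the context window can be as simple as a budget. The sketch below takes chunks already ranked by similarity and keeps them until a token budget is spent. The 4-characters-per-token estimate is a rough rule of thumb for English text, and both function names are invented for this example.

```javascript
// Rough token estimate. The 4-characters-per-token ratio is a rule of thumb
// for English text, not an exact count; use a real tokenizer when it matters.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Take chunks in ranked order until the next one would exceed the budget.
function fitToBudget(rankedChunks, maxTokens) {
  const selected = [];
  let used = 0;
  for (const chunk of rankedChunks) {
    const cost = estimateTokens(chunk.text);
    if (used + cost > maxTokens) break;
    selected.push(chunk);
    used += cost;
  }
  return selected;
}
```

Because the input is ranked, the budget is always spent on the most relevant chunks first. Better context, not more context.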
You are building an AI assistant for your own learning notes, covering tokens, context windows, prompting, structured outputs, and embeddings. A learner asks: 'Why does the model forget what I said earlier?' Which note should your system retrieve? Would keyword search find it? Would embeddings do better? And if the retrieved note is too long, which part should be sent to the model? Sit with this. This is the shift from using AI to designing AI systems.
✦ What This Opens Up for Builders
Once you understand embeddings, you begin seeing AI features differently.
A document Q&A bot is not just a chatbot.
It is: document chunks, embeddings, retrieval, and answer generation working together.
A resume reviewer can go beyond a single prompt.
It can be: resume text, job description embedding, skill gap search, and structured feedback.
An AI learning coach can do more than answer questions.
It can use: learner history, concept embeddings, weak area retrieval, and a personalized next step.
A support assistant can stop guessing.
It can be: knowledge base articles, semantic search, and source-aware responses.
Embeddings are one of the first places where AI stops feeling like a chat window and starts feeling like infrastructure.
✦ Try It Yourself
Use the code from this lesson and test these queries against the same five documents:
I cannot access my account.
My bill has the wrong address.
Why does my UI move after loading?
How can AI answer from my own files?
How do components remember values?
For each query, observe which document ranked first. Was the result expected? Did any surprising document rank highly? What does that tell you about how semantic similarity works across different domains?
Then add your own documents. Use notes from the previous Tapovan lessons:
Tokens are fragments of text that hold predictive value.
The context window is the fixed container of tokens the model can see.
Structured outputs make AI responses predictable for applications.
Prompts shape the probability space of the model's response.
Now ask:
How do I make AI output usable in my frontend?
The structured outputs note should rise toward the top.
When it does, pause for a moment.
You just built the first piece of a RAG system.
✦ Takeaway Summary
| Concept | What It Means |
|---|---|
| Embedding | A numerical representation of text meaning |
| Vector | An array of numbers that places text in meaning space |
| Cosine similarity | A measure of how closely two vectors point in the same direction |
| Semantic search | Search based on meaning, not only exact words |
| Indexing | The pre-query phase: embed documents and store the vectors |
| Retrieval | The query phase: embed the question, find nearest vectors, return relevant chunks |
| RAG connection | Embeddings retrieve the context. The model generates an answer from that context. |
✦ Learn More
- OpenAI Embeddings Guide: the official reference for text-embedding-3-small and how to use it in production
- MongoDB Atlas Vector Search: a practical guide to storing and querying embeddings in a real database
- OpenAI Cookbook: Semantic Search with Embeddings: a working notebook that puts everything from this lesson into practice
- Pinecone: Vector Embeddings Explained: a detailed look at how vectors are stored, indexed, and queried at scale
A word is what we see on the surface. Meaning is the current beneath it.
Embeddings do not give the machine a human mind. But they give our software a way to sense nearness.
And once meaning has distance, search becomes something more than matching.
It becomes discovery.