Embeddings: How AI Understands What Words Actually Mean

The Gap Between Words and Meaning

In Day 6, you gave AI a shape.

You stopped asking for paragraphs and started asking for objects: a score, a list of strengths, a next step your application could act on without parsing English first.

That was a quiet shift. AI stopped being something you read and started becoming something your software could hold.

But there is still a layer missing.

Before AI can answer from your documents, your notes, your company knowledge base, or your codebase, it must first find the right information. And finding is not as simple as it sounds.

A user may ask:

How do I get back into my account?

But the help article says:

Reset your forgotten password using the recovery link.

No keyword in the question matches any keyword in the answer. A traditional search engine would not connect them.

And yet, you and I know they are asking about the same thing.

That invisible relatedness, the gap between the words someone uses and the words a document contains, is what embeddings are designed to close.

Today, we study the quiet layer that sits beneath search, retrieval, and every AI system that reasons from your own data.

We study how meaning becomes measurable.

✦ The Library That Was Not Arranged by Alphabet

Imagine a vast library.

But this library is strange.

The books are not arranged alphabetically. They are not arranged by author, or year, or genre.

Instead, every book is placed near other books that are similar in meaning.

A book about meditation sits close to books on breath, attention, nervous system regulation, and deep focus.

A book about React hooks sits close to books on state management, component lifecycle, and frontend architecture.

A book about password reset sits close to books on login issues, account recovery, and authentication flows.

Now imagine you walk in and say:

I cannot access my account.

The librarian does not scan for the exact words "cannot" and "access."

Instead, the librarian understands the meaning of your question, and walks toward the shelf where account recovery articles live.

That is the world embeddings create.

They arrange text by meaning, not by letters.

Not perfectly. Not with human understanding. But well enough that software can search language the way a thoughtful librarian would.

✦ What an Embedding Actually Is

An embedding is a list of numbers that represents the meaning of a piece of text.

The sentence:

I forgot my password.

becomes something like this:

✦An embedding vector

Input: "I forgot my password."
Output: [0.018, -0.241, 0.087, 0.004, -0.119, 0.203, -0.076, ...]

The real list is much longer: 1,536 numbers for the model used later in this lesson. At first glance, they look meaningless.

But each number is not random. Together, they place the sentence at a specific location inside a large mathematical space. And the rule governing that space is simple:

Texts with similar meaning are placed closer together. Texts with different meaning are placed farther apart.

So these sentences land near each other:

I forgot my password.
How do I reset my login?
I cannot access my account.
Where is the recovery link?

But these are far away:

How do I cook rice?
What is the capital of Japan?
Explain CSS specificity.

An embedding does not store the sentence as language.

It stores the sentence as position.

✦ Meaning as a Location

This is the shift that makes embeddings powerful.

Words become coordinates.

Think of a map. Mumbai and Pune are closer than Mumbai and London. Not because they share letters. Not because they sound similar. Because their physical positions are near each other.

Embeddings do something similar for meaning.

In embedding space:

"forgot password"

is closer to:

"reset login credentials"

than it is to:

"make chocolate cake"

Even though "forgot password" and "reset login credentials" share almost no words, they live in the same neighborhood of meaning.

That is the core insight.

It is not magic in any mystical sense. It is pattern compressed into geometry. The model learned, from billions of examples, which ideas tend to appear together, which questions point toward the same answers, which concepts share a neighborhood.

Now those neighborhoods have coordinates.
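To make the map analogy concrete, here is a toy sketch in two dimensions. The coordinates are invented for illustration (real vectors come from an embedding model and have far more dimensions), and `distance` is a hypothetical helper name, not part of any library.

```javascript
// Toy 2D "meaning map". These coordinates are made up for illustration;
// real embeddings come from a model and have many more dimensions.
const points = {
  "forgot password": [0.9, 0.8],
  "reset login credentials": [0.85, 0.75],
  "make chocolate cake": [-0.7, 0.2],
};

// Straight-line (Euclidean) distance between two points.
function distance(a, b) {
  return Math.hypot(...a.map((value, i) => value - b[i]));
}

const near = distance(points["forgot password"], points["reset login credentials"]);
const far = distance(points["forgot password"], points["make chocolate cake"]);

console.log(near < far); // true: the two account phrases sit closer together
```

The phrases share no words, yet the distance between them is small. That is all "neighborhood of meaning" means in practice.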

✦Visual · Meaning Map

Each dot is a sentence. Similar meanings cluster together. Sentences with different meanings live far apart.

Clusters shown: Account & Access, Frontend & Code, Food & Cooking.

Positions are illustrative. Real embeddings live in hundreds or thousands of dimensions (1,536 for the model used in this lesson) and cannot be drawn directly. This 2D projection preserves the clustering idea, not the exact distances.

✦Pause & Reflect

You are searching a library of customer support articles. A user types: 'The app keeps crashing after I update my card details.' No article contains the word 'crashing.' No article mentions 'card details' in the context of crashes. But there is an article about app instability after payment flow changes. Would keyword search find it? What would an embedding model need to understand about both sentences to connect them?


✦ How the Comparison Works

Once two pieces of text have been converted into vectors, we can measure the distance between them.

Not emotionally. Mathematically.

The common technique is called cosine similarity. You do not need the formula. Just the intuition:

Are these two vectors pointing in a similar direction?

If two vectors point in nearly the same direction, the text is likely similar in meaning. If they point in very different directions, the text is likely unrelated.

This gives your system a clean, numeric signal:

✦Similarity scores

Query: "I cannot login"
Article A: "How to reset your password" → Similarity: 0.86
Article B: "How to change your profile photo" → Similarity: 0.41
Article C: "How to export invoices" → Similarity: 0.22

The highest score is not a guarantee of a perfect result. But it gives your application a strong signal.

And AI systems are built from signals.
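For readers who do want to see the arithmetic, here is a minimal sketch of cosine similarity. The vectors are made-up three-dimensional examples, not real embeddings, and `cosineSimilarity` is a name chosen for this sketch.

```javascript
// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
  let dot = 0;
  let lengthA = 0;
  let lengthB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    lengthA += a[i] * a[i];
    lengthB += b[i] * b[i];
  }
  return dot / (Math.sqrt(lengthA) * Math.sqrt(lengthB));
}

// Made-up 3D vectors, for illustration only.
const sameDirection = cosineSimilarity([1, 2, 3], [2, 4, 6]);  // same direction, different length
const perpendicular = cosineSimilarity([1, 0, 0], [0, 1, 0]);  // unrelated directions
const opposite = cosineSimilarity([1, 2, 3], [-1, -2, -3]);    // opposite directions

console.log(sameDirection.toFixed(2), perpendicular.toFixed(2), opposite.toFixed(2));
// 1.00 0.00 -1.00
```

Notice that doubling a vector's length does not change the score. Cosine similarity cares about direction, not size.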

✦ Chat Models and Embedding Models Are Different Things

This distinction matters more than it first appears.

A chat model answers. An embedding model represents.

When you send a message to Claude or ChatGPT, the model produces text: a response. That is a chat model.

An embedding model does something entirely different. It does not answer. It does not speak. It silently converts a piece of text into a numerical form that software can compare.

Model type	Input	Output	Used for
Chat model	Prompt	Text response	Answering, writing, reasoning
Embedding model	Text	Vector of numbers	Search, similarity, retrieval

Many beginners assume embeddings are another kind of chatbot.

They are not.

Embeddings do not answer the user. They help your system find what the answer should be based on.

Think of a research assistant who, before the expert opens their mouth, has already pulled the three most relevant documents from the archive. The expert answers. The assistant retrieved.

That retrieval step is what embeddings do.

✦ Why Keyword Search Is Not Enough

Keyword search has its place.

If you search for invoice number INV-2026-771, you want an exact match. If you search for user ID 98231, you want the precise record. For those cases, keyword search is exactly right.

But many human questions are not keyword-shaped. They are intention-shaped.

A learner may ask:

Why does my page jump when the image loads?

But your article is titled:

Understanding Cumulative Layout Shift

A user may ask:

How do I stop the AI from making things up?

But your lesson is titled:

Reducing Hallucinations with Retrieval-Augmented Generation

A developer may ask:

How do I make the model answer in JSON?

But your article is titled:

Structured Outputs: Teaching AI to Answer in the Shape You Need

Keyword search misses these connections. Embeddings can find them.

Because embeddings do not search for matching words.

They search for the neighborhood of meaning.
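A small sketch shows why a word-matching search has nothing to work with here. The `sharedWords` helper is hypothetical and deliberately naive (real keyword engines add stemming, ranking, and more), and the second document string is invented for the contrast.

```javascript
// Split text into lowercase words, dropping punctuation.
function words(text) {
  return text.toLowerCase().match(/[a-z0-9-]+/g) ?? [];
}

// Words that appear in both texts.
function sharedWords(query, doc) {
  const docWords = new Set(words(doc));
  return [...new Set(words(query))].filter((word) => docWords.has(word));
}

console.log(sharedWords(
  "Why does my page jump when the image loads?",
  "Understanding Cumulative Layout Shift"
)); // [] : nothing for a keyword matcher to hold on to

console.log(sharedWords(
  "invoice INV-2026-771",
  "Payment received for invoice INV-2026-771"
)); // [ 'invoice', 'inv-2026-771' ] : exact identifiers match cleanly
```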

✦Interactive · Semantic Search

Notice that the top-ranked document shares almost no words with the query.

Documents ranked by meaning similarity

Document	Category	Similarity
Cumulative Layout Shift happens when visible elements move unexpectedly.	Frontend	Very high
React useState lets a component remember values between renders.	Frontend	Moderate
To reset your password, click the recovery link on the login page.	Auth	Low
RAG helps an AI answer using external documents instead of only training data.		Low
You can update your billing address from the account settings page.	Billing	Low

"Page jumps" has no words in common with "visible elements move unexpectedly." But both describe the same physical experience. The meaning is near. The words are not.

Scores are pre-computed from OpenAI text-embedding-3-small and rounded for clarity.

✦ What This Looks Like in Code

We will build this in layers.

First the setup. Then the function that creates an embedding. Then the function that compares two embeddings. Then the full search. Each piece is small enough to understand on its own. Together, they produce your first semantic search.

Step 1: Setup

Install the SDK and create your environment file:

Bash

npm install openai dotenv

# .env
OPENAI_API_KEY=your_api_key_here

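Create a file called embeddings-demo.js. The code uses import statements and top-level await, so Node must treat the file as an ES module: add "type": "module" to your package.json, or name the file embeddings-demo.mjs instead. Start with just two lines: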

JavaScript

import "dotenv/config";
import OpenAI from "openai";

What these do:

import "dotenv/config" reads your .env file and loads the API key into the environment before anything else runs. It must come first.

import OpenAI from "openai" brings in OpenAI's SDK. Without this, you would be writing raw HTTP requests. The SDK handles authentication, retries, and response parsing for you.

Now create the client:

JavaScript

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

new OpenAI(...) creates a configured connection to OpenAI's servers. Every call you make will flow through this object. The API key is read from your environment variable, not hardcoded.

Step 2: The Embedding Function

This is the heart of the file. Add this function:

JavaScript

async function getEmbedding(text) {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
    encoding_format: "float",
  });
  return response.data[0].embedding;
}

What this does, piece by piece:

openai.embeddings.create(...) calls the embedding endpoint. This is different from openai.chat.completions.create. No message, no system prompt, no response to read. You are not asking for an answer. You are asking for a representation.

model: "text-embedding-3-small" is OpenAI's fast, capable embedding model. It converts text into 1,536 numbers. That is the vector.

input: text is the sentence you want to embed. It can be a word, a sentence, a paragraph, or a full document chunk.

encoding_format: "float" tells the API to return the vector as plain floating-point numbers, which is what you need for comparison.

response.data[0].embedding digs into the response to return just the array of numbers. The [0] is because the API can embed multiple inputs at once and returns an array of results. We are sending one, so we take the first.

The function is async because the API call takes time. await pauses execution until the response arrives.

Step 3: See What Comes Back

Now call the function and log the result:

JavaScript

const embedding = await getEmbedding("I forgot my password and cannot login.");

console.log("Vector length:", embedding.length);
console.log("First 10 values:", embedding.slice(0, 10));

Run it:

Bash

node embeddings-demo.js

✦What you will see

Vector length: 1536
First 10 values: [ 0.018, -0.241, 0.087, 0.004, -0.119, 0.203, -0.076, 0.031, -0.055, 0.142 ]

(The values shown here are illustrative. Your numbers will differ.)

You will not see a paragraph. You will see 1,536 numbers.

That is the point.

Your sentence has been converted into a position in meaning space. The text is gone. The meaning is encoded.

Step 4: The Similarity Function

Now add the function that compares two embeddings:

JavaScript

function dotProduct(a, b) {
  return a.reduce((sum, val, i) => sum + val * b[i], 0);
}

This looks more mathematical than the idea it carries. Set aside the .reduce for a moment.

The only question this function answers is:

Are these two vectors pointing in a similar direction?

When two sentences carry similar meaning, their embedding vectors lean toward the same region of that vast numerical space. When the meanings diverge, the vectors diverge too.

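dotProduct measures that lean. It returns one number. High: the directions are close, the meanings are likely related. Low: the directions differ, the meanings are probably not.

Strictly speaking, a dot product equals cosine similarity only when both vectors have length 1. OpenAI's embedding vectors are normalized to length 1, so here the simpler function gives the same score. With vectors from a model that does not normalize, you would divide by the two vector lengths as well.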

You do not need to understand the arithmetic to use this well. What matters is the shape of the question it asks: not "do these sentences share words?" but "do these vectors travel in the same direction?"

That shift in question is the whole idea of semantic search.

One more thing worth knowing: in a real application, you would not write this function at all. Vector databases compute similarity automatically, at scale, across millions of stored vectors in milliseconds. We are writing it here so you can see the logic with nothing hidden.
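If you are wondering how a plain dot product can stand in for cosine similarity, this sketch checks the relationship on made-up vectors. The helper names `magnitude` and `normalize` are chosen for this sketch. The point: once vectors are scaled to length 1, the two measures agree.

```javascript
function dotProduct(a, b) {
  return a.reduce((sum, val, i) => sum + val * b[i], 0);
}

// Length (magnitude) of a vector.
function magnitude(v) {
  return Math.sqrt(dotProduct(v, v));
}

// Scale a vector so its length becomes 1, keeping its direction.
function normalize(v) {
  const length = magnitude(v);
  return v.map((val) => val / length);
}

function cosineSimilarity(a, b) {
  return dotProduct(a, b) / (magnitude(a) * magnitude(b));
}

// Made-up vectors with different lengths, for illustration only.
const a = [3, 4, 0];
const b = [1, 2, 2];

const cosine = cosineSimilarity(a, b);
const dotOfUnitVectors = dotProduct(normalize(a), normalize(b));

console.log(dotProduct(a, b));   // raw dot product: affected by vector length
console.log(cosine);             // direction only
console.log(dotOfUnitVectors);   // matches the cosine value (up to rounding)
```

If you ever switch to an embedding model that does not return unit-length vectors, normalizing once at indexing time keeps the cheap dot product valid.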

Step 5: The Full Search

Now put it all together. Add a list of documents and a query:

JavaScript

const documents = [
  "To reset your password, click the recovery link on the login page.",
  "You can update your billing address from the account settings page.",
  "React useState lets a component remember values between renders.",
  "Cumulative Layout Shift happens when visible elements move unexpectedly.",
  "RAG helps an AI answer using external documents instead of only training data.",
];

const query = "My page jumps when content loads. What is happening?";

Embed the query, then embed each document and score it:

JavaScript

const queryEmbedding = await getEmbedding(query);

const results = [];

for (const doc of documents) {
  const docEmbedding = await getEmbedding(doc);
  results.push({ doc, score: dotProduct(queryEmbedding, docEmbedding) });
}

What is happening here:

The query is embedded once. Then each document is embedded one by one. For each document, we compute the dot product between the query vector and the document vector. The result is a score: how similar is this document to the query, in meaning.

Now sort by score and print:

JavaScript

results.sort((a, b) => b.score - a.score);

for (const { doc, score } of results) {
  console.log(`${score.toFixed(4)}  ${doc}`);
}

results.sort(...) arranges the documents from highest similarity to lowest. The most relevant document rises to the top.

✦What you will see

0.8712  Cumulative Layout Shift happens when visible elements move unexpectedly.
0.5198  React useState lets a component remember values between renders.
0.2341  To reset your password, click the recovery link on the login page.
0.1887  RAG helps an AI answer using external documents instead of only training data.
0.1423  You can update your billing address from the account settings page.

The layout shift document rises to the top with a score of 0.87. Your exact scores may differ; the ranking is what matters. The query said "page jumps." The document says "visible elements move unexpectedly." Those two phrases share no words. The meaning is the same.

That is your first semantic search.

Small. Local. Imperfect. But real.

The Complete File

Here is everything together, ready to copy and run:

JavaScript

import "dotenv/config";
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

async function getEmbedding(text) {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
    encoding_format: "float",
  });
  return response.data[0].embedding;
}

function dotProduct(a, b) {
  return a.reduce((sum, val, i) => sum + val * b[i], 0);
}

const documents = [
  "To reset your password, click the recovery link on the login page.",
  "You can update your billing address from the account settings page.",
  "React useState lets a component remember values between renders.",
  "Cumulative Layout Shift happens when visible elements move unexpectedly.",
  "RAG helps an AI answer using external documents instead of only training data.",
];

const query = "My page jumps when content loads. What is happening?";

const queryEmbedding = await getEmbedding(query);

const results = [];

for (const doc of documents) {
  const docEmbedding = await getEmbedding(doc);
  results.push({ doc, score: dotProduct(queryEmbedding, docEmbedding) });
}

results.sort((a, b) => b.score - a.score);

for (const { doc, score } of results) {
  console.log(`${score.toFixed(4)}  ${doc}`);
}

A Note on Production

The code above embeds every document on every run. A real application would not do this.

You embed documents once, store them in a database, and only embed the user's question at query time. That is where vector databases enter the picture.

But the core logic is identical:

Convert text into vectors. Compare vectors. Retrieve the closest meaning.
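One small step toward that production shape: the loop in Step 5 makes one API call per document, but as noted in Step 2, the endpoint accepts an array of inputs. The sketch below embeds a whole list in one request. `getEmbeddings` is a hypothetical helper, and the client is passed in as a parameter (an assumption made here so the function can be exercised without a network call).

```javascript
// Embed several texts in a single request.
// `client` is expected to be an OpenAI SDK client (for example, the `openai`
// object created in Step 1); it is passed in so the function is easy to test.
async function getEmbeddings(client, texts) {
  const response = await client.embeddings.create({
    model: "text-embedding-3-small",
    input: texts, // an array of strings instead of a single string
    encoding_format: "float",
  });
  // Each result carries the index of the input it belongs to; sort by it
  // so the returned vectors line up with `texts`.
  return [...response.data]
    .sort((a, b) => a.index - b.index)
    .map((item) => item.embedding);
}
```

In the lesson's file, this would be called as `const docEmbeddings = await getEmbeddings(openai, documents);`, after which `docEmbeddings[i]` belongs to `documents[i]`.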

✦ Two Phases: Indexing and Retrieval

A real application separates the work into two distinct moments.

✦Visual · Indexing and Retrieval

Embeddings work in two separate phases that happen at different times.

Indexing happens once, before any user asks anything.

Documents (source)

Your knowledge: articles, support docs, notes, codebases.

Split into chunks

Long documents are cut into smaller, focused pieces. Each chunk covers one idea.

Create embeddings

Each chunk gets its own embedding. A document split into 50 chunks produces 50 separate vectors. Not one vector per document, one per chunk.

Store chunk + vector + source (stored once)

Each chunk is stored with its vector and its origin: which document it came from, which section, which page. This metadata is what lets you cite a source later.

Chunk embeddings are created ahead of time, not when the user asks. The heavy work is done in advance. At query time, only the question needs to be embedded.

One thing worth making precise here, because it is easy to imagine it wrong.

The embedding is not created once per document. It is created once per chunk.

Split 10 documents into 100 chunks each and you have 1,000 chunk vectors sitting in your database. When a user asks a question, that question gets embedded and compared against all 1,000 chunk vectors simultaneously. The closest chunks rise to the top.

What comes back is not the document. It is the specific passage that matched, three or five of the most relevant chunks, each carrying source metadata: which document it came from, which section, which page. That metadata is what lets the model say "according to the support article on password recovery..." rather than making something up.

Each stored chunk record looks like this:

JSON

{
  "text": "Structured outputs help AI responses become predictable objects.",
  "embedding": [0.021, -0.088, 0.134, ...],
  "source": "structured-outputs",
  "section": "JSON Is Not the Goal"
}

The chunk embeddings are created ahead of time. The question embedding is created at query time. The two phases run at different moments, for different reasons.

This separation is what makes the system fast and cost-effective in production.
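The retrieval half can be sketched in a few lines. This is an in-memory toy, not a vector database: the records, their three-dimensional vectors, and the `retrieve` helper are all invented for illustration, and the vectors are not normalized the way real embeddings would be.

```javascript
// Toy chunk records with made-up 3D vectors. In a real index, each
// `embedding` would come from the embedding model at indexing time.
const index = [
  { text: "Click the recovery link to reset your password.", source: "password-recovery", section: "Reset steps", embedding: [0.9, 0.1, 0.0] },
  { text: "Update your billing address in account settings.", source: "billing", section: "Address", embedding: [0.1, 0.9, 0.1] },
  { text: "Locked accounts unlock after 30 minutes.", source: "password-recovery", section: "Lockouts", embedding: [0.8, 0.2, 0.1] },
];

function dotProduct(a, b) {
  return a.reduce((sum, val, i) => sum + val * b[i], 0);
}

// Retrieval phase: score every stored chunk against the query vector
// and return the k highest-scoring chunks, with their source metadata.
function retrieve(queryEmbedding, records, k = 2) {
  return records
    .map(({ embedding, ...chunk }) => ({
      ...chunk,
      score: dotProduct(queryEmbedding, embedding),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Stand-in for the embedded question "I cannot access my account."
const queryEmbedding = [1, 0, 0];

for (const { score, source, section } of retrieve(queryEmbedding, index)) {
  console.log(score.toFixed(2), `${source} > ${section}`);
}
```

What comes back is the chunk text plus where it came from, which is exactly what a later answering step needs in order to cite its source.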

Tomorrow, when we study RAG, we will take the retrieved chunks and pass them into the model so the answer is grounded in your own content.

That is the bridge: embeddings help you find. RAG helps you answer using what you found.

✦ Where Embeddings Fall Short

Embeddings are powerful. But a good builder must also understand the failure modes.

Similarity is a signal, not a guarantee.

An embedding can retrieve something related but not sufficient. A question about cancelling after renewal may retrieve an article titled "How to cancel your subscription" without addressing the renewal policy specifically. Close enough to retrieve. Not complete enough to answer.

Short queries carry less context.

The query "apple support" could mean Apple the company, apple fruit farming, or support for an apple-themed school project. The shorter the query, the less the embedding model has to work with. Ambiguity at query time produces ambiguous retrieval.

Exact identifiers need exact search.

Embeddings are not the right tool for invoice IDs, error codes, user IDs, or legal clause numbers. For these, keyword search or filters are more reliable. A strong production system often combines both: semantic search for meaning, keyword search for precision, filters for hard boundaries.
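One way to combine the two is to route each query before searching. The sketch below is a minimal illustration under stated assumptions: the patterns are hypothetical and modeled on the example IDs from earlier in this lesson, and `chooseSearch` only decides the path; it does not perform the search.

```javascript
// Hypothetical patterns for identifier-like queries; adjust to your own ID formats.
const IDENTIFIER_PATTERNS = [
  /\bINV-\d{4}-\d+\b/i, // invoice numbers such as INV-2026-771
  /^\d{4,}$/,           // bare numeric IDs such as 98231
];

// Decide which search path a query should take.
function chooseSearch(query) {
  const trimmed = query.trim();
  const looksLikeIdentifier = IDENTIFIER_PATTERNS.some((pattern) => pattern.test(trimmed));
  return looksLikeIdentifier ? "exact" : "semantic";
}

console.log(chooseSearch("INV-2026-771"));                       // exact
console.log(chooseSearch("98231"));                              // exact
console.log(chooseSearch("How do I get back into my account?")); // semantic
```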

Bad chunks create bad retrieval.

If a document is split poorly, a single chunk might contain password reset steps, billing policy, CSS layout notes, and deployment instructions all together. The embedding for that chunk represents a confused average of unrelated ideas. Good retrieval begins with good chunking. We will go deeper on this in RAG.
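As a starting point, here is one simple chunking strategy among many: split on blank lines, then merge neighbouring paragraphs up to a size limit. The `chunkByParagraph` helper and its `maxChars` default are assumptions for this sketch, and the sample text is invented.

```javascript
// Split a document into chunks on blank lines, then merge neighbouring
// paragraphs until a chunk would exceed `maxChars`.
// A single paragraph longer than `maxChars` is kept whole as its own chunk.
function chunkByParagraph(text, maxChars = 500) {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter(Boolean);

  const chunks = [];
  let current = "";
  for (const paragraph of paragraphs) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (current && candidate.length > maxChars) {
      chunks.push(current); // close the current chunk before it grows too large
      current = paragraph;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const sample = [
  "Reset your password from the login page.",
  "Use the recovery link in the email.",
  "Billing addresses are changed in account settings.",
].join("\n\n");

// With an 80-character limit, the two password paragraphs fit together
// and the billing paragraph becomes its own chunk.
console.log(chunkByParagraph(sample, 80));
```

Size limits alone do not guarantee one idea per chunk. They only stop unrelated sections from piling up in the same vector.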

Embeddings do not update themselves.

If a document changes, the old embedding still represents the old text. Every content change requires regenerating the embedding and updating the stored vector. Without a re-indexing strategy, your search layer slowly becomes stale, silently.
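A minimal sketch of the detection step, assuming each stored record keeps the text it was embedded from (as in the chunk record shown earlier). The `id` field, the sample sentences, and the `findStale` helper are hypothetical. Larger systems often compare a hash or a last-modified timestamp instead of the full text.

```javascript
// Return the stored records whose text no longer matches the current content.
// These are the chunks that need a fresh embedding.
function findStale(records, currentTextById) {
  return records.filter((record) => currentTextById[record.id] !== record.text);
}

// Made-up records, as they looked at indexing time (vectors omitted).
const records = [
  { id: "reset-1", text: "Reset links expire after 24 hours.", embedding: [] },
  { id: "billing-1", text: "Update your billing address in settings.", embedding: [] },
];

// Current content: the reset article was edited after indexing.
const currentTextById = {
  "reset-1": "Reset links expire after 1 hour.",
  "billing-1": "Update your billing address in settings.",
};

const stale = findStale(records, currentTextById);
console.log(stale.map((record) => record.id)); // [ 'reset-1' ]
```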

Embeddings work well for	Exact search works better for
Meaning-based search	Invoice numbers
Similar question matching	Error codes
Recommendations	User IDs
RAG context retrieval	Order identifiers
Concept clustering	Legal clause numbers

✦ Embeddings and the Context Window

Day 3 taught us something important: the context window is a fixed container with a hard limit. Everything the model reasons from must fit inside it.

One naive way to answer from documents is to paste everything into the prompt and hope the model finds what it needs. This works for very small knowledge bases. It breaks quickly as the knowledge grows: too many tokens, too much noise, too much cost. And as we learned, accuracy drops sharply for information buried in the middle of a long context.

Embeddings solve this by helping you choose what enters the context window.

Instead of sending the whole library, you send only the most relevant pages.

The goal is not to give the model more context. The goal is to give it better context.

This is one of the most important architectural shifts in AI application development. The model's intelligence is fixed. What you can improve is the quality of what you place in front of it.
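In code, "better context" often comes down to a budget. The sketch below keeps the highest-ranked chunks that fit within a token limit. The four-characters-per-token estimate is a rough rule of thumb for English text, not a real tokenizer, and `selectContext` and the sample data are invented for illustration.

```javascript
// Very rough token estimate: about 4 characters per token for English text.
// This is an approximation for illustration, not a real tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keep the highest-scoring chunks that fit inside the token budget.
function selectContext(rankedChunks, maxTokens) {
  const selected = [];
  let used = 0;
  for (const chunk of rankedChunks) {
    const cost = estimateTokens(chunk.text);
    if (used + cost > maxTokens) continue; // skip chunks that would overflow
    selected.push(chunk);
    used += cost;
  }
  return selected;
}

// Made-up retrieval results, already sorted from most to least similar.
const ranked = [
  { text: "a".repeat(400), score: 0.82 }, // about 100 tokens
  { text: "b".repeat(800), score: 0.74 }, // about 200 tokens
  { text: "c".repeat(200), score: 0.51 }, // about 50 tokens
];

console.log(selectContext(ranked, 160).map((chunk) => chunk.score)); // [ 0.82, 0.51 ]
```

The second chunk scored well but would not fit, so it is skipped rather than truncated. Whether to skip, truncate, or re-chunk is a design choice your system has to make.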

✦Pause & Reflect

You are building an AI assistant for your own learning notes, covering tokens, context windows, prompting, structured outputs, and embeddings. A learner asks: 'Why does the model forget what I said earlier?' Which note should your system retrieve? Would keyword search find it? Would embeddings do better? And if the retrieved note is too long, which part should be sent to the model? Sit with this. This is the shift from using AI to designing AI systems.


✦ What This Opens Up for Builders

Once you understand embeddings, you begin seeing AI features differently.

A document Q&A bot is not just a chatbot.

It is: document chunks, embeddings, retrieval, and answer generation working together.

A resume reviewer can go beyond a single prompt.

It can be: resume text, job description embedding, skill gap search, and structured feedback.

An AI learning coach can do more than answer questions.

It can use: learner history, concept embeddings, weak area retrieval, and a personalized next step.

A support assistant can stop guessing.

It can be: knowledge base articles, semantic search, and source-aware responses.

Embeddings are one of the first places where AI stops feeling like a chat window and starts feeling like infrastructure.

✦ Try It Yourself

Use the code from this lesson and test these queries against the same five documents:

I cannot access my account.
My bill has the wrong address.
Why does my UI move after loading?
How can AI answer from my own files?
How do components remember values?

For each query, observe which document ranked first. Was the result expected? Did any surprising document rank highly? What does that tell you about how semantic similarity works across different domains?

Then add your own documents. Use notes from the previous Tapovan lessons:

Tokens are fragments of text that hold predictive value.
The context window is the fixed container of tokens the model can see.
Structured outputs make AI responses predictable for applications.
Prompts shape the probability space of the model's response.

Now ask:

How do I make AI output usable in my frontend?

The structured outputs note should rise toward the top.

When it does, pause for a moment.

You just built the first piece of a RAG system.

✦ Takeaway Summary

Concept	What It Means
Embedding	A numerical representation of text meaning
Vector	An array of numbers that places text in meaning space
Cosine similarity	A measure of how closely two vectors point in the same direction
Semantic search	Search based on meaning, not only exact words
Indexing	The pre-query phase: embed documents and store the vectors
Retrieval	The query phase: embed the question, find nearest vectors, return relevant chunks
RAG connection	Embeddings retrieve the context. The model generates an answer from that context.

✦ Learn More

OpenAI Embeddings Guide: the official reference for text-embedding-3-small and how to use it in production
MongoDB Atlas Vector Search: a practical guide to storing and querying embeddings in a real database
OpenAI Cookbook: Semantic Search with Embeddings: a working notebook that puts everything from this lesson into practice
Pinecone: Vector Embeddings Explained: a detailed look at how vectors are stored, indexed, and queried at scale

✦ ✦ ✦

A word is what we see on the surface. Meaning is the current beneath it.

Embeddings do not give the machine a human mind. But they give our software a way to sense nearness.

And once meaning has distance, search becomes something more than matching.

It becomes discovery.

The Gap Between Words and Meaning

In Day 6, you gave AI a shape.

You stopped asking for paragraphs and started asking for objects: a score, a list of strengths, a next step your application could act on without parsing English first.

That was a quiet shift. AI stopped being something you read and started becoming something your software could hold.

But there is still a layer missing.

Before AI can answer from your documents, your notes, your company knowledge base, or your codebase, it must first find the right information. And finding is not as simple as it sounds.

A user may ask:

How do I get back into my account?

But the help article says:

Reset your forgotten password using the recovery link.

No keyword in the question matches any keyword in the answer. A traditional search engine would not connect them.

And yet, you and I know they are asking about the same thing.

That invisible relatedness, the gap between the words someone uses and the words a document contains, is what embeddings are designed to close.

Today, we study the quiet layer that sits beneath search, retrieval, and every AI system that reasons from your own data.

We study how meaning becomes measurable.

✦ The Library That Was Not Arranged by Alphabet

Imagine a vast library.

But this library is strange.

The books are not arranged alphabetically. They are not arranged by author, or year, or genre.

Instead, every book is placed near other books that are similar in meaning.

A book about meditation sits close to books on breath, attention, nervous system regulation, and deep focus.

A book about React hooks sits close to books on state management, component lifecycle, and frontend architecture.

A book about password reset sits close to books on login issues, account recovery, and authentication flows.

Now imagine you walk in and say:

I cannot access my account.

The librarian does not scan for the exact words "cannot" and "access."

Instead, the librarian understands the meaning of your question, and walks toward the shelf where account recovery articles live.

That is the world embeddings create.

They arrange text by meaning, not by letters.

Not perfectly. Not with human understanding. But well enough that software can search language the way a thoughtful librarian would.

✦ What an Embedding Actually Is

An embedding is a list of numbers that represents the meaning of a piece of text.

The sentence:

I forgot my password.

becomes something like this:

✦An embedding vector

Input: "I forgot my password." Output: [0.018, -0.241, 0.087, 0.004, -0.119, 0.203, -0.076, ...]

The real list contains thousands of numbers. At first glance, they look meaningless.

But each number is not random. Together, they place the sentence at a specific location inside a large mathematical space. And the rule governing that space is simple:

Texts with similar meaning are placed closer together. Texts with different meaning are placed farther apart.

So these sentences land near each other:

I forgot my password. How do I reset my login? I cannot access my account. Where is the recovery link?

But these are far away:

How do I cook rice? What is the capital of Japan? Explain CSS specificity.

An embedding does not store the sentence as language.

It stores the sentence as position.

✦ Meaning as a Location

This is the shift that makes embeddings powerful.

Words become coordinates.

Think of a map. Mumbai and Pune are closer than Mumbai and London. Not because they share letters. Not because they sound similar. Because their physical positions are near each other.

Embeddings do something similar for meaning.

In embedding space:

"forgot password"

is closer to:

"reset login credentials"

than it is to:

"make chocolate cake"

Even though "forgot password" and "reset login credentials" share almost no words, they live in the same neighborhood of meaning.

That is the core insight.

Now those neighborhoods have coordinates.

✦Visual · Meaning Map

Each dot is a sentence. Similar meanings cluster together. Sentences with different meanings live far apart. Click any dot to read it.

Click any dot to read the sentence.

Account & Access

Frontend & Code

Food & Cooking

Positions are illustrative. Real embeddings live in 1,536+ dimensions and cannot be drawn directly. This 2D projection preserves the clustering idea, not the exact distances.

✦Pause & Reflect

0/500

✦ How the Comparison Works

Once two pieces of text have been converted into vectors, we can measure the distance between them.

Not emotionally. Mathematically.

The common technique is called cosine similarity. You do not need the formula. Just the intuition:

Are these two vectors pointing in a similar direction?

If two vectors point in nearly the same direction, the text is likely similar in meaning. If they point in very different directions, the text is likely unrelated.

This gives your system a clean, numeric signal:

✦Similarity scores

The highest score is not a guarantee of a perfect result. But it gives your application a strong signal.

And AI systems are built from signals.

✦ Chat Models and Embedding Models Are Different Things

This distinction matters more than it first appears.

A chat model answers. An embedding model represents.

When you send a message to Claude or ChatGPT, the model produces text: a response. That is a chat model.

An embedding model does something entirely different. It does not answer. It does not speak. It silently converts a piece of text into a numerical form that software can compare.

Model type	Input	Output	Used for
Chat model	Prompt	Text response	Answering, writing, reasoning
Embedding model	Text	Vector of numbers	Search, similarity, retrieval

Many beginners assume embeddings are another kind of chatbot.

They are not.

Embeddings do not answer the user. They help your system find what the answer should be based on.

Think of a research assistant who, before the expert opens their mouth, has already pulled the three most relevant documents from the archive. The expert answers. The assistant retrieved.

That retrieval step is what embeddings do.

✦ Why Keyword Search Is Not Enough

Keyword search has its place.

If you search for invoice number INV-2026-771, you want an exact match. If you search for user ID 98231, you want the precise record. For those cases, keyword search is exactly right.

But many human questions are not keyword-shaped. They are intention-shaped.

A learner may ask:

Why does my page jump when the image loads?

But your article is titled:

Understanding Cumulative Layout Shift

A user may ask:

How do I stop the AI from making things up?

But your lesson is titled:

Reducing Hallucinations with Retrieval-Augmented Generation

A developer may ask:

How do I make the model answer in JSON?

But your article is titled:

Structured Outputs: Teaching AI to Answer in the Shape You Need

Keyword search misses these connections. Embeddings can find them.

Because embeddings do not search for matching words.

They search for the neighborhood of meaning.
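
To make the gap concrete, the sketch below counts the words a question shares with an article title, which is roughly all that naive keyword matching has to work with. The helper names are hypothetical, and real keyword engines are more sophisticated than this, but the blind spot is the same.

```javascript
// Split text into a set of lowercase words, dropping punctuation.
function words(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Return the words that appear in both the query and the title.
function sharedWords(query, title) {
  const titleWords = words(title);
  return [...words(query)].filter((w) => titleWords.has(w));
}

const pairs = [
  ["Why does my page jump when the image loads?", "Understanding Cumulative Layout Shift"],
  ["How do I stop the AI from making things up?", "Reducing Hallucinations with Retrieval-Augmented Generation"],
];

for (const [query, title] of pairs) {
  console.log(sharedWords(query, title).length, "shared words:", query);
}
```

Both pairs share zero words. The third pair from this section shares only "the", "answer", and "in", none of which tells a search engine that the topic is structured output.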

✦Interactive · Semantic Search

For a query about a page that jumps while it loads, notice which document rises to the top, and that the winning match shares almost no words with the query.

Documents ranked by meaning similarity

Very high (Frontend): Cumulative Layout Shift happens when visible elements move unexpectedly.
Moderate (Frontend): React useState lets a component remember values between renders.
Low (Auth): To reset your password, click the recovery link on the login page.
Low: RAG helps an AI answer using external documents instead of only training data.
Low (Billing): You can update your billing address from the account settings page.

"Page jumps" has no words in common with "visible elements move unexpectedly." But both describe the same physical experience. The meaning is near. The words are not.

Scores are pre-computed from OpenAI text-embedding-3-small and rounded for clarity.

✦ What This Looks Like in Code

We will build this in layers.

Step 1: Setup

Install the SDK and create your environment file:

Bash

npm install openai dotenv

# .env
OPENAI_API_KEY=your_api_key_here

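Create a file called embeddings-demo.js. Because it uses import statements and top-level await, Node must treat it as an ES module: add "type": "module" to your package.json (or name the file embeddings-demo.mjs). Start with just two lines: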

JavaScript

import "dotenv/config";
import OpenAI from "openai";

What these do:

import "dotenv/config" reads your .env file and loads the API key into the environment before anything else runs. It must come first.

import OpenAI from "openai" brings in OpenAI's SDK. Without this, you would be writing raw HTTP requests. The SDK handles authentication, retries, and response parsing for you.

Now create the client:

JavaScript

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

new OpenAI(...) creates a configured client. No request is sent yet; every call you make later will flow through this object. The API key is read from your environment variable, not hardcoded.

Step 2: The Embedding Function

This is the heart of the file. Add this function:

JavaScript

async function getEmbedding(text) {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
    encoding_format: "float",
  });
  return response.data[0].embedding;
}

What this does, piece by piece:

model: "text-embedding-3-small" is OpenAI's fast, capable embedding model. It converts text into 1,536 numbers. That is the vector.

input: text is the sentence you want to embed. It can be a word, a sentence, a paragraph, or a full document chunk.

encoding_format: "float" tells the API to return the vector as plain floating-point numbers, which is what you need for comparison.

The function is async because the API call takes time. await pauses execution until the response arrives.

Step 3: See What Comes Back

Now call the function and log the result:

JavaScript

const embedding = await getEmbedding("I forgot my password and cannot login.");

console.log("Vector length:", embedding.length);
console.log("First 10 values:", embedding.slice(0, 10));

Run it:

Bash

node embeddings-demo.js

✦What you will see

Vector length: 1536
First 10 values: [ 0.018, -0.241, 0.087, 0.004, -0.119, 0.203, -0.076, 0.031, -0.055, 0.142 ]

You will not see a paragraph. You will see 1,536 numbers.

That is the point.

Your sentence has been converted into a position in meaning space. The text is gone. The meaning is encoded.

Step 4: The Similarity Function

Now add the function that compares two embeddings:

JavaScript

function dotProduct(a, b) {
  return a.reduce((sum, val, i) => sum + val * b[i], 0);
}

This looks more mathematical than the idea it carries. Set aside the .reduce for a moment.

The only question this function answers is:

Are these two vectors pointing in a similar direction?

When two sentences carry similar meaning, their embedding vectors lean toward the same region of that vast numerical space. When the meanings diverge, the vectors diverge too.

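dotProduct measures that lean. It returns one number. High: the directions are close, the meanings are likely related. Low: the directions differ, the meanings are probably not. OpenAI embeddings are normalized to length 1, so for these vectors the dot product gives the same value as cosine similarity.

Moving from "do the words match?" to "do the directions match?" is the whole idea of semantic search.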
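
If you ever work with vectors that are not already unit length, the dot product alone can mislead, because longer vectors produce bigger numbers. A minimal sketch of the fix, using made-up 2D vectors and hypothetical helper names, is to normalize first:

```javascript
function dotProduct(a, b) {
  return a.reduce((sum, val, i) => sum + val * b[i], 0);
}

// Length (magnitude) of a vector.
function magnitude(v) {
  return Math.sqrt(dotProduct(v, v));
}

// Scale a vector to length 1 without changing its direction.
function normalize(v) {
  const len = magnitude(v);
  return v.map((x) => x / len);
}

// Toy vectors, made up for illustration.
const a = [3, 4];
const b = [4, 3];

const cosine = dotProduct(a, b) / (magnitude(a) * magnitude(b));
const dotOfUnits = dotProduct(normalize(a), normalize(b));

console.log(magnitude(normalize(a))); // length after normalizing is 1 (up to rounding)
console.log(cosine, dotOfUnits);      // both are 0.96, up to floating-point rounding
```

Once every vector has length 1, the plain dot product and cosine similarity agree, which is why the short dotProduct function is enough for this lesson.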

Step 5: The Full Search

Now put it all together. Add a list of documents and a query:

JavaScript

const documents = [
  "To reset your password, click the recovery link on the login page.",
  "You can update your billing address from the account settings page.",
  "React useState lets a component remember values between renders.",
  "Cumulative Layout Shift happens when visible elements move unexpectedly.",
  "RAG helps an AI answer using external documents instead of only training data.",
];

const query = "My page jumps when content loads. What is happening?";

Embed the query, then embed each document and score it:

JavaScript

const queryEmbedding = await getEmbedding(query);

const results = [];

for (const doc of documents) {
  const docEmbedding = await getEmbedding(doc);
  results.push({ doc, score: dotProduct(queryEmbedding, docEmbedding) });
}

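What is happening here: the query is embedded once, outside the loop. Inside the loop, each document is embedded, compared with the query using dotProduct, and stored alongside its score.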

Now sort by score and print:

JavaScript

results.sort((a, b) => b.score - a.score);

for (const { doc, score } of results) {
  console.log(`${score.toFixed(4)}  ${doc}`);
}

results.sort(...) arranges the documents from highest similarity to lowest. The most relevant document rises to the top.

✦What you will see

The layout shift document rises to the top with a score of 0.87. The query said "page jumps." The document says "visible elements move unexpectedly." No words in common. The meaning is the same.

That is your first semantic search.

Small. Local. Imperfect. But real.

The Complete File

Here is everything together, ready to copy and run:

JavaScript

// Load OPENAI_API_KEY from .env before anything else runs.
import "dotenv/config";
import OpenAI from "openai";

// One configured client for every API call in this file.
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Convert a piece of text into its embedding vector.
async function getEmbedding(text) {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
    encoding_format: "float",
  });
  return response.data[0].embedding;
}

// Similarity score for two vectors of the same length.
function dotProduct(a, b) {
  return a.reduce((sum, val, i) => sum + val * b[i], 0);
}

const documents = [
  "To reset your password, click the recovery link on the login page.",
  "You can update your billing address from the account settings page.",
  "React useState lets a component remember values between renders.",
  "Cumulative Layout Shift happens when visible elements move unexpectedly.",
  "RAG helps an AI answer using external documents instead of only training data.",
];

const query = "My page jumps when content loads. What is happening?";

// Embed the query once, then score every document against it.
const queryEmbedding = await getEmbedding(query);

const results = [];

for (const doc of documents) {
  const docEmbedding = await getEmbedding(doc);
  results.push({ doc, score: dotProduct(queryEmbedding, docEmbedding) });
}

// Highest similarity first.
results.sort((a, b) => b.score - a.score);

for (const { doc, score } of results) {
  console.log(`${score.toFixed(4)}  ${doc}`);
}

A Note on Production

The code above embeds every document on every run. A real application would not do this.

You embed documents once, store them in a database, and only embed the user's question at query time. That is where vector databases enter the picture.

But the core logic is identical:

Convert text into vectors. Compare vectors. Retrieve the closest meaning.
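
Before reaching for a vector database, you can feel the "embed once" idea with a small in-memory cache. This is only a sketch: createEmbeddingCache is a hypothetical helper, and fakeEmbed stands in for the real getEmbedding so the example runs offline without an API key.

```javascript
// Wrap any async embedding function so each distinct text is embedded only once.
// `embedFn` is a placeholder for something like this lesson's getEmbedding.
function createEmbeddingCache(embedFn) {
  const cache = new Map(); // text -> Promise that resolves to its vector

  return function getCachedEmbedding(text) {
    if (!cache.has(text)) {
      const pending = Promise.resolve(embedFn(text)).catch((err) => {
        cache.delete(text); // do not keep failed requests around
        throw err;
      });
      cache.set(text, pending);
    }
    return cache.get(text);
  };
}

// Stand-in for a real API call (hypothetical), counting how often it is used.
let apiCalls = 0;
async function fakeEmbed(text) {
  apiCalls += 1;
  return [text.length, 1];
}

const embed = createEmbeddingCache(fakeEmbed);

Promise.all([embed("reset password"), embed("reset password"), embed("billing")])
  .then(() => console.log("API calls made:", apiCalls)); // API calls made: 2
```

Three requests, two API calls. In the lesson's file you could pass getEmbedding instead of fakeEmbed. The cache disappears when the process exits, which is exactly the gap a database fills.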

✦ Two Phases: Indexing and Retrieval

A real application separates the work into two distinct moments.

✦Visual · Indexing and Retrieval

Embeddings work in two separate phases that happen at different times. The first phase, indexing, happens once, before any user asks anything.

Documents (source)

Your knowledge: articles, support docs, notes, codebases.

Split into chunks

Long documents are cut into smaller, focused pieces. Each chunk covers one idea.

Create embeddings

Each chunk gets its own embedding. A document split into 50 chunks produces 50 separate vectors. Not one vector per document, one per chunk.

Store chunk + vector + source (stored once)

Each chunk is stored with its vector and its origin: which document it came from, which section, which page. This metadata is what lets you cite a source later.

Chunk embeddings are created ahead of time, not when the user asks. The heavy work is done in advance. At query time, only the question needs to be embedded.

One thing worth making precise here, because it is easy to imagine it wrong.

The embedding is not created once per document. It is created once per chunk.

Each stored chunk record looks like this:

JSON

{
  "text": "Structured outputs help AI responses become predictable objects.",
  "embedding": [0.021, -0.088, 0.134, ...],
  "source": "structured-outputs",
  "section": "JSON Is Not the Goal"
}

The document embeddings are created ahead of time. The question embedding is created at query time. The two phases run at different moments, for different reasons.

This separation is what makes the system fast and cost-effective in production.
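
The chunking step can be sketched in a few lines. This version assumes paragraphs separated by blank lines make reasonable chunks, which is a simplification; real systems often split by headings, sentences, or token counts. The function name, the chunkIndex field, and the sample text are illustrative.

```javascript
// Split a document into paragraph-sized chunks and attach source metadata.
function chunkDocument(text, source) {
  return text
    .split(/\n\s*\n/) // paragraphs are separated by blank lines
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part, index) => ({
      text: part,
      source,
      chunkIndex: index,
      embedding: null, // filled in later by the embedding step
    }));
}

// Sample text, made up for illustration.
const lesson = `Structured outputs help AI responses become predictable objects.

JSON is not the goal. The goal is a shape your application can rely on.`;

const chunks = chunkDocument(lesson, "structured-outputs");
console.log(chunks.length);  // 2
console.log(chunks[1].text); // the second paragraph
```

Each record is ready to receive its own vector, one per chunk, with the source kept alongside for later citation.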

Tomorrow, when we study RAG, we will take the retrieved chunks and pass them into the model so the answer is grounded in your own content.

That is the bridge: embeddings help you find. RAG helps you answer using what you found.
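
The retrieval phase, in miniature, might look like the sketch below. The index is a plain array with made-up 2D unit vectors standing in for real embeddings, and retrieve, k, and minScore are hypothetical names chosen for this example.

```javascript
function dotProduct(a, b) {
  return a.reduce((sum, val, i) => sum + val * b[i], 0);
}

// Score every stored chunk against the query vector and keep the best few.
function retrieve(queryVector, records, { k = 3, minScore = 0 } = {}) {
  return records
    .map((record) => ({
      text: record.text,
      source: record.source,
      score: dotProduct(queryVector, record.embedding),
    }))
    .filter((hit) => hit.score >= minScore) // drop weak matches
    .sort((x, y) => y.score - x.score)      // best match first
    .slice(0, k);
}

// Toy index: vectors are invented so the example runs without an API.
const index = [
  { text: "Reset your password with the recovery link.", source: "auth", embedding: [1, 0] },
  { text: "Update your billing address in settings.", source: "billing", embedding: [0, 1] },
  { text: "Log in again after account recovery.", source: "auth", embedding: [0.8, 0.6] },
];

// Pretend this is the embedding of "I cannot access my account."
const queryVector = [0.96, 0.28];

const hits = retrieve(queryVector, index, { k: 2, minScore: 0.5 });
for (const hit of hits) {
  console.log(hit.score.toFixed(3), hit.source, hit.text);
}
```

The two auth chunks come back and the billing chunk is filtered out. Scanning every record like this is fine for a few thousand chunks; beyond that, a vector database does the same job with an index built for speed.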

✦ Where Embeddings Fall Short

Embeddings are powerful. But a good builder must also understand the failure modes.

Similarity is a signal, not a guarantee.

Short queries carry less context.

Exact identifiers need exact search.

Bad chunks create bad retrieval.

Embeddings do not update themselves.

Embeddings work well for	Exact search works better for
Meaning-based search	Invoice numbers
Similar question matching	Error codes
Recommendations	User IDs
RAG context retrieval	Order identifiers
Concept clustering	Legal clause numbers
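
One way to respect this split is to check the shape of the query before choosing a search path. The sketch below is an assumption about how you might do that: the patterns are hypothetical and would need to match your own identifier formats.

```javascript
// Hypothetical patterns for identifier-shaped queries; adjust to your own formats.
const IDENTIFIER_PATTERNS = [
  /\b[A-Z]{2,5}-\d{4}-\d+\b/, // codes such as INV-2026-771
  /\b\d{4,}\b/,               // long bare numbers such as user ID 98231
];

// Decide whether a query should go to exact search or semantic search.
function chooseSearchMode(query) {
  const looksExact = IDENTIFIER_PATTERNS.some((pattern) => pattern.test(query));
  return looksExact ? "exact" : "semantic";
}

console.log(chooseSearchMode("Where is invoice INV-2026-771?"));     // exact
console.log(chooseSearchMode("user 98231"));                         // exact
console.log(chooseSearchMode("How do I get back into my account?")); // semantic
```

Many production systems go further and run both searches, then merge the results. This simple router is just the smallest version of that idea.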

✦ Embeddings and the Context Window

Day 3 taught us something important: the context window is a fixed container with a hard limit. Everything the model reasons from must fit inside it.

Embeddings solve this by helping you choose what enters the context window.

Instead of sending the whole library, you send only the most relevant pages.

The goal is not to give the model more context. The goal is to give it better context.

This is one of the most important architectural shifts in AI application development. The model's intelligence is fixed. What you can improve is the quality of what you place in front of it.
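
Choosing what enters the context window can be as simple as walking the ranked chunks and keeping those that fit a budget. In this sketch the budget is counted in characters, a rough stand-in for tokens chosen to keep the example short; selectContext and the scores are illustrative.

```javascript
// Pick the highest-ranked chunks that fit inside a size budget.
// Assumes `rankedChunks` is already sorted best-first.
function selectContext(rankedChunks, maxChars) {
  const selected = [];
  let used = 0;
  for (const chunk of rankedChunks) {
    if (used + chunk.text.length > maxChars) continue; // skip chunks that do not fit
    selected.push(chunk);
    used += chunk.text.length;
  }
  return selected;
}

// Toy ranking with made-up scores.
const ranked = [
  { score: 0.9, text: "Cumulative Layout Shift happens when visible elements move unexpectedly." },
  { score: 0.5, text: "React useState lets a component remember values between renders." },
  { score: 0.2, text: "To reset your password, click the recovery link on the login page." },
];

const context = selectContext(ranked, 140);
console.log(context.map((chunk) => chunk.text).join("\n"));
```

With a budget of 140 characters, the first two chunks fit and the third is left out. The model sees less text, but the text it sees is the most relevant.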

✦ What This Opens Up for Builders

Once you understand embeddings, you begin seeing AI features differently.

A document Q&A bot is not just a chatbot.

It is: document chunks, embeddings, retrieval, and answer generation working together.

A resume reviewer can go beyond a single prompt.

It can be: resume text, job description embedding, skill gap search, and structured feedback.

An AI learning coach can do more than answer questions.

It can use: learner history, concept embeddings, weak area retrieval, and a personalized next step.

A support assistant can stop guessing.

It can be: knowledge base articles, semantic search, and source-aware responses.

Embeddings are one of the first places where AI stops feeling like a chat window and starts feeling like infrastructure.

✦ Try It Yourself

Use the code from this lesson and test these queries against the same five documents:

I cannot access my account.
My bill has the wrong address.
Why does my UI move after loading?
How can AI answer from my own files?
How do components remember values?

Then add your own documents. Use notes from the previous Tapovan lessons:

Tokens are fragments of text that hold predictive value.
The context window is the fixed container of tokens the model can see.
Structured outputs make AI responses predictable for applications.
Prompts shape the probability space of the model's response.

Now ask:

How do I make AI output usable in my frontend?

The structured outputs note should rise toward the top.

When it does, pause for a moment.

You just built the first piece of a RAG system.

✦ Takeaway Summary

Concept	What It Means
Embedding	A numerical representation of text meaning
Vector	An array of numbers that places text in meaning space
Cosine similarity	A measure of how closely two vectors point in the same direction
Semantic search	Search based on meaning, not only exact words
Indexing	The pre-query phase: embed documents and store the vectors
Retrieval	The query phase: embed the question, find nearest vectors, return relevant chunks
RAG connection	Embeddings retrieve the context. The model generates an answer from that context.

✦ Learn More

OpenAI Embeddings Guide: the official reference for text-embedding-3-small and how to use it in production
MongoDB Atlas Vector Search: a practical guide to storing and querying embeddings in a real database
OpenAI Cookbook: Semantic Search with Embeddings: a working notebook that puts everything from this lesson into practice
Pinecone: Vector Embeddings Explained: a detailed look at how vectors are stored, indexed, and queried at scale

✦ ✦ ✦

A word is what we see on the surface. Meaning is the current beneath it.

Embeddings do not give the machine a human mind. But they give our software a way to sense nearness.

And once meaning has distance, search becomes something more than matching.

It becomes discovery.