Foundations · Day 1
What is AI, Really?
Before we learn to build with AI, let us first understand who we are building with.
May 3, 2024·7 min read
✦ The Story of the Curious Mirror
In a quiet village, there was a magical mirror. You could walk up to it and say anything (a question, a worry, an idea) and it would respond like a wise old friend.
- Some villagers used it to find lost goats.
- Some used it to compose poetry.
- Some just sat with it when they felt lonely.
The mirror never judged. It simply listened… and responded.
This mirror is what AI feels like today.
It reflects what we say, reshapes it through the lens of all it has "read,"
and offers a response that feels almost human.
But here's the key truth:
AI is not intelligent the way a human is.
It has no understanding. No intent. No heart.
But it does have pattern memory, structure, and incredible response skill.
✦ So What Is AI? (Simple Definition)
AI is a system that uses data + logic to mimic human-like abilities, like:
- Answering questions
- Recognizing images
- Writing text
- Predicting actions
In our context, we're focusing on:
✨ Generative AI – AI that can create text, images, code, etc.
🤖 Large Language Models (LLMs) – models trained on huge amounts of text to predict the next word (think GPT-4).
✦ Analogy: The Brilliant Parrot
Think of LLMs like a parrot raised in a great library:
- It repeats what it has heard,
- Mixes things cleverly,
- Sometimes sounds deeply wise…
But it doesn't know anything. It has no self-awareness. All it has is structure + patterns.
It's your prompts, your intent, and your direction
that bring out wisdom from this parrot.
✦ What Powers AI Under the Hood?
So what makes a machine look intelligent?
Let's break it into three key components that power today's AI:
1. Massive Data + Patterns
LLMs like GPT-4 are trained on:
- Books
- Wikipedia
- Websites
- Code
- Research papers
- Conversations
Imagine a machine that has "read" more than any human ever has,
but doesn't understand a single word like you do.
Instead, it learns the patterns:
If sentence A ends with 'sun rises in', then 'the east' is likely to follow.
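To make "learning the patterns" concrete, here is a minimal sketch that counts which word follows each three-word context in a tiny made-up corpus. The corpus, the function names, and the three-word window are all assumptions for illustration. A real LLM learns far richer patterns with a neural network, not a lookup table.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus (an assumption for illustration only).
corpus = [
    "the sun rises in the east",
    "the sun sets in the west",
    "the sun rises in the east every morning",
]

def build_counts(sentences, context_size=3):
    """Count which word follows each run of `context_size` words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - context_size):
            context = tuple(words[i:i + context_size])
            counts[context][words[i + context_size]] += 1
    return counts

def predict_next(counts, text, context_size=3):
    """Return the most frequent follower of the last `context_size` words, or None."""
    context = tuple(text.split()[-context_size:])
    followers = counts.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

counts = build_counts(corpus)
print(predict_next(counts, "sun rises in"))  # the
```

Feed it "rises in the" and it answers "east", purely because that is what followed most often in the text it counted.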
2. Neural Networks (Transformers)
At its heart is a mathematical structure called a Transformer, the architecture behind GPT, BERT, and others.
Think of a sentence as a string of beads.
The model first breaks this string into beads (tokens), each carrying a fragment of meaning.
Then it examines how each bead relates to every other, using something called attention.
Finally, it calculates: 👉 "Which bead should come next to keep this necklace coherent?"
In machine terms, let's try this:
Input: "Einstein once said,"
Output: "imagination is more important than knowledge."
The model doesn't "know" who Einstein is. It has simply seen that pattern before in the text it was trained on.
So it picks the most likely continuation based on token-level patterns.
3. Tokens, Not Words
Here's a key idea: AI doesn't understand words the way we do. It understands pieces of words, called tokens.
Token = a fragment of text that holds predictive value.
The model doesn't know what "words" are. It only recognizes useful fragments, based on patterns it has seen millions of times.
Some examples (the exact split depends on the tokenizer being used):
- "Einstein" → 1 token
- "Intelligence" → 2 tokens: Int, elligence
- "Unbelievable" → 3 tokens: Un, believ, able
Some tokens may look like gibberish (like elligence), but they exist because they help the tokenizer break long, common words into efficient chunks.
The tokenizer isn't trying to make sense. It's trying to save space.
It doesn't care about grammar or meaning. It only cares about character patterns it has seen a lot.
Some words like "sun" or "light" are so common that the model sees them as single tokens.
But longer or less common words like "indivisible" may be broken into pieces, because the tokenizer has seen those pieces more often than the whole word. (There is some additional logic behind this, which we can circle back to later. Read up on Byte Pair Encoding if you're interested.)
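For the curious, here is a toy sketch of the core idea behind Byte Pair Encoding: repeatedly find the most frequent adjacent pair of symbols and merge it into one. The word frequencies and function names are hypothetical. Real tokenizers work on bytes, use far larger corpora, and perform tens of thousands of merges.

```python
from collections import Counter

def most_frequent_pair(words):
    """Find the adjacent symbol pair that occurs most often across all words."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(words, pair):
    """Rewrite every word so the chosen pair becomes one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

# Hypothetical word frequencies; each word starts as a tuple of characters.
words = {tuple("sun"): 10, tuple("sunny"): 4, tuple("light"): 8, tuple("indivisible"): 1}

for _ in range(4):  # four merges only; real vocabularies need many more
    pair = most_frequent_pair(words)
    if pair is None:
        break
    words = merge_pair(words, pair)

print(list(words))
```

After a few merges, the frequent word "sun" has collapsed into a single symbol, while the rare "indivisible" is still a string of small pieces.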
It's like AI doesn't read words; it recognizes familiar fragments.
Tokens are the currency of LLMs: every prompt and every response spends them. Models have context limits (for example 4k, 8k, or 32k tokens), and cost is charged per token too. Understanding this will help you write smarter prompts. We'll go deep on this in Day 3: Tokenization & Cost Awareness.
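As a rough feel for how tokens turn into limits and cost, here is a small back-of-the-envelope sketch. Everything in it is an assumption: the price is made up, the 8k limit is just one of the sizes mentioned above, and "about 4 characters per token" is only a rule of thumb for English text. A real tokenizer gives exact counts.

```python
import math

CONTEXT_LIMIT = 8_000        # tokens; one of the limits mentioned above
PRICE_PER_1K_TOKENS = 0.01   # hypothetical price, not a real rate

def estimate_tokens(text, chars_per_token=4):
    """Very rough estimate: about 4 characters per token for English text."""
    return math.ceil(len(text) / chars_per_token)

def estimate_cost(prompt, expected_reply_tokens):
    """Estimate total tokens and cost for one prompt plus its reply."""
    prompt_tokens = estimate_tokens(prompt)
    total = prompt_tokens + expected_reply_tokens
    return {
        "prompt_tokens": prompt_tokens,
        "total_tokens": total,
        "fits_in_context": total <= CONTEXT_LIMIT,
        "estimated_cost": total / 1000 * PRICE_PER_1K_TOKENS,
    }

report = estimate_cost("Explain attention in one paragraph.", expected_reply_tokens=200)
print(report)
```

Note that both the prompt and the reply count toward the total, which is why a long-winded prompt costs you twice: once in tokens, once in room left for the answer.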
✦ How Does AI Know What Matters in a Sentence?
– Understanding "Attention" with Intuition and Insight
✦ Intuition: The Classroom Analogy
Imagine you're in a classroom. A teacher asks, "What is consciousness?" You hear that word and your mind scans:
- "Where have I heard this before?"
- "Oh, in that philosophy lecture!"
- "Also in that documentary last week…"
- "Also when Arjuna was confused."
- "And wasn't there something about awareness and thought?"
You're giving attention to certain past thoughts to understand the present. Your mind is not just reacting. It's assigning weight to memories. That's attention.
That's what attention layers in AI do. For every token, they ask: "Which previous tokens are most important to focus on?"
✦ A Simple Example: Attention Scores
Let's take a real sentence, and then show how attention plays out.
Sentence:
Arjuna stood in the battlefield, confused about his dharma.
Now the model wants to predict the next word after "dharma." Let's say the word it should arrive at is "Krishna."
"Confused" and "dharma" get the highest weights,
because they are most relevant to the next word.
So the model says:
Hmm… confusion about dharma?
Oh! I've seen this before. Likely continuation is: 'Krishna said…'
This is how AI uses attention:
It doesn't treat every word it has read as equally important. It creates a map of relevance, then predicts.
Weights are illustrative. Real transformer attention is computed across hundreds of dimensions simultaneously.
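To see how raw relevance scores become attention weights, here is a minimal sketch using softmax, the function transformers use to turn scores into weights that sum to 1. The scores below are made up for this sentence and are not taken from any real model.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

tokens = ["Arjuna", "stood", "in", "the", "battlefield",
          "confused", "about", "his", "dharma"]
# Made-up relevance scores for predicting the next word (illustrative only).
raw_scores = [1.5, 0.2, 0.0, 0.0, 1.0, 2.5, 0.3, 0.4, 2.8]

weights = softmax(raw_scores)
ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1], reverse=True)
for token, weight in ranked:
    print(f"{token:12s} {weight:.2f}")
```

With these scores, "dharma" and "confused" come out on top, and small differences in score turn into large differences in weight.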
🧠 Technically Speaking
- Attention assigns scores to every pair of tokens
- These scores say how relevant one token is to another
- Output = weighted sum of these connections
- This allows the model to see the whole sentence at once and decide what matters most
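The four points above can be sketched in a few lines of plain Python. This is a simplified version of scaled dot-product attention under loud assumptions: the 2-dimensional vectors are hypothetical, and the same vectors are reused as queries, keys, and values. Real models derive those three from learned projections, run many attention heads in parallel, and GPT-style models also mask out future tokens.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        # Score this token against every token (the "pair" scores).
        scores = [dot(q, k) / scale for k in keys]
        weights = softmax(scores)
        # Output = weighted sum of the value vectors.
        mixed = [sum(w * v[d] for w, v in zip(weights, values))
                 for d in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Three tokens with hypothetical 2-dimensional vectors.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(vectors, vectors, vectors)
for row in weights:
    print([round(w, 2) for w in row])
```

Each printed row shows how much one token attends to every token, and each row sums to 1.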
Simple Takeaway:
Attention = AI deciding what parts of the sentence matter most
It lets the model focus – like a wise reader who reads between the lines.
In the sentence 'Arjuna stood in the battlefield, confused about his dharma', why do you think 'confused' and 'dharma' scored higher attention than 'battlefield' or 'Arjuna', even though those are equally key parts of the scene?
✦ Takeaway Summary
| Concept | Insight |
|---|---|
| AI | Mimics intelligence using patterns, not understanding |
| LLM | Trained on huge data to predict next word/token |
| Transformer | Neural network architecture using attention |
| Tokens | Building blocks of language inside models |
| Who is the driver? | You, the prompt designer |
✦ Try It Yourself
Explore how AI sees your words:
Open the OpenAI tokenizer playground and paste a favourite quote, a code snippet, or a poem. Notice how the tokens break, how cost grows, and how LLMs "see" your input.
Try your first prompt:
Use OpenAI, Poe, or any LLM playground and run these two prompts back to back:
Notice how the tone shifts completely while the underlying knowledge stays the same. That gap between knowledge and delivery is exactly where your role as a prompt designer lives.
✦ Learn More (For Developers)
The river remembers every bend, just as AI remembers every pattern. But it is your intent, quiet yet powerful, that decides where the current will go.