Your First API Call: Where Understanding Becomes Real

✦ Before a Single Line of Code

At first, AI is something you visit.

You open Claude.ai, ChatGPT, or Gemini. You type a question. You wait. A response appears.

You are using AI, but you are standing outside the system. Someone else has already made every decision: which model receives your message, how many tokens the response can use, whether streaming is on, what happens when the model fails, how the output is rendered.

The interface handles all of that. You see only the front door.

When you call an API yourself, you step inside.

Every one of those decisions is now yours. You choose the model. You set the token limit. You decide whether to stream. You handle what happens when something goes wrong.

The intelligence is no longer floating somewhere inside a polished product. It is connected to your application through a request that you write.

This is the difference between a passenger and a driver. Both reach the destination. Only one understands the route.

At its core, the shape of an API call is simple:

Your Code
   → sends a request with your message and settings
AI Provider
   → model generates a response, token by token
Your Code
   → receives it and decides what to do next

That is the whole structure. Everything else fills in from there. And once you understand the shape, the rest of AI application development becomes much less mysterious.

✦ What the Model Actually Receives

Before we write anything, let's look at what we are actually sending.

Every conversation with a language model is structured as a list of messages. Each message has exactly two things: a role and content.

role: "user"
content: "What is the most important thing to understand about AI?"

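The role tells the model who is speaking. In many chat APIs it can be user (you), assistant (the model), or system (background instructions the model follows throughout the conversation). Anthropic's Messages API, used later in this article, accepts only user and assistant inside the array and takes system instructions through a separate top-level system parameter.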

The content is the actual text of that message.

When you send a multi-turn conversation, you send the full history:

[
  { role: "user",      content: "What is the most important thing about AI?" },
  { role: "assistant", content: "Probably that it learns from data..." },
  { role: "user",      content: "Can you say more about that?" }
]

The model sees this entire list, in order, every single time. It has no other memory. There is no context carried between API calls. There is no "it already knows who I am." It knows only what you put in the messages array.

This should feel familiar. In Day 3, we talked about the context window: that single fixed container holding everything the model sees.

The messages array is exactly that container, made visible.
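
To make that concrete, here is a minimal sketch of how an application might carry a conversation forward. The helper name appendTurn is hypothetical and not part of any SDK; it only builds the array you would later pass as messages.

```javascript
// Hypothetical helper: returns a new history with one more message appended.
// The API keeps no memory, so the caller holds the history and resends it.
function appendTurn(history, role, content) {
  return [...history, { role, content }];
}

let history = [];
history = appendTurn(history, "user", "What is the most important thing about AI?");
history = appendTurn(history, "assistant", "Probably that it learns from data...");
history = appendTurn(history, "user", "Can you say more about that?");

// This whole array is what the next request would send.
console.log(history.length);
console.log(history[history.length - 1].content);
```

Returning a new array instead of mutating the old one is a design choice for this sketch: it makes it easy to see exactly which history went out with each request.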

✦ What Your Request Needs

Every call to a language model needs four things. None of these are optional.

An API key. This is how the provider knows the request is from your account. Think of it as a private access pass.

It belongs on the server side, inside an environment variable, never in code you share or commit to git. We will set this up properly in a moment.

A model. Your code explicitly names which model receives your message. Different models have different speeds, costs, and capabilities. Some reason better. Some are faster. Some cost less.

For this first call, we will use claude-opus-4-7. What matters now is that your code chooses. It does not happen automatically.

A messages array. This is the conversation the model sees. For a first call, a single user message.

In a real application, this same array can hold full conversation history, retrieved documents, and tool results, with system instructions sent alongside it. The model's entire understanding of the situation comes from what you send in the request and nothing else.

A token budget. The model does not produce unlimited output. You set max_tokens to define how long the response can be.

This is the same ledger Day 3 described: a fixed container, a hard ceiling. Without a budget, AI applications become unpredictable and expensive. Setting it is not a technical detail you skip.
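
The four ingredients can be sketched as a small check that runs before any network call. The function buildRequest and its parameter names are assumptions made for this illustration, not SDK functions; only the body fields model, max_tokens, and messages mirror what the article's later code sends.

```javascript
// Hypothetical helper: verifies the four ingredients are present.
function buildRequest({ apiKey, model, messages, maxTokens }) {
  if (!apiKey) throw new Error("Missing API key");
  if (!model) throw new Error("Missing model name");
  if (!Array.isArray(messages) || messages.length === 0) {
    throw new Error("messages must be a non-empty array");
  }
  if (!Number.isInteger(maxTokens) || maxTokens <= 0) {
    throw new Error("maxTokens must be a positive integer");
  }
  // The key authenticates the request; the rest forms the request body.
  return {
    apiKey,
    body: { model, max_tokens: maxTokens, messages },
  };
}

const request = buildRequest({
  apiKey: "placeholder-key", // never a real key in source code
  model: "claude-opus-4-7",
  messages: [
    { role: "user", content: "What is the most important thing to understand about AI?" },
  ],
  maxTokens: 500,
});

console.log(JSON.stringify(request.body, null, 2));
```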

✦ Experience It First

Before we look at code, experience the call directly. Type something you actually want to know. Press send. Watch what comes back.

✦Interactive · Your First API Call


Take a moment with what you just saw.

✦ What You Just Witnessed

That response did not appear all at once. It arrived piece by piece, each fragment pushed to you as soon as it was generated.

In Day 2, we said language models generate one token at a time. Each token conditioned on everything before it. No word decided before the word that precedes it.

You just watched that happen.

Streaming is not only a user experience trick. It reveals something true about how language models work.

A model does not begin with a complete paragraph sitting inside it. It generates the next likely token based on the context it holds. Then the next, based on the updated context. Then the next.

This continues until the response is complete, the max_tokens limit is reached, or the connection closes.

In a non-streaming call, this generation still happens the same way, token by token. But the server waits until all tokens are ready and sends the full response at once. You see nothing until it is finished.

Streaming simply lets each fragment cross the wire as it is produced. The intelligence is the same. The delivery is different.
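
The difference in delivery can be sketched without any model at all. In the toy example below, a generator stands in for the model, and the fragments are invented for illustration. Both delivery styles consume the same generator; they differ only in when the output is handed over.

```javascript
// Stand-in for the model: yields one fragment at a time.
// The fragments are made up; no real model is involved.
function* generateFragments() {
  yield "Language ";
  yield "models ";
  yield "write ";
  yield "one token at a time.";
}

// Non-streaming delivery: collect everything, then hand it over once.
function deliverAllAtOnce() {
  let full = "";
  for (const fragment of generateFragments()) {
    full += fragment;
  }
  return [full]; // a single delivery
}

// Streaming delivery: hand over each fragment as soon as it exists.
function deliverAsStream() {
  const deliveries = [];
  for (const fragment of generateFragments()) {
    deliveries.push(fragment); // one delivery per fragment
  }
  return deliveries;
}

const once = deliverAllAtOnce();
const streamed = deliverAsStream();
console.log(once.length, streamed.length);
console.log(once.join("") === streamed.join(""));
```

The final comparison is the point of the sketch: the text is identical either way, and only the number of deliveries changes.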

✦Delivery comparison: without streaming versus with streaming

The token counter in the component above was an approximation, roughly one token per four characters. But the principle was real. Every fragment you saw was generated and immediately sent, before the model had produced the rest.

Every token you received counts against the max_tokens budget you set.
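
The four-characters-per-token rule of thumb is easy to write down. The sketch below is only that heuristic, with hypothetical function names; real tokenizers split text differently, so treat the numbers as rough estimates.

```javascript
// Rough heuristic from the text: about one token per four characters.
// Real tokenizers differ; this is an estimate only.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Hypothetical check: would this text likely fit inside a given budget?
function fitsBudget(text, maxTokens) {
  return estimateTokens(text) <= maxTokens;
}

const sample = "Streaming sends each fragment as soon as it is generated.";
console.log(estimateTokens(sample));
console.log(fitsBudget(sample, 10));
```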

✦ Now, the Code

Before you read anything below, do one thing first.

Open the Anthropic quickstart in a new tab: platform.claude.com/docs/en/get-started

Scroll to the code example. You will see the program this section is about to explain, already written and ready to copy.

This is how it works with every major AI provider. OpenAI publishes equivalent starter code in its quickstart, just as Anthropic does.

The code already exists in the documentation. You do not construct it from scratch. You find it, copy it, and then you understand it.

That shift matters. You are not expected to invent this. You are expected to read it.

Two versions, one idea

There are two ways to call the model. They use the same four ingredients. They differ in how the response arrives.

The first is simpler to understand:

JavaScript

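import Anthropic from "@anthropic-ai/sdk";

// Placeholder only. The Setting Up section moves the key out of the code.
const client = new Anthropic({ apiKey: "your-key-here" });

// Send one user message and wait for the complete response.
const message = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 500, // upper limit on output tokens
  messages: [
    { role: "user", content: "What is the most important thing to understand about AI?" }
  ]
});

// The response content is a list of blocks; the first one holds the text.
console.log(message.content[0].text);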

This waits for the complete response, then prints it. message.content[0].text reaches into the response object to get the text.

The model is still generating token by token behind the scenes. Your code just does not see anything until the full answer arrives.
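
Reaching for content[0] works for a simple reply, but a slightly more careful reader is easy to sketch. The helper below is hypothetical. It assumes the response shape used above (a content list of blocks plus a stop_reason field) and runs against a hand-written object, not a real response.

```javascript
// Hypothetical helper: joins all text blocks and reports truncation.
function readResponse(message) {
  const text = message.content
    .filter((block) => block.type === "text")
    .map((block) => block.text)
    .join("");
  // A stop_reason of "max_tokens" means the budget ran out
  // before the model finished.
  const truncated = message.stop_reason === "max_tokens";
  return { text, truncated };
}

// A hand-written object standing in for a real response.
const fakeMessage = {
  content: [{ type: "text", text: "The most important thing is" }],
  stop_reason: "max_tokens",
};

const result = readResponse(fakeMessage);
console.log(result.text);
if (result.truncated) {
  console.log("[response was cut off by max_tokens]");
}
```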

The second version is what you experienced in the component above:

first_call.mjs

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: "your-api-key-here" });

const stream = client.messages.stream({
  model: "claude-opus-4-7",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "What is the most important thing to understand about AI?" },
  ],
});

// Print each text fragment as soon as it arrives.
stream.on("text", (text) => process.stdout.write(text));

// Wait until the full response has been received.
await stream.finalMessage();

The tabs show Node.js and Python. Pick whichever language is home for you. The walkthrough below follows Node.js. If you are reading the Python tab, every concept maps directly. Only the syntax is different.

import Anthropic from "@anthropic-ai/sdk"

Think of this as hiring a translator. Without the SDK, you would hand-craft raw HTTP requests, parse a streaming response byte by byte, and handle every network edge case yourself.

The SDK collapses all of that into a clean interface. The first line of almost every project that talks to an external API looks exactly like this: import the library that handles the plumbing.

const client = new Anthropic({ apiKey: "..." })

This creates your client: a configured, ready-to-use connection to Anthropic's servers. Every call you make will flow through this object. The apiKey is how Anthropic authenticates your requests. Notice the placeholder value. We are about to give it a proper home.

One thing to understand before you put a real key anywhere: API keys in public code can be harvested by automated bots within minutes. Not hours. Minutes.

Bots scan GitHub continuously. If a key lands in a commit, you may see unexpected charges and will have to rotate it immediately. Storing a key safely is not optional style advice. We will handle it in the next section.

client.messages.stream({ model, max_tokens, messages })

This is the call itself. Three things to understand:

model: "claude-opus-4-7" tells Anthropic which model to run your message against. Different models have different speeds, capabilities, and costs. Later articles will cover model selection in depth.

max_tokens: 1024 sets the output budget. The model generates up to this many tokens, then stops, whether or not its thought is complete. This is the same ledger Day 3 described: a fixed container, a hard ceiling.

messages: [...] is the conversation history you are sending. A single user message here. As a conversation grows, you add to this array and the model sees the full history on every call. There is no memory between calls. Only what you put here.

stream.on("text", (text) => process.stdout.write(text))

This is where you start seeing streaming in action. Instead of waiting for the full response, your code receives small pieces of text as the model generates them. The SDK fires a text event for each piece.

process.stdout.write(text) prints each piece immediately, without moving to a new line. So the response appears gradually, building on your screen word by word. The final line, await stream.finalMessage(), keeps the program waiting until the response is complete.

You watched this happen in the live component above. Now you see the two lines that produce it.

That is the entire program. Everything more advanced is built on exactly this foundation.
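
For a sense of the plumbing the SDK hides, here is a rough sketch of the raw HTTP request behind a non-streaming call. It only assembles the request and sends nothing. The function name buildHttpRequest is hypothetical, and the header values reflect Anthropic's public HTTP API as commonly documented, so check the current API reference before relying on them.

```javascript
// Builds the pieces of a raw HTTP request to the Messages endpoint.
// Nothing is sent here; the result could be passed to fetch(url, options).
function buildHttpRequest(apiKey, body) {
  return {
    url: "https://api.anthropic.com/v1/messages",
    options: {
      method: "POST",
      headers: {
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

const { url, options } = buildHttpRequest("placeholder-key", {
  model: "claude-opus-4-7",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello" }],
});

console.log(options.method, url);
```

Everything the SDK adds on top of this, such as parsing streamed events and handling timeouts, is the work described above as hiring a translator.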

✦ Setting Up

Two things need to be in place before this runs: an API key, and a safe place to keep it.

Getting your key

Go to console.anthropic.com, create an account, and generate an API key. It will look like sk-ant-api03-.... Copy it somewhere temporary. We are about to give it a proper home.

The .env file

Here is how real projects store secrets. Not just this project. Every project.

In the same folder as your code, create a file called .env:

ANTHROPIC_API_KEY=sk-ant-api03-your-key-here

No quotes. Just the key name, an equals sign, and the value.

Now create a .gitignore file in the same folder and add .env as the first line.

This tells git to never include that file in a commit. Your key stays on your machine, out of your version history, and out of any repository you push to.

Here is what this changes in your code. The Anthropic SDK reads ANTHROPIC_API_KEY from your environment automatically. So your client initialization becomes:

JavaScript

const client = new Anthropic();

No key in the code. The code is safe to share. The secret stays in .env, which stays on your machine.

This pattern, .env for secrets and .gitignore to exclude it, is how almost every project that touches an external API works. You will use it with every provider you ever work with, not just Anthropic.
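
The same pattern extends to any setting you may want to change without editing code. The sketch below reads configuration from an environment-like object and fails early with a readable message. The helper loadConfig and the variable name ANTHROPIC_MODEL are assumptions made for this example; only ANTHROPIC_API_KEY is a name the SDK itself looks for.

```javascript
// Hypothetical helper: reads settings from an environment-like object.
// ANTHROPIC_MODEL is an assumed name chosen for this sketch;
// the SDK does not read it.
function loadConfig(env) {
  const apiKey = env.ANTHROPIC_API_KEY;
  if (!apiKey) {
    throw new Error(
      "ANTHROPIC_API_KEY is not set. Check your .env file and the variable name."
    );
  }
  return {
    apiKey,
    model: env.ANTHROPIC_MODEL || "claude-opus-4-7",
  };
}

// A real program would pass process.env. A plain object is used here
// so the sketch runs without a .env file.
const config = loadConfig({ ANTHROPIC_API_KEY: "placeholder-key" });
console.log(config.model);
```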

Installing and running

Bash

npm install @anthropic-ai/sdk

Save your code as first_call.mjs. The .mjs extension tells Node.js you are using modern JavaScript module imports. Then run it, passing your .env file:

Bash

node --env-file=.env first_call.mjs

The --env-file flag is built into Node.js version 20.6 and later. Run node --version to check. If you are behind, nvm install --lts will get you current.

You should see the response stream in, token by token.

If you are working in Python

Bash

pip install anthropic python-dotenv

Add this to the top of your file:

Python

from dotenv import load_dotenv
load_dotenv()

The rest of the .env setup is identical. The Python SDK reads ANTHROPIC_API_KEY automatically once the env is loaded.

✦ What Can Go Wrong

A good builder does not only ask "how do I make it work?" They also ask "how can this fail?"

The key is missing or misspelled. If your .env file is absent or the variable name is wrong, the SDK cannot authenticate. You will see an error before the model is ever reached. Check spelling first.

The model name is outdated. Model names change. An old name from a tutorial written last year may no longer exist. A practical habit: store the model name in an environment variable so you can update it in one place without touching code.

The prompt is too vague. The model may respond, but the response may not be useful. This is not a network failure. It is an instruction failure. In AI applications, prompt quality is part of product quality.

The output is cut off. If max_tokens is too low, the model stops before completing its thought. This is not hallucination. It is a budget limit. If answers feel truncated, raise the limit.

The stream breaks midway. Network conditions or provider load can interrupt streaming. Production applications need retry logic, graceful error states, and logging.
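
Retry logic can be sketched in a few lines. The helper below is a hypothetical, minimal version with exponential backoff, exercised against a fake call that fails on purpose. A production version would also decide which errors are worth retrying. Note that a retry starts the request over, so any partial output already shown has to be discarded.

```javascript
// Hypothetical helper: retries an async operation with exponential backoff.
async function withRetry(operation, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      console.error(`Attempt ${attempt} failed: ${error.message}`);
      if (attempt < attempts) {
        // Wait baseDelayMs, then double it before each further attempt.
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

// Stand-in for a flaky call: fails a set number of times, then succeeds.
function makeFlakyCall(failures) {
  let calls = 0;
  return async function flakyCall() {
    calls += 1;
    if (calls <= failures) throw new Error("connection interrupted");
    return "complete response";
  };
}

withRetry(makeFlakyCall(2), { baseDelayMs: 10 }).then((text) => console.log(text));
```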

The goal is not to fear these. The goal is to see them as normal engineering problems. Once AI enters your application, it must be treated like software: tested, observed, and handled when things go sideways.

✦ Try This

Do not move to the next article immediately. Run the same program a few more times with these experiments.

✦Try This

Pick an audience. The prompt updates to match. Edit it freely, then press Send and watch how the answer adapts.

✦ Build the Full Version

The code in this article runs in a terminal: you send a message, the response streams in, and the program exits.

The natural next step is a real chat interface: a React app where messages accumulate, streaming happens in the browser, and the experience looks like what you use every day.

That project is already built and ready to clone.

◆Mini Project

React · Vite · Express · OpenAI

Streaming Chat App

A minimal GPT-like chat app that shows everything from this article running in a real browser UI. Messages stream in token by token, conversation history is maintained, and the OpenAI API does the work.

◆Streaming via Server-Sent Events: each token arrives as it is generated
◆Full conversation history sent on every request, the way the API actually works
◆API key kept server-side only, never exposed to the browser
◆Clean separation between the Express backend and the React frontend

ayushrajsd/ai-stream-chat↗

Not ready for a local environment yet?

Visit console.anthropic.com and open the Workbench. You can send messages directly to the model, adjust max_tokens, and watch streaming responses without writing any code. It is the fastest way to observe the model before wiring it into your own project.

✦ What This Opens Up

Until now, understanding AI and building with it were separate worlds. You knew how it worked, but hadn’t yet stepped into it. That gap is now closed.

Once your code can speak to a model, you are no longer only a user of AI. You are beginning to become a builder.

The same pattern you just used, input, context, model, output, is the foundation of every AI-powered product. A resume reviewer, a code explainer, a research assistant, a writing coach: they all begin with exactly what you did today.

The sophistication comes later. The foundation starts here.

Every concept from Days 1 through 4 is now something you can test and build with.

Curious how different prompts change the response? Change the messages array and run it again. Want to see what happens when you set max_tokens to 10? Try it. Wondering what a system prompt does? Add one and observe.
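
Those experiments amount to small changes in the request. The sketch below shows two of them as plain objects, using a hypothetical helper named withOverrides. It assumes Anthropic's Messages API, where a system prompt is a top-level system field and not an entry in the messages array; the system prompt text is invented for illustration.

```javascript
// Hypothetical helper: returns a copy of a request body with changes applied.
function withOverrides(body, overrides) {
  return { ...body, ...overrides };
}

const base = {
  model: "claude-opus-4-7",
  max_tokens: 500,
  messages: [
    { role: "user", content: "What is the most important thing to understand about AI?" },
  ],
};

// Experiment 1: a tiny budget, to watch a response get cut off.
const tinyBudget = withOverrides(base, { max_tokens: 10 });

// Experiment 2: a system prompt, passed as a top-level field.
const withSystem = withOverrides(base, {
  system: "Answer in one short sentence for a ten-year-old.",
});

console.log(tinyBudget.max_tokens);
console.log(withSystem.system);
```

Either object could be passed to client.messages.create in place of the original request.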

The remaining articles in this series will introduce more powerful ideas: structured outputs, retrieval, memory, agents. All of them are built on exactly this. An API call with messages, a model, and a token budget.

You now have the foundation.

✦Pause & Reflect

The model you just interacted with does not inherently remember you. It sees only the text placed in front of it in this moment unless a system chooses to store and reintroduce that context later. And yet, something meaningful came back. Where do you think the intelligence actually lives: in the model, in the prompt, or in the interaction between them?


✦ Learn More

Anthropic API: Getting Started: the official guide to authentication, your first call, and core concepts
Streaming with Claude: a deeper look at how streaming works and when to use it
Anthropic Node.js SDK: the source code and full reference for every method you will use

✦ ✦ ✦

The clay holds many forms, just as the model holds many patterns. But it is your intent, quiet yet steady,

that decides what begins to emerge.
