Using the OpenAI API as a beginner is one of those topics where the answer is shorter than the question. The Python library is two lines of setup, the first call is five more, and you have working code that talks to GPT in under a minute. The harder part isn’t the code. It’s understanding which model to use, which parameters actually matter, how to handle multi-turn conversations cleanly, and how to avoid surprise bills as you scale from prototype to real usage.
I’ve onboarded a few developers to the OpenAI API over the past year, and the same patterns help every time. What follows is the working beginner’s guide: get your key, install the library, make your first call, handle conversations, stream responses, and understand what’s actually driving cost. Everything you need to start building with OpenAI’s API in Python, without the bloat that fills most beginner tutorials.
Quick answer: using the OpenAI API in Python
Install the openai Python library with pip install openai, set your API key as the OPENAI_API_KEY environment variable, and call client.chat.completions.create() with a model name and a list of messages. Start with gpt-5-mini to keep costs low while learning. The minimum working call is about 5 lines of Python. Once you’re past the basics, add multi-turn conversations by appending messages, streaming for better UX, and structured outputs when you need JSON responses.
Get your API key and install the library
Sign in at platform.openai.com, navigate to API Keys, and create a new key. Copy it immediately because it won’t be shown again. Add it to your environment:
export OPENAI_API_KEY="sk-your-key-here"
You’ll also want to add billing information. OpenAI gives free trial credits to new accounts, but they expire and the API stops working after they’re consumed. Set a low usage limit on the billing page (something like $20) while you’re learning – this prevents accidentally running up a bill if your code loops unexpectedly.
Install the Python library:
pip install openai
That’s all the setup. The library reads your API key from the environment variable automatically when you create a client.
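If you want a clearer failure than an authentication error when the variable is missing, a small pre-flight check helps. This is a minimal sketch, not part of the openai library: load_api_key and mask_key are hypothetical helper names, and the only thing taken from the setup above is the OPENAI_API_KEY variable.

```python
import os


def load_api_key(env=os.environ):
    """Return the API key from the environment, or raise a clear error."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it in your shell before running."
        )
    return key


def mask_key(key):
    """Hide everything except the last four characters so the key is safe to log."""
    return "*" * max(len(key) - 4, 0) + key[-4:]
```

Calling load_api_key() at startup and logging mask_key(...) confirms which key is in use without printing the secret itself.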
Your first OpenAI API call
The minimum working call looks like this:
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[
        {"role": "user", "content": "What is the OpenAI API in one sentence?"}
    ],
)
print(response.choices[0].message.content)
That’s the whole pattern. Run it and you get a response back in a second or two. The messages list is the conversation – in this case just one user message. The role field is user for things the user said, assistant for things the model said, and system for instructions about how the model should behave.
The model parameter picks which OpenAI model handles the request. In 2026, the practical options are gpt-5 (the most capable flagship), gpt-5-mini (cheaper and faster, good enough for most tasks), gpt-5-nano (cheapest, simpler tasks), and the reasoning models like o3 and o4-mini for problems requiring deep thinking. Start with gpt-5-mini while learning – it’s cheap enough that you can iterate freely without worrying about the bill.
The response object contains more than just the text. response.choices[0].message.content is the text. response.usage has token counts (input, output, total). response.id is the unique identifier for the call, useful for debugging or referencing later.
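To make those fields concrete, here is a rough sketch that pulls them into a plain dictionary. summarize_response is a hypothetical helper, and fake_response is a stand-in object with the same attribute layout as a chat completion so the example runs without a network call; with a real response you would pass it in directly.

```python
from types import SimpleNamespace


def summarize_response(response):
    """Pull the commonly used fields out of a chat completion response."""
    usage = response.usage
    return {
        "id": response.id,
        "text": response.choices[0].message.content,
        "input_tokens": usage.prompt_tokens,
        "output_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }


# Stand-in object (not a real API response) so this runs offline.
fake_response = SimpleNamespace(
    id="chatcmpl-example",
    choices=[SimpleNamespace(message=SimpleNamespace(content="Hello!"))],
    usage=SimpleNamespace(prompt_tokens=12, completion_tokens=3, total_tokens=15),
)
print(summarize_response(fake_response))
```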
Multi-turn conversations
A single API call is stateless – the model doesn’t remember previous calls. To build a conversation, you append each response to your messages list and send the growing list with each call:
# Reuses the client created in the first example
messages = [
    {"role": "system", "content": "You are a helpful assistant who answers concisely."},
    {"role": "user", "content": "What's the capital of France?"},
]
response = client.chat.completions.create(model="gpt-5-mini", messages=messages)
assistant_message = response.choices[0].message

# Append the assistant's response to keep the conversation going
messages.append(assistant_message)
messages.append({"role": "user", "content": "What's its population?"})

response = client.chat.completions.create(model="gpt-5-mini", messages=messages)
print(response.choices[0].message.content)
The system message at the top sets the assistant’s behavior for the whole conversation. The user/assistant alternation builds the conversation history. Each new call sends the full history because the API has no server-side memory of past calls.
The cost implication: longer conversations cost more per turn because you’re sending the entire history as input tokens every time. For chat applications that need long-running conversations, you’ll eventually need to summarize old turns or implement context windowing. For most beginner work, the natural conversation lengths fit comfortably within the model’s context window.
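The simplest form of context windowing is to keep the system message and drop everything but the most recent turns. The sketch below is one way to do that, not an official pattern: trim_history and max_recent are names I made up, and it assumes every message in the list is a plain dict (convert the assistant message to {"role": "assistant", "content": ...} before storing it if you use this).

```python
def trim_history(messages, max_recent=6):
    """Keep all system messages plus the last `max_recent` other messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # A slice of [-0:] would return everything, so handle zero explicitly.
    recent = rest[-max_recent:] if max_recent > 0 else []
    return system + recent
```

You would call it just before each request, for example messages=trim_history(messages). The trade-off is that the model forgets anything older than the window, which is why longer-running apps often summarize old turns instead of dropping them.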
Parameters worth knowing
A few parameters beyond model and messages actually matter in practice.
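temperature controls how deterministic the output is. Values run from 0 to 2. At 0, the model picks the most likely token every time – useful for tasks where you want consistent outputs like classification or structured generation. At 1 (the default), there’s natural variation. Above 1, output gets more random and creative. For learning, leave it at the default. Reasoning-oriented models may accept only the default and reject other values, so check the model’s documentation before setting it.
max_completion_tokens caps the response length. It replaces the older max_tokens parameter, which is deprecated and rejected by reasoning models. Useful for limiting cost on tasks where you don’t need long responses. On reasoning models the cap also counts the hidden reasoning tokens, so a very low value can leave no room for visible output. Leaving it unset lets the model respond up to its full output limit.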
response_format={"type": "json_object"} forces the model to return valid JSON. Combined with a system prompt instructing the format, this is the easiest way to get structured outputs. For more rigorous JSON schemas, use response_format={"type": "json_schema", "json_schema": {...}}.
stop specifies sequences that should end generation. Useful when you want the model to stop at a specific delimiter rather than running to its natural endpoint.
Most other parameters (top_p, frequency_penalty, presence_penalty, etc.) exist but aren’t worth tuning until you’ve hit a specific problem they solve. Default values are sensible for most beginner work.
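As an illustration of the JSON mode described above, here is a hedged sketch of a helper that requests a JSON object and parses it. ask_for_json and the 'answer' and 'confidence' keys are hypothetical choices for this example; the client is passed in so the function does not depend on any global state.

```python
import json


def ask_for_json(client, question, model="gpt-5-mini"):
    """Request a JSON object and parse it, returning None if parsing fails."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            # JSON mode expects the word "JSON" to appear somewhere in the messages.
            {
                "role": "system",
                "content": "Reply with a JSON object with keys 'answer' and 'confidence'.",
            },
            {"role": "user", "content": question},
        ],
        response_format={"type": "json_object"},
    )
    text = response.choices[0].message.content
    try:
        return json.loads(text)
    except (TypeError, json.JSONDecodeError):
        # Content can be None or cut off mid-object; treat both as "no result".
        return None
```

Usage would look like ask_for_json(OpenAI(), "What's the capital of France?"). JSON mode only promises syntactically valid JSON, not these particular keys, so check for them before relying on the result.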
Streaming responses
For chat-style UX where you want the response to appear as it’s generated rather than waiting for the full response, use streaming:
stream = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no choices or no text (for example the final one), so guard both
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
The stream yields chunks as the model generates them. Each chunk has a delta field containing the new content since the last chunk. Concatenating all chunks gives you the full response.
Streaming feels dramatically better for user-facing chat. The same response that takes 5 seconds to arrive in full appears starting at 200ms, with text flowing as the model generates it. For backend processing where you only need the final result, skip streaming.
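In practice you often want both: show text as it arrives and keep the full reply, for instance to append it to the conversation history. A minimal sketch of that, assuming chunks shaped like the ones in the loop above; collect_stream and on_text are hypothetical names, not part of the library.

```python
def collect_stream(stream, on_text=None):
    """Concatenate streamed deltas into the full reply text.

    `on_text`, if given, is called with each piece as it arrives
    (for example to print it or push it to a UI).
    """
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # some chunks carry no choices at all
        text = chunk.choices[0].delta.content
        if text is None:
            continue  # role-only or final chunks have no text
        if on_text is not None:
            on_text(text)
        parts.append(text)
    return "".join(parts)
```

With a real stream this might be called as full_text = collect_stream(stream, on_text=lambda t: print(t, end="", flush=True)).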
Cost and common gotchas
OpenAI charges per token – roughly 4 characters of English per token. Input tokens (your prompt) cost less than output tokens (the model’s response). Pricing varies by model: gpt-5-mini is typically around $0.20 per million input tokens and $0.80 per million output, while gpt-5 runs $2-5 per million tokens depending on the variant. Reasoning models like o3 cost more because they generate internal reasoning tokens you pay for even though you don’t see them.
For beginner experimentation, costs stay under a few dollars even with extensive trial-and-error. For production usage, monitor token usage carefully via response.usage and set spending alerts on your account.
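Turning token counts into dollars is simple arithmetic, and it is worth doing once so the numbers feel real. The sketch below is an assumption-laden illustration: estimate_cost is a hypothetical helper, and the prices you pass in are whatever the current pricing page lists, not values the API returns.

```python
def estimate_cost(prompt_tokens, completion_tokens, input_price_per_m, output_price_per_m):
    """Estimate the dollar cost of one call from its token counts.

    Prices are in dollars per million tokens and must be supplied by the caller.
    """
    return (
        prompt_tokens * input_price_per_m
        + completion_tokens * output_price_per_m
    ) / 1_000_000


# Example with the rough gpt-5-mini figures quoted above (assumed, not authoritative):
# a call with 1,200 input tokens and 300 output tokens.
cost = estimate_cost(1200, 300, input_price_per_m=0.20, output_price_per_m=0.80)
print(f"${cost:.6f}")
```

With a real response you would pass response.usage.prompt_tokens and response.usage.completion_tokens, and sum the results across calls to track a session.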
The common gotchas that bite first-time users:
Forgetting that history compounds. Each turn of a conversation sends the entire history. If every turn adds a similar amount of text, turn 20 sends roughly 20 times the input of turn 1, and the input billed across the whole conversation adds up to about 210 times the first turn’s (1 + 2 + ... + 20), because you’re paying for the growing message list every time.
Confusing model names. gpt-5 and gpt-5-mini look similar but differ in price by roughly 10x. Always double-check which model your code is calling.
Hardcoding API keys. Never commit client = OpenAI(api_key="sk-...") to git. Use environment variables. If you accidentally commit a key, revoke it immediately – bots scan GitHub for exposed keys within minutes.
Ignoring rate limits. New accounts have low rate limits. You move up tiers by adding billing details and using the API consistently. Hitting a limit returns a 429 error, which you should retry with exponential backoff.
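For the rate-limit case, exponential backoff can be sketched in a few lines. Treat this as an illustration rather than the recommended implementation: call_with_backoff and its parameters are hypothetical, and the exception type is passed in by the caller (with the openai library that would be openai.RateLimitError). The library's client also retries some failures on its own, controlled by its max_retries option, so you may not need this at all.

```python
import random
import time


def call_with_backoff(fn, retry_on, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on `retry_on` exceptions with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            # Wait base_delay * 1, 2, 4, ... plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

A call would look something like call_with_backoff(lambda: client.chat.completions.create(model="gpt-5-mini", messages=messages), retry_on=openai.RateLimitError). The jitter keeps many clients from retrying at the same instant.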
If you’ve onboarded other developers to the OpenAI API and have honest notes on what tripped them up most – the rate limits, the model selection, the cost surprises – that writeup is worth sharing. Beginner tutorials cover the happy path well; the production friction points where new users actually struggle are scarce in published material.