In the world of large language models, context is everything. The OpenAI Chat Completion API provides a structured way to maintain context and personality through the concept of “roles.” By assigning messages a system, user, assistant, or function role, and by carefully managing how you send these messages to the API, you ensure that the model understands not only what’s being asked, but also how it should respond.
For prompt engineers, mastering these roles and context-management techniques can transform a run-of-the-mill conversation into a rich, coherent dialogue that feels like it's happening in real time. Let's break down what roles are, why they matter, and how to maintain conversation context effectively.
Roles: The Four Pillars of Conversation
1. system:
Think of the system role as the conversation’s guiding star. This initial message sets the stage, establishing the model’s persona, style, and constraints. It’s like the director’s note to the actors before the play begins.
For example:
{
  "role": "system",
  "content": "You are a helpful assistant that answers questions about travel."
}
This sets the environment and behavior rules that persist throughout the conversation. The model remembers to be a travel expert not just once, but throughout the entire session—as long as you keep including this system message in every API call.
2. user:
The user role represents the person interacting with the assistant. These messages contain questions, requests, or prompts.
{
  "role": "user",
  "content": "Can you recommend some places to visit in France?"
}
User messages drive the conversation forward. The assistant is always trying to address the user’s needs, guided by the rules established in the system message.
3. assistant:
The assistant role is the model’s response. Here’s where the assistant speaks in its own voice, incorporating the system message’s style guidelines and the user’s requests.
{
  "role": "assistant",
  "content": "You might enjoy visiting Paris for its museums and cafes, the Loire Valley for its chateaux, and the French Riviera for beautiful beaches."
}
This response is influenced by all previous messages—especially the system prompt that told it to act as a travel expert.
4. function:
The function role adds an extra layer of sophistication. It appears when the assistant calls an external function to fetch structured data or perform a specialized task.
For instance, if your assistant needs to look up current flight prices, it doesn't answer in plain text. Instead, it emits a function_call that your application is expected to execute:
{
  "role": "assistant",
  "content": null,
  "function_call": {
    "name": "get_flight_deals",
    "arguments": "{ \"destination\": \"Paris\" }"
  }
}
Your application then executes the function and passes its output back to the model under the function role:
{
  "role": "function",
  "name": "get_flight_deals",
  "content": "..."
}
This keeps external calls and responses neatly organized within the conversation, making the entire dialogue context-aware and streamlined.
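To see the whole round trip in one place, here is a minimal sketch using the legacy functions API from the Node.js SDK. The get_flight_deals schema and the getFlightDeals helper are illustrative placeholders, not real services:

import OpenAI from "openai";

const openai = new OpenAI();

const messages = [
  { role: "system", content: "You are a helpful assistant that answers questions about travel." },
  { role: "user", content: "Find me flight deals to Paris." }
];

// 1. Describe the function so the model can decide to call it.
const first = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: messages,
  functions: [{
    name: "get_flight_deals",
    description: "Look up current flight deals for a destination.",
    parameters: {
      type: "object",
      properties: { destination: { type: "string" } },
      required: ["destination"]
    }
  }]
});

const call = first.choices[0].message.function_call;
if (call) {
  // 2. Execute the function yourself (getFlightDeals is a hypothetical helper),
  //    then append both the assistant's call and the function's result to the history.
  const deals = await getFlightDeals(JSON.parse(call.arguments).destination);
  messages.push(first.choices[0].message);
  messages.push({ role: "function", name: call.name, content: JSON.stringify(deals) });

  // 3. Send the updated history so the model can phrase a final answer.
  const second = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: messages
  });
  console.log(second.choices[0].message.content);
}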
How the Model Maintains Context
You might be wondering: how does the model remember the system prompt and all the previous messages? The answer lies in how you structure your API calls. The model doesn’t have persistent, built-in memory. Instead, you supply the entire conversation history with each request.
Here’s how it works:
1. Initial System Prompt
Your first message in messages is usually a system prompt. It sets the stage for the entire conversation. Every subsequent API call should re-include this system message so the model continues to “remember” it.
{
  "role": "system",
  "content": "You are a helpful assistant that answers questions about travel."
}
2. Adding User Messages
When the user asks something, you append their query to the conversation array:
{
  "role": "user",
  "content": "Can you recommend some places to visit in France?"
}
The model now sees both the system message and this user message. It understands it should respond like a helpful travel assistant and address the user’s query.
3. Model’s Response (Assistant Messages)
When you send this full message history to the API, the model generates an answer that respects the system prompt and considers the user’s request:
{
  "role": "assistant",
  "content": "You might enjoy visiting Paris for its museums and cafes..."
}
4. Memory Through the Entire Thread
Every time the user asks another question, you include the entire conversation history—the original system prompt, previous user messages, and the assistant’s responses—when making the next API call. By doing this, the model “sees” all prior context and instructions, allowing it to maintain a coherent narrative across multiple turns.
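For example, by the time the user asks a second question, the array you send might look like this (the follow-up question is illustrative, and content is abbreviated):

[
  { "role": "system", "content": "You are a helpful assistant that answers questions about travel." },
  { "role": "user", "content": "Can you recommend some places to visit in France?" },
  { "role": "assistant", "content": "You might enjoy visiting Paris for its museums and cafes..." },
  { "role": "user", "content": "Which of those is best for a winter trip?" }
]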
5. No Separate Memory Storage
Crucially, the model does not store this conversation state anywhere on the server side. Every new API call is stateless. It’s up to you to re-send the entire message array so the model can behave consistently and “remember” what was said before.
Practical Steps: Managing the Conversation History
Since the API is stateless, you handle memory on the application side. Here's how a typical workflow might look in JavaScript with the Node.js SDK, assuming an initialized openai client:
Initialization with a System Message:
const messages = [
  {
    role: "system",
    content: "You are a helpful assistant that always responds politely."
  }
];
User’s First Prompt:
messages.push({ role: "user", content: "Can you tell me about quantum computing?" });
Send Entire History to the API:
{ "response": { "model": "gpt-3.5-turbo", "messages": "messages" } }
Add Assistant’s Response to History:
messages.push({ role: "assistant", content: response.choices[0].message.content });
Now, your messages array includes the system message, the user’s query, and the assistant’s reply. The next time the user asks another question, you append it to messages and send the entire array again.
For Subsequent Requests:
messages.push({ role: "user", content: "How does it differ from classical computing?" });
const response = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: messages
});
messages.push({ role: "assistant", content: response.choices[0].message.content });
As the conversation grows, you build a transcript. This transcript (array of messages) is how the model “remembers” its role, the initial instructions, and all previous exchanges.
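Putting these steps together, a small helper can handle every turn the same way. This is a minimal sketch that assumes the openai client and messages array from the steps above; error handling is omitted:

async function ask(userInput) {
  // Append the user's turn, send the full transcript, then record the reply.
  messages.push({ role: "user", content: userInput });
  const response = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: messages
  });
  const reply = response.choices[0].message.content;
  messages.push({ role: "assistant", content: reply });
  return reply;
}

Each call re-sends the growing transcript, which is exactly how the model appears to "remember" earlier turns.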
If the conversation becomes very long, you can consider strategies like truncating older parts of the conversation or summarizing previous segments to save tokens—just remember to keep the system message and enough context to maintain coherence.
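A simple version of that truncation strategy keeps the system message and only the most recent turns. This sketch is illustrative; the window of 10 messages is an arbitrary choice:

// Keep all system messages plus the last maxRecent non-system messages.
function truncateHistory(messages, maxRecent = 10) {
  const system = messages.filter(m => m.role === "system");
  const rest = messages.filter(m => m.role !== "system");
  return [...system, ...rest.slice(-maxRecent)];
}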
Why This Matters for Prompt Engineering
As a prompt engineer, the power to manage roles and context puts you in the director’s seat. Here’s what you gain:
• Reduced Ambiguity: By clearly labeling who’s who and consistently including the system message, the model knows how to behave and to whom it should respond.
• Improved Consistency: Maintaining context ensures that your model’s tone, style, and focus remain steady throughout a multi-turn conversation.
• Enhanced Relevance: With every message in the conversation array, the model can refer back to earlier statements, follow the established narrative, and generate more relevant responses.
• Scalable Complexity: By using function calls, you can integrate external data sources without losing the thread of the dialogue.
Conclusion
Mastering the roles of system, user, assistant, and function, along with learning how to manage and resend the entire conversation history with each new API call, is the key to creating rich, context-aware interactions. This approach ensures the model continuously “remembers” the initial instructions and the conversation’s history, allowing it to produce coherent, on-brand responses across multiple turns.
With these techniques, you’re not just writing prompts; you’re crafting experiences. You’re empowering the model to become a reliable guide, whether it’s a travel planner, a math tutor, or a friendly expert on quantum computing. As you fine-tune your approach, you’ll find that the right combination of roles, careful prompting, and context management can lead to truly engaging and intelligent dialogues.
📚 Further Reading & Related Topics
If you’re exploring OpenAI’s Chat Completion API and prompt engineering, these related articles will provide deeper insights:
• Mastering ChatGPT Prompt Frameworks: A Comprehensive Guide – Learn structured approaches to crafting effective prompts and improving AI responses.
• Ensuring Security and Cost Efficiency When Using OpenAI API with SpringAI – Discover best practices for optimizing API usage, reducing costs, and securing AI-driven applications.