Everyone is talking about OpenAI’s latest move to support open models, but we need to ask if this is a genuine helping hand or just a marketing strategy. In this lesson, we are going to explore the new “Open Responses” initiative, compare it with how Anthropic is doing things, and actually look at the code to see how you can build agents with it.
To understand why this matters, we first need to look at how we talk to AI models. Until recently, if you wanted to build an app, you mostly used OpenAI's standard "chat completions" format; it was the rule everyone followed. However, technology moves fast, and developers now want more control. We have seen this with Google pushing its own Gemini API, and recently Anthropic has become very popular. Many developers are switching to the Claude API because of tools like Claude Code. In fact, even Chinese model providers like Moonshot AI and Z.ai are making their models compatible with Claude's API style just so they can work with these coding tools. This created a problem for OpenAI, because not everyone was building on their standard anymore.
This is where the new "Open Responses" initiative comes in. OpenAI realized it could not force everyone to use its proprietary API, so it proposed a new standard designed specifically for open models. The goal is that whether you serve a model through Hugging Face, Ollama, or vLLM, the way you send requests, especially for complex tasks like calling tools or analyzing images, stays the same. This is excellent news for us because it means we do not have to learn a new API for every single new model that gets released. It is designed to be multi-provider by default, which is why big community players like Hugging Face and OpenRouter are already supporting it.
Let's dig into the technical details of how this standard actually works. The core concept is an "agentic loop." Instead of just sending text and getting text back, the system breaks the conversation down into "Items." An item can be a message, a function call, or even a reasoning state, and each item carries a status, so your application can track whether a task is still in progress or already completed. For example, if you ask the AI to solve a math problem, the API can now handle the "reasoning tokens" (the AI's internal thoughts) in a standard way. Previously, extracting these thoughts from different open models required rewriting your parsing code constantly. Now the API supports both raw reasoning and summaries out of the box, making it much easier to see how the model arrived at an answer.
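To make this concrete, here is a minimal sketch of what reading those items can look like, using the official openai Python SDK pointed at any endpoint that implements the Responses format. The item types shown ("reasoning", "message", "function_call") and their fields follow OpenAI's Responses API, so individual open-model providers may differ slightly; the base URL and model name are placeholders.

```python
from openai import OpenAI

# Placeholder endpoint and model; swap in any provider that implements
# the Open Responses format.
client = OpenAI(base_url="https://example-provider.com/v1", api_key="YOUR_KEY")

response = client.responses.create(
    model="your-open-model",
    input="What is 17 * 24? Work it out step by step.",
)

# The output is a list of items, not a single blob of text.
for item in response.output:
    if item.type == "reasoning":
        # The model's internal "thinking", exposed as raw traces or summaries.
        for summary in item.summary:
            print("[reasoning]", summary.text)
    elif item.type == "message":
        # The user-facing answer.
        for part in item.content:
            if part.type == "output_text":
                print("[answer]", part.text)
    elif item.type == "function_call":
        # The model wants a tool to be run (more on this below).
        print("[tool call]", item.name, item.arguments)
```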
Another major feature of this standard is how it handles tools. We are moving toward a future where model providers act more like system providers. This means the tools might be hosted internally on the server, such as a sandbox for running code or a direct Google Search integration. The Open Responses standard supports these internal tools as well as external tools that you might build yourself. It also includes “tool choice,” which gives you control over whether the AI must use a tool, cannot use a tool, or can decide for itself. This level of control is something open models struggled with before, but this standard provides a blueprint for training them to handle these complex instructions reliably.
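As a rough sketch, defining your own external tool and constraining the model's choice could look like this. The flat function-tool shape and the tool_choice values mirror OpenAI's Responses API, and the get_weather function and its schema are made up for illustration; a given Open Responses provider may accept slightly different options.

```python
from openai import OpenAI

client = OpenAI(base_url="https://example-provider.com/v1", api_key="YOUR_KEY")

# A hypothetical external tool that we host and execute ourselves.
tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

response = client.responses.create(
    model="your-open-model",
    input="Should I bring an umbrella in Paris today?",
    tools=tools,
    # "auto" lets the model decide, "none" forbids tool use, "required"
    # forces a tool call; {"type": "function", "name": "get_weather"}
    # would force this specific function.
    tool_choice="auto",
)

# If the model decided to call the tool, the arguments arrive as a JSON string.
for item in response.output:
    if item.type == "function_call":
        print(item.name, item.arguments)
```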
Now, let us look at how to implement this in code using Python and Hugging Face. To start, you set up your client just like you normally would, but you point it at a model served through the Open Responses standard, such as Kimi K2 or one of the Qwen models. Instead of the old chat-completions call, you use client.responses.create. In your request you can define instructions and inputs, and enabling event-based streaming is very straightforward: you ask for a streamed response, and the API sends back events one by one, so you can see the text appearing on your screen in real time. If you want to use tool calling, you simply define the tools in your request, and the model will return a function-call item with the name of the function and the arguments needed to run it.
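Here is a minimal sketch of that flow, with the usual caveats: the Hugging Face router URL and the exact model identifier are assumptions you should check against Hugging Face's current documentation, and the streaming event names are the ones OpenAI's Python SDK emits for the Responses API.

```python
import os
from openai import OpenAI

# Assumed Hugging Face router endpoint; verify against the current HF docs.
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

stream = client.responses.create(
    model="moonshotai/Kimi-K2-Instruct",  # assumed model identifier
    instructions="You are a concise assistant.",
    input="Explain in two sentences what an agentic loop is.",
    stream=True,
)

# Events arrive one by one; print the text deltas as they stream in.
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
    elif event.type == "response.completed":
        print()  # final newline once the full response has arrived
```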
If you prefer to run things locally on your own computer, you can use Ollama. The process is almost identical, which proves how useful this standard is. You would initialize your client by pointing it to your local host address, usually something like localhost:11434. You do not even need a real API key for Ollama; you can just put a placeholder text there. Once your client is ready, you can write a script to check if the specific model you have loaded supports the Open Responses format. If it does, you can run the exact same client.responses commands. You might notice that loading the model takes a moment if it is not already running, but once it is active, you can stream reasoning traces and tool calls right from your own machine. This bridges the gap between powerful cloud models and the private models running on your laptop.
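And the local version, as a sketch under the same assumptions: the /v1 path and placeholder API key follow how Ollama's OpenAI-compatible endpoint is usually addressed, the model tag is hypothetical, and whether reasoning items show up depends on the model you pulled and on your Ollama version supporting the Responses format.

```python
from openai import OpenAI

# Ollama's local OpenAI-compatible endpoint; the API key is just a placeholder.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

MODEL = "qwen3"  # hypothetical tag; use whatever model you have pulled locally

try:
    response = client.responses.create(
        model=MODEL,
        input="List three prime numbers and briefly explain why they are prime.",
    )
except Exception as err:
    # If your Ollama build or this model does not implement the Responses
    # format yet, the request fails here rather than half-working later.
    raise SystemExit(f"{MODEL} does not seem to support Open Responses: {err}")

for item in response.output:
    if item.type == "reasoning":
        for summary in item.summary:
            print("[thinking]", summary.text)
    elif item.type == "message":
        for part in item.content:
            if part.type == "output_text":
                print("[answer]", part.text)
```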
In conclusion, the Open Responses initiative is a significant step forward for the open-source community. By creating a unified standard, it allows powerful open models to behave more like the top-tier proprietary ones, specifically regarding agentic workflows and tool usage. While some companies might still lean toward Anthropic’s style due to the popularity of Claude Code, having a robust standard supported by OpenAI, Hugging Face, and Ollama ensures that developers have a reliable way to build complex systems. I highly recommend you try running a local model with Ollama using this new format to see the “reasoning tokens” in action. It gives you a fascinating look into how the AI “thinks” before it speaks.
