
Is OpenAI’s New Open Responses API a Game Changer for Open Models?

Posted on January 20, 2026

Everyone is talking about OpenAI’s latest move to support open models, but we need to ask whether this is a genuine helping hand or just a marketing strategy. In this lesson, we will explore the new “Open Responses” initiative, compare it with Anthropic’s approach, and look at actual code to see how you can build agents with it.

To understand why this matters, we first need to look at how we talk to AI models. Until recently, if you wanted to build an app, you mostly used OpenAI’s standard “chat completions” style; it was the rule everyone followed. However, technology moves fast, and developers now want more control. We have seen this with Gemini’s interactions API, and recently Anthropic has become very popular: many developers are switching to the Claude API because of tools like Claude Code. In fact, even Chinese model providers like Moonshot AI and Z.ai are making their models compatible with Claude’s style just so they can work with these new coding tools. This created a problem for OpenAI, because not everyone was using their standard anymore.

This is where the new “Open Responses” initiative comes in. OpenAI realized they could not force everyone to use their proprietary method, so they proposed a new standard designed specifically for open models. The goal is that whether you run a model from Hugging Face, Ollama, or vLLM, the way you send requests, especially for complex tasks like calling tools or analyzing images, remains the same. This is excellent news for us, because it means we do not have to learn a completely new API for every single new model that gets released. The standard is multi-provider by default, and big community players like Hugging Face and OpenRouter are already supporting it.

Let’s dig into the technical details of how this standard actually works. The core concept is an “agentic loop.” Instead of just sending text and getting text back, the system breaks everything down into “Items.” An item can be a message, a function call, or even a reasoning state. This is very helpful because it lets the client track whether a task is currently in progress or has been completed. For example, if you ask the AI to solve a math problem, the API can now handle the “reasoning tokens” (the AI’s internal thoughts) in a standard way. Previously, extracting these thoughts from different open models required rewriting your code constantly. Now, the API supports both raw reasoning and summaries out of the box, making it much easier to see how the model arrived at an answer.
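To make the item idea concrete, here is a minimal sketch using the OpenAI Python SDK. The base URL, API key, and model name are placeholders for whichever Open Responses-compatible provider you use; the item types shown (reasoning, message) follow the Responses scheme described above.

```python
from openai import OpenAI

# Placeholder endpoint and model: substitute any provider and model
# that speak the Open Responses format.
client = OpenAI(base_url="https://your-provider.example/v1", api_key="YOUR_KEY")

response = client.responses.create(
    model="your-open-model",
    instructions="You are a careful math tutor.",
    input="What is 17 * 24?",
)

# The output is a list of typed Items, not a single message. Reasoning,
# messages, and function calls all arrive in this one list, each with
# its own status, which is what makes the agentic loop trackable.
for item in response.output:
    if item.type == "reasoning":
        print("reasoning item:", item)
    elif item.type == "message":
        print("final answer:", response.output_text)
```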

Another major feature of this standard is how it handles tools. We are moving toward a future where model providers act more like system providers, meaning some tools may be hosted on the server itself, such as a sandbox for running code or a direct Google Search integration. The Open Responses standard supports these internal tools as well as external tools that you build yourself. It also includes “tool choice,” which gives you control over whether the AI must use a tool, cannot use a tool, or can decide for itself. This level of control is something open models struggled with before, and the standard provides a blueprint for training them to handle these instructions reliably.
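As a sketch, this is how an external (self-built) tool and the tool_choice control look in a request, continuing with the client from the previous snippet. The get_weather function is hypothetical; note that in the Responses format a function tool is defined flat, not nested under a "function" key as in chat completions.

```python
# Hypothetical weather tool, defined by us (an "external" tool).
tools = [
    {
        "type": "function",  # function tools are flat in this format
        "name": "get_weather",
        "description": "Return the current weather for a given city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

response = client.responses.create(
    model="your-open-model",
    input="What is the weather in Jakarta right now?",
    tools=tools,
    # tool_choice covers the three behaviors described above:
    # "auto" lets the model decide, "required" forces a tool call,
    # and "none" forbids tool use entirely.
    tool_choice="auto",
)
```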

Now, let us look at how to implement this in code using Python and Hugging Face. To start, you set up your client just like you normally would, but point it at a model supported by the Open Responses standard, such as Kimi K2 or the Qwen models. Instead of the old chat-completions call, you use client.responses.create. In your request you can define instructions and inputs, and enabling features like event-based streaming is straightforward: you ask for a streamed response, and the API sends back events one by one, letting you watch the text appear on your screen in real time. If you want tool calling, you simply define the tools in your request, and the model will return a function-call item with the name of the function and the arguments needed to run it.
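Here is a minimal streaming sketch. The router URL is my assumption about Hugging Face’s OpenAI-compatible endpoint and the model ID is a placeholder, so check your provider’s docs; the event types follow the Responses event scheme, and a completed function call surfaces as a function_call item in the final output.

```python
from openai import OpenAI

# Assumed Hugging Face router endpoint; verify against current docs.
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key="hf_...",  # your Hugging Face token
)

stream = client.responses.create(
    model="Qwen/Qwen3-235B-A22B",  # placeholder; any supported model works
    instructions="Answer concisely.",
    input="Summarize the Open Responses initiative in one sentence.",
    stream=True,
)

# Events arrive one by one; text deltas let you print in real time.
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
    elif event.type == "response.completed":
        print()
        # If tools were supplied, completed function calls show up as
        # items carrying a name and a JSON string of arguments.
        for item in event.response.output:
            if item.type == "function_call":
                print("call:", item.name, item.arguments)
```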

If you prefer to run things locally on your own computer, you can use Ollama. The process is almost identical, which shows how useful this standard is. You initialize your client by pointing it at your local host address, usually http://localhost:11434. You do not even need a real API key for Ollama; a placeholder string is enough. Once your client is ready, you can check whether the specific model you have loaded supports the Open Responses format; if it does, you can run the exact same client.responses commands. Loading the model may take a moment if it is not already running, but once it is active, you can stream reasoning traces and tool calls right from your own machine. This bridges the gap between powerful cloud models and the private models running on your laptop.
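A quick local sketch, assuming your Ollama build supports the Responses format as described above. The port and /v1 path are Ollama’s defaults, and the model name is illustrative; use whichever model you have already pulled.

```python
from openai import OpenAI

# Point the same client at the local Ollama server. The API key is a
# placeholder because Ollama does not check it.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.responses.create(
    model="qwen3",  # illustrative; any pulled model that supports Responses
    input="Briefly: why is the sky blue?",
)
print(response.output_text)
```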

In conclusion, the Open Responses initiative is a significant step forward for the open-source community. By creating a unified standard, it allows powerful open models to behave more like the top-tier proprietary ones, specifically regarding agentic workflows and tool usage. While some companies might still lean toward Anthropic’s style due to the popularity of Claude Code, having a robust standard supported by OpenAI, Hugging Face, and Ollama ensures that developers have a reliable way to build complex systems. I highly recommend you try running a local model with Ollama using this new format to see the “reasoning tokens” in action. It gives you a fascinating look into how the AI “thinks” before it speaks.
