
Thinking Machines Unveils Real-Time AI Interaction Models

By Artūras Malašauskas, May 12, 2026
Mira Murati's Thinking Machines Lab announced "interaction models" that process audio, video, and text simultaneously with 0.40-second latency, challenging turn-based AI paradigms.

The AI startup Thinking Machines has introduced a new class of systems called "interaction models," designed to process and respond to human input in real time rather than waiting for complete prompts. Founded by former OpenAI CTO Mira Murati, the company is attempting to solve what it describes as a fundamental bandwidth bottleneck in current human-AI collaboration.

Current frontier models experience an interaction as a single thread. Until a user finishes typing or speaking, the model waits with no perception of what the user is doing. Until the model finishes generating, its own perception is frozen and it receives no new information. The result is a narrow channel that limits how much of a person's knowledge and intent can reach the system, much like trying to resolve a crucial disagreement over email rather than in person.

According to The Verge, Thinking Machines is demonstrating AI interaction models that respond to users in real time. The company claims its model, TML-Interaction-Small, responds in 0.40 seconds, which is roughly the speed of natural human conversation and significantly faster than comparable models from OpenAI and Google.

The technical term for listening and responding at the same time is "full duplex," and it represents a fundamental shift in how AI perceives time and presence. Instead of the standard alternating token sequence, the system uses a multi-stream, micro-turn design that processes 200 ms chunks of input and output simultaneously. This architecture allows the model to listen, talk, and see in real time, enabling it to backchannel while a user speaks or interject when it notices a visual cue.
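To make the micro-turn idea concrete, here is a minimal sketch that lines up an input stream and an output stream in 200 ms slices, so that a backchannel ("mm-hm") can sit in the same time slice as the user's speech. Only the 200 ms figure comes from the reporting. The `MicroTurn` type, the `interleave` function, and the text chunks are hypothetical illustrations and say nothing about how Thinking Machines actually represents its streams.

```python
from dataclasses import dataclass

CHUNK_MS = 200  # chunk duration reported in the article


@dataclass
class MicroTurn:
    """One time slice holding what was heard and what was said during it."""
    start_ms: int
    heard: str   # input stream content for this slice ("" means silence)
    spoken: str  # output stream content for this slice ("" means silence)


def interleave(heard_chunks, spoken_chunks):
    """Pair input and output chunks slice by slice, padding the shorter stream with silence."""
    n = max(len(heard_chunks), len(spoken_chunks))
    heard = list(heard_chunks) + [""] * (n - len(heard_chunks))
    spoken = list(spoken_chunks) + [""] * (n - len(spoken_chunks))
    return [MicroTurn(i * CHUNK_MS, h, s) for i, (h, s) in enumerate(zip(heard, spoken))]


# Hypothetical example: the model backchannels during the second slice.
timeline = interleave(
    ["so the", "bug is", "in the", "parser"],
    ["", "mm-hm", "", ""],
)
for turn in timeline:
    print(f"{turn.start_ms:>4} ms  heard={turn.heard!r:10} spoken={turn.spoken!r}")
```

In a turn-based design the output stream would stay empty until the input stream ended. Here both streams share one timeline, which is the property "full duplex" refers to.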

Key features include seamless dialogue management where the model implicitly tracks whether the speaker is thinking, yielding, self-correcting, or inviting a response. The system supports verbal and visual interjections, allowing it to respond based on context rather than waiting for the user to finish speaking. It can handle simultaneous speech for live translation and maintains time awareness during interaction.

As described, the experience differs from current chat interfaces. There is no loading spinner while your message completes and no awkward pause while the system processes it. The AI can interrupt you mid-sentence if it detects an error in your code snippet, or it can generate a UI chart while continuing to listen to your feedback. That kind of responsiveness feels less like talking to a bot and more like having a colleague on the line.

TechCrunch reports that TML-Interaction-Small is a 276-billion-parameter Mixture-of-Experts model with 12 billion active parameters. Because real-time interaction requires near-instantaneous responses that often conflict with deep reasoning, the company has built a two-part system. The Interaction Model stays in constant exchange with the user, handling dialogue management and immediate follow-ups. The Background Model is an asynchronous agent that handles sustained reasoning, web browsing, or complex tool calls, streaming results back to be woven naturally into the conversation.
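The split between a fast foreground loop and a slow asynchronous worker can be sketched with Python's `asyncio`. This is an illustration of the general pattern only, not the company's implementation: `background_model`, `interaction_loop`, the sleep durations, and the sample chunks are all hypothetical stand-ins.

```python
import asyncio


async def background_model(question: str) -> str:
    """Stand-in for slow work such as sustained reasoning or a tool call."""
    await asyncio.sleep(0.05)  # placeholder delay, not a measured figure
    return f"result for {question!r}"


async def interaction_loop(user_chunks, slow_question):
    """Keep acknowledging each chunk while a background task runs concurrently."""
    transcript = []
    task = asyncio.create_task(background_model(slow_question))
    delivered = False
    for chunk in user_chunks:
        await asyncio.sleep(0.02)  # simulated gap between incoming chunks
        transcript.append(f"ack: {chunk}")
        # Weave the background result in as soon as it is ready.
        if task.done() and not delivered:
            transcript.append(f"background: {task.result()}")
            delivered = True
    if not delivered:
        # The conversation ended first, so wait for the result now.
        transcript.append(f"background: {await task}")
    return transcript


transcript = asyncio.run(
    interaction_loop(["hello", "can you", "check this", "for me"], "latest docs")
)
print("\n".join(transcript))
```

The foreground loop never blocks on the slow task. It only checks whether a result has arrived, which is what lets the conversation continue while longer work is in flight.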

To support these claims, the lab used FD-bench, a benchmark designed to measure interaction quality rather than raw intelligence. The results show TML-Interaction-Small significantly outperforming existing real-time systems. It achieved a turn-taking latency of 0.40 seconds, compared with 0.57 seconds for Gemini-3.1-flash-live and 1.18 seconds for GPT-realtime-2.0. On FD-bench V1.5, it scored 77.8, nearly doubling the scores of its primary competitors.
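For a sense of scale, the reported latencies can be compared directly. The short calculation below uses only the three figures quoted above. The variable names are illustrative, and it makes no claim about how FD-bench measures latency.

```python
# Turn-taking latencies in seconds, as quoted in the article.
latency_s = {
    "TML-Interaction-Small": 0.40,
    "Gemini-3.1-flash-live": 0.57,
    "GPT-realtime-2.0": 1.18,
}

baseline = latency_s["TML-Interaction-Small"]

# How many times slower each system is than the baseline.
ratios = {name: value / baseline for name, value in latency_s.items()}

# Extra delay per turn relative to the baseline, in whole milliseconds.
extra_ms = {name: round((value - baseline) * 1000) for name, value in latency_s.items()}

for name in latency_s:
    print(f"{name:24} {ratios[name]:.2f}x  +{extra_ms[name]} ms")
```

On these figures, the Gemini model is roughly 1.4 times slower per turn and the GPT model just under 3 times slower, or about 170 ms and 780 ms of extra delay respectively.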

Long sessions remain an open challenge because continuous audio and video quickly fill up the model's context. The system works better for short- and medium-length interactions, and extended use still requires careful context management. This limitation means the technology won't immediately replace existing workflows for tasks requiring sustained attention over hours.
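A back-of-the-envelope calculation shows why continuous streams strain context. The only sourced number here is the 200 ms chunk size. The tokens-per-chunk value and the context size are placeholder assumptions chosen purely for illustration, since the reporting gives neither figure for this model.

```python
CHUNK_MS = 200  # chunk duration reported in the article


def chunks_per_minute(chunk_ms: int = CHUNK_MS) -> int:
    """Number of fixed-size chunks that fit in one minute of streaming."""
    return 60_000 // chunk_ms


def minutes_until_full(context_tokens: int, tokens_per_chunk: int,
                       chunk_ms: int = CHUNK_MS) -> float:
    """Minutes of continuous streaming before the context budget is used up."""
    return context_tokens / (tokens_per_chunk * chunks_per_minute(chunk_ms))


# Hypothetical values: neither figure is published for this model.
ASSUMED_CONTEXT_TOKENS = 128_000
ASSUMED_TOKENS_PER_CHUNK = 20

print(chunks_per_minute(), "chunks per minute")
print(f"{minutes_until_full(ASSUMED_CONTEXT_TOKENS, ASSUMED_TOKENS_PER_CHUNK):.1f} minutes until full")
```

Whatever the real numbers are, the shape of the problem is the same: at 300 chunks per minute, the budget is consumed at a constant rate even when nothing important is being said, which is why long sessions need some form of summarizing or discarding old context.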

Still, this is a research announcement, not a product. The company isn't releasing the model to the public yet. A "limited research preview" is coming in the next few months, with a wider release set for later this year. Whether the real-world experience lives up to the technical claims is something we won't know until people can actually use it.

For now, interaction models remain a research concept. The benchmarks look impressive on paper, but nobody is going to pay for an AI that interrupts them constantly. Whether users will tolerate this level of interactivity outside the lab remains the real question.

Artūras Malašauskas is an AI Systems Integrator with 20+ years of production-grade web engineering experience. He has designed, shipped, and scaled enterprise Python/PHP systems for logistics, SaaS, and public-sector clients. For the past year, he has focused exclusively on AI integrations: deploying open-source LLMs, building generative media pipelines (image, audio, video), and engineering multi-agent workflows for real production environments. His standard: reproducibility, security, and cost-efficient inference, with no vaporware. He documents and evaluates emerging AI tooling, separating verified capabilities from marketing noise. Technical editor at: muza-ai.eu, ai-verslas.lt, ai-naujinos.lt