Google’s Agentic Pivot: Gemini 3.5 Flash and the "Do-Anything" Omni Era

By Artūras Malašauskas, May 19, 2026

Google has officially killed the chatbot era by launching the agent-optimized Gemini 3.5 Flash and a multimodal "Omni" model designed to act as a digital foreman for your entire life.

Google isn't just playing catch-up anymore; it's trying to rewrite the rules of how we actually use AI. At Google I/O 2026, the company unveiled Gemini 3.5 Flash, a model that doesn't just sit there and wait for a prompt: it's built to go out and do the work. We've grown used to LLMs being fast, but Gemini 3.5 Flash is purportedly four times faster than rival frontier models, a speed bump that Ars Technica notes is essential for the "agentic" era, in which AI spawns sub-agents to handle parallel workflows. It's an aggressive play to move from chatbots that chat to agents that act, effectively turning the Gemini app into a digital foreman for your life.

Then there’s Gemini Omni, the "do-anything" model that feels like a fever dream of multimodality come to life. Unlike previous models that stitched together separate text and image systems, Omni is a unified, native multimodal powerhouse. According to The Verge, the initial "Flash" version of Omni can generate and edit high-quality video from almost any combination of text, audio, and imagery. It’s a significant leap toward what Google calls "world understanding," where the AI doesn't just predict the next word but understands the spatial and temporal logic of a video clip well enough to let you "edit the world" just by talking to it.

The Speed of Action: Gemini 3.5 Flash

Flash 3.5 isn't just a minor iteration; it's a structural shift. Google claims it outperforms last year’s Gemini 3.1 Pro on nearly every benchmark while maintaining the low-cost, high-speed profile that makes it viable for developers to run complex, multi-step loops. It’s already the default engine for the Gemini app and the updated AI Mode in Search. The real magic happens in Antigravity 2.0, Google’s new developer platform where Flash can deploy "collaborative sub-agents" to tackle massive coding projects or financial audits that used to take human teams weeks to finish.

The Omni Experience: Video and Beyond

While Flash handles the heavy lifting of logic and code, Gemini Omni is clearly aimed at the creative and "everything" interfaces of the future. The first rollout, Omni Flash, allows for incredibly intuitive video creation. You can take a video of someone drawing a circle and tell Omni to turn it into a portal, or ask it to change the camera angle on a pre-existing clip of a violinist. It’s a unified system that collapses the messy workflows of the past into a single, conversational interface that Google eventually hopes will "create anything from any input."

The New AI Economy

Beyond the models, Google is aggressively retooling its business model to keep users locked into this new ecosystem. They’ve slashed the price of the AI Ultra plan to $100 a month to stay competitive, while introducing Gemini Spark, a 24/7 personal agent that lives in the background to catch hidden fees in your emails or organize your workspace. It’s clear that Google's goal isn't just to provide a better search box, but to provide a complete, agent-driven infrastructure that handles the friction of digital life so you don't have to.

The Agentic Shift: Why Speed is the New Context

Beyond the Spec Sheet: While the tech world obsesses over benchmark scores, the real story here is the pivot from "thinking" models to "acting" ones. Gemini 3.5 Flash represents a fundamental change in architecture designed specifically for high-frequency loops. In the past, AI latency was the "uncanny valley" of productivity; if a model took five seconds to think, it couldn't effectively manage a real-time system. By slashing that response time, Google is enabling what engineers call "agentic workflows," where the AI can self-correct, browse the web, and execute code in a recursive cycle that feels instantaneous to the user.
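To make the "recursive cycle" concrete, here is a minimal Python sketch of a propose, check, and retry loop. It is an illustration only: `run_agent_loop`, `toy_propose`, and `toy_check` are hypothetical names invented for this example, and the "model" is a toy stand-in, not a call to Gemini or any Google API.

```python
from typing import Callable, Optional


def run_agent_loop(
    propose: Callable[[str, Optional[str]], str],
    check: Callable[[str], Optional[str]],
    objective: str,
    max_steps: int = 5,
) -> tuple[str, int]:
    """Propose an attempt, check it, and retry with feedback until it passes."""
    feedback: Optional[str] = None
    for step in range(1, max_steps + 1):
        attempt = propose(objective, feedback)
        feedback = check(attempt)  # None means the attempt passed the check
        if feedback is None:
            return attempt, step
    raise RuntimeError(f"no passing attempt after {max_steps} steps")


# Toy stand-ins (hypothetical): the "model" corrects itself once it sees feedback.
def toy_propose(objective: str, feedback: Optional[str]) -> str:
    return "2 + 2 = 5" if feedback is None else "2 + 2 = 4"


def toy_check(attempt: str) -> Optional[str]:
    return None if attempt.endswith("4") else "arithmetic is wrong"


result, steps = run_agent_loop(toy_propose, toy_check, "add two and two")
print(result, "after", steps, "steps")
```

Every pass through the loop costs one model round trip, which is why latency matters here: a task that needs five passes at five seconds each keeps the user waiting 25 seconds, while the same five passes at one second each take five.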

Historically, Google’s Achilles' heel has been its cautious rollout of integrated features, often letting OpenAI or Anthropic define the narrative. With Gemini Omni, however, Mountain View is leveraging its greatest asset: the sheer breadth of its ecosystem. Omni isn't just a standalone video generator; it’s being woven into the fabric of Android and Workspace. This means the model isn't just "dreaming" up video from a prompt—it's using your existing files, calendar events, and real-world sensor data to provide a level of personalization that "closed" models simply cannot match without massive data privacy hurdles.

Industry insiders suggest that the "do-anything" nature of Omni is a direct response to the plateauing of text-only LLMs. We’ve reached a point where adding more text data provides diminishing returns. By training Omni natively on video and audio from the ground up, Google has bypassed the "translation layer" that usually causes AI hallucinations in multimodal tasks. This native understanding allows the model to grasp physical intuition—like how shadows should fall in a generated video or how a human voice should crack with emotion—making the output feel less like a digital hallucination and more like a captured reality.

Stakeholders in the developer community are particularly zeroed in on the "sub-agent" capabilities of Flash 3.5. By allowing a primary model to spawn smaller, specialized workers, Google is effectively commoditizing complex project management. For a tech journalist, this looks like the end of the "chatbot" era and the beginning of the "operating system" era. We are moving away from asking an AI to write a draft and toward telling an AI to "manage this entire product launch," with the model autonomously handling everything from asset creation to sentiment analysis across social platforms.
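The fan-out pattern described here can be sketched in a few lines of Python with the standard `asyncio` library. This is a sketch under stated assumptions, not how Antigravity 2.0 or Gemini actually schedules work: `sub_agent`, `orchestrate`, and the `plan` contents are hypothetical, and the worker only simulates latency instead of calling a model.

```python
import asyncio


async def sub_agent(role: str, task: str) -> str:
    # Placeholder for a real model call (assumption); it only simulates latency.
    await asyncio.sleep(0.01)
    return f"[{role}] done: {task}"


async def orchestrate(objective: str, plan: dict[str, str]) -> list[str]:
    # Fan out one worker per role, run them concurrently,
    # and collect the reports in the same order as the plan.
    jobs = [sub_agent(role, f"{task} for {objective}") for role, task in plan.items()]
    return await asyncio.gather(*jobs)


plan = {
    "assets": "draft launch graphics",
    "sentiment": "summarize social reactions",
}
reports = asyncio.run(orchestrate("product launch", plan))
for line in reports:
    print(line)
```

Because the workers run concurrently, total wall-clock time tracks the slowest sub-agent rather than the sum of all of them, which is the practical reason a fast base model matters for this design.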

There is, of course, the looming question of the "AI tax" on the open web. As these agents become more proficient at fetching and summarizing information without a human ever clicking a link, the economic tension between Google and content creators will reach a breaking point. Flash 3.5 is so efficient at data scraping and synthesis that it threatens to bypass the very publishers it cites. Google's attempt to mitigate this through the "Gemini Spark" ecosystem suggests they want to turn the entire web into a structured database for their agents, a move that is already drawing scrutiny from antitrust regulators and intellectual property lawyers alike.

Ultimately, the launch of Gemini 3.5 and Omni signals that Google is no longer content with being the world’s librarian. They want to be the world’s executive assistant. The technical leap in Flash 3.5 isn't just about raw power; it's about the reliability of execution. If these agents can truly handle the "boring" parts of digital life with the 99% accuracy Google claims, we are looking at a permanent shift in how humans interact with silicon. The era of the prompt is dying, and the era of the objective is just beginning.

The Paradox of Universal Agency

Reading Between the Lines: For all the luster of the "do-anything" model, there is a fundamental tension between Google's promise of total agency and the inherent limitations of a platform-locked ecosystem. Google frames Gemini Omni as a tool of liberation, yet its utility is suspiciously tethered to a high-bandwidth connection to the company's own cloud infrastructure, the "Google Glass" of the modern era. The industry assumption that "faster is better" ignores the reality that an agent capable of executing a thousand mistakes a second is not an assistant; it is a liability. Flash 3.5 might be the fastest model on the circuit, but speed without a significant reduction in hallucination rates simply means we are automating chaos at an unprecedented scale.
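A little arithmetic shows why speed alone does not help. The Python sketch below takes the 99% accuracy figure attributed to Google earlier in this piece and makes two assumptions of its own that the source does not state: that the figure applies to each individual step, and that errors in different steps are independent. The function names are illustrative.

```python
def chain_success(step_accuracy: float, steps: int) -> float:
    """Probability that every step in a chain succeeds (independence assumed)."""
    return step_accuracy ** steps


def expected_errors_per_second(actions_per_second: float, step_accuracy: float) -> float:
    """Average number of failed actions per second at a given throughput."""
    return actions_per_second * (1.0 - step_accuracy)


for n in (1, 10, 50, 100):
    print(f"{n:>3} steps: {chain_success(0.99, n):.3f}")

print(f"errors/s at 1000 actions/s: {expected_errors_per_second(1000, 0.99):.1f}")
```

Under those assumptions, a 50-step task finishes without a single error only about 60% of the time, a 100-step task about 37% of the time, and an agent taking 1,000 actions a second makes roughly ten mistakes a second. A faster model reaches those failures sooner; only better per-step accuracy or a verification step changes the odds.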

There is also a glaring contradiction in the "Omni" philosophy. Google touts the model’s native multimodality as a breakthrough in world understanding, yet the demonstrations remain curiously sterilized. We see agents organizing pristine calendars and editing cinematic videos, but we rarely see them navigating the messy, unoptimized reality of legacy third-party software. If Omni’s "do-anything" capabilities stop at the border of the Google Workspace, then it isn't a universal agent; it’s a very expensive concierge for a walled garden. Measured skepticism suggests that the friction of the real world—broken APIs, inconsistent data formats, and human unpredictability—will remain the ultimate bottleneck that no amount of FLOPs can solve.

Furthermore, the economic implications of the "agentic" shift are being undersold as a mere productivity gain. By positioning Gemini Spark as a 24/7 background worker, Google is effectively proposing a new form of digital labor that thrives on total data visibility. The trade-off for an agent that catches hidden fees in your email is a model that must ingest every line of your private correspondence to be effective. This creates a feedback loop where privacy becomes a premium feature that few can afford to keep. As we delegate our executive functions to Flash 3.5, we aren't just saving time; we are outsourcing the very cognitive habits that allow us to verify the information we consume.

The long-term projection for the "Omni" era is one of extreme centralization. If Google successfully transitions from a search engine to an execution engine, the "open web" becomes little more than a training set for a single entity. The measured reality is that Google is no longer building tools for users to find answers; it is building a filter that decides which actions are worth taking. While the technical achievement of Gemini 3.5 is undeniable, the sociological cost of a "do-anything" model is the gradual erosion of user autonomy in favor of a frictionless, algorithmic path of least resistance.

We are rapidly approaching a future where your AI agent will spend its morning arguing with another AI agent over a refund you didn't know you needed, for a product you don't remember buying, while you sit back and wonder why your digital life has never felt so productive yet so entirely out of your control.

Artūras Malašauskas is an AI Systems Integrator with 20+ years of production-grade web engineering experience. He has designed, shipped, and scaled enterprise Python/PHP systems for logistics, SaaS, and public-sector clients. For the past year, he has focused exclusively on AI integrations: deploying open-source LLMs, building generative media pipelines (image, audio, video), and engineering multi-agent workflows for real production environments. His standard: reproducibility, security, cost-efficient inference, and no vaporware. He documents and evaluates emerging AI tooling, separating verified capabilities from marketing noise. Technical editor at: muza-ai.eu, ai-verslas.lt, ai-naujinos.lt