Xnurta Launches The Agentic Retail Media Council to Define EVAL Standards for AI Agents in Advertising
Xnurta, a leader in AI-powered retail media management, has launched the Agentic Retail Media Council to create industry-standard evaluation frameworks (EVAL) for AI agents in advertising, addressing the lack of unified metrics for autonomous ad bidding. This initiative aims to establish a "glass box" for AI, moving beyond the "black box" approach by focusing on five key areas: Instruction Following, Analysis Coverage, Data Accuracy, Reasoning Quality, and Recommendation Quality.
What Most Reports Miss: The Push for "Agentic" Accountability
Behind the Scenes: The Council targets the "trust gap" in programmatic advertising, where agentic AI, unlike simple automation, makes independent, multi-step decisions. A critical yet overlooked risk is the "double penalty," in which flawed AI reasoning not only wastes budget but also compounds errors by feeding into future decisions. According to experts involved with the initiative, as AI gains more autonomy, the Council's push for "cognitive auditing" over simple fraud detection becomes essential for maintaining human oversight.
The Council intends to offer open-source benchmarks and scorecards that can be applied across platforms, potentially breaking down "walled garden" limitations in the industry. By standardizing how AI agents are evaluated, the initiative aims to help marketers move from manual campaign management to strategic oversight of AI agents, potentially cutting manual reporting work significantly.
The wild west of autonomous advertising just got its first sheriff. Bellevue-based Xnurta, a heavyweight in the AI-powered retail media space, has officially launched the Agentic Retail Media Council. This isn't just another industry talk shop; the group is tasked with establishing "EVAL" standards—a rigorous framework designed to measure and certify the performance of AI agents that are increasingly taking the wheel of global ad spend across giants like Amazon, Walmart, and Criteo.
The End of the "Black Box" Era
For years, brands have handed over budgets to algorithms with little more than a "trust us" from their vendors. The new Council aims to turn that "black box" into a "glass box" by utilizing a methodology that evaluates nearly 100 retail-media-specific criteria. These cover five critical dimensions: Instruction Following, Analysis Coverage, Data Accuracy, Reasoning Quality, and Recommendation Quality. It’s a bold move to professionalize a sector where "AI" is often used as a vague marketing buzzword rather than a precise technical description of an autonomous partner.
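To make the "glass box" idea concrete, here is a minimal sketch of how per-dimension scores could be rolled up from individual criteria. It assumes each criterion is graded pass/fail and tagged with one of the five dimensions named above. The `Criterion` type, the `scorecard` function, the pass/fail grading, the unweighted averaging, and the sample questions are all hypothetical illustrations; the announcement does not describe how EVAL scores are actually computed.

```python
from collections import defaultdict
from dataclasses import dataclass

# The five EVAL dimensions named in the announcement.
DIMENSIONS = (
    "Instruction Following",
    "Analysis Coverage",
    "Data Accuracy",
    "Reasoning Quality",
    "Recommendation Quality",
)


@dataclass(frozen=True)
class Criterion:
    """One evaluation question (hypothetical structure, not the Council's schema)."""
    question: str
    dimension: str
    passed: bool


def scorecard(criteria: list[Criterion]) -> dict[str, float]:
    """Return the pass rate per dimension, assuming every criterion weighs the same."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for c in criteria:
        if c.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {c.dimension}")
        totals[c.dimension] += 1
        passes[c.dimension] += int(c.passed)
    # Dimensions with no graded criteria are omitted rather than scored as zero.
    return {d: passes[d] / totals[d] for d in DIMENSIONS if totals[d]}


if __name__ == "__main__":
    # Hypothetical results for a single agent run.
    results = [
        Criterion("Respected the daily budget cap?", "Instruction Following", True),
        Criterion("Reported the correct spend figure?", "Data Accuracy", True),
        Criterion("Reported the correct sales figure?", "Data Accuracy", False),
        Criterion("Justified the bid change?", "Reasoning Quality", True),
    ]
    for dim, rate in scorecard(results).items():
        print(f"{dim}: {rate:.0%}")
```

An average like this hides which specific checks failed, so a practical scorecard would probably also keep the per-question results; the announcement does not say how the Council will report them.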
Reading Between the Lines: While Xnurta frames this as a selfless, industry-wide olive branch, there is a clear competitive undercurrent. By being the first to define what "good" looks like, Xnurta effectively forces its rivals, from established players like Criteo to newer entrants, to play on a field whose rules were written by the Bellevue camp. It's a classic platform play: if you can't beat the competition on features alone, you define the standards by which those features are judged. The open question is whether other major tech providers will willingly submit to a framework launched by a direct competitor, or whether we are headed for a fragmented "standards war" that leaves advertisers more confused than before.
The contradiction at the heart of "agentic" media is the tension between autonomy and accountability. Xnurta's EVAL framework highlights "reasoning quality," yet the decision logic of advanced neural networks is non-linear and notoriously difficult to audit in real time. Measuring "nearly 100 questions" sounds exhaustive, but in a live bidding environment where millions of decisions are made every second, a retrospective scorecard might be about as useful as last Tuesday's weather report. For agencies, the appeal is clear: a standardized seal of approval makes the pitch to cautious brand managers significantly easier, potentially unlocking the 80–90% automation rates Xnurta claims its platform can achieve.
Projecting forward, the Council’s success depends entirely on its ability to attract high-profile neutral voices. If the "State of Agentic Advertising" reports promised by the Council end up looking like Xnurta press releases, the initiative will wither as a niche marketing tool. However, if it manages to integrate with broader bodies or gain the blessing of major retail networks, it could finally provide the "manual" that human operators have been missing since the machines took over. For now, it’s a necessary, if self-interested, step toward making sure the AI agents spending billions of dollars aren't just hallucinating their way through the sales funnel.
It’s comforting to know that while AI agents are busy spending your quarterly budget at the speed of light, we finally have a council to debate whether they followed the instructions properly after the money is already gone.
Artūras Malašauskas is an AI Systems Integrator with 20+ years of production-grade web engineering experience. He has designed, shipped, and scaled enterprise Python/PHP systems for logistics, SaaS, and public-sector clients. For the past year, he has focused exclusively on AI integrations: deploying open-source LLMs, building generative media pipelines (image, audio, video), and engineering multi-agent workflows for real production environments. His standard: reproducibility, security, cost-efficient inference; no vaporware. He documents and evaluates emerging AI tooling, separating verified capabilities from marketing noise. Technical editor at: muza-ai.eu, ai-verslas.lt, ai-naujinos.lt