Zero Latency Launches Zerogrid Closed Beta for AI Inference

By Artūras Malašauskas May 07, 2026 5 min read Share:

Zero Latency has opened closed beta access to Zerogrid, a distributed AI inference grid modeled on virtual power plant architecture that routes workloads by constraint rather than region.

Zero Latency, a Charlottesville-based distributed AI infrastructure company, has announced the launch of Zerogrid closed beta. The platform functions as a distributed AI inference grid that routes workloads to edge capacity based on specific operational constraints rather than geographic regions. The announcement came via official press release on May 6, 2026, with coverage from industry publication HPCwire the following day.

The company, formerly known as Hyphastructure, is now operating under the 0.lat domain. Zerogrid matches each inference decision to edge capacity that simultaneously satisfies low-latency, data-gravity, and bursting constraints. This represents a fundamental shift from how cloud providers currently handle AI workloads.

Beta access is restricted to a select cohort. Fortune 1000 enterprises, tier 1 telecommunications and fiber operators, and leading enterprise DevOps application platforms can apply. Participants receive immediate access to a Zerogrid workload and image management dashboard. A command-line interface will be introduced during the program and refined based on user feedback.

The architecture borrows heavily from energy infrastructure. Zerogrid is modeled on behind-the-meter distributed virtual power plants (VPPs), a design pattern the founders have spent nearly a hundred collective years building across the energy sector. Zero Latency owns and operates a network of edge computing clusters across the United States and coordinates them as a single pool of capacity.

Instead of provisioning capacity statically, these clusters aggregate and dispatch against workloads on a day-ahead and real-time basis, plus longer-term arrangements. This mirrors how distributed energy resources operate in modern power markets. The approach aligns with the AI grid concept that Nvidia articulated—a networked, dispatched layer of compute treating inference as a grid service.

Zerogrid extends that vision with constraint-aware dispatch. Each workload reaches not just available capacity, but capacity satisfying its specific operational envelope. The result is an infrastructure tier purpose-built for regulatory fracturing, sovereign AI requirements, and heterogeneous enterprise constraints. Compute must come to the workload, not the other way around.

The Zero Latency team has pioneered and deployed first-of-a-kind battery, solar, demand response, electric bus, vehicle, and distributed natural gas infrastructure. They are applying the same architectural logic that unlocked over a billion dollars of decentralized power infrastructure behind customer meters to a new challenge: routing the right compute capacity to the right place, under constraint, at the moment it is needed.

Michael Huerta, Co-founder of Zero Latency, stated: "Innovation through decentralization is not a thesis we arrived at recently. It is the lens through which we have built, financed and operated infrastructure for decades. We have applied the successes and hard lessons from deploying decentralized power infrastructure to unlock architectural and routing innovations for AI workloads. Zerogrid is the result: infrastructure designed for an inference world that the cloud was never built to serve."

The problem Zerogrid addresses is structural. For AI training, the market is well-served. Hyperscalers and neoclouds have built enormous capacity, and Zero Latency does not compete in that space. But AI inference is a structurally different problem, particularly inference that must satisfy hard constraints around latency, data residency, or regulatory geography.

Cloud providers route regionally, not by constraint. On-premises deployments are rigid by design. Zero Latency was founded on the conviction that AI workloads deserve to be treated as first-class routing primitives. Not "pick a region." Route this specific inference decision to where burst, data-gravity, and latency requirements are all satisfied.

That is the problem Zerogrid was built to solve, and the one that neither cloud nor on-premises architectures address at scale. The official press release from GlobeNewswire confirms the technical specifications and beta eligibility criteria.

From a user experience perspective, the dashboard interface represents a tangible shift. Engineers will no longer select regions from dropdown menus and hope latency requirements are met. Instead, they define constraints—latency thresholds, data residency requirements, burst capacity needs—and the grid handles the routing. The physical reality of this means fewer failed inference calls, less manual capacity planning, and reduced friction when deploying models across geographies.

The CLI introduction during beta suggests the company expects power users to want programmatic control. This is standard for infrastructure tools, but the constraint-based routing logic embedded in the CLI will be the differentiator. Developers won't just specify where to run code; they'll specify what conditions must be satisfied for execution.

Industry context matters here. The AI grid concept has been discussed for years, but implementation has lagged. Nvidia's vision treats inference as a grid service, but actual deployment requires ownership or coordination of distributed edge capacity. Zero Latency owns its clusters, which gives them direct control over dispatch logic. This is not an aggregator layer on top of existing cloud providers.

The energy sector parallel is not merely rhetorical. Virtual power plants aggregate distributed energy resources—solar panels, batteries, demand response systems—and dispatch them as a single unit. The same logic applies to compute: aggregate distributed edge clusters, dispatch them as a single pool, and route workloads based on real-time constraints rather than static geography.

Whether this model scales remains to be seen. The closed beta limits visibility into actual performance metrics. Enterprise adoption will depend on whether constraint-based routing delivers measurable improvements over existing cloud region selection. The infrastructure investment required to own and operate edge clusters across the United States is substantial.

Zero Latency's background in energy infrastructure provides credibility, but AI inference is a different market with different economics. The billion dollars of decentralized power infrastructure experience is impressive, yet the translation to compute routing is unproven at scale. Beta participants will generate the first real-world data points.

For now, the platform remains in closed beta. The company directs interested parties to www.0.lat for more information. Whether users actually pay for constraint-based routing over traditional cloud regions remains the real question. The technology is novel, but market adoption will determine if Zerogrid becomes infrastructure or remains a niche solution.

Artūras Malašauskas is an AI Systems Integrator with 20+ years of production-grade web engineering experience. He has designed, shipped, and scaled enterprise Python/PHP systems for logistics, SaaS, and public-sector clients. For the past year, he has focused exclusively on AI integrations: deploying open-source LLMs, building generative media pipelines (image, audio, video), and engineering multi-agent workflows for real production environments. His standard: reproducibility, security, cost-efficient inference—no vaporware. He documents and evaluates emerging AI tooling, separating verified capabilities from marketing noise. Technical editor at: muza-ai.eu, ai-verslas.lt, ai-naujinos.lt Connect on LinkedIn

Zero Latency Launches Zerogrid Closed Beta for AI Inference

Comments