RadixArk Raises $100M Seed to Optimize AI Inference Infrastructure

By Artūras Malašauskas, May 05, 2026

Palo Alto startup RadixArk secured $100 million in seed funding to commercialize SGLang, an open-source inference engine that reduces memory overhead for large-scale AI deployments.

A new infrastructure startup is attempting to solve one of AI's most expensive problems: the massive computational waste that occurs when models run at scale. RadixArk officially launched today with $100 million in seed funding, valuing the Palo Alto-based company at $400 million post-money.

The round was co-led by Accel and Spark Capital, with participation from Nvidia's NVentures and AMD. Individual backers include Broadcom CEO Hock Tan, John Schulman (OpenAI co-founder), Soumith Chintala (co-creator of PyTorch), and Olivier Pomel (CEO of Datadog).

At the core of RadixArk's technology is SGLang, an open-source high-speed inference engine originally developed at LMSYS Org (the research collective behind Chatbot Arena). The engine acts as an intelligent middle layer between AI models and the chips they run on, managing what's called the "KV cache": the stored attention keys and values for tokens a model has already processed, which serve as a type of short-term memory for AI systems.

According to the company's official launch announcement, SGLang already serves trillions of tokens per day for enterprise users requiring high-throughput, self-hosted AI deployments. That's a staggering number when you consider the physical reality of inference: racks of servers drawing power while their accelerators often sit underutilized because of memory inefficiency.

CEO and co-founder Ying Sheng, a former engineer at Elon Musk's xAI, explained the company's approach in the blog post. "We are treating inference, training, and post-training as first-class citizens," Sheng said. "Our goal is to build an end-to-end infrastructure that gives developers more speed and control without the current overhead."

The technical innovation here is specific. RadixArk uses a radix tree, a data structure that indexes sequences by their shared prefixes, to manage the KV cache. Think of it like a professional restaurant kitchen. Instead of chopping onions for every single order, the kitchen preps them in bulk. RadixArk's software scans incoming queries to see if they share a common beginning (like a long legal document or specific system instructions). If they do, the software skips redundant processing, reusing the cached data for the shared portion to generate answers faster and cheaper.
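The prefix reuse described above can be illustrated with a small sketch. This is not SGLang's implementation: it is a toy, token-level trie (an uncompressed radix tree) that only counts how many leading tokens of a request were seen before. The names `PrefixCache`, `match`, `insert`, and `process` are hypothetical, and a real engine would attach KV tensors to the nodes, compress single-child chains into one edge, and evict old entries under memory pressure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One cached token position; `children` maps next token -> Node."""
    children: dict = field(default_factory=dict)


class PrefixCache:
    """Toy prefix cache (hypothetical): a token-level trie, no eviction."""

    def __init__(self):
        self.root = Node()

    def match(self, tokens):
        """Return how many leading tokens are already in the cache."""
        node, matched = self.root, 0
        for tok in tokens:
            nxt = node.children.get(tok)
            if nxt is None:
                break
            node, matched = nxt, matched + 1
        return matched

    def insert(self, tokens):
        """Record a processed sequence so later requests can reuse it."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, Node())

    def process(self, tokens):
        """Return (reused, computed) token counts for one request."""
        reused = self.match(tokens)
        self.insert(tokens)
        return reused, len(tokens) - reused


cache = PrefixCache()
system_prompt = ["You", "are", "a", "legal", "assistant", "."]

# Two requests that share the same system prompt but ask different things.
first = cache.process(system_prompt + ["Summarize", "clause", "4"])
second = cache.process(system_prompt + ["List", "the", "parties"])

print(first)   # (0, 9): nothing cached yet, all 9 tokens are computed
print(second)  # (6, 3): the 6 shared prompt tokens are reused
```

In this sketch the second request only pays for its three new tokens. The longer the shared prefix (a long contract, a large system prompt), the larger the share of work that can be skipped.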

This matters because memory inefficiency on current hardware leads to wasted compute power and high latency. As AI models like GPT-4 and Llama 3 become more integrated into daily enterprise workflows, the cost of inference has skyrocketed, a problem that has plagued users for years.

The involvement of both Nvidia and AMD in the round underscores a rare industry consensus: hardware alone cannot solve the AI scaling problem. Software-level efficiency is now seen as the key to making AI economically viable for the long term. "The demand for AI compute is infinite, but the physical supply of chips is not," noted a spokesperson from Accel. "RadixArk is essentially expanding the capacity of existing hardware by making every cycle count."

Independent reporting from Ventureburn corroborates the investor list and technical details, noting the company's focus on solving the "massive computational and financial drain" that occurs during AI model deployment at scale.

RadixArk is also commercializing a second open-source foundation: Miles, a framework for large-scale reinforcement learning and post-training. Beyond inference, Miles signals RadixArk's intent to become the primary backbone for the next generation of efficiency-first AI models. The company plans to use the new capital to expand its team of systems researchers and scale its managed infrastructure.

For practitioners, this represents a shift in how infrastructure is built. Teams building high-throughput production systems should watch open inference-engine developments because they affect choices around model placement, batching strategies, and hardware provisioning. Software that reduces memory pressure can change tradeoffs between model size, context window, and inference batch size, which in turn affects latency and per-request cost.
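To make the tradeoff between context window and batch size concrete, here is a rough back-of-the-envelope sketch. It uses the standard estimate that each token's KV cache entry holds one key and one value vector per layer and per KV head. The model shape (32 layers, 8 KV heads, head dimension 128, 16-bit values) and the 40 GiB memory budget are illustrative assumptions, not figures from RadixArk or SGLang, and the function names are hypothetical.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    # Factor of 2: one key vector and one value vector per layer and KV head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


def max_batch_size(memory_budget_bytes, context_len, per_token_bytes):
    # Number of full-length sequences whose KV cache fits in the budget.
    return memory_budget_bytes // (context_len * per_token_bytes)


GIB = 1024 ** 3

# Assumed, illustrative model shape and memory budget (not from the article).
per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128)
budget = 40 * GIB

for context_len in (8_192, 32_768):
    print(context_len, max_batch_size(budget, context_len, per_token))
# 8192 40
# 32768 10
```

Under these assumptions each token costs 128 KiB of cache, so quadrupling the context window cuts the number of concurrent full-length requests from 40 to 10. Any software that shares or shrinks cache entries shifts that balance, which is why it feeds into latency and per-request cost.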

A $100 million seed at a $400 million post-money valuation is unusually large, indicating strong investor conviction in the market opportunity for inference optimization. Whether users actually pay for it remains the real question. Open-source tools are free, but managed infrastructure costs money. The company will need to prove that its efficiency gains translate to real savings for enterprises running sustained token volume.

Time will tell if SGLang delivers on the performance claims. For now, the infrastructure layer between accelerators and model runtimes has become a lever to reduce costs and latency. That's the bet RadixArk is making, and investors seem willing to back it.

Artūras Malašauskas is an AI Systems Integrator with 20+ years of production-grade web engineering experience. He has designed, shipped, and scaled enterprise Python/PHP systems for logistics, SaaS, and public-sector clients. For the past year, he has focused exclusively on AI integrations: deploying open-source LLMs, building generative media pipelines (image, audio, video), and engineering multi-agent workflows for real production environments. His standard: reproducibility, security, cost-efficient inference, and no vaporware. He documents and evaluates emerging AI tooling, separating verified capabilities from marketing noise. Technical editor at: muza-ai.eu, ai-verslas.lt, ai-naujinos.lt.
