Tether Launches QVAC MedPsy Edge AI Models for Medical Reasoning
The stablecoin issuer Tether has officially launched QVAC MedPsy, a new class of medical language models engineered to operate entirely on smartphones, laptops, and other edge devices without requiring cloud infrastructure. The announcement, published on May 7, 2026, marks a significant pivot for the company beyond its core cryptocurrency business into healthcare AI infrastructure.
According to the official Tether press release, the models come in two versions: a 1.7 billion parameter variant and a 4 billion parameter variant. Both are designed to deliver clinical reasoning capabilities while keeping sensitive patient data on-device, addressing privacy concerns that have long plagued cloud-based medical AI systems.
The performance claims are aggressive. The 1.7 billion parameter model achieved an average score of 62.62 across seven closed-ended medical benchmarks, outperforming Google's MedGemma-1.5-4B-it by 11.42 points despite being less than half the size. In real-world clinical scenarios like HealthBench Hard, the same 1.7 billion model even beats MedGemma 27B—a model nearly sixteen times larger.
The 4 billion parameter version scored 70.54 across the same seven benchmarks, exceeding models nearly seven times its size, including MedGemma-27B-text. Performance held across clinical-style evaluations such as HealthBench, HealthBench Hard, and MedXpertQA. The evaluation covered eight diverse benchmark suites overall: MedQA-USMLE and MedMCQA for clinical knowledge; MMLU Health and MMLU-Pro Health for health literacy; MedXpertQA for expert clinical reasoning; PubMedQA for biomedical research; AfriMedQA for underserved global healthcare contexts; and HealthBench for real-world clinical scenarios.
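The size comparisons above are simple ratios, and they can be checked against the parameter counts the article quotes. The sketch below is only a sanity check on the reported numbers, not a reproduction of any benchmark; the dictionary keys are labels invented for this example, and the parameter counts are the nominal ones taken from the model names.

```python
# Nominal parameter counts in billions, taken from the model names in the article.
# The keys are labels chosen for this sketch, not official model identifiers.
PARAMS_B = {
    "medpsy_1_7b": 1.7,
    "medpsy_4b": 4.0,
    "medgemma_4b": 4.0,
    "medgemma_27b": 27.0,
}


def size_ratio(a: str, b: str) -> float:
    """Return how many times larger model `a` is than model `b`."""
    return PARAMS_B[a] / PARAMS_B[b]


# "nearly seven times its size"
ratio_27b_vs_4b = size_ratio("medgemma_27b", "medpsy_4b")
# "nearly sixteen times larger"
ratio_27b_vs_1_7b = size_ratio("medgemma_27b", "medpsy_1_7b")
# "less than half the size"
ratio_1_7b_vs_4b = size_ratio("medpsy_1_7b", "medgemma_4b")

# The article gives the 1.7B average (62.62) and its margin (11.42 points),
# so the baseline average it implies follows by subtraction.
implied_medgemma_4b_avg = round(62.62 - 11.42, 2)

print(f"27B vs 4B:   {ratio_27b_vs_4b:.2f}x")
print(f"27B vs 1.7B: {ratio_27b_vs_1_7b:.2f}x")
print(f"1.7B vs 4B:  {ratio_1_7b_vs_4b:.3f}x")
print(f"Implied MedGemma-1.5-4B-it average: {implied_medgemma_4b_avg}")
```

The ratios come out to 6.75 and roughly 15.9, which matches the "nearly seven" and "nearly sixteen" wording, and the implied baseline average is 51.20.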
Token efficiency is where this gets interesting for actual deployment. The 4 billion model generates responses in approximately 909 tokens compared to 2,953 tokens for comparable systems, a 3.2x reduction. The 1.7 billion model averages around 1,110 tokens versus 1,901 tokens, a 1.7x reduction. Shorter responses mean faster response times and less compute per query, which makes it more practical to run the models locally without depending on cloud infrastructure.
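To see what those token counts mean in practice, the reduction factors can be recomputed and turned into a rough generation-time estimate. This is a back-of-the-envelope sketch: the token counts come from the article, but the decoding speed of 20 tokens per second is a purely hypothetical figure for an edge device, and applying the same speed to both responses is a simplifying assumption.

```python
# Average response lengths reported in the article: (model tokens, baseline tokens).
REPORTED_TOKENS = {
    "4B": (909, 2953),
    "1.7B": (1110, 1901),
}

# Hypothetical on-device decoding speed; NOT a figure from Tether or the article.
ASSUMED_TOKENS_PER_SECOND = 20.0


def reduction_factor(baseline_tokens: int, model_tokens: int) -> float:
    """How many times shorter the model's response is than the baseline's."""
    return baseline_tokens / model_tokens


def decode_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to generate `tokens` at a constant decoding speed."""
    return tokens / tokens_per_second


for name, (model_tokens, baseline_tokens) in REPORTED_TOKENS.items():
    factor = reduction_factor(baseline_tokens, model_tokens)
    model_time = decode_seconds(model_tokens, ASSUMED_TOKENS_PER_SECOND)
    baseline_time = decode_seconds(baseline_tokens, ASSUMED_TOKENS_PER_SECOND)
    print(
        f"{name}: {factor:.1f}x fewer tokens, "
        f"about {model_time:.0f}s vs {baseline_time:.0f}s at the assumed speed"
    )
```

Whatever the real decoding speed turns out to be, generation time scales with response length, so the relative saving stays the same as long as both responses are produced at the same rate.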
The models ship as quantized GGUF files (1.2 GB for the 1.7 billion-parameter model and 2.6 GB for the 4 billion-parameter model), with the compressed versions retaining most benchmark performance while fitting on standard consumer hardware. That means a hospital system, rural clinic, or individual clinician could run the model entirely on-device, keeping patient records out of third-party cloud infrastructure and limiting HIPAA exposure.
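The file sizes give a rough idea of how aggressively the weights are compressed. The sketch below estimates the average bits stored per parameter and compares the files with an uncompressed 16-bit baseline. It assumes 1 GB means 10^9 bytes, treats the nominal parameter counts as exact, and ignores metadata and tokenizer data inside the file, so the results are approximations. The article does not say which quantization scheme is used.

```python
def bits_per_weight(file_gb: float, params_billion: float) -> float:
    """Average bits per parameter, assuming 1 GB = 10**9 bytes.

    The 10**9 factors for bytes and parameters cancel, leaving 8 bits per byte.
    """
    return file_gb * 8 / params_billion


def fp16_size_gb(params_billion: float) -> float:
    """Size of the same weights stored at 16 bits (2 bytes) per parameter."""
    return params_billion * 2


# (file size in GB, nominal parameters in billions), both from the article.
MODELS = {
    "1.7B": (1.2, 1.7),
    "4B": (2.6, 4.0),
}

for name, (file_gb, params_b) in MODELS.items():
    print(
        f"{name}: about {bits_per_weight(file_gb, params_b):.1f} bits per weight, "
        f"{file_gb} GB vs {fp16_size_gb(params_b):.1f} GB at 16-bit"
    )
```

Under these assumptions both files land between five and six bits per weight, roughly a third of the size of a 16-bit copy, which is what lets them fit in the memory of an ordinary laptop or recent phone.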
Paolo Ardoino, CEO of Tether, addressed the efficiency directly in the company's announcement. "With QVAC MedPsy, our focus was improving efficiency at the model level, rather than scaling up size," he said. "Our 4 billion model exceeded results from models nearly seven times its size, while using up to three times fewer tokens per response. That combination matters because it directly reduces compute requirements, latency, and cost."
The performance gains come from a staged medical post-training process that combines broad medical supervision, higher-value clinical reasoning data, and reinforcement learning focused on harder medical-reasoning cases. No additional model scaling was required to reach these results.
Independent reporting from Yahoo Tech corroborates the timeline and scope of the launch, noting that the release fits Tether's pattern over the past year. Last month the company shipped the QVAC SDK, an open-source toolkit for building local, offline AI apps across iOS, Android, Windows, and Linux. Before that, it launched QVAC Health, a consumer wellness app that keeps biometric data entirely on-device. MedPsy is the first QVAC model specifically trained for clinical reasoning.
The privacy pitch may be a major plus for some people, but using AI for medical opinions is far from ideal even by today's standards. An Oxford study published in February found that LLMs routinely give dangerous medical advice, including wrong answers, confused guidance, and poor handling of nuanced symptoms. The researchers stopped short of dismissing the technology entirely, but argued AI has a role as "secretary, not physician."
Compliance compounds the problem: most medical AI today routes patient data through cloud servers, creating HIPAA exposure every time a doctor types a query. Tether's models are available under the Apache 2.0 license for educational and research use, with the firm claiming strict compliance with GDPR, HIPAA, and other regulations. Whether hospital IT departments actually trust a crypto company with patient data remains the real question.
The medical AI market sits at roughly $36 billion today, with projections pointing past $500 billion by 2033, per Tether's own announcement. Models and GGUF weights are available now at qvac.tether.io/models. The release challenges one of the most entrenched assumptions in AI: that better performance requires bigger models and more compute. QVAC MedPsy flips that assumption.
This shifts where medical AI can actually be used. Systems that previously required external processing can now run on-site to support clinicians with secure, local data processing and analysis, on mobile devices, or in environments where connectivity, latency, or privacy constraints make cloud-based models impractical. It also reduces one of the main barriers to adoption in healthcare: the need to move sensitive data outside of controlled environments.
For the past decade, progress in AI has been tied to access to cloud-based compute. QVAC MedPsy points to a different direction, where efficiency, locality, and privacy define performance. If those gains hold in real-world deployments, they could reshape the economics of medical AI infrastructure, shifting the advantage toward systems that operate locally with lower cost, lower latency, and greater control over sensitive data.
How Tether plans to make money from it is a separate open question. The models are free and open-source, which means the business model isn't immediately clear. Tether has been expanding beyond crypto into AI infrastructure, but monetizing medical AI without charging per-query or per-seat is an unsolved problem. The technology might work, but the economics are still being figured out.
Artūras Malašauskas is an AI Systems Integrator with 20+ years of production-grade web engineering experience. He has designed, shipped, and scaled enterprise Python/PHP systems for logistics, SaaS, and public-sector clients. For the past year, he has focused exclusively on AI integrations: deploying open-source LLMs, building generative media pipelines (image, audio, video), and engineering multi-agent workflows for real production environments. His standard: reproducibility, security, cost-efficient inference, no vaporware. He documents and evaluates emerging AI tooling, separating verified capabilities from marketing noise. Technical editor at: muza-ai.eu, ai-verslas.lt, ai-naujinos.lt