Vanguard's Virtual Analyst: AI Success Built on Data Foundations
When financial analysts at Vanguard needed answers from complex datasets, they faced a familiar frustration: even straightforward questions required intricate SQL queries and days of waiting for data team support. The investment management firm's solution wasn't just another AI model—it was a fundamental rethinking of how enterprise data gets prepared for artificial intelligence.
The company's Virtual Analyst project, detailed in an AWS Machine Learning blog post, reveals a critical insight that many organizations overlook: building effective conversational AI isn't primarily a machine learning challenge. It's a data architecture challenge. Even the most sophisticated foundation models require proper data foundations to deliver reliable results.
This realization forced Vanguard to shift focus from AI capabilities to what they termed "AI-ready data." The distinction matters because it changes where engineering effort gets spent. Instead of tuning prompts or selecting different models, teams invest in metadata management, semantic context, and data quality standards. (Honestly, this is the part most companies skip until their AI starts hallucinating financial figures.)
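To make "semantic context" a little more concrete, here is a minimal Python sketch of what attaching business meaning to columns might look like. It is not Vanguard's implementation: the `ColumnMetadata` class, the helper functions, and the `fund_daily_flows` table are all hypothetical names invented for illustration. The idea is only that each column carries a description and unit that can be rendered as text for a model, and that undocumented columns can be flagged before they reach one.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    """Semantic description of one column (all fields are illustrative)."""
    name: str
    description: str
    unit: str = ""


def build_schema_context(table: str, columns: list[ColumnMetadata]) -> str:
    """Render table metadata as plain text that can accompany a user question."""
    lines = [f"Table: {table}"]
    for col in columns:
        unit = f" (unit: {col.unit})" if col.unit else ""
        lines.append(f"- {col.name}: {col.description}{unit}")
    return "\n".join(lines)


def undocumented_columns(columns: list[ColumnMetadata]) -> list[str]:
    """Return the names of columns whose description is missing or blank."""
    return [c.name for c in columns if not c.description.strip()]


# Hypothetical example table; not taken from any real schema.
FUND_COLUMNS = [
    ColumnMetadata("fund_id", "Internal identifier of the fund"),
    ColumnMetadata("net_flow", "Purchases minus redemptions for the day", "USD"),
    ColumnMetadata("adj_nav", ""),  # deliberately left undocumented
]

context = build_schema_context("fund_daily_flows", FUND_COLUMNS)
missing = undocumented_columns(FUND_COLUMNS)
```

In this sketch, `missing` lists the columns that still need a business definition, which is the kind of gap that is cheaper to catch in metadata review than in a model's answer.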
The technical implementation leverages a comprehensive AWS service stack. Amazon Bedrock powers the natural language understanding, while Amazon Bedrock Guardrails secure AI inputs and outputs to protect sensitive financial data. Amazon Redshift handles centralized data warehousing, AWS Glue manages data cataloging and ETL jobs, and Amazon DynamoDB maintains conversation persistence with minimal latency. Amazon ECS provides scalable compute infrastructure, Amazon S3 handles storage, and Amazon SageMaker supports experimentation.
Eight guiding principles emerged from Vanguard's journey. The first principle establishes clear data product and operating models. Data product owners handle business alignment while engineering stewards maintain technical quality. Service-level agreements for data freshness and reconciliation tolerance ensure downstream consumers get reliable, reusable data products. Both business and technical owners must be assigned to each critical data asset with documented responsibilities.
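A service-level agreement of this kind can be expressed as a small, checkable record. The Python sketch below is an assumption-laden illustration rather than anything described in the source: `DataProductSla`, its fields, the owner addresses, and the 24-hour and 0.1% thresholds are all hypothetical. It shows one way to pair the two owners with a freshness limit and a reconciliation tolerance.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataProductSla:
    """Hypothetical SLA record for one data product."""
    name: str
    business_owner: str
    engineering_steward: str
    max_staleness: timedelta
    reconciliation_tolerance: float  # allowed relative difference, 0.001 = 0.1%


def is_fresh(sla: DataProductSla, last_loaded_at: datetime, now: datetime) -> bool:
    """True if the last load is within the agreed staleness window."""
    return now - last_loaded_at <= sla.max_staleness


def reconciles(sla: DataProductSla, source_total: float, warehouse_total: float) -> bool:
    """True if the warehouse total matches the source within tolerance."""
    if source_total == 0:
        return warehouse_total == 0
    relative_diff = abs(warehouse_total - source_total) / abs(source_total)
    return relative_diff <= sla.reconciliation_tolerance


# Illustrative values only.
sla = DataProductSla(
    name="fund_daily_flows",
    business_owner="product-owner@example.com",
    engineering_steward="steward@example.com",
    max_staleness=timedelta(hours=24),
    reconciliation_tolerance=0.001,
)

now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
fresh = is_fresh(sla, datetime(2025, 1, 1, 18, 0, tzinfo=timezone.utc), now)
stale = is_fresh(sla, datetime(2024, 12, 31, 18, 0, tzinfo=timezone.utc), now)
within_tolerance = reconciles(sla, 1_000_000.0, 1_000_500.0)
out_of_tolerance = reconciles(sla, 1_000_000.0, 1_002_000.0)
```

Keeping the owners on the same record as the thresholds means a failed check already names who is responsible for the fix.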
Second, governance and security measures require early engagement with compliance and security teams. Vanguard implemented logging of authorization events to meet regulatory requirements while supporting business agility. Existing data access policies map to the new AI system with row-level and column-level security where needed. This isn't optional in financial services—the regulatory environment demands it.
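As a rough illustration of row-level and column-level security combined with authorization logging, consider the Python sketch below. It works on in-memory rows and uses only the standard library; in a real deployment these controls would normally be enforced in the warehouse itself, not in application code. The `AccessPolicy` class, the `region` column, and the sample data are hypothetical.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass

# Audit logger for authorization decisions (handler configuration is left out).
logger = logging.getLogger("authz_audit")


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical policy: which columns and which regions a role may see."""
    role: str
    allowed_columns: frozenset[str]
    allowed_regions: frozenset[str]  # row-level filter on a "region" column


def apply_policy(user: str, policy: AccessPolicy, rows: list[dict],
                 requested_columns: list[str]) -> list[dict]:
    """Filter rows and columns by policy and log the authorization event."""
    granted = [c for c in requested_columns if c in policy.allowed_columns]
    denied = [c for c in requested_columns if c not in policy.allowed_columns]
    logger.info("authz user=%s role=%s granted=%s denied=%s",
                user, policy.role, granted, denied)
    # Row-level security: keep only rows from regions the role may see.
    visible = [r for r in rows if r["region"] in policy.allowed_regions]
    # Column-level security: project each row onto the granted columns.
    return [{c: r[c] for c in granted} for r in visible]


# Illustrative data only.
rows = [
    {"region": "US", "fund_id": "F1", "net_flow": 120.0, "client_tax_id": "000-1"},
    {"region": "EU", "fund_id": "F2", "net_flow": -40.0, "client_tax_id": "000-2"},
]
policy = AccessPolicy(
    role="us_analyst",
    allowed_columns=frozenset({"region", "fund_id", "net_flow"}),
    allowed_regions=frozenset({"US"}),
)
result = apply_policy("analyst_1", policy, rows,
                      ["fund_id", "net_flow", "client_tax_id"])
```

Here the request for `client_tax_id` is dropped and logged as denied, and the EU row is filtered out, so `result` contains only the US row with its two permitted columns.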
The collaborative imperative represents perhaps the hardest part of the work. Vanguard brought together data engineers, business analysts, compliance officers, security teams, and business stakeholders. Each team brought critical expertise: data engineers understood technical infrastructure, business analysts knew the semantic meaning of financial metrics, compliance teams ensured regulatory requirements were met, and business users provided real-world context for how insights would be used.
This cross-functional collaboration became the foundation for the AI work. It produced a well-defined operating model in which ownership, semantic definitions, and quality standards were understood and put into practice. Without ownership models and semantic definitions that every team could understand and contribute to, the AI solution would have lacked a sound foundation.
The Virtual Analyst project served as a catalyst for new processes and frameworks whose benefits reach far beyond the initial AI use case. According to Vanguard's corporate communications, the broader modernization program that enabled this work began in spring 2021 and reduced infrastructure total cost of ownership by almost 30% from 2022 to 2025. The team presented the work at AWS re:Invent 2024, where it drew significant interest.
How analysts interact with the system matters. They no longer wait days for data team responses. They type questions into a conversational interface and receive immediate answers. The latency difference, measured in seconds rather than days, changes how people work. It's the difference between planning a week ahead and making decisions in real time.
Of the eight principles that guide the AI-ready data approach, the AWS blog post details only the first two in depth. The remaining principles build on existing foundational data capabilities (data platforms, integration, interoperability) and extend them to support AI-ready data. All of them emerged from real-world challenges encountered when trying to make AI systems work reliably with enterprise data at scale.
The measurable business outcomes include faster decision-making, reduced dependency on data team support for basic queries, and improved satisfaction scores. Vanguard reports satisfaction scores well over 4.4 across web and mobile experiences, though they note they're not done—world-class performance sits above 4.5 across mobile app, web experience, and contact center experience.
Whether other organizations can replicate this success depends on their willingness to invest in data architecture before deploying AI models. The technology stack is available. The principles are documented. The real question is whether companies will do the unglamorous work of metadata management, semantic definition, and cross-functional collaboration before launching their next AI initiative. Most won't, and their AI projects will struggle accordingly.
Artūras Malašauskas is an AI Systems Integrator with 20+ years of production-grade web engineering experience. He has designed, shipped, and scaled enterprise Python/PHP systems for logistics, SaaS, and public-sector clients. For the past year, he has focused exclusively on AI integrations: deploying open-source LLMs, building generative media pipelines (image, audio, video), and engineering multi-agent workflows for real production environments. His standard: reproducibility, security, cost-efficient inference, no vaporware. He documents and evaluates emerging AI tooling, separating verified capabilities from marketing noise. Technical editor at: muza-ai.eu, ai-verslas.lt, ai-naujinos.lt