How AI and Data Are Being Used in Investment Management: Armando Gonzalez - CEO RavenPack & Bigdata.com
“What you built a year ago is probably obsolete today.” — Armando Gonzalez
Armando Gonzalez is a tech entrepreneur at the forefront of the AI revolution. As the visionary CEO and Co-founder of RavenPack, he’s transforming how the world's top financial institutions harness the power of data analytics. With over 20 years of pioneering experience, Armando is a globally recognized authority in AI and systematic data analysis. In 2024, Armando took innovation to new heights by founding Bigdata.com, an AI research assistant that’s redefining how business and finance professionals access critical information.
I first met Armando when he reached out after I left Kensho Technologies, post-acquisition from S&P Global. That was six years ago. We instantly connected over a shared compulsion to supercharge the man-machine process for investing. Soon after, Armando invited me to join a panel on why natural language processing was going to be the crown jewel of AI. I represented the user. At the time, I felt clever comparing the state of NLP to that of black-and-white TV powered by vacuum tubes. Clearly, the state of play has moved forward considerably. Please enjoy our conversation.
Click here for FAQs and Mind Map summaries of the key concepts from our interview with Armando.
In this interview, you’ll learn:
How investors judge AI tools by their ability to generate alpha, manage risk, or reduce costs
Why renewal cycles are the real test of whether a data or AI product delivers value
How careful source curation and whitelisting build the foundation for trustworthy outputs
Why traceability and audit trails protect both credibility and compliance in financial settings
How knowledge graphs resolve entities across securities, making data AI-ready at scale
Why investors expect exponential ROI (5–10x) from data products to justify adoption
How subscription models provide sustainable economics for both data providers and users
Why compliance has become the key gatekeeper for scaling AI in regulated industries
How AI-related skills are emerging as a baseline competency alongside Excel and coding
Why smaller, more agile firms often gain an adoption edge over slower incumbents
Some takeaways:
Value is measured in hard outcomes. AI products in finance survive only if they generate returns, cut risk, or save costs—anything less fails at renewal.
The bar is set at exponential returns. Because adoption diverts scarce talent and budget, tools need to deliver multiples of value, not marginal gains.
Data quality comes first. Trusted sources and rigorous whitelisting determine whether AI outputs can be relied on in high-stakes investment contexts.
Audit trails preserve credibility. When outputs can be traced back to underlying sources, both clients and compliance teams gain confidence in the results.
Entity resolution unlocks scale. By connecting securities, subsidiaries, and products through a knowledge graph, Bigdata.com removes ambiguity and accelerates analysis.
Subscription economics align incentives. One or two good decisions can justify the fee, while recurring renewals signal durable value creation.
Compliance dictates the pace of adoption. A single violation can outweigh years of returns, making regulatory alignment central to every deployment.
Skills are shifting. Where Excel once differentiated careers, the ability to craft effective prompts is now becoming a core workplace requirement.
Adoption speed matters. Smaller firms can test, onboard, and integrate faster than larger incumbents bogged down by pilots and red tape.
Build vs. buy is no longer binary. The trend is to buy for speed and reliability, while building only where compliance, IP protection, or security demand it.
Introduction and Background
Rob Marsh: Welcome to AI Investor, where we focus on how AI can help investors generate better returns. I'm Rob Marsh.
It's my great privilege today to introduce Armando Gonzalez, founder and CEO of RavenPack and Bigdata.com. If there were a title for today's conversation, it might be data's role in providing an edge, with the subtitle: Without data, there is no AI.
I first heard of RavenPack probably eight or nine years ago when I was developing products at Kensho. For our listeners and readers, quick background: Kensho was a fintech company sponsored by major investment banks that was ultimately sold to S&P for its AI talent. Back then, I thought of you as a competitor. Not long after I left Kensho, we connected, and you kindly asked me to join a panel on why natural language processing was going to be the crown jewel of AI. I think you were very prescient in that, and even when we met you were already well into your journey. So I'm excited to have you here for this conversation today.
Armando Gonzalez: It's a real honor, Rob, to have this conversation with you, especially since this conversation has been going on for many years between us. I've always considered you one of the real AI practitioners, which back then meant you understood how to extract value from technology.
Now it's so complex when we think about AI that the challenge is even bigger and exponential. It's difficult to have good conversations with folks in this space because of the amount of noise and exaggeration, as well as the expectations that have been put on what AI will do. Expectations are so high that it feels like we're at the top of the hype, and there is potentially a significant bubble forming, at risk of bursting.
Cutting Through the AI Hype
Rob Marsh: Absolutely. It's this weird paradox where there is so much hype. That's why there's value in focusing on process: What are people doing? What are the sources of the edge? What have they been doing throughout their careers as successful investors? And then looking at the core competencies of the technology—where they are now, where they're evolving—and focusing on the overlap.
I agree with you about the hyperbole. The challenge I have when talking with people or working with investors and companies is this expectation, almost like there's a magic eight ball that’s really easy to use. It takes some time to move people off that perception. But at the same time, I'll finish our conversation today asking you: If you had a magic wand, where would you wave it? Because the answer to that can be very practical.
It's weird—wanting to move people away from thinking it's magic, but still asking the question of where you would wave it to make things more practical and focused.
Armando Gonzalez: I would also say that what makes AI so interesting in financial services, where I've spent the last 25 years, is that there's always a way to measure it. It's either helping you generate alpha, reduce risk, or reduce headcount. More specifically for us, it's always been about signal and making our customers money.
If you can't deliver the value you may have demonstrated in trials, you get punished very quickly—within a 12-month subscription cycle. In our case, the bar is higher than in most trials because we have to show that, historically, a strategy backtested on our data would actually have made money, at least on paper. That's the decision point for buying the product.
In other cases, for consumer applications, you might play around with it, subscribe, and then realize after 12 months—or even 30 days—that it's not useful. For us, the bar has always been high. You have to add exponential value, not just a 1x return on investment, but ideally 5x or 10x, because of all the added costs associated with the firm using your product.
More importantly, there’s the opportunity cost of highly talented, expensive resources being dedicated to one data set or set of tools. This forces you, as an AI company, to ensure you’re connected to the value chain and know exactly where you fit to produce exponential value—not relying on marketing, a nice-looking brand, or a good-looking UI.
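To make that bar concrete, here is a minimal sketch, in Python, of the kind of paper backtest a trial might demand. The long/short rule, the flat transaction cost, and the series names are hypothetical illustrations, not RavenPack's actual methodology.

```python
import pandas as pd

def paper_backtest(signal: pd.Series, forward_returns: pd.Series,
                   cost_bps: float = 5.0) -> pd.Series:
    """Toy long/short paper backtest: long when the signal is positive,
    short when negative, minus a flat cost whenever the position changes.
    Purely illustrative, not RavenPack's methodology."""
    position = signal.apply(lambda s: 1 if s > 0 else (-1 if s < 0 else 0))
    trades = position.diff().abs().fillna(0)        # size of each position change
    gross = position.shift(1).fillna(0) * forward_returns
    net = gross - trades * (cost_bps / 10_000)      # subtract transaction costs
    return net.cumsum()                             # cumulative paper P&L

# Hypothetical usage: a daily sentiment signal against next-day returns
# pnl = paper_backtest(sentiment_signal, next_day_returns)
# print(f"Paper P&L over the trial window: {pnl.iloc[-1]:.2%}")
```

If the cumulative paper P&L can't clear a multiple of the product's all-in cost, including the opportunity cost of the team's time, the product doesn't survive renewal.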
Data Quality as the Foundation
Rob Marsh: Coming from both the buy side and the consumer side for so many years, I’m very sensitive to opportunity costs before you even start actually using and consuming something. On the vendor side, one of my mantras is: Value is confirmed on the renewal, not the first contract. The first contract is a function of salesmanship and a feeling you can provide.
Armando Gonzalez: Exactly.
Rob Marsh: There's also an opportunity, especially when we talk about Bigdata.com. As you said earlier, there is no AI without data—period. We're learning quickly that there's legitimate demand from AIs operated by our customers for high-quality, timely, trusted information. This allows next-gen LLMs to make decisions and plans effectively.
We've seen how poorly they perform with low-quality training data. While they're good at writing and generating concepts, they don't hold up in the real world of, say, writing a trading idea or building a thesis for investing in one country over another. There are so many nuances in our space, where real money is involved, that we can't rely on just running things through ChatGPT and being impressed by the output if the substance and data aren't accurate.
It's cliché—garbage in, garbage out—but there’s also “gold in, gold out.” I’ve been doing the Big Moves letter, and you can see what can be done using publicly available information. But when I introduce behind-the-paywall data and structured data—things where someone has done materially important preprocessing, exactly what you do—you see a step-change improvement in the outputs.
Armando Gonzalez: I was running an exercise the other day using a very famous new AI search engine. There aren’t many famous ones, but there are lots of wrappers on top of other LLMs. It was a specific task: I wanted annualized market capitalizations for a series of cryptocurrencies.
In our world, we wouldn’t need an LLM; we’d go to a terminal and grab the data that should give the right values—market cap for Bitcoin in 2020, 2021, and so on. I ran it through this search engine, and it created a nice-looking report, explained the trends, gave me a table, even generated a chart. Then I did the same exercise on Bigdata.com, which gave me a straightforward table—no fluff, just the data.
When I compared the two tables, they looked very different. The other search engine’s output looked better—chart, explanation—but some values were completely wrong. For example, it had Bitcoin’s market cap highest three or four years ago, not recently. Nine out of ten values were materially wrong.
If I had started my investment research with those wrong assumptions about trends, every assumption moving forward would be flawed. That trade would go sideways. An audit trail is necessary in our space, and mistakes like these are serious—they could cost you your job.
Auditability and Source Trust
Rob Marsh: For most of our audience, the greater risk isn’t losing money on a trade; it’s losing credibility with clients or your boss. That’s a far greater risk.
Armando Gonzalez: Exactly. At the end of the day, job security and the ability to generate income are key human assets. AI should help preserve and grow income, not put it at risk.
I think we’ll see more AI brands—or, more specifically, data brands. You start with good data sources before you even think about models. Together, they’ll become brands of AI—“This AI only uses this class of data,” or “These AIs don’t use these sources.” I wrote an article pre–Gen AI about “tribal AI,” predicting more AIs that act like tribes, with moral codes, cultures, processes, and missions.
The AI’s output is a reflection of the data that goes in. It’s not necessarily that one model is better, but the data going into the analysis drives the difference in policy, actions, and recommendations.
Rob Marsh: You mentioned auditability. At the highest level, it's about trust with the institution or model you're dealing with. But there's also the job-by-job traceability to the source. That's important not just for trusting the output, but as a superpower—AI can, in some cases, leave a trail of what went in and how it was considered, which becomes data and an asset itself. How do RavenPack and Bigdata.com support that trail and transparency?
Armando Gonzalez: It starts with a whitelist of sources you want to work with. We look at when a publisher was created, where it was created, whether it has a paywall, how many subscribers it has, whether it follows AP style, potential biases, and other factors. We rank sources—least biased, most read, best covered. These go into the whitelist.
We have to choose from billions of potential sources, and many automated news outlets pop up and disappear. There are also highly ranked sources that are pure clickbait. So we have a research process to determine which sources we track, rank, and feed with metadata. That way, the large language models using those sources can produce reports, summaries, or analysis with context about the sources and their origins.
We also build relationships with data providers whose content isn’t necessarily public—content behind a subscription or paywall. If customers are willing to pay for it, that shows trust. People don’t spend $50 or $100 a month on a service they don’t believe in. This gives us confidence that these sources will be trusted by our customers. As part of that relationship, we secure rights to use the content in AI models and ensure compliance for banks or hedge funds to be comfortable using it. Onboarding unique, high-quality, trusted sources before thinking about models is a massive and ongoing endeavor for us. This curation and AI-readiness is our business.
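As a minimal sketch of what such a whitelist might look like in code, here is a toy scoring function built around the criteria Armando describes. The field names, weights, and threshold are hypothetical, not RavenPack's actual ranking methodology.

```python
from dataclasses import dataclass

@dataclass
class Source:
    name: str
    founded_year: int
    has_paywall: bool
    subscribers: int
    follows_ap_style: bool
    bias_score: float            # 0.0 = least biased, 1.0 = most biased

def whitelist_score(src: Source) -> float:
    """Toy ranking of a publisher against whitelist-style criteria.
    Weights and thresholds are illustrative only."""
    score = 0.0
    score += 2.0 if src.founded_year <= 2015 else 0.5   # longevity
    score += 1.0 if src.has_paywall else 0.0            # paying readers signal trust
    score += min(src.subscribers / 100_000, 3.0)        # reach, capped
    score += 1.0 if src.follows_ap_style else 0.0       # editorial standards
    score -= 2.0 * src.bias_score                        # penalize bias
    return score

sources = [
    Source("Example Wire", 2005, True, 250_000, True, 0.2),
    Source("AutoGen News", 2023, False, 1_200, False, 0.7),
]
whitelist = [s.name for s in sources if whitelist_score(s) >= 3.0]
print(whitelist)  # only the established, low-bias publisher makes the cut
```

In practice this kind of metadata would travel with every document, so the models downstream always know where a source came from and how it was ranked.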
Client Control and Customization
Rob Marsh: As a user, I appreciate being able to rank providers differently and impose my own opinions, filtering through. It’s not just taking your word for it—I can incorporate my own IP. That’s mission critical.
Armando Gonzalez: Absolutely. Some customers have their own stack, models, and fine-tuning processes for certain tasks. They might have agents that call specific content types. For example, one customer prefers a certain earnings call provider over another. We have to support both, because clients may have different vendor relationships and compliance approvals.
The value of Bigdata.com is one API call where only the provider name changes, and you still get standardized output designed for agents. This centralization works even if providers have already been vetted by compliance and are difficult to replace. We support “bring your own license”—if you already have rights to a provider, we can deliver that same content through our search, retrieval, and extraction technology, connected to our knowledge graph.
Our knowledge graph took over 20 years to build. We annotate and connect all publicly named securities—stocks, bonds, any tradable asset—with unique names and point-in-time identifiers, regardless of source. This entity resolution means your AI can ask about “Meta” and we resolve it to “Meta Platforms,” which owns WhatsApp, Facebook, subsidiaries, and products. Any data source covering any related element will be delivered in a consistently identified format, so AIs don’t need to disambiguate—they just request data, get relevant content, and use it in their stack, whatever model, delivery, or UI/UX they’ve built.
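To illustrate the idea rather than Bigdata.com's actual API, here is a minimal sketch of entity resolution plus a provider-agnostic retrieval call in which only the provider name changes. The alias map, the identifier format, and the function names are invented for this example.

```python
# Hypothetical sketch of entity resolution + a provider-agnostic call.
# The alias map, the "META:US" identifier, and fetch_documents() are
# invented for illustration; this is not Bigdata.com's actual API.
ENTITY_ALIASES = {
    "meta": "META:US",            # Meta Platforms, Inc.
    "meta platforms": "META:US",
    "facebook": "META:US",        # subsidiaries and products resolve to the parent
    "whatsapp": "META:US",
}

def resolve_entity(query: str) -> str:
    """Map a free-text name to a single canonical identifier."""
    return ENTITY_ALIASES[query.strip().lower()]

def fetch_documents(entity_id: str, provider: str) -> list[dict]:
    """Stand-in for one standardized retrieval call; swapping vendors
    means changing only the provider argument."""
    return [{"entity": entity_id, "provider": provider, "text": "..."}]

entity = resolve_entity("WhatsApp")                       # -> "META:US"
docs_a = fetch_documents(entity, provider="earnings_calls_vendor_a")
docs_b = fetch_documents(entity, provider="earnings_calls_vendor_b")
print(docs_a[0]["provider"], docs_b[0]["provider"])
```

The point is that the agent never has to disambiguate "Meta" versus "Meta Platforms" versus "WhatsApp"; resolution happens upstream, and every provider returns the same shape of output.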
Monetization and Subscription Models
Rob Marsh: Pay once, use it [data and content] where I want to—that’s the direction it’s going.
Armando Gonzalez: Exactly. Rights themselves are critical. We’re educating the market on the value of data and safe monetization in the AI world. Models like paying per click or per article often fail—the economics are too limited, and AIs will optimize to minimize transactions. Subscription models are proven. One good trade can justify a subscription; two good trades can make it exponential.
This makes the case for a strong subscription model that builds provider confidence: there’s renewal potential as long as content stays relevant, well-covered, and produced with integrity, plus rights to use it in models. Most firms aren’t trying to fine-tune LLMs; they want retrieval augmented generation—feeding in the right text or reports when needed without training the model. Educating providers on this usage helps them join the ecosystem and grow the TAM without being exploited.
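For readers less familiar with the term, a minimal sketch of retrieval-augmented generation under these licensing constraints might look like the following. The retrieve() and call_llm() functions are placeholders rather than real Bigdata.com or vendor endpoints; the point is that licensed text is fed into the prompt at query time and the model is never trained on it.

```python
# Minimal RAG sketch: retrieve licensed documents at query time and pass
# them to the model in the prompt. retrieve() and call_llm() are
# placeholders, not real Bigdata.com or vendor APIs.
def retrieve(question: str, k: int = 5) -> list[str]:
    """Placeholder for a licensed-content search/retrieval index."""
    return ["[Doc 1] Example licensed passage...", "[Doc 2] Another passage..."][:k]

def call_llm(prompt: str) -> str:
    """Placeholder for whatever chat model the firm has approved."""
    return f"(answer grounded in {prompt.count('[Doc')} retrieved sources)"

def answer_with_rag(question: str) -> str:
    # The licensed content only appears in the prompt; the model's weights
    # are never fine-tuned on it.
    context = "\n\n".join(retrieve(question))
    prompt = ("Answer using ONLY the sources below and cite them.\n\n"
              f"Sources:\n{context}\n\nQuestion: {question}")
    return call_llm(prompt)

print(answer_with_rag("What did the latest earnings call say about margins?"))
```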
Compliance as a Core Stakeholder
Rob Marsh: As an investor, I need my counterparty to be successful too. If you’re valuable, I need you to be around to renew—otherwise, the game ends. One trade pays for a subscription; one violation can be fatal. Compliance in financial services can outweigh even the PM or dev team.
Armando Gonzalez: Yes. Compliance adoption is slower; it's seen as back-office or admin. Risk-focused teams value it, but PMs with P&L see value faster. We've historically delivered to quantitative PMs as one factor among many in multifactor strategies, maintaining a 110% net retention rate by continuously adding valuable data.
Now the sell side is entering with Gen AI. Researchers, analysts, and sales traders need quick insights on complex, personalized themes or client portfolios. They need to ask quickly and get answers they can deliver to clients or use to make calls. These research copilots are emerging—every bank has its “own AI,” which is really a copilot. They can generate analysis, PowerPoints, summaries, and reports—but they still need good data delivered in a way that avoids hallucinations that would alarm compliance.
AI strategy now touches engineering teams, CTOs, and innovation officers. We work with dev teams tasked with delivering promised AI capabilities. It’s an investment in the future, and I think the sell side will be on the right side of this trade.
From Excel Skills to Prompting Skills
Rob Marsh: In a world with so much gray, it’s clear that’s the direction we’re going. I think back to my early days—in the mid-80s, I got a job because I could use a computer, after a FORTRAN class my freshman year. I was helping build tax partnership models on a bootleg Lotus 1-2-3, just two years after doing accounting on a calculator with pen and paper. Fast forward to now—I joke none of us would have the lives we’ve had without Excel.
Armando Gonzalez: Exactly. You could get a job because you knew Excel. Now we ask: What models do you use? Can you show your prompting skills? We’re sometimes not even asking if you can code—just if you can prompt AI to write code. Using these tools well will be key to getting jobs, more than competing with them.
Addressing IP Leakage
Rob Marsh: One concern in financial services is IP leakage. How do you address that?
Armando Gonzalez: There are degrees of concern. Protecting a firm’s own data starts with highly sensitive emails or IMs—no firm I know is ready to let that leave their systems, especially in banking. That’s the crown jewels—valuable but heavily untapped because of compliance.
Comfort comes with less sensitive categories: research reports, customer-facing content, structured CRM data (anonymized), and prompts (sanitized to avoid PII). You can run a query on an alternative data company without revealing the client wants to invest $20 million in it. Stacks are getting smarter about deciding which tasks stay within internal infrastructure versus which can be sent externally.
Our business has reached a point where firms confidently send us questions about financially relevant entities, knowing Bigdata has trusted sources. We return ranked, customizable results that feed into internal processes—decisions and analysis remain with the client, and we have no visibility into that.
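As a toy illustration of the prompt sanitization Armando mentions, and not RavenPack's or any client's actual policy, a query might be scrubbed of names, amounts, and contact details before it ever leaves the firewall:

```python
import re

# Toy prompt sanitizer: redact obvious client identifiers before a query
# leaves internal infrastructure. The patterns are illustrative only and
# nowhere near an exhaustive PII policy.
REDACTIONS = [
    (re.compile(r"\$\s?\d[\d,\.]*\s?(million|billion|mm|bn)\b", re.I), "[AMOUNT]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b[Cc]lient\s+[A-Z][a-z]+\b"), "client [NAME]"),
]

def sanitize(prompt: str) -> str:
    for pattern, replacement in REDACTIONS:
        prompt = pattern.sub(replacement, prompt)
    return prompt

raw = ("Client Smith wants to invest $20 million in the alt-data vendor; "
       "follow up at j.smith@fund.com")
print(sanitize(raw))
# -> client [NAME] wants to invest [AMOUNT] in the alt-data vendor; follow up at [EMAIL]
```

The sanitized query can still come back with a useful answer, while the decision and the position size stay inside the client's own systems.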
Deploying Behind the Firewall
Rob Marsh: Is it on the roadmap for clients to spin up an instance of Bigdata.com locally, behind their firewall, leveraging the data and knowledge graph alongside their own unstructured gold mine?
Armando Gonzalez: Absolutely. We already have projects deploying and maintaining managed services within client VPCs. These instances process internal content alongside Bigdata.com’s content, allowing prompts, watchlists, and reports to be handled internally. This minimizes leak risk and meets technical/legal requirements. Some customers would outsource to us completely, but constraints keep it in-house.
Rob Marsh: You’ve got open-source data, paywall data, and behind-the-firewall data. Integrating all three is where the real gold comes from.
Armando Gonzalez: Many customers have unique relationships with information providers—sell side, research partners, investors—that others don’t. Analyzing that unique content with AI produces a different footprint than a new VC firm with six months of history.
Rob Marsh: And it’s not just the data—it’s the audit trail from raw data to decision. Having that inside the firewall changes the output dramatically.
Armando Gonzalez: Exactly. The process is the product. When investors put money in a fund, they’re investing in a process.
Rob Marsh: In banking, I didn’t need trade recommendations. You could tell me tomorrow’s Wall Street Journal headlines, but if it didn’t fit our process, it didn’t matter. That’s what I was paid for. Build versus buy isn’t binary for sophisticated clients—it’s about where in the stack you focus, especially with tech changing so fast.
Build vs. Buy in a Rapidly Changing Landscape
Armando Gonzalez: Post–ChatGPT, the trend was build—firms thought they needed to build the whole stack. Now, with costs and obsolescence moving so fast, the trend is buy. Build only for compliance, IP protection, and security; buy everything else, because you’ll fall behind if you try to build it all.
We specialized in NLP 20 years ago, proving alpha from text, and stuck with it. Clients told me to start a fund if it was so good, but I didn’t know how to run one. My clients are experts in that; they don’t want to be experts in NLP or big data infrastructure. If I keep delivering value in my specialty, I have a sustainable business.
Rob Marsh: That edge compounds over time. You have to start learning early to widen the advantage.
Armando Gonzalez: Indeed.
Lessons from Successful Adopters
Rob Marsh: What key lessons have you learned from clients successfully incorporating Gen AI into their investment process?
Armando Gonzalez: Speed of innovation is faster than you think, driven by developments outside financial services. Smaller, nimble organizations have an edge because they can adopt quickly without red tape, while larger firms risk losing share to them.
Rob Marsh: I see a barbell—large funds with capital and behind-the-firewall data, and nimble smaller firms. The speed of underlying tech and the innovation it enables changes how fast you can develop ideas, strategies, and infrastructure.
Armando Gonzalez: Exactly. Time to market is key—onboarding new data can take 6–12 months now, but if we can cut that to 6–12 days, you realize value (or lack of it) much faster. Many AI projects have been stuck in POCs for 2–3 years with no value yet.
Rob Marsh: The opportunity cost of not exploring other opportunities quickly can be even greater than the cost of onboarding.
Armando Gonzalez: Agreed.
If You Had a Magic Wand
Rob Marsh: Wrapping up—if you had a magic wand to wave at some part of the process, where would you wave it?
Armando Gonzalez: Compliance. We need to move faster while meeting requirements. That would foster faster adoption and innovation in capital markets, financial services, healthcare, and other regulated industries.
Rob Marsh: Great place to stop—I could keep you all day.
Armando Gonzalez: Thank you, Rob, for inviting me. It was fun.
Disclaimer: The information contained in this newsletter is intended for educational purposes only and should not be construed as financial advice. Please consult with a qualified financial advisor before making any investment decisions. Additionally, please note that we at AInvestor may or may not have a position in any of the companies mentioned herein. This is not a recommendation to buy or sell any security. The information contained herein is presented in good faith on a best efforts basis.