An Asset Allocator's AI Use Cases, Implementation Strategy, and Wishlist, with Mark Steed
Community Wisdom - Sharing the best that we find
“I want my team removed from high-volume, low-value tasks … so we can talk about what the information means.” — Mark Steed, CIO, Arizona Public Safety Personnel Retirement System
Introduction
After three direct, one-on-one conversations with me in the interviewer's seat, this note marks our first step into the Community Wisdom part of the program. This is where we curate the best and most practical material we find on how top investors and builders are incorporating AI into their processes and investment decisions.
I’m happy to share a recent interview with Mark Steed, CIO of the Arizona Public Safety Personnel Retirement System, that I came across on The Institutional Edge podcast, hosted by Angelo Calvello. What struck me almost immediately was Mark’s practical, level-headed approach; the absence of hyperbole is refreshing. Hopefully, we’ve done justice to the conversation between Angelo and Mark. What follows are the three styles of summaries we use, each offering a different angle on the material. Any mistakes herein are ours.
Lastly, the value we provide will hopefully come not only from the content and how it is presented, but also from bringing new, quality sources to your attention. In that spirit, I encourage you to check out The Institutional Edge, hosted by Angelo, and AI Street, published by former Bloomberg reporter Matt Robinson. It was in Matt’s weekly that I found this conversation, and I find his newsletter a great top-of-funnel resource.
One of our motivations for starting AInvestor was to create a reason to actively engage with AI in an operational setting—learning by doing. I maintain active editorial oversight of instruction, model, and platform choices, but almost everything in the summaries below was written by AI. In the context of what we’re doing, I see this as a feature, not a bug. By experiencing the highs, and yes, the lows, we can better understand both the possibilities and the limitations of this new generation of AI.
Click above to access a custom GPT we spun up to help clean up our own writing. It’s designed to apply the lightest of touches. If you’re a paying ChatGPT customer, it’s free to use (we don’t get a cut). Feel free to play around with it as much as you’d like.
Learnings & Takeaways
In this interview, you’ll learn:
Why PSPRS focuses its AI program on two pillars: operational efficiency and decision-making.
How local, offline LLMs (e.g., Llama, Gemma) plus simple RPA handle secure document intake and field extraction.
Which unstructured-to-structured fields matter in PE diligence (partners, fund size/vintage, compliance officer, carry terms, portfolio metrics).
How ML improves predictions on small, non-normal, dependent datasets compared with classic linear models.
Where deep learning fits—pattern discovery in unstructured diligence—alongside human review.
What PSPRS’s governance looks like: decision logging, confidence calibration, human-in-the-loop, and parallel model vs. human runs.
How screening evolves toward a manager web portal once factor importance stabilizes with enough observations.
Why PSPRS runs on-prem first (security, cost) and what compute is required for large local models.
How team design (investing + data science) reduces bias and accelerates build-out.
Why PSPRS is skeptical of sentiment scraping as a trading edge amid flows, constraints, and misinformation.
Some takeaways:
Automate high-volume, low-value work to free humans for judgment. PSPRS targets doc retrieval, PDF parsing, field extraction, and first-draft memo generation so PMs spend time debating meaning, not chasing files.
Local LLMs solve near-term security and cost constraints. By running Llama/Gemma entirely offline and pointing models at staged repositories, PSPRS avoids sharing sensitive GP materials while proving value with quick R&D loops.
Structured outputs from PDFs are the backbone of the stack. Consistent extraction of partners, vintages, carry terms (with synonyms), compliance officers, and operating metrics feeds a queryable database that supports screening and attribution.
Use ML where statistical assumptions break. Small samples, non-normality, and dependence make linear regressions brittle; PSPRS applies ML to rank features and predict outcomes on structured data.
Treat deep learning as a black box—govern it accordingly. For unstructured packets, DL surfaces patterns, but PSPRS pairs it with calibration, clear decision scope, and parallel runs before granting decision authority.
Governance is a growth enabler, not a brake. Decision logging with explicit success criteria and confidence bands builds trust internally and with the board while de-biasing post-mortems.
Screening will shift to a portal with fewer, higher-signal fields. After enough observations, PSPRS will publish the handful of inputs that matter most, letting managers self-submit and enabling faster triage.
Right-size compute to the documents you actually read. A ~70B-parameter local model with ~96 GB RAM supports long filings and large context windows—enough for 10-Qs and data rooms without cloud exposure.
Blend backgrounds to reduce institutional bias. Pairing investors who learned data science with scientists learning investing keeps feature selection honest and problem framing grounded.
Be critical of sentiment feeds as alpha. Regulatory flows, rebalancing mechanics, and misinformation weaken any causal path from social sentiment to realized trades; PSPRS prioritizes verifiable, higher-signal data.
FAQs
Q1: What are PSPRS’s two primary AI use cases?
Operational efficiency and decision quality. Efficiency covers automated document intake (data rooms, PDFs), structured-field extraction, and first-draft memo generation. Decision quality covers ML on structured data and deep-learning pattern discovery in unstructured diligence materials, with calibration against stated confidence.
Q2: What specific efficiency wins are in scope right now?
Robotic access to proprietary portals (including two-factor workflows)
Bulk download of Private Placement Memorandums (PPMs), Due Diligence Questionnaires (DDQs), spreadsheets, filings
PDF/OCR parsing to extract fields (partners, portfolio metrics, fund size/vintage, compliance officer, carry terms)
Normalization into a queryable database
Auto-draft investment memos for human review
Q3: How is PSPRS handling data security?
Local-first. Large language models (LLMs) run on PSPRS machines, pointed at on-prem documents. No internet connection is required for extraction, fine-tuning, or memo drafts. Enterprise cloud will be evaluated after proof-of-concept success.
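To make the local-first extraction idea concrete, here is a minimal sketch. It is our illustration, not PSPRS’s code: the interview names Llama and Gemma but no serving stack, so the local Ollama endpoint, model name, and field list below are assumptions.

```python
import json
import requests

# Illustrative only: assumes a local Ollama server (http://localhost:11434),
# so no document text ever leaves the machine. Field names are examples drawn
# from the episode, not PSPRS's actual schema.
FIELDS = ["fund_size", "vintage_year", "compliance_officer", "carry_terms"]

def extract_fields(document_text: str, model: str = "llama3") -> dict:
    prompt = (
        "Extract the following fields from the fund document below. "
        f"Return strict JSON with exactly these keys: {', '.join(FIELDS)}. "
        "Use null for anything not present.\n\nDOCUMENT:\n" + document_text
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",  # local endpoint, fully offline
        json={"model": model, "prompt": prompt, "stream": False, "format": "json"},
        timeout=300,
    )
    resp.raise_for_status()
    return json.loads(resp.json()["response"])  # the model's JSON answer as a dict

# record = extract_fields(ppm_text)  # one structured row per document, ready for the database
```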
Q4: Which models and tools are being used?
Open-source LLMs such as Llama and Gemma for local use; Robotic Process Automation (RPA) scripts for retrieval; a house database for normalized fields; attribution tools (e.g., LIME/SHAP-style methods, activation maps) where interpretability helps.
Q5: What compute is required for the local LLM approach?
A high-memory workstation. For example, a ~70B-parameter model needs roughly 96 GB of RAM to comfortably process long filings within a large context window.
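For a rough sense of why that figure is plausible, here is a back-of-envelope sizing. The quantization level and overhead allowances are our assumptions, not numbers from the interview.

```python
# Back-of-envelope memory sizing for a ~70B-parameter local model (our assumptions).
params = 70e9
bytes_per_param = 1.0                            # ~8-bit quantized weights
weights_gb = params * bytes_per_param / 1e9      # ~70 GB just for the weights
kv_cache_gb = 10                                 # rough allowance for a long context window
os_and_tools_gb = 10                             # OS, parsers, database, working buffers
print(weights_gb + kv_cache_gb + os_and_tools_gb)  # ~90 GB, in line with a ~96 GB workstation
```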
Q6: What does the manager-selection workflow look like today?
Front-loaded quantitative requests to GPs, followed by model-assisted analysis of returned spreadsheets and data-room documents. Screening filters will tighten after enough observations establish which factors truly matter.
Q7: How will the external-facing data collection evolve?
A manager web portal is planned. Once factor importance stabilizes, PSPRS will publish a concise set of required fields so managers can self-submit the most predictive information up front.
Q8: How is explainability addressed?
Every material recommendation—human or model-assisted—includes success criteria and a confidence level. PSPRS tracks calibration (e.g., “80% confident” predictions should land near 80% realized accuracy). Interpretable models and attribution tools are used where practical.
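Here is a minimal sketch of what that calibration check could look like in practice. The interview describes the idea, not an implementation, so the record fields and bins below are hypothetical.

```python
from collections import defaultdict

# Each logged recommendation records a stated confidence and, eventually, whether
# the defined success criterion was met. Field names here are hypothetical.
decision_log = [
    {"who": "model", "confidence": 0.8, "success": True},
    {"who": "pm",    "confidence": 0.8, "success": False},
    {"who": "model", "confidence": 0.7, "success": True},
    # ... appended over time from the decision log
]

def realized_accuracy_by_bin(log):
    bins = defaultdict(list)
    for d in log:
        bins[round(d["confidence"], 1)].append(d["success"])
    # Well-calibrated: "80% confident" decisions succeed about 80% of the time.
    return {b: sum(v) / len(v) for b, v in sorted(bins.items())}

print(realized_accuracy_by_bin(decision_log))  # e.g. {0.7: 1.0, 0.8: 0.5}
```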
Q9: What is the governance model?
Decision logging with explicit definitions of success
Calibration tracking across confidence bands
Human-in-the-loop for scope and overrides
Board education on a recurring cadence (semiannual planned)
Parallel runs for deep-learning decisions until performance is proven
Q10: How does PSPRS handle small samples and messy data?
By design. ML methods handle nonlinearity, non-normality, and dependence better than classic linear models. For alternatives, the pipeline converts unstructured PDFs into consistent rows, then builds evidence over time via calibration and parallel testing.
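A toy sketch of the structured-data side follows, using scikit-learn, which the interview does not name. The features, labels, and model choice are ours, purely to illustrate fitting a nonlinear model on a small sample and then asking which fields carry signal (the LIME/SHAP-style attribution idea from Q4).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import cross_val_score

# Toy data standing in for structured diligence fields (small N, as in alternatives).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))                       # e.g. fund size, vintage, team tenure, ...
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=60) > 0).astype(int)

model = GradientBoostingClassifier(random_state=0)
print(cross_val_score(model, X, y, cv=5).mean())   # out-of-sample sanity check

model.fit(X, y)
imp = permutation_importance(model, X, y, n_repeats=20, random_state=0)
print(imp.importances_mean)                        # which fields actually move predictions
```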
Q11: What is PSPRS’s stance on market sentiment analysis?
Skeptical. Flows, regulatory constraints, and misinformation weaken the link between surface sentiment and actual trades. Priority is on verifiable, higher-signal data tied to manager quality and portfolio fundamentals.
Q12: What KPIs matter for the efficiency program?
Cycle-time reduction (from data-room access to memo), throughput (docs/fields processed per day), staff hours saved, and first-draft memo quality (edit distance vs final).
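For the memo-quality KPI, one simple, dependency-free way to measure edit distance against the final version is sketched below; it is our stand-in metric, not necessarily PSPRS’s.

```python
from difflib import SequenceMatcher

def draft_similarity(model_draft: str, final_memo: str) -> float:
    # 1.0 means the PM shipped the draft untouched; lower means heavier editing.
    return SequenceMatcher(None, model_draft.split(), final_memo.split()).ratio()

# print(draft_similarity(open("draft_memo.txt").read(), open("final_memo.txt").read()))
```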
Q13: What’s on the medium-term roadmap?
A multi-agent system: agents handle ingestion, cleaning and normalization, analysis and attribution, memo drafting, and compliance checks before handing off to humans. The goal is to remove low-value, high-volume work from the team and focus human time on judgment.
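The interview describes the flow, not an implementation, but the staged hand-offs might look something like the skeleton below; the function names are ours and purely illustrative.

```python
# Skeleton of the staged flow: each agent hands structured output to the next,
# and humans review the assembled draft at the end. Bodies are intentionally stubbed.
def ingest(data_room):           ...  # RPA retrieval: PPMs, DDQs, spreadsheets, filings
def normalize(raw_docs):         ...  # PDF/OCR parsing into canonical, versioned fields
def analyze(records):            ...  # ML scoring, attribution, confidence estimates
def draft_memo(records, scores): ...  # templated first draft with sources and tables
def compliance_check(memo):      ...  # policy and disclosure screens

def run_pipeline(data_room):
    records = normalize(ingest(data_room))
    scores = analyze(records)
    memo = draft_memo(records, scores)
    return compliance_check(memo)     # humans take it from here: edit, judge, decide
```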
Q14: How are bias and overfitting mitigated?
Team design blends investment and data-science backgrounds to challenge assumptions. Feature sets are constrained to auditable, repeatable fields. Models are judged by out-of-sample calibration and tracked against explicit confidence bins.
Q15: What must an asset owner have in place to do this well?
Talent, data, compute, and governance:
Embedded data scientists working hand-in-hand with PMs
A pipeline that turns PDFs into structured, versioned records
Sufficient local compute for private LLM workflows
A written governance program with calibration, HIL, and board education
Q16: How do humans and models share responsibility for decisions?
Models prepare and score. Humans decide. Deep-learning recommendations run in parallel with investment-team decisions until the evidence base is large and stable. Overrides are documented like any other decision.
Q17: What does “good enough to ship” look like for productionization?
Stable extraction accuracy on key fields, measured against human labels (see the sketch after this list)
Demonstrated calibration at chosen confidence thresholds
Documented decision scope and escalation paths
Board-level fluency and sign-off on policy
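On the first criterion, a minimal way to score extraction against human labels is sketched below; the helper and field names are hypothetical.

```python
# Compare model-extracted fields with a small set of human-labeled documents.
def field_accuracy(extracted: list[dict], labeled: list[dict], fields: list[str]) -> dict:
    scores = {}
    for f in fields:
        matches = sum(e.get(f) == h.get(f) for e, h in zip(extracted, labeled))
        scores[f] = matches / len(labeled)   # exact-match accuracy per field
    return scores

# field_accuracy(model_rows, human_rows, ["fund_size", "vintage_year", "carry_terms"])
```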
Q18: What is the ultimate target state for the memo process?
End-to-end automation for first drafts: models fetch, extract, analyze, and assemble a complete memo with sources, tables, and attribution. PMs then edit for judgment, risk, and fit—without spending time on retrieval, reformatting, or basic calculations.
Mind Map
Please share if you know of a tool capable of creating clean, professional-looking mind map graphics. We lightly played around with Mermaid and that didn’t cut it.
I. Central Theme
A. One‑line Summary
AI at PSPRS centers on two pillars: Operational Efficiency and Decision Quality, under a clear governance program.
B. Why It Matters
Free investment staff from high‑volume, low‑value work.
Improve screening and underwriting rigor with calibrated models.
Build trust with stakeholders via documentation and education.
II. Operational Efficiency (RPA → Extraction → Drafts)
A. Document Ingress
Navigate proprietary portals and 2FA; bulk‑download data‑room materials (PPMs, DDQs, spreadsheets, filings).
B. Unstructured → Structured
Parse PDFs/OCR to extract canonical fields: partners_count; portfolio‑company metrics; fund_size; vintage_year; track‑record stats; compliance_officer; carry/“performance bonus” synonyms.
C. Normalization & Storage
Write extracted fields to a versioned, queryable database for analysis and reporting.
D. First‑Draft Investment Memos
LLM‑generated drafts assembled from extracted tables and standard sections; PMs review and edit.
E. Verification Loop
Human spot checks; precision/recall tracking on key fields; error triage back into prompts/parsers.
III. Decision‑Making (Analytics)
A. Machine Learning on Structured Data
Feature discovery and prediction when linear assumptions fail (small N, non‑normality, dependence).
B. Deep Learning on Unstructured Diligence
Pattern surfacing across documents/spreadsheets; acknowledge black‑box aspects.
C. Interpretability Aids
Use feature attribution (LIME/SHAP‑style) and activation‑pattern inspection where helpful.
D. Calibration & Parallel Runs
Track realized accuracy vs. stated confidence bins (e.g., 70%, 80%).
Keep DL recommendations in parallel with human decisions until evidence is strong.
IV. Governance & Oversight
A. Decision Logging
Every material recommendation records success definition and confidence; benchmark PMs and models over time.
B. Human‑in‑the‑Loop (HIL)
Define decision scope and override paths; models assist, humans decide.
C. Board Education
Build fluency with recurring briefings (semiannual planned); align vocabulary and risk controls.
D. Rollout Discipline
Start with verifiable tasks (labeling, math checks), then expand to predictive uses as calibration evidence builds.
V. Data & Information Architecture
A. Sources & Constraints
GP data rooms (PPMs, DDQs), filings (10‑Q), spreadsheets—heavy PDF bias; limited sample sizes in alternatives.
B. Field Dictionary & Schema
Standardize key entities (team, economics, track record, compliance, operating metrics) for reliable extraction and analytics.
C. Manager Web Portal (Planned)
After sufficient observations, publish the handful of factors that matter; enable self‑submission to speed screening.
VI. Tooling & Stack
A. Local LLMs
Run Llama/Gemma offline on PSPRS machines for security; point models at staged repositories.
B. Automation & Parsers
RPA for retrieval/staging; PDF/OCR parsers; templated memo assembler; internal database with provenance.
C. Multi‑Agent Vision
Ingest → Clean/Normalize → Analyze/Attribute → Draft Memo → Compliance → Humans.
VII. Compute & Deployment
A. Hardware Profile
~70B‑parameter local model; ~96 GB RAM for long‑context processing (suitable for long filings and multi‑doc synthesis).
B. Security Posture
Local‑first R&D; evaluate enterprise cloud solutions after proof of concept.
VIII. Screening & Workflow
A. Current Practice
Front‑loaded quantitative request to GPs; model‑assisted analysis of returns plus data‑room docs.
B. Evolving Filters
Identify the 5–6 most predictive fields once observations suffice; move to tighter pre‑filters.
C. Philosophy
80/20 filter: accept some false negatives; focus on the quality of the deals that get done.
IX. Stances & Opinions
A. Sentiment Analysis
Skeptical as an alpha source (causality may run trade → sentiment; misinformation and regulatory/flow confounds).
B. Explainability
Preferred when feasible; black‑box acceptable under calibration, clear scope, and HIL.
X. Talent & Organization
A. Embedded Data Science
Two data scientists (investor→DS; DS→investing) to counter institutional bias and align with PM workflows.
B. Stakeholders & Enablement
Board and executives with varied AI literacy—bring them along with education and transparent metrics.
C. External Expertise
Former PM (computer‑science focus) consulting on LLM use cases.
XI. Roadmap & Status
A. Now
Authorization to run local LLMs; R&D on extraction and memo drafting; verification‑first approach.
B. Next
Semiannual board education; portal design; progressive expansion of model scope; continue parallel runs.
C. Later
Evaluate enterprise cloud; scale multi‑agent workflow.
D. End‑State Goal
Remove low‑value, high‑volume tasks; humans focus on interpretation and judgment.
XII. Metrics & KPIs
A. Efficiency
Cycle time (access → first draft); docs/fields processed per day; staff hours saved.
B. Quality
Extraction precision/recall on key fields; memo edit distance (model draft → final).
C. Model Performance
Calibration accuracy by confidence bin; stability over time.
XIII. Illustrative Anecdote (Color)
A. Worst Pitch
Formula 1 racetrack in Monterrey; glossy pitch books requested back—illustrates disciplined triage and scarcity realities.
Disclaimer: The information contained in this newsletter is intended for educational purposes only and should not be construed as financial advice. Please consult with a qualified financial advisor before making any investment decisions. Additionally, please note that we at AInvestor may or may not have a position in any of the companies mentioned herein. This is not a recommendation to buy or sell any security. The information contained herein is presented in good faith on a best efforts basis.