There are two kinds of job descriptions for AI Product Manager roles right now.
The first kind reads like someone copy-pasted a traditional PM job description and added “experience with LLMs preferred” at the bottom. The second kind asks for a PhD in machine learning, five years of experience with a technology that’s eighteen months old, and “strong intuition for model behavior.”
Neither of them tells you what an AI PM actually does. That’s what this post is.
What the standard answer gets wrong
The standard answer is: “An AI PM is a PM who works on products powered by AI.” That’s technically true and completely useless.
The version you’ll hear from recruiters: “You need to understand machine learning and translate between engineers and stakeholders.” Also not wrong. Also not the point.
Here’s what they’re all missing: the nature of what you’re actually specifying has changed.
The core difference — specifying behavior vs. specifying logic
Traditional product management is fundamentally about specifying logic.
You write requirements that describe deterministic behavior: if the user does X, the system does Y. The engineering team implements that logic. You QA it. It either works or it doesn’t. The success criteria are binary.
AI product management is about specifying behavior in systems that are non-deterministic by design.
You are not writing logic. You are writing constraints, examples, and evaluation criteria for a system whose outputs you cannot fully predict in advance. You define what “good” looks like. You design the measurement apparatus to detect it. And you iterate on the system until it hits that bar — knowing it will never hit it 100% of the time.
This is a fundamentally different cognitive task. And it requires skills that traditional PM work doesn’t develop.
The three skills that actually matter
1 — Eval design
An eval is how you measure whether your AI system is doing what you want it to do. This sounds obvious until you try to do it. Defining what “good” means for a generative system — in a way that’s measurable, automatable, and actually correlated with user value — is hard.
Most teams skip it or do it wrong. The AI PMs who get this right ship better products faster because they can make decisions based on evidence instead of vibes.
You don’t need to write the evals yourself. You need to know enough to define the criteria, catch bad eval design when you see it, and understand what the numbers are actually telling you.
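To make "eval" concrete, here is a minimal sketch of what one looks like under the hood. Everything in it is illustrative: `generate_answer` stands in for whatever calls your model, and the keyword grader is a deliberately crude stand-in for an LLM-as-judge or human review. The point is the shape, a set of cases, a grading rule, a number at the end.

```python
def generate_answer(question: str) -> str:
    # Hypothetical model call; replace with your actual API client.
    canned = {
        "What is our refund window?": "Refunds are accepted within 30 days of purchase.",
    }
    return canned.get(question, "I'm not sure.")

def grade(answer: str, must_contain: list[str]) -> bool:
    # Passes only if every required fact appears in the answer.
    return all(term.lower() in answer.lower() for term in must_contain)

eval_set = [
    {"question": "What is our refund window?", "must_contain": ["30 days"]},
    {"question": "Do you ship internationally?", "must_contain": ["yes", "countries"]},
]

results = [grade(generate_answer(c["question"]), c["must_contain"]) for c in eval_set]
pass_rate = sum(results) / len(results)
print(f"pass rate: {pass_rate:.0%}")  # prints "pass rate: 50%"
```

Even this toy version surfaces the real design questions: where do the cases come from, does the grading rule actually correlate with user value, and what does a 50% pass rate mean for the ship decision?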
2 — Context window thinking
Everything that happens in an LLM interaction is shaped by what’s in the context window: the system prompt, the user input, the retrieved documents, the conversation history, the tool results. The model has no memory outside of it. It has no knowledge outside of it (unless you give it tools to retrieve things). Its behavior is entirely a function of what’s in that window.
Understanding this changes how you specify AI features. You stop thinking about “what should the AI know” and start thinking about “what information needs to be in context at the moment the model generates a response, and how does it get there.” That’s a retrieval and orchestration design problem, not a model selection problem.
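A sketch of what that assembly step looks like in practice, with all names hypothetical and the trimming policy (drop oldest history first) just one of many possible choices. The thing to notice is that the system prompt, retrieved documents, history, and user input are all competing for the same finite window.

```python
def build_context(system_prompt, history, retrieved_docs, user_input, max_chars=8000):
    # Retrieved documents compete for space with conversation history;
    # here we keep the most recent turns and drop the oldest ones first.
    docs_block = "\n".join(f"[doc] {d}" for d in retrieved_docs)
    budget = max_chars - len(system_prompt) - len(docs_block) - len(user_input)
    kept_history = []
    for turn in reversed(history):
        if budget - len(turn) < 0:
            break
        kept_history.insert(0, turn)
        budget -= len(turn)
    return "\n\n".join([system_prompt, docs_block, *kept_history, user_input])

ctx = build_context(
    system_prompt="You are a support assistant. Answer only from the documents.",
    history=["user: hi", "assistant: hello, how can I help?"],
    retrieved_docs=["Refund policy: 30 days from purchase."],
    user_input="user: what's the refund window?",
)
print(ctx)
```

When a PM asks "why didn't the AI know that?", this function is usually where the answer lives: the information either wasn't retrieved, got trimmed, or was never in the window to begin with.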
3 — Probabilistic quality bars
Traditional PMs think in terms of acceptance criteria: the feature is done when it does X. AI PMs need to think in terms of distributions: the feature ships when it achieves quality bar Y on evaluation set Z, with a confidence interval we’re comfortable with.
This is a mindset shift, not a technical skill. But it requires enough technical literacy to have the conversation with your engineering team without either deferring to them entirely or bulldozing them with uninformed opinions.
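Here is what a "quality bar Y on eval set Z with a confidence interval" decision can look like as code, assuming a simple pass/fail eval. The numbers are made up; the Wilson score interval is a standard way to get a lower bound on a pass rate that behaves well at small sample sizes.

```python
import math

def wilson_lower_bound(passes: int, n: int, z: float = 1.96) -> float:
    # 95% lower confidence bound on the true pass rate (z=1.96).
    if n == 0:
        return 0.0
    p = passes / n
    denom = 1 + z**2 / n
    centre = p + z**2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (centre - margin) / denom

QUALITY_BAR = 0.90
passes, n = 188, 200  # hypothetical: 94% raw pass rate on a 200-case eval set
lower = wilson_lower_bound(passes, n)
print(f"pass rate {passes/n:.1%}, 95% lower bound {lower:.1%}")
print("ship" if lower >= QUALITY_BAR else "hold")
```

With these numbers the lower bound lands just under 90%, so the check says hold even though the raw pass rate is 94%. That gap between the point estimate and what you can confidently claim is exactly the conversation this mindset shift enables.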
What technical knowledge you actually need
You don’t need to be able to train a model. You don’t need to understand backpropagation. You don’t need to know how transformers work at the architecture level.
You do need to understand:
- How RAG works and what the failure modes are. If your AI feature retrieves context from a knowledge base, you need to understand chunking, embedding, retrieval, and reranking well enough to diagnose why it’s returning bad results.
- What a system prompt is and how prompt engineering affects model behavior. You should be able to write and iterate on system prompts yourself, not just hand them to an engineer.
- What evaluations are, how they’re designed, and what makes them reliable. This is the most underrated skill in the field right now.
- The latency and cost tradeoffs of different model and architecture choices. You don’t need to be the expert, but you do need to not be lost.
- What agents and tool use actually mean in practice. Understanding what it means for a model to use tools, why it sometimes uses the wrong one, and how orchestration frameworks work will separate you from people who are just pattern matching on buzzwords.
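The first item on that list, the retrieval pipeline, fits in a few lines once you strip it to its skeleton. This toy version uses word-overlap scoring instead of embeddings so it runs without any services; real systems embed chunks and do vector search, but the chunk, score, retrieve shape is the same, and so are the failure modes.

```python
def chunk(text: str, size: int = 12) -> list[str]:
    # Split the knowledge base into fixed-size word windows.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query: str, passage: str) -> float:
    # Toy relevance: fraction of query words that appear in the passage.
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q)

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

kb = chunk(
    "Refunds are accepted within 30 days of purchase with a receipt. "
    "International shipping is available to most countries. "
    "Support hours are 9am to 5pm eastern on weekdays."
)
print(retrieve("how many days do I have to return a purchase", kb))
```

Even here you can see two classic failure modes worth knowing by name: the fixed-size chunker orphans the word "International" from its own sentence (bad chunking), and a query that says "refund" would score zero against a chunk that says "Refunds" (lexical mismatch, the problem embeddings exist to solve). Diagnosing "why is it returning bad results" mostly means figuring out which stage, chunking, scoring, or retrieval, is the culprit.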
The career path reality
AI PM is not yet a clearly defined career track at most companies. What you’ll find in practice:
- At large tech companies: dedicated AI PM roles exist but often sit within existing product areas. The AI is a feature of the product, not the product itself.
- At AI-native startups: the PM role is often much closer to the model — you’re defining prompts, designing evals, and working directly on the AI behavior. Higher leverage, higher ambiguity.
- At enterprise software companies: AI is being bolted onto existing products. This is where most of the jobs are right now, and where the PM role looks most like traditional PM with an AI layer.
The skills I described above — eval design, context window thinking, probabilistic quality bars — matter most in the second category but are useful in all three.
The gap between “PM who has worked on AI features” and “AI PM” is mostly about whether you understand the measurement problem. Everything else is learnable on the job.
Next week: RAG explained for PMs — no math, just the concepts you need to make good product decisions around retrieval systems.