Prompt Volume Won't Save You

The question every executive leader (eventually) asks their GSO team: “What is the monthly volume for this [topic]?”

It’s a reasonable question.

In SEO, keyword volume was step one. You found the queries with demand, checked if you could rank, and worked backwards from there. It was the starting point of every content strategy for the past 20+ years. If you’ve spent your career in SEO, asking about keyword search demand is pure instinct.

The problem in today’s AI-first world is that prompt volume doesn’t mean what keyword volume meant, and importing that mental model into Generative Search Optimization sends your team in a direction that doesn’t connect to business outcomes.

Why keyword volume worked

Monthly search volume data was a fairly effective metric because search (organic & paid) is inherently deterministic. A thousand people searching for “best project management software” all landed on the same results page, and for the most part the results were fixed at any given moment. Volume was a rough indicator of the size of the audience for a stable result set, and your ranking position told you how much of that audience you captured.

The math was clean: search volume × click-through rate × conversion rate = expected value. You could forecast, prioritize, and build a business case from a spreadsheet just by triangulating a few data points.
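As a minimal sketch of that spreadsheet arithmetic, here is the same multiplication in Python. The function name and every input number are made up for illustration, not benchmarks, and I'm treating "expected value" as expected monthly conversions.

```python
def expected_conversions(monthly_volume: int, ctr: float, conversion_rate: float) -> float:
    """Expected monthly conversions = search volume x click-through rate x conversion rate."""
    return monthly_volume * ctr * conversion_rate


# Hypothetical inputs: 10,000 searches a month, 25% CTR at your ranking
# position, and 2% of those visitors converting.
value = expected_conversions(10_000, 0.25, 0.02)
print(f"Expected conversions per month: {value:.0f}")
```

Multiply the result by whatever a conversion is worth to you and you have the business case.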

This model was simple, legible, and actionable. A high-volume term where you ranked well was a position you had to defend. In contrast, a high-volume query where you didn’t rank was the headroom and the business opportunity.

Keyword volume also had a structural property that made it compound: granular long-tail variants rolled up into broader head terms. “Best project management software for remote teams” and “top PM tools for startups” both laddered into the parent volume for the topical category. You could model total addressable demand by aggregating from the bottom up. That gave SEO teams a credible answer to the leadership question: “How big is this opportunity?”
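The bottom-up roll-up can be sketched in a few lines. The keyword-to-topic mapping and the volumes below are hypothetical placeholders; in practice they would come from whatever keyword tool you use.

```python
from collections import defaultdict

# (keyword, parent topic, monthly volume). All values are hypothetical.
keywords = [
    ("best project management software", "project management software", 12_000),
    ("best project management software for remote teams", "project management software", 900),
    ("top PM tools for startups", "project management software", 600),
]


def roll_up(rows):
    """Sum long-tail keyword volume into each parent topic."""
    totals = defaultdict(int)
    for _keyword, parent, volume in rows:
        totals[parent] += volume
    return dict(totals)


totals = roll_up(keywords)
print(totals)
```

The sum is only meaningful because each keyword is a discrete, countable unit that maps cleanly to a parent topic.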

The entire industry’s tooling, reporting, and pricing models were built on this foundation, and it worked because the operating landscape was deterministic, the search results were stable, and the demand signal was observable.

Why prompt volume won’t work

There are a few inherent measurement issues in this space that do not have clean solutions.

Unable to count

There’s no canonical prompt. In SEO, “best project management software” was a discrete, trackable keyword. In GSO, the same intent surfaces as dozens, if not hundreds, of natural language variations, and each one can produce a completely different model response.

The above underscores how six prompts with the same intent can still have a citation distribution that shifts depending on how the question is phrased.

Volume != Citations

In SEO, those long-tail queries all aggregated into a head term with decent volume. In GSO, each phrase is a different draw from a probabilistic system and there is no head term to aggregate into. Assigning “volume” to any single prompt string is measuring one point in a cloud, and summing those points doesn’t give you a meaningful total the way aggregating keyword variants did.

Even if you had reliable demand data, it wouldn’t answer the question that actually matters: does your brand get cited? In SEO, volume and opportunity were positively correlated: more searches meant more potential traffic if you achieved the top position. In the nascent GSO industry, volume indicates only the size of the audience. It says nothing about whether the model mentions you when that target audience asks a question.

The top-right quadrant is the ideal real estate for SEO: high demand, high opportunity. The bottom-right quadrant is where SEO instincts send you: big audience, so it must be worth pursuing. For AI search, that can be a wall, because the model may never cite you there regardless of volume. Meanwhile, the top-left is a “smaller” audience where your brand shows up consistently. That is where you actually have influence, and you’d never find it by sorting on volume alone.

Nobody has the data anyway

Keyword volume worked because SEMrush, Ahrefs, Google’s Keyword Planner, etc. all provided estimates from clickstream data and ad auction signals. The current (misguided) attempts to measure prompt volume are built on a tiny, non-representative slice of browser extension and app users. I suspect the margins of error on those extrapolations are wide and the corrections rest on assumptions rather than calibration data. Even the tools offering this data acknowledge it’s directional at best. And the platforms themselves (OpenAI, Google, Anthropic, Perplexity) have no incentive to expose it…yet.

Where prompt volume can work

There’s an important exception: the ads business.

If you’re ChatGPT selling sponsored citations, or Google monetizing their AI Overviews, prompt volume is exactly the metric that matters. It’s the same economics that powered search advertising: demand density determines inventory pricing, audience targeting, and revenue forecasting. An advertiser buying visibility in LLM responses needs to know the audience size, just like they needed keyword volume to plan their Google Ads budget.

Prompt volume for the ads business is a purchasing decision: WHERE should I spend money?

Prompt volume for organic GSO is…what, exactly?

The infinite pro(blem|mpt)

There’s a deeper philosophical issue that makes the volume question even less useful.

For organic search, you could build a reasonably comprehensive keyword list for a given category. The space was large but finite. Google’s Keyword Planner would show you the universe. Between your head terms, torso modifiers, and long-tail queries, you could be confident you’d captured the majority of the observable demand with a few hundred keywords.

For generative search, the prompt space is effectively unbounded. Every variation in phrasing, context, persona, and specificity produces a different prompt that the model may respond to differently. “What’s the best video editing tool?” is one prompt. “I’m a YouTuber with 50K subscribers looking to grow my audience but need to switch from Premiere, what should I try? CapCut or something else?” is a different prompt with the same intent but different context (and different brands cited in each response!).

Multiply that by conversation history (the user’s prior messages shape what comes next), system prompt differences across platforms, and the fact that a five-paragraph prompt with constraints produces fundamentally different outputs than a one-line question.

So how many prompts do you need to track? Is it 10? 100? 500?

The answer is that there’s no number where you’ve “covered” the space the way you could with keywords. Going from 10 to 50 prompts per intent adds real signal. Going from 50 to 150 adds some, but from 150 to 500 is where diminishing returns start to kick in. You never reach completeness because the next user’s prompt will be phrased differently than anything in your tracked set.
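One way to see why the returns diminish: the uncertainty around an estimated citation rate shrinks roughly with the square root of the sample size. The sketch below uses the textbook standard error of a proportion. It assumes each prompt is an independent yes/no draw with a true citation rate of 50% (the worst case), which is a simplification of how models actually behave.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent samples."""
    return math.sqrt(p * (1 - p) / n)


sizes = [10, 50, 150, 500]
errors = [standard_error(0.5, n) for n in sizes]  # p = 0.5 is an assumption

for n, se in zip(sizes, errors):
    print(f"{n:>4} prompts: citation rate known to about +/- {se:.3f}")
```

Under these assumptions, the step from 10 to 50 prompts cuts the error by more than half, while the step from 150 to 500 costs far more prompts for a much smaller gain.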

This means prompt tracking in GSO isn’t inventory but representative sampling. You don’t need to count every prompt any more than an epidemiologist needs to survey every person in a country. You need a sample that’s designed to be “good enough” for the intent space you care about.

This is a fundamentally different design problem from “find the high-volume keywords.” Instead, the questions become: What are the intent categories that matter to my brand? Within each category, what are the meaningful variations in phrasing, specificity, and persona? How many passes do I need per prompt to account for the model’s non-determinism?

The quality of your prompt set is measured by whether it accurately covers the intent space with enough variation to produce stable estimates of citation probability.
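To make “stable estimates of citation probability” concrete, here is a minimal sketch that turns a tally of responses into a citation rate with a Wilson score interval. The intents and counts are hypothetical, and the interval assumes the responses are independent draws, which repeated passes over similar prompts only approximate.

```python
import math


def wilson_interval(cited: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a citation rate (z = 1.96 is roughly 95%)."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = cited / runs
    denom = 1 + z**2 / runs
    center = (p + z**2 / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical tallies: intent -> (responses citing the brand, total responses),
# where total responses = prompt variants x passes per prompt.
tallies = {
    "video editing tools": (18, 60),
    "project management software": (3, 60),
}

for intent, (cited, runs) in tallies.items():
    low, high = wilson_interval(cited, runs)
    print(f"{intent}: cited in {cited / runs:.0%} of responses "
          f"(plausible range {low:.0%} to {high:.0%})")
```

If the range is too wide to act on, that is a sign the prompt set needs more variants or more passes for that intent.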

Find & replace fallacy

I suspect the reason SEOs continue searching for the holy grail that is prompt volume is purely operational…it’s an easy “find & replace” command.

The status quo is built around monthly search volume. Keyword research tools, content prioritization frameworks, ranking report templates, and performance forecasting models (to name a few) are all anchored in some form of “we track X keywords with Y combined search volume.”

Slotting “prompt volume” into existing SEO methodology takes less cognitive load and less workflow overhaul, and it keeps the machinery running. However, that machinery relied on a deterministic system, quantifiable queries, and observable signals, and none of those carry over into the new world. Generative search is probabilistic and conversational, and user attribution is virtually nonexistent.

This isn’t a criticism of the people asking for prompt volume. In fact, I believe their instinct is right, because you need to know where to allocate your time and resources. But the unit of measurement has to change: not from “keywords” to “prompts,” which is just a marketing relabel, but from enumeration to sampling, from volume to coverage, and from counts to probability.

That shift, and what it looks like in practice, is what my next post will be about.