Product shape is what you refuse.
Scope lessons from building an enterprise voice AI platform from zero. Narrow control point, MCP-first bet, the pivot I got wrong, the shape that held.
- The challenge: an enterprise voice AI scope decision. Every customer wanted full-stack CCaaS. Saying yes to everything would chase revenue but lose product shape; the alternative was to pick a narrow control point and refuse the rest.
- The bet: a real-time conversational execution layer between the voice runtime and enterprise systems, with MCP-first integration. A standalone-sellable evaluation module was the pivot I got wrong and walked back; the consolidation pattern that replaced it held.
- Business impact: two paying enterprise customers in production, a multi-million-dollar deal pipeline, and a live airline deployment. Scope discipline preserved engineering velocity and integration depth where competitors fragmented.
Context
A real-time conversational execution platform for enterprise voice agents. I came in as head of product from zero, reporting to the CEO, and owned the product surface end to end. Strategy, roadmap, specs, architecture decisions with engineering, customer-facing POCs, GTM. Two paying enterprise customers in production, a multi-million-dollar deal pipeline, and a live airline deployment over the arc I'm describing.
Problem I was solving
Large enterprises wanted voice agents for contact center volume. The market was split. Legacy CCaaS vendors bolted AI onto stacks that didn't meet real-time bars. New LLM-native voice tools demoed well but couldn't execute tools against enterprise systems or hold reliability at production load. The gap was a real-time conversational execution layer: orchestration, tool execution, latency, and reliability, sitting between the voice runtime and the enterprise's systems.
The strategic question was where to draw product boundaries. Every enterprise customer wanted the full stack. Voice, telephony, CRM, analytics, QA, agent desktop. The natural answer was yes-to-everything, because the revenue was there. The hard answer was to pick a narrow control point and refuse the rest.
What I considered
Three product shapes.
Full-stack CCaaS. Own everything the customer asked for. Attractive because every RFP asked for this shape. Rejected because we'd be a distant third on every axis against incumbents who'd been building each layer for a decade, with zero differentiation on the one layer that was actually new.
Voice AI as a point product. Sell the voice agent standalone, integrate with whatever the customer already ran. Attractive because scope was small and the sales cycle shorter. Rejected because point products in the voice space were commoditizing, and selling without orchestration meant every deal hit a different integration ceiling.
Narrow control point. Real-time conversational execution. Own the agent configuration layer, the tool execution layer, the latency and reliability primitives, the post-call evaluation loop. Partner on telephony, on CRM, on analytics-as-a-product. This was the choice. Least intuitive from "what do customers ask for." Most defensible from "what is actually hard and getting harder."
What I shipped
Over this arc the platform grew to include:
- An agent configuration surface with prompts, tools, models, voice, call stages, guardrails, and versioning.
- A platform-level tool registry with a built-in tester.
- A post-call evaluation module.
- A chat engine on a frontier hosted LLM provider, separate from the voice runtime.
- A consolidated platform API that absorbed CRM and post-call analytics as internal packages.
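For orientation, here is what that configuration surface covers, sketched as illustrative TypeScript types. Every field name below is an assumption for exposition, not the platform's actual schema.

```typescript
// Illustrative sketch of the agent configuration surface.
// Field names are assumptions, not the platform's real schema.
type AgentConfig = {
  version: number;                              // configs are versioned, not mutated in place
  prompt: string;
  model: string;
  voice: { voiceId: string };
  tools: string[];                              // references into the platform tool registry
  callStages: { name: string; goal: string }[];
  guardrails: { name: string; rule: string }[];
};
```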
The biggest architectural bet was MCP-first. Every new capability ships as an MCP server, not a bespoke SDK and not an agent-local tool definition. I made this call early, before MCP had mass adoption. The team had been burned by bespoke tool APIs in prior products, and customers kept asking for tool portability across agents. The tradeoff was betting on a standard that wasn't yet the default. If it hadn't taken off, we'd have rebuilt. It took off.
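Concretely, "ships as an MCP server" means each capability is a typed tool behind the protocol, composable by any agent from the registry. A minimal sketch using the official TypeScript MCP SDK; the tool name, schema, and backend endpoint are hypothetical, not our actual integration.

```typescript
// Minimal MCP server exposing one enterprise capability as a portable tool.
// Sketch only: "lookup_booking" and its backend endpoint are hypothetical.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "booking-tools", version: "1.0.0" });

// Each capability is a tool with a typed schema, so any agent composed from
// the registry can call it without bespoke glue code.
server.tool(
  "lookup_booking",
  { confirmationCode: z.string() },
  async ({ confirmationCode }) => {
    // Hypothetical internal backend; in practice this is the enterprise system of record.
    const res = await fetch(`https://internal.example.com/bookings/${confirmationCode}`);
    return { content: [{ type: "text", text: await res.text() }] };
  }
);

await server.connect(new StdioServerTransport());
```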
The second bet was separation of chat and voice at the runtime level. Voice on a real-time voice runtime with a frontier open-weight LLM and premium TTS. Chat on a frontier hosted LLM with a channel-agnostic chat API. Competitors were trying to unify both on one runtime. I rejected that because latency profile, interruption behavior, and turn-taking semantics are not the same problem, and a shared abstraction would be worse at both than two focused ones.
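A rough way to see the argument: the knobs that matter don't overlap. These runtime configs are illustrative, not the platform's schema; the point is that the parameters deciding voice quality have no chat equivalent, so a shared abstraction would carry half-meaningless fields on every channel.

```typescript
// Illustrative runtime configs, not the platform's actual schema.
type VoiceRuntimeConfig = {
  model: string;                                          // open-weight model on the real-time runtime
  tts: { voiceId: string; speed: number };                // premium TTS settings
  turnTaking: { endpointingMs: number; allowBargeIn: boolean };
  latencyBudgetMs: number;                                // hard real-time bar
};

type ChatRuntimeConfig = {
  model: string;                                          // frontier hosted LLM
  channels: ("web" | "sms" | "whatsapp")[];               // channel-agnostic chat API
  streaming: boolean;
  latencyBudgetMs?: number;                               // soft target; chat tolerates slack
};
```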
The hardest call post-ship: the evaluation module pivot
Post-call evaluation and analytics shipped first as a standalone service in its own repo, its own deployment, its own database, its own team allocation. The logic: QA and analytics had standalone-sellable value, so they should be a standalone service. Engineering pushed back that this created ops surface area we couldn't sustain at current headcount. I overruled, because the product case for selling QA to customers who weren't buying voice AI felt real.
Six months in, I was wrong. Three things forced the pivot.
- The evaluation module was tightly coupled to every call event stream in the voice platform. The service boundary was not a clean domain seam. It was a fiction.
- The separate ops stack was eating engineer-weeks per sprint in deployment glue. Message queues, auth, observability, a second database to babysit.
- The standalone sales motion hadn't produced a deal in two quarters. Every customer who bought QA was already buying voice. The "standalone-sellable" thesis was not holding.
I pulled it back into the platform API as an internal package. Roughly 18 data models, 4 services, 28 aggregations, 25 routes moved in. Post-call evaluation now runs fire-and-forget against an internal URL. Four dead frontend pages and ten dead components dropped out. The team got back meaningful sprint capacity. The standalone-sellable thesis, parked.
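The fire-and-forget shape is deliberately boring. A sketch, with the route and payload hypothetical; the route is served by the evaluation package mounted inside the same platform API process, so there is no second deployment behind it.

```typescript
// Post-call evaluation kickoff after consolidation. The URL and payload
// are illustrative; the route is served by the evaluation package mounted
// in the same platform API process.
function onCallEnded(callId: string): void {
  // Fire and forget: a failed evaluation logs, but never blocks or fails
  // the call-completion path.
  fetch("http://localhost:8080/internal/evaluations", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ callId }),
  }).catch((err) => console.error("evaluation kickoff failed:", err));
}
```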
The pattern I extracted. When a service doesn't warrant its own ops surface, collapse it into the platform API. CRM followed the same path into the same repo by the same reasoning, pulling in another 40 files. The platform API has become the consolidation target for anything not earning its own boundary.
Other things I had to walk back
Feature categorization on the ops board used colored labels. Replaced with prefixed card names after labels got overloaded with "Blocked" and "Bug" flags and the board became unreadable at a glance.
Early specs considered unifying chat and voice under one abstraction. Dropped for the reasons above, and the separation has held under production load across both surfaces.
The standalone pitch for the evaluation module appeared in sales decks for two quarters. I killed the slide after the pivot. Selling a package as a product wastes the first meeting.
Where the scope discipline paid off
The narrow control point held under revenue pressure. Customers still asked for CCaaS, CRM, telephony, standalone analytics. I said no with a documented boundary every time. The payoff was that engineering time compounded on the one thing that was actually differentiated, and when the voice agent deployments went live, they worked. Two paying enterprise customers, a multi-million-dollar deal pipeline, and an airline agent live and performing in prod.
The MCP-first bet paid off separately. When the standard matured, we were already there. Every new capability since ships as an MCP server in 1 to 2 weeks steady state. New agents compose from the tool registry, not from bespoke tool code. That compounding is visible in every deal cycle now.
Metrics, ranges only
- Platform consolidation: deployable services down 40 to 60 percent across the evaluation module and CRM pullbacks.
- Sprint capacity freed: 20 to 30 percent in engineering-weeks post-collapse.
- Time from new capability spec to shipped MCP server: 1 to 2 weeks steady state.
- Customer-ask-to-scope-boundary rejections: 10+ documented and held, zero reversals.
- Commercial: two paying enterprise customers, a multi-million-dollar deal pipeline, a live airline deployment.
What I learned as a PM
Product shape is decided by what you refuse. Saying yes to every customer ask made us indistinguishable from incumbents. Saying no to CCaaS, CRM, telephony, and analytics-as-a-product was what made the control point legible. I'd write the scope-boundary document before the first customer meeting, not after, and treat it as a living artifact that every decision is tested against.
Standalone-sellable is a seductive frame and usually wrong. The evaluation module looked like two products in one repo. It was one product with a pretend seam. When someone claims "this could be standalone," I now check three things. Its own data model. Its own deployment rhythm. A customer who'd buy one without the other. If any fails, it's a package, not a service.
MCP-first worked because it was a single bet on a converging standard. The general rule: when you're early on an emerging standard that solves a real pain, going all in is cheaper than hedging. Hedging would have doubled our tool API surface for a year.
Consolidation is a product decision, not an engineering cleanup. When the evaluation module and CRM pulled back into the platform API, that wasn't refactoring. It was the product boundary catching up with reality. I treat the shape of the service graph as part of the product spec now, not a deployment detail the team figures out later.