AI Voice Agents · 12 min read

Build a Metered AI Voice Agent with Clerk Billing + Stripe (2026)

Clerk Billing wraps Stripe in 700ms of glue: PricingTable component, has() entitlement checks, and per-minute metered billing. Plug it into a voice agent in one afternoon.

TL;DR — Clerk Billing (0.7% + Stripe fees) gives you a drop-in <PricingTable />, server-side has({ feature }) entitlement, and Stripe Meters for per-minute voice billing — without writing a webhook handler.

What you'll build

Three voice plans (Starter / Pro / Scale), each gating minutes per month. The Realtime endpoint checks entitlement, increments a Stripe Meter on every minute, and shuts off cleanly on cap.

Prerequisites

  1. Next.js 15 + Clerk @clerk/nextjs@^6.
  2. Stripe account connected to Clerk Billing.
  3. OPENAI_API_KEY, STRIPE_SECRET_KEY.

Architecture

flowchart TD
  U[User] --> CL[Clerk Auth + Plan]
  U --> RT[/api/realtime/]
  RT --> EN{has voice_minutes?}
  EN -->|yes| OA[OpenAI Realtime]
  OA --> M[Stripe Meter increment]
  EN -->|no| UP[Upgrade page]

Step 1 — Define plans in Clerk Dashboard

In Clerk Dashboard → Billing, create plans starter_149, pro_499, scale_1499 and a feature voice_minutes. Mark voice_minutes as a metered feature linked to a Stripe Meter voice_minutes_meter.
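It can help to mirror the dashboard plans in code so the server knows how many minutes each plan includes. The sketch below is only an illustration: the plan slugs come from this step, but the minute allowances and the `includedMinutes` helper are hypothetical placeholders, not values from Clerk or from this article.

```typescript
// Plan slugs as created in Clerk Dashboard → Billing (from Step 1).
type PlanSlug = "starter_149" | "pro_499" | "scale_1499";

// HYPOTHETICAL allowances: replace with the minutes you actually sell per plan.
const INCLUDED_MINUTES: Record<PlanSlug, number> = {
  starter_149: 500,
  pro_499: 2000,
  scale_1499: 8000,
};

// Type guard that only accepts keys defined directly on the catalog,
// so inherited names such as "toString" are rejected.
function isPlanSlug(value: string): value is PlanSlug {
  return Object.prototype.hasOwnProperty.call(INCLUDED_MINUTES, value);
}

// Returns the monthly allowance for a plan, or 0 for an unknown slug.
function includedMinutes(plan: string): number {
  return isPlanSlug(plan) ? INCLUDED_MINUTES[plan] : 0;
}
```

Returning 0 for an unknown slug fails closed: a typo in a plan name blocks calls instead of granting unlimited minutes.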

Step 2 — Drop the pricing table

```tsx
// app/pricing/page.tsx
import { PricingTable } from "@clerk/nextjs";

export default function Pricing() {
  // Renders the plans configured in Clerk Dashboard → Billing.
  return <PricingTable />;
}
```

Step 3 — Entitlement gate

```ts
// app/api/realtime/route.ts
import { auth } from "@clerk/nextjs/server";
import { NextResponse } from "next/server";

export async function POST() {
  const { userId, has } = await auth();
  // Reject anonymous callers before touching the OpenAI API.
  if (!userId) return new NextResponse("auth", { status: 401 });
  // 402 tells the client to send the user to the upgrade page.
  if (!has({ feature: "voice_minutes" }))
    return NextResponse.json({ upgrade: true }, { status: 402 });

  // Create the Realtime session server-side so the API key stays on the server.
  const r = await fetch("https://api.openai.com/v1/realtime/sessions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "gpt-realtime" }),
  });
  // Forward OpenAI's status so upstream failures are not reported as 200.
  return NextResponse.json(await r.json(), { status: r.status });
}
```
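The gate above answers 401 for unauthenticated callers and 402 when the `voice_minutes` entitlement is missing. On the client, one way to handle this is to map the status code to a small set of outcomes before starting WebRTC. This is a sketch: the `GateOutcome` type and `interpretGateResponse` helper are illustrative names, not part of Clerk or OpenAI.

```typescript
// Possible results of calling POST /api/realtime from the browser.
type GateOutcome =
  | { kind: "connect"; session: unknown } // session JSON returned by the route
  | { kind: "sign-in" }                   // 401: no Clerk session
  | { kind: "upgrade" }                   // 402: plan lacks voice_minutes
  | { kind: "error"; status: number };    // anything else

// Pure mapping from HTTP status (and parsed body) to a UI decision.
function interpretGateResponse(status: number, body: unknown): GateOutcome {
  if (status === 401) return { kind: "sign-in" };
  if (status === 402) return { kind: "upgrade" };
  if (status >= 200 && status < 300) return { kind: "connect", session: body };
  return { kind: "error", status };
}
```

In the browser this would typically be called as `interpretGateResponse(res.status, await res.json())` after the `fetch` to `/api/realtime`; keeping the mapping pure makes it easy to unit test without a network.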

Step 4 — Meter usage on call end

```ts
// app/api/meter/route.ts
import Stripe from "stripe";
import { auth, clerkClient } from "@clerk/nextjs/server";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

export async function POST(req: Request) {
  const { userId } = await auth();
  // Only signed-in users may report usage.
  if (!userId) return new Response("auth", { status: 401 });

  const { minutes } = await req.json();
  // Look up the Stripe customer ID stored on the Clerk user.
  const u = await (await clerkClient()).users.getUser(userId);
  const customerId = u.publicMetadata.stripeCustomerId as string;

  // Record usage against the meter; payload values are sent as strings.
  await stripe.billing.meterEvents.create({
    event_name: "voice_minutes_meter",
    payload: { stripe_customer_id: customerId, value: String(minutes) },
  });
  return Response.json({ ok: true });
}
```


Step 5 — Hook on call disconnect

In the browser, when the WebRTC peer connection's connectionState changes to "disconnected", POST the elapsed minutes to /api/meter.
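A minimal sketch of that hook follows. It assumes partial minutes are rounded up, which is a billing policy choice the article does not specify. `PeerLike` is a stand-in for the parts of `RTCPeerConnection` the code touches, and `billableMinutes` and `reportOnDisconnect` are illustrative names.

```typescript
// Minimal stand-in for the RTCPeerConnection members used below.
interface PeerLike {
  connectionState: string;
  addEventListener(type: "connectionstatechange", listener: () => void): void;
}

// Elapsed time rounded UP to whole minutes (ASSUMPTION: partial minutes are billed).
function billableMinutes(startedAtMs: number, endedAtMs: number): number {
  const elapsedMs = Math.max(0, endedAtMs - startedAtMs);
  return Math.ceil(elapsedMs / 60_000);
}

// Calls `report` once, the first time the peer reports "disconnected".
function reportOnDisconnect(
  pc: PeerLike,
  startedAtMs: number,
  report: (minutes: number) => void,
  now: () => number = Date.now,
): void {
  let reported = false; // guards against duplicate reports for the same call
  pc.addEventListener("connectionstatechange", () => {
    if (pc.connectionState !== "disconnected" || reported) return;
    reported = true;
    report(billableMinutes(startedAtMs, now()));
  });
}
```

In the browser, `report` would be something like `(minutes) => fetch("/api/meter", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ minutes }) })`. Injecting `now` and `report` keeps the timing logic testable without a real peer connection.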

Step 6 — Cap enforcement

Use has({ feature: "voice_minutes", quantity: { gte: capLeft } }) in middleware to block once the soft cap is hit.
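Whatever mechanism supplies the usage numbers, the cap decision itself reduces to simple arithmetic. The sketch below assumes you already have the plan's included minutes and the minutes used this period from your own source of truth; `UsageSnapshot`, `CapDecision`, and `checkCap` are hypothetical names for illustration.

```typescript
// Usage for the current billing period (ASSUMPTION: you fetch these yourself).
interface UsageSnapshot {
  includedMinutes: number;
  usedMinutes: number;
}

type CapDecision =
  | { allow: true; remainingMinutes: number; maxSessionMs: number }
  | { allow: false; reason: "cap_reached" };

// Decides whether a new call may start and how long it may run.
function checkCap(usage: UsageSnapshot): CapDecision {
  const remaining = usage.includedMinutes - usage.usedMinutes;
  if (remaining <= 0) return { allow: false, reason: "cap_reached" };
  // maxSessionMs lets the caller set a timer that ends the call at the cap.
  return { allow: true, remainingMinutes: remaining, maxSessionMs: remaining * 60_000 };
}
```

Returning `maxSessionMs` alongside the decision lets the session end on a timer when the allowance runs out, which matches the "shuts off cleanly on cap" goal instead of only blocking the next call.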

Pitfalls

  • Test mode: Clerk Billing dev sandbox uses Stripe test keys — switch both before launch.
  • Meter granularity: Stripe Meters dedupe by (customer, event_name, idempotency_key, time); always pass a unique identifier per call.
  • Entitlement caching: has() is cached for 30s — for hard caps, query Stripe usage directly.
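For the meter granularity pitfall, one approach is to derive the event's `identifier` from a per-call ID so that a retried POST produces the same identifier instead of a second usage record. The sketch below only builds the parameters for the meter event from Step 4; `callId`, the `voice-call-` prefix, and `buildMeterEvent` are assumptions made for illustration.

```typescript
// Shape of the parameters passed to stripe.billing.meterEvents.create in Step 4,
// plus the optional `identifier` field used for deduplication.
interface MeterEventParams {
  event_name: string;
  identifier: string;
  payload: { stripe_customer_id: string; value: string };
}

// Builds a meter event whose identifier is stable for a given call.
// `callId` is a HYPOTHETICAL unique ID your app assigns to each voice call.
function buildMeterEvent(callId: string, customerId: string, minutes: number): MeterEventParams {
  if (!Number.isInteger(minutes) || minutes < 0) {
    throw new Error("minutes must be a non-negative integer");
  }
  return {
    event_name: "voice_minutes_meter",
    identifier: `voice-call-${callId}`, // same call, same identifier on retry
    payload: { stripe_customer_id: customerId, value: String(minutes) },
  };
}
```

The result can be passed directly to `stripe.billing.meterEvents.create(...)`. Validating `minutes` here also stops a malformed client request from being recorded as usage.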

How CallSphere does this in production

CallSphere prices at $149 Starter / $499 Pro / $1,499 Scale with metered overage, 14-day no-card trial, and 22% recurring Year-1 affiliate. The platform spans 37 agents, 90+ tools, 115+ DB tables, 6 verticals — Healthcare, OneRoof (Next.js 16 + React 19), Salon (NestJS 10 + Prisma), Sales (Node.js 20 + React 18 + Vite). Clerk + Stripe handles auth + billing across all six.

FAQ

Clerk Billing fee? 0.7% per transaction on top of Stripe fees.

Can I run B2B teams? Yes — Clerk has Organizations with seat-based plans and per-org features.

Can I avoid Clerk and use Stripe Customer Portal directly? Yes, but you give up the React components and has() helpers.

EU compliance? Clerk offers SCA-ready Stripe checkout out of the box.


## How this plays out in production

One layer below what *Build a Metered AI Voice Agent with Clerk Billing + Stripe (2026)* covers, the practical question every team hits is multi-turn handoffs between specialist agents without losing slot state, sentiment, or escalation context. Treat this as a voice-first system from the first prompt: the agent's persona, its tool surface, and its escalation rules all flow from that single decision. Teams that ship fast tend to instrument the loop end-to-end before they tune any single component, because the bottleneck is rarely where intuition puts it.

## Voice agent architecture, end to end

A production-grade voice stack at CallSphere stitches Twilio Programmable Voice (PSTN ingress, TwiML, bidirectional Media Streams) to a realtime reasoning layer (typically OpenAI Realtime or ElevenLabs Conversational AI) with sub-second response as a hard SLO. Anything north of one second of perceived silence and callers either repeat themselves or hang up; that single number drives the whole architecture. Server-side VAD with proper barge-in support is non-negotiable, otherwise the agent talks over the caller and the conversation collapses. Streaming TTS with phoneme-aligned interruption keeps the cadence natural even when the user changes their mind mid-sentence.

Post-call, every transcript is run through a structured pipeline: sentiment, intent classification, lead score, escalation flag, and a normalized slot extraction (name, callback number, reason, urgency). For healthcare workloads, the BAA-covered storage path, audit logs, encryption-at-rest, and PHI-safe transcript redaction are wired in from day one, not bolted on at compliance review. The end state is a system where every call produces a row of structured data, not just a recording.

## FAQ

**What is the fastest path to a voice agent the way *Build a Metered AI Voice Agent with Clerk Billing + Stripe (2026)* describes?** Treat the architecture in this post as a starting point and instrument it before you tune it. The metrics that matter most early on are end-to-end latency (target < 1s for voice, < 3s for chat), barge-in correctness, tool-call success rate, and post-conversation lead score distribution. Optimize whatever the data flags as the bottleneck, not whatever feels slowest in your head.

**What are the gotchas around voice agent deployments at scale?** The two failure modes that bite hardest are silent context loss across multi-turn handoffs and tool calls that succeed in dev but get rate-limited in production. Both are solvable with a proper agent backplane that pins state to a session ID, retries with backoff, and writes every tool invocation to an audit log you can replay.

**What does the CallSphere outbound sales calling product do that a regular dialer does not?** It uses the ElevenLabs "Sarah" voice, runs up to 5 concurrent outbound calls per operator, and ships with a browser-based dialer that transfers warm calls back to a human in one click. Dispositions, transcripts, and lead scores write back to the CRM automatically.

## See it live

Book a 30-minute working session at [calendly.com/sagar-callsphere/new-meeting](https://calendly.com/sagar-callsphere/new-meeting) and bring a real call flow; we will walk it through the live outbound sales dialer at [sales.callsphere.tech](https://sales.callsphere.tech) and show you exactly where the production wiring sits.