API Pagination for AI Agent Data: Cursor-Based, Offset, and Keyset Pagination
Compare cursor-based, offset, and keyset pagination strategies for AI agent APIs. Includes FastAPI implementations, performance analysis, and guidance on choosing the right approach for your data access patterns.
Why Pagination Matters for AI Agent APIs
AI agents generate enormous volumes of data: conversation histories, tool call logs, evaluation results, and audit trails. Returning all records in a single response is impractical. Without pagination, a single query for an agent's conversation history could return millions of messages, consuming excessive memory, saturating the network, and timing out.
Pagination splits large result sets into manageable pages. The three dominant strategies — offset-based, cursor-based, and keyset pagination — each offer different performance characteristics and consistency guarantees.
Offset-Based Pagination: Simple but Fragile
Offset pagination uses a page number or offset combined with a limit. It is the most intuitive approach and maps directly to SQL's LIMIT and OFFSET clauses.
flowchart LR
CLIENT(["Client SDK"])
GW["API Gateway<br/>auth plus rate limit"]
APP["FastAPI app<br/>handlers and DI"]
VAL["Pydantic validation"]
SVC["Service layer<br/>business logic"]
DB[(Database)]
QUEUE[(Background queue)]
OBS[(Tracing)]
CLIENT --> GW --> APP --> VAL --> SVC
SVC --> DB
SVC --> QUEUE
SVC --> OBS
SVC --> CLIENT
style GW fill:#4f46e5,stroke:#4338ca,color:#fff
style APP fill:#f59e0b,stroke:#d97706,color:#1f2937
style DB fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
from fastapi import Depends, FastAPI, Query
from pydantic import BaseModel
from sqlalchemy import func, select
from sqlalchemy.ext.asyncio import AsyncSession

# Message (ORM model) and get_db (session dependency) are assumed to be
# defined elsewhere in the application.

app = FastAPI()


class PaginatedResponse(BaseModel):
    data: list[dict]
    total: int
    offset: int
    limit: int
    has_more: bool


@app.get("/v1/agents/{agent_id}/messages")
async def list_messages_offset(
    agent_id: str,
    offset: int = Query(0, ge=0),
    limit: int = Query(20, ge=1, le=100),
    db: AsyncSession = Depends(get_db),
):
    # Count all matching rows so the client can work out how many pages exist.
    total = await db.scalar(
        select(func.count())
        .select_from(Message)
        .where(Message.agent_id == agent_id)
    )

    # Fetch one page, newest first. The database still has to walk past
    # the first `offset` rows before it can return anything.
    rows = await db.execute(
        select(Message)
        .where(Message.agent_id == agent_id)
        .order_by(Message.created_at.desc())
        .offset(offset)
        .limit(limit)
    )
    messages = rows.scalars().all()

    return PaginatedResponse(
        data=[m.to_dict() for m in messages],
        total=total,
        offset=offset,
        limit=limit,
        has_more=offset + limit < total,
    )
The problem with offset pagination is performance degradation at scale. OFFSET 1000000 forces the database to scan and discard one million rows before returning results. It also suffers from consistency issues: if new records are inserted while the client is paginating, pages can shift, causing duplicated or skipped items.
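The consistency problem is easy to reproduce without a database. The following is a minimal sketch that emulates OFFSET/LIMIT over an in-memory list; `fetch_page` and the integer ids are hypothetical stand-ins for a real query and real rows.

```python
def fetch_page(rows: list[int], offset: int, limit: int) -> list[int]:
    """Emulate ORDER BY id DESC with OFFSET/LIMIT over an in-memory table."""
    newest_first = sorted(rows, reverse=True)
    return newest_first[offset:offset + limit]


# Scenario 1: a row is inserted between two page requests.
table = [1, 2, 3, 4, 5, 6]
page_1 = fetch_page(table, offset=0, limit=3)
table.append(7)  # a new message arrives
page_2 = fetch_page(table, offset=3, limit=3)
# Every row shifted down by one, so the last row of page 1 reappears on page 2.
duplicated = sorted(set(page_1) & set(page_2))

# Scenario 2: a row is deleted between two page requests.
table = [1, 2, 3, 4, 5, 6]
first = fetch_page(table, offset=0, limit=3)
table.remove(6)  # the newest message is deleted
second = fetch_page(table, offset=3, limit=3)
# Every row shifted up by one, so one row is never returned on any page.
skipped = sorted(set(table) - set(first) - set(second))

print("duplicated:", duplicated)
print("skipped:", skipped)
```

Neither scenario raises an error, which is why this kind of drift tends to go unnoticed until a consumer reports missing or repeated records.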
Cursor-Based Pagination: Consistent and Scalable
Cursor pagination uses an opaque token representing the position of the last item on the current page. The server decodes the cursor to determine where to start the next page, avoiding the performance cliff of large offsets.
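import base64
import json
from datetime import datetime

from fastapi import HTTPException

# Continues the previous example: app, Query, Depends, select, BaseModel, and
# get_db are already in scope. Conversation is an ORM model assumed to be
# defined elsewhere.


def encode_cursor(created_at: str, id: str) -> str:
    payload = json.dumps({"created_at": created_at, "id": id})
    return base64.urlsafe_b64encode(payload.encode()).decode()


def decode_cursor(cursor: str) -> dict:
    payload = base64.urlsafe_b64decode(cursor.encode()).decode()
    return json.loads(payload)


class CursorPaginatedResponse(BaseModel):
    data: list[dict]
    next_cursor: str | None
    has_more: bool


@app.get("/v1/agents/{agent_id}/conversations")
async def list_conversations_cursor(
    agent_id: str,
    cursor: str | None = Query(None),
    limit: int = Query(20, ge=1, le=100),
    db: AsyncSession = Depends(get_db),
):
    # Order by (created_at, id) so rows with identical timestamps still
    # have a strict, repeatable order.
    query = (
        select(Conversation)
        .where(Conversation.agent_id == agent_id)
        .order_by(
            Conversation.created_at.desc(),
            Conversation.id.desc(),
        )
    )

    if cursor:
        try:
            decoded = decode_cursor(cursor)
            # Parse the ISO string back into a datetime so it is compared
            # as a timestamp rather than as text.
            last_created_at = datetime.fromisoformat(decoded["created_at"])
            # The id travels as a string; convert it here if Conversation.id
            # is not a string column.
            last_id = decoded["id"]
        except (ValueError, KeyError, TypeError):
            # Malformed or tampered cursors are a client error, not a 500.
            raise HTTPException(status_code=400, detail="Invalid cursor")

        # Keep only rows that sort strictly after the cursor position.
        query = query.where(
            (Conversation.created_at < last_created_at)
            | (
                (Conversation.created_at == last_created_at)
                & (Conversation.id < last_id)
            )
        )

    # Fetch one extra row to learn whether another page exists.
    rows = await db.execute(query.limit(limit + 1))
    items = rows.scalars().all()
    has_more = len(items) > limit
    items = items[:limit]

    next_cursor = None
    if has_more and items:
        last = items[-1]
        next_cursor = encode_cursor(
            last.created_at.isoformat(), str(last.id)
        )

    return CursorPaginatedResponse(
        data=[c.to_dict() for c in items],
        next_cursor=next_cursor,
        has_more=has_more,
    )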
The trick of fetching limit + 1 items lets you determine whether more pages exist without running a separate count query.
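From the consumer's side, a cursor API is walked by passing each `next_cursor` back until `has_more` is false. The sketch below shows that loop against a fake in-memory server that also applies the limit + 1 trick. `iter_all`, `fake_fetch_page`, and `RECORDS` are hypothetical names, and the cursor is a plain integer string for brevity rather than an opaque base64 token.

```python
from typing import Callable, Iterator, Optional

# Seven hypothetical records, already sorted by ascending id.
RECORDS = [{"id": i} for i in range(1, 8)]


def fake_fetch_page(cursor: Optional[str], limit: int = 3) -> dict:
    """Stand-in for one HTTP request to a cursor-paginated endpoint."""
    start_after = int(cursor) if cursor is not None else 0
    # Ask for limit + 1 rows: the extra row is never returned, it only
    # signals that at least one more page exists.
    window = [r for r in RECORDS if r["id"] > start_after][: limit + 1]
    has_more = len(window) > limit
    items = window[:limit]
    next_cursor = str(items[-1]["id"]) if has_more else None
    return {"data": items, "next_cursor": next_cursor, "has_more": has_more}


def iter_all(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every record by following next_cursor until has_more is false."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["data"]
        if not page["has_more"]:
            break
        cursor = page["next_cursor"]


ids = [record["id"] for record in iter_all(fake_fetch_page)]
print(ids)
```

Because the generator only holds one page at a time, an agent can stream through a large history without loading it all into memory.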
Keyset Pagination: Maximum Database Performance
Keyset pagination is a variant of cursor pagination that directly uses column values rather than opaque tokens. It requires a strict, unique ordering and leverages database indexes for maximum efficiency.
# Continues the previous examples; ToolCall is an ORM model assumed to be
# defined elsewhere, with an integer primary key `id`.


@app.get("/v1/agents/{agent_id}/tool-calls")
async def list_tool_calls_keyset(
    agent_id: str,
    after_id: int | None = Query(None),
    limit: int = Query(50, ge=1, le=200),
    db: AsyncSession = Depends(get_db),
):
    query = (
        select(ToolCall)
        .where(ToolCall.agent_id == agent_id)
        .order_by(ToolCall.id.asc())
    )

    # Resume strictly after the last id the client has already seen.
    if after_id is not None:
        query = query.where(ToolCall.id > after_id)

    # Fetch one extra row to learn whether another page exists.
    rows = await db.execute(query.limit(limit + 1))
    items = rows.scalars().all()
    has_more = len(items) > limit
    items = items[:limit]

    return {
        "data": [t.to_dict() for t in items],
        "next_after_id": items[-1].id if has_more else None,
        "has_more": has_more,
    }
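This generates a simple WHERE agent_id = :agent_id AND id > :after_id ORDER BY id LIMIT :limit query. With an index that covers (agent_id, id), the database can seek directly to the first matching row instead of scanning and discarding earlier ones, so performance stays consistent regardless of how deep into the dataset you paginate.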
Choosing the Right Strategy
Use offset pagination for admin dashboards and internal tools where datasets are small, users need to jump to specific pages, and simplicity is valued over performance.
Use cursor pagination for public APIs consumed by AI agents that iterate through large datasets sequentially. It provides stable results and consistent performance.
Use keyset pagination when you control both the API and the client, your ordering column is indexed and unique, and you need maximum query performance on tables with millions of rows.
FAQ
Can I mix pagination strategies in the same API?
Yes, but be consistent within each resource. For example, use cursor pagination for conversation messages (which are append-heavy and sequentially accessed) and offset pagination for an admin dashboard that needs page jumping. Document the strategy clearly in your OpenAPI spec for each endpoint.
How do I handle filtering with cursor pagination?
Apply filters before cursor conditions. The cursor encodes position within the filtered result set. If a user changes filters mid-pagination, they must start from the beginning with no cursor. Never reuse a cursor from a different filter combination — the underlying position may point to a record that no longer matches the new filter.
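One way to enforce this rule on the server is to store a fingerprint of the active filters inside the cursor and reject any cursor whose fingerprint does not match the current request. This is a sketch under assumptions, not part of the endpoints above: `filter_fingerprint` and the `pos` and `f` payload keys are hypothetical names, and the helpers extend the earlier cursor functions with a `filters` argument.

```python
import base64
import hashlib
import json


def filter_fingerprint(filters: dict) -> str:
    """Stable short hash of the active filters; key order does not matter."""
    canonical = json.dumps(filters, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]


def encode_cursor(position: dict, filters: dict) -> str:
    """Pack the position together with the fingerprint of the filters in use."""
    payload = {"pos": position, "f": filter_fingerprint(filters)}
    return base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()


def decode_cursor(cursor: str, filters: dict) -> dict:
    """Return the position, or raise ValueError if the filters have changed."""
    payload = json.loads(base64.urlsafe_b64decode(cursor.encode()).decode())
    if payload.get("f") != filter_fingerprint(filters):
        raise ValueError("cursor was issued for a different filter set")
    return payload["pos"]


cursor = encode_cursor(
    {"created_at": "2024-01-01T00:00:00", "id": "42"},
    {"status": "failed"},
)

# Same filters: the position round-trips.
position = decode_cursor(cursor, {"status": "failed"})

# Different filters: the cursor is rejected and the client must restart.
try:
    decode_cursor(cursor, {"status": "succeeded"})
    rejected = False
except ValueError:
    rejected = True

print(position, rejected)
```

The fingerprint only catches accidental reuse. A client can still forge a cursor, so sign the payload (for example with an HMAC) if the position must be trusted.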
What page size should I default to for AI agent APIs?
Start with 20 to 50 items per page, with a maximum of 100 to 200. AI agents processing data in bulk may benefit from larger pages to reduce HTTP round trips, but excessively large pages increase memory pressure and response latency. Let clients specify the page size via a limit query parameter with a sane default and a hard maximum.