
OpenAI Embeddings API: Creating Vector Representations of Text

Learn how to generate text embeddings with OpenAI's API, understand embedding dimensions, implement batch embedding, and build practical search and similarity applications.

What Are Embeddings?

Embeddings are numerical vector representations of text that capture semantic meaning. Similar texts produce similar vectors, which makes them the foundation for semantic search, recommendation systems, clustering, classification, and retrieval-augmented generation (RAG). Instead of matching keywords, you match meaning.

OpenAI's embedding models convert any text into a fixed-length array of floating-point numbers. Two pieces of text about the same topic will have vectors that are close together in this high-dimensional space, regardless of the specific words used.

Generating Embeddings

A typical embedding pipeline has two halves: documents are chunked, embedded, tagged with metadata, and indexed; queries are embedded and matched against that index with approximate nearest neighbor search:

flowchart TD
    DOC(["Document"])
    CHUNK["Chunker<br/>recursive plus overlap"]
    EMB["Embedding model"]
    META["Attach metadata<br/>source, page, tenant"]
    INDEX[("HNSW or IVF index<br/>in vector store")]
    Q(["Query"])
    QEMB["Embed query"]
    SEARCH["ANN search<br/>cosine similarity"]
    FILTER["Metadata filter<br/>tenant or date"]
    HITS(["Top-k chunks"])
    DOC --> CHUNK --> EMB --> META --> INDEX
    Q --> QEMB --> SEARCH
    INDEX --> SEARCH --> FILTER --> HITS
    style INDEX fill:#4f46e5,stroke:#4338ca,color:#fff
    style HITS fill:#059669,stroke:#047857,color:#fff

The OpenAI Python SDK makes embedding generation straightforward:
from openai import OpenAI

client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I reset my password?",
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")

The text-embedding-3-small model produces 1536-dimensional vectors by default. The text-embedding-3-large model produces 3072-dimensional vectors with higher quality at the cost of more storage and computation.


Choosing a Model

| Model | Dimensions | Quality | Cost | Best For |
|---|---|---|---|---|
| text-embedding-3-small | 1536 | Good | Lowest | Most applications |
| text-embedding-3-large | 3072 | Highest | Higher | Precision-critical search |

Reducing Dimensions

Both models support a dimensions parameter to truncate vectors without significant quality loss:

# Reduce to 256 dimensions for faster search and less storage
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Machine learning fundamentals",
    dimensions=256,
)

embedding = response.data[0].embedding
print(f"Reduced dimensions: {len(embedding)}")  # 256

This is useful when you need to balance quality against storage cost and search speed. Reducing text-embedding-3-large to 256 dimensions still outperforms the older ada-002 model.
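If you already have full-size embeddings stored, you can also truncate them client-side rather than re-calling the API. One caveat: the API returns unit-normalized vectors, and slicing breaks that property, so the truncated vector must be re-normalized before cosine comparisons. A minimal sketch with NumPy, using a synthetic vector in place of a real API response:

```python
import numpy as np

def truncate_embedding(embedding: list[float], dim: int) -> np.ndarray:
    """Truncate a full embedding to `dim` dimensions and re-normalize to unit length."""
    v = np.array(embedding[:dim])
    return v / np.linalg.norm(v)

# Synthetic stand-in for an API response (API embeddings arrive unit-normalized)
full = np.random.default_rng(0).normal(size=3072)
full /= np.linalg.norm(full)

short = truncate_embedding(full.tolist(), 256)
print(len(short), round(float(np.linalg.norm(short)), 6))  # 256 1.0
```

Without the re-normalization step, cosine similarities computed from truncated vectors would be skewed by their varying lengths.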

Batch Embedding

Embed multiple texts in a single API call for efficiency:

documents = [
    "How do I reset my password?",
    "What are your business hours?",
    "How do I cancel my subscription?",
    "Where can I find my invoice?",
    "How do I update my payment method?",
]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=documents,
)

embeddings = [item.embedding for item in response.data]
print(f"Generated {len(embeddings)} embeddings")
print(f"Each has {len(embeddings[0])} dimensions")

The API supports up to 2048 inputs per request. For large datasets, batch your inputs into chunks.
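A small helper makes that chunking explicit. The sketch below only splits the list; in a real pipeline each chunk would be passed as `input=` to `client.embeddings.create` and the results concatenated in order. The 2048 default mirrors the per-request limit mentioned above:

```python
from collections.abc import Iterator

def batched(items: list[str], size: int = 2048) -> Iterator[list[str]]:
    """Yield successive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

docs = [f"doc {i}" for i in range(5000)]
chunks = list(batched(docs))
print([len(c) for c in chunks])  # [2048, 2048, 904]
```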

Computing Similarity

Cosine similarity is the standard metric for comparing embeddings:

import numpy as np
from openai import OpenAI

client = OpenAI()

def get_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding

def cosine_similarity(a: list[float], b: list[float]) -> float:
    a_arr = np.array(a)
    b_arr = np.array(b)
    return float(np.dot(a_arr, b_arr) / (np.linalg.norm(a_arr) * np.linalg.norm(b_arr)))

# Compare semantic similarity
query = get_embedding("How do I change my password?")
doc1 = get_embedding("Reset your password by clicking Forgot Password on the login page.")
doc2 = get_embedding("Our office is open Monday through Friday, 9 AM to 5 PM.")

print(f"Query vs password reset: {cosine_similarity(query, doc1):.4f}")
print(f"Query vs business hours: {cosine_similarity(query, doc2):.4f}")

The password reset document will score much higher despite using different words.

Building a Semantic Search Engine

Combine embeddings with cosine similarity to build a small search engine:

import numpy as np
from openai import OpenAI

client = OpenAI()

knowledge_base = [
    "To reset your password, go to Settings > Security > Change Password.",
    "Our support team is available 24/7 via chat and email.",
    "Free trials last 14 days. No credit card required.",
    "You can export your data as CSV from the Reports page.",
    "Two-factor authentication can be enabled in Security settings.",
]

# Pre-compute embeddings for all documents
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=knowledge_base,
)
doc_embeddings = np.array([item.embedding for item in response.data])

def search(query: str, top_k: int = 3) -> list[tuple[str, float]]:
    query_resp = client.embeddings.create(
        model="text-embedding-3-small",
        input=query,
    )
    query_vec = np.array(query_resp.data[0].embedding)

    similarities = np.dot(doc_embeddings, query_vec) / (
        np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_vec)
    )

    top_indices = np.argsort(similarities)[-top_k:][::-1]
    return [(knowledge_base[i], float(similarities[i])) for i in top_indices]

results = search("How do I secure my account?")
for doc, score in results:
    print(f"[{score:.4f}] {doc}")

FAQ

Should I use text-embedding-3-small or text-embedding-3-large?

Start with text-embedding-3-small for most applications. It offers excellent quality at the lowest cost. Only upgrade to text-embedding-3-large if you need the highest precision for tasks like legal document retrieval or medical record matching where subtle semantic differences matter.

How should I store embeddings in production?

For small datasets (under 100K documents), store embeddings in PostgreSQL with the pgvector extension. For larger datasets, use a dedicated vector database like Pinecone, Weaviate, or Qdrant that provides optimized approximate nearest neighbor search.

Can I compare embeddings from different models?

No. Embeddings from different models exist in different vector spaces and cannot be meaningfully compared. If you switch models, you must re-embed all your documents.


#OpenAI #Embeddings #VectorSearch #SemanticSimilarity #Python #AgenticAI #LearnAI #AIEngineering
