
Building a Whiteboard-to-Code Agent: Converting Hand-Drawn Diagrams to Working Software

Learn how to build an AI agent that recognizes hand-drawn diagrams on whiteboards, classifies shapes and connections, and generates working code including Mermaid diagrams, database schemas, and API stubs.

From Sketch to Code in Seconds

Whiteboards are where software architecture happens. Teams sketch entity-relationship diagrams, flowcharts, system architectures, and UI wireframes during design sessions. But these diagrams typically die on the whiteboard — someone takes a photo, it gets buried in a Slack thread, and the knowledge is effectively lost.

A whiteboard-to-code agent changes this. It takes a photo of a whiteboard, identifies the shapes, arrows, and text, understands the diagram type, and produces working code artifacts: Mermaid diagrams for documentation, SQL schemas for databases, API route stubs, or even class definitions.

Architecture of the Agent

The pipeline has four stages:

flowchart LR
    PHOTO(["Whiteboard<br/>photo"])
    PRE["Image<br/>preprocessing"]
    DETECT["Element<br/>detection"]
    CLASS["Semantic<br/>classification"]
    GEN["Code<br/>generation"]
    OUT(["Mermaid, SQL,<br/>API stubs"])
    PHOTO --> PRE --> DETECT --> CLASS --> GEN --> OUT
    style GEN fill:#4f46e5,stroke:#4338ca,color:#fff
    style OUT fill:#059669,stroke:#047857,color:#fff
  1. Image preprocessing — clean up whiteboard photo artifacts
  2. Element detection — find shapes (boxes, circles, diamonds) and connections (arrows, lines)
  3. Semantic classification — determine diagram type and element meanings
  4. Code generation — produce the appropriate code output
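
Wired together, the whole pipeline is one short function. This is a minimal sketch, assuming the helpers defined in the rest of this article plus a classify_diagram function like the one sketched in the FAQ below:

def whiteboard_to_code(image_path: str) -> str:
    """Run the four pipeline stages on a whiteboard photo."""
    # Stage 1: clean up glare, perspective, and background
    image = preprocess_whiteboard(image_path)

    # Stage 2: find shapes, read their labels, trace the connectors
    elements = detect_shapes(image)
    elements = extract_shape_labels(image, elements)
    connections = detect_connections(elements, image)

    # Stage 3: decide what kind of diagram this is
    diagram_type = classify_diagram(elements, connections)

    # Stage 4: emit the matching artifact
    if diagram_type == "er_diagram":
        return diagram_to_sql(elements, connections)
    return generate_mermaid(elements, connections)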

Image Preprocessing for Whiteboards

Whiteboard photos have unique challenges: glare, perspective distortion, marker color variations, and erased-but-visible ghost text:

import cv2
import numpy as np

def preprocess_whiteboard(image_path: str) -> np.ndarray:
    """Clean up a whiteboard photo for element detection."""
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")

    # Perspective correction: find the whiteboard boundary
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)

    contours, _ = cv2.findContours(
        edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )

    if contours:
        largest = max(contours, key=cv2.contourArea)
        epsilon = 0.02 * cv2.arcLength(largest, True)
        approx = cv2.approxPolyDP(largest, epsilon, True)

        if len(approx) == 4:
            pts = approx.reshape(4, 2).astype(np.float32)
            # Order corners (top-left, top-right, bottom-right,
            # bottom-left) so they line up with the destination points
            s = pts.sum(axis=1)
            d = np.diff(pts, axis=1).ravel()
            pts = np.array([
                pts[np.argmin(s)], pts[np.argmin(d)],
                pts[np.argmax(s)], pts[np.argmax(d)],
            ], dtype=np.float32)
            width, height = 1200, 900
            dst = np.array([
                [0, 0], [width, 0],
                [width, height], [0, height]
            ], dtype=np.float32)
            matrix = cv2.getPerspectiveTransform(pts, dst)
            img = cv2.warpPerspective(img, matrix, (width, height))

    # Isolate marker strokes: saturated color plus dark (black marker)
    # pixels, then paint everything else white so downstream
    # thresholding sees dark strokes on a clean background
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    color_mask = cv2.inRange(hsv, (0, 30, 0), (180, 255, 255))
    dark_mask = cv2.inRange(hsv, (0, 0, 0), (180, 255, 100))
    mask = cv2.bitwise_or(color_mask, dark_mask)
    result = img.copy()
    result[mask == 0] = 255

    return result

Shape Detection and Classification

Detect individual shapes by finding contours and classifying them based on geometry:

from dataclasses import dataclass, field
from enum import Enum

class ShapeType(Enum):
    RECTANGLE = "rectangle"
    CIRCLE = "circle"
    DIAMOND = "diamond"
    ARROW = "arrow"
    TEXT = "text"
    UNKNOWN = "unknown"

@dataclass
class DiagramElement:
    shape: ShapeType
    bbox: tuple  # (x, y, w, h)
    center: tuple  # (cx, cy)
    label: str = ""
    connections: list[int] = field(default_factory=list)

def detect_shapes(image: np.ndarray) -> list[DiagramElement]:
    """Detect and classify shapes in the preprocessed image."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)

    contours, _ = cv2.findContours(
        binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )

    elements = []
    for contour in contours:
        area = cv2.contourArea(contour)
        if area < 500:  # Skip noise
            continue

        x, y, w, h = cv2.boundingRect(contour)
        center = (x + w // 2, y + h // 2)

        # Classify based on geometry
        perimeter = cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, 0.04 * perimeter, True)
        circularity = 4 * np.pi * area / (perimeter ** 2)

        if circularity > 0.8:
            shape = ShapeType.CIRCLE
        elif len(approx) == 4:
            aspect = w / float(h)
            # A diamond fills roughly half of its axis-aligned bounding
            # box; an upright rectangle fills nearly all of it
            extent = area / (w * h)
            if 0.8 < aspect < 1.2 and extent < 0.7:
                shape = ShapeType.DIAMOND
            else:
                shape = ShapeType.RECTANGLE
        else:
            shape = ShapeType.UNKNOWN

        elements.append(DiagramElement(
            shape=shape,
            bbox=(x, y, w, h),
            center=center,
        ))

    return elements

Text Recognition Within Shapes

Extract the text label inside each detected shape:

import pytesseract
from PIL import Image

def extract_shape_labels(
    image: np.ndarray,
    elements: list[DiagramElement]
) -> list[DiagramElement]:
    """Read text inside each detected shape."""
    for i, elem in enumerate(elements):
        x, y, w, h = elem.bbox
        padding = 5
        roi = image[
            max(0, y - padding):y + h + padding,
            max(0, x - padding):x + w + padding
        ]

        # OpenCV arrays are BGR; convert to RGB before handing to PIL
        roi_pil = Image.fromarray(cv2.cvtColor(roi, cv2.COLOR_BGR2RGB))
        # --psm 6: treat the crop as a single uniform block of text
        text = pytesseract.image_to_string(
            roi_pil, config="--psm 6"
        ).strip()

        elem.label = text if text else f"Element_{i}"

    return elements

Connection Detection

Find arrows and lines that connect shapes:

def detect_connections(
    elements: list[DiagramElement],
    image: np.ndarray
) -> list[tuple[int, int]]:
    """Detect which elements are connected by arrows or lines."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)

    lines = cv2.HoughLinesP(
        edges, 1, np.pi / 180,
        threshold=50, minLineLength=30, maxLineGap=10
    )

    connections = []
    if lines is None:
        return connections

    for line in lines:
        x1, y1, x2, y2 = line[0]

        start_elem = find_nearest_element(elements, (x1, y1))
        end_elem = find_nearest_element(elements, (x2, y2))

        if (start_elem is not None and end_elem is not None
                and start_elem != end_elem):
            connections.append((start_elem, end_elem))

    return list(set(connections))

def find_nearest_element(
    elements: list[DiagramElement],
    point: tuple,
    max_dist: float = 50.0
) -> int | None:
    """Find the element closest to a given point."""
    min_dist = float("inf")
    nearest = None

    for i, elem in enumerate(elements):
        dist = np.sqrt(
            (elem.center[0] - point[0]) ** 2 +
            (elem.center[1] - point[1]) ** 2
        )
        if dist < min_dist and dist < max_dist:
            min_dist = dist
            nearest = i

    return nearest
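
One caveat: HoughLinesP also fires on the straight borders of the boxes themselves, producing spurious line segments. A rough mitigation, sketched below on the assumption that shapes have already been detected, blanks each shape's bounding box out of the edge map before line detection:

def connector_edges(
    image: np.ndarray,
    elements: list[DiagramElement],
) -> np.ndarray:
    """Edge map with detected shapes blanked out, leaving connectors."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    for elem in elements:
        x, y, w, h = elem.bbox
        # Fill a slightly padded box so shape borders don't register
        cv2.rectangle(edges, (x - 3, y - 3), (x + w + 3, y + h + 3), 0, -1)
    return edges

Feeding this edge map into cv2.HoughLinesP in place of the raw Canny output leaves mostly connector strokes.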

Generating Mermaid Diagrams

Convert the detected structure into a Mermaid diagram:

def generate_mermaid(
    elements: list[DiagramElement],
    connections: list[tuple[int, int]],
    diagram_type: str = "flowchart"
) -> str:
    """Generate Mermaid diagram syntax from detected elements."""
    lines = [f"flowchart TD"]

    # Define nodes
    for i, elem in enumerate(elements):
        label = elem.label.replace('"', "'")
        if elem.shape == ShapeType.CIRCLE:
            lines.append(f'    N{i}(("{label}"))')
        elif elem.shape == ShapeType.DIAMOND:
            lines.append(f'    N{i}{{"{label}"}}')
        else:
            lines.append(f'    N{i}["{label}"]')

    # Define connections
    for start, end in connections:
        lines.append(f"    N{start} --> N{end}")

    return "\n".join(lines)

Generating SQL Schema from ER Diagrams

When the diagram is identified as an entity-relationship diagram, generate a SQL schema:

from openai import OpenAI

def diagram_to_sql(
    elements: list[DiagramElement],
    connections: list[tuple[int, int]]
) -> str:
    """Use an LLM to generate SQL from detected ER diagram."""
    diagram_desc = "Entities:\n"
    for i, elem in enumerate(elements):
        diagram_desc += f"- {elem.label} ({elem.shape.value})\n"

    diagram_desc += "\nRelationships:\n"
    for start, end in connections:
        diagram_desc += (
            f"- {elements[start].label} -> "
            f"{elements[end].label}\n"
        )

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Convert this ER diagram description into a PostgreSQL "
                "schema. Include primary keys, foreign keys, appropriate "
                "data types, and indexes. Only output SQL, no explanation."
            )},
            {"role": "user", "content": diagram_desc},
        ],
    )

    return response.choices[0].message.content
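
One practical caveat: despite the "only output SQL" instruction, models sometimes wrap the answer in markdown fences anyway. A small, hypothetical post-processing helper keeps downstream tooling happy:

import re

def strip_sql_fences(text: str) -> str:
    """Remove markdown code fences the model may add despite instructions."""
    match = re.search(r"```(?:sql)?\s*(.*?)```", text, re.DOTALL)
    return match.group(1).strip() if match else text.strip()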

FAQ

How well does this work with messy handwriting?

The accuracy depends heavily on handwriting legibility. Block letters in dark markers on a clean whiteboard work well — expect 85-90% text recognition accuracy. Cursive or small writing drops significantly. For critical diagrams, consider having users write labels in a structured way or adding a manual correction step before code generation.
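
That correction step can be as simple as a confirmation loop over the OCR'd labels; a minimal sketch, assuming the DiagramElement list from earlier:

def review_labels(elements: list[DiagramElement]) -> list[DiagramElement]:
    """Let a human confirm or fix OCR'd labels before code generation."""
    for i, elem in enumerate(elements):
        prompt = f"[{i}] {elem.shape.value} '{elem.label}' (Enter to keep, or type a fix): "
        answer = input(prompt).strip()
        if answer:
            elem.label = answer
    return elements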

Can the agent distinguish between different diagram types automatically?

Yes, with LLM-powered classification. Send the detected shapes, their types, and connection patterns to an LLM and ask it to classify the diagram as a flowchart, ER diagram, sequence diagram, or architecture diagram. The shape distribution is a strong signal: many diamonds suggest a flowchart, all rectangles with labeled connections suggest an ER diagram.
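
A minimal sketch of that classification call, assuming the OpenAI client used in the SQL section (the category labels here are illustrative):

from collections import Counter
from openai import OpenAI

def classify_diagram(
    elements: list[DiagramElement],
    connections: list[tuple[int, int]],
) -> str:
    """Ask an LLM to name the diagram type from shape statistics."""
    shape_counts = Counter(e.shape.value for e in elements)
    summary = (
        f"Shape counts: {dict(shape_counts)}\n"
        f"Connection count: {len(connections)}\n"
        f"Labels: {[e.label for e in elements]}"
    )
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Classify this diagram as exactly one of: flowchart, "
                "er_diagram, sequence_diagram, architecture. "
                "Reply with the label only."
            )},
            {"role": "user", "content": summary},
        ],
    )
    return response.choices[0].message.content.strip()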

How do I handle diagrams with multiple colors?

Color carries semantic meaning on whiteboards — red might mean errors, green might mean success paths. Preserve color information during preprocessing and pass it to the LLM as metadata. For example, annotate each element with its dominant color so the code generator can map red paths to error handlers and green paths to success flows.
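
A rough sketch of that annotation step, using HSV heuristics; the hue ranges below are approximations tuned to common marker colors:

import cv2
import numpy as np

def dominant_color(image: np.ndarray, elem: DiagramElement) -> str:
    """Return a coarse color name for the marker ink in a shape's bbox."""
    x, y, w, h = elem.bbox
    hsv = cv2.cvtColor(image[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    # Only consider saturated pixels (ink, not whiteboard background)
    ink = hsv[hsv[..., 1] > 60]
    if ink.size == 0:
        return "black"  # no saturated pixels; likely black marker
    hue = float(np.median(ink[:, 0]))  # OpenCV hue range is 0-179
    if hue < 10 or hue > 170:
        return "red"
    if 35 <= hue <= 85:
        return "green"
    if 90 <= hue <= 130:
        return "blue"
    return "other"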


#WhiteboardAI #DiagramRecognition #CodeGeneration #MermaidJS #ComputerVision #AgenticAI #Python #SoftwareDesign
