
Building a Whiteboard-to-Code Agent: Converting Hand-Drawn Diagrams to Working Software

Learn how to build an AI agent that recognizes hand-drawn diagrams on whiteboards, classifies shapes and connections, and generates working code including Mermaid diagrams, database schemas, and API stubs.

From Sketch to Code in Seconds

Whiteboards are where software architecture happens. Teams sketch entity-relationship diagrams, flowcharts, system architectures, and UI wireframes during design sessions. But these diagrams typically die on the whiteboard — someone takes a photo, it gets buried in a Slack thread, and the knowledge is effectively lost.

A whiteboard-to-code agent changes this. It takes a photo of a whiteboard, identifies the shapes, arrows, and text, understands the diagram type, and produces working code artifacts: Mermaid diagrams for documentation, SQL schemas for databases, API route stubs, or even class definitions.

Architecture of the Agent

The pipeline has four stages:

flowchart LR
    PHOTO(["Whiteboard<br/>photo"])
    PRE["Image<br/>preprocessing"]
    DETECT["Element<br/>detection"]
    CLASS["Semantic<br/>classification"]
    GEN["Code<br/>generation"]
    OUT(["Mermaid, SQL,<br/>API stubs"])
    PHOTO --> PRE --> DETECT --> CLASS --> GEN --> OUT
    style GEN fill:#4f46e5,stroke:#4338ca,color:#fff
    style OUT fill:#059669,stroke:#047857,color:#fff
  1. Image preprocessing — clean up whiteboard photo artifacts
  2. Element detection — find shapes (boxes, circles, diamonds) and connections (arrows, lines)
  3. Semantic classification — determine diagram type and element meanings
  4. Code generation — produce the appropriate code output
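
Wired together, the whole pipeline is one short function. This is a minimal sketch, assuming the helpers defined in the rest of this article plus a classify_diagram function like the one sketched in the FAQ below:

def whiteboard_to_code(image_path: str) -> str:
    """Run the four pipeline stages on a whiteboard photo."""
    # Stage 1: clean up glare, perspective, and background
    image = preprocess_whiteboard(image_path)

    # Stage 2: find shapes, read their labels, trace the connectors
    elements = detect_shapes(image)
    elements = extract_shape_labels(image, elements)
    connections = detect_connections(elements, image)

    # Stage 3: decide what kind of diagram this is
    diagram_type = classify_diagram(elements, connections)

    # Stage 4: emit the matching artifact
    if diagram_type == "er_diagram":
        return diagram_to_sql(elements, connections)
    return generate_mermaid(elements, connections)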

Image Preprocessing for Whiteboards

Whiteboard photos have unique challenges: glare, perspective distortion, marker color variations, and erased-but-visible ghost text:

import cv2
import numpy as np

def preprocess_whiteboard(image_path: str) -> np.ndarray:
    """Clean up a whiteboard photo for element detection."""
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")

    # Perspective correction: find the whiteboard boundary
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)

    contours, _ = cv2.findContours(
        edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )

    if contours:
        largest = max(contours, key=cv2.contourArea)
        epsilon = 0.02 * cv2.arcLength(largest, True)
        approx = cv2.approxPolyDP(largest, epsilon, True)

        if len(approx) == 4:
            pts = approx.reshape(4, 2).astype(np.float32)
            # Order corners (top-left, top-right, bottom-right,
            # bottom-left) so they line up with the destination points
            s = pts.sum(axis=1)
            d = np.diff(pts, axis=1).ravel()
            pts = np.array([
                pts[np.argmin(s)], pts[np.argmin(d)],
                pts[np.argmax(s)], pts[np.argmax(d)],
            ], dtype=np.float32)
            width, height = 1200, 900
            dst = np.array([
                [0, 0], [width, 0],
                [width, height], [0, height]
            ], dtype=np.float32)
            matrix = cv2.getPerspectiveTransform(pts, dst)
            img = cv2.warpPerspective(img, matrix, (width, height))

    # Isolate marker strokes: saturated color plus dark (black marker)
    # pixels, then paint everything else white so downstream
    # thresholding sees dark strokes on a clean background
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    color_mask = cv2.inRange(hsv, (0, 30, 0), (180, 255, 255))
    dark_mask = cv2.inRange(hsv, (0, 0, 0), (180, 255, 100))
    mask = cv2.bitwise_or(color_mask, dark_mask)
    result = img.copy()
    result[mask == 0] = 255

    return result

Shape Detection and Classification

Detect individual shapes by finding contours and classifying them based on geometry:

from dataclasses import dataclass, field
from enum import Enum

class ShapeType(Enum):
    RECTANGLE = "rectangle"
    CIRCLE = "circle"
    DIAMOND = "diamond"
    ARROW = "arrow"
    TEXT = "text"
    UNKNOWN = "unknown"

@dataclass
class DiagramElement:
    shape: ShapeType
    bbox: tuple  # (x, y, w, h)
    center: tuple  # (cx, cy)
    label: str = ""
    connections: list[int] = field(default_factory=list)

def detect_shapes(image: np.ndarray) -> list[DiagramElement]:
    """Detect and classify shapes in the preprocessed image."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)

    contours, _ = cv2.findContours(
        binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )

    elements = []
    for contour in contours:
        area = cv2.contourArea(contour)
        if area < 500:  # Skip noise
            continue

        x, y, w, h = cv2.boundingRect(contour)
        center = (x + w // 2, y + h // 2)

        # Classify based on geometry
        perimeter = cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, 0.04 * perimeter, True)
        circularity = 4 * np.pi * area / (perimeter ** 2)

        if circularity > 0.8:
            shape = ShapeType.CIRCLE
        elif len(approx) == 4:
            aspect = w / float(h)
            # A diamond fills roughly half of its axis-aligned bounding
            # box; an upright rectangle fills nearly all of it
            extent = area / (w * h)
            if 0.8 < aspect < 1.2 and extent < 0.7:
                shape = ShapeType.DIAMOND
            else:
                shape = ShapeType.RECTANGLE
        else:
            shape = ShapeType.UNKNOWN

        elements.append(DiagramElement(
            shape=shape,
            bbox=(x, y, w, h),
            center=center,
        ))

    return elements

Text Recognition Within Shapes

Extract the text label inside each detected shape:

import pytesseract
from PIL import Image

def extract_shape_labels(
    image: np.ndarray,
    elements: list[DiagramElement]
) -> list[DiagramElement]:
    """Read text inside each detected shape."""
    for i, elem in enumerate(elements):
        x, y, w, h = elem.bbox
        padding = 5
        roi = image[
            max(0, y - padding):y + h + padding,
            max(0, x - padding):x + w + padding
        ]

        # OpenCV arrays are BGR; convert to RGB before handing to PIL
        roi_pil = Image.fromarray(cv2.cvtColor(roi, cv2.COLOR_BGR2RGB))
        # --psm 6: treat the crop as a single uniform block of text
        text = pytesseract.image_to_string(
            roi_pil, config="--psm 6"
        ).strip()

        elem.label = text if text else f"Element_{i}"

    return elements

Connection Detection

Find arrows and lines that connect shapes:

def detect_connections(
    elements: list[DiagramElement],
    image: np.ndarray
) -> list[tuple[int, int]]:
    """Detect which elements are connected by arrows or lines."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)

    lines = cv2.HoughLinesP(
        edges, 1, np.pi / 180,
        threshold=50, minLineLength=30, maxLineGap=10
    )

    connections = []
    if lines is None:
        return connections

    for line in lines:
        x1, y1, x2, y2 = line[0]

        start_elem = find_nearest_element(elements, (x1, y1))
        end_elem = find_nearest_element(elements, (x2, y2))

        if (start_elem is not None and end_elem is not None
                and start_elem != end_elem):
            connections.append((start_elem, end_elem))

    return list(set(connections))

def find_nearest_element(
    elements: list[DiagramElement],
    point: tuple,
    max_dist: float = 50.0
) -> int | None:
    """Find the element closest to a given point."""
    min_dist = float("inf")
    nearest = None

    for i, elem in enumerate(elements):
        dist = np.sqrt(
            (elem.center[0] - point[0]) ** 2 +
            (elem.center[1] - point[1]) ** 2
        )
        if dist < min_dist and dist < max_dist:
            min_dist = dist
            nearest = i

    return nearest
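
One caveat: HoughLinesP also fires on the straight borders of the boxes themselves, producing spurious line segments. A rough mitigation, sketched below on the assumption that shapes have already been detected, blanks each shape's bounding box out of the edge map before line detection:

def connector_edges(
    image: np.ndarray,
    elements: list[DiagramElement],
) -> np.ndarray:
    """Edge map with detected shapes blanked out, leaving connectors."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    for elem in elements:
        x, y, w, h = elem.bbox
        # Fill a slightly padded box so shape borders don't register
        cv2.rectangle(edges, (x - 3, y - 3), (x + w + 3, y + h + 3), 0, -1)
    return edges

Feeding this edge map into cv2.HoughLinesP in place of the raw Canny output leaves mostly connector strokes.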

Generating Mermaid Diagrams

Convert the detected structure into a Mermaid diagram:

def generate_mermaid(
    elements: list[DiagramElement],
    connections: list[tuple[int, int]],
    diagram_type: str = "flowchart"
) -> str:
    """Generate Mermaid diagram syntax from detected elements."""
    lines = [f"flowchart TD"]

    # Define nodes
    for i, elem in enumerate(elements):
        label = elem.label.replace('"', "'")
        if elem.shape == ShapeType.CIRCLE:
            lines.append(f'    N{i}(("{label}"))')
        elif elem.shape == ShapeType.DIAMOND:
            lines.append(f'    N{i}{{"{label}"}}')
        else:
            lines.append(f'    N{i}["{label}"]')

    # Define connections
    for start, end in connections:
        lines.append(f"    N{start} --> N{end}")

    return "\n".join(lines)

Generating SQL Schema from ER Diagrams

When the diagram is identified as an entity-relationship diagram, generate a SQL schema:

from openai import OpenAI

def diagram_to_sql(
    elements: list[DiagramElement],
    connections: list[tuple[int, int]]
) -> str:
    """Use an LLM to generate SQL from detected ER diagram."""
    diagram_desc = "Entities:\n"
    for i, elem in enumerate(elements):
        diagram_desc += f"- {elem.label} ({elem.shape.value})\n"

    diagram_desc += "\nRelationships:\n"
    for start, end in connections:
        diagram_desc += (
            f"- {elements[start].label} -> "
            f"{elements[end].label}\n"
        )

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Convert this ER diagram description into a PostgreSQL "
                "schema. Include primary keys, foreign keys, appropriate "
                "data types, and indexes. Only output SQL, no explanation."
            )},
            {"role": "user", "content": diagram_desc},
        ],
    )

    return response.choices[0].message.content
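
One practical caveat: despite the "only output SQL" instruction, models sometimes wrap the answer in markdown fences anyway. A small, hypothetical post-processing helper keeps downstream tooling happy:

import re

def strip_sql_fences(text: str) -> str:
    """Remove markdown code fences the model may add despite instructions."""
    match = re.search(r"```(?:sql)?\s*(.*?)```", text, re.DOTALL)
    return match.group(1).strip() if match else text.strip()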

FAQ

How well does this work with messy handwriting?

The accuracy depends heavily on handwriting legibility. Block letters in dark markers on a clean whiteboard work well — expect 85-90% text recognition accuracy. Cursive or small writing drops significantly. For critical diagrams, consider having users write labels in a structured way or adding a manual correction step before code generation.
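
That correction step can be as simple as a confirmation loop over the OCR'd labels; a minimal sketch, assuming the DiagramElement list from earlier:

def review_labels(elements: list[DiagramElement]) -> list[DiagramElement]:
    """Let a human confirm or fix OCR'd labels before code generation."""
    for i, elem in enumerate(elements):
        prompt = f"[{i}] {elem.shape.value} '{elem.label}' (Enter to keep, or type a fix): "
        answer = input(prompt).strip()
        if answer:
            elem.label = answer
    return elements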

Can the agent distinguish between different diagram types automatically?

Yes, with LLM-powered classification. Send the detected shapes, their types, and connection patterns to an LLM and ask it to classify the diagram as a flowchart, ER diagram, sequence diagram, or architecture diagram. The shape distribution is a strong signal: many diamonds suggest a flowchart, all rectangles with labeled connections suggest an ER diagram.
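
A minimal sketch of that classification call, assuming the OpenAI client used in the SQL section (the category labels here are illustrative):

from collections import Counter
from openai import OpenAI

def classify_diagram(
    elements: list[DiagramElement],
    connections: list[tuple[int, int]],
) -> str:
    """Ask an LLM to name the diagram type from shape statistics."""
    shape_counts = Counter(e.shape.value for e in elements)
    summary = (
        f"Shape counts: {dict(shape_counts)}\n"
        f"Connection count: {len(connections)}\n"
        f"Labels: {[e.label for e in elements]}"
    )
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Classify this diagram as exactly one of: flowchart, "
                "er_diagram, sequence_diagram, architecture. "
                "Reply with the label only."
            )},
            {"role": "user", "content": summary},
        ],
    )
    return response.choices[0].message.content.strip()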

How do I handle diagrams with multiple colors?

Color carries semantic meaning on whiteboards — red might mean errors, green might mean success paths. Preserve color information during preprocessing and pass it to the LLM as metadata. For example, annotate each element with its dominant color so the code generator can map red paths to error handlers and green paths to success flows.
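
A rough sketch of that annotation step, using HSV heuristics; the hue ranges below are approximations tuned to common marker colors:

import cv2
import numpy as np

def dominant_color(image: np.ndarray, elem: DiagramElement) -> str:
    """Return a coarse color name for the marker ink in a shape's bbox."""
    x, y, w, h = elem.bbox
    hsv = cv2.cvtColor(image[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    # Only consider saturated pixels (ink, not whiteboard background)
    ink = hsv[hsv[..., 1] > 60]
    if ink.size == 0:
        return "black"  # no saturated pixels; likely black marker
    hue = float(np.median(ink[:, 0]))  # OpenCV hue range is 0-179
    if hue < 10 or hue > 170:
        return "red"
    if 35 <= hue <= 85:
        return "green"
    if 90 <= hue <= 130:
        return "blue"
    return "other"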


#WhiteboardAI #DiagramRecognition #CodeGeneration #MermaidJS #ComputerVision #AgenticAI #Python #SoftwareDesign
