Building a Floor Plan Analysis Agent: Room Detection, Measurement, and Description
Build an AI agent that analyzes architectural floor plans to detect rooms, classify their types, estimate areas, identify furniture, and generate natural language descriptions for real estate and interior design applications.
Why Floor Plan Analysis Matters
Real estate listings, interior design platforms, and property management systems all need structured data from floor plans. A floor plan image contains room layouts, dimensions, furniture placement, and spatial relationships — but this information is trapped in pixels. An AI agent that can parse floor plans into structured data unlocks automated property descriptions, accurate square footage calculations, and intelligent room comparisons.
The challenge is that floor plans come in wildly different styles: architectural blueprints, hand-drawn sketches, 3D-rendered marketing plans, and everything in between. A robust agent must handle this variety.
The Analysis Pipeline
The agent processes floor plans through four stages: wall detection and room segmentation, room type classification, dimension estimation, and description generation.
flowchart LR
IN(["Floor plan image"])
subgraph CV["Computer Vision"]
WALLS["Wall detection<br/>morphological ops"]
SEG["Room segmentation<br/>connected components"]
SCALE["Scale detection<br/>OCR annotations"]
end
subgraph LLM["LLM Reasoning"]
CLASS{"Room type<br/>classification"}
DESC["Description<br/>generation"]
end
subgraph OUT["Outputs"]
O1(["Structured room data"])
O2(["Square footage"])
O3(["Listing description"])
end
IN --> WALLS --> SEG --> CLASS
IN --> SCALE
SCALE -->|pixels per foot| O2
SEG --> O1
CLASS --> DESC --> O3
style IN fill:#f1f5f9,stroke:#64748b,color:#0f172a
style CLASS fill:#4f46e5,stroke:#4338ca,color:#fff
style O1 fill:#059669,stroke:#047857,color:#fff
style O2 fill:#0ea5e9,stroke:#0369a1,color:#fff
style O3 fill:#f59e0b,stroke:#d97706,color:#1f2937
Wall Detection and Room Segmentation
Walls are the structural elements that define rooms. Detect them using morphological operations:
import cv2
import numpy as np
from dataclasses import dataclass, field
@dataclass
class Room:
room_id: int
contour: np.ndarray
bbox: tuple # (x, y, w, h)
area_pixels: float
area_sqft: float = 0.0
room_type: str = "unknown"
center: tuple = (0, 0)
furniture: list[str] = field(default_factory=list)
def detect_walls(image_path: str) -> np.ndarray:
"""Detect walls in a floor plan image."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    # Binarize: walls are typically dark lines
    _, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)
# Use morphological closing to connect broken wall segments
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
walls = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
# Thicken walls slightly for better room segmentation
walls = cv2.dilate(walls, kernel, iterations=1)
return walls
def segment_rooms(wall_mask: np.ndarray) -> list[Room]:
"""Segment the floor plan into individual rooms."""
# Invert: rooms are the spaces between walls
room_spaces = cv2.bitwise_not(wall_mask)
# Remove small noise regions
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
room_spaces = cv2.morphologyEx(room_spaces, cv2.MORPH_OPEN, kernel)
# Find connected components (each is a room)
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(
room_spaces, connectivity=4
)
rooms = []
for i in range(1, num_labels): # Skip background (label 0)
area = stats[i, cv2.CC_STAT_AREA]
# Filter out very small regions (noise) and the exterior
if area < 1000 or area > 0.5 * wall_mask.size:
continue
# Create contour for this room
room_mask = (labels == i).astype(np.uint8) * 255
contours, _ = cv2.findContours(
room_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
)
if contours:
rooms.append(Room(
room_id=len(rooms),
contour=contours[0],
bbox=(
stats[i, cv2.CC_STAT_LEFT],
stats[i, cv2.CC_STAT_TOP],
stats[i, cv2.CC_STAT_WIDTH],
stats[i, cv2.CC_STAT_HEIGHT],
),
area_pixels=area,
center=(int(centroids[i][0]), int(centroids[i][1])),
))
return rooms
Scale Detection and Area Estimation
Floor plans often include a scale bar or dimension annotations. Detect these to convert pixel areas to real-world measurements:
import pytesseract
from PIL import Image
import re
def detect_scale(image_path: str) -> float:
"""Detect the scale factor (pixels per foot) from annotations."""
img = Image.open(image_path)
text = pytesseract.image_to_string(img)
# Look for dimension patterns like "10'" or "10ft" or "3m"
ft_pattern = r"(\d+)['′]"
m_pattern = r"(\d+\.?\d*)\s*m"
ft_matches = re.findall(ft_pattern, text)
m_matches = re.findall(m_pattern, text)
    if ft_matches:
        # Simplified: production code would locate the dimension line
        # in the image and measure its pixel length directly
        return estimate_scale_from_annotation(image_path, ft_matches[0])
    if m_matches:
        # Convert a metric annotation to feet (1 m = 3.28084 ft)
        return estimate_scale_from_annotation(
            image_path, str(float(m_matches[0]) * 3.28084)
        )
    return 10.0  # Fallback: assume 10 pixels per foot (rough estimate)
def estimate_scale_from_annotation(
image_path: str,
known_dimension: str,
) -> float:
"""Estimate pixels-per-foot from a known dimension annotation."""
# In production, you would locate the dimension line endpoints
# and compute: pixels_between_endpoints / dimension_in_feet
known_ft = float(known_dimension)
estimated_pixel_length = 100 # Placeholder
return estimated_pixel_length / known_ft
def calculate_room_areas(
rooms: list[Room],
pixels_per_foot: float,
) -> list[Room]:
"""Convert pixel areas to square feet."""
sqft_per_pixel = 1.0 / (pixels_per_foot ** 2)
for room in rooms:
room.area_sqft = round(room.area_pixels * sqft_per_pixel, 1)
return rooms
Room Type Classification
Classify rooms based on their size, shape, position, and any detected text labels or furniture:
from openai import OpenAI
def classify_rooms_with_context(
rooms: list[Room],
image_path: str,
) -> list[Room]:
"""Classify room types using spatial context and LLM reasoning."""
# Extract any text labels near each room
img = Image.open(image_path)
full_text = pytesseract.image_to_string(img)
room_descriptions = []
for room in rooms:
desc = (
f"Room {room.room_id}: "
f"area={room.area_sqft:.0f} sqft, "
f"dimensions={room.bbox[2]}x{room.bbox[3]} pixels, "
f"center=({room.center[0]}, {room.center[1]})"
)
room_descriptions.append(desc)
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": (
"You are a floor plan analyst. Given room measurements "
"and positions, classify each room as one of: living_room, "
"bedroom, kitchen, bathroom, dining_room, hallway, closet, "
"garage, office, laundry, entrance. Consider typical room "
"sizes: bathrooms are small (30-80 sqft), bedrooms are "
"medium (100-200 sqft), living rooms are large (200+ sqft)."
)},
{"role": "user", "content": (
f"Text found on floor plan: {full_text}\n\n"
f"Rooms:\n" + "\n".join(room_descriptions)
)},
],
)
    classifications = response.choices[0].message.content
    # Parse the response line by line so each room only picks up the
    # type stated on its own line; a plain substring check over the
    # whole response would match any room type mentioned anywhere
    room_types = [
        "living_room", "bedroom", "kitchen", "bathroom", "dining_room",
        "hallway", "closet", "garage", "office", "laundry", "entrance",
    ]
    for room in rooms:
        for line in classifications.splitlines():
            # Word boundary so "Room 1" does not match "Room 10"
            if not re.search(rf"Room {room.room_id}\b", line):
                continue
            for room_type in room_types:
                if room_type in line.lower():
                    room.room_type = room_type
                    break
            break
    return rooms
Furniture and Fixture Detection
Detect common furniture symbols in the floor plan:
FURNITURE_TEMPLATES = {
"toilet": {"min_area": 200, "max_area": 800, "aspect_range": (0.5, 1.5)},
"bathtub": {"min_area": 800, "max_area": 3000, "aspect_range": (0.3, 0.6)},
"sink": {"min_area": 100, "max_area": 500, "aspect_range": (0.7, 1.3)},
"bed": {"min_area": 2000, "max_area": 8000, "aspect_range": (0.5, 0.8)},
"table": {"min_area": 500, "max_area": 3000, "aspect_range": (0.6, 1.4)},
}
def detect_furniture(
image: np.ndarray,
rooms: list[Room],
) -> list[Room]:
"""Detect furniture symbols within each room."""
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) if len(image.shape) == 3 else image
_, binary = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY_INV)
for room in rooms:
x, y, w, h = room.bbox
room_region = binary[y:y+h, x:x+w]
contours, _ = cv2.findContours(
room_region, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE
)
for contour in contours:
area = cv2.contourArea(contour)
if area < 100:
continue
bx, by, bw, bh = cv2.boundingRect(contour)
aspect = bw / max(bh, 1)
for name, props in FURNITURE_TEMPLATES.items():
if (props["min_area"] <= area <= props["max_area"] and
props["aspect_range"][0] <= aspect <= props["aspect_range"][1]):
if name not in room.furniture:
room.furniture.append(name)
return rooms
Natural Language Description Generation
Generate listing-quality descriptions from the structured data:
def generate_property_description(rooms: list[Room]) -> str:
"""Generate a natural language property description."""
total_sqft = sum(r.area_sqft for r in rooms)
bedrooms = [r for r in rooms if r.room_type == "bedroom"]
bathrooms = [r for r in rooms if r.room_type == "bathroom"]
client = OpenAI()
room_details = "\n".join(
f"- {r.room_type.replace('_', ' ').title()}: "
f"{r.area_sqft:.0f} sqft"
f"{', with ' + ', '.join(r.furniture) if r.furniture else ''}"
for r in rooms
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": (
"Write a professional real estate listing description "
"based on these floor plan details. Be factual, "
"highlight the layout flow, and mention room sizes."
)},
{"role": "user", "content": (
f"Total area: {total_sqft:.0f} sqft\n"
f"Bedrooms: {len(bedrooms)}\n"
f"Bathrooms: {len(bathrooms)}\n\n"
f"Room details:\n{room_details}"
)},
],
)
return response.choices[0].message.content
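The formatting of the room-detail lines fed to the model can be verified without an API call. The two rooms below are hypothetical stand-ins for real detection output:

```python
# Same f-string logic as in generate_property_description()
rooms = [
    ("bedroom", 142.0, ["bed"]),
    ("bathroom", 48.0, ["toilet", "sink"]),
]
room_details = "\n".join(
    f"- {rtype.replace('_', ' ').title()}: {sqft:.0f} sqft"
    f"{', with ' + ', '.join(furn) if furn else ''}"
    for rtype, sqft, furn in rooms
)
print(room_details)
# - Bedroom: 142 sqft, with bed
# - Bathroom: 48 sqft, with toilet, sink
```

Keeping the prompt factual and structured like this gives the model less room to hallucinate amenities that the floor plan analysis never detected.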
FAQ
How accurate are the area measurements from floor plan analysis?
Pixel-based area estimation typically achieves 85-95% accuracy when a reliable scale reference is detected. The main error sources are perspective distortion in photographs of floor plans, inconsistent line weights, and scale bars that are not detected correctly. For critical measurements, always include a disclaimer that areas are estimates and should be verified by professional measurement.
Can this work on hand-drawn floor plans?
Yes, but with reduced accuracy. Hand-drawn plans have inconsistent line weights, imprecise angles, and often lack scale references. The wall detection stage needs more aggressive morphological operations, and room classification relies more heavily on text labels (which may be handwritten and harder to OCR). Expect 70-80% accuracy on room detection for clean hand-drawn plans.
How do I handle multi-story buildings?
Process each floor plan image independently, then use the LLM to identify common elements (staircases, elevators) that connect floors. Generate a combined description that references the flow between levels. The key challenge is maintaining consistent room numbering across floors.
#FloorPlanAI #RoomDetection #RealEstateAI #ComputerVision #ArchitectureAI #PropertyTech #AgenticAI #Python