Building a Floor Plan Analysis Agent: Room Detection, Measurement, and Description
Build an AI agent that analyzes architectural floor plans to detect rooms, classify their types, estimate areas, identify furniture, and generate natural language descriptions for real estate and interior design applications.
Why Floor Plan Analysis Matters
Real estate listings, interior design platforms, and property management systems all need structured data from floor plans. A floor plan image contains room layouts, dimensions, furniture placement, and spatial relationships — but this information is trapped in pixels. An AI agent that can parse floor plans into structured data unlocks automated property descriptions, accurate square footage calculations, and intelligent room comparisons.
The challenge is that floor plans come in wildly different styles: architectural blueprints, hand-drawn sketches, 3D-rendered marketing plans, and everything in between. A robust agent must handle this variety.
The Analysis Pipeline
The agent processes floor plans through four stages: wall detection and room segmentation, room type classification, dimension estimation, and description generation.
flowchart LR
IN(["Floor plan image"])
subgraph CV["Computer Vision"]
WALLS["Wall detection<br/>morphological ops"]
SEG["Room segmentation<br/>connected components"]
SCALE["Scale detection<br/>OCR annotations"]
end
subgraph LLM["LLM Reasoning"]
CLASS{"Room type<br/>classification"}
DESC["Description<br/>generation"]
end
subgraph OUT["Outputs"]
O1(["Structured room data"])
O2(["Square footage"])
O3(["Listing description"])
end
IN --> WALLS --> SEG --> CLASS
IN --> SCALE
SCALE -->|pixels per foot| O2
SEG --> O1
CLASS --> DESC --> O3
style IN fill:#f1f5f9,stroke:#64748b,color:#0f172a
style CLASS fill:#4f46e5,stroke:#4338ca,color:#fff
style O1 fill:#059669,stroke:#047857,color:#fff
style O2 fill:#0ea5e9,stroke:#0369a1,color:#fff
style O3 fill:#f59e0b,stroke:#d97706,color:#1f2937
Wall Detection and Room Segmentation
Walls are the structural elements that define rooms. Detect them using morphological operations:
import cv2
import numpy as np
from dataclasses import dataclass, field
@dataclass
class Room:
room_id: int
contour: np.ndarray
bbox: tuple # (x, y, w, h)
area_pixels: float
area_sqft: float = 0.0
room_type: str = "unknown"
center: tuple = (0, 0)
furniture: list[str] = field(default_factory=list)
def detect_walls(image_path: str) -> np.ndarray:
"""Detect walls in a floor plan image."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    # Binarize: walls are typically dark lines
    _, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)
# Use morphological closing to connect broken wall segments
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
walls = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
# Thicken walls slightly for better room segmentation
walls = cv2.dilate(walls, kernel, iterations=1)
return walls
def segment_rooms(wall_mask: np.ndarray) -> list[Room]:
"""Segment the floor plan into individual rooms."""
# Invert: rooms are the spaces between walls
room_spaces = cv2.bitwise_not(wall_mask)
# Remove small noise regions
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
room_spaces = cv2.morphologyEx(room_spaces, cv2.MORPH_OPEN, kernel)
# Find connected components (each is a room)
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(
room_spaces, connectivity=4
)
rooms = []
for i in range(1, num_labels): # Skip background (label 0)
area = stats[i, cv2.CC_STAT_AREA]
# Filter out very small regions (noise) and the exterior
if area < 1000 or area > 0.5 * wall_mask.size:
continue
# Create contour for this room
room_mask = (labels == i).astype(np.uint8) * 255
contours, _ = cv2.findContours(
room_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
)
if contours:
rooms.append(Room(
room_id=len(rooms),
contour=contours[0],
bbox=(
stats[i, cv2.CC_STAT_LEFT],
stats[i, cv2.CC_STAT_TOP],
stats[i, cv2.CC_STAT_WIDTH],
stats[i, cv2.CC_STAT_HEIGHT],
),
area_pixels=area,
center=(int(centroids[i][0]), int(centroids[i][1])),
))
return rooms
Scale Detection and Area Estimation
Floor plans often include a scale bar or dimension annotations. Detect these to convert pixel areas to real-world measurements:
import pytesseract
from PIL import Image
import re
def detect_scale(image_path: str) -> float:
"""Detect the scale factor (pixels per foot) from annotations."""
img = Image.open(image_path)
text = pytesseract.image_to_string(img)
# Look for dimension patterns like "10'" or "10ft" or "3m"
ft_pattern = r"(\d+)['′]"
m_pattern = r"(\d+\.?\d*)\s*m"
ft_matches = re.findall(ft_pattern, text)
m_matches = re.findall(m_pattern, text)
    if ft_matches:
        # Simplified: production code would locate the dimension line
        # in the image and measure its pixel length directly
        return estimate_scale_from_annotation(image_path, ft_matches[0])
    if m_matches:
        # Convert a metric annotation to feet (1 m = 3.28084 ft)
        return estimate_scale_from_annotation(
            image_path, str(float(m_matches[0]) * 3.28084)
        )
    return 10.0  # Fallback: assume 10 pixels per foot (rough estimate)
def estimate_scale_from_annotation(
image_path: str,
known_dimension: str,
) -> float:
"""Estimate pixels-per-foot from a known dimension annotation."""
# In production, you would locate the dimension line endpoints
# and compute: pixels_between_endpoints / dimension_in_feet
known_ft = float(known_dimension)
estimated_pixel_length = 100 # Placeholder
return estimated_pixel_length / known_ft
def calculate_room_areas(
rooms: list[Room],
pixels_per_foot: float,
) -> list[Room]:
"""Convert pixel areas to square feet."""
sqft_per_pixel = 1.0 / (pixels_per_foot ** 2)
for room in rooms:
room.area_sqft = round(room.area_pixels * sqft_per_pixel, 1)
return rooms
Room Type Classification
Classify rooms based on their size, shape, position, and any detected text labels or furniture:
from openai import OpenAI
def classify_rooms_with_context(
rooms: list[Room],
image_path: str,
) -> list[Room]:
"""Classify room types using spatial context and LLM reasoning."""
# Extract any text labels near each room
img = Image.open(image_path)
full_text = pytesseract.image_to_string(img)
room_descriptions = []
for room in rooms:
desc = (
f"Room {room.room_id}: "
f"area={room.area_sqft:.0f} sqft, "
f"dimensions={room.bbox[2]}x{room.bbox[3]} pixels, "
f"center=({room.center[0]}, {room.center[1]})"
)
room_descriptions.append(desc)
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": (
"You are a floor plan analyst. Given room measurements "
"and positions, classify each room as one of: living_room, "
"bedroom, kitchen, bathroom, dining_room, hallway, closet, "
"garage, office, laundry, entrance. Consider typical room "
"sizes: bathrooms are small (30-80 sqft), bedrooms are "
"medium (100-200 sqft), living rooms are large (200+ sqft)."
)},
{"role": "user", "content": (
f"Text found on floor plan: {full_text}\n\n"
f"Rooms:\n" + "\n".join(room_descriptions)
)},
],
)
    classifications = response.choices[0].message.content
    # Parse the response line by line so each room only picks up the
    # type stated on its own line; a plain substring check over the
    # whole response would match any room type mentioned anywhere
    room_types = [
        "living_room", "bedroom", "kitchen", "bathroom", "dining_room",
        "hallway", "closet", "garage", "office", "laundry", "entrance",
    ]
    for room in rooms:
        for line in classifications.splitlines():
            # Word boundary so "Room 1" does not match "Room 10"
            if not re.search(rf"Room {room.room_id}\b", line):
                continue
            for room_type in room_types:
                if room_type in line.lower():
                    room.room_type = room_type
                    break
            break
    return rooms
Furniture and Fixture Detection
Detect common furniture symbols in the floor plan:
FURNITURE_TEMPLATES = {
"toilet": {"min_area": 200, "max_area": 800, "aspect_range": (0.5, 1.5)},
"bathtub": {"min_area": 800, "max_area": 3000, "aspect_range": (0.3, 0.6)},
"sink": {"min_area": 100, "max_area": 500, "aspect_range": (0.7, 1.3)},
"bed": {"min_area": 2000, "max_area": 8000, "aspect_range": (0.5, 0.8)},
"table": {"min_area": 500, "max_area": 3000, "aspect_range": (0.6, 1.4)},
}
def detect_furniture(
image: np.ndarray,
rooms: list[Room],
) -> list[Room]:
"""Detect furniture symbols within each room."""
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) if len(image.shape) == 3 else image
_, binary = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY_INV)
for room in rooms:
x, y, w, h = room.bbox
room_region = binary[y:y+h, x:x+w]
contours, _ = cv2.findContours(
room_region, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE
)
for contour in contours:
area = cv2.contourArea(contour)
if area < 100:
continue
bx, by, bw, bh = cv2.boundingRect(contour)
aspect = bw / max(bh, 1)
for name, props in FURNITURE_TEMPLATES.items():
if (props["min_area"] <= area <= props["max_area"] and
props["aspect_range"][0] <= aspect <= props["aspect_range"][1]):
if name not in room.furniture:
room.furniture.append(name)
return rooms
Natural Language Description Generation
Generate listing-quality descriptions from the structured data:
def generate_property_description(rooms: list[Room]) -> str:
"""Generate a natural language property description."""
total_sqft = sum(r.area_sqft for r in rooms)
bedrooms = [r for r in rooms if r.room_type == "bedroom"]
bathrooms = [r for r in rooms if r.room_type == "bathroom"]
client = OpenAI()
room_details = "\n".join(
f"- {r.room_type.replace('_', ' ').title()}: "
f"{r.area_sqft:.0f} sqft"
f"{', with ' + ', '.join(r.furniture) if r.furniture else ''}"
for r in rooms
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": (
"Write a professional real estate listing description "
"based on these floor plan details. Be factual, "
"highlight the layout flow, and mention room sizes."
)},
{"role": "user", "content": (
f"Total area: {total_sqft:.0f} sqft\n"
f"Bedrooms: {len(bedrooms)}\n"
f"Bathrooms: {len(bathrooms)}\n\n"
f"Room details:\n{room_details}"
)},
],
)
return response.choices[0].message.content
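The formatting of the room-detail lines fed to the model can be verified without an API call. The two rooms below are hypothetical stand-ins for real detection output:

```python
# Same f-string logic as in generate_property_description()
rooms = [
    ("bedroom", 142.0, ["bed"]),
    ("bathroom", 48.0, ["toilet", "sink"]),
]
room_details = "\n".join(
    f"- {rtype.replace('_', ' ').title()}: {sqft:.0f} sqft"
    f"{', with ' + ', '.join(furn) if furn else ''}"
    for rtype, sqft, furn in rooms
)
print(room_details)
# - Bedroom: 142 sqft, with bed
# - Bathroom: 48 sqft, with toilet, sink
```

Keeping the prompt factual and structured like this gives the model less room to hallucinate amenities that the floor plan analysis never detected.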
FAQ
How accurate are the area measurements from floor plan analysis?
Pixel-based area estimation typically achieves 85-95% accuracy when a reliable scale reference is detected. The main error sources are perspective distortion in photographs of floor plans, inconsistent line weights, and scale bars that are not detected correctly. For critical measurements, always include a disclaimer that areas are estimates and should be verified by professional measurement.
Can this work on hand-drawn floor plans?
Yes, but with reduced accuracy. Hand-drawn plans have inconsistent line weights, imprecise angles, and often lack scale references. The wall detection stage needs more aggressive morphological operations, and room classification relies more heavily on text labels (which may be handwritten and harder to OCR). Expect 70-80% accuracy on room detection for clean hand-drawn plans.
How do I handle multi-story buildings?
Process each floor plan image independently, then use the LLM to identify common elements (staircases, elevators) that connect floors. Generate a combined description that references the flow between levels. The key challenge is maintaining consistent room numbering across floors.
#FloorPlanAI #RoomDetection #RealEstateAI #ComputerVision #ArchitectureAI #PropertyTech #AgenticAI #Python