Building a File Organization Agent: AI-Powered Document Categorization and Filing

Build an AI agent that scans directories, analyzes file content, categorizes documents by type and topic, and organizes them into a structured folder hierarchy with consistent naming conventions.

The Cost of Digital Disorganization

A typical shared drive accumulates thousands of files with names like "Final_v2_REVISED.docx" and "report copy (3).pdf." Finding the right document means searching through nested folders with inconsistent naming, duplicate files scattered across directories, and no clear taxonomy. An AI file organization agent solves this by analyzing file content, categorizing documents by type and topic, and filing them into a structured hierarchy.

This guide builds a complete file organization agent that scans directories, extracts content from multiple file types, uses an LLM for intelligent categorization, and reorganizes files with consistent naming.

Scanning and Extracting File Content

The agent needs to read content from various file types. We create extractors for the most common formats:

Figure: agent flow. User intent → Parse plus classify → Plan and tool selection → Agent loop (LLM plus tools) → Guardrails and policy. On pass: Execute and verify result → Outcome plus next action. On fail: back to the agent loop. The agent loop also emits traces and metrics.

from pathlib import Path
from dataclasses import dataclass
import mimetypes

@dataclass
class FileInfo:
    path: Path
    name: str
    extension: str
    size_bytes: int
    content_preview: str
    mime_type: str

def extract_text_content(filepath: Path, max_chars: int = 2000) -> str:
    """Extract text content from common file types."""
    ext = filepath.suffix.lower()

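    if ext in (".txt", ".md", ".csv", ".log", ".json", ".yaml", ".yml"):
        # Read only the preview instead of loading a potentially huge file into memory
        with filepath.open(encoding="utf-8", errors="replace") as f:
            return f.read(max_chars)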

    if ext == ".pdf":
        import pymupdf
        doc = pymupdf.open(str(filepath))
        text = ""
        for page in doc:
            text += page.get_text()
            if len(text) > max_chars:
                break
        doc.close()
        return text[:max_chars]

    if ext in (".docx",):
        from docx import Document
        doc = Document(str(filepath))
        text = "\n".join(p.text for p in doc.paragraphs)
        return text[:max_chars]

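    if ext == ".xlsx":
        # openpyxl cannot read the legacy .xls format; that needs another library such as xlrd
        import openpyxl
        wb = openpyxl.load_workbook(str(filepath), read_only=True)
        text = ""
        for sheet in wb.sheetnames[:3]:
            ws = wb[sheet]
            for row in ws.iter_rows(max_row=20, values_only=True):
                # Compare against None so legitimate zero values are kept
                text += " ".join(str(c) for c in row if c is not None) + "\n"
        wb.close()  # read-only workbooks hold the file handle open until closed
        return text[:max_chars]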

    return ""

def scan_directory(directory: str, recursive: bool = True) -> list[FileInfo]:
    """Scan a directory and extract file information."""
    root = Path(directory)
    pattern = "**/*" if recursive else "*"
    files = []

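    for filepath in root.glob(pattern):
        # Skip hidden files and anything inside a hidden directory (for example .git)
        hidden = any(part.startswith(".") for part in filepath.relative_to(root).parts)
        if filepath.is_file() and not hidden:
            content = extract_text_content(filepath)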
            mime, _ = mimetypes.guess_type(str(filepath))
            files.append(FileInfo(
                path=filepath,
                name=filepath.name,
                extension=filepath.suffix.lower(),
                size_bytes=filepath.stat().st_size,
                content_preview=content,
                mime_type=mime or "application/octet-stream",
            ))

    return files
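
A single corrupt PDF or password-protected spreadsheet raises inside the extractor and aborts the whole scan. One way to guard against that, sketched below, is a thin wrapper that logs the failure and falls back to an empty preview. The name `safe_extract` and its injected `extractor` argument are hypothetical and not part of the code above.

```python
import logging
from pathlib import Path
from typing import Callable

logger = logging.getLogger("file_agent")

def safe_extract(extractor: Callable[[Path], str], filepath: Path) -> str:
    """Run an extractor, returning "" instead of raising on unreadable files."""
    try:
        return extractor(filepath)
    except Exception as exc:  # document parsers raise many unrelated exception types
        logger.warning("Could not extract %s: %s", filepath, exc)
        return ""
```

Inside `scan_directory`, the call would become `content = safe_extract(extract_text_content, filepath)`, so an unreadable file is still listed and can be categorized from its name and MIME type alone.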

AI-Powered Categorization

The agent sends file metadata and content previews to an LLM for intelligent categorization. The model determines the document type, topic, and an appropriate filename:

from openai import OpenAI
import json

client = OpenAI()

CATEGORIES = {
    "contracts": "Legal agreements, NDAs, service contracts, amendments",
    "proposals": "Business proposals, RFPs, pitch decks",
    "invoices": "Invoices, receipts, purchase orders, billing statements",
    "reports": "Analytics reports, status updates, research findings",
    "correspondence": "Emails, letters, memos, meeting notes",
    "technical": "Architecture docs, API specs, runbooks, code reviews",
    "marketing": "Campaign materials, brand assets, social media content",
    "hr": "Employee records, policies, offer letters, reviews",
    "misc": "Files that do not fit other categories",
}

def categorize_file(file_info: FileInfo) -> dict:
    """Use LLM to categorize a file based on its content and metadata."""
    category_desc = "\n".join(f"- {k}: {v}" for k, v in CATEGORIES.items())

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "You categorize files. Return JSON with:\n"
                    "- category: one of the categories below\n"
                    "- subcategory: a specific subcategory (e.g., 'nda' under contracts)\n"
                    "- suggested_name: a clean descriptive filename (lowercase, hyphens, no spaces)\n"
                    "- confidence: float 0-1\n"
                    "- summary: one sentence describing the file\n\n"
                    f"Categories:\n{category_desc}"
                ),
            },
            {
                "role": "user",
                "content": (
                    f"Filename: {file_info.name}\n"
                    f"Type: {file_info.mime_type}\n"
                    f"Size: {file_info.size_bytes} bytes\n\n"
                    f"Content preview:\n{file_info.content_preview[:1500]}"
                ),
            },
        ],
    )
    return json.loads(response.choices[0].message.content)
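
The model's JSON is later turned into directory and file names, so it is worth validating before use: a category outside the list, or a name containing `/` or `..`, could place files outside the target tree. The sketch below is one possible guard. `ALLOWED_CATEGORIES`, `slugify`, and `sanitize_result` are hypothetical names, and the 80-character cap is an arbitrary assumption.

```python
import re

# Assumed to mirror the keys of CATEGORIES above
ALLOWED_CATEGORIES = {
    "contracts", "proposals", "invoices", "reports", "correspondence",
    "technical", "marketing", "hr", "misc",
}

def slugify(value, fallback: str) -> str:
    """Lowercase, collapse runs of non-alphanumerics into hyphens, trim hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", str(value or "").lower()).strip("-")
    return slug[:80].strip("-") or fallback

def sanitize_result(result: dict) -> dict:
    """Coerce the model's output into values that are safe to use in a path."""
    category = str(result.get("category", "")).strip().lower()
    if category not in ALLOWED_CATEGORIES:
        category = "misc"  # unknown categories fall back to the catch-all
    try:
        confidence = float(result.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0  # unparseable confidence is treated as "not confident"
    return {
        "category": category,
        "subcategory": slugify(result.get("subcategory"), "general"),
        "suggested_name": slugify(result.get("suggested_name"), "untitled"),
        "confidence": min(max(confidence, 0.0), 1.0),
        "summary": str(result.get("summary", "")),
    }
```

Wrapping the return value as `sanitize_result(json.loads(...))` also fills in missing keys, so downstream code can index the result without extra checks.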

Building the Folder Structure

The agent files each document under a category/subcategory/year hierarchy and appends a numeric suffix when a filename is already taken:

from datetime import datetime

def build_target_path(
    base_dir: str,
    category: str,
    subcategory: str,
    suggested_name: str,
    original_ext: str,
    year: int | None = None,
) -> Path:
    """Build a target path following the folder structure convention."""
    if year is None:
        year = datetime.now().year

    # Only compute the path here. execute_plan creates the directory when it moves
    # the file, so building a plan leaves the filesystem untouched.
    target_dir = Path(base_dir) / category / subcategory / str(year)

    filename = f"{suggested_name}{original_ext}"
    target = target_dir / filename

    # Handle name collisions
    counter = 1
    while target.exists():
        target = target_dir / f"{suggested_name}-{counter}{original_ext}"
        counter += 1

    return target
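
When `year` is omitted, every file lands in the current year's folder, including documents written long ago. If the file's own date is a better fit, the year can be derived from its modification time and passed in explicitly. The helper below is a minimal sketch; `year_from_mtime` is a hypothetical name, and modification time is only an approximation of when a document was created.

```python
from datetime import datetime
from pathlib import Path

def year_from_mtime(path: Path) -> int:
    """Return the year of the file's last modification, in local time."""
    return datetime.fromtimestamp(path.stat().st_mtime).year
```

A caller would then pass `year=year_from_mtime(file_info.path)` to `build_target_path`.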

Executing the Organization Plan

Before moving files, the agent generates a plan for human review. This prevents destructive mistakes:

import shutil
import logging

logger = logging.getLogger("file_agent")

@dataclass
class FilePlan:
    source: Path
    destination: Path
    category: str
    confidence: float
    summary: str

def create_organization_plan(
    source_dir: str, target_dir: str
) -> list[FilePlan]:
    """Scan files and create an organization plan without moving anything."""
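    files = scan_directory(source_dir)
    plan = []
    reserved: set[Path] = set()  # destinations already claimed earlier in this plan

    for file_info in files:
        result = categorize_file(file_info)
        dest = build_target_path(
            target_dir,
            result["category"],
            result.get("subcategory", "general"),
            result["suggested_name"],
            file_info.extension,
        )
        # build_target_path only checks the disk, so two files planned in the same
        # run could get the same name; keep adding a suffix until the path is free.
        stem, suffix, counter = dest.stem, dest.suffix, 1
        while dest in reserved or dest.exists():
            dest = dest.with_name(f"{stem}-{counter}{suffix}")
            counter += 1
        reserved.add(dest)
        plan.append(FilePlan(
            source=file_info.path,
            destination=dest,
            category=result["category"],
            confidence=float(result["confidence"]),
            summary=result["summary"],
        ))

    return plan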

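def execute_plan(plan: list[FilePlan], min_confidence: float = 0.7) -> None:
    """Execute the organization plan, moving files at or above the confidence threshold."""
    for item in plan:
        if item.confidence < min_confidence:
            logger.warning(f"Skipping (low confidence {item.confidence}): {item.source}")
            continue
        # Re-check at move time: the plan may be stale, and on POSIX systems
        # a move silently replaces an existing file.
        if item.destination.exists():
            logger.warning(f"Skipping (destination exists): {item.destination}")
            continue
        item.destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(item.source), str(item.destination))
        logger.info(f"Moved: {item.source.name} -> {item.destination}")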

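The confidence threshold leaves files the model is unsure about untouched for manual review. The score is self-reported by the model rather than a calibrated probability, so start with a strict threshold like 0.85 (instead of the 0.7 default above) and lower it as you validate accuracy.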
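
For the human review step, it helps to render the plan as text before anything moves. The sketch below prints one line per file and marks which entries fall below the threshold. `format_plan` is a hypothetical helper, and `FilePlan` is redefined here only so the example runs on its own.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class FilePlan:  # mirrors the dataclass defined above
    source: Path
    destination: Path
    category: str
    confidence: float
    summary: str

def format_plan(plan: list[FilePlan], min_confidence: float = 0.7) -> str:
    """Render the plan as one line per file, marking what would be skipped."""
    lines = []
    for item in plan:
        action = "MOVE" if item.confidence >= min_confidence else "SKIP"
        lines.append(
            f"{action}  {item.confidence:.2f}  {item.source} -> {item.destination}"
        )
    return "\n".join(lines)
```

Calling `print(format_plan(plan, min_confidence=0.85))` before `execute_plan` gives the reviewer the same move/skip decision the executor will make.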

FAQ

How do I handle duplicate files during organization?

Compute a SHA-256 hash of each file's content before moving. Maintain a hash-to-path mapping and flag duplicates. Let the user choose which copy to keep. For near-duplicates like different versions of the same document, compare filenames and modification dates to identify the most recent version.
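
A minimal sketch of the hashing step, using only the standard library. The function names `file_sha256` and `find_duplicates` are hypothetical. Files are read in chunks so large files do not have to fit in memory.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's bytes in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(paths: list[Path]) -> dict[str, list[Path]]:
    """Group paths by content hash and keep only groups with more than one file."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for p in paths:
        by_hash[file_sha256(p)].append(p)
    return {h: ps for h, ps in by_hash.items() if len(ps) > 1}
```

Each returned group lists byte-identical files, which can be shown to the user before any of them is moved.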

What about files the AI cannot read, like images or videos?

For images, use an LLM with vision capabilities to describe the content. For videos, extract metadata like duration and codec using ffprobe. Fall back to filename analysis and file extension when content extraction is impossible. These files typically end up in a media category with subcategories based on metadata.

How do I undo a batch organization if something goes wrong?

Log every completed move with its source and destination paths in a JSON manifest file. To undo, read the manifest and reverse each move, newest first. The plan-then-execute pattern helps here: the plan already lists every source and destination, so the manifest only has to record which of those moves actually ran.
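
One way to sketch the manifest and the undo step is shown below. It assumes a JSON Lines file (one JSON object per line) so that each move can be appended as it happens. `record_move` and `undo_moves` are hypothetical names.

```python
import json
import shutil
from pathlib import Path

def record_move(manifest: Path, source: Path, destination: Path) -> None:
    """Append one completed move to a JSON Lines manifest."""
    entry = {"source": str(source), "destination": str(destination)}
    with manifest.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

def undo_moves(manifest: Path) -> int:
    """Reverse recorded moves, newest first; return how many files were restored."""
    lines = manifest.read_text(encoding="utf-8").splitlines()
    entries = [json.loads(line) for line in lines if line.strip()]
    restored = 0
    for entry in reversed(entries):
        src, dest = Path(entry["source"]), Path(entry["destination"])
        # Only restore when the moved file is still there and the original slot is free
        if dest.exists() and not src.exists():
            src.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(dest), str(src))
            restored += 1
    return restored
```

Calling `record_move` right after each successful `shutil.move` keeps the manifest accurate even if a batch stops halfway.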

