Project Requirements Document: Video Search Desktop Client for macOS


1. Introduction

Project Name: Video Search macOS Client
Version: 1.0.0
Objective: Develop a native macOS application for offline video content retrieval using OCR and AI-powered indexing. The tool enables rapid keyword-based searches within locally stored videos, supporting English and Chinese text recognition without internet dependency.


2. Project Overview

  • Core Functionality:
    • Offline video indexing and search via OCR.
    • Support for Chinese (Simplified) and English text extraction.
    • Keyword-based timestamped video snippet retrieval.
  • Target Users: Students, researchers, content creators, and professionals handling large local video libraries.

3. Functional Requirements

ID    | Requirement          | Description
FR-01 | Video Indexing       | Automatically scan and index user-specified directories for MP4, MOV, and AVI files.
FR-02 | OCR Text Extraction  | Extract text from video frames using Tesseract OCR (v5.3.0) with the LSTM engine.
FR-03 | Multilingual Support | Recognize Simplified Chinese (chi_sim) and English (eng) via trained language data.
FR-04 | Keyword Search       | Return video snippets with timestamps matching user queries (e.g., "slide 12").
FR-05 | Results Display      | Show clickable video thumbnails with highlighted keywords and playback controls.
FR-06 | User Preferences     | Customize scan intervals, file exclusions, and OCR sensitivity.
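
FR-04 and FR-05 together imply a result model that carries the matched video, the timestamp, and the recognized text. A minimal Swift sketch follows; the type and field names are illustrative only and are not mandated by this document.

```swift
import Foundation

// Illustrative result model for FR-04/FR-05: one keyword hit inside an indexed video.
struct SearchResult: Identifiable, Codable {
    let id: UUID                 // stable identity for SwiftUI lists
    let videoPath: String        // absolute path of the matched video file
    let timestamp: TimeInterval  // seconds from the start of the video
    let matchedText: String      // OCR line containing the query keyword
    let language: String         // "eng" or "chi_sim"
}
```

The results view (FR-05) can render an array of these directly in a SwiftUI list, with `timestamp` driving the player's seek position.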

4. Non-Functional Requirements

  • Performance:
    • Indexing: Process 1 hour of 1080p video in ≤10 minutes (M1 Pro CPU).
    • Search: Return results for 10,000 indexed videos in <500ms.
  • Security:
    • All data processed locally; zero network transmission.
    • Sandboxed execution via the macOS App Sandbox; no network entitlements are requested.
  • Scalability:
    • Support libraries up to 10 TB via incremental indexing (see the change-detection sketch after this list).
  • Compatibility:
    • macOS Monterey (12.0) or later, on Apple Silicon (arm64) and Intel (x86_64) Macs.
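
The scalability target relies on re-processing a file only when it has changed on disk. A minimal, non-normative sketch of that change check is shown below; the function and parameter names are hypothetical, and the last-indexed date is assumed to come from the metadata store.

```swift
import Foundation

/// Incremental indexing sketch: re-index a file only if it changed since the
/// last recorded pass. `lastIndexed` would be read from the SQLite metadata store.
func needsReindex(fileURL: URL, lastIndexed: Date?) -> Bool {
    guard let lastIndexed = lastIndexed else { return true }        // never indexed
    let attrs = try? FileManager.default.attributesOfItem(atPath: fileURL.path)
    guard let modified = attrs?[.modificationDate] as? Date else { return true }
    return modified > lastIndexed                                   // changed on disk
}

/// Walk a user-selected directory and collect only the videos that need work.
func filesToIndex(in directory: URL, lastIndexedDates: [String: Date]) -> [URL] {
    let extensions = ["mp4", "mov", "avi"]
    let enumerator = FileManager.default.enumerator(at: directory,
                                                    includingPropertiesForKeys: nil)
    var result: [URL] = []
    while let item = enumerator?.nextObject() as? URL {
        guard extensions.contains(item.pathExtension.lowercased()) else { continue }
        if needsReindex(fileURL: item, lastIndexed: lastIndexedDates[item.path]) {
            result.append(item)
        }
    }
    return result
}
```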

5. Technical Specifications

Component          | Technology / Tool                      | Version     | Rationale
Core Framework     | SwiftUI                                | Swift 5     | Native macOS UI with declarative syntax.
OCR Engine         | Tesseract OCR + OpenCV (preprocessing) | 5.3.0       | High-accuracy multilingual text recognition.
Video Processing   | AVFoundation + Core ML                 | macOS 12.0+ | Hardware-accelerated decoding and frame extraction.
Database           | SQLite (embedded)                      | 3.38        | Lightweight storage for indexed metadata.
Concurrency        | Grand Central Dispatch (GCD)           | -           | Background indexing without UI lag.
Dependency Manager | Swift Package Manager (SPM)            | 5.7         | Native Apple ecosystem integration.
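
As a non-normative illustration of the SQLite row above (and of the metadata required by FR-01/FR-02), the sketch below opens the index database through the system SQLite3 module. Table and column names are illustrative and not mandated by this document.

```swift
import Foundation
import SQLite3   // system SQLite shipped with macOS

/// Open (or create) the index database and create the metadata tables.
func openIndexDatabase(at path: String) -> OpaquePointer? {
    var db: OpaquePointer?
    guard sqlite3_open(path, &db) == SQLITE_OK else { return nil }

    let schema = """
    CREATE TABLE IF NOT EXISTS videos (
        id         INTEGER PRIMARY KEY,
        path       TEXT UNIQUE NOT NULL,   -- absolute file path
        duration   REAL,                   -- seconds
        indexed_at REAL                    -- Unix timestamp of last indexing pass
    );
    CREATE TABLE IF NOT EXISTS ocr_text (
        video_id   INTEGER NOT NULL REFERENCES videos(id),
        timestamp  REAL NOT NULL,          -- frame time in seconds
        language   TEXT,                   -- 'eng' or 'chi_sim'
        text       TEXT NOT NULL           -- recognized line of text
    );
    -- A plain index helps exact/prefix lookups; an FTS5 virtual table could
    -- replace it if full-text keyword search over CJK text is needed.
    CREATE INDEX IF NOT EXISTS idx_ocr_text ON ocr_text(text);
    """
    sqlite3_exec(db, schema, nil, nil, nil)
    return db
}
```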

6. Implementation Steps

  1. Phase 1: Setup & Core Architecture (2 Weeks)

    • Initialize Xcode project (SwiftUI + MVVM pattern).
    • Integrate Tesseract OCR via Swift wrappers (SwiftyTesseract).
    • Configure SQLite schema for metadata (video paths, timestamps, OCR text).
  2. Phase 2: Video Processing Module (3 Weeks)

    • Implement an AVFoundation pipeline for frame extraction at 1 frame/sec (see the frame-extraction sketch at the end of this section).
    • Preprocess frames with OpenCV (grayscale + contrast enhancement).
    • Run OCR via Tesseract; store results in SQLite.
  3. Phase 3: Search & UI (2 Weeks)

    • Build an inverted index for keywords (e.g., "meeting" → [videoID, timestamp]); see the search sketch at the end of this section.
    • Develop SwiftUI search view with filters (date, duration, language).
    • Integrate video player using AVKit.
  4. Phase 4: Optimization & Testing (3 Weeks)

    • Profile performance with Instruments (CPU/GPU usage).
    • Validate OCR accuracy: ≥95% F1-score for Chinese/English.
    • Conduct user acceptance testing (UAT) with target personas.
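
Frame-extraction sketch (Phase 2, non-normative): one frame per second can be sampled with AVAssetImageGenerator, as below. The OCR hand-off is left to the caller because the exact Tesseract wrapper API is not specified by this document.

```swift
import AVFoundation

/// Pull one frame per second from a video and hand each frame, with its
/// timestamp, to the next stage (OpenCV preprocessing + Tesseract OCR).
func extractFrames(from url: URL,
                   handler: @escaping (_ image: CGImage, _ second: Double) -> Void) {
    let asset = AVURLAsset(url: url)
    let generator = AVAssetImageGenerator(asset: asset)
    generator.appliesPreferredTrackTransform = true
    // Keep requested and actual times close so stored timestamps stay accurate.
    generator.requestedTimeToleranceBefore = .zero
    generator.requestedTimeToleranceAfter = CMTime(value: 1, timescale: 2)

    let seconds = Int(CMTimeGetSeconds(asset.duration))
    let times = (0..<max(seconds, 1)).map {
        NSValue(time: CMTime(value: CMTimeValue($0), timescale: 1))
    }
    generator.generateCGImagesAsynchronously(forTimes: times) { requested, image, _, result, _ in
        guard result == .succeeded, let image = image else { return }
        handler(image, CMTimeGetSeconds(requested))
    }
}
```

Each (image, second) pair would then flow through OpenCV preprocessing and Tesseract, with the recognized text inserted into the ocr_text table sketched in Section 5.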
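
Search sketch (Phase 3, non-normative): the simplest starting point is a parameterized query over the illustrative ocr_text table from Section 5; a dedicated keyword → (videoID, timestamp) inverted index or an SQLite FTS5 table can replace it once CJK tokenization becomes a concern.

```swift
import Foundation
import SQLite3

/// Look up a keyword in the index and return (path, timestamp) pairs.
func search(db: OpaquePointer, keyword: String) -> [(path: String, second: Double)] {
    let sql = """
    SELECT v.path, t.timestamp
    FROM ocr_text t JOIN videos v ON v.id = t.video_id
    WHERE t.text LIKE '%' || ? || '%'
    ORDER BY v.path, t.timestamp;
    """
    var stmt: OpaquePointer?
    guard sqlite3_prepare_v2(db, sql, -1, &stmt, nil) == SQLITE_OK else { return [] }
    defer { sqlite3_finalize(stmt) }

    // SQLITE_TRANSIENT tells SQLite to copy the bound string.
    let transient = unsafeBitCast(-1, to: sqlite3_destructor_type.self)
    sqlite3_bind_text(stmt, 1, keyword, -1, transient)

    var hits: [(path: String, second: Double)] = []
    while sqlite3_step(stmt) == SQLITE_ROW {
        let path = String(cString: sqlite3_column_text(stmt, 0))
        let second = sqlite3_column_double(stmt, 1)
        hits.append((path: path, second: second))
    }
    return hits
}
```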

7. Security & Compliance

  • Data Privacy:
    • All processing occurs in the user’s sandboxed environment.
    • No third-party data sharing; zero telemetry.
  • Permissions:
    • Explicit user consent for file and folder access (macOS privacy prompts via NSOpenPanel and security-scoped bookmarks).
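
In a sandboxed app, folder access flows through an NSOpenPanel selection plus a security-scoped bookmark so that consent persists across launches. The sketch below is illustrative only; it assumes the user-selected-file entitlement is enabled, and UserDefaults is used purely for brevity.

```swift
import AppKit

/// Ask the user to pick the video library folder and persist scoped access.
func requestLibraryFolder() -> URL? {
    let panel = NSOpenPanel()
    panel.canChooseDirectories = true
    panel.canChooseFiles = false
    panel.prompt = "Index This Folder"
    guard panel.runModal() == .OK, let url = panel.url else { return nil }

    // Persist consent across launches with a security-scoped bookmark.
    if let bookmark = try? url.bookmarkData(options: .withSecurityScope,
                                            includingResourceValuesForKeys: nil,
                                            relativeTo: nil) {
        UserDefaults.standard.set(bookmark, forKey: "libraryBookmark")
    }

    // Callers must balance this with stopAccessingSecurityScopedResource().
    _ = url.startAccessingSecurityScopedResource()
    return url
}
```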

8. Future Extensions

  • Add support for Japanese/Korean OCR via Tesseract language packs.
  • Integrate Core ML for scene detection (e.g., "whiteboard," "person").
  • Export search results as SRT subtitles or CSV.

Approval: Pending review by Engineering Lead & Product Owner.