Project Requirements Document: Video Search Desktop Client for macOS
1. Introduction
Project Name: Video Search macOS Client
Version: 1.0.0
Objective: Develop a native macOS application for offline video content retrieval using OCR and AI-powered indexing. The tool enables rapid keyword-based searches within locally stored videos, supporting English and Chinese text recognition without internet dependency.
2. Project Overview
- Core Functionality:
  - Offline video indexing and search via OCR.
  - Support for Chinese (Simplified) and English text extraction.
  - Keyword-based retrieval of timestamped video snippets.
- Target Users: Students, researchers, content creators, and professionals handling large local video libraries.
3. Functional Requirements
ID | Requirement | Description |
---|---|---|
FR-01 | Video Indexing | Automatically scan and index user-specified directories for MP4, MOV, AVI files. |
FR-02 | OCR Text Extraction | Extract text from video frames using Tesseract OCR (v5.3.0) with LSTM engines. |
FR-03 | Multilingual Support | Recognize Chinese (chi_sim) and English (eng) via trained language models. |
FR-04 | Keyword Search | Return video snippets with timestamps matching user queries (e.g., "slide 12"). |
FR-05 | Results Display | Show clickable video thumbnails with highlighted keywords and playback controls. |
FR-06 | User Preferences | Customize scan intervals, file exclusions, and OCR sensitivity. |
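As an illustration of FR-01, the directory scan can be sketched with Foundation alone. This is a minimal sketch, not the project's implementation: `supportedExtensions`, `isSupportedVideo`, and `findVideos` are hypothetical names, and the check looks only at the file extension, without verifying that the file is a decodable video.

```swift
import Foundation

// Extensions listed in FR-01, compared case-insensitively.
let supportedExtensions: Set<String> = ["mp4", "mov", "avi"]

/// Returns true when the URL's extension is one of the FR-01 formats.
func isSupportedVideo(_ url: URL) -> Bool {
    supportedExtensions.contains(url.pathExtension.lowercased())
}

/// Recursively collects candidate video files under `root`.
/// Hypothetical helper: it matches on extension only.
func findVideos(under root: URL) -> [URL] {
    guard let enumerator = FileManager.default.enumerator(
        at: root,
        includingPropertiesForKeys: nil,
        options: [.skipsHiddenFiles]
    ) else { return [] }

    var results: [URL] = []
    for case let fileURL as URL in enumerator where isSupportedVideo(fileURL) {
        results.append(fileURL)
    }
    return results
}
```

The file exclusions from FR-06 could be applied as an extra filter on the returned URLs.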
4. Non-Functional Requirements
- Performance:
  - Indexing: Process 1 hour of 1080p video in ≤10 minutes on an M1 Pro.
  - Search: Return results across a library of 10,000 indexed videos in <500 ms.
- Security:
  - All data processed locally; zero network transmission.
  - Sandboxed execution via the macOS App Sandbox, with no network entitlements.
- Scalability:
  - Support libraries up to 10 TB via incremental indexing.
- Compatibility:
  - macOS Monterey (12.0) or later on Apple Silicon (arm64) and Intel (x86_64).
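Read together with the 1 frame/sec sampling rate planned in Phase 2, the indexing target implies a time budget per sampled frame. The arithmetic is sketched below; `perFrameBudget` is a hypothetical helper, and the calculation assumes frames are processed one at a time with no parallelism.

```swift
/// Seconds available per sampled frame for decoding, preprocessing, and OCR.
func perFrameBudget(videoSeconds: Double,
                    framesPerSecond: Double,
                    indexingSeconds: Double) -> Double {
    let frameCount = videoSeconds * framesPerSecond
    return indexingSeconds / frameCount
}

// 1 hour of video sampled at 1 frame/sec is 3,600 frames.
// A 10-minute limit (600 s) leaves 600 / 3600 s, about 167 ms, per frame.
let budget = perFrameBudget(videoSeconds: 3600,
                            framesPerSecond: 1,
                            indexingSeconds: 600)
```

If OCR on a single frame takes longer than this budget, the target can only be met by processing frames concurrently or by sampling less often.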
5. Technical Specifications
Component | Technology/Tool | Version | Rationale |
---|---|---|---|
Core Framework | SwiftUI | macOS 12.0+ | Native macOS UI with declarative syntax. |
OCR Engine | Tesseract OCR + OpenCV (Preprocessing) | 5.3.0 | High-accuracy multilingual text recognition. |
Video Processing | AVFoundation + Core ML | macOS 12.0+ | Hardware-accelerated decoding/frame extraction. |
Database | SQLite (Embedded) | 3.38 | Lightweight storage for indexed metadata. |
Concurrency | Grand Central Dispatch (GCD) | - | Background indexing without UI lag. |
Dependency Manager | Swift Package Manager (SPM) | 5.7 | Native Apple ecosystem integration. |
6. Implementation Steps
Phase 1: Setup & Core Architecture (2 Weeks)
- Initialize Xcode project (SwiftUI + MVVM pattern).
- Integrate Tesseract OCR via Swift wrappers (`SwiftyTesseract`).
- Configure SQLite schema for metadata (video paths, timestamps, OCR text).
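One possible shape for the metadata schema is sketched below using the SQLite C API that ships with macOS. The table and column names (`videos`, `frame_text`, `timestamp_sec`, `ocr_text`) and the `openDatabase` helper are assumptions for illustration; the document only specifies that video paths, timestamps, and OCR text are stored.

```swift
import Foundation
import SQLite3

// Hypothetical schema: one row per video, one row per OCR'd frame.
let schemaSQL = """
CREATE TABLE IF NOT EXISTS videos (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS frame_text (
    video_id      INTEGER NOT NULL REFERENCES videos(id),
    timestamp_sec REAL    NOT NULL,
    ocr_text      TEXT    NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_frame_text_video
    ON frame_text(video_id, timestamp_sec);
"""

/// Opens (or creates) the database and applies the schema.
/// Returns nil if either step fails; the caller closes the handle.
func openDatabase(at path: String) -> OpaquePointer? {
    var db: OpaquePointer?
    guard sqlite3_open(path, &db) == SQLITE_OK else {
        sqlite3_close(db)
        return nil
    }
    guard sqlite3_exec(db, schemaSQL, nil, nil, nil) == SQLITE_OK else {
        sqlite3_close(db)
        return nil
    }
    return db
}
```

Passing `":memory:"` as the path creates a throwaway in-memory database, which is convenient for unit tests.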
Phase 2: Video Processing Module (3 Weeks)
- Implement AVFoundation pipeline for frame extraction (1 frame/sec).
- Preprocess frames with OpenCV (grayscale + contrast enhancement).
- Run OCR via Tesseract; store results in SQLite.
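The sampling and preprocessing arithmetic behind these steps can be sketched in plain Swift. This is only an illustration of the math: in the planned pipeline AVFoundation would supply the frames and OpenCV would do the preprocessing. The function names are hypothetical, the grayscale conversion assumes interleaved 8-bit RGB input, and a linear contrast stretch is one possible reading of "contrast enhancement".

```swift
/// Timestamps (in seconds) at which frames are sampled; 1 frame/sec by default.
func sampleTimes(durationSeconds: Double, interval: Double = 1.0) -> [Double] {
    guard durationSeconds > 0, interval > 0 else { return [] }
    return Array(stride(from: 0.0, to: durationSeconds, by: interval))
}

/// Converts interleaved RGB bytes to grayscale using Rec. 601 luma weights.
func grayscale(rgb: [UInt8]) -> [UInt8] {
    precondition(rgb.count % 3 == 0, "expected interleaved RGB triples")
    var out: [UInt8] = []
    out.reserveCapacity(rgb.count / 3)
    for i in stride(from: 0, to: rgb.count, by: 3) {
        let y = 0.299 * Double(rgb[i])
              + 0.587 * Double(rgb[i + 1])
              + 0.114 * Double(rgb[i + 2])
        out.append(UInt8(min(255, max(0, y.rounded()))))
    }
    return out
}

/// Linear contrast stretch: maps the darkest pixel to 0 and the brightest to 255.
func stretchContrast(_ gray: [UInt8]) -> [UInt8] {
    guard let lo = gray.min(), let hi = gray.max(), hi > lo else { return gray }
    let range = Double(hi - lo)
    return gray.map { UInt8((Double($0 - lo) / range * 255).rounded()) }
}
```

A flat image (all pixels equal) is returned unchanged by `stretchContrast`, since there is no range to stretch.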
Phase 3: Search & UI (2 Weeks)
- Build inverted index for keywords (e.g., "meeting" → [videoID, timestamp]).
- Develop SwiftUI search view with filters (date, duration, language).
- Integrate video player using `AVKit`.
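The inverted index from this phase can be sketched as an in-memory dictionary from keyword to postings. This is a minimal sketch under stated assumptions: `Posting` and `InvertedIndex` are hypothetical types, and the tokenizer simply lowercases and splits on non-alphanumeric characters. That suits English; Chinese text has no spaces between words, so it would need a word segmenter or character n-grams instead.

```swift
/// One occurrence of a keyword: which video, and where in it.
struct Posting: Hashable {
    let videoID: Int
    let timestamp: Double
}

struct InvertedIndex {
    private var postings: [String: [Posting]] = [:]

    /// Indexes the OCR text recognized in one frame.
    mutating func add(text: String, videoID: Int, timestamp: Double) {
        let posting = Posting(videoID: videoID, timestamp: timestamp)
        // A Set ensures each token is recorded once per frame.
        for token in Set(tokenize(text)) {
            postings[token, default: []].append(posting)
        }
    }

    /// Looks up a single keyword, case-insensitively.
    func search(_ keyword: String) -> [Posting] {
        postings[keyword.lowercased()] ?? []
    }

    private func tokenize(_ text: String) -> [String] {
        text.lowercased()
            .split(whereSeparator: { !$0.isLetter && !$0.isNumber })
            .map(String.init)
    }
}
```

A multi-word query such as "slide 12" could be answered by looking up each word and intersecting the resulting postings.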
Phase 4: Optimization & Testing (3 Weeks)
- Profile performance with Instruments (CPU/GPU usage).
- Validate OCR accuracy: ≥95% F1-score for Chinese/English.
- Conduct user acceptance testing (UAT) with target personas.
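For the OCR accuracy target, the F1-score can be computed from match counts as 2·TP / (2·TP + FP + FN). The sketch below assumes counts have already been gathered by comparing OCR output against ground-truth text; whether matching is done per character or per word is not specified in this document, and `f1Score` is a hypothetical helper.

```swift
/// F1-score from true positives, false positives, and false negatives.
/// Returns 0 when there is nothing to score.
func f1Score(truePositives tp: Int,
             falsePositives fp: Int,
             falseNegatives fn: Int) -> Double {
    let denominator = 2 * tp + fp + fn
    guard denominator > 0 else { return 0 }
    return Double(2 * tp) / Double(denominator)
}

// Example: 95 correct tokens, 5 spurious, 5 missed gives 190 / 200 = 0.95,
// which sits exactly at the acceptance threshold.
let exampleF1 = f1Score(truePositives: 95, falsePositives: 5, falseNegatives: 5)
```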
7. Security & Compliance
- Data Privacy:
  - All processing occurs in the user’s sandboxed environment.
  - No third-party data sharing; zero telemetry.
- Permissions:
  - Explicit user consent for file access, granted through the standard macOS privacy prompts and user-selected folders.
8. Future Extensions
- Add support for Japanese/Korean OCR via Tesseract language packs.
- Integrate Core ML for scene detection (e.g., "whiteboard," "person").
- Export search results as SRT subtitles or CSV.
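For the SRT export, each search hit would become a cue with an `HH:MM:SS,mmm` time range. The sketch below shows only the formatting; `srtTimestamp` and `srtCue` are hypothetical helpers, and the cue's end time is an assumption, since a search hit carries a single timestamp (for example, the start plus the sampling interval could be used).

```swift
import Foundation

/// Formats seconds as an SRT timestamp, e.g. 3725.5 becomes "01:02:05,500".
func srtTimestamp(_ seconds: Double) -> String {
    let totalMillis = Int((max(0, seconds) * 1000).rounded())
    let ms = totalMillis % 1000
    let s = (totalMillis / 1000) % 60
    let m = (totalMillis / 60_000) % 60
    let h = totalMillis / 3_600_000
    return String(format: "%02d:%02d:%02d,%03d", h, m, s, ms)
}

/// Builds one SRT cue: index line, time range line, then the text.
func srtCue(index: Int, start: Double, end: Double, text: String) -> String {
    "\(index)\n\(srtTimestamp(start)) --> \(srtTimestamp(end))\n\(text)\n"
}
```

Joining cues with a blank line between them yields a complete SRT file body.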
Approval: Pending review by Engineering Lead & Product Owner.