Frontend Guideline Document: Video Search macOS Desktop Client


1. Introduction

Project Name: Video Search macOS Client
Description: A native macOS application enabling offline video content retrieval via OCR-based text recognition (supporting English/Chinese). Targets professionals, researchers, and content creators for efficient local video indexing and keyword search.


2. Technology Stack

Component | Technology & Version | Rationale
UI Framework | SwiftUI 5.0 | Native macOS integration, declarative syntax, and Metal optimization.
OCR Engine | Vision Framework (macOS 13+) | Apple’s on-device OCR for English/Chinese, offline support, high accuracy.
Video Processing | AVFoundation, Core ML 4.0 | Hardware-accelerated decoding and ML-based frame analysis.
Database | SQLite 3.38 + Core Data | Local storage for indexed video metadata (timestamps, OCR text).
Concurrency | Swift Concurrency (Async/Await) | Non-blocking I/O for OCR and search tasks.

3. Implementation Guidelines

3.1 Project Structure

VideoSearchApp/  
├── App/                 # Main application logic  
├── Models/              # Core Data entities (Video, OCRTextSegment)  
├── Services/            # OCRService, VideoIndexer, SearchEngine  
├── Views/               # SwiftUI components  
│   ├── SearchView.swift  
│   ├── PlayerView.swift # Custom AVPlayer with timestamp navigation  
│   └── SettingsView.swift  
└── Utilities/           # Extensions (e.g., String localization, FileManager)  

3.2 Key Workflows

  • Video Indexing:

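    1. User selects video files (MP4, MOV) via NSOpenPanel. AVFoundation does not read MKV containers natively, so MKV input needs remuxing or a third-party demuxer first.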
    2. VideoIndexer extracts frames at 1-sec intervals using AVAssetImageGenerator.
    3. OCRService runs Vision’s VNRecognizeTextRequest on each frame and stores one row per frame in SQLite as [videoPath, timestamp, text, language].
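The indexing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual VideoIndexer: `sampleTimes`, `OCRRow`, and `indexVideo` are hypothetical names, the language identifiers are the ones Vision is assumed to accept for English and Chinese, and the Apple-framework part only compiles where AVFoundation and Vision are available.

```swift
import Foundation

/// Timestamps (in seconds) at which frames are sampled: 0, interval, 2*interval, ...
/// strictly below `duration`. Hypothetical helper, not part of AVFoundation.
func sampleTimes(duration: Double, interval: Double = 1.0) -> [Double] {
    guard duration > 0, interval > 0 else { return [] }
    return Array(stride(from: 0.0, to: duration, by: interval))
}

#if canImport(AVFoundation) && canImport(Vision)
import AVFoundation
import Vision

/// Hypothetical row shape; the language column is omitted for brevity.
struct OCRRow {
    let videoPath: String
    let timestamp: Double
    let text: String
}

/// Extracts one frame per second and runs text recognition on each frame.
func indexVideo(at url: URL) async throws -> [OCRRow] {
    let asset = AVURLAsset(url: url)
    let duration = try await asset.load(.duration).seconds

    let generator = AVAssetImageGenerator(asset: asset)
    generator.appliesPreferredTrackTransform = true
    // Zero tolerance returns the exact requested frame at the cost of slower decoding.
    generator.requestedTimeToleranceBefore = .zero
    generator.requestedTimeToleranceAfter = .zero

    var rows: [OCRRow] = []
    for seconds in sampleTimes(duration: duration) {
        let time = CMTime(seconds: seconds, preferredTimescale: 600)
        let (frame, _) = try await generator.image(at: time) // async API, macOS 13+

        let request = VNRecognizeTextRequest()
        request.recognitionLevel = .accurate
        request.recognitionLanguages = ["zh-Hans", "en-US"] // assumed identifiers
        try VNImageRequestHandler(cgImage: frame, options: [:]).perform([request])

        // Join the top candidate of every recognized text region into one string.
        let text = (request.results ?? [])
            .compactMap { $0.topCandidates(1).first?.string }
            .joined(separator: "\n")
        if !text.isEmpty {
            rows.append(OCRRow(videoPath: url.path, timestamp: seconds, text: text))
        }
    }
    return rows
}
#endif
```

Frames without recognized text are skipped here to keep the index small; whether to store empty rows is a design choice the document leaves open.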
  • Search Execution:

    1. User enters a keyword (e.g., "budget meeting").
    2. SearchEngine runs an SQLite FTS5 query (FTS5 writes proximity as NEAR(term1 term2, N); the NEAR/N operator form belongs to FTS3/FTS4):
      SELECT videoPath, timestamp FROM OCRIndex  
      WHERE text MATCH 'NEAR(budget meeting, 5)' AND language IN ('en', 'zh')  
    3. Results are displayed as clickable timestamps; clicking one seeks the AVPlayer to that timestamp.
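To keep user input from being interpreted as FTS5 query syntax, the MATCH expression can be built from quoted tokens and passed as a bound parameter. The sketch below assumes a hypothetical helper `ftsMatchExpression` and a default proximity of 5; neither is prescribed by this document.

```swift
import Foundation

/// Builds an FTS5 MATCH expression from raw user input.
/// Each whitespace-separated token becomes a quoted FTS5 string (embedded double
/// quotes are doubled), and two or more tokens are wrapped in NEAR(..., distance).
/// Hypothetical helper for illustration only.
func ftsMatchExpression(for keyword: String, distance: Int = 5) -> String {
    let phrases = keyword
        .split(whereSeparator: \.isWhitespace)
        .map { "\"" + $0.replacingOccurrences(of: "\"", with: "\"\"") + "\"" }
    switch phrases.count {
    case 0:
        return ""
    case 1:
        return phrases[0]
    default:
        return "NEAR(" + phrases.joined(separator: " ") + ", \(distance))"
    }
}

/// The search statement with the MATCH expression as a bound parameter (?),
/// so the expression is never spliced into the SQL text.
let searchSQL = """
SELECT videoPath, timestamp FROM OCRIndex
WHERE text MATCH ? AND language IN ('en', 'zh')
"""

print(ftsMatchExpression(for: "budget meeting"))
```

Splitting on whitespace is a simplification that suits English input; Chinese text is not whitespace-delimited and depends on the tokenizer chosen for the FTS5 table, which this document does not specify.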

3.3 Localization

  • Use LocalizedStringKey for UI elements.
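  • OCR language toggle via VNRecognizeTextRequest’s recognitionLanguages property. Vision expects region- or script-qualified identifiers (e.g., ["en-US", "zh-Hans", "zh-Hant"]) rather than bare "en"/"zh".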
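One way to wire the language toggle is to keep short app-level codes in settings and map them to Vision identifiers when a request is built. The sketch below is an assumption-laden illustration: `OCRLanguage`, `recognitionLanguages(for:)`, and `makeTextRequest` are hypothetical names, and the identifier lists should be checked against what the text request reports as supported on the running OS.

```swift
import Foundation

/// Hypothetical app-level language setting, stored with short codes.
enum OCRLanguage: String, CaseIterable {
    case english = "en"
    case chinese = "zh"

    /// Vision identifiers assumed for each setting.
    var visionIdentifiers: [String] {
        switch self {
        case .english: return ["en-US"]
        case .chinese: return ["zh-Hans", "zh-Hant"]
        }
    }
}

/// Flattens the enabled settings into an ordered, de-duplicated identifier list.
func recognitionLanguages(for enabled: [OCRLanguage]) -> [String] {
    var seen = Set<String>()
    return enabled
        .flatMap(\.visionIdentifiers)
        .filter { seen.insert($0).inserted }
}

#if canImport(Vision)
import Vision

/// Builds a text recognition request for the enabled languages.
func makeTextRequest(languages: [OCRLanguage]) -> VNRecognizeTextRequest {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    request.usesLanguageCorrection = true
    request.recognitionLanguages = recognitionLanguages(for: languages)
    return request
}
#endif
```

The order of the array matters to Vision, so the helper preserves the order in which the user's languages are passed in.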

4. Performance Optimization

  • Lazy Loading: Thumbnails and OCR results loaded on-demand via LazyVStack.
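  • Background Processing: Frame extraction and OCR run off the main thread, either on DispatchQueue.global(qos: .userInitiated) or in a Task with .userInitiated priority to stay consistent with the Swift Concurrency choice in Section 2.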
  • Memory Management:
    • Use NSCache for decoded video thumbnails.
    • Batch OCR requests (max 4 concurrent operations).
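  • Indexing Speed: Pre-warm at launch so the first real frame does not pay setup cost: load any Core ML models used for frame analysis and run one VNRecognizeTextRequest on a small placeholder image.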
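The cap of 4 concurrent OCR operations can be expressed with a task group that starts a new task only when an earlier one finishes. This is a minimal sketch using Swift Concurrency; `mapBounded` is a hypothetical helper name, and the transform closure stands in for whatever per-frame OCR call the app uses.

```swift
import Foundation

/// Applies `transform` to every item with at most `maxConcurrent` tasks in flight,
/// returning results in the original item order.
func mapBounded<T: Sendable, R: Sendable>(
    _ items: [T],
    maxConcurrent: Int = 4,
    _ transform: @escaping @Sendable (T) async throws -> R
) async throws -> [R] {
    precondition(maxConcurrent > 0, "maxConcurrent must be positive")
    return try await withThrowingTaskGroup(of: (Int, R).self, returning: [R].self) { group in
        var results = [R?](repeating: nil, count: items.count)
        var next = 0

        // Seed the group with the first batch of tasks.
        while next < min(maxConcurrent, items.count) {
            let index = next
            let item = items[index]
            group.addTask { (index, try await transform(item)) }
            next += 1
        }

        // Each time a task finishes, record its result and start the next item.
        while let (index, value) = try await group.next() {
            results[index] = value
            if next < items.count {
                let nextIndex = next
                let item = items[nextIndex]
                group.addTask { (nextIndex, try await transform(item)) }
                next += 1
            }
        }
        return results.compactMap { $0 }
    }
}
```

Bounding the group this way also bounds peak memory, since at most `maxConcurrent` decoded frames are alive at once.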

5. Security & Privacy

  • Data Isolation:
    • All files are processed locally; indexing and search require no network access.
    • SQLite database encrypted via SQLCipher (AES-256).
  • Sandboxing: Enable App Sandbox in Entitlements:
    <key>com.apple.security.app-sandbox</key>  
    <true/>  
    <key>com.apple.security.files.user-selected.read-only</key>  
    <true/>  
  • OCR Data Handling: Temporary frame data purged after processing.
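Purging temporary frame data can be made hard to forget by scoping the scratch directory to a single call. The sketch below assumes frames are written to disk at all (they may stay in memory instead); `withTemporaryFrameDirectory` is a hypothetical helper name.

```swift
import Foundation

/// Creates a unique scratch directory, hands it to `body`, and deletes it
/// afterwards whether `body` returns normally or throws.
func withTemporaryFrameDirectory<T>(_ body: (URL) throws -> T) throws -> T {
    let fileManager = FileManager.default
    let directory = fileManager.temporaryDirectory
        .appendingPathComponent("frames-" + UUID().uuidString, isDirectory: true)
    try fileManager.createDirectory(at: directory, withIntermediateDirectories: true)
    // The defer block runs on every exit path, so no frame files outlive the call.
    defer { try? fileManager.removeItem(at: directory) }
    return try body(directory)
}
```

In a sandboxed app, FileManager's temporary directory resolves inside the app container, so the scratch files stay within the sandbox.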

6. Testing Strategy

Test Type | Tools/Methods | Coverage
Unit Tests | XCTest, Swift Concurrency testing | OCRService, SearchEngine logic
UI Tests | XCUITest | View navigation, player controls
Performance | XCTestMetrics, Instruments (Time Profiler) | Frame indexing < 50ms/video minute
Localization | Pseudolocalization | Chinese/English UI consistency

7. Build & Deployment

  • Signing: For direct distribution, sign with a Developer ID certificate and notarize the build with Apple’s notary service.
  • Packaging: Create .dmg via create-dmg CLI tool.
  • Distribution:
    • Mac App Store (MAS): Comply with sandboxing guidelines.
    • Direct download: Host SHA-256 checksum for verification.
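  • Updates: Integrate Sparkle 2.4 for delta updates in the direct-download build. Update checks need network access, so this build is the exception to the no-network rule in Section 5; MAS builds update through the App Store instead.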

8. Scalability & Extensions

  • Plugin System: Future support for third-party OCR engines via NSBundle dynamic loading.
  • Cloud Sync (Optional): End-to-end encrypted sync using CloudKit, disabled by default.
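  • Cross-Platform: Potential iOS/iPadOS version via a multiplatform SwiftUI target with shared Core Data/SwiftUI logic. Mac Catalyst runs iPad apps on macOS, not the reverse, so it does not apply here.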

Document Revision: 1.0
Compatibility: macOS 13 Ventura or later, Apple Silicon/Intel.

