Tech Stack Document: Video Search Desktop Client for macOS


1. Introduction

Project Name: Video Search macOS Client
Objective: Develop a native macOS application for offline video content retrieval using OCR-powered search. Supports English/Chinese text recognition and local indexing.


2. Technology Stack

| Component | Technology | Version | Rationale |
| --- | --- | --- | --- |
| Core Language | Swift | 5.9 | Native macOS integration, performance, and safety. |
| UI Framework | SwiftUI | 4.0 | Declarative UI, native macOS controls, and Dark Mode support. |
| OCR Engine | Apple Vision Framework | macOS 13.0+ | On-device text recognition for the languages reported by supportedRecognitionLanguages() (incl. EN/ZH); no frame data leaves the device. |
| Video Processing | AVFoundation | macOS 13.0+ | Hardware-accelerated frame extraction, metadata handling. |
| Database | SQLite + FTS5 | 3.42.0 | Embedded full-text search; Chinese text needs additional tokenization handling (see Step 3). |
| Concurrency | Swift Concurrency (async/await) | - | Non-blocking I/O for OCR and indexing tasks. |
| Persistence | Core Data | - | Object persistence for video metadata and OCR results. Core Data does not expose FTS5, so the search index lives in a separate SQLite database. |
| Dependency Mgmt | Swift Package Manager | 5.7+ | Native toolchain integration. |

3. Architecture Overview

![System Architecture Diagram]

User Interface (SwiftUI) → Business Logic (Swift)  
    ↓                               ↓  
Video Processor (AVFoundation) → OCR Engine (Vision)  
    ↓                               ↓  
Indexer (SQLite FTS5) ←─── Data Store (Core Data)  

Key Flows:

  1. Video Ingestion: Extract frames using AVAssetImageGenerator.
  2. OCR Processing: Batch frame analysis via VNRecognizeTextRequest.
  3. Indexing: Tokenize text (EN/ZH) → store in FTS5 with timestamps.
  4. Query: FTS5 MATCH queries with snippet highlighting.

4. Implementation Steps

Step 1: Video Frame Extraction

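let generator = AVAssetImageGenerator(asset: asset)  
generator.appliesPreferredTrackTransform = true // respect the video's rotation metadata  
// timestamps: [NSValue] wrapping the CMTime values to sample  
generator.generateCGImagesAsynchronously(forTimes: timestamps) { requestedTime, image, _, result, _ in  
    // Skip frames that failed to decode or were cancelled  
    guard result == .succeeded, let cgImage = image else { return }  
    processFrame(cgImage, requestedTime) // pass along the timestamp this frame was requested for  
}  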

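Optimization: Sample at a fixed interval (1 frame/sec) rather than decoding every frame, to balance accuracy and performance. Note that fixed-interval samples are not necessarily codec keyframes.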
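The `timestamps` array passed to the generator in Step 1 is not defined in this document. A minimal sketch of building it for fixed-interval sampling follows; the function name `sampleTimes` and the timescale of 600 are assumptions made for illustration.

```swift
import AVFoundation

/// Builds the sample times for fixed-interval frame extraction.
/// Returns NSValue-wrapped CMTimes, the form that
/// generateCGImagesAsynchronously(forTimes:) expects.
func sampleTimes(duration: CMTime, framesPerSecond: Double = 1.0) -> [NSValue] {
    let seconds = CMTimeGetSeconds(duration)
    // Indefinite or invalid durations yield NaN; bail out early.
    guard seconds.isFinite, seconds > 0, framesPerSecond > 0 else { return [] }
    let step = 1.0 / framesPerSecond
    return stride(from: 0.0, to: seconds, by: step).map { second in
        NSValue(time: CMTime(seconds: second, preferredTimescale: 600))
    }
}
```

For a 3.5-second clip at the default rate this yields samples at 0, 1, 2, and 3 seconds.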

Step 2: OCR Processing

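let request = VNRecognizeTextRequest { request, error in  
    guard error == nil, let observations = request.results as? [VNRecognizedTextObservation] else { return }  
    // Keep the highest-confidence candidate for each detected text region  
    let texts = observations.compactMap { $0.topCandidates(1).first?.string }  
    indexTexts(texts, timestamp) // timestamp: the CMTime of the frame being analyzed  
}  
request.recognitionLevel = .accurate // Chinese is only supported at the accurate level  
request.recognitionLanguages = ["en-US", "zh-Hans"] // EN + Simplified Chinese  
request.usesLanguageCorrection = true  
// The request does nothing until a handler performs it on a frame  
try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])  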

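Performance: Run OCR off the main thread, for example on DispatchQueue.global(qos: .userInitiated) or in a task group, and cap the number of frames in flight so memory use stays bounded.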
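As an alternative that fits the Swift Concurrency choice in the stack table, OCR work can be fanned out through a task group with a cap on in-flight tasks. This is a minimal sketch, not the project's implementation: `boundedMap` is a hypothetical helper, and the `transform` closure stands in for whatever per-frame OCR call the app uses.

```swift
/// Applies `transform` to every input with at most `maxConcurrent`
/// tasks running at once, returning outputs in input order.
func boundedMap<Input: Sendable, Output: Sendable>(
    _ inputs: [Input],
    maxConcurrent: Int,
    transform: @escaping @Sendable (Input) async throws -> Output
) async throws -> [Output] {
    precondition(maxConcurrent > 0, "maxConcurrent must be positive")
    return try await withThrowingTaskGroup(of: (Int, Output).self) { group in
        var results = [Output?](repeating: nil, count: inputs.count)
        var nextIndex = 0
        var running = 0
        while nextIndex < inputs.count || running > 0 {
            // Top up the group until the concurrency cap is reached.
            while running < maxConcurrent, nextIndex < inputs.count {
                let index = nextIndex
                let input = inputs[index]
                group.addTask {
                    let output = try await transform(input)
                    return (index, output)
                }
                nextIndex += 1
                running += 1
            }
            // Wait for one task to finish before scheduling more.
            if let (index, output) = try await group.next() {
                results[index] = output
                running -= 1
            }
        }
        return results.compactMap { $0 }
    }
}
```

Capping the width keeps only a few decoded frames alive at a time, which matters more for memory than for CPU on long videos.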

Step 3: Indexing with SQLite FTS5

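CREATE VIRTUAL TABLE video_index USING fts5(  
    video_id UNINDEXED,   -- stored for lookup, not full-text indexed  
    timestamp UNINDEXED,  -- stored for scrubbing, not full-text indexed  
    text,  
    tokenize = 'unicode61 remove_diacritics 2'  
);  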

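CJK Support: unicode61 does not segment Chinese words; it indexes each unbroken run of CJK characters as a single token, so queries for a word inside a longer run will miss. The ICU tokenizer is an FTS3/FTS4 feature and is not available in FTS5. Practical options are FTS5's built-in trigram tokenizer (SQLite 3.34+, which cannot match query terms shorter than three characters), a custom FTS5 tokenizer registered through the C API, or segmenting text into words in the app before insertion.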
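One way to get word-level Chinese tokens into the index without writing a custom FTS5 tokenizer is to segment text in the app before inserting it. The sketch below assumes Apple's NaturalLanguage framework is acceptable for this; `segmentForIndexing` is a hypothetical helper name, and segmentation quality for OCR output has not been evaluated here.

```swift
import NaturalLanguage

/// Splits text into words and rejoins them with spaces so that a
/// whitespace-aware tokenizer such as unicode61 sees one token per word.
func segmentForIndexing(_ text: String) -> String {
    let tokenizer = NLTokenizer(unit: .word)
    tokenizer.string = text
    var words: [String] = []
    tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { range, _ in
        words.append(String(text[range]))
        return true // keep enumerating
    }
    return words.joined(separator: " ")
}
```

Search queries would need to pass through the same function before being bound to MATCH; otherwise the query tokens will not line up with the indexed ones.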

Step 4: Search Execution

// snippet() takes a zero-based column index; the text column is index 2  
let query = "SELECT video_id, timestamp, snippet(video_index, 2, '<b>', '</b>', '...', 5) FROM video_index WHERE text MATCH ?"  

Highlighting: Bold keywords in results; return timestamps for video scrubbing.
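The query can be run end to end with the SQLite C API. This is a minimal sketch that assumes the three-column `video_index` layout from Step 3 (so `text` is column index 2) and a system SQLite built with FTS5; `SearchHit`, `search`, and the sample row are illustrative names and data, not part of the project.

```swift
import Foundation
import SQLite3

// SQLITE_TRANSIENT is a C macro that Swift does not import; this cast is the usual workaround.
let SQLITE_TRANSIENT = unsafeBitCast(-1, to: sqlite3_destructor_type.self)

struct SearchHit {
    let videoID: String
    let timestamp: Double
    let snippet: String
}

/// Reads a text column, mapping SQL NULL to an empty string.
func columnText(_ stmt: OpaquePointer?, _ column: Int32) -> String {
    sqlite3_column_text(stmt, column).map { String(cString: $0) } ?? ""
}

func search(_ db: OpaquePointer?, matching query: String) -> [SearchHit] {
    let sql = """
    SELECT video_id, timestamp, snippet(video_index, 2, '<b>', '</b>', '...', 5)
    FROM video_index WHERE text MATCH ? ORDER BY rank
    """
    var stmt: OpaquePointer?
    guard sqlite3_prepare_v2(db, sql, -1, &stmt, nil) == SQLITE_OK else { return [] }
    defer { sqlite3_finalize(stmt) }
    // Bind the user query as a parameter instead of interpolating it into the SQL.
    sqlite3_bind_text(stmt, 1, query, -1, SQLITE_TRANSIENT)
    var hits: [SearchHit] = []
    while sqlite3_step(stmt) == SQLITE_ROW {
        hits.append(SearchHit(videoID: columnText(stmt, 0),
                              timestamp: sqlite3_column_double(stmt, 1),
                              snippet: columnText(stmt, 2)))
    }
    return hits
}

// Usage against an in-memory database with one sample row.
var db: OpaquePointer?
sqlite3_open(":memory:", &db)
sqlite3_exec(db, """
CREATE VIRTUAL TABLE video_index USING fts5(video_id UNINDEXED, timestamp UNINDEXED, text);
INSERT INTO video_index VALUES ('clip-1', 12.0, 'welcome to the quarterly review meeting');
""", nil, nil, nil)
let hits = search(db, matching: "quarterly")
sqlite3_close(db)
```

Each hit carries the timestamp needed to seek the player to the matching frame.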


5. Non-Functional Requirements

| Requirement | Strategy |
| --- | --- |
| Performance | Batch OCR processing (10 frames per batch); FTS5 in-memory indexing for active queries |
| Security | Data sandboxing via macOS App Sandbox; SQLite encryption using SQLCipher (optional) |
| Scalability | Modular OCR pipeline (replace Vision with Tesseract if needed); sharded SQLite DBs per video library |
| Offline First | Zero network dependencies; all assets local |
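The 10-frames-per-batch strategy amounts to slicing the list of extracted frames into fixed-size groups. A minimal sketch, where `chunked(into:)` is a hypothetical helper and not a standard library method:

```swift
extension Array {
    /// Splits the array into consecutive slices of at most `size` elements.
    func chunked(into size: Int) -> [[Element]] {
        precondition(size > 0, "chunk size must be positive")
        return stride(from: 0, to: count, by: size).map { start in
            Array(self[start..<Swift.min(start + size, count)])
        }
    }
}
```

With 25 frames and a batch size of 10, this produces batches of 10, 10, and 5 frames.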

6. Extensibility & Optimization

  • Plug-in OCR Engines: Protocol-based design for alternative engines (e.g., Tesseract).
  • Cache Layer: NSCache for recent search results and thumbnails.
  • Energy Efficiency: Throttle CPU usage during background indexing based on ProcessInfo.processInfo.thermalState.
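Two of the points above can be sketched together: a protocol that hides the concrete OCR engine, and a policy that maps the system thermal state to an indexing width. The names `OCREngine`, `recognizeText(in:)`, and `indexingWidth(for:baseline:)`, as well as the specific thresholds, are assumptions made for illustration.

```swift
import Foundation
import CoreGraphics

/// Hypothetical abstraction over a text recognizer, so that Vision can be
/// swapped for another engine without touching the indexing pipeline.
protocol OCREngine {
    func recognizeText(in image: CGImage) async throws -> [String]
}

/// Maps the thermal state to the number of frames to process concurrently.
/// A width of 0 means indexing should pause until the system cools down.
func indexingWidth(for state: ProcessInfo.ThermalState, baseline: Int = 4) -> Int {
    switch state {
    case .nominal: return baseline
    case .fair: return max(1, baseline / 2)
    case .serious: return 1
    case .critical: return 0
    @unknown default: return 1
    }
}

// Read the current state once; an app could also observe
// ProcessInfo.thermalStateDidChangeNotification to react to changes.
let currentWidth = indexingWidth(for: ProcessInfo.processInfo.thermalState)
```

Keeping the policy a pure function of the thermal state makes it easy to unit test without heating up a machine.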

7. Development Milestones

  1. MVP: Frame extraction + EN OCR → FTS5 indexing (4 weeks).
  2. Phase 2: Chinese tokenization, snippet highlighting (2 weeks).
  3. Phase 3: UI polish (SwiftUI animations, result previews).

8. Risks & Mitigation

  • Risk: Vision OCR accuracy for low-resolution video.
    Mitigation: Pre-upscale frames using Core Image's CILanczosScaleTransform filter.
  • Risk: FTS5 index bloat.
    Mitigation: Periodic FTS5 'optimize' runs, auto-vacuum, and timestamp-based data retention.
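The upscaling mitigation can be sketched with Core Image's Lanczos filter. This assumes frames arrive as CGImage values; `upscale(_:by:context:)` is a hypothetical helper, and whether upscaling actually improves recognition on a given video would need to be measured.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

/// Upscales a frame with Lanczos resampling before it is handed to OCR.
/// Reuse one CIContext across frames; creating a context is expensive.
func upscale(_ image: CGImage, by scale: Float, context: CIContext = CIContext()) -> CGImage? {
    let filter = CIFilter.lanczosScaleTransform()
    filter.inputImage = CIImage(cgImage: image)
    filter.scale = scale
    filter.aspectRatio = 1.0 // keep the original proportions
    guard let output = filter.outputImage else { return nil }
    return context.createCGImage(output, from: output.extent)
}
```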
