Tech Stack Document: Video Search Desktop Client for macOS

1. Introduction

Project Name: Video Search macOS Client
Objective: Develop a native macOS application for offline video content retrieval using OCR-powered search. Supports English/Chinese text recognition and local indexing.

2. Technology Stack

Component	Technology	Version	Rationale
Core Language	Swift	5.9	Native macOS integration, performance, and safety.
UI Framework	SwiftUI	macOS 13.0+	Declarative UI, native macOS controls, and Dark Mode support.
OCR Engine	Apple Vision Framework	macOS 13.0+	On-device, privacy-preserving text recognition; English and Chinese are among its supported recognition languages (the exact list depends on OS version and recognition level).
Video Processing	AVFoundation	macOS 13.0+	Hardware-accelerated frame extraction, metadata handling.
Database	SQLite + FTS5	3.42.0	Embedded full-text search with Unicode-aware tokenization; Chinese text needs extra segmentation (see Step 3).
Concurrency	Swift Concurrency (async/await)	-	Non-blocking I/O for OCR and indexing tasks.
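Persistence	Core Data	-	SQLite-backed object persistence for video metadata and OCR results. Core Data does not expose FTS5 virtual tables, so the search index is kept in a separate SQLite database.
Dependency Mgmt	Swift Package Manager	5.9	Native toolchain integration (ships with the Swift toolchain).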

3. Architecture Overview

System Architecture Diagram:

User Interface (SwiftUI) → Business Logic (Swift)  
    ↓                               ↓  
Video Processor (AVFoundation) → OCR Engine (Vision)  
    ↓                               ↓  
Indexer (SQLite FTS5) ←─── Data Store (Core Data)

Key Flows:

Video Ingestion: Extract frames using AVAssetImageGenerator.
OCR Processing: Batch frame analysis via VNRecognizeTextRequest.
Indexing: Tokenize text (EN/ZH) → store in FTS5 with timestamps.
Query: FTS5 MATCH queries with snippet highlighting.

4. Implementation Steps

Step 1: Video Frame Extraction

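import AVFoundation

let generator = AVAssetImageGenerator(asset: asset)
generator.appliesPreferredTrackTransform = true // respect the video's rotation
// `timestamps` is an [NSValue] wrapping the CMTime values to sample.
generator.generateCGImagesAsynchronously(forTimes: timestamps) { _, image, actualTime, result, _ in
    guard result == .succeeded, let cgImage = image else { return }
    processFrame(cgImage, actualTime) // index under the time of the frame actually returned
}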

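Optimization: Sample one frame per second instead of decoding every frame, to balance recall against processing time. Fixed-interval sampling is not the same as extracting codec keyframes, and text that stays on screen for less than a second can be missed.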
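
The one-frame-per-second sampling above needs a list of sample positions. A minimal sketch, assuming the video duration is already known in seconds; `sampleTimes` is a hypothetical helper, not an AVFoundation API:

```swift
import Foundation

/// Returns sample positions in seconds, one every `interval` seconds starting at 0.
/// Hypothetical helper for illustration.
func sampleTimes(duration: Double, interval: Double = 1.0) -> [Double] {
    guard duration > 0, interval > 0 else { return [] }
    // stride(from:to:by:) excludes the end, so no sample lands past the last frame.
    return Array(stride(from: 0.0, to: duration, by: interval))
}

let times = sampleTimes(duration: 3.5)
print(times) // [0.0, 1.0, 2.0, 3.0]

// On macOS each value would then be wrapped for AVAssetImageGenerator, e.g.
// NSValue(time: CMTime(seconds: t, preferredTimescale: 600))
```

Raising `interval` trades recall for speed; the right value likely depends on how long text typically stays on screen.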

Step 2: OCR Processing

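import Vision

// `cgImage` and `timestamp` come from the frame-extraction step.
let request = VNRecognizeTextRequest { request, error in
    guard error == nil,
          let observations = request.results as? [VNRecognizedTextObservation] else { return }
    let texts = observations.compactMap { $0.topCandidates(1).first?.string }
    indexTexts(texts, timestamp)
}
request.recognitionLevel = .accurate // the fast level supports fewer languages
request.recognitionLanguages = ["en-US", "zh-Hans"] // EN + Simplified Chinese
request.usesLanguageCorrection = true
try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])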

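Performance: VNImageRequestHandler.perform(_:) is synchronous, so run OCR off the main thread, e.g. on DispatchQueue.global(qos: .userInitiated), and process several frames in parallel.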
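
One way to parallelize the per-frame work is `DispatchQueue.concurrentPerform`, sketched below. The `Frame` type and the `recognize` closure are placeholders; in the app, `recognize` would presumably wrap the Vision request for a single frame.

```swift
import Foundation

/// Runs `recognize` over every frame in parallel and returns results in input order.
/// `Frame` and `recognize` are placeholders for illustration.
func recognizeAll<Frame>(_ frames: [Frame], recognize: (Frame) -> [String]) -> [[String]] {
    var results = [[String]](repeating: [], count: frames.count)
    let lock = NSLock()
    // concurrentPerform blocks until every iteration has finished and
    // spreads iterations across the available cores.
    DispatchQueue.concurrentPerform(iterations: frames.count) { index in
        let texts = recognize(frames[index])
        lock.lock()
        results[index] = texts // guarded write; slots are disjoint but the array is shared
        lock.unlock()
    }
    return results
}

let recognized = recognizeAll(["frame-a", "frame-b"]) { ["text in \($0)"] }
print(recognized) // [["text in frame-a"], ["text in frame-b"]]
```

Because the call blocks until all frames are done, it should itself be invoked from a background queue or task rather than the main thread.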

Step 3: Indexing with SQLite FTS5

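CREATE VIRTUAL TABLE video_index USING fts5(  
    video_id UNINDEXED,   -- stored for lookup, excluded from full-text matching
    timestamp UNINDEXED,  -- position in the video, excluded from matching
    text,  
    tokenize = 'unicode61 remove_diacritics 2'  
);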

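CJK Support: unicode61 treats a run of Chinese characters as a single token, so Chinese text needs additional segmentation. Options: pre-segment text before insertion, use FTS5's built-in trigram tokenizer (SQLite 3.34+), or register a custom FTS5 tokenizer (e.g. ICU-based; the icu tokenizer bundled with SQLite targets FTS3/4, not FTS5).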

Step 4: Search Execution

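// snippet() takes the index of the column to excerpt; `text` is column 2 (video_id is 0, timestamp is 1).
let query = "SELECT video_id, timestamp, snippet(video_index, 2, '<b>', '</b>', '...', 5) FROM video_index WHERE text MATCH ? ORDER BY rank"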

Highlighting: Bold keywords in results; return timestamps for video scrubbing.
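
Raw user input bound to the MATCH parameter is parsed as FTS5 query syntax, so characters such as `-`, `*`, or `:` can cause syntax errors or unintended operators. A minimal sketch of quoting each term first; `ftsMatchExpression` is a hypothetical helper name:

```swift
import Foundation

/// Turns free-form user input into an FTS5 query in which every
/// whitespace-separated term is a quoted string. Hypothetical helper.
func ftsMatchExpression(for input: String) -> String {
    input
        .split(whereSeparator: { $0.isWhitespace })
        .map { term in
            // FTS5 escapes a double quote inside a string by doubling it.
            "\"" + term.replacingOccurrences(of: "\"", with: "\"\"") + "\""
        }
        .joined(separator: " ") // adjacent terms are implicitly ANDed
}

print(ftsMatchExpression(for: "error-code 42")) // "error-code" "42"
```

The result is what would be bound to the `?` placeholder. An empty result (blank input) should skip the query rather than run MATCH with an empty string.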

5. Non-Functional Requirements

Requirement	Strategy
Performance	- Batch OCR processing (10 frames per batch) - FTS5 in-memory indexing for active queries
Security	- Data sandboxing via macOS App Sandbox - SQLite encryption using SQLCipher (optional)
Scalability	- Modular OCR pipeline (replace Vision with Tesseract if needed) - Sharded SQLite DBs per video library
Offline First	- Zero network dependencies; all assets local
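
The batch size of 10 frames in the Performance row could be applied with a small chunking helper like the one below. `batches(of:size:)` is a hypothetical name; the frames are represented by integer indices for illustration.

```swift
/// Splits `items` into consecutive batches of at most `size` elements.
/// Hypothetical helper for illustration.
func batches<T>(of items: [T], size: Int) -> [[T]] {
    precondition(size > 0, "batch size must be positive")
    return stride(from: 0, to: items.count, by: size).map { start in
        Array(items[start..<min(start + size, items.count)])
    }
}

let frameIndices = Array(0..<25)
let grouped = batches(of: frameIndices, size: 10)
print(grouped.map(\.count)) // [10, 10, 5]
```

Processing one batch at a time bounds the number of decoded frames held in memory, and each batch can be written to the index in a single transaction.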

6. Extensibility & Optimization

Plug-in OCR Engines: Protocol-based design for alternative engines (e.g., Tesseract).
Cache Layer: NSCache for recent search results and thumbnails.
Energy Efficiency: Throttle CPU usage during background indexing via ProcessInfo.thermalState.
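
The protocol-based design for pluggable OCR engines might look like the sketch below. All names here (`OCREngine`, `RecognizedLine`, `StubOCREngine`, `indexableLines`) are hypothetical, and `Data` stands in for whatever frame representation the app actually uses.

```swift
import Foundation

/// One recognized line of text with the video time it was seen at.
struct RecognizedLine: Equatable {
    let text: String
    let timestamp: Double // seconds from the start of the video
}

/// Abstraction the indexer depends on; a Vision-backed or Tesseract-backed
/// engine would each conform to it.
protocol OCREngine {
    func recognize(frame: Data, at timestamp: Double) throws -> [RecognizedLine]
}

/// Stub engine that returns fixed text, useful for testing the indexing path.
struct StubOCREngine: OCREngine {
    let cannedText: [String]
    func recognize(frame: Data, at timestamp: Double) throws -> [RecognizedLine] {
        cannedText.map { RecognizedLine(text: $0, timestamp: timestamp) }
    }
}

/// The pipeline only sees the protocol, so engines can be swapped without changes here.
func indexableLines(from frames: [(Data, Double)], using engine: OCREngine) throws -> [RecognizedLine] {
    try frames.flatMap { try engine.recognize(frame: $0.0, at: $0.1) }
}

let stubEngine: OCREngine = StubOCREngine(cannedText: ["Hello", "你好"])
let stubLines = try indexableLines(from: [(Data(), 0.0), (Data(), 1.0)], using: stubEngine)
print(stubLines.count) // 4
```

A stub like this also lets the indexing and search layers be tested without video files or the Vision framework.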

7. Development Milestones

MVP: Frame extraction + EN OCR → FTS5 indexing (4 weeks).
Phase 2: Chinese tokenization, snippet highlighting (2 weeks).
Phase 3: UI polish (SwiftUI animations, result previews).

8. Risks & Mitigation

Risk: Vision OCR accuracy for low-resolution video.
Mitigation: Pre-upscale frames with Core Image's CILanczosScaleTransform filter before running OCR.
Risk: FTS5 index bloat.
Mitigation: Periodic FTS5 'optimize' runs, auto_vacuum to reclaim freed pages, and timestamp-based data retention.
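
The frame pre-upscaling mitigation for low-resolution video could be sketched as follows, using Core Image's Lanczos scale filter. The `upscaled` function and its default `factor` of 2.0 are assumptions for illustration, not values taken from this document.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

/// Upscales a frame with Core Image's Lanczos filter before OCR.
/// `factor` is an assumed tuning parameter.
func upscaled(_ frame: CGImage, factor: Float = 2.0, context: CIContext = CIContext()) -> CGImage? {
    let filter = CIFilter.lanczosScaleTransform()
    filter.inputImage = CIImage(cgImage: frame)
    filter.scale = factor     // uniform scale factor
    filter.aspectRatio = 1.0  // keep the original aspect ratio
    guard let output = filter.outputImage else { return nil }
    // Render the filtered image back into a CGImage for Vision.
    return context.createCGImage(output, from: output.extent)
}
```

Creating a `CIContext` is relatively expensive, so a single shared context should be passed in when processing many frames. Whether upscaling actually improves recognition would need to be measured on representative footage.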