Tech Stack Document: Video Search Desktop Client for macOS
1. Introduction
Project Name: Video Search macOS Client
Objective: Develop a native macOS application that retrieves video content offline through OCR-powered search. The app recognizes English and Chinese on-screen text and indexes it locally.
2. Technology Stack
Component | Technology | Version | Rationale |
---|---|---|---|
Core Language | Swift | 5.9 | Native macOS integration, performance, and safety. |
UI Framework | SwiftUI | macOS 13.0+ | Declarative UI, native macOS controls, and Dark Mode support. |
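OCR Engine | Apple Vision Framework | macOS 13.0+ | On-device text recognition (incl. EN/ZH) that keeps frames local; the supported language list depends on the OS version and can be queried with `supportedRecognitionLanguages()`. |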
Video Processing | AVFoundation | macOS 13.0+ | Hardware-accelerated frame extraction, metadata handling. |
Database | SQLite + FTS5 | 3.42.0 | Embedded full-text search; Chinese text needs word segmentation or the trigram tokenizer, because unicode61 does not split CJK runs into words. |
Concurrency | Swift Concurrency (async/await) | - | Non-blocking I/O for OCR and indexing tasks. |
Persistence | Core Data | - | ORM for SQLite, simplifies OCR result storage. |
Dependency Mgmt | Swift Package Manager | 5.9 | Ships with the Swift toolchain; native Xcode integration. |
3. Architecture Overview
System architecture diagram:
User Interface (SwiftUI)        →    Business Logic (Swift)
        ↓                                    ↓
Video Processor (AVFoundation)  →    OCR Engine (Vision)
        ↓                                    ↓
Indexer (SQLite FTS5)           ←─── Data Store (Core Data)
Key Flows:
- Video Ingestion: Extract frames using `AVAssetImageGenerator`.
- OCR Processing: Batch frame analysis via `VNRecognizeTextRequest`.
- Indexing: Tokenize text (EN/ZH) → store in FTS5 with timestamps.
- Query: FTS5 MATCH queries with snippet highlighting.
4. Implementation Steps
Step 1: Video Frame Extraction
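import AVFoundation

// `asset` is an AVAsset; `timestamps` is an [NSValue] wrapping the CMTime values to sample.
let generator = AVAssetImageGenerator(asset: asset)
generator.appliesPreferredTrackTransform = true  // honor rotation metadata
generator.generateCGImagesAsynchronously(forTimes: timestamps) { _, image, actualTime, result, _ in
    // Skip frames that failed or were cancelled.
    guard result == .succeeded, let cgImage = image else { return }
    processFrame(cgImage, actualTime)
}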
Optimization: Sample one frame per second rather than decoding every frame, to balance accuracy and performance.
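The frame-extraction call needs an array of sample times. A minimal sketch of building that array at a fixed interval follows; `sampleTimes` is a hypothetical helper name, and the timescale of 600 is an assumption chosen because it represents common frame rates exactly.

```swift
import AVFoundation

/// Returns one sample time per `interval` seconds across `duration`,
/// wrapped in NSValue as `generateCGImagesAsynchronously(forTimes:)` expects.
/// `sampleTimes` is a hypothetical helper, not part of AVFoundation.
func sampleTimes(duration: CMTime, interval: Double = 1.0) -> [NSValue] {
    let totalSeconds = CMTimeGetSeconds(duration)
    // Indefinite or invalid durations yield NaN or infinity; sample nothing in that case.
    guard totalSeconds.isFinite, totalSeconds > 0, interval > 0 else { return [] }
    return stride(from: 0.0, to: totalSeconds, by: interval).map { seconds in
        NSValue(time: CMTime(seconds: seconds, preferredTimescale: 600))
    }
}
```

The duration could come from `try await asset.load(.duration)`; the result can then be passed as the `timestamps` argument of the extraction call.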
Step 2: OCR Processing
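import Vision

// `cgImage` and `timestamp` come from the frame-extraction step.
let request = VNRecognizeTextRequest { request, error in
    guard error == nil,
          let observations = request.results as? [VNRecognizedTextObservation] else { return }
    // Keep the highest-confidence candidate for each detected text region.
    let texts = observations.compactMap { $0.topCandidates(1).first?.string }
    indexTexts(texts, timestamp)
}
request.recognitionLevel = .accurate  // favor accuracy over speed
request.recognitionLanguages = ["en-US", "zh-Hans"]  // EN + Simplified Chinese
request.usesLanguageCorrection = true
try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])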
Performance: Parallelize using `DispatchQueue.global(qos: .userInitiated)`.
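Since the stack lists Swift Concurrency, the same parallelism could also be expressed with a task group that caps the number of frames in flight. The sketch below is one possible shape, not the project's actual pipeline: `mapConcurrently` is a hypothetical helper, and the default limit of 10 simply mirrors the batch size mentioned in the non-functional requirements.

```swift
/// Runs `work` over `items` with at most `limit` tasks in flight and
/// returns the results in the original order.
/// `mapConcurrently` is a hypothetical helper, not a standard-library API.
func mapConcurrently<Input: Sendable, Output: Sendable>(
    _ items: [Input],
    limit: Int = 10,
    _ work: @escaping @Sendable (Input) async throws -> Output
) async throws -> [Output] {
    precondition(limit > 0, "limit must be positive")
    return try await withThrowingTaskGroup(of: (Int, Output).self) { group in
        var results = [Output?](repeating: nil, count: items.count)
        var next = 0
        // Seed the group with up to `limit` tasks.
        while next < min(limit, items.count) {
            let index = next
            group.addTask { (index, try await work(items[index])) }
            next += 1
        }
        // As each task finishes, store its result and start the next item.
        while let (finished, output) = try await group.next() {
            results[finished] = output
            if next < items.count {
                let index = next
                group.addTask { (index, try await work(items[index])) }
                next += 1
            }
        }
        return results.compactMap { $0 }
    }
}
```

In this design the `work` closure would wrap the synchronous `VNImageRequestHandler.perform` call for a single frame. Capping concurrency keeps memory bounded, because each decoded frame is held only while its task runs.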
Step 3: Indexing with SQLite FTS5
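CREATE VIRTUAL TABLE video_index USING fts5(
  video_id UNINDEXED,   -- stored for lookup, excluded from full-text matching
  timestamp UNINDEXED,  -- frame time in seconds, excluded from full-text matching
  text,
  tokenize = 'unicode61 remove_diacritics 2'
);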
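CJK Support: The unicode61 tokenizer does not segment Chinese; it indexes each unbroken run of Han characters as a single token. Chinese search therefore needs one of the following: FTS5's built-in trigram tokenizer (SQLite 3.34+), text pre-segmented into words before insertion, or a custom tokenizer (for example, ICU-based) registered through the FTS5 C API.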
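One way to get word-level tokens for Chinese without writing a custom FTS5 tokenizer is to segment the text in Swift before inserting it. The sketch below uses `NLTokenizer` from the NaturalLanguage framework; `segmentForIndexing` is a hypothetical helper name, and this approach is offered as an assumption about how the segmentation step could work, not as the document's chosen design.

```swift
import NaturalLanguage

/// Joins the words of `text` with single spaces so that a separator-based
/// FTS5 tokenizer (such as unicode61) indexes each Chinese word as its own token.
/// `segmentForIndexing` is a hypothetical helper name.
func segmentForIndexing(_ text: String) -> String {
    let tokenizer = NLTokenizer(unit: .word)
    tokenizer.string = text
    // tokens(for:) returns the range of every word; punctuation and whitespace are skipped.
    let ranges = tokenizer.tokens(for: text.startIndex..<text.endIndex)
    return ranges.map { String(text[$0]) }.joined(separator: " ")
}
```

Search terms would need to pass through the same function so that queries and indexed text are tokenized identically. One trade-off is that the stored text, and therefore any snippet output, contains the inserted spaces.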
Step 4: Search Execution
let query = "SELECT video_id, timestamp, snippet(video_index, 2, '<b>', '</b>', '...', 5) FROM video_index WHERE text MATCH ? ORDER BY rank"
Highlighting: Bold keywords in results; return timestamps for video scrubbing.
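To show how the query might be executed, here is a minimal sketch using the SQLite C API directly from Swift. `SearchHit` and `search` are hypothetical names, the snippet is taken from column 2 (the `text` column of `video_index`), and the code assumes the linked SQLite build has FTS5 enabled.

```swift
import Foundation
import SQLite3

struct SearchHit {
    let videoID: String
    let timestamp: Double
    let snippet: String
}

// SQLITE_TRANSIENT is a C macro that Swift does not import; this cast is the usual workaround.
let SQLITE_TRANSIENT = unsafeBitCast(-1, to: sqlite3_destructor_type.self)

/// Runs an FTS5 MATCH query against `video_index` and returns ranked hits.
/// `search` and `SearchHit` are hypothetical names.
func search(_ db: OpaquePointer, for term: String) -> [SearchHit] {
    let sql = """
        SELECT video_id, timestamp, snippet(video_index, 2, '<b>', '</b>', '...', 5)
        FROM video_index WHERE text MATCH ? ORDER BY rank
        """
    var statement: OpaquePointer?
    guard sqlite3_prepare_v2(db, sql, -1, &statement, nil) == SQLITE_OK else { return [] }
    defer { sqlite3_finalize(statement) }
    // Bind the term as a parameter instead of interpolating it into the SQL.
    sqlite3_bind_text(statement, 1, term, -1, SQLITE_TRANSIENT)

    // Reads a text column, treating SQL NULL as an empty string.
    func text(_ column: Int32) -> String {
        sqlite3_column_text(statement, column).map { String(cString: $0) } ?? ""
    }

    var hits: [SearchHit] = []
    while sqlite3_step(statement) == SQLITE_ROW {
        hits.append(SearchHit(videoID: text(0),
                              timestamp: sqlite3_column_double(statement, 1),
                              snippet: text(2)))
    }
    return hits
}
```

Binding protects against SQL injection, but the bound string is still parsed as FTS5 query syntax, so raw user input containing quotes or operators may need to be quoted as a phrase first.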
5. Non-Functional Requirements
Requirement | Strategy |
---|---|
Performance | Batch OCR processing (10 frames per batch); FTS5 in-memory indexing for active queries |
Security | Data sandboxing via macOS App Sandbox; SQLite encryption using SQLCipher (optional) |
Scalability | Modular OCR pipeline (replace Vision with Tesseract if needed); sharded SQLite DBs per video library |
Offline First | Zero network dependencies; all assets local |
6. Extensibility & Optimization
- Plug-in OCR Engines: Protocol-based design for alternative engines (e.g., Tesseract).
- Cache Layer: `NSCache` for recent search results and thumbnails.
- Energy Efficiency: Throttle CPU usage during background indexing via `ProcessInfo.thermalState`.
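The plug-in and energy-efficiency points above can be sketched together. `OCREngine`, `indexingPause`, and `index(frames:with:batchSize:)` are hypothetical names, and the pause durations are illustrative assumptions rather than measured values.

```swift
import Foundation
import CoreGraphics

/// Abstraction that lets the pipeline swap recognition back ends
/// (Vision, Tesseract, ...). `OCREngine` is a hypothetical protocol name.
protocol OCREngine {
    func recognizeText(in image: CGImage) async throws -> [String]
}

/// Suggested pause between OCR batches for a given thermal state.
/// The durations are illustrative assumptions to be tuned.
func indexingPause(for state: ProcessInfo.ThermalState) -> TimeInterval {
    switch state {
    case .nominal:  return 0
    case .fair:     return 0.5
    case .serious:  return 2
    case .critical: return 10
    @unknown default: return 2
    }
}

/// Processes frames in batches, backing off as the machine heats up.
func index(frames: [CGImage], with engine: OCREngine,
           batchSize: Int = 10) async throws -> [[String]] {
    precondition(batchSize > 0, "batchSize must be positive")
    var results: [[String]] = []
    for start in stride(from: 0, to: frames.count, by: batchSize) {
        for frame in frames[start..<min(start + batchSize, frames.count)] {
            results.append(try await engine.recognizeText(in: frame))
        }
        // Re-read the thermal state after every batch and sleep if needed.
        let pause = indexingPause(for: ProcessInfo.processInfo.thermalState)
        if pause > 0 {
            try await Task.sleep(nanoseconds: UInt64(pause * 1_000_000_000))
        }
    }
    return results
}
```

A Vision-backed type and a Tesseract-backed type would each conform to `OCREngine`, so the indexing loop stays unchanged when the engine is swapped.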
7. Development Milestones
- MVP: Frame extraction + EN OCR → FTS5 indexing (4 weeks).
- Phase 2: Chinese tokenization, snippet highlighting (2 weeks).
- Phase 3: UI polish (SwiftUI animations, result previews).
8. Risks & Mitigation
- Risk: Vision OCR accuracy for low-resolution video.
  Mitigation: Pre-upscale frames using the Core Image `CILanczosScaleTransform` filter.
- Risk: FTS5 index bloat.
  Mitigation: Auto-vacuum + timestamp-based data retention.
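The upscaling mitigation could look roughly like the following, using Core Image's built-in Lanczos filter. `upscale` is a hypothetical helper, and the default factor of 2 is an assumption to tune; whether upscaling improves recognition for a given source should be measured.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

/// Upscales a frame with Lanczos resampling before it is passed to OCR.
/// `upscale` is a hypothetical helper; the default factor is an assumption.
func upscale(_ image: CGImage, by factor: Float = 2.0,
             context: CIContext = CIContext()) -> CGImage? {
    let filter = CIFilter.lanczosScaleTransform()
    filter.inputImage = CIImage(cgImage: image)
    filter.scale = factor        // uniform scale applied to both axes
    filter.aspectRatio = 1.0     // keep the original proportions
    guard let output = filter.outputImage else { return nil }
    // Render the lazily evaluated CIImage into a concrete bitmap.
    return context.createCGImage(output, from: output.extent)
}
```

In a real pipeline a single `CIContext` would be created once and passed in, since creating one per frame is expensive.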