AI System Architecture Design: Video Search Desktop Client
1. Overview
Objective: Design a native macOS application for offline video content retrieval using OCR-based text recognition (supporting English/Chinese).
Key Requirements:
- Zero network dependency
- High-accuracy OCR for video frames
- Efficient indexing and search
- Native macOS integration
- Security and scalability
2. Architecture Diagram
┌──────────────────────┐       ┌─────────────────┐       ┌─────────────────┐
│ User Interface       │──────▶│ Core Engine     │──────▶│ Data Storage    │
│ (SwiftUI 5.0)        │◀──────│ (Swift/C++)     │◀──────│ (SQLite 3.38 +  │
└──────────────────────┘       └─────────────────┘       │ Core Data)      │
                                   ▲         ▲           └─────────────────┘
                                   │         │
                     ┌─────────────┘         └──────────────┐
                     ▼                                      ▼
           ┌───────────────────┐                 ┌─────────────────────┐
           │ OCR Processing    │                 │ Search Engine       │
           │ (Vision 4.0)      │                 │ (Lucene 9.5 +       │
           └───────────────────┘                 │ Custom Tokenizers)  │
                                                 └─────────────────────┘
3. Technology Stack & Versions
| Component | Technology | Version | Rationale |
|---|---|---|---|
| Frontend | SwiftUI | 5.0 | Native macOS UX |
| Backend Core | Swift | 5.7 | Performance + Apple ecosystem |
| OCR Engine | Apple Vision Framework | 4.0 | On-device CN/EN text recognition |
| Search Indexing | Apache Lucene | 9.5 | Offline inverted indexing |
| Database | SQLite + Core Data | 3.38 | Local storage optimization |
| Video Processing | AVFoundation | - | Native frame extraction |
| Concurrency | Grand Central Dispatch (GCD) | - | Parallel processing |
4. Component Breakdown
4.1. User Interface (SwiftUI 5.0)
- Video Library Manager: Drag-and-drop video ingestion (MP4, MOV, MKV).
- Search Dashboard: Keyword input with real-time suggestions.
- Result Viewer: Thumbnail grid with timestamped OCR snippets.
- Playback Panel: Integrated AVPlayer for clip preview.
4.2. Core Engine (Swift/C++)
- Frame Extractor:
- Uses AVAssetImageGenerator to sample frames at 1 FPS (configurable).
- Dynamic resolution scaling (4K → 1080p) to reduce OCR workload.
- OCR Pipeline:
- Uses VNRecognizeTextRequest for CN/EN text detection.
- Confidence threshold: 90% (adjustable via settings).
- Post-processing: Noise removal using regex filters.
- Indexing Service:
- Lucene-based inverted index mapping: (word → videoID + timestamp)
- Custom tokenizers for Chinese (Jieba 0.47) and English (Snowball).
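The OCR post-processing and indexing steps described above can be sketched in plain Swift. This is a minimal illustration under stated assumptions, not the Lucene-backed implementation: `OCRObservation`, `Posting`, `normalize`, and `InvertedIndex` are hypothetical names, the noise regex is one possible filter, and whitespace splitting stands in for the Jieba and Snowball tokenizers.

```swift
import Foundation

// Hypothetical container for one recognized string; in the real pipeline the
// text and confidence would come from the Vision recognition results.
struct OCRObservation {
    let text: String
    let confidence: Double   // 0.0 ... 1.0
    let videoID: String
    let timestamp: Double    // seconds from the start of the video
}

// One posting in the inverted index: where a word was seen.
struct Posting: Equatable {
    let videoID: String
    let timestamp: Double
}

// Assumed noise filter: every run of characters that is not a letter, digit,
// or whitespace becomes a single space (\p{L} also matches CJK characters),
// and the result is lowercased.
func normalize(_ text: String) -> String {
    let cleaned = text.replacingOccurrences(
        of: "[^\\p{L}\\p{N}\\s]+",
        with: " ",
        options: .regularExpression
    )
    return cleaned.lowercased()
}

struct InvertedIndex {
    private(set) var postings: [String: [Posting]] = [:]
    let confidenceThreshold: Double

    init(confidenceThreshold: Double = 0.9) {
        self.confidenceThreshold = confidenceThreshold
    }

    // Drops low-confidence observations, normalizes the text, and records a
    // posting for every whitespace-separated token.
    mutating func add(_ observation: OCRObservation) {
        guard observation.confidence >= confidenceThreshold else { return }
        let tokens = normalize(observation.text)
            .split(whereSeparator: { $0.isWhitespace })
        for token in tokens {
            postings[String(token), default: []].append(
                Posting(videoID: observation.videoID,
                        timestamp: observation.timestamp)
            )
        }
    }

    // Case-insensitive exact lookup of a single word.
    func lookup(_ word: String) -> [Posting] {
        postings[word.lowercased()] ?? []
    }
}
```

With the default threshold of 0.9, an observation recognized at 0.42 confidence is never indexed, while one at 0.97 contributes a posting for each of its tokens.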
4.3. Search Engine (Lucene 9.5)
- Query Processing:
- Tokenization + stemming (Porter2 for EN, Jieba for CN).
- Fuzzy matching (Levenshtein distance ≤ 2).
- Ranking:
- TF-IDF scoring boosted by temporal proximity.
- Caching: LRU cache for frequent queries (target: ≤ 200 ms latency for cached queries).
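The fuzzy-matching rule (Levenshtein distance ≤ 2) can be illustrated with a small, self-contained sketch. Lucene ships its own fuzzy query machinery, so the code below is only a hand-written stand-in that shows what the threshold means; `levenshtein` and `fuzzyMatches` are hypothetical helper names, and the linear scan over the vocabulary is for illustration only.

```swift
// Two-row dynamic-programming Levenshtein distance over Characters, so a
// Chinese character counts as a single edit, the same as a Latin letter.
func levenshtein(_ a: String, _ b: String) -> Int {
    let s = Array(a), t = Array(b)
    if s.isEmpty { return t.count }
    if t.isEmpty { return s.count }

    var previous = Array(0...t.count)
    var current = [Int](repeating: 0, count: t.count + 1)

    for i in 1...s.count {
        current[0] = i
        for j in 1...t.count {
            let substitution = previous[j - 1] + (s[i - 1] == t[j - 1] ? 0 : 1)
            let deletion = previous[j] + 1
            let insertion = current[j - 1] + 1
            current[j] = min(substitution, deletion, insertion)
        }
        swap(&previous, &current)
    }
    return previous[t.count]
}

// Returns the vocabulary terms within `maxDistance` edits of the query,
// closest first, with ties broken alphabetically.
func fuzzyMatches(for query: String,
                  in vocabulary: [String],
                  maxDistance: Int = 2) -> [String] {
    vocabulary
        .map { (term: $0, distance: levenshtein(query, $0)) }
        .filter { $0.distance <= maxDistance }
        .sorted { ($0.distance, $0.term) < ($1.distance, $1.term) }
        .map { $0.term }
}
```

Under plain Levenshtein distance a transposition such as "serach" for "search" costs two edits, so it still falls inside the ≤ 2 threshold.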
4.4. Data Storage (SQLite 3.38 + Core Data)
- Schema:
- VideoMeta: path, duration, checksum (SHA-256).
- OCRIndex: word, videoID, timestamps, confidence.
- Encryption: AES-256 for metadata at rest.
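As a rough sketch, the two schema records could be modelled as Swift value types. The field names come from the schema above, but the Swift types, the `OCRIndexEntry` name, and the `isPlausibleSHA256` helper are assumptions made for illustration. Encryption at rest and the actual SQLite/Core Data mapping are not shown.

```swift
import Foundation

// Field names follow the schema; the concrete Swift types are assumptions.
struct VideoMeta: Codable, Equatable {
    let path: String
    let duration: Double        // seconds
    let checksum: String        // hex-encoded SHA-256 digest (64 characters)
}

struct OCRIndexEntry: Codable, Equatable {
    let word: String
    let videoID: String
    let timestamps: [Double]    // seconds at which the word was recognized
    let confidence: Double      // 0.0 ... 1.0
}

// Cheap structural check that a string looks like a hex-encoded SHA-256
// digest. It does not verify the digest against any file contents.
func isPlausibleSHA256(_ hex: String) -> Bool {
    hex.count == 64 && hex.allSatisfy { $0.isASCII && $0.isHexDigit }
}
```

Because both types are `Codable`, they can be round-tripped through `JSONEncoder` and `JSONDecoder` in unit tests before a persistence layer exists.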
5. Workflow
- Ingestion:
- User adds video → checksum validation → extract key metadata.
- Processing:
- Frame extraction → OCR via Vision → text normalization → Lucene indexing.
- Search:
- Query tokenization → index lookup → relevance ranking → results rendering.
- Playback:
- Direct frame seek using AVPlayer.seek(to:toleranceBefore:toleranceAfter:).
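The playback step can be sketched as follows. `SearchHit`, `seekTarget`, and `show` are hypothetical names, and the timescale of 600 is an assumed, commonly used value. The AVFoundation part is wrapped in `canImport` so the clamping logic can still be compiled and tested on its own.

```swift
import Foundation
#if canImport(AVFoundation)
import AVFoundation
#endif

// Hypothetical search hit: the video it belongs to and the time of the frame
// in which the text was recognized.
struct SearchHit {
    let videoID: String
    let timestamp: Double   // seconds
}

// Clamps a hit's timestamp into the playable range [0, videoDuration].
func seekTarget(for hit: SearchHit, videoDuration: Double) -> Double {
    min(max(hit.timestamp, 0), videoDuration)
}

#if canImport(AVFoundation)
// Seeks with zero tolerance so playback lands on the requested time rather
// than a nearby keyframe. This is more precise but can be slower.
func show(_ hit: SearchHit, in player: AVPlayer, videoDuration: Double) {
    let seconds = seekTarget(for: hit, videoDuration: videoDuration)
    let time = CMTime(seconds: seconds, preferredTimescale: 600)
    player.seek(to: time, toleranceBefore: .zero, toleranceAfter: .zero)
}
#endif
```

Zero tolerance trades seek speed for accuracy. If approximate positioning is acceptable for previews, a small non-zero tolerance is a reasonable alternative.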
6. Performance Optimization
- Parallelism:
- GCD queues for OCR (4 threads) and indexing (2 threads).
- Resource Management:
- Frame batch processing (max 100 frames/batch).
- Index compression (Zstandard).
- Latency Targets:
- Indexing: ≤ 1.5× real-time (e.g., 10-min video in ≤15 mins).
- Search: ≤ 300ms for 10k-indexed videos.
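The batch limit and the indexing target reduce to simple arithmetic, shown here as a small sketch. `batches(of:size:)` and `indexingBudget` are hypothetical helpers. The GCD worker queues themselves are not shown.

```swift
// Splits items into consecutive batches of at most `size` elements.
func batches<T>(of items: [T], size: Int = 100) -> [[T]] {
    precondition(size > 0, "batch size must be positive")
    return stride(from: 0, to: items.count, by: size).map { start in
        Array(items[start..<min(start + size, items.count)])
    }
}

// Maximum allowed indexing time, in seconds, for a video of the given length
// under the "at most 1.5x real-time" target.
func indexingBudget(videoSeconds: Double, factor: Double = 1.5) -> Double {
    videoSeconds * factor
}
```

For example, a 10-minute video sampled at 1 FPS yields 600 frames, which split into 6 batches of 100, and its indexing budget is 600 × 1.5 = 900 seconds (15 minutes).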
7. Security & Privacy
- Data Isolation: Sandboxed storage with macOS App Sandbox.
- Permissions: Explicit user consent for video access.
- No Telemetry: Zero data exfiltration; all processing on-device.
8. Scalability
- Modular Design:
- Plug-in architecture for future OCR engines (e.g., Tesseract).
- Index Sharding: Splits index by video date/size.
- Resource Scaling: Auto-reduces frame rate on low RAM.
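One way to picture the plug-in architecture is a protocol that every OCR backend conforms to, plus a registry that selects an engine by name. Everything below is a hypothetical sketch: `OCREngine`, `StubEngine`, and `OCREngineRegistry` are illustrative names, and the frame is passed as raw bytes only to keep the example free of platform image types.

```swift
// Hypothetical plug-in interface: any backend that can turn a frame into
// recognized strings can be registered under a name.
protocol OCREngine {
    var name: String { get }
    func recognizeText(inFrame frameData: [UInt8]) -> [String]
}

// Stand-in engine used only to demonstrate registration; it performs no OCR
// and always returns its canned output.
struct StubEngine: OCREngine {
    let name: String
    let cannedOutput: [String]

    func recognizeText(inFrame frameData: [UInt8]) -> [String] {
        cannedOutput
    }
}

struct OCREngineRegistry {
    private var engines: [String: any OCREngine] = [:]
    private(set) var defaultEngineName: String?

    // Registers an engine; the first one registered becomes the default.
    mutating func register(_ engine: any OCREngine) {
        engines[engine.name] = engine
        if defaultEngineName == nil { defaultEngineName = engine.name }
    }

    // Returns the named engine, or the default engine when no name is given.
    func engine(named name: String? = nil) -> (any OCREngine)? {
        guard let key = name ?? defaultEngineName else { return nil }
        return engines[key]
    }
}
```

With this shape, the core engine depends only on the protocol, so adding a second backend would not require changes to the indexing code.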
9. Deployment
- Build: Xcode 14.3, macOS SDK 13.0+.
- Distribution: Mac App Store, or a notarized .dmg for standalone (Developer ID) distribution.
- Dependencies: Embedded Lucene; Jieba via Swift Package Manager.
10. Metrics & Monitoring
- Instrumentation:
- os_signpost for profiling OCR/search latency.
- Memory/CPU usage logs (disabled by default).
- User-Configurable:
- Frame sampling rate (0.5–5 FPS).
- Index purge scheduler.
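The user-configurable sampling rate can be sketched as a clamp plus a timestamp generator. `clampedSamplingRate` and `sampleTimestamps` are hypothetical helpers. In the app, each timestamp would presumably be converted to a `CMTime` and passed to the frame extractor.

```swift
// Clamps a user-supplied rate into the supported 0.5–5 FPS range.
func clampedSamplingRate(_ fps: Double) -> Double {
    min(max(fps, 0.5), 5.0)
}

// Timestamps (in seconds) at which frames would be sampled from a video of
// the given duration: 0, 1/fps, 2/fps, ... for every value below `duration`.
func sampleTimestamps(duration: Double, fps: Double) -> [Double] {
    guard duration > 0 else { return [] }
    let rate = clampedSamplingRate(fps)
    let count = Int((duration * rate).rounded(.up))
    return (0..<count).map { Double($0) / rate }
}
```

At the default 1 FPS, a 10-second clip yields ten timestamps (0 through 9). A requested rate of 10 FPS is clamped to 5 FPS before any timestamps are generated.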
This architecture ensures offline efficiency, leverages Apple-native frameworks for optimal macOS integration, and scales for large video libraries while maintaining strict privacy standards.