AI System Architecture Design: Video Search Desktop Client

1. Overview

Objective: Design a native macOS application for offline video content retrieval using OCR-based text recognition (supporting English/Chinese).
Key Requirements:

- Zero network dependency
- High-accuracy OCR for video frames
- Efficient indexing and search
- Native macOS integration
- Security and scalability

2. Architecture Diagram

┌──────────────────────┐       ┌─────────────────┐       ┌─────────────────┐  
│      User Interface  │──────▶│   Core Engine   │──────▶│   Data Storage  │  
│ (SwiftUI 5.0)        │◀──────│ (Swift/C++)     │◀──────│ (SQLite 3.38 +  │  
└──────────────────────┘       └─────────────────┘       │ Core Data)      │  
                               ▲        ▲                └─────────────────┘  
                               │        │  
                 ┌─────────────┘        └──────────────┐  
                 ▼                                     ▼  
       ┌───────────────────┐                ┌─────────────────────┐  
       │  OCR Processing   │                │   Search Engine     │  
       │ (Vision 4.0)      │                │ (Lucene 9.5 +       │  
       └───────────────────┘                │  Custom Tokenizers) │  
                                            └─────────────────────┘

3. Technology Stack & Versions

Component	Technology	Version	Rationale
Frontend	SwiftUI	5.0	Native macOS UX
Backend Core	Swift	5.7	Performance + Apple ecosystem
OCR Engine	Apple Vision Framework	4.0	On-device CN/EN text recognition
Search Indexing	Apache Lucene	9.5	Offline inverted indexing
Database	SQLite + Core Data	3.38 (SQLite)	Local, embedded storage
Video Processing	AVFoundation	-	Native frame extraction
Concurrency	Grand Central Dispatch (GCD)	-	Parallel processing

4. Component Breakdown

4.1. User Interface (SwiftUI 5.0)

Video Library Manager: Drag-and-drop video ingestion (MP4, MOV; MKV is not read natively by AVFoundation and would need remuxing or a third-party demuxer).
Search Dashboard: Keyword input with real-time suggestions.
Result Viewer: Thumbnail grid with timestamped OCR snippets.
Playback Panel: Integrated AVPlayer for clip preview.

4.2. Core Engine (Swift/C++)

Frame Extractor:
- Uses AVAssetImageGenerator to sample frames at 1 FPS (configurable).
- Dynamic resolution scaling (4K → 1080p) to reduce OCR workload.
OCR Pipeline:
- VNRecognizeTextRequest for Chinese and English text recognition.
- Confidence threshold: 0.90 on Vision's 0.0–1.0 scale (adjustable via settings).
- Post-processing: regex filters remove noise characters from the recognized text.
Indexing Service:
- Lucene-based inverted index mapping: (word → videoID + timestamp)
- Custom tokenizers for Chinese (Jieba 0.47) and English (Snowball).
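
The OCR post-processing step above can be sketched as follows. This is a minimal illustration, not the engine's actual code: the RecognizedLine type, the cleanOCRLines function, and the noise pattern (anything that is not a letter, digit, or whitespace) are assumptions made for the example. Only the 0.90 threshold and the use of regex filters come from the design.

```swift
import Foundation

/// One recognized line of text as it might leave the OCR step.
/// Hypothetical type; confidence is on a 0.0...1.0 scale.
struct RecognizedLine {
    let text: String
    let confidence: Float
}

/// Drops low-confidence lines, strips noise characters, and collapses whitespace.
func cleanOCRLines(_ lines: [RecognizedLine], minConfidence: Float = 0.90) -> [String] {
    // Assumed noise definition: runs of characters that are not letters
    // (including CJK), digits, or whitespace.
    let noise = try! NSRegularExpression(pattern: "[^\\p{L}\\p{N}\\s]+")
    let spaces = try! NSRegularExpression(pattern: "\\s+")

    func replace(_ regex: NSRegularExpression, in s: String, with template: String) -> String {
        let range = NSRange(s.startIndex..<s.endIndex, in: s)
        return regex.stringByReplacingMatches(in: s, options: [], range: range, withTemplate: template)
    }

    return lines
        .filter { $0.confidence >= minConfidence }
        .map { replace(spaces, in: replace(noise, in: $0.text, with: " "), with: " ") }
        .map { $0.trimmingCharacters(in: .whitespaces) }
        .filter { !$0.isEmpty }
}
```

Lines that consist only of noise become empty after cleaning and are dropped, so they never reach the indexer.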

4.3. Search Engine (Lucene 9.5)

Query Processing:
- Tokenization + stemming (Porter2 for EN, Jieba for CN).
- Fuzzy matching (Levenshtein distance ≤ 2).
Ranking:
- TF-IDF scoring boosted by temporal proximity.
Caching: LRU cache for frequent queries (max. 200 ms latency).
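
To make the fuzzy-matching bound concrete, here is a small sketch of Levenshtein distance with a cutoff of 2 edits. Lucene ships its own fuzzy query support, so this is only an illustration of what the bound means; the function names levenshtein and fuzzyMatches are hypothetical.

```swift
/// Two-row dynamic-programming Levenshtein distance over Characters.
func levenshtein(_ a: String, _ b: String) -> Int {
    let s = Array(a), t = Array(b)
    if s.isEmpty { return t.count }
    if t.isEmpty { return s.count }

    var previous = Array(0...t.count)              // distances for the previous row
    var current = [Int](repeating: 0, count: t.count + 1)

    for i in 1...s.count {
        current[0] = i
        for j in 1...t.count {
            let substitution = previous[j - 1] + (s[i - 1] == t[j - 1] ? 0 : 1)
            current[j] = min(previous[j] + 1,      // deletion
                             current[j - 1] + 1,   // insertion
                             substitution)
        }
        swap(&previous, &current)
    }
    return previous[t.count]
}

/// Returns the vocabulary terms within `maxDistance` edits of `query` (case-insensitive).
func fuzzyMatches(for query: String, in vocabulary: [String], maxDistance: Int = 2) -> [String] {
    let q = query.lowercased()
    return vocabulary.filter { levenshtein(q, $0.lowercased()) <= maxDistance }
}
```

Note that plain Levenshtein counts a transposition such as "serach" for "search" as two edits, so it still falls inside the limit of 2.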

4.4. Data Storage (SQLite 3.38 + Core Data)

Schema:
- VideoMeta: path, duration, checksum (SHA-256).
- OCRIndex: word, videoID, timestamps, confidence.
Encryption: AES-256 for metadata at rest.
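
The schema and the (word → videoID + timestamp) mapping can be pictured with the in-memory stand-in below. It is a sketch under assumptions: the field types, the integer id, and the InvertedIndex type are illustrative, and the real store would be SQLite/Core Data plus Lucene rather than a dictionary.

```swift
/// Mirrors the VideoMeta record; the `id` field and all field types are assumptions.
struct VideoMeta {
    let id: Int
    let path: String
    let duration: Double      // seconds
    let checksum: String      // hex-encoded SHA-256
}

/// One OCRIndex entry: where a word was seen and how confidently.
struct Posting: Equatable {
    let videoID: Int
    let timestamp: Double     // seconds from the start of the video
    let confidence: Float
}

/// In-memory stand-in for the inverted index (word → postings).
struct InvertedIndex {
    private var postings: [String: [Posting]] = [:]

    init() {}

    mutating func add(word: String, videoID: Int, timestamp: Double, confidence: Float) {
        postings[word.lowercased(), default: []]
            .append(Posting(videoID: videoID, timestamp: timestamp, confidence: confidence))
    }

    /// Postings for `word`, ordered by video and then by time.
    func lookup(_ word: String) -> [Posting] {
        (postings[word.lowercased()] ?? []).sorted {
            ($0.videoID, $0.timestamp) < ($1.videoID, $1.timestamp)
        }
    }
}
```

Keeping the timestamp in every posting is what lets a search hit jump straight to the matching moment in the video.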

5. Workflow

Ingestion:
- User adds video → checksum validation → extract key metadata.
Processing:
- Frame extraction → OCR via Vision → text normalization → Lucene indexing.
Search:
- Query tokenization → index lookup → relevance ranking → results rendering.
Playback:
- Frame-accurate seek using AVPlayer.seek(to:toleranceBefore:toleranceAfter:) with zero tolerance.

6. Performance Optimization

Parallelism:
- GCD queues with bounded concurrency: up to 4 concurrent OCR tasks and 2 concurrent indexing tasks.
Resource Management:
- Frame batch processing (max 100 frames/batch).
- Index compression (Zstandard).
Latency Targets:
- Indexing: ≤ 1.5× the video's duration (e.g., a 10-minute video indexed in ≤ 15 minutes).
- Search: ≤ 300 ms across a library of 10,000 indexed videos.
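
The batch limit and the indexing target reduce to simple arithmetic, sketched below. The helper names batches(of:maxBatchSize:) and meetsIndexingTarget are hypothetical; the limit of 100 frames and the 1.5× factor are taken from the figures above.

```swift
/// Splits `items` into consecutive batches of at most `maxBatchSize` elements.
func batches<T>(of items: [T], maxBatchSize: Int = 100) -> [[T]] {
    precondition(maxBatchSize > 0, "batch size must be positive")
    return stride(from: 0, to: items.count, by: maxBatchSize).map { start in
        Array(items[start..<min(start + maxBatchSize, items.count)])
    }
}

/// True when processing stayed within `factor` × the video's duration (both in seconds).
func meetsIndexingTarget(videoDuration: Double, processingTime: Double, factor: Double = 1.5) -> Bool {
    processingTime <= videoDuration * factor
}
```

For example, a 10-minute video sampled at 1 FPS yields 600 frames, which this split turns into 6 batches; each batch could then be handed to a concurrency-limited queue.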

7. Security & Privacy

Data Isolation: All app data is confined to the app's container by the macOS App Sandbox.
Permissions: Explicit user consent for video access.
No Telemetry: Zero data exfiltration; all processing on-device.

8. Scalability

Modular Design:
- Plug-in architecture for future OCR engines (e.g., Tesseract).
Index Sharding: Splits index by video date/size.
Resource Scaling: Automatically lowers the frame sampling rate when available memory is low.
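
One way the plug-in boundary for OCR engines might look is a small protocol plus a registry, sketched below. Everything here is hypothetical (the TextRecognizer protocol, RecognizerRegistry, and the stub engine); a real conformer would wrap Vision or Tesseract and would likely take an image type rather than raw bytes.

```swift
/// Minimal contract an OCR back end would satisfy (hypothetical protocol).
protocol TextRecognizer {
    var name: String { get }
    /// Returns the text lines found in an encoded frame.
    func recognizeText(in frame: [UInt8]) -> [String]
}

/// Registry that lets the core engine select a recognizer by name at run time.
struct RecognizerRegistry {
    private var engines: [String: TextRecognizer] = [:]

    init() {}

    mutating func register(_ engine: TextRecognizer) {
        engines[engine.name] = engine
    }

    func engine(named name: String) -> TextRecognizer? {
        engines[name]
    }
}

/// Stand-in engine used only to show the wiring; it returns canned text.
struct StubRecognizer: TextRecognizer {
    let name: String
    let cannedOutput: [String]

    func recognizeText(in frame: [UInt8]) -> [String] { cannedOutput }
}
```

Because the rest of the pipeline depends only on the protocol, swapping engines would not touch the indexing or search code.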

9. Deployment

Build: Xcode 14.3, macOS SDK 13.0+.
Distribution: Mac App Store, or a notarized .dmg for standalone distribution.
Dependencies: Embedded Lucene; Jieba via Swift Package Manager.

10. Metrics & Monitoring

Instrumentation:
- os_signpost for profiling OCR/search latency.
- Memory/CPU usage logs (disabled by default).
User-Configurable:
- Frame sampling rate (0.5–5 FPS).
- Index purge scheduler.
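
As an illustration of the configurable sampling rate, the sketch below clamps a requested rate to the 0.5–5 FPS range and lists the times at which frames would be requested. The function name sampleTimes is hypothetical, and starting at 0 seconds is an assumption.

```swift
/// Clamps the requested rate to 0.5...5 FPS and returns the sample times
/// (in seconds, starting at 0) that fall before the end of the video.
func sampleTimes(duration: Double, framesPerSecond: Double) -> [Double] {
    guard duration > 0 else { return [] }
    let fps = min(max(framesPerSecond, 0.5), 5.0)
    // Number of samples k / fps with k / fps < duration.
    let count = Int((duration * fps).rounded(.up))
    // Derive each time from its index to avoid accumulating rounding error.
    return (0..<count).map { Double($0) / fps }
}
```

In the app, these values would be converted to media times and passed to the frame extractor; a requested rate outside the range is silently pulled back to the nearest limit.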

This architecture ensures offline efficiency, leverages Apple-native frameworks for optimal macOS integration, and scales for large video libraries while maintaining strict privacy standards.