Qasar Media Server
Enterprise media management solution
Qasar Media Server - Features Overview
A comprehensive media server solution providing scalable upload, processing, adaptive playback, and intelligent labeling capabilities. This repository includes a FastAPI backend server, iOS packages for easy integration, sample applications, and advanced AR technologies for content labeling and moderation.
Table of Contents
- Server Features
- Sample Applications
- iOS Packages
- AR Technologies for Labeling and Moderation
- Adaptive Playback and Packaging
Server Features
FastAPI Backend
The server is built on FastAPI with PostgreSQL and Redis, providing a robust, scalable foundation for media management.
Core Capabilities
- RESTful API with OpenAPI/Swagger documentation
- Multi-tenant architecture with tenant and user isolation
- Asynchronous processing using background workers
- Database migrations via Alembic
- Redis-based job queue for media processing tasks
Chunked Upload System
Efficient large-file upload handling with resume support:
- Session-based uploads: Create upload sessions with configurable chunk sizes
- Resumable uploads: Track chunk metadata in Redis for reliable resume capability
- Progress tracking: Real-time upload progress monitoring
- Multi-chunk support: Upload files in configurable chunk sizes
- Automatic validation: File integrity checks during upload
Upload Flow:
1. Create upload session with metadata (filename, total size, chunk size)
2. Upload chunks sequentially or in parallel
3. Complete session to trigger processing
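The chunk arithmetic behind this flow can be sketched as follows. This is a minimal illustration, not the server's or the iOS package's actual code: the names `Chunk`, `plan_chunks`, `missing_chunks`, and `progress` are hypothetical, and the set of received chunk indices stands in for whatever chunk metadata the server tracks in Redis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    index: int   # zero-based chunk number
    offset: int  # byte offset of the chunk within the file
    length: int  # number of bytes in this chunk


def plan_chunks(total_size: int, chunk_size: int) -> list[Chunk]:
    """Split a file of total_size bytes into chunks of at most chunk_size bytes."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    return [
        Chunk(index=i, offset=offset, length=min(chunk_size, total_size - offset))
        for i, offset in enumerate(range(0, total_size, chunk_size))
    ]


def missing_chunks(plan: list[Chunk], received: set[int]) -> list[Chunk]:
    """Return the chunks still to be sent when resuming an interrupted upload."""
    return [chunk for chunk in plan if chunk.index not in received]


def progress(plan: list[Chunk], received: set[int]) -> float:
    """Fraction of bytes uploaded so far, between 0.0 and 1.0."""
    total = sum(chunk.length for chunk in plan)
    done = sum(chunk.length for chunk in plan if chunk.index in received)
    return done / total if total else 1.0
```

Because each chunk carries its own offset and length, chunks can be sent in any order or in parallel, and a resumed upload only needs the list returned by `missing_chunks`.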
Media Processing Pipeline
Automated transcoding and packaging for multiple formats:
- Video Processing:
  - Multi-resolution HLS transcoding (1080p, 720p, 480p)
  - DASH manifest generation
  - WebM/VP9 encoding
  - Thumbnail generation (up to 25 thumbnails per video)
  - Poster image extraction
  - Duration and dimension extraction (width, height, SAR, DAR)
- Audio Processing:
  - WebM/Opus encoding
  - Spectrogram generation
  - Poster image from spectrogram
- Image Processing:
  - WebP conversion
  - Thumbnail generation
  - Poster image creation
- Non-packaged Types: Images, audio files, and documents are served directly without transcoding
Media Management API
Comprehensive media asset management:
- Media Retrieval: Get media by ID with full metadata
- Search: Full-text search across labels and captions
- Recommendations: Stub for recommendation engine integration
- Feed Endpoints: New media, recommended media, search results
- Metadata Management: Captions, user tags, labels
- Status Tracking: Processing status (queued, processing, ready, error)
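The four processing states can be modeled as a small state machine. The sketch below is an assumption about how the states relate, not the server's actual model: the transition table, the `MediaStatus` enum, and the `advance` helper are hypothetical, and the `error` to `queued` edge is meant to represent requeueing a failed job.

```python
from enum import Enum


class MediaStatus(str, Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    READY = "ready"
    ERROR = "error"


# Assumed transitions; "error" -> "queued" models requeueing a failed job.
ALLOWED_TRANSITIONS = {
    MediaStatus.QUEUED: {MediaStatus.PROCESSING},
    MediaStatus.PROCESSING: {MediaStatus.READY, MediaStatus.ERROR},
    MediaStatus.ERROR: {MediaStatus.QUEUED},
    MediaStatus.READY: set(),
}


def advance(current: MediaStatus, target: MediaStatus) -> MediaStatus:
    """Return the new status, or raise if the transition is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Subclassing `str` keeps the enum values directly serializable as the status strings listed above.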
Label Management System
Flexible labeling infrastructure:
- Label Banks: Tenant-specific label collections
- General Labels: Content categorization labels
- Moderation Labels: Content moderation and safety labels
- Label Statistics: Track label usage, confidence scores, and match counts
- Label Associations: Link labels to media assets with confidence scores
- Hierarchical Labels: Support for parent-child label relationships
Storage Architecture
Organized storage structure:
/assets/
  /<tenant_id>/
    /<user_id>/
      /<media_id>/
        /media/        # Original uploaded file
        /hls/          # HLS playlists and segments
        /dash/         # DASH manifest and segments
        /webm/         # WebM encoded files
        /webp/         # WebP converted images
        /thumbnails/   # Thumbnail images
        /posters/      # Poster images
        /transcripts/  # Transcript files
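A path for any artifact in this layout can be derived from the tenant, user, and media IDs. The helper names below (`media_root`, `artifact_path`) and the rule that file names must be plain names without directory parts are assumptions made for this sketch, not the server's actual code.

```python
from pathlib import PurePosixPath

# Subdirectories taken from the layout above.
ARTIFACT_DIRS = (
    "media", "hls", "dash", "webm", "webp",
    "thumbnails", "posters", "transcripts",
)


def media_root(tenant_id: str, user_id: str, media_id: str,
               base: str = "/assets") -> PurePosixPath:
    """Directory holding every artifact of one media asset."""
    return PurePosixPath(base) / tenant_id / user_id / media_id


def artifact_path(tenant_id: str, user_id: str, media_id: str,
                  kind: str, filename: str) -> PurePosixPath:
    """Full path of one artifact file, rejecting unknown kinds and path traversal."""
    if kind not in ARTIFACT_DIRS:
        raise ValueError(f"unknown artifact kind: {kind}")
    name = PurePosixPath(filename).name
    if name != filename or name in ("", ".", ".."):
        raise ValueError(f"invalid file name: {filename!r}")
    return media_root(tenant_id, user_id, media_id) / kind / name
```

Keeping the tenant and user IDs in the path means isolation can be checked by comparing path prefixes.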
Sample Applications
iOS Sample Application (QasarMedia)
A complete SwiftUI iOS application demonstrating integration with the media server.
Features
- Vertical Video Feed: TikTok-style scrollable feed with autoplay
- Smart Playback: Autoplay when video reaches top 25% of screen
- Infinite Scroll: Pagination for seamless content browsing
- Media Upload: Chunked upload with progress tracking
- Media Download: Save videos to Photos or Files app
- Settings Management: Configure server URL, tenant ID, and user ID
- HLS Playback: Native AVPlayer integration with adaptive streaming
- Label Management: View and manage labels for uploaded content
- Creator View: Upload and label media content
Technical Stack
- SwiftUI for modern UI
- AVFoundation for media playback
- Combine for reactive programming
- CoreML for on-device ML inference
- Vision Framework for computer vision tasks
Web Application (Flask)
A Flask-based web application for media management and moderation.
Features
- Media Feed: Browse uploaded media with pagination
- Creator Interface: Upload and manage media content
- Label Management: Create and manage label banks
- Moderation Interface: Review and moderate content using moderation labels
- Settings: Configure server and tenant settings
- Tailwind CSS: Modern, responsive UI styling
Blueprints
- feed: Media browsing and viewing
- creator: Media upload and creation
- labels: General label management
- moderation_labels: Content moderation labels
- settings: Application configuration
iOS Packages
Two Swift Package Manager packages for easy integration into new or existing iOS applications.
MediaUpload Package
A complete solution for uploading media to the Qasar Media Server.
Components
- MediaUploadService: Main service for coordinating uploads
- UploadCoordinator: Manages upload sessions and chunk coordination
- UploadAPIClient: REST API client for server communication
- UploadProgressView: SwiftUI view for displaying upload progress
- VideoLabeler: Protocol for on-device video labeling (optional)
- UploadTypes: Type definitions for upload operations
Features
- Chunked Upload: Automatic chunking and upload coordination
- Resume Support: Resume interrupted uploads
- Progress Tracking: Real-time upload progress callbacks
- Error Handling: Comprehensive error handling and retry logic
- Label Integration: Optional on-device labeling before upload
- Background Upload: Support for background upload tasks
Usage
let uploadService = MediaUploadService(
    apiClient: UploadAPIClient(baseURL: serverURL),
    labeler: yourLabelingService // Optional
)
try await uploadService.uploadVideo(
    url: videoURL,
    tenantId: tenantUUID,
    userId: userUUID
)
VideoPlayer Package
A high-performance video player component with adaptive streaming support.
Components
- VideoPlayerView: SwiftUI view for video playback
- VideoPlayerManager: Centralized player management and prewarming
- VideoPlayerItem: Protocol for media items
- ThumbnailCache: Efficient thumbnail caching system
Features
- Adaptive Streaming: HLS and DASH support with automatic quality selection
- Player Prewarming: Preload next videos for seamless playback
- Thumbnail Scrubbing: Preview thumbnails during scrubbing
- Aspect Ratio Handling: Correct SAR/DAR calculation for proper display
- Poster Images: Display poster images before playback starts
- Memory Management: Efficient player lifecycle management
- Tap Gestures: Customizable tap handling for play/pause
Usage
VideoPlayerView(
    item: mediaItem,
    onTap: { /* Handle tap */ },
    onScrubStart: { /* Handle scrub start */ },
    onScrubEnd: { time in /* Handle scrub end */ }
)
AR Technologies for Labeling and Moderation
Advanced on-device AI/ML technologies for automatic content labeling and moderation.
MobileCLIP Integration
Semantic understanding using Apple's MobileCLIP models:
- Image Embeddings: Extract semantic embeddings from video frames
- Text Embeddings: Generate embeddings for label phrases using multiple templates
- Similarity Matching: Match video frames to labels using cosine similarity
- Template-based Prompts: Multiple prompt templates for improved accuracy
- "a photo of {label}"
- "a video frame of {label}"
- "a close-up photo of {label}"
- "a scene of {label}"
- "a photo of the {label}"
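The template expansion and similarity matching described above can be illustrated in a few lines. The on-device implementation runs in Swift with CoreML; Python is used here only to show the arithmetic. The function names are hypothetical, the embeddings are plain lists of floats standing in for MobileCLIP output, and averaging the per-template similarities is one plausible way to combine templates, not necessarily the one used in this repository.

```python
import math

# Prompt templates from the list above.
TEMPLATES = [
    "a photo of {label}",
    "a video frame of {label}",
    "a close-up photo of {label}",
    "a scene of {label}",
    "a photo of the {label}",
]


def expand_prompts(label: str) -> list[str]:
    """Produce one text prompt per template for a label phrase."""
    return [template.format(label=label) for template in TEMPLATES]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_label(frame_embedding: list[float],
                prompt_embeddings: list[list[float]]) -> float:
    """Assumed combination rule: mean similarity over all prompt embeddings."""
    if not prompt_embeddings:
        return 0.0
    scores = [cosine_similarity(frame_embedding, p) for p in prompt_embeddings]
    return sum(scores) / len(scores)
```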
YOLO Object Detection
Real-time object detection using YOLOv8:
- CoreML Integration: Native CoreML model support
- Vision Framework: Integration with Apple's Vision framework
- Custom Label Mapping: Map detected objects to label bank entries
- Confidence Scoring: Track detection confidence for moderation decisions
- Batch Processing: Efficient frame-by-frame detection
Vision ROI Analyzer
Intelligent region-of-interest analysis:
- Face Detection: Detect faces in video frames
- Landmark Detection: Identify facial landmarks for selfie detection
- Animal Detection: Recognize animals in content
- Object Tracking: Track objects across frames
Labeling Service
Comprehensive labeling pipeline:
- Frame Sampling: Intelligent frame sampling at configurable FPS
- Keyframe Detection: Prioritize keyframes for efficient processing
- Multi-model Fusion: Combine CLIP and YOLO results
- Confidence Thresholding: Filter labels by confidence scores
- Temporal Aggregation: Aggregate labels across video duration
- Label Bank Integration: Match against tenant-specific label banks
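A rough sketch of how frame sampling, fusion, temporal aggregation, and thresholding could fit together is shown below. It is written in Python for illustration only, since the actual pipeline runs on-device. The function names are hypothetical, and the fusion rule (take the higher confidence) and aggregation rule (mean over all sampled frames) are assumptions.

```python
def sample_timestamps(duration: float, fps: float) -> list[float]:
    """Timestamps in seconds at which to sample frames, at fps frames per second."""
    if duration <= 0 or fps <= 0:
        return []
    return [i / fps for i in range(int(duration * fps))]


def fuse(clip_scores: dict[str, float],
         yolo_scores: dict[str, float]) -> dict[str, float]:
    """Per-frame fusion: keep the higher confidence reported by either model."""
    labels = set(clip_scores) | set(yolo_scores)
    return {
        label: max(clip_scores.get(label, 0.0), yolo_scores.get(label, 0.0))
        for label in labels
    }


def aggregate(frames: list[dict[str, float]], threshold: float) -> dict[str, float]:
    """Mean confidence per label over all frames, keeping labels at or above threshold.

    A label that is absent from a frame counts as 0.0 for that frame.
    """
    if not frames:
        return {}
    totals: dict[str, float] = {}
    for frame in frames:
        for label, score in frame.items():
            totals[label] = totals.get(label, 0.0) + score
    means = {label: total / len(frames) for label, total in totals.items()}
    return {label: mean for label, mean in means.items() if mean >= threshold}
```

Averaging over every sampled frame favors labels that persist across the video over labels that appear in a single frame.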
Label Banks
Flexible label management system:
- General Labels: Content categorization (objects, scenes, activities)
- Moderation Labels: Safety and content moderation labels
- Phrase Variants: Multiple phrases per label for improved matching
- Statistics Tracking: Track label usage and performance metrics
- Tenant Isolation: Separate label banks per tenant
Smart Reframe Transforms
Intelligent video composition:
- ROI-based Cropping: Crop videos based on detected regions of interest
- Aspect Ratio Adaptation: Adapt content for different display formats
- Composition Service: Generate optimal video compositions
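ROI-based cropping for a target aspect ratio comes down to simple geometry. The sketch below assumes the region of interest is given as a pixel rectangle and takes the largest crop of the target aspect that fits in the frame, centered on the ROI and clamped to the frame edges. The function name and this centering policy are hypothetical.

```python
def crop_for_aspect(frame_w: int, frame_h: int,
                    roi: tuple[int, int, int, int],
                    aspect_w: int, aspect_h: int) -> tuple[int, int, int, int]:
    """Return (x, y, width, height) of a crop with aspect aspect_w:aspect_h.

    roi is (x, y, width, height) in pixels. Integer division is used, so the
    crop may be up to one pixel short of the exact aspect ratio.
    """
    if frame_w * aspect_h > frame_h * aspect_w:
        # Frame is wider than the target: keep full height, trim the width.
        crop_h = frame_h
        crop_w = frame_h * aspect_w // aspect_h
    else:
        # Frame is taller than (or equal to) the target: keep full width.
        crop_w = frame_w
        crop_h = frame_w * aspect_h // aspect_w
    roi_x, roi_y, roi_w, roi_h = roi
    center_x = roi_x + roi_w // 2
    center_y = roi_y + roi_h // 2
    # Center the crop on the ROI, then clamp it so it stays inside the frame.
    x = min(max(center_x - crop_w // 2, 0), frame_w - crop_w)
    y = min(max(center_y - crop_h // 2, 0), frame_h - crop_h)
    return x, y, crop_w, crop_h
```

For example, adapting a 1920x1080 landscape frame to a 9:16 portrait display keeps the full height and slides a narrow window horizontally to follow the detected region.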
Adaptive Playback and Packaging
High-quality adaptive streaming with multiple format support.
HLS (HTTP Live Streaming)
Multi-resolution adaptive streaming:
- Multi-resolution Support: 1080p, 720p, and 480p variants
- 5-second Segments: A balance between startup latency and per-segment request overhead
- Master Playlist: Automatic playlist generation with bandwidth hints
- H.264 Encoding: Broad device compatibility
- AAC Audio: High-quality audio encoding
- Independent Segments: Enable efficient seeking and caching
Encoding Profiles:
- 1080p: 5000k video bitrate, 128k audio
- 720p: 3000k video bitrate, 128k audio
- 480p: 1500k video bitrate, 128k audio
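The profiles above map naturally onto one FFmpeg invocation per variant plus a master playlist. The sketch below only builds the argument list and the playlist text; it is an illustration under stated assumptions, not the server's actual command line. The file naming scheme, the forced keyframe expression, and the use of video plus audio bitrate as the `BANDWIDTH` value are assumptions (real peak bitrates can be higher).

```python
# (name, height, video_kbps, audio_kbps), taken from the encoding profiles above.
PROFILES = [
    ("1080p", 1080, 5000, 128),
    ("720p", 720, 3000, 128),
    ("480p", 480, 1500, 128),
]

SEGMENT_SECONDS = 5


def ffmpeg_hls_args(source: str, out_dir: str, name: str, height: int,
                    video_kbps: int, audio_kbps: int) -> list[str]:
    """Build an FFmpeg argument list for one H.264/AAC HLS variant."""
    return [
        "ffmpeg", "-y", "-i", source,
        "-vf", f"scale=-2:{height}",          # keep aspect ratio, even width
        "-c:v", "libx264", "-b:v", f"{video_kbps}k",
        # Force a keyframe at every segment boundary so segments can start cleanly.
        "-force_key_frames", f"expr:gte(t,n_forced*{SEGMENT_SECONDS})",
        "-c:a", "aac", "-b:a", f"{audio_kbps}k",
        "-f", "hls",
        "-hls_time", str(SEGMENT_SECONDS),
        "-hls_playlist_type", "vod",
        "-hls_flags", "independent_segments",
        "-hls_segment_filename", f"{out_dir}/{name}_%03d.ts",
        f"{out_dir}/{name}.m3u8",
    ]


def master_playlist(profiles=PROFILES) -> str:
    """Build master playlist text with one bandwidth hint per variant."""
    lines = ["#EXTM3U"]
    for name, _height, video_kbps, audio_kbps in profiles:
        bandwidth = (video_kbps + audio_kbps) * 1000  # bits per second, approximate
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth}")
        lines.append(f"{name}.m3u8")
    return "\n".join(lines) + "\n"
```

The argument list can be passed to `subprocess.run(args, check=True)` on a machine with FFmpeg installed.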
DASH (Dynamic Adaptive Streaming over HTTP)
Alternative adaptive streaming format:
- MPD Manifest: Media Presentation Description generation
- H.264 Encoding: Consistent codec support
- Multi-bitrate Support: Automatic quality selection
WebM Support
Modern web format encoding:
- VP9 Video: Efficient video compression
- Opus Audio: High-quality audio codec
- WebP Images: Modern image format with superior compression
Thumbnail Generation
Comprehensive thumbnail support:
- Video Thumbnails: Up to 25 thumbnails per video (configurable interval)
- Spectrogram Thumbnails: For audio files
- Image Thumbnails: Scaled versions of images
- WebP Format: Efficient thumbnail storage
- Timestamp Association: Each thumbnail linked to video timestamp
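One way to pick video thumbnail timestamps under the 25-thumbnail cap is sketched below. The function name and the policy of widening the interval when the cap would be exceeded are assumptions made for illustration.

```python
import math

MAX_THUMBNAILS = 25  # cap stated above


def thumbnail_timestamps(duration: float, interval: float,
                         max_count: int = MAX_THUMBNAILS) -> list[float]:
    """Timestamps in seconds at which to capture thumbnails.

    Thumbnails are taken every `interval` seconds starting at 0. If that would
    produce more than max_count thumbnails, the interval is widened so exactly
    max_count evenly spaced thumbnails cover the video.
    """
    if duration <= 0 or interval <= 0 or max_count <= 0:
        return []
    count = math.ceil(duration / interval)
    if count > max_count:
        count = max_count
        interval = duration / max_count
    return [round(i * interval, 3) for i in range(count)]
```

Each returned timestamp can be stored alongside its thumbnail file so a player can map a scrub position to the nearest preview.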
Poster Images
High-quality poster generation:
- Video Posters: Extracted from the video content using the same algorithm as thumbnail generation
- Audio Posters: Spectrogram-based posters
- Image Posters: Scaled versions of images
- Optimal Sizing: 1280px width for video posters
Transcript Support
Text content extraction:
- Transcript Storage: Organized transcript file management
- API Integration: Ready for transcription service integration
- Structured Format: Plain text transcripts with metadata
Processing Features
Advanced processing capabilities:
- Asynchronous Processing: Background worker processing
- Error Handling: Comprehensive error tracking and reporting
- Status Tracking: Real-time processing status updates
- Requeue Support: Retry failed processing jobs
- Metadata Extraction: Duration, dimensions, aspect ratios
- Format Detection: Automatic media type detection
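For the aspect ratio metadata, the display aspect ratio (DAR) follows from the stored pixel dimensions and the sample aspect ratio (SAR): DAR = (width / height) x SAR. The helpers below are a hypothetical sketch of that calculation. The `num:den` string format and the fallback to 1:1 for unusable values are assumptions about what a probing tool reports.

```python
from fractions import Fraction


def parse_ratio(text: str) -> Fraction:
    """Parse a 'num:den' ratio string; fall back to 1:1 if it is unusable."""
    try:
        num, den = (int(part) for part in text.split(":"))
    except ValueError:
        return Fraction(1)
    if num <= 0 or den <= 0:
        return Fraction(1)
    return Fraction(num, den)


def display_aspect_ratio(width: int, height: int,
                         sar: Fraction = Fraction(1)) -> Fraction:
    """DAR = (width / height) * SAR."""
    return Fraction(width, height) * sar


def display_size(width: int, height: int,
                 sar: Fraction = Fraction(1)) -> tuple[int, int]:
    """Size in square pixels: keep the height and scale the width by SAR."""
    return round(width * sar), height
```

For example, a 720x480 video with SAR 32:27 has a DAR of 16:9 and should be displayed at roughly 853x480.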
API Endpoints
RESTful API for accessing processed media:
- Media Metadata: /api/v1/media/{media_id}
- HLS Playback: /api/v1/media/{media_id}/hls/{file_path}
- DASH Playback: /api/v1/media/{media_id}/dash/{file_path}
- Thumbnail Access: /api/v1/media/{media_id}/serve/thumbnail/{file_path}
- Poster Access: /api/v1/media/{media_id}/serve/poster/{file_path}
- Transcript Access: /api/v1/media/{media_id}/serve/transcript/{file_path}
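A client can build URLs for these endpoints with a small helper. The route templates below are copied from the list above; the `media_url` function, the short route keys, and the example host are hypothetical.

```python
from urllib.parse import quote

# Route templates from the endpoint list above, keyed by a short name.
ROUTES = {
    "metadata": "/api/v1/media/{media_id}",
    "hls": "/api/v1/media/{media_id}/hls/{file_path}",
    "dash": "/api/v1/media/{media_id}/dash/{file_path}",
    "thumbnail": "/api/v1/media/{media_id}/serve/thumbnail/{file_path}",
    "poster": "/api/v1/media/{media_id}/serve/poster/{file_path}",
    "transcript": "/api/v1/media/{media_id}/serve/transcript/{file_path}",
}


def media_url(base_url: str, kind: str, media_id: str, file_path: str = "") -> str:
    """Build an absolute URL for one endpoint, percent-encoding the path parts."""
    if kind not in ROUTES:
        raise ValueError(f"unknown endpoint kind: {kind}")
    path = ROUTES[kind].format(
        media_id=quote(media_id, safe=""),
        file_path=quote(file_path, safe="/"),  # keep "/" so nested paths work
    )
    return base_url.rstrip("/") + path
```

For instance, `media_url("https://media.example.com", "hls", media_id, "master.m3u8")` yields a URL that can be handed to a player.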
Technology Stack
Backend
- FastAPI: Modern Python web framework
- PostgreSQL: Relational database with JSONB support
- Redis: Job queue and caching
- Alembic: Database migrations
- FFmpeg: Media transcoding
- HlsKit-Py: Advanced HLS processing (optional)
iOS
- Swift 5.9+: Modern Swift language features
- SwiftUI: Declarative UI framework
- AVFoundation: Media playback and processing
- CoreML: On-device machine learning
- Vision Framework: Computer vision tasks
- Combine: Reactive programming
Web
- Flask: Python web framework
- Tailwind CSS: Utility-first CSS framework
- JavaScript: Client-side interactivity
Deployment
The repository includes comprehensive deployment support:
- Docker: Multi-stage Dockerfiles for different components
- Docker Compose: Local development environment
- Nginx Configuration: Production-ready reverse proxy setup
- Deployment Scripts: Automated deployment workflows
- Multi-architecture Support: ARM64 and x86_64 support
Summary
The Qasar Media Server provides a complete solution for:
- Scalable Media Upload: Chunked uploads with resume support
- Intelligent Processing: Automated transcoding to multiple adaptive formats
- High-Quality Playback: HLS/DASH adaptive streaming with multi-resolution support
- AI-Powered Labeling: On-device CLIP and YOLO integration for automatic labeling
- Content Moderation: Label bank system for safety and compliance
- Easy Integration: Swift packages for seamless iOS integration
- Sample Applications: Complete reference implementations
It is intended for building modern media applications that need on-device AI labeling and adaptive streaming.