Multimodal AI Data Infrastructure

The only open-source Python library providing incremental storage, transformation, indexing, and orchestration of multimodal data.

$ pip install pixeltable

Examples

Key Features

Multimodal Storage: Images, videos, audio, docs
Incremental Updates: Only process what changed
Vector Search: Built-in similarity search
Versioning: Time travel & lineage
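
To make "only process what changed" concrete, here is a minimal plain-Python sketch of the idea. It is not Pixeltable's implementation, and the `IncrementalColumn` class and its methods are hypothetical names invented for this illustration. A derived value is cached per row and recomputed only when a row is new or its input differs from the one last seen.

```python
from typing import Callable, Dict


class IncrementalColumn:
    """Hypothetical helper: caches one derived value per row and
    recomputes only rows that are new or whose input changed."""

    def __init__(self, fn: Callable[[str], int]) -> None:
        self.fn = fn
        self.inputs: Dict[str, str] = {}   # row id -> input last seen
        self.outputs: Dict[str, int] = {}  # row id -> cached result
        self.calls = 0                     # number of times fn actually ran

    def update(self, rows: Dict[str, str]) -> Dict[str, int]:
        for row_id, value in rows.items():
            if self.inputs.get(row_id) != value:  # new or modified row
                self.outputs[row_id] = self.fn(value)
                self.inputs[row_id] = value
                self.calls += 1
        return {row_id: self.outputs[row_id] for row_id in rows}


col = IncrementalColumn(len)
col.update({"a": "fog", "b": "rain"})                # computes both rows
col.update({"a": "fog", "b": "rain", "c": "clear"})  # computes only "c"
print(col.calls)  # 3
```

The second `update` call sees three rows but runs the function once, because rows "a" and "b" are unchanged.
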
Monday 9:00 AM: New autonomous vehicle data arrives from the fleet
2 TB processed and 50K frames prioritized in 30 minutes, versus 3 days manually
ml-engineer_example.py
import pixeltable as pxt
from pixeltable.iterators import FrameIterator
from pixeltable.ext.functions.yolox import yolox  # import path may differ by version

# The 'fleet' directory must exist before tables are created inside it
pxt.create_dir('fleet')

# Connect to existing data sources without migration
vehicles = pxt.create_table('fleet.raw_data', {
    'video': pxt.Video,           # S3 references, no data movement
    'sensor_metadata': pxt.Json,  # From existing RDBMS
    'route_id': pxt.String,
    'weather': pxt.String
})

# Import weekend's data (2TB of video + metadata)
vehicles.insert_from_s3('s3://fleet-data/2025-01-06/')
vehicles.sync_metadata_from_db('postgresql://fleet_db/sensor_readings')

# Automatic frame extraction with YOLOX detection
frames = pxt.create_view('fleet.frames', vehicles,
    iterator=FrameIterator.create(video=vehicles.video, fps=1))

frames.add_computed_column(
    detections=yolox(frames.frame, model_id='yolox_l', threshold=0.6)
)

# Quality assessment for annotation priority
@pxt.udf
def annotation_priority(detections: dict, weather: str) -> float:
    edge_cases = ['fog', 'rain', 'construction']
    weather_mult = 2.0 if weather in edge_cases else 1.0
    confidence_penalty = 1.0 - detections.get('avg_confidence', 0.8)
    return weather_mult * confidence_penalty

# Base-table columns such as 'weather' are referenced through the view
frames.add_computed_column(
    priority=annotation_priority(frames.detections, frames.weather)
)

# Send high-priority frames to Label Studio
# (label_studio_config is the labeling configuration, defined elsewhere)
high_priority = frames.where(frames.priority > 1.5)
pxt.io.sync_label_studio_project(
    ls_project_name='highway-fog-annotations',
    view=high_priority,
    config=label_studio_config
)
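
To see what the priority score does with concrete numbers, the scoring rule above can be sketched as an ordinary Python function, without the `@pxt.udf` decorator so it runs with no dependencies. The sample inputs are invented for illustration, and the `avg_confidence` key is taken from the example as given. Whether the detector's output actually contains that key is an assumption.

```python
EDGE_CASES = ("fog", "rain", "construction")


def annotation_priority(detections: dict, weather: str) -> float:
    """Same arithmetic as the UDF in the example above."""
    weather_mult = 2.0 if weather in EDGE_CASES else 1.0
    confidence_penalty = 1.0 - detections.get("avg_confidence", 0.8)
    return weather_mult * confidence_penalty


# Hypothetical frames: (frame id, detections, weather)
samples = [
    ("frame_1", {"avg_confidence": 0.125}, "fog"),    # 2.0 * 0.875 = 1.75
    ("frame_2", {"avg_confidence": 0.5}, "fog"),      # 2.0 * 0.5   = 1.0
    ("frame_3", {"avg_confidence": 0.125}, "clear"),  # 1.0 * 0.875 = 0.875
]

scores = {fid: annotation_priority(det, w) for fid, det, w in samples}
high_priority = [fid for fid, score in scores.items() if score > 1.5]
print(high_priority)  # ['frame_1']
```

Under this rule a frame clears the `> 1.5` cutoff only when the weather is an edge case and the average confidence is below 0.25. If `avg_confidence` were averaged only over detections that already passed `threshold=0.6`, it would be at least 0.6, the score could not exceed 2.0 × 0.4 = 0.8, and the filter would select nothing. The two thresholds are worth tuning together.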