Multimodal AI Data Infrastructure

The only open-source Python library providing incremental storage, transformation, indexing, and orchestration of multimodal data.

$ pip install pixeltable

Key Features

Multimodal Storage: images, videos, audio, docs
Incremental Updates: only process what changed
Vector Search: built-in similarity search
Versioning: time travel and lineage
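
As a rough illustration of "only process what changed", the sketch below caches one derived value per row and runs the function only for rows it has not seen before. It is a conceptual plain-Python model, not Pixeltable's implementation; `IncrementalColumn`, `update`, and the sample rows are hypothetical names chosen for the example.

```python
# Conceptual sketch of incremental computation; not Pixeltable's implementation.
# IncrementalColumn and its methods are hypothetical names used for illustration.
from typing import Callable, Dict


class IncrementalColumn:
    """Caches one derived value per row and computes it only for unseen rows."""

    def __init__(self, fn: Callable[[str], int]) -> None:
        self.fn = fn
        self.values: Dict[int, int] = {}
        self.calls = 0  # counts how many times fn actually ran

    def update(self, rows: Dict[int, str]) -> None:
        for row_id, raw in rows.items():
            if row_id not in self.values:  # skip rows already processed
                self.values[row_id] = self.fn(raw)
                self.calls += 1


col = IncrementalColumn(len)
rows = {1: 'frame-a', 2: 'frame-b'}
col.update(rows)   # processes both rows
rows[3] = 'frame-c'
col.update(rows)   # processes only the new row
print(col.calls)   # 3, not 5
```

The second `update` call sees three rows but runs the function once, which is the behavior the feature name describes.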

Monday 9:00 AM: New autonomous vehicle data arrives from fleet

2 TB processed and 50K frames prioritized in 30 minutes, versus 3 days manually

ml-engineer_example.py

import pixeltable as pxt
from pixeltable.iterators import FrameIterator
# The yolox import path may differ between Pixeltable versions
from pixeltable.ext.functions.yolox import yolox

# Create the 'fleet' directory that holds the table and view below
pxt.create_dir('fleet')

# Connect to existing data sources without migration
vehicles = pxt.create_table('fleet.raw_data', {
    'video': pxt.Video,           # S3 references, no data movement
    'sensor_metadata': pxt.Json,  # From existing RDBMS
    'route_id': pxt.String,
    'weather': pxt.String
})

# Import weekend's data (2TB of video + metadata)
vehicles.insert_from_s3('s3://fleet-data/2025-01-06/')
vehicles.sync_metadata_from_db('postgresql://fleet_db/sensor_readings')

# Automatic frame extraction (one frame per second) with YOLOX detection
frames = pxt.create_view('fleet.frames', vehicles,
    iterator=FrameIterator.create(video=vehicles.video, fps=1))

frames.add_computed_column(
    detections=yolox(frames.frame, model_id='yolox_l', threshold=0.6)
)

# Quality assessment for annotation priority:
# low detection confidence and edge-case conditions both raise the score
@pxt.udf
def annotation_priority(detections: dict, weather: str) -> float:
    edge_cases = ['fog', 'rain', 'construction']
    weather_mult = 2.0 if weather in edge_cases else 1.0
    # Falls back to 0.8 when 'avg_confidence' is absent from the detections
    confidence_penalty = 1.0 - detections.get('avg_confidence', 0.8)
    return weather_mult * confidence_penalty

# Base-table columns such as 'weather' are accessible through the view
frames.add_computed_column(
    priority=annotation_priority(frames.detections, frames.weather)
)

# Send high-priority frames to Label Studio
# (label_studio_config is assumed to be defined elsewhere)
high_priority = frames.where(frames.priority > 1.5)
pxt.io.sync_label_studio_project(
    ls_project_name='highway-fog-annotations',
    view=high_priority,
    config=label_studio_config
)
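
To see how the priority score behaves, the scoring rule from the UDF above can be run as plain Python, without Pixeltable. This is a sketch under stated assumptions: the function name `priority_score` and the sample inputs are illustrative and not part of the library, and the `avg_confidence` key is taken from the example as written.

```python
# Plain-Python copy of the scoring rule from the example; no Pixeltable required.
# The name priority_score and the sample inputs below are illustrative assumptions.

EDGE_CASES = ('fog', 'rain', 'construction')


def priority_score(detections: dict, weather: str) -> float:
    """Higher scores mean a frame is more worth annotating."""
    weather_mult = 2.0 if weather in EDGE_CASES else 1.0
    # A missing avg_confidence falls back to 0.8, as in the example.
    confidence_penalty = 1.0 - detections.get('avg_confidence', 0.8)
    return weather_mult * confidence_penalty


samples = [
    ({'avg_confidence': 0.2}, 'fog'),    # low confidence, edge-case weather
    ({'avg_confidence': 0.2}, 'clear'),  # low confidence, normal weather
    ({'avg_confidence': 0.9}, 'rain'),   # high confidence, edge-case weather
]

# Same threshold as the where() filter in the example.
selected = [(d, w) for d, w in samples if priority_score(d, w) > 1.5]
```

Under this rule a frame clears the 1.5 threshold only when the weather is an edge case and the average confidence is below 0.25, since 2.0 × (1 − c) > 1.5 requires c < 0.25. Of the three samples, only the foggy low-confidence frame is selected.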