Multimodal Data Made Simple
Every team building with video, audio, images, and documents stitches together 5–8 services. The real cost isn't the wiring — it's the lost feedback loop. Pixeltable ends that.
Your data is scattered.
Ingest

You need to know where your data is, where it came from, and where it goes — as you build and as you grow. That requires integrity, not more glue.
The only system where video, audio, images, and documents are first-class column types — not opaque blobs. Schema, versioning, and lineage from the moment data enters. Your files stay where they are.
```python
media = pxt.create_table('app.media', {
    'video': pxt.Video,
    'doc': pxt.Document,
    'meta': pxt.Json
})

media.insert([
    {'video': 's3://bucket/demo.mp4'},
    {'doc': '/local/report.pdf'}
])
```
Insert a row → every downstream step triggers automatically. One system of record for all modalities.
Your time is wasted.
Process

On ETL, caching, retries, type safety — not on the AI logic that matters. Pixeltable handles the plumbing so you can focus on what you're building.
Your infrastructure doesn't compound.
Ship

Shipping is easy. Evolving is hard. Every iteration should make your system smarter — that's the competitive advantage glue code can never provide.
Free & open source · Apache 2.0 · No account required
See It In Action
Define your entire data processing and AI workflow declaratively using computed columns on tables. Focus on your application logic, not the data plumbing.
```python
# Video intelligence — ingest, extract, enrich, index, query
import pixeltable as pxt
from pixeltable.iterators import FrameIterator
from pixeltable.functions import yolox, gemini, whisper, twelvelabs
from pixeltable.functions.huggingface import clip, sentence_transformer

# 01 Ingest — native multimodal types
videos = pxt.create_table('app.videos', {
    'video': pxt.Video,
    'title': pxt.String,
})

# 02 Extract — frames, audio, transcript (automatic on insert)
frames = pxt.create_view('app.frames', videos,
    iterator=FrameIterator.create(video=videos.video, fps=1)
)
videos.add_computed_column(audio=videos.video.extract_audio())
videos.add_computed_column(
    transcript=whisper.transcribe(videos.audio, model='base')
)

# 03 Enrich — Gemini multimodal + YOLOX + custom UDF
@pxt.udf
def label_scene(detections: list[dict], description: str) -> str:
    objects = [d['class'] for d in detections[:5]]
    return f"{description} | objects: {', '.join(objects)}"

videos.add_computed_column(
    description=gemini.generate_content(
        [videos.video, 'Describe this video in one sentence.'],
        model='gemini-2.5-flash'
    )
)
frames.add_computed_column(
    detections=yolox(frames.frame, model_id='yolox_s')
)
frames.add_computed_column(
    label=label_scene(frames.detections, videos.description)
)

# 04 Index — CLIP, MiniLM, Twelve Labs — always in sync
frames.add_embedding_index('frame',
    image_embed=clip.using(model_id='openai/clip-vit-base-patch32'))
frames.add_embedding_index('label',
    string_embed=sentence_transformer.using(model_id='all-MiniLM-L6-v2'))
videos.add_embedding_index('video',
    embedding=twelvelabs.embed.using(model_name='marengo3.0'))

# 05 Query — similarity + metadata filtering in one expression
@pxt.query
def find_scenes(query_text: str, ref_image: pxt.Image, title: str):
    text_sim = frames.label.similarity(query_text)
    img_sim = frames.frame.similarity(ref_image)
    return (frames
        .where(videos.title.contains(title))
        .order_by(text_sim + img_sim, asc=False)
        .limit(10)
        .select(frames.frame, frames.label)
    )
```
app.videos

| Column | Type | Computed With |
|---|---|---|
| video | Video | |
| title | String | |
| audio | Audio | extract_audio() |
| transcript | Json | whisper(base) |
| description | String | gemini(video) |
| embedding_index | Index | Twelve Labs (video) |

app.frames

| Column | Type | Computed With |
|---|---|---|
| frame | Image | FrameIterator |
| detections | Json | yolox(frame) |
| label | String | @pxt.udf |
| embedding_index | Index | CLIP (image) |
| embedding_index | Index | MiniLM (text) |
Insert a video → Gemini, YOLOX, Whisper + custom @pxt.udf as computed columns → CLIP, MiniLM, Twelve Labs indexes → query via @pxt.query. From experiment to production.
From Raw Data to Production
Deploy as a full backend or a sidecar to your existing stack.
Watch & Learn
Tutorials, conference talks, and deep-dives from the Pixeltable team.
Developing with AI Tools
Pixeltable's declarative API means AI coding assistants get it right on the first try. Ten lines of code give you a persistent, versioned, incrementally-optimized pipeline.
Your Backend for Multimodal AI
pip install pixeltable

Your entire AI data stack

| Instead of ... | Pixeltable gives you ... |
|---|---|
| PostgreSQL / MySQL | pxt.create_table() — schema is Python, versioned automatically |
| Pinecone / Weaviate / Qdrant | add_embedding_index() — one line, stays in sync |
| S3 / boto3 / blob storage | pxt.Image / Video / Audio / Document — native types with caching |
| Airflow / Prefect / Celery | Computed columns — trigger on insert, no orchestrator needed |
| LangChain / LlamaIndex (RAG) | @pxt.query + .similarity() — computed column chaining |
| pandas / polars (multimodal) | .sample(), add_computed_column() — prototype to production |
| DVC / MLflow / W&B | history(), revert(), time travel — built-in snapshots |
| Custom retry / rate-limit / caching | Built into every AI integration — results cached, only new rows recomputed |
What Can You Build?
Three paths from day one — all built on the same table abstraction and the same pip install.
Everything You Need to Know
Common questions about building with Pixeltable
Every Era of Data Gets an Owner
Oracle for relational. Snowflake for analytics. Databricks for batch.
The multimodal data plane is next.