How Pixeltable Works
Store multimodal data in tables, define AI workflows as computed columns, and query everything together. Pixeltable handles orchestration, caching, and model execution automatically.
Build Your First AI Workflow
Four steps to create powerful multimodal AI applications with Pixeltable's declarative approach.
Create Tables with Multimodal Types
The Declarative Foundation
Define tables for any data type — images, videos, documents, structured data — in a single schema. Add computed columns that transform your data using Python expressions. Pixeltable orchestrates computation automatically for all existing and future rows.
- Unified schema for structured, unstructured, and multimodal data
- Computed columns run automatically — define once, never re-run manually
- Pixeltable manages the dependency graph and incremental updates
```python
import pixeltable as pxt

# Create a table with typed columns
t = pxt.create_table('films', {
    'name': pxt.String,
    'revenue': pxt.Float,
    'budget': pxt.Float,
}, if_exists="replace")

t.insert([
    {'name': 'Inside Out', 'revenue': 800.5, 'budget': 200.0},
    {'name': 'Toy Story', 'revenue': 1073.4, 'budget': 200.0},
])

# Computed column — auto-calculated for every row
t.add_computed_column(profit=(t.revenue - t.budget))

# +------------+--------+
# | name       | profit |
# +------------+--------+
# | Inside Out | 600.5  |
# | Toy Story  | 873.4  |
# +------------+--------+
```
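The idea behind a computed column can be pictured in plain Python: a derivation rule declared once and applied to every existing and future row. This is only a conceptual sketch of the behavior, not Pixeltable's implementation.

```python
# Conceptual sketch of a computed column (not Pixeltable internals):
# the 'profit' rule is declared once and applied uniformly.
rows = [
    {'name': 'Inside Out', 'revenue': 800.5, 'budget': 200.0},
    {'name': 'Toy Story', 'revenue': 1073.4, 'budget': 200.0},
]

def with_profit(row):
    # Derived field computed from the row's other columns
    return {**row, 'profit': row['revenue'] - row['budget']}

# Existing rows get the column...
table = [with_profit(r) for r in rows]

# ...and future inserts pick up the same rule — no manual re-run
table.append(with_profit({'name': 'Up', 'revenue': 735.1, 'budget': 175.0}))
```

In Pixeltable the equivalent rule lives in the table schema itself, so every insert, past or future, is covered without any application code re-running the calculation.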
Add AI Models as Computed Columns
Bring Your Own Model
Wrap any Python function (data cleaning, model inference, API calls) with the @pxt.udf decorator and it becomes a reusable pipeline component. Pixeltable handles parallelization, caching, and dependency resolution automatically.
- @pxt.udf turns any Python function into a pipeline component
- Integrate any model: HuggingFace, OpenAI, custom PyTorch, etc.
- Pixeltable caches results and only recomputes when inputs change
```python
import PIL.Image
import pixeltable as pxt

# Any Python function becomes a pipeline component
@pxt.udf
def detect(image: PIL.Image.Image) -> list[str]:
    from yolox.models import Yolox
    from yolox.data.datasets import COCO_CLASSES
    model = Yolox.from_pretrained("yolox_s")
    result = model([image])
    return [COCO_CLASSES[label] for label in result[0]["labels"]]

# Apply as a computed column — runs for every row
t.add_computed_column(classification=detect(t.image))

# +----------------------+------------------+
# | image                | classification   |
# +----------------------+------------------+
# | <Image: cat.jpg>     | ['cat', 'couch'] |
# | <Image: birds.png>   | ['bird']         |
# +----------------------+------------------+
```
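The claim that results are cached and only recomputed when inputs change boils down to input-keyed memoization. The sketch below shows that idea in plain Python, with a counter standing in for an expensive model call; it is an illustration of the concept, not Pixeltable's actual cache.

```python
# Conceptual sketch of input-keyed caching (not Pixeltable internals):
# an expensive function only runs when its input hasn't been seen.
cache = {}
calls = 0

def detect_cached(image_key):
    global calls
    if image_key not in cache:
        calls += 1                              # expensive model call happens here
        cache[image_key] = f"labels-for-{image_key}"
    return cache[image_key]

detect_cached('cat.jpg')    # computed
detect_cached('cat.jpg')    # cache hit — the model does not run again
```

Because computed columns declare their inputs explicitly, Pixeltable knows exactly which cached results to invalidate when a source column changes, and leaves everything else untouched.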
Search Across All Data Types
Built-in Vector Search
Add embedding indexes with one line — no separate vector database needed. Pixeltable generates embeddings, stores them co-located with your data, and keeps them automatically in sync. Search by text, image, or any modality.
- No separate vector DB: embeddings live next to your data
- Automatic sync: embeddings update when source data changes
- Cross-modal search: text-to-image, image-to-image, and more
```python
import pixeltable as pxt
from pixeltable.functions.huggingface import clip

# Add a CLIP embedding index — one line
images.add_embedding_index(
    'img',
    embedding=clip.using(model_id='openai/clip-vit-base-patch32')
)

# Text-to-image similarity search
query_text = "a dog playing fetch"
sim = images.img.similarity(query_text)
results = images.order_by(sim, asc=False).limit(5).collect()

# Image-to-image search works the same way
query_image = 'https://example.com/dog.jpg'
sim = images.img.similarity(query_image)
results = images.order_by(sim, asc=False).limit(5).collect()
```
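Cross-modal search works because models like CLIP embed text and images into the same vector space, so ranking reduces to a similarity measure such as cosine similarity. The sketch below illustrates that ranking step with toy vectors in place of real embeddings; it is a conceptual aid, not how Pixeltable computes similarity internally.

```python
import math

# Conceptual sketch: rank items by cosine similarity to a query vector.
# The vectors are toy values standing in for real CLIP embeddings.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

text_query = [0.9, 0.1, 0.0]            # pretend embedding of "a dog playing fetch"
image_embeddings = {
    'dog.jpg': [0.8, 0.2, 0.1],         # close to the query in embedding space
    'cat.jpg': [0.1, 0.9, 0.2],         # far from the query
}

ranked = sorted(image_embeddings,
                key=lambda k: cosine(text_query, image_embeddings[k]),
                reverse=True)
```

An image query goes through the exact same path: embed it with the same model, then rank by similarity, which is why text-to-image and image-to-image search share one index.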
Build Incremental RAG Pipelines
Incremental & Automated
Define the entire RAG workflow — chunking, embedding, retrieval, generation — declaratively. Pixeltable orchestrates it end-to-end. Because the pipeline is incremental, only new or updated documents get processed. Your RAG system stays efficient and always up-to-date.
- Declarative RAG: chunking → embedding → retrieval → LLM in a few lines
- Incremental: only new/updated documents get processed
- Everything co-located: source docs, chunks, embeddings, Q&A — all in sync
```python
import pixeltable as pxt
from pixeltable.functions import openai, huggingface
from pixeltable.iterators import DocumentSplitter

# 1. Source documents
docs = pxt.create_table('docs', {'doc': pxt.Document})
docs.insert([{'doc': 's3://my-data/annual-report.pdf'}])

# 2. Auto-chunk into sentences
chunks = pxt.create_view('chunks', docs,
    iterator=DocumentSplitter.create(
        document=docs.doc, separators='sentence'))

# 3. Embedding index
embed = huggingface.sentence_transformer.using(model_id='all-MiniLM-L6-v2')
chunks.add_embedding_index('text', string_embed=embed)

# 4. Retrieval function
@pxt.query
def get_context(query_text: str):
    sim = chunks.text.similarity(query_text)
    return chunks.order_by(sim, asc=False).limit(5)

# 5. RAG pipeline — ask a question, get an answer
qa = pxt.create_table('qa', {'prompt': pxt.String})
qa.add_computed_column(context=get_context(qa.prompt))
qa.add_computed_column(
    answer=openai.chat_completions(
        model='gpt-4o-mini',
        messages=[{
            'role': 'user',
            'content': qa.context.text + '\nQuestion: ' + qa.prompt
        }]
    ).choices[0].message.content
)

# Insert a question — pipeline runs automatically
qa.insert([{'prompt': 'Key takeaways from the report?'}])
```
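The incremental guarantee (only new or updated documents get processed) amounts to tracking what has already been handled and running the pipeline over the delta. The sketch below shows that bookkeeping in plain Python, with a trivial chunker standing in for the full chunk/embed stage; it illustrates the idea, not Pixeltable's internals.

```python
# Conceptual sketch of incremental processing (not Pixeltable internals):
# keep a record of processed documents and only run work for the delta.
processed = {}                          # doc -> chunk count

def process(doc):
    # Stand-in for the real chunk/embed pipeline stage
    return len(doc.split('. '))

def sync(all_docs):
    new_docs = [d for d in all_docs if d not in processed]
    for d in new_docs:
        processed[d] = process(d)
    return len(new_docs)                # how many documents actually ran

sync(['Report A. Summary.', 'Report B.'])                   # first run processes both
ran = sync(['Report A. Summary.', 'Report B.', 'Report C.'])  # only the new doc runs
```

In Pixeltable this delta tracking falls out of the dependency graph: views, embedding indexes, and computed columns all know which source rows they were derived from, so inserting one document triggers work for that document alone.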