Multimodal Data Made Simple
Every team building with video, audio, images, and documents stitches together 5–8 services. The real cost isn't the wiring — it's the lost feedback loop. Pixeltable ends that.
Your data is scattered.
Ingest

You need to know where your data is, where it came from, and where it goes — as you build and as you grow. That requires integrity, not more glue.
The only system where video, audio, images, and documents are first-class column types — not opaque blobs. Schema, versioning, and lineage from the moment data enters. Your files stay where they are.
```python
media = pxt.create_table('app.media', {
    'video': pxt.Video,
    'doc': pxt.Document,
    'meta': pxt.Json
})

media.insert([
    {'video': 's3://bucket/demo.mp4'},
    {'doc': '/local/report.pdf'}
])
```
Insert a row → every downstream step triggers automatically. One system of record for all modalities.
Your time is wasted.
Process

On ETL, caching, retries, type safety — not on the AI logic that matters. Pixeltable handles the plumbing so you can focus on what you're building.
Your infrastructure doesn't compound.
Ship

Shipping is easy. Evolving is hard. Every iteration should make your system smarter — that's the competitive advantage glue code can never provide.
Free & open source · Apache 2.0 · No account required
See It In Action
Define your entire data processing and AI workflow declaratively using computed columns on tables. Focus on your application logic, not the data plumbing.
```python
# Video intelligence — ingest, extract, enrich, index, query
import pixeltable as pxt
from pixeltable.iterators import FrameIterator
from pixeltable.functions import yolox, gemini, whisper, twelvelabs
from pixeltable.functions.huggingface import clip, sentence_transformer

# 01 Ingest — native multimodal types
videos = pxt.create_table('app.videos', {
    'video': pxt.Video,
    'title': pxt.String,
})

# 02 Extract — frames, audio, transcript (automatic on insert)
frames = pxt.create_view('app.frames', videos,
    iterator=FrameIterator.create(video=videos.video, fps=1)
)
videos.add_computed_column(audio=videos.video.extract_audio())
videos.add_computed_column(
    transcript=whisper.transcribe(videos.audio, model='base')
)

# 03 Enrich — Gemini multimodal + YOLOX + custom UDF
@pxt.udf
def label_scene(detections: list[dict], description: str) -> str:
    objects = [d['class'] for d in detections[:5]]
    return f"{description} | objects: {', '.join(objects)}"

videos.add_computed_column(
    description=gemini.generate_content(
        [videos.video, 'Describe this video in one sentence.'],
        model='gemini-2.5-flash'
    )
)
frames.add_computed_column(
    detections=yolox(frames.frame, model_id='yolox_s')
)
frames.add_computed_column(
    label=label_scene(frames.detections, videos.description)
)

# 04 Index — CLIP, MiniLM, Twelve Labs — always in sync
frames.add_embedding_index('frame',
    image_embed=clip.using(model_id='openai/clip-vit-base-patch32'))
frames.add_embedding_index('label',
    string_embed=sentence_transformer.using(model_id='all-MiniLM-L6-v2'))
videos.add_embedding_index('video',
    embedding=twelvelabs.embed.using(model_name='marengo3.0'))

# 05 Query — similarity + metadata filtering in one expression
@pxt.query
def find_scenes(query_text: str, ref_image: pxt.Image, title: str):
    text_sim = frames.label.similarity(query_text)
    img_sim = frames.frame.similarity(ref_image)
    return (frames
        .where(videos.title.contains(title))
        .order_by(text_sim + img_sim, asc=False)
        .limit(10)
        .select(frames.frame, frames.label)
    )
```
app.videos

| Column | Type | Computed With |
|---|---|---|
| video | Video | |
| title | String | |
| audio | Audio | extract_audio() |
| transcript | Json | whisper(base) |
| description | String | gemini(video) |
| embedding_index | Index | Twelve Labs (video) |

app.frames

| Column | Type | Computed With |
|---|---|---|
| frame | Image | FrameIterator |
| detections | Json | yolox(frame) |
| label | String | @pxt.udf |
| embedding_index | Index | CLIP (image) |
| embedding_index | Index | MiniLM (text) |
Insert a video → Gemini, YOLOX, Whisper + custom @pxt.udf as computed columns → CLIP, MiniLM, Twelve Labs indexes → query via @pxt.query. From experiment to production.
From Raw Data to Production
Deploy as a full backend or a sidecar to your existing stack.
Watch & Learn
Tutorials, conference talks, and deep-dives from the Pixeltable team.
Developing with AI Tools
Pixeltable's declarative API means AI coding assistants get it right on the first try. Ten lines of code give you a persistent, versioned, incrementally-optimized pipeline.
Your Backend for Multimodal AI
pip install pixeltable

Your entire AI data stack

| Instead of ... | Pixeltable gives you ... |
|---|---|
| PostgreSQL / MySQL | pxt.create_table() — schema is Python, versioned automatically |
| Pinecone / Weaviate / Qdrant | add_embedding_index() — one line, stays in sync |
| S3 / boto3 / blob storage | pxt.Image / Video / Audio / Document — native types with caching |
| Airflow / Prefect / Celery | Computed columns — trigger on insert, no orchestrator needed |
| LangChain / LlamaIndex (RAG) | @pxt.query + .similarity() — computed column chaining |
| pandas / polars (multimodal) | .sample(), add_computed_column() — prototype to production |
| DVC / MLflow / W&B | history(), revert(), time travel — built-in snapshots |
| Custom retry / rate-limit / caching | Built into every AI integration — results cached, only new rows recomputed |
What Can You Build?
Three paths from day one — all built on the same table abstraction and the same pip install.
Everything You Need to Know
Common questions about building with Pixeltable
Every Era of Data Gets an Owner
Oracle for relational. Snowflake for analytics. Databricks for batch.
The multimodal data plane is next.