Beginner · 10 min · technology · research · enterprise

Incremental Updates: Save 70% on AI Compute Costs

Stop reprocessing entire datasets when one row changes. Pixeltable tracks dependencies and recomputes only what's necessary — saving compute, time, and money.

The Challenge

Traditional AI pipelines reprocess entire datasets when anything changes. Adding one document to 100,000 existing ones triggers a full re-embedding, re-indexing, and re-inference run, which wastes hours of compute time and incurs significant API costs.
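To put numbers on that scenario, here is a small sketch that counts embedding calls for a full re-run versus an incremental one. The function names are illustrative and not part of Pixeltable, and the sketch assumes exactly one embedding call per document.

```python
def full_reprocess_calls(existing_docs: int, new_docs: int) -> int:
    """A full re-run embeds every document, old and new."""
    return existing_docs + new_docs


def incremental_calls(existing_docs: int, new_docs: int) -> int:
    """An incremental run embeds only the newly added documents."""
    return new_docs


existing, added = 100_000, 1
full = full_reprocess_calls(existing, added)
incremental = incremental_calls(existing, added)
print(full, incremental)  # 100001 1
```

The gap grows with the size of the existing dataset, because the full re-run pays for every old document on each change.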

The Solution

Pixeltable provides intelligent incremental updates. The system tracks which rows changed, identifies affected computed columns, and recomputes only the minimum necessary. Embedding indexes are updated incrementally too.
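The idea can be illustrated with a toy model in plain Python. This is a conceptual sketch only, not how Pixeltable is implemented: `IncrementalTable`, `fake_embedding`, and the call counter are hypothetical names introduced here to show that inserts and updates touch only the affected rows and their index entries.

```python
from typing import Callable, Dict, List


class IncrementalTable:
    """Toy table with one computed column and an index over it."""

    def __init__(self, compute: Callable[[str], List[float]]) -> None:
        self._compute = compute
        self.rows: Dict[int, str] = {}            # row id -> source value
        self.index: Dict[int, List[float]] = {}   # row id -> computed vector
        self.compute_calls = 0                    # how often the expensive step ran
        self._next_id = 0

    def _refresh(self, row_id: int) -> None:
        # Recompute a single row and update only its index entry.
        self.index[row_id] = self._compute(self.rows[row_id])
        self.compute_calls += 1

    def insert(self, values: List[str]) -> None:
        for value in values:
            row_id = self._next_id
            self._next_id += 1
            self.rows[row_id] = value
            self._refresh(row_id)

    def update(self, old: str, new: str) -> int:
        # Only rows whose value matches `old` are rewritten and recomputed.
        changed = [rid for rid, v in self.rows.items() if v == old]
        for rid in changed:
            self.rows[rid] = new
            self._refresh(rid)
        return len(changed)


def fake_embedding(text: str) -> List[float]:
    # Stand-in for an expensive model call.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


table = IncrementalTable(fake_embedding)
table.insert([f"doc {i}" for i in range(1000)])
calls_after_load = table.compute_calls

table.insert(["Latest Report"])
calls_for_insert = table.compute_calls - calls_after_load

before_update = table.compute_calls
table.update("Latest Report", "Updated Report")
calls_for_update = table.compute_calls - before_update

print(calls_after_load, calls_for_insert, calls_for_update)  # 1000 1 1
```

The initial load pays for every row once. After that, each insert or update costs one computation per affected row, regardless of table size.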

Implementation Guide

Step-by-step walkthrough with code examples

Automatic Incrementality

See how Pixeltable processes only what's changed.

python
import pixeltable as pxt
from pixeltable.functions import openai

# Make sure the parent directory for 'app.docs' exists
pxt.create_dir('app', if_exists='ignore')

# Create table with expensive AI processing
docs = pxt.create_table('app.docs', {
    'document': pxt.Document,
    'title': pxt.String,
})

# Expensive embedding generation
docs.add_computed_column(
    embedding=openai.embeddings(
        docs.title,  # Simplified for demo
        model='text-embedding-3-large'
    )
)

# The index takes an embedding function and applies it to the title column
docs.add_embedding_index(
    'title',
    embedding=openai.embeddings.using(model='text-embedding-3-large')
)

# Initial load: processes all 100K documents
# (hundred_k_documents is a list of row dicts prepared elsewhere)
docs.insert(hundred_k_documents)

# Add 1 new document: processes ONLY the new one
docs.insert([{'document': 'new.pdf', 'title': 'Latest Report'}])
# ✅ Only the new row is embedded, not all 100,001

# Update a title: recomputes ONLY that row's embedding
docs.update(
    {'title': 'Updated Report'},
    where=docs.title == 'Latest Report'
)
# ✅ 1 row recomputed, index updated incrementally

Incrementality is automatic, with no configuration needed. The engine tracks dependencies at the row level.
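Row-level dependency tracking can be pictured with a short plain-Python sketch. It is an illustration under simplifying assumptions, not Pixeltable's engine: `COMPUTED`, `affected_columns`, `fill_row`, and `update_cell` are hypothetical names, and computed columns are assumed to be declared after their inputs so that declaration order is a valid evaluation order.

```python
from typing import Callable, Dict, List, Tuple

# computed column name -> (input column names, function of those inputs)
COMPUTED: Dict[str, Tuple[List[str], Callable[..., object]]] = {
    "title_upper": (["title"], lambda title: title.upper()),
    "body_len": (["body"], lambda body: len(body)),
    "summary": (["title_upper", "body_len"], lambda t, n: f"{t} ({n} chars)"),
}


def affected_columns(changed: str) -> List[str]:
    """Computed columns that depend, directly or transitively, on `changed`."""
    dirty = {changed}
    result = []
    for name, (inputs, _) in COMPUTED.items():
        if dirty.intersection(inputs):
            dirty.add(name)
            result.append(name)
    return result


def fill_row(row: Dict[str, object]) -> None:
    """Compute every computed column once, e.g. when a row is inserted."""
    for name, (inputs, fn) in COMPUTED.items():
        row[name] = fn(*(row[i] for i in inputs))


def update_cell(row: Dict[str, object], column: str, value: object) -> List[str]:
    """Write one cell, then recompute only the affected columns of this row."""
    row[column] = value
    recomputed = affected_columns(column)
    for name in recomputed:
        inputs, fn = COMPUTED[name]
        row[name] = fn(*(row[i] for i in inputs))
    return recomputed


row: Dict[str, object] = {"title": "Latest Report", "body": "hello"}
fill_row(row)

touched = update_cell(row, "title", "Updated Report")
print(touched)         # ['title_upper', 'summary']
print(row["summary"])  # UPDATED REPORT (5 chars)
```

Changing `title` leaves `body_len` untouched, because nothing it depends on has changed. Other rows are never visited at all.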

Key Benefits

70% reduction in compute costs on iterative workflows
95% time savings on incremental updates
Automatic dependency tracking across all computed columns
Embedding indexes updated incrementally
No manual change-detection or batch-management code
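For contrast, here is roughly the kind of bookkeeping a pipeline needs when change detection is written by hand. It is a generic hash-based sketch, not Pixeltable code, and `content_hash` and `rows_to_reprocess` are hypothetical helper names.

```python
import hashlib
from typing import Dict, List


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def rows_to_reprocess(current: Dict[str, str], seen: Dict[str, str]) -> List[str]:
    """Return ids whose content is new or changed, and record their hashes."""
    stale = []
    for row_id, text in current.items():
        digest = content_hash(text)
        if seen.get(row_id) != digest:
            stale.append(row_id)
            seen[row_id] = digest
    return stale


seen_hashes: Dict[str, str] = {}
data = {"a": "first doc", "b": "second doc"}
first_pass = rows_to_reprocess(data, seen_hashes)

data["b"] = "second doc, revised"
data["c"] = "third doc"
second_pass = rows_to_reprocess(data, seen_hashes)

print(first_pass)   # ['a', 'b']
print(second_pass)  # ['b', 'c']
```

Even this minimal version has to persist the hash store, handle deletions, and stay in sync with every downstream step. That is the code automatic incrementality makes unnecessary.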

Real Applications

Large-scale document processing with frequent updates
Continuously updating ML training data
Real-time RAG systems with growing knowledge bases
Production pipelines with daily data ingestion

Prerequisites

Basic understanding of AI and data pipelines
Python programming experience
Python 3.9+

Performance

Cost Reduction: 70% (average compute cost savings)
Update Speed: 95% faster (vs full reprocessing)

Ready to Get Started?

Install Pixeltable with pip install pixeltable and start building in minutes. There is no infrastructure to manage.