Thursday, November 21 2024

The AI Data Infrastructure Should Be Declarative, Multimodal, and Incremental

Pixeltable is an open-source framework that unifies data storage, versioning, and model orchestration under a simple table interface. Instead of wrestling with data pipelines and infrastructure plumbing, you can focus on what matters: building AI applications around your models and algorithms while Pixeltable handles the undifferentiated heavy lifting.

Think of how web developers use relational databases like Postgres or Snowflake. The database provides a simple table interface while handling complex operations behind the scenes:

  • File storage and indexing
  • Transaction management
  • Query optimization
  • Converting declarative queries into file operations

This frees developers to focus on application logic and data modeling rather than infrastructure plumbing.
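As a minimal sketch of that division of labor, the snippet below uses Python's built-in sqlite3 module (chosen here only because it needs no server; the table and column names are made up for illustration). The query states which rows are wanted, and the engine decides how to store, index, and find them.

```python
import sqlite3

# In-memory database: SQLite manages storage, indexing, and transactions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, plan TEXT)")
conn.execute("CREATE INDEX idx_users_plan ON users (plan)")

# The connection context manager commits on success and rolls back on error.
with conn:
    conn.executemany(
        "INSERT INTO users (name, plan) VALUES (?, ?)",
        [("ada", "pro"), ("grace", "free"), ("linus", "pro")],
    )

# Declarative query: no file offsets or index lookups in application code.
pro_users = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM users WHERE plan = ? ORDER BY name", ("pro",)
    )
]
print(pro_users)  # ['ada', 'linus']
```

The application code never says how the rows are laid out on disk or whether the index is used; those decisions belong to the database.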

Pixeltable’s Approach

Pixeltable follows a similar philosophy but extends it to AI workflows. It provides a familiar table interface that encompasses both data and transformation logic:

  • You control:
    • What transformations to apply
    • Which models to use
    • How to structure your algorithms
  • Pixeltable handles:
    • Orchestration
    • Transactional storage
    • Data retrieval
    • Incremental updates

The interface is easily extensible through user-defined functions (UDFs).
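To make the idea of computed columns with incremental updates concrete, here is a toy model in plain Python. It is an illustration only and not how Pixeltable is implemented; the `ToyTable` class and the `word_count` function are hypothetical names invented for this sketch. The user supplies the function, and the table decides when to run it.

```python
from typing import Any, Callable, Dict, List


class ToyTable:
    """A toy stand-in for a table with computed columns (not Pixeltable's code)."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, Any]] = []
        self.computed: Dict[str, Callable[[Dict[str, Any]], Any]] = {}

    def add_computed_column(self, name: str, fn: Callable[[Dict[str, Any]], Any]) -> None:
        # Declaring the column backfills the rows that already exist, once.
        self.computed[name] = fn
        for row in self.rows:
            row[name] = fn(row)

    def insert(self, new_rows: List[Dict[str, Any]]) -> None:
        # Incremental update: only the newly inserted rows are computed.
        for new_row in new_rows:
            row = dict(new_row)
            for name, fn in self.computed.items():
                row[name] = fn(row)
            self.rows.append(row)


calls = 0


def word_count(row: Dict[str, Any]) -> int:
    """A user-defined function; the counter records how often it runs."""
    global calls
    calls += 1
    return len(row["text"].split())


t = ToyTable()
t.insert([{"text": "hello world"}, {"text": "one two three"}])
t.add_computed_column("n_words", word_count)  # runs twice (backfill)
t.insert([{"text": "just one more row"}])     # runs once (new row only)
print([r["n_words"] for r in t.rows], calls)  # [2, 3, 4] 3
```

The function runs three times in total, once per row, even though rows arrive in two batches. Earlier results are never recomputed.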

Video Processing Example

This approach shines with video processing. Pixeltable includes built-in functionality for:

  • Video frame extraction
  • Audio separation
  • Frame sampling at specified rates
  • Automated metadata extraction

You can extend this through:

  • Custom object detection models
  • Integration with specialized video processing libraries

For example:

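import pixeltable as pxt
from pixeltable.iterators import FrameIterator
# Assumption: the YOLOX import path below may differ between Pixeltable versions
from pixeltable.ext.functions.yolox import yolox

# Create a video table
videos = pxt.create_table('videos', {'video': pxt.Video})

# Create a frames view with automatic frame extraction (one frame per second)
frames = pxt.create_view(
    'frames',
    videos,
    iterator=FrameIterator.create(video=videos.video, fps=1)
)

# Add object detection as a computed column
frames['detections'] = yolox(frames.frame, model_id='yolox_s')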
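To show what sampling frames at a specified rate means, here is a small sketch in plain Python. It illustrates the idea only and is not Pixeltable's FrameIterator implementation. The function name `sampled_frame_indices` and the policy of truncating to the nearest earlier frame are assumptions made for this example.

```python
from typing import List


def sampled_frame_indices(num_frames: int, source_fps: float, target_fps: float) -> List[int]:
    """Return the indices of source frames to keep when sampling at target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps >= source_fps:
        # Cannot sample faster than the source: keep every frame.
        return list(range(num_frames))
    step = source_fps / target_fps  # source frames per sampled frame
    indices = []
    k = 0
    while int(k * step) < num_frames:
        indices.append(int(k * step))
        k += 1
    return indices


# A 3-second clip at 30 fps sampled at 1 fps keeps one frame per second.
print(sampled_frame_indices(90, source_fps=30, target_fps=1))  # [0, 30, 60]
```

With `fps=1`, a downstream model such as an object detector sees 3 frames instead of 90, which is often the main cost saving in a video pipeline.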

Image Processing Example

The same approach applies to image processing. Pixeltable provides:

  • Image loading and storage
  • Basic transformations (resize, crop, rotate)
  • Efficient batch processing
  • Automatic format handling

You can extend this with:

  • Custom computer vision models
  • Image segmentation algorithms
  • Feature extraction pipelines
  • Integration with deep learning frameworks

For example:

import pixeltable as pxt
from pixeltable.functions.huggingface import clip_image

# Create an image table
images = pxt.create_table('images', {'image': pxt.Image})

# Add embeddings for similarity search (the model is selected with the model_id keyword)
images['embedding'] = clip_image(images.image, model_id='openai/clip-vit-base-patch32')

# Create an embedding index
images.add_embedding_index('embedding', metric='cosine')
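For intuition about what a cosine-metric similarity search computes, the sketch below ranks a few hand-written vectors against a query in plain Python. It is a brute-force illustration under made-up data, not Pixeltable's index implementation; the `top_k` helper and the three-dimensional "embeddings" are hypothetical, and a real index avoids scanning every vector.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], embeddings: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and return the k best matches."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in embeddings.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy embeddings; real CLIP vectors have hundreds of dimensions.
embeddings = {
    "cat.jpg": [1.0, 0.0, 0.0],
    "kitten.jpg": [0.9, 0.1, 0.0],
    "truck.jpg": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.0, 0.0], embeddings, k=2)
print([name for name, _ in results])  # ['cat.jpg', 'kitten.jpg']
```

Cosine similarity compares direction rather than magnitude, so two embeddings that point the same way score close to 1 regardless of their length.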

Key Benefits

  1. Simplified Data Management:
    • No need to manage frame extraction scripts
    • Automatic handling of video/image formats
    • Built-in versioning and lineage tracking
  2. Efficient Processing:
    • Intelligent caching of processed frames
    • Incremental updates for model outputs
    • Parallel processing where possible
  3. Declarative Interface:
    • Express complex video/image pipelines in simple table operations
    • Chain transformations easily
    • Query processed results efficiently
  4. Development Focus:
    • You maintain full control over domain-specific logic where your product adds value
    • Pixeltable handles the undifferentiated heavy lifting of data operations
    • Focus on innovation rather than infrastructure

For example, a complete workflow might look like:

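import PIL.Image
import pixeltable as pxt
from pixeltable.iterators import FrameIterator
# Assumption: the YOLOX import path below may differ between Pixeltable versions
from pixeltable.ext.functions.yolox import yolox

# Define custom processing logic
@pxt.udf
def custom_video_analytics(frame: PIL.Image.Image) -> dict:
    # Your specialized analysis code here; this placeholder only reports the frame size
    results = {'width': frame.width, 'height': frame.height}
    return results

# Manage data
videos = pxt.create_table('videos', {'video': pxt.Video})
frames = pxt.create_view('frames', videos,
    iterator=FrameIterator.create(video=videos.video, fps=1))

# Add computed columns
frames['detections'] = yolox(frames.frame, model_id='yolox_s')  # Object detection
frames['analytics'] = custom_video_analytics(frames.frame)  # Custom analysis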

This pipeline automatically handles:

  • Frame extraction and caching
  • Parallel processing where possible
  • Incremental updates when new videos are added
  • Efficient storage and retrieval
  • Search capabilities

While you focus on:

  • Detection algorithms
  • Analytics logic
  • Model selection and tuning
  • Business-specific requirements

The result is a more efficient development process that lets you focus on your core competencies while Pixeltable handles the complex data management infrastructure.