The Challenge
Transcribing large volumes of audio requires complex orchestration: file handling, API rate limiting, error retries, parallel processing, and result storage across separate systems.
The Solution
Pixeltable automates the entire transcription workflow. Add audio files to a table, and Whisper transcription runs as a computed column with built-in batching and error handling.
Implementation Guide
Step-by-step walkthrough with code examples
Audio Table
Create a table for audio files with automatic transcription.
import pixeltable as pxt
from pixeltable.functions import openai

# Audio processing table
audio = pxt.create_table('app.audio', {
    'audio_file': pxt.Audio,
    'title': pxt.String,
    'speaker': pxt.String,
})

# Automatic Whisper transcription
audio.add_computed_column(
    transcript=openai.transcriptions(
        audio=audio.audio_file,
        model='whisper-1'
    )
)

# Insert files; transcription runs automatically
audio.insert([
    {'audio_file': '/recordings/meeting_01.mp3',
     'title': 'Team Standup', 'speaker': 'All'},
])
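To see what the computed column takes off your plate, here is a hand-rolled sketch of the retry logic a transcription pipeline would otherwise need around every API call. This is plain Python for illustration only; `with_retries` and `flaky_transcribe` are hypothetical names, not part of the Pixeltable or OpenAI APIs.

```python
import time

def with_retries(fn, max_attempts=3, base_delay=0.01):
    """Call fn(), retrying on failure with exponential backoff.

    Hypothetical helper illustrating the error handling that
    a computed column would otherwise have to do by hand.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

# Simulated flaky transcription call: fails twice, then succeeds
calls = {'n': 0}

def flaky_transcribe():
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError('transient API error')
    return 'hello world'

print(with_retries(flaky_transcribe))  # succeeds on the third attempt
```

Pixeltable runs this kind of loop for you per row, so a transient failure on one file does not abort the batch.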
Related Guides
Build an end-to-end video analysis system with Pixeltable. Ingest video, extract frames, run multimodal AI models, generate embeddings, and enable semantic search — all as computed columns on a table.
Replace thousands of lines of orchestration code with declarative computed columns. Pixeltable handles execution, dependencies, caching, and incremental updates automatically.
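One way to picture a computed column: a value derived from other columns, computed once per row on insert rather than re-run over the whole table. The toy class below is a self-contained illustration of that idea in plain Python; it is a sketch of the concept, not Pixeltable's implementation, and `ToyTable` is a hypothetical name.

```python
class ToyTable:
    """Toy model of a table with one computed column.

    Illustrates declarative computation and incremental updates;
    an illustration of the idea only, not Pixeltable internals.
    """
    def __init__(self, compute):
        self.compute = compute  # function from row dict -> derived value
        self.rows = []          # source data
        self.cache = []         # computed values, one per row

    def insert(self, rows):
        # Only newly inserted rows are computed (incremental update);
        # earlier rows keep their cached values.
        for row in rows:
            self.rows.append(row)
            self.cache.append(self.compute(row))

    def select(self):
        return list(zip(self.rows, self.cache))

# Declare the derived column once; it runs on every insert
t = ToyTable(compute=lambda row: row['text'].upper())
t.insert([{'text': 'hello'}])
t.insert([{'text': 'world'}])  # earlier rows are not recomputed
```

You declare the transformation once; the table decides when to run it and for which rows.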
Build applications that work with images, videos, audio, and documents simultaneously. Pixeltable treats all modalities as first-class column types with automatic cross-modal operations.
Ready to Get Started?
Install Pixeltable and start building in minutes. One pip install, no infrastructure to manage.