
Fireworks Summer Audio Updates: Fastest Transcription now with Diarization and Batch API
By Yunyi Chi|5/12/2025
TL;DR
Last winter, Fireworks launched the market’s fastest Whisper-based speech transcription service (as measured by Artificial Analysis). Fireworks offers both:
- Pre-recorded transcription: Transcribe an audio file - an hour of audio can be transcribed in under 4 seconds! Great for use cases like meeting notes and lecture transcription.
- Streaming transcription: Create a WebSocket to stream audio live. Great for use cases like live captions and voice agents.
Since then we’ve been overwhelmed by the customer response. We’ve already seen a variety of innovative use cases built on top of Fireworks, ranging from drive-thru analytics to customer service evaluations. However, we believe the industry is just getting started with audio.
Today, we’re improving on our transcription service by introducing:
- Diarization: Unlock richer audio insights by identifying speakers on pre-recorded audio.
- Batch API: Unlock significantly higher rate limits and 40% lower pricing by using a Batch API that typically returns responses within 24 hours — and for the rest of May, it’s 100% FREE!
Diarization
Why Diarization
One of our most common feature requests for pre-recorded transcription was speaker diarization. Speaker diarization is the ability to identify speakers in audio. It’s crucial for a variety of use cases like providing detailed meeting note transcriptions or getting per-speaker analytics for phone calls.
Our Speaker Diarization feature is built for real-world scale and reliability, with:
- High concurrency and scalability – A scalable, highly concurrent architecture delivers linear scaling across long or high-volume audio recordings without sacrificing performance.
- Seamless integration – Outputs include speaker_id tags in verbose_json format, making it easy to plug into downstream applications like subtitles, speaker analytics, or dialogue segmentation.
- Accuracy optimized with smart controls – We've improved diarization quality via speaker count estimation and pause handling, and let users specify known speaker numbers to further boost accuracy.
Get Started
Get started with diarization today in code through the docs or try it in our UI playground.
To enable word-level speaker diarization, follow these steps:
- Set response_format=verbose_json, timestamp_granularities=word,segment.
- Set diarize=true to activate diarization.
- (Optional) Set min_speakers and max_speakers to boost diarization accuracy.
These settings ensure that each word in the response includes a speaker_id field.
import requests

with open("audio.mp3", "rb") as f:
    response = requests.post(
        "https://audio-prod.us-virginia-1.direct.fireworks.ai/v1/audio/transcriptions",
        headers={"Authorization": "Bearer <YOUR_API_KEY>"},
        files={"file": f},
        data={
            "model": "whisper-v3",
            "temperature": "0",
            "vad_model": "silero",
            "response_format": "verbose_json",
            "timestamp_granularities": "word,segment",
            "diarize": "true",
            "min_speakers": "1",
            "max_speakers": "2",
        },
    )

if response.status_code == 200:
    print(response.json())
else:
    print(f"Error: {response.status_code}", response.text)
Here's an example response for diarization.
We've added a speaker_id field to each word and segment to indicate who’s speaking. (Only part of the full response is shown below for brevity.)
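To give a sense of the shape, here’s a minimal sketch that folds word-level speaker_id tags into per-speaker turns. The payload below uses illustrative values, not actual API output; consult the docs for the exact verbose_json schema.

```python
# Illustrative (not actual API output): a trimmed verbose_json-style payload
# where each word carries a speaker_id field.
resp = {
    "text": "Hi there. Hello!",
    "words": [
        {"word": "Hi", "start": 0.0, "end": 0.3, "speaker_id": "speaker_1"},
        {"word": "there.", "start": 0.3, "end": 0.6, "speaker_id": "speaker_1"},
        {"word": "Hello!", "start": 0.8, "end": 1.1, "speaker_id": "speaker_2"},
    ],
}

def words_to_turns(words):
    """Group consecutive words with the same speaker_id into speaker turns."""
    turns = []
    for w in words:
        if turns and turns[-1]["speaker_id"] == w["speaker_id"]:
            # Same speaker as the previous word: extend the current turn.
            turns[-1]["text"] += " " + w["word"]
            turns[-1]["end"] = w["end"]
        else:
            # Speaker changed: start a new turn.
            turns.append({
                "speaker_id": w["speaker_id"],
                "text": w["word"],
                "start": w["start"],
                "end": w["end"],
            })
    return turns

for t in words_to_turns(resp["words"]):
    print(f'{t["speaker_id"]} [{t["start"]:.1f}-{t["end"]:.1f}]: {t["text"]}')
```

This kind of grouping is all it takes to turn word-level tags into subtitles or per-speaker analytics downstream.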
Ready to try? Diarization normally includes a 40% compute surcharge — but we’re waiving that fee for the rest of May.
Batch API
Why Batch API
The second major update is Batch API. We’ve heard from both enterprise and public platform customers that:
- Rate limits can be constraining - By default, Fireworks serverless offers high rate limits of 200 for audio-prod and 400 for audio-turbo. However, some users want to process a large backlog of files all at once.
- Flexible latency and lower cost options - Previously, the general Fireworks audio transcription API assumed that customers prioritize latency, and it would drop requests when rate limits were hit. However, use cases like offline analytics don’t require real-time responses, and users might prefer lower-cost transcription or to “fail gracefully” with slower responses when hitting rate limits.
We’re solving these problems with the introduction of our audio Batch API.
Why use the Batch API:
- The Batch API costs ~40% less than the standard API.
- Fireworks typically provides a response within 24 hours - transcription jobs are scheduled as capacity frees up on Fireworks compute.
- More tools are coming to help you manage and integrate batch requests more easily.
Get Started
The Batch API solution works as follows:
- Submit a request via the Batch API with your target endpoint and path.
- Poll the API to check whether your job has completed, and retrieve the result once it’s ready.
- (Optional) Parse the response based on your original request format and its content_type.
For more details, see the Create Batch Request and Check Batch Status docs.
Now, here’s a simple code example to show how it works:
import requests
import time
import sys

FIREWORKS_API_KEY = "<YOUR_API_KEY>"
file_list = ["audio1.flac", "audio2.flac"]  # List of files to submit
batch_ids = []
account_ids = []

# Submit requests
for idx, filename in enumerate(file_list):
    with open(filename, "rb") as f:
        response = requests.post(
            "https://audio-batch.link.fireworks.ai/v1/audio/transcriptions?endpoint_id=audio-prod",
            headers={"Authorization": f"Bearer {FIREWORKS_API_KEY}"},
            files={"file": f},
            data={
                "model": "whisper-v3",
                "temperature": "0",
                "vad_model": "silero",
                "response_format": "json",
            },
        )
    if response.status_code != 200:
        print(f"[Request {idx+1}] Error: {response.status_code}", response.text)
        sys.exit(1)
    data = response.json()
    account_ids.append(data["account_id"])
    batch_ids.append(data["batch_id"])
    print(f"[Request {idx+1}] Batch submitted successfully.")
    print(data)

# Wait before polling (you can adjust the delay as needed)
wait_seconds = 10
print(f"Waiting {wait_seconds} seconds before polling...")
time.sleep(wait_seconds)

# Poll for results
for i in range(len(file_list)):
    response = requests.get(
        f"https://audio-batch.link.fireworks.ai/v1/accounts/{account_ids[i]}/batch_job/{batch_ids[i]}",
        headers={"Authorization": f"Bearer {FIREWORKS_API_KEY}"},
    )
    if response.status_code == 200:
        print(f"[Result {i+1}]")
        print(response.json())
    else:
        print(f"[Result {i+1}] Error: {response.status_code}", response.text)
To explore more use cases—like managing the progress of each batch request locally and parsing raw responses—check out this cookbook.
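The example above waits a fixed 10 seconds and polls once, which keeps things short. In practice you’ll likely want to poll until the job actually finishes. Here’s a minimal sketch with exponential backoff; note that the "status" field name and its values are assumptions for illustration, so check the Check Batch Status docs for the actual response schema.

```python
import time

def poll_until_done(fetch_status, max_wait=24 * 3600, delay=10, max_delay=300):
    """Poll fetch_status() with exponential backoff until the job finishes.

    fetch_status is a callable returning the parsed batch-status response.
    The "status" field and its values here are illustrative assumptions;
    see the Check Batch Status docs for the real schema.
    """
    waited = 0
    while waited < max_wait:
        result = fetch_status()
        if result.get("status") in ("completed", "failed"):
            return result
        time.sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # back off to avoid hammering the API
    raise TimeoutError("batch job did not finish within max_wait")
```

To wire this into the example above, pass a closure over the status endpoint, e.g. `poll_until_done(lambda: requests.get(url, headers=headers).json())`.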
Ready to try? Batch API is 100% FREE for the next two weeks—sign up now and start batch-processing your audio workloads (transcription, translation, and more) at scale.
Build on Fireworks
With speaker diarization and batch processing onboard, it’s even easier to build AI applications with Fireworks. Our audio service pairs with text inference and other modalities to drive compound AI pipelines for contact-center analytics, large-scale transcription, and media indexing—linking speech, text, imagery, and domain-specific models in one streamlined workflow.
Fireworks makes it easy to build AI applications on top of the fastest inference. Fireworks provides one place for:
- Build: Experiment with open models in seconds. Combine models and modalities with tool use and constrained output generation.
- Customize: Optimize your system for quality and speed by tuning on your data to customize model behavior and performance.
- Scale: Use the right deployment option for your traffic and enjoy production reliability, data security, and scalable costs.
Keep in touch with us on Discord or Twitter. Stay tuned for more updates coming soon about Fireworks and audio!