© 2026 Fireworks AI All rights reserved.

Featured Blogs

Agentic AI Systems

AI is evolving from passive responders into proactive agents that can perceive, reason, and act autonomously. We’re witnessing the rise of agentic systems: AI that goes beyond generating text responses to planning, executing, and learning across complex, multi-step tasks. This detailed post covers what agentic AI systems are, agentic AI design patterns, and best practices.

Building an open-source Browser Agent on Fireworks AI

5/16/2025

Supervised Fine-Tuning (SFT) with LoRA on Fireworks AI: Tutorial

5/12/2025

Fireworks Summer Audio Updates: Fastest Transcription now with Diarization and Batch API

5/12/2025

Qwen 3 on Fireworks AI: Controllable Chain-of-Thought and Tool Calling at Frontier Scale

5/5/2025

Optimizing Llama 4 Maverick on Fireworks AI

4/28/2025

Mixture of Experts (MoE): Scaling AI with More Efficient Models

4/14/2025

Building Enterprise-Scale RAG Systems with Fireworks AI and MongoDB Atlas

4/9/2025

Faster, more efficient DeepSeek on the Fireworks AI Developer Cloud

3/19/2025

Fireworks AI Now Supports NVIDIA NIM Deployments for Blazing AI Inference

3/18/2025

Fine-Tuning DeepSeek v3 & R1 to optimize quality, latency, & cost

3/12/2025

Enabling Function Calling in DeepSeek v3: Bridging the Gap Between Text and Action

2/14/2025

DeepSeek v3 and R1 Model Architecture: Why it's powerful and economical

2/7/2025

DeepSeek R1 Just Got Eyes with Fireworks AI Document Inlining

2/5/2025

From text to task: Constrained generation for structured extraction in R1

2/1/2025

Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?

1/31/2025

Mistral Small 3 Now Available on Fireworks: Faster, Lighter, and More Efficient

1/30/2025

Beyond Supervised Fine Tuning: How Reinforcement Learning Empowers AI with Minimal Labels

1/28/2025

Real-time, performant code assistance: How Sourcegraph scaled with Fireworks AI

1/28/2025

DeepSeek R1: All you need to know 🐳

1/24/2025

Fireworks Streaming Transcription: 300ms with Whisper-v3-large-quality

1/23/2025

DeepSeek V3 just got vision capabilities!

1/13/2025

Document inlining: Crossing the modality gap with Compound AI

12/20/2024

How Cresta drives millions of real-time, AI-powered contact center interactions with Fireworks

12/10/2024

How Upwork and Fireworks deliver faster, smarter proposals for freelancers

11/20/2024

Fireworks f1: A breakthrough in complex reasoning with Compound AI

11/13/2024

Three projects, one platform: A developer's winning streak with Fireworks AI

11/13/2024

20x faster Whisper than OpenAI - Fireworks audio transcribes 1 hour in 4 seconds

11/3/2024

FLUX.1 on Fireworks: Fast, frugal, and flexible

10/13/2024

FireAttention V3: Enabling AMD as a viable alternative for GPU inference

10/2/2024

Partnering with Meta: Bringing Llama 3.2 to Fireworks for Fine-Tuning and Inference

9/24/2024

How Enterprises are using Multimodal Models in production with Fireworks

9/24/2024

Multi-LoRA: Personalize AI at scale and deliver the best experience for each customer and use case, with 100x cost-efficiency

9/18/2024

FireOptimizer: Customizing latency and quality for your production inference workload

8/22/2024

Build Your Own Flight Recommendation System using FastAPI, SerpAPI, and Firefunction

8/21/2024

Building a RAG with Astro, FastAPI, SurrealDB and Llama 3.1

8/10/2024

How Fireworks evaluates quantization precisely and interpretably

8/1/2024

Introducing Llama 3.1 inference endpoints in partnership with Meta

7/18/2024

Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems

7/8/2024

How Cursor built Fast Apply using the Speculative Decoding API

6/23/2024

FireAttention V2: 12x faster to make Long Contexts practical for Online Inference

6/20/2024

Firefunction-v2: Function calling capability on par with GPT4o at 2.5x the speed and 10% of the cost

6/17/2024

Announcing custom models and on-demand H100s with 50%+ lower costs and latency than vLLM

6/3/2024

GPUs on-demand: Not serverless, not reserved, but some third thing

6/3/2024

Code Generation with Large Language Models - Fireworks AI Take

5/8/2024

Doomed to Code: How we Teamed Up with Fireworks AI at MistralAI Hackathon to Conquer the Shores of Hell

5/6/2024

Partnering with Meta to bring Llama 3 to Fireworks’ inference and fine-tuning

4/18/2024

Getting Started with Stability’s API Powered by Fireworks

4/17/2024

Optimizing Retrieval Augmented Generation (RAG) with MongoDB Atlas and Fireworks AI

3/21/2024

Fireworks launches fine-tuning service - Rapidly iterate on quality and scale to production through Fireworks inference

3/8/2024

Fireworks Platform Spring 2024 Updates

3/1/2024

FireFunction V1 - Fireworks’ GPT-4-level function calling model - 4x faster than GPT-4 and open weights

2/20/2024

Why do all LLMs need structured output modes?

2/20/2024

FireLLaVA: the first commercially permissive OSS LLaVA model

1/18/2024

FireAttention — Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs

1/8/2024

Fireworks Raises the Quality Bar with Function Calling Model and API Release

12/20/2023

Mixtral 8x7B on Fireworks: faster, cheaper, even before the official release

12/14/2023

LLM Inference Performance Benchmarking (Part 1)

11/3/2023

New in Fireworks: Image-to-Image and ControlNet support for SSD-1B and SDXL!

11/2/2023

Fireworks.ai Achieves SOC 2 Type II and HIPAA Compliance

10/27/2023

Accelerating Code Completion with Fireworks Fast LLM Inference

10/11/2023

Fireworks.ai Now Available on LangChain Prompt Playground

10/2/2023

Simplifying Code Infilling with Code Llama and Fireworks.ai

9/12/2023

Speed, Python: Pick Two. How CUDA Graphs Enable Fast Python Code for Deep Learning

8/29/2023

Fireworks.ai: Fast, Affordable, Customizable Gen AI Platform

8/17/2023

Multi-Query Attention is All You Need

7/12/2023