Faster, more efficient DeepSeek on the Fireworks AI Developer Cloud

By |3/19/2025

At Fireworks, our mission is to empower developers with the premier toolchain using open models, delivering transparency, steerability, control, privacy, low latency, and cost efficiency.

As agentic products continue gaining widespread adoption, the speed and efficiency of advanced AI models like DeepSeek R1 have become critical factors for product differentiation. Staying ahead, we continuously push the boundaries of performance and cost-efficiency through innovations like our specialized version of FireAttention and a distributed inference engine tailored specifically for DeepSeek’s unique MLA, MTP, and wide MoE architecture.

Introducing New Deployment Options

Today, we're thrilled to announce exciting new options for deploying DeepSeek on Hopper GPUs, enhancing both speed and throughput. Expect even more advancements as we soon bring Blackwell GPUs into production.

Explore Our Optimized DeepSeek Offerings:

1. Ultra-Fast DeepSeek R1

Speeds reaching up to 130 tokens per second at low batch sizes on Fireworks Enterprise
Ideal for real-time, low-latency interactive experiences at scale

2. Fast DeepSeek R1

Speeds up to 90 tokens per second on Fireworks Serverless
Perfect balance between speed and cost-efficiency for real-time interactive experiences
Note: Speeds may vary with load on shared Serverless deployments

3. Basic DeepSeek R1

Optimized for throughput and cost-effectiveness
Matches standard DeepSeek pricing ($0.55/$2.19 per million tokens)
Ideal for cost-sensitive, real-time use cases without compromising model quality

Screenshot_2025-03-19_at_9.33.44_AM.png

Comprehensive Developer Platform

These enhancements build on our extensive developer platform capabilities:

👉 Secure Hosting: DeepSeek hosted securely in the US and EU, with zero data retention by default.

👉 Model Quality & Customization:

Fine-tuning DeepSeek R1 and V3 through quantization-aware tuning
Controllable reasoning effort: shorter, optimized Chain-of-Thought (CoT) with reasoning_effort = low
Additional specialized models, such as Perplexity R1-1776, offering heightened accuracy for deep research, alongside numerous tuned DeepSeek models already in production.

👉 Agentic Development Capabilities:

Multi-modal workflow: vision capabilities integrated into DeepSeek v3 and R1
Seamless agentic tool use: function-calling support on DeepSeek v3, facilitating easy integrations with external tools and APIs
Constrained generation capabilities: JSON mode and Grammar mode support on DeepSeek v3 and R1

Get Started Today!

Experience the power, speed, and efficiency of the enhanced DeepSeek offerings on the Fireworks AI Developer Cloud. Accelerate your AI development with unmatched control and performance.

👉 Sign up now to explore Fireworks AI Developer Cloud.