Back to companies

Fireworks AI

🇺🇸

Fireworks AI, Inc.

GrowthRedwood City, California, United Statesfireworks.ai/

Total funding$327M

ConfidenceHigh 30Medium 38Low 1

Company info

Full nameFireworks AI, Inc.

Founded2022年

HeadquartersRedwood City, California, United States

Websitefireworks.ai/

Region🇺🇸 United States

StageGrowth

Employees11-50（LinkedIn band

Report date2026-03-10

Overview

面向企业生产环境的 AI inference cloud，聚焦开源模型的高性能推理、微调与可控部署。

Industry tags

AI InfrastructureGenerative AIEnterprise AI Platform

Key people

NameRole

Lin QiaoCo-Founder, CEO

Dmytro DzhulgakovCo-Founder, CTO

Benny ChenCo-Founder

Chenyu ZhaoCo-Founder

Dmytro IvchenkoCo-Founder

James ReedCo-Founder

Pawel GarbackiCo-Founder

Core products and services

Inference Platform

Fireworks Inference Cloud

提供 serverless、on-demand、enterprise 多种部署形态的开源模型推理云。

High confidence · 2 sources · 2+ independent authoritative sources

API

Serverless Inference API

OpenAI-compatible API，覆盖 text/image/audio/multimodal 推理调用。

High confidence · 2 sources · 2+ independent authoritative sources

Infrastructure Service

Dedicated / On-demand GPU Deployments

提供 dedicated 与 on-demand GPU 部署，满足稳定 SLA 与成本弹性。

High confidence · 2 sources · 2+ independent authoritative sources

Model Customization

Fine-tuning + LoRA

支持 LoRA 与模型训练/服务一体化定制流程。

High confidence · 2 sources · 2+ independent authoritative sources

Inference Optimization

FireOptimizer / FireAttention

通过 speculative decoding 与 long-context kernel 优化延迟与吞吐。

High confidence · 2 sources · 2+ independent authoritative sources

Function-calling Model

Firefunction-v2

面向 compound AI 与 agent 工作流的 open-weight function-calling 模型。

High confidence · 2 sources · 2+ independent authoritative sources

Funding history

Total funding 超过$327M（截至2025年10月官方口径）

Date	Round	Amount	Valuation	Investors	Confidence
2022年	Seed（reported）	$25M	未披露	Benchmark, Sequoia Capital, Angel investors	Medium confidence · 1 sources · Single authoritative source
2024年03月	Series A（reported）	$25M	未披露	Benchmark, Sequoia Capital, Angel investors	Medium confidence · 1 sources · Single authoritative source
2024年07月	Series B	$52M	$552M	Sequoia Capital, NVIDIA, AMD, MongoDB Ventures	High confidence · 2 sources · 2+ independent authoritative sources
2025年10月	Series C	$250M	$4B	Lightspeed, Index Ventures, Evantic, Sequoia Capital	High confidence · 2 sources · 2+ independent authoritative sources
2025年10月	Series C（primary + secondary structure disclosed）	Included in $250M total（component amounts undisclosed）	$4B	官方披露本轮含 primary 与 secondary 结构, 云启资本	High confidence · 2 sources · 2+ independent authoritative sources
2025年10月	Strategic Investment	Included in Series C	$4B	NVIDIA, AMD, MongoDB, Databricks	High confidence · 2 sources · 2+ independent authoritative sources

Product release timeline

2025年10月Medium confidence · 1 sources · Single authoritative source

Agent workflow integration AWS AgentCore integration

发布与 AWS AgentCore 的生产级 agent 集成路径。

2025年10月Medium confidence · 1 sources · Single authoritative source

RAG APIs Embeddings + Reranking endpoints

发布基于 Qwen3 的 embeddings 与 reranking API。

2025年08月Medium confidence · 1 sources · Single authoritative source

DeepSeek endpoints DeepSeek V3.1

上线 DeepSeek V3.1 端点与混合推理模式。

2025年06月Medium confidence · 1 sources · Single authoritative source

Fine-tuning Reinforcement Fine-Tuning (Beta)

发布 RFT Beta，面向专家模型训练。

2025年04月Medium confidence · 1 sources · Single authoritative source

Llama 4 endpoint Llama 4 Maverick day-1

上线 Llama 4 Maverick day-1 推理服务。

2025年03月Medium confidence · 1 sources · Single authoritative source

NVIDIA NIM support NIM deployments on Fireworks

支持 NVIDIA NIM 微服务在 Fireworks 部署。

2024年08月Medium confidence · 1 sources · Single authoritative source

FireOptimizer Adaptive speculative execution

发布推理优化能力，用于延迟/质量平衡。

2024年07月Medium confidence · 1 sources · Single authoritative source

Llama 3.1 endpoints 8B/70B/405B

作为 launch partner 上线 Llama 3.1（含405B）端点。

2024年06月Medium confidence · 1 sources · Single authoritative source

Custom model + deployment Custom model import + on-demand H100

支持自定义模型导入与 H100 on-demand 部署。

2024年06月Medium confidence · 1 sources · Single authoritative source

GPUs on-demand Commercial rollout

on-demand GPU 产品层面商业化扩展。

2024年06月High confidence · 2 sources · 2+ independent authoritative sources

Firefunction v2

发布 Firefunction-v2，强化 function-calling 与 compound AI 工作流。

2024年06月High confidence · 2 sources · 2+ independent authoritative sources

FireAttention V2

发布 FireAttention V2，提升 long-context 推理效率。

2024年04月High confidence · 2 sources · 2+ independent authoritative sources

Llama 3 endpoints Llama 3 8B/70B

上线 Llama 3 推理端点并支持配套微调路径。

2024年03月Medium confidence · 1 sources · Single authoritative source

RAG integration stack MongoDB Atlas + Fireworks workflow

发布 MongoDB Atlas 与 Fireworks API 的 RAG 集成方案。

Key events

2026

被 NVIDIA 列为 Blackwell 生态中的领先 inference provider。

2025

签署 AWS Strategic Collaboration Agreement，并获得 AWS Generative AI Competency。

与 AWS GenAI Innovation Center Startup Team 启动联合加速计划。

与 AWS AgentCore 完成生产环境集成。

完成 Series C（含 primary + secondary 结构），公司仍未上市。

Series C 后估值提升至 $4B。

与 Sentient 的高规模部署合作进入公开案例。

2024

完成 Series B 融资，企业仍为 private company。

融资后估值达到 $552M。

宣布与 MongoDB 在企业 RAG 架构上的合作。

媒体报道其开源 API 市场定位，并披露早期约 $25M 融资背景。

2022

公司由前 PyTorch/Meta 背景团队创立，定位于 inference infrastructure。

Competitive landscape

Together AI (Serverless/Dedicated Inference + Fine-Tuning)

— 以 usage-priced token API 与 dedicated inference 组合做 GTM，强调开源模型目录和模型定制能力；在开源模型托管与企业推理场景与 Fireworks AI 正面重叠。 [S1](https://www.together.ai/pricing) [S2](https://sacra.com/c/fireworks-ai/)

Baseten (Model APIs + Dedicated Deployments)

— 采用按量 + enterprise 计划，主打部署运维和模型 API 封装，面向生产推理团队；与 Fireworks 在生产级 inference 工作负载争夺同类客户。 [S1](https://www.baseten.co/pricing/) [S2](https://sacra.com/c/fireworks-ai/)

GroqCloud (Token-as-a-Service Inference)

— 依托自研芯片与线性 token 定价、cache economics，强调极致速度与可预测成本；在低延迟和 cost-per-token 叙事上直接对标 Fireworks。 [S1](https://groq.com/pricing) [S2](https://sacra.com/c/fireworks-ai/)

DeepInfra (Inference API Infrastructure)

— 采用按 token/执行时间计费，面向成本敏感型开源模型推理需求；并与 Fireworks 同属 NVIDIA Blackwell 生态对比语境中的 provider 阵营。 [S1](https://deepinfra.com/pricing) [S2](https://blogs.nvidia.com/blog/inference-open-source-models-blackwell-reduce-cost-per-token/)

Anyscale (Ray Platform (pay-as-you-go))

— 以 Ray-native 基础设施和按量计费切入，强调更深层运行时控制与可扩展 serving；在需要比 API-only 更深可编排能力的企业场景中与 Fireworks 形成间接竞争。 [S1](https://www.anyscale.com/pricing) [S2](https://northflank.com/blog/7-best-fireworks-ai-alternatives-for-inference)

Modal (Serverless Compute for AI Workloads)

— 以 serverless autoscaling GPU/CPU 和按使用计费服务高灵活性工作负载；在自定义推理与工程可编程性场景可替代专用 inference 平台，从而与 Fireworks 竞争预算与技术选型。 [S1](https://modal.com/pricing) [S2](https://northflank.com/blog/7-best-fireworks-ai-alternatives-for-inference)

Growth metrics

Daily tokens processed10T+ tokens/day—2025年10月

API reliability99.99% uptime—2024年07月

Throughput per GPU (Sentient workload)+25% to +50%—2025年07月

Sentient launch usage1.5M responses in 5 days; 90K unique users—2025年07月

Sentient weekly query volume5.6M queries/week; ~13 QPS—2025年07月

Company customers10,000+ companies10x vs Series B period（official claim）2025年10月

Annualized revenue run-rate$280M+—2025年10月

Competitive narrative

Differentiators

创始团队具备 PyTorch 核心背景，并将 FireAttention/FireOptimizer 等内核级优化作为性能护城河。 [S1](https://fireworks.ai/blog/fireattention-v2-long-context-inference) [S2](https://fireworks.ai/blog/fireoptimizer)

产品形态覆盖 serverless、dedicated、on-demand 与 enterprise 交付，企业级部署与可控性更强。 [S1](https://fireworks.ai/) [S2](https://fireworks.ai/blog/why-gpus-on-demand)

同时提供 inference + fine-tuning + evaluation/workflow，不局限于单一模型托管 API。 [S1](https://fireworks.ai/blog/reinforcement-fine-tuning) [S2](https://fireworks.ai/blog/firefunction-v2-launch-post)

估值从 2024年07月 $552M 提升到 2025年10月 $4B，资本与生态背书明显增强。 [S1](https://www.bloomberg.com/news/articles/2024-07-11/sequoia-nvidia-back-startup-fireworks-ai-at-552-million-valuation) [S2](https://finance.yahoo.com/news/fireworks-ai-raises-250m-series-113000042.html)

Challenges and risks

专业 inference provider 与 hyperscaler 邻近方案竞争激烈，价格战和性能战并行。 [S1](https://sacra.com/c/fireworks-ai/) [S2](https://blogs.nvidia.com/blog/inference-open-source-models-blackwell-reduce-cost-per-token/)

开源模型同质化可能压缩毛利，要求其持续保持 runtime efficiency 和产品化领先。 [S1](https://sacra.com/c/fireworks-ai/) [S2](https://fireworks.ai/blog/fireoptimizer)

对 GPU 供给周期与上游硬件生态依赖较高，成本优势受供应链与代际切换影响。 [S1](https://fireworks.ai/blog/series-c) [S2](https://blogs.nvidia.com/blog/inference-open-source-models-blackwell-reduce-cost-per-token/)

作为 private company，公开披露深度有限，独立可比的财务与性能验证仍有盲区。 [S1](https://www.forbes.com/companies/fireworks-ai/) [S2](https://fireworks.ai/blog/series-c)

Market position

Fireworks AI 当前处于“hyperscaler 托管能力”与“开发者轻量模型托管平台”之间的中间层，主打高性能开源模型 inference cloud 和企业生产可控部署。其商业叙事核心是“速度、成本、可控性”三者并举，并通过 dedicated/on-demand 与 kernel 优化能力提高企业迁移门槛。从融资与运营披露看，公司在 2024-2025 年已由早期扩张进入 late-stage scale-up 阶段。中短期内，其市场位置将取决于两点：一是能否在与 Together/Baseten/Replicate 的平台广度竞争中维持性能优势，二是能否在与 Groq/DeepInfra 的 token economics 竞争中持续兑现单位成本与吞吐优势。

Sources

fireworks.ai — fireworks.aiHigh confidence · 2+ independent authoritative sources forbes.com — forbes.comHigh confidence · 2+ independent authoritative sources finance.yahoo.com — finance.yahoo.comHigh confidence · 2+ independent authoritative sources linkedin.com — linkedin.comHigh confidence · 2+ independent authoritative sources S1 — fireworks.aiHigh confidence · 2+ independent authoritative sources