DeepSeek-V3

The authors claim that it outperforms Claude 3.5 Sonnet (by Anthropic) and ChatGPT 4o (by OpenAI) on coding tasks.

  • 671B MoE parameters

  • 37B activated parameters

  • Trained on 14.8T high-quality tokens

  • Context length: 128k tokens

  • 60 tokens/second (3x faster than V2!)

  • Multi-Token Prediction (MTP): a training objective whose extra prediction modules can also be used for speculative decoding to accelerate inference

  • Improved reasoning capabilities

  • Fully open-source models & papers
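The parameter and speed figures above lend themselves to some quick arithmetic. The following is a minimal sketch that uses only the numbers quoted in the list; the function names are hypothetical, and the time estimate assumes the quoted 60 tokens/second holds steadily, which real deployments may not match.

```python
# Figures copied from the list above; the helper names are illustrative only.
TOTAL_PARAMS = 671e9      # total MoE parameters
ACTIVE_PARAMS = 37e9      # parameters activated per token
TOKENS_PER_SECOND = 60    # quoted generation speed


def active_fraction(total=TOTAL_PARAMS, active=ACTIVE_PARAMS):
    """Share of the model's parameters that are used for a single token."""
    return active / total


def generation_seconds(n_tokens, tokens_per_second=TOKENS_PER_SECOND):
    """Rough time to generate n_tokens at a constant decoding speed."""
    return n_tokens / tokens_per_second


if __name__ == "__main__":
    print(f"Active share per token: {active_fraction():.1%}")
    print(f"Time for 1,200 output tokens: {generation_seconds(1200):.0f} s")
```

Under these assumptions, only about 5.5% of the parameters are active for any one token, which is why a 671B-parameter MoE model can be served at a per-token compute cost closer to that of a much smaller dense model.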

Prices

Token type            Before 2025-02-08    After 2025-02-08
Input (cache miss)    $0.14/M tokens       $0.27/M tokens
Input (cache hit)     $0.014/M tokens      $0.07/M tokens
Output                $0.28/M tokens       $1.10/M tokens
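To see what the price change means for a single request, the rates in the table above can be plugged into a small calculator. This is only a sketch: the `PRICES` layout, the `"before"`/`"after"` keys, and the `request_cost` function are hypothetical names chosen for illustration, and the token counts in the example are made up.

```python
from decimal import Decimal

# USD per million tokens, copied from the table above.
PRICES = {
    "before": {
        "input_miss": Decimal("0.14"),
        "input_hit": Decimal("0.014"),
        "output": Decimal("0.28"),
    },
    "after": {
        "input_miss": Decimal("0.27"),
        "input_hit": Decimal("0.07"),
        "output": Decimal("1.10"),
    },
}

PER_MILLION = Decimal(1_000_000)


def request_cost(period, miss_tokens, hit_tokens, output_tokens):
    """Cost in USD of one request under the 'before' or 'after' price list."""
    rates = PRICES[period]
    total = (
        miss_tokens * rates["input_miss"]
        + hit_tokens * rates["input_hit"]
        + output_tokens * rates["output"]
    )
    return total / PER_MILLION


if __name__ == "__main__":
    # Hypothetical workload: 100k uncached input, 400k cached input, 50k output.
    for period in ("before", "after"):
        cost = request_cost(period, 100_000, 400_000, 50_000)
        print(f"{period}: ${cost:.4f}")
```

`Decimal` is used instead of floats so that small per-token amounts add up without rounding noise. For this example workload the cost rises from about $0.034 to $0.11, with the output rate contributing the largest share of the increase.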

See Also

DeepSeek
Hugging Face: DeepSeek-V3-Base
Hugging Face: DeepSeek-V3

Introducing DeepSeek-V3 · 2024-12-26 · deepseek-ai
DeepSeek-V3 Technical Report · 2024-12-27 · deepseek-ai et al.
DeepSeek-V3 + Cline & Aider: This is The BEST AI Coding Setup Right Now! (Beats Cursor!) · 2024-12-27 · AICodeKing
DeepSeek-V3 + Cline: Develop a Full-stack App For FREE Without Writing ANY Code! · 2024-12-29 · WorldofAI