DeepSeek-V3

https://github.com/deepseek-ai/DeepSeek-V3

The authors claim that it outperforms Claude 3.5 Sonnet (by Anthropic) and ChatGPT 4o (by OpenAI) on coding tasks.

671B MoE parameters
37B activated parameters
Trained on 14.8T high-quality tokens
Context length: 128k tokens
60 tokens/second (3x faster than V2!)
Multi-Token Prediction (MTP): speculative decoding for inference acceleration
Improved reasoning capabilities
Fully open-source models & papers

Prices

	Before 2025-02-08	After 2025-02-08
Input (cache miss)	$0.14/M tokens	$0.27/M tokens
Input (cache hit)	$0.014/M tokens	$0.07/M tokens
Output	$0.28/M tokens	$1.10/M tokens
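
As a rough illustration of how the per-million-token prices in the table combine, here is a minimal sketch of a cost estimator. The function name `estimate_cost`, the tier labels `"before"` and `"after"`, and the `cache_hit_ratio` parameter are hypothetical names chosen for this example and are not part of any DeepSeek API; only the prices come from the table.

```python
# Prices in USD per million tokens, copied from the table above.
# The tier keys "before" and "after" are illustrative labels for the two columns.
PRICES = {
    "before": {"input_miss": 0.14, "input_hit": 0.014, "output": 0.28},
    "after": {"input_miss": 0.27, "input_hit": 0.07, "output": 1.10},
}


def estimate_cost(input_tokens, output_tokens, cache_hit_ratio=0.0, tier="after"):
    """Estimate the USD cost of a request (hypothetical helper).

    cache_hit_ratio is the assumed fraction of input tokens served from cache.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    price = PRICES[tier]
    hit_tokens = input_tokens * cache_hit_ratio
    miss_tokens = input_tokens - hit_tokens
    total = (
        miss_tokens * price["input_miss"]
        + hit_tokens * price["input_hit"]
        + output_tokens * price["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # One million input tokens (no cache hits) plus one million output tokens.
    print(f"before: ${estimate_cost(1_000_000, 1_000_000, tier='before'):.2f}")
    print(f"after:  ${estimate_cost(1_000_000, 1_000_000, tier='after'):.2f}")
```

The cache hit ratio matters most for workloads that resend a long shared prefix, since cached input is billed at a fraction of the cache-miss rate in both columns.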

See Also

DeepSeek
Hugging Face: DeepSeek-V3-Base
Hugging Face: DeepSeek-V3

Introducing DeepSeek-V3 · 2024-12-26 · deepseek-ai
DeepSeek-V3 Technical Report · 2024-12-27 · deepseek-ai et al.
DeepSeek-V3 + Cline & Aider: This is The BEST AI Coding Setup Right Now! (Beats Cursor!) · 2024-12-27 · AICodeKing
DeepSeek-V3 + Cline: Develop a Full-stack App For FREE Without Writing ANY Code! · 2024-12-29 · WorldofAI