DeepSeek-V3
The authors claim that it outperforms Anthropic's Claude 3.5 Sonnet and OpenAI's GPT-4o on coding tasks.
671B total parameters (Mixture-of-Experts)
37B parameters activated per token
Trained on 14.8T high-quality tokens
Context length: 128k tokens
Generation speed: 60 tokens/second (3x faster than DeepSeek-V2)
Multi-Token Prediction (MTP): a training objective that predicts several future tokens; the MTP modules can be reused for speculative decoding to accelerate inference
Improved reasoning capabilities
Fully open-source models & papers
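The gap between 671B total and 37B activated parameters comes from Mixture-of-Experts routing: a gate scores the experts for each token, and only the top few actually run. The toy sketch below illustrates that idea under stated assumptions. The sizes, the stand-in experts, and the names `route`, `make_expert`, and `moe_forward` are hypothetical. The gating loosely follows our reading of the technical report (sigmoid affinity scores, top-k selection, gates normalized over the selected experts), but it omits shared experts, load balancing, and everything else in the real model.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def route(token: Vector, centroids: List[Vector], k: int) -> List[Tuple[int, float]]:
    """Return (expert_index, gate_weight) for the k highest-scoring experts."""
    # Affinity of the token to each expert: sigmoid of a dot product.
    scores = [sigmoid(sum(t * c for t, c in zip(token, cen))) for cen in centroids]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Normalize the gates over the selected experts only, so they sum to 1.
    total = sum(scores[i] for i in top)
    return [(i, scores[i] / total) for i in top]


def make_expert(scale: float) -> Callable[[Vector], Vector]:
    # Hypothetical stand-in for an expert feed-forward network.
    return lambda x: [scale * v for v in x]


def moe_forward(token: Vector, centroids: List[Vector],
                experts: List[Callable[[Vector], Vector]], k: int) -> Vector:
    """Gate-weighted sum of the k selected experts; the others are never evaluated."""
    out = [0.0] * len(token)
    for i, gate in route(token, centroids, k):
        y = experts[i](token)
        out = [o + gate * v for o, v in zip(out, y)]
    return out


random.seed(0)
dim, n_experts, k = 4, 8, 2  # toy sizes, not DeepSeek-V3's configuration
centroids = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_experts)]
experts = [make_expert(float(i + 1)) for i in range(n_experts)]
token = [0.5, -1.0, 0.25, 2.0]

selected = route(token, centroids, k)
print("selected experts:", selected)
print("output:", moe_forward(token, centroids, experts, k))
print(f"fraction of experts evaluated: {k / n_experts:.0%}")
print(f"DeepSeek-V3 activated/total parameters: {37 / 671:.1%}")
```

Compute per token scales with the selected experts, not the total expert count. That is why a 671B-parameter model can run with roughly the per-token cost of a much smaller dense model.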
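The MTP entry mentions speculative decoding. Below is a minimal greedy sketch of the draft-then-verify loop. The `target` and `draft` functions are hypothetical toys, whereas the report describes reusing the MTP modules as the drafter. The sketch shows the key property: the output is identical to plain greedy decoding with the target model, and fewer target passes ("rounds") are needed when drafts are accepted. Sampling-based speculative decoding uses a probabilistic acceptance rule instead, which this sketch does not cover.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target-model call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int) -> Tuple[List[int], int]:
    """Greedy draft-then-verify decoding.

    Returns the generated sequence and the number of verification rounds.
    In a real system each round is one parallel forward pass of the target
    model over all drafted positions; here the checks run one by one.
    """
    seq = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(seq) < goal:
        rounds += 1
        # 1. The cheap draft proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. Verify: keep drafted tokens while they match the target's choice.
        for tok in proposal:
            expected = target(seq)
            seq.append(expected)
            if tok != expected:
                break  # first mismatch: keep the target's token, drop the rest
        else:
            # All drafts accepted; the same pass also yields one more token.
            if len(seq) < goal:
                seq.append(target(seq))
    return seq, rounds


# Hypothetical toy models: the draft agrees with the target except at every 4th position.
def target(s: List[int]) -> int:
    return (s[-1] * 3 + 1) % 17


def draft(s: List[int]) -> int:
    return 0 if len(s) % 4 == 0 else target(s)


out, rounds = speculative_decode(target, draft, [2], n_new=10, k=3)
print(out, "in", rounds, "rounds vs 10 sequential target calls")
```

With a perfect draft and `k=3`, 10 new tokens take 3 rounds instead of 10 target calls. A rejected draft wastes part of a round but never changes the output.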
Prices
Token type | Before 2025-02-08 | After 2025-02-08 |
---|---|---|
Input (cache miss) | $0.14/M tokens | $0.27/M tokens |
Input (cache hit) | $0.014/M tokens | $0.07/M tokens |
Output | $0.28/M tokens | $1.10/M tokens |
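As a worked example of how the price table turns into a bill, here is a small sketch. The token counts and the cache-hit ratio (the share of input tokens billed at the cache-hit rate) are hypothetical inputs, and `request_cost` is an illustrative helper, not part of any DeepSeek SDK. Prices are transcribed from the table, with "before" and "after" referring to its cutoff date; no other billing rules are modeled.

```python
# USD per million tokens, transcribed from the price table above.
PRICES = {
    "before": {"input_cache_miss": 0.14, "input_cache_hit": 0.014, "output": 0.28},
    "after": {"input_cache_miss": 0.27, "input_cache_hit": 0.07, "output": 1.10},
}


def request_cost(input_tokens: int, output_tokens: int,
                 cache_hit_ratio: float, period: str) -> float:
    """Estimated USD cost of a workload, given the share of input tokens served from cache."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    p = PRICES[period]
    hit = input_tokens * cache_hit_ratio
    miss = input_tokens - hit
    total = (miss * p["input_cache_miss"]
             + hit * p["input_cache_hit"]
             + output_tokens * p["output"])
    return total / 1_000_000  # prices are per million tokens


# Hypothetical workload: 1M input tokens (half cached) and 200k output tokens.
for period in ("before", "after"):
    cost = request_cost(1_000_000, 200_000, cache_hit_ratio=0.5, period=period)
    print(f"{period}: ${cost:.4f}")
```

For this workload the estimate rises from about $0.133 to $0.39; output tokens dominate the increase because their price roughly quadruples.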
See Also
DeepSeek
Hugging Face: DeepSeek-V3-Base
Hugging Face: DeepSeek-V3
Introducing DeepSeek-V3 · 2024-12-26 · deepseek-ai
DeepSeek-V3 Technical Report · 2024-12-27 · deepseek-ai et al.
DeepSeek-V3 + Cline & Aider: This is The BEST AI Coding Setup Right Now! (Beats Cursor!) · 2024-12-27 · AICodeKing
DeepSeek-V3 + Cline: Develop a Full-stack App For FREE Without Writing ANY Code! · 2024-12-29 · WorldofAI