DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI.

Chinese AI startup DeepSeek has quietly released a new large language model that’s already sending ripples through the artificial intelligence industry — not just for its capabilities, but for how it’s being deployed. The 641-gigabyte model, dubbed DeepSeek-V3-0324, appeared on AI repository Hugging Face today with virtually no announcement, continuing the company’s pattern of low-key but impactful releases.
What makes this launch particularly notable is the model’s MIT license — making it freely available for commercial use — and early reports that it can run directly on consumer-grade hardware, specifically Apple’s Mac Studio with M3 Ultra chip.
“The new DeepSeek-V3-0324 in 4-bit runs at > 20 tokens/second on a 512GB M3 Ultra with mlx-lm!” wrote AI researcher Awni Hannun on social media. While the $9,499 Mac Studio might stretch the definition of “consumer hardware,” the ability to run such a massive model locally is a major departure from the data center requirements typically associated with state-of-the-art AI.
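A rough calculation shows why the 4-bit detail matters. The sketch below is a back-of-the-envelope estimate, not a measurement. It uses the 685-billion-parameter count and 641-gigabyte download size reported for the model, and assumes decimal gigabytes. It counts weight storage only, ignoring the key-value cache, activations, and the per-group scale data that real 4-bit formats add, so the true footprint is somewhat higher. The helper name `weight_gb` is illustrative.

```python
# Back-of-the-envelope weight-memory estimate (illustrative assumptions only).
PARAMS = 685e9   # parameter count reported for DeepSeek-V3-0324
RAM_GB = 512     # unified memory on the Mac Studio in Hannun's post
DISK_GB = 641    # size of the published checkpoint


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


four_bit_gb = weight_gb(PARAMS, 4)
eight_bit_gb = weight_gb(PARAMS, 8)

# Average bits per parameter implied by the published checkpoint size.
bits_on_disk = DISK_GB * 1e9 * 8 / PARAMS

print(f"4-bit weights: {four_bit_gb:.1f} GB, fits in {RAM_GB} GB: {four_bit_gb < RAM_GB}")
print(f"8-bit weights: {eight_bit_gb:.1f} GB, fits in {RAM_GB} GB: {eight_bit_gb < RAM_GB}")
print(f"Published checkpoint: about {bits_on_disk:.2f} bits per parameter")
```

Under these assumptions the 4-bit weights come to roughly 342.5 GB, which leaves headroom on a 512GB machine, while an 8-bit copy at about 685 GB would not fit.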
The 685-billion-parameter model arrived with no accompanying whitepaper, blog post, or marketing push — just an empty README file and the model weights themselves. This approach contrasts sharply with the carefully orchestrated product launches typical of Western AI companies, where months of hype often precede actual releases.
Early testers report significant improvements over the previous version. AI researcher Xeophon proclaimed in a post on X.com: “Tested the new DeepSeek V3 on my internal bench and it has a huge jump in all metrics on all tests. It is now the best non-reasoning model, dethroning Sonnet 3.5.”
Source: VentureBeat