Hey friends,
It’s been impossible to escape the avalanche of headlines about DeepSeek, but I noticed there was a lot of noise and misleading information (like the claim that DeepSeek R1 only cost $5.6 million to train). So I spent last night reading all these different sources to stitch together a clear explanation of why this is important news.
To make it more digestible, and encouraged by all the positive feedback I got from you last week, I made another visual explainer.
Watch on: TikTok | Instagram | YouTube
DeepSeek shocked the world for 4 reasons:
Training Cost: Different outlets and content creators reported that DeepSeek R1 cost $5.6 million to train. This is incorrect. That figure refers to the base model, DeepSeek V3, and covers only its final training run, excluding the experiments leading up to that result. The cost to train R1 itself was likely higher, and we don’t know how much. We don’t know how much it cost OpenAI to train its o1 model either. The only estimate we have is that GPT-4 cost more than $100 million to train.
Training Method: DeepSeek used Reinforcement Learning (letting the model learn and improve based on rewards) as opposed to supervised fine-tuning (feeding specific examples it should learn from) like OpenAI did with o1. Here’s a visual explainer of different machine learning methods if you need a refresher.
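To make the contrast concrete, here’s a toy sketch of the two learning styles. This is purely illustrative (real RL training of language models, like DeepSeek’s, is vastly more involved): in supervised fine-tuning the model is handed the correct answer directly, while in reinforcement learning it only receives a reward signal and must find the right answer by trial and error.

```python
import random

# Toy task: learn which answer goes with the prompt "2+2".
# (Illustrative only -- not how real LLM training works.)

# Supervised fine-tuning: we feed the model labeled examples,
# and it learns the mapping directly.
sft_data = {"2+2": "4"}
sft_model = dict(sft_data)

# Reinforcement learning: no labels, only a reward signal
# (1 if the answer is correct, 0 otherwise).
def reward(prompt, answer):
    return 1 if (prompt, answer) == ("2+2", "4") else 0

random.seed(0)
candidates = ["3", "4", "5"]
scores = {a: 0.0 for a in candidates}
for _ in range(100):
    a = random.choice(candidates)   # try an answer
    scores[a] += reward("2+2", a)   # accumulate reward for it

# The model keeps whichever answer earned the most reward.
rl_model = {"2+2": max(scores, key=scores.get)}
```

Both toy "models" end up with the same answer, but only the supervised one ever saw it written down; the RL one discovered it from rewards alone.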
Usage Cost: The biggest shock might’ve been that DeepSeek R1 is open-source and free for consumers to use. It’s also 97% cheaper for developers and businesses who want to use their API within their own applications.
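For developers, DeepSeek advertises its API as OpenAI-compatible, so a request body looks like a standard Chat Completions call. Here’s a minimal sketch of what that request could look like; the endpoint URL and model name below are assumptions on my part, so check DeepSeek’s own docs before relying on them.

```python
import json

# Assumed endpoint -- verify against DeepSeek's API documentation.
API_URL = "https://api.deepseek.com/chat/completions"

# OpenAI-style Chat Completions request body; "deepseek-reasoner"
# is the assumed model name for R1.
payload = {
    "model": "deepseek-reasoner",
    "messages": [
        {"role": "user", "content": "Explain why reasoning models matter."}
    ],
}

body = json.dumps(payload)
# To actually send it, POST `body` to API_URL with an
# "Authorization: Bearer <your-api-key>" header
# (e.g. via urllib.request or the requests library).
```

Because the format mirrors OpenAI’s, existing tooling can often be pointed at DeepSeek just by swapping the base URL and API key.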
Hardware: Due to the US’s export controls, NVIDIA can only sell H800 GPUs to China, a modified (and weaker) version of the H100s that all the American companies use. Training a reasoning model on less capable hardware puts a big red question mark over the need for massive hardware investments, and it sent NVIDIA’s stock down more than 15%.
All these factors combined have put into question America’s perceived dominance in the AI space. China was able to match OpenAI’s o1 performance across benchmarks with weaker hardware and less data, all while making the model open-source.
I hope this helps clarify the facts of the story and equips you with the right information.
Credits & Acknowledgments:
The best resource I read, by far, was Ben Thompson’s DeepSeek FAQ, which goes into way more detail if you’re curious. Read it here.
If you know someone who would enjoy this type of content, tell them to subscribe to Year 2049 at this link (year2049.substack.com) or share this post with them.