By Siddharth Jindal
Chinese AI startup DeepSeek has reported a theoretical daily cost-profit margin of 545% for its inference services, despite limited monetisation and a discounted pricing structure. The company shared these details in a recent GitHub post outlining the operational costs and revenue potential of its DeepSeek-V3 and R1 models.
Based on DeepSeek-R1’s pricing model—charging $0.14 per million input tokens for cache hits, $0.55 per million for cache misses, and $2.19 per million output tokens—the theoretical revenue generated daily is $562,027.
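For readers who want to check the arithmetic, here is a minimal Python sketch of that revenue figure. The token volumes (roughly 608 billion input tokens at about a 56% cache-hit rate, and 168 billion output tokens) are the rounded figures from DeepSeek's GitHub post, so the result lands near, rather than exactly on, the reported $562,027.

```python
# Hedged sketch: recompute the theoretical daily revenue from the published
# R1 prices. Token volumes are rounded figures from DeepSeek's GitHub post.

PRICE_PER_M = {"cache_hit": 0.14, "cache_miss": 0.55, "output": 2.19}  # USD per million tokens

def daily_revenue(input_tokens: float, cache_hit_rate: float, output_tokens: float) -> float:
    hits = input_tokens * cache_hit_rate
    misses = input_tokens - hits
    return (hits * PRICE_PER_M["cache_hit"]
            + misses * PRICE_PER_M["cache_miss"]
            + output_tokens * PRICE_PER_M["output"]) / 1e6

# Rounded token counts from the post (billions of tokens)
revenue = daily_revenue(608e9, 0.563, 168e9)
print(f"Theoretical daily revenue: ${revenue:,.0f}")  # ≈ $562,000
```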
However, the company acknowledged that actual earnings were significantly lower due to lower pricing for DeepSeek-V3, free access to web and app services, and automatic nighttime discounts. “Our pricing strategy prioritises accessibility and long-term adoption over immediate revenue maximisation,” DeepSeek said.
According to the company, DeepSeek's inference services run on NVIDIA H800 GPUs, with matrix multiplications and dispatch transmissions using the FP8 format, while core MLA computations and combine transmissions operate in BF16. The company scales its GPU usage based on demand, deploying all nodes during peak hours and reducing them at night to free resources for research and training.
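That day-night scaling amounts to a simple scheduling rule. The sketch below is purely illustrative: the overnight window, the night-time cap, and the fleet size in the example are assumptions, not DeepSeek's actual scheduler; only the split between inference and research/training duty is taken from the post.

```python
# Illustrative sketch of demand-based scheduling: run the full fleet at peak,
# shrink at night and hand the freed nodes to research/training. The window
# and scaling rule here are assumptions, not DeepSeek's actual scheduler.

def split_fleet(total_nodes: int, hour: int, demand_fraction: float) -> tuple[int, int]:
    """Return (inference_nodes, research_nodes) for a given hour (0-23).

    `demand_fraction` is a hypothetical 0..1 estimate of current load.
    """
    if 1 <= hour < 8:                                 # assumed overnight low-traffic window
        demand_fraction = min(demand_fraction, 0.4)   # assumed night-time cap
    inference = max(1, round(total_nodes * demand_fraction))
    return inference, total_nodes - inference

# Example: a hypothetical 300-node fleet at 3 a.m. with 30% load
print(split_fleet(300, 3, 0.3))  # -> (90, 210)
```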
The GitHub post revealed that over the 24-hour period from 12:00 PM on February 27, 2025, to 12:00 PM on February 28, 2025, DeepSeek recorded peak node occupancy at 278, with an average of 226.75 nodes in operation. With each node containing eight H800 GPUs and an estimated leasing cost of $2 per GPU per hour, the total daily expenditure reached $87,072.
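To sanity-check the headline numbers: the cost follows directly from the averages DeepSeek gives, and the 545% figure is a cost-profit margin, that is, profit relative to cost rather than to revenue. A minimal sketch:

```python
# Quick check of the reported cost and margin figures.

AVG_NODES = 226.75
GPUS_PER_NODE = 8
LEASE_USD_PER_GPU_HOUR = 2.0
HOURS = 24

daily_cost = AVG_NODES * GPUS_PER_NODE * LEASE_USD_PER_GPU_HOUR * HOURS
print(f"Daily cost: ${daily_cost:,.0f}")  # $87,072

theoretical_revenue = 562_027  # from DeepSeek's post
margin = (theoretical_revenue - daily_cost) / daily_cost  # profit relative to cost
print(f"Cost-profit margin: {margin:.0%}")  # ≈ 545%
```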
This disclosure could ripple through the US stock market. The launch of DeepSeek's latest model, R1, which the company claims was trained on a budget of about $6 million, had already triggered a sharp market reaction: NVIDIA's stock tumbled 17%, wiping out nearly $600 billion in market value, on concerns that the model's efficiency would dent demand for high-end compute.
However, NVIDIA chief Jensen Huang, during the recent earnings call, said the company’s inference demand is accelerating, fuelled by test-time scaling and new reasoning models. “Models like OpenAI’s, Grok 3, and DeepSeek R1 are reasoning models that apply inference-time scaling. Reasoning models can consume 100 times more compute,” he said.
“DeepSeek-R1 has ignited global enthusiasm. It’s an excellent innovation. But even more importantly, it has open-sourced a world-class reasoning AI model,” Huang said.
According to a recent report, DeepSeek plans to release its next reasoning model, DeepSeek R2, 'as early as possible.' The company initially planned a release in early May but is now considering an earlier timeline. The model is said to offer 'better coding' and the ability to reason in languages beyond English.