The DeepSeek shock may reshape the global race.
By Sarosh Nagar, a researcher at University College London, and
David Eaves, an associate professor of digital government and a
co-deputy director of University College London’s Institute for
Innovation and Public Purpose.
February 5, 2025
The rapid release of DeepSeek-R1—one of the newest models by Chinese
AI firm DeepSeek—sent the world into a frenzy and the Nasdaq into a
dramatic plunge. The reason is simple: DeepSeek-R1, an artificial
intelligence reasoning model that takes time to “think” before it
answers questions, is up to 50 times cheaper to run than many U.S. AI
models. Distilled versions of it can also run on the computing power of
a laptop, while other models require several of Nvidia’s most expensive
chips. But what has really turned heads is DeepSeek’s claim that it
spent only about $6 million on the final training run of its model,
much less than OpenAI’s o1 is thought to have cost. While this figure
is misleading and does not include the substantial costs of prior
research, refinement, and more, even partial cost reductions and
efficiency gains may have significant geopolitical implications.
So, why is DeepSeek-R1 so much cheaper to train, run, and use? The
answer lies in several computational efficiency improvements made to the
R1 model. First, R1 used a different machine learning architecture
called “mixture of experts,” which divides a larger AI model into
smaller subnetworks, or “experts.” This approach means that when given a
prompt, R1 only needs to activate the experts relevant to a given task,
greatly decreasing its computational costs.
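To make the idea concrete, here is a minimal sketch of top-k routing in a mixture-of-experts layer, written in Python with NumPy. All of the sizes, weights, and function names are illustrative inventions, not DeepSeek’s actual architecture; the point is simply that only k of the n experts run for any given token, so per-token compute scales with k rather than with the full model.

```python
import numpy as np

# Minimal sketch of top-k routing in a mixture-of-experts layer.
# All sizes, weights, and names are illustrative, not DeepSeek's design.

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# Each "expert" is a small feed-forward subnetwork (reduced here to
# a single weight matrix for brevity).
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.1  # the gating network

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector to its top-k experts and mix their outputs."""
    scores = x @ router                    # score every expert for this token
    chosen = np.argsort(scores)[-top_k:]   # indices of the k highest-scoring experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()               # softmax over the chosen experts only
    # Only k of the n experts actually run, so per-token compute
    # scales with k rather than with the full model size.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)  # -> (16,)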
Second, DeepSeek improved how efficiently R1’s algorithms used its
computational resources to perform various tasks. For example, R1 uses
an algorithm that DeepSeek previously introduced called Group Relative
Policy Optimization, which is less computationally intensive than other
commonly used algorithms. Beyond these areas, DeepSeek made other
computational optimizations as well. For example, it used fewer decimals
to represent some numbers in the calculations that occur during model
training—a technique called mixed precision training—and improved the
curation of data for the model, among many other improvements. Together,
these computational efficiency improvements produced a model that was
more cost-efficient than many other existing ones.
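For illustration, the sketch below shows the group-relative advantage calculation that gives Group Relative Policy Optimization its name: rewards for a group of sampled answers to the same prompt are normalized against the group’s own mean and standard deviation. The reward values are made up; the takeaway is that the baseline comes from the group itself, so no separate learned value network (critic) is needed, one reason the algorithm is less computationally intensive.

```python
import numpy as np

# Toy illustration of the group-relative advantage in GRPO.
# Reward values are invented; each sampled answer is scored against
# the other answers in its group, so no separate critic network is needed.

rewards = np.array([0.2, 0.9, 0.4, 0.7])  # rewards for 4 sampled answers to one prompt

# Normalize within the group: above-average answers get positive
# advantages, below-average answers get negative ones.
advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
print(advantages.round(2))  # -> [-1.3   1.3  -0.56  0.56]
```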
These efficiency gains are significant and offer, among many others,
four potential—though not guaranteed—implications for the global AI
market. First, these efficiency gains could potentially drive new
entrants into the AI race, including from countries that previously
lacked major AI models. Until now, the prevailing view of frontier AI
model development was that the primary way to significantly increase an
AI model’s performance was through ever larger amounts of compute—raw
processing power, essentially. Smaller players would struggle to access
this much compute, keeping many of them out of the market.
However, R1, even if its training costs are not truly $6 million,
has convinced many that training reasoning models, the top-performing
tier of AI models, can cost much less and use far fewer chips than
previously presumed. That result, combined with the fact that DeepSeek
staffs itself mainly with domestic Chinese engineering graduates, is
likely to convince other countries, firms, and innovators that they,
too, may possess the capital and resources needed to train new models.
Indeed, such perceptions are already taking root. In the wake of R1,
Perplexity CEO Aravind Srinivas called for India to develop its own
foundation model based on DeepSeek’s example. Governments such as
France’s have already been supporting homegrown firms like Mistral AI
to enhance their AI competitiveness, with France’s state investment
bank investing in one of Mistral’s previous fundraising rounds. With
the perception of a lower barrier to entry created by
DeepSeek, states’ interest in supporting new, homegrown AI firms may
only grow.
These lower barriers to entry may also add additional complexity to
the global AI race. In recent months, many assumed that AI would become
a footrace between Washington and Beijing. But now, while the United
States and China will likely remain the primary developers of the
largest models, the AI race may gain a more complex international
dimension. Both U.S. and Chinese firms have heavily courted
international partnerships with AI developers abroad, as seen with
Microsoft’s partnership with Arabic-language AI model developer G42 or
Huawei’s investments in the China-ASEAN AI Innovation Center. With more
entrants, a race to secure these partnerships might now become more
complex than ever.
Furthermore, efficiency could soon join compute as another central
focus of state industrial policies in the global AI race. Prior to R1,
governments around the world were racing to build out the compute
capacity to allow them to run and use generative AI models more freely,
believing that more compute alone was the primary way to significantly
scale AI models’ performance.
India’s Mukesh Ambani, for example, is planning to build a massive
3-gigawatt data center in Gujarat, India. However, R1’s launch has
spooked some investors into believing that much less compute and power
will be needed for AI, prompting a large selloff in AI-related stocks
across the United States, with compute producer Nvidia alone losing
roughly $600 billion in market value.
Despite these recent selloffs, compute will likely continue to be
essential for two reasons. First, there is the classic economic case of
the Jevons paradox—that when technology makes a resource more efficient
to use, the cost per use of that resource might decline, but those
efficiency gains actually make more people use the resource overall and
drive up demand.
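As a back-of-the-envelope illustration, with entirely hypothetical numbers: suppose the cost per query falls 50-fold, but cheaper access grows usage 80-fold; total spending on compute then rises even though each individual use is cheaper.

```python
# Hypothetical Jevons-paradox arithmetic; all numbers are invented.
old_cost_per_query, old_queries = 1.00, 1_000    # baseline cost and usage
new_cost_per_query = old_cost_per_query / 50     # 50x efficiency gain
new_queries = old_queries * 80                   # usage grows as queries get cheap

old_spend = old_cost_per_query * old_queries     # 1,000
new_spend = new_cost_per_query * new_queries     # 1,600
print(new_spend > old_spend)                     # True: total demand rose
```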
There is some evidence from energy markets to support the Jevons
paradox, suggesting that total compute demand might go up in any
scenario. The drop in Nvidia’s stock price was significant, but the
company’s enduring $2.9 trillion valuation suggests that the market
still sees compute as a vital part of future AI development. Second,
R1’s gains do not disprove the fact that more compute leads to
better-performing AI models; they simply demonstrate that another
mechanism, efficiency gains, can drive better performance as well.
These reasons suggest that compute demand could actually increase,
not decrease—but at the same time, improving efficiency will likely be a
priority for both firms and governments. In particular, firms in the
United States—which have been spooked by DeepSeek’s launch of R1—will
likely seek to adopt its computational efficiency improvements alongside
their massive compute buildouts, while Chinese firms may try to double
down on this existing advantage as they increase domestic compute
production to bypass U.S. export controls.
Governments in both countries may try to support firms in these
efficiency gains, especially since documents such as the Biden
administration’s 2024 National Security Memorandum made having the
world’s most performant AI systems a national priority.
R1’s lower price, especially when compared with Western models, has
the potential to greatly drive the adoption of models like it worldwide,
especially in parts of the global south. This kind of rapid AI adoption
might accelerate AI’s benefits to economic growth in these countries,
potentially increasing their long-term geopolitical heft and posing new
challenges for U.S. policymakers concerned about the global use of
Chinese AI tools.
However, as DeepSeek eyes this vast global market, many of America’s
powerhouse AI developers may also double down on building more
computationally efficient, lower-priced models to make competitive
offerings in these countries’ AI markets, suggesting that an AI race
across the global south, at the level of adoption as well as
partnerships, may occur.
Very little can be guaranteed in a competition as fast-moving as
this one. However, DeepSeek’s efficiency gains have provided a challenge
to existing assumptions of the global AI race and may change its
competitive dynamics in a way previously unpredicted. Across much of the
world, it is possible that DeepSeek’s cheaper pricing and more efficient
computations might give it a temporary advantage, which could prove
significant in the context of long-term adoption.
However, it may not be long before U.S. as well as homegrown or
regional alternatives enter the fray, triggering further competition
over who will use which platforms. With more models and price points
than ever before, only one thing is certain: the global AI race is far
from over, and it is far twistier than anyone thought.