达摩院大模型M6突破10万亿参数,成全球最大AI预训练模型
Dharma Academy's large model M6 has exceeded 10 trillion parameters, becoming the world's largest AI pre-training model.
在人工智能前沿领域,中国AI实现突破。11月8日,阿里巴巴达摩院公布多模态大模型M6最新进展,其参数已从万亿跃迁至10万亿,规模远超谷歌、微软此前发布的万亿级模型,成为全球最大的AI预训练模型。同时,M6做到了业内极致的低碳高效,使用512 GPU在10天内即训练出具有可用水平的10万亿模型。相比去年发布的大模型GPT-3,M6实现同等参数规模,能耗仅为其1%。
China's AI has achieved breakthroughs in the frontier field of artificial intelligence. On November 8, Dharma Academy of Alibaba announced the latest development of the multi-modal large model M6, whose parameters have jumped from trillion to 10 trillion, far exceeding the trillion-level model previously released by Google and Microsoft. It has become the world's largest AI pre-training model. Besides, M6 achieves the ultimate low-carbon and high-efficiency in the industry since it is able to train 10 trillion usable models within 10 days by using 512 GPU. Compared with the large model GPT-3 released last year, M6 consumes only 1% of its energy when achieving the same parameter scales.
M6是达摩院研发的通用性人工智能大模型,拥有多模态、多任务能力,其认知和创造能力超越传统AI,尤其擅长设计、写作、问答,在电商、制造业、文学艺术、科学研究等领域有广泛应用前景。
M6 is a universal AI model developed by Dharma Academy. It has multi-modal and multi-task capabilities, whose cognitive and creative capabilities surpass traditional AI. It is especially good at designing, writing, and Q&A. It has a wide application prospect in the fields of e-commerce, manufacturing, literature and art, scientific research and so on.
与传统AI相比,大模型拥有成百上千倍“神经元”数量,且预先学习过海量知识,表现出像人类一样“举一反三”的学习能力。因此,大模型被普遍认为是未来的“基础模型”,将成下一代AI基础设施。然而,其算力成本相当高昂,训练1750亿参数语言大模型GPT-3所需能耗,相当于汽车行驶地月往返距离。
The large model has hundreds or thousands of times the number of "neurons" compared with traditional AI, and has learned a lot of knowledge in advance, showing the learning ability of "drawing inferences from one instance" like humans. Therefore, the large model is regarded as the "foundation model" of the future and may become the next generation of AI infrastructure. However, the cost of its calculation power is quite high. The energy consumption required to train the 175-billion parameter language large model GPT-3 is equivalent to the power that a car costs when traveling between the moon and the earth.
今年5月,通过专家并行策略及优化技术,达摩院M6团队将万亿模型能耗降低超八成、效率提升近11倍。10月,M6再次突破业界极限,通过更细粒度的CPU offload、共享-解除算法等创新技术,让收敛效率进一步提升7倍,这使得模型规模扩大10倍的情况下,能耗未显著增加。这一系列突破极大降低了大模型研究门槛,让一台机器训练出一个千亿模型成为可能。
In May of this year, the M6 team of Dharma Academy reduced the energy consumption of the trillion-model by over 80% and increased the efficiency by nearly 11 times through expert parallel strategies and optimization techniques. In October, M6 broke the limit of the industry once again. Through innovative technologies such as more fine-grained CPU offload and share-unload algorithm, the convergence efficiency was further improved by 7 times. This made the model scale 10 times larger without increasing the energy consumption significantly. This series of breakthroughs has greatly lowered the threshold for large model research, making it possible for one machine to train a hundred billion models.
同时,达摩院联合阿里云推出了M6服务化平台,为大模型训练及应用提供完备工具,首次让大模型实现“开箱即用”,算法人员及普通用户均可方便地使用平台。达摩院还推出了当前最大规模的中文多模态评测数据集MUGE,覆盖图文描述、文本生成图像、跨模态检索任务,填补了缺少中文多模态权威评测基准的空白。
At the same time, Dharma Academy and Alibaba Cloud launched the M6 service platform, which provides complete tools for training and application of large models, allowing large models to be used "out of the box" for the first time, and both algorithm personnels and ordinary users can use the platform easily. Dharma Academy also launched MUGE, the largest Chinese multi-modal evaluation data set at present, which covers graphic description, text-generated images, and cross-modal retrieval tasks, fills the gap of lacking in the Chinese multi-modal authoritative evaluation benchmarks.
作为国内首个商业化落地的多模态大模型,M6已在超40个场景中应用,日调用量上亿。今年,大模型首次支持双11。M6在犀牛智造为品牌设计的服饰已在淘宝上线;凭借流畅的写作能力,M6正为天猫虚拟主播创作剧本;依靠多模态理解能力,M6正在增进淘宝、支付宝等平台的搜索及内容认知精度。
As the first commercialized multi-modal large model in China, M6 has been applied in more than 40 scenarios, with a daily call of hundreds of millions. This year, the big model supports Double Eleven for the first time. The clothing designed by M6 for the brand in Rhino Smart Manufacturing has been launched on Taobao; with its smooth writing ability, M6 is creating scripts for Tmall virtual anchors; relying on multi-modal understanding capabilities, M6 is enhancing the search and content of Taobao, Alipay and other platforms Cognitive Accuracy.
M6生成的未来感汽车图
Futuristic Car Diagram Generated by M6
达摩院智能计算实验室负责人周靖人表示,“接下来,我们将深入研究大脑认知机理,致力于将M6的认知力提升至接近人类的水平,比如,通过模拟人类跨模态的知识抽取和理解方式,构建通用的人工智能算法底层框架;另一方面,不断增强M6在不同场景中的创造力,产生出色的应用价值。”
Jingren Zhou, head of the Intelligent Computing Laboratory of Dharma Academy, said, "Next, we will conduct in-depth research on the cognitive mechanism of the brain, and strive to improve the cognitive ability of M6 to the level close to humans. For example, constructing a general underlying framework of artificial intelligence algorithm by simulating the way of human cross modal knowledge extraction and understanding; on the other hand, continuing to enhance the creativity of M6 in different scenarios to generate excellent application value.”
据了解,达摩院语言大模型PLUG近期也已升级至2万亿参数,成为全球最大中文语言模型,其所属AliceMind语言模型体系同样推出了服务化平台。
It is understood that the language large model PLUG developed by Dharma Academy has been upgraded to 2 trillion parameters recently, becoming the world's largest Chinese language model. Its AliceMind language model system has also launched a service platform.