Meta believes its model could usher in a new wave of songs, just as the synthesizer once changed music.
① Meta has released AudioCraft, an open-source artificial intelligence (AI) tool that helps users create music and audio from text prompts;
② The tool combines three models or technologies, AudioGen, EnCodec and MusicGen, into one.
Cailian Press, Aug. 3. On Wednesday Eastern Time, Meta released AudioCraft, an open-source artificial intelligence (AI) tool that helps users create music and audio from text prompts.
(Source: Meta's official website)
Meta says the tool combines three models or technologies, AudioGen, EnCodec and MusicGen, into one, and can generate high-quality, realistic audio and music from text.
On its website, Meta explains that MusicGen was trained on Meta-owned and specially licensed music and generates music from text prompts, while AudioGen was trained on public sound effects and generates audio from text prompts, such as a dog barking or footsteps. Paired with an improved version of the EnCodec codec, users can generate higher-quality music more efficiently.
In early June, Meta launched MusicGen as an open-source AI model, a deep-learning language model that generates music from text prompts.
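Since AudioCraft is open source, readers can try MusicGen directly through Meta's audiocraft Python library. The following is a minimal sketch based on the library's published usage; the checkpoint name 'facebook/musicgen-small', the 8-second duration and the prompt text are illustrative choices, and the exact API may differ across audiocraft versions:

```python
# Minimal text-to-music sketch using Meta's open-source audiocraft library.
# Assumes: pip install audiocraft; checkpoint names may vary by version.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('facebook/musicgen-small')  # smallest public checkpoint
model.set_generation_params(duration=8)  # seconds of audio to generate

# One generated clip per text prompt; returns a tensor [batch, channels, samples].
wav = model.generate(['an upbeat electronic track with a driving bassline'])

for idx, one_wav in enumerate(wav):
    # audio_write adds the file extension and applies loudness normalization.
    audio_write(f'musicgen_{idx}', one_wav.cpu(), model.sample_rate, strategy='loudness')
```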
Meta's EnCodec is a deep-learning-based audio codec that can compress audio to roughly one-tenth the size of the MP3 format with no loss in audio quality.
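EnCodec is also available as a standalone Python package. A minimal compression round-trip sketch might look like the following, assuming the 24 kHz model; the 6 kbps bandwidth and the file name 'input.wav' are placeholders rather than anything specified in the article:

```python
# Sketch of an EnCodec compress/decompress round trip with the standalone
# `encodec` package (pip install encodec). 'input.wav' is a placeholder file.
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)  # target bitrate in kbps (illustrative choice)

wav, sr = torchaudio.load('input.wav')
wav = convert_audio(wav, sr, model.sample_rate, model.channels).unsqueeze(0)

with torch.no_grad():
    frames = model.encode(wav)                          # list of (codes, scale) chunks
    codes = torch.cat([c for c, _ in frames], dim=-1)   # the discrete tokens
    restored = model.decode(frames)                     # reconstruct the waveform

print(codes.shape)  # [batch, n_codebooks, time]: the compressed representation
```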
AudioGen is an AI model from Meta and a team of researchers at the Hebrew University of Jerusalem that can generate audio from input text or extend existing audio. AudioGen can distinguish different sound objects and separate them acoustically.
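AudioGen ships in the same audiocraft library. Below is a sketch of text-to-sound-effect generation, with prompts echoing the article's dog-bark and footsteps examples; the checkpoint name 'facebook/audiogen-medium' is the one Meta published and may change:

```python
# Text-to-sound-effect sketch with audiocraft's AudioGen model.
from audiocraft.models import AudioGen
from audiocraft.data.audio import audio_write

model = AudioGen.get_pretrained('facebook/audiogen-medium')
model.set_generation_params(duration=5)  # seconds per clip

# One generated sound effect per text prompt.
wav = model.generate(['a dog barking', 'footsteps on a wooden floor'])

for idx, one_wav in enumerate(wav):
    audio_write(f'audiogen_{idx}', one_wav.cpu(), model.sample_rate, strategy='loudness')
```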
Meta also shared a flowchart of how MusicGen and AudioGen work, and said it will open-source the models so that researchers and practitioners can train their own models on their own datasets, helping to advance the field of AI-generated audio and music.
Compared with other music models, the AudioCraft family generates music and audio that stay consistent over long durations, and it simplifies the overall design of audio generation models, making the tool easy to use.
Meta believes its models could usher in a new wave of songs, just as the synthesizer once changed music. "We think MusicGen can become a new kind of musical instrument, just as the synthesizer was when it first appeared."
Of course, Meta also acknowledges that creating complex, high-quality music remains difficult, which is why it chose to open-source AudioCraft: to diversify the data used to train it.
Earlier this year, Google released a music generation model of its own called MusicLM, which was opened to all users in May. Beyond these, other common music models include Riffusion, Mousai and Noise2Music.
This article is from Cailian Press (editor: Niu Zhanlin) and is published by 36Kr with authorization.
The views expressed are the author's own.