Scientists at Facebook’s parent company Meta have created MusicGen, an AI music generator. The language model can take text prompts such as “up-beat acoustic folk” or “Pop dance track with catchy melodies” and turn them into brand-new 12-second music clips, according to Meta’s Fundamental AI Research (FAIR) team.
Features of MusicGen
The model, which was made available as open source over the weekend, can also create original music from melodic hints. According to TechCrunch, Meta trained MusicGen on 20,000 hours of licensed music, including 10,000 “high-quality” licensed recordings and 390,000 instrument-only tracks from ShutterStock and Pond5. Meta’s entry into the field of AI music generation marks a turning point in this rapidly evolving industry, as the company is now the second digital behemoth after Google to create its own language model that can produce original music from text inputs.
In January, Google debuted MusicLM, an ‘experimental AI’ tool that can create high-fidelity music from text instructions and humming, and this month it became available to the general public. According to Google, users enter a prompt such as “soulful jazz for a dinner party,” and the MusicLM model then produces two renditions of the desired song. Users can vote for the one they like best, which, according to Google, will “help improve the AI model.”
Emphasis on Western-style Music
According to The Decoder, MusicGen “performs better on both subjective and objective metrics that test how well the music matches the lyrics and how plausible the composition is” when compared to other music models such as Riffusion, Mousai, MusicLM, and Noise2Music. Comparisons of the music produced by the various models are displayed below.
Over the weekend, Gabriel Synnaeve, a Facebook research scientist, stated that Meta has released “code (MIT) and pretrained models (CC-BY non-commercial) publicly for open research, reproducibility, and for the broader music community to investigate this technology.”
A paper by Meta’s researchers describes the effort that went into training the model and discusses the ethical issues surrounding the creation of generative AI models. The team “first made sure that all the data we trained on was covered by legal agreements with the right holders, in particular through an agreement with ShutterStock,” according to the paper.
The paper goes on to note that the training dataset, which contains a larger proportion of Western-style music, may lack diversity. The researchers add that the simplifications they employ in this study, such as using a single-stage language model with fewer auto-regressive steps, could help broaden the model’s applications to new datasets.