Meta has released AudioCraft, a new open-source AI framework that lets users generate music and sounds entirely with generative AI.
It comprises three AI models, each focused on a different aspect of sound generation. MusicGen creates music from text prompts; it was trained on “20,000 hours of music owned by Meta or specifically licensed for this purpose.” AudioGen, trained on public sound effects, generates audio from written prompts, simulating sounds such as dogs barking or footsteps. Finally, an improved version of Meta’s EnCodec decoder lets users produce sounds with fewer artefacts, the distortions that creep in when audio is over-processed.
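Because the models ship as an open-source Python package, generating a clip takes only a few lines. The following is a minimal sketch using the public audiocraft library; the checkpoint name, prompt, and duration here are illustrative choices, not values from Meta's announcement:

```python
# Minimal sketch: text-to-music with MusicGen from the open-source
# audiocraft package (pip install audiocraft).
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load the smallest released checkpoint (larger ones exist).
model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=8)  # seconds of audio to generate

# One clip is generated per text description in the batch.
descriptions = ['upbeat acoustic guitar with light percussion']  # illustrative prompt
wav = model.generate(descriptions)  # tensor of shape [batch, channels, samples]

# Write each clip to disk as a .wav file with loudness normalisation.
for idx, one_wav in enumerate(wav):
    audio_write(f'clip_{idx}', one_wav.cpu(), model.sample_rate, strategy='loudness')
```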
Meta provided some AudioCraft audio samples for the press to hear. The whistling, sirens, and humming it produced sounded fairly natural, and while the guitar strings in the songs felt real, they carried a slightly artificial quality.
Meta is only the latest company to try fusing music with AI. MusicLM, a large language model developed by Google and available only to researchers, can generate minutes of audio from text prompts. Earlier this year, an “AI-generated” song featuring the voices of Drake and The Weeknd went viral before being taken down. More recently, some artists, including Grimes, have encouraged listeners to use their voices in AI-created compositions.
Musicians have experimented with electronic sound for a very long time; EDM and festivals like Ultra are nothing new. But computer-made music has typically been sampled or edited from existing recordings. AudioCraft and other generative AI systems create sound from nothing but text and a large bank of audio data.
For now, AudioCraft sounds less like the next big pop smash and more like stock music suited to ambience or lift music. But Meta is confident that its new models can usher in a new generation of songs, much as synthesisers did when they first became widely used.
“We think MusicGen can turn into a new type of instrument — just like synthesisers when they first appeared,” the company wrote on its blog. Meta also acknowledged the difficulty of building AI models that generate music: a piece of audio often comprises millions of timesteps at which the model must act, whereas text models such as Llama 2 handle only thousands of tokens. The company says that open-sourcing AudioCraft is necessary to diversify the data used to train it.
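To make that scale gap concrete, here is a back-of-the-envelope comparison. The 32 kHz sample rate matches the audio MusicGen operates on; the clip length and token count are illustrative assumptions, not figures from Meta's post:

```python
# Raw audio: number of samples the model must account for in one clip.
sample_rate_hz = 32_000          # MusicGen works with 32 kHz audio
clip_seconds = 180               # an illustrative three-minute track
audio_timesteps = sample_rate_hz * clip_seconds
print(f"audio timesteps: {audio_timesteps:,}")        # 5,760,000

# Text: even a long passage is only thousands of tokens.
text_tokens = 4_000              # illustrative length for a text model
print(f"ratio: {audio_timesteps // text_tokens:,}x")  # 1,440x
```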