MusicLM's architecture combines MuLan + AudioLM and MuLan + w2v-BERT + SoundStream, according to AI scientist Keunwoo Choi.
Built on these three models, MusicLM casts “the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes.”
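As a rough illustration of what "hierarchical sequence-to-sequence" means here, the sketch below chains two toy stages: text-derived conditioning tokens produce a coarse "semantic" token sequence, and the conditioning plus the semantic tokens then produce a finer "acoustic" token sequence. Everything in it is an assumption made for illustration: the function names, the vocabulary size of 1024, the two-to-one token ratio, and the arithmetic standing in for the learned models. It is not Google's implementation and uses none of the real MuLan, w2v-BERT, or SoundStream code.

```python
from typing import List

VOCAB = 1024  # hypothetical token vocabulary size, chosen only for this toy


def text_to_conditioning_tokens(prompt: str, n_tokens: int = 4) -> List[int]:
    """Placeholder for a text encoder that yields discrete conditioning tokens."""
    seed = sum(prompt.encode("utf-8"))
    return [(seed + i) % VOCAB for i in range(n_tokens)]


def semantic_stage(cond: List[int], n_steps: int) -> List[int]:
    """Stage 1: conditioning tokens -> coarse semantic tokens, one step at a time."""
    out: List[int] = []
    for t in range(n_steps):
        prev = out[-1] if out else 0  # autoregressive: depends on the previous token
        out.append((prev + cond[t % len(cond)] + t) % VOCAB)
    return out


def acoustic_stage(cond: List[int], semantic: List[int], per_semantic: int = 2) -> List[int]:
    """Stage 2: conditioning + semantic tokens -> finer-grained acoustic tokens."""
    out: List[int] = []
    for t in range(len(semantic) * per_semantic):
        prev = out[-1] if out else 0
        out.append((prev + semantic[t // per_semantic] + cond[t % len(cond)]) % VOCAB)
    return out


def generate_tokens(prompt: str, n_semantic_steps: int = 8) -> List[int]:
    """Run both stages in order; each stage conditions on the output of the last."""
    cond = text_to_conditioning_tokens(prompt)
    semantic = semantic_stage(cond, n_semantic_steps)
    return acoustic_stage(cond, semantic)


if __name__ == "__main__":
    print(generate_tokens("calm piano melody"))
```

In a real system, a neural audio codec decoder would turn the final acoustic tokens into a waveform; that step is left out of this toy.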
MusicLM outperforms previous systems both in audio quality and in adherence to the text description.
To support future research, the team has released MusicCaps, a dataset composed of 5.5k music-text pairs with rich text descriptions provided by human experts.
Google has made these more than 5,000 music-text pairs publicly available for people to experiment with. The company doesn’t intend to release the model to the public yet, but users can view and listen to the music it has generated here.