Looking ahead: First they came for our art, then for our prose and jumbled essays.

Now they are coming for music, with a "new" machine learning system that adapts image generation to create, interpolate, and loop music clips across genres.

Seth Forsgren and Hayk Martiros modified the Stable Diffusion (SD) algorithm for music, resulting in an unusual new kind of "music machine" called Riffusion.

Riffusion works on the same principle as SD, turning a text prompt into new, AI-generated content.

The primary distinction is that the system was trained specifically on sonograms, images that visually represent music and audio.

As described on the Riffusion website, a sonogram (an audio spectrogram) is a visual representation of the frequency content of a sound sample.

The X-axis represents time, while the Y-axis represents frequency. The color of each pixel gives the amplitude of the audio at the frequency and time given by its row and column.
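This time-frequency layout can be sketched in plain NumPy. The snippet below computes a magnitude spectrogram with a short-time Fourier transform; the function name and parameters are illustrative, not Riffusion's actual code:

```python
import numpy as np

def magnitude_spectrogram(audio, n_fft=512, hop=128):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # rfft yields n_fft // 2 + 1 frequency bins per frame
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq, time)

# A 440 Hz tone at a 16 kHz sample rate concentrates energy
# near bin 440 / (16000 / 512) ≈ 14.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
spec = magnitude_spectrogram(tone)
```

Each column of `spec` is one instant in time and each row one frequency band, exactly the pixel grid the article describes.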

Audio processing takes place after the model; by changing the random seed, the system can generate infinitely many variations of a single prompt.
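The role of the seed can be illustrated conceptually: a diffusion model starts sampling from random latent noise, so the same seed reproduces a clip exactly while a different seed yields a new variation. A minimal sketch (the function name and latent shape are assumptions, not Riffusion's code):

```python
import numpy as np

def initial_latents(seed, shape=(4, 64, 64)):
    """Seeded Gaussian noise: the starting point of a diffusion sampling run."""
    return np.random.default_rng(seed).standard_normal(shape)

same_a = initial_latents(seed=42)
same_b = initial_latents(seed=42)   # identical start -> identical output clip
other = initial_latents(seed=7)     # different start -> a new variation of the prompt
```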

Riffusion creates a new sonogram and uses Torchaudio to convert the image back into sound.
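Because a sonogram stores only magnitudes, turning it back into audio requires estimating the missing phase; a standard approach is the Griffin-Lim algorithm. Below is a toy NumPy version (Riffusion itself relies on Torchaudio's implementation; this simplified stand-in just shows the idea):

```python
import numpy as np

N_FFT, HOP = 512, 128

def stft(x):
    w = np.hanning(N_FFT)
    n = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP : i * HOP + N_FFT] * w for i in range(n)])
    return np.fft.rfft(frames, axis=1)

def istft(S, length):
    """Inverse STFT via windowed overlap-add."""
    w = np.hanning(N_FFT)
    frames = np.fft.irfft(S, n=N_FFT, axis=1) * w
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, f in enumerate(frames):
        out[i * HOP : i * HOP + N_FFT] += f
        norm[i * HOP : i * HOP + N_FFT] += w ** 2
    # clamp the normalizer to avoid divide-by-zero blow-ups at the clip edges
    return out / np.maximum(norm, 1e-3)

def griffin_lim(mag, length, iters=32):
    """Recover a waveform from magnitudes alone by iteratively refining phase."""
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(iters):
        audio = istft(mag * phase, length)
        phase = np.exp(1j * np.angle(stft(audio)))
    return istft(mag * phase, length)

# Round-trip a 440 Hz tone: keep only magnitudes, then reconstruct.
sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
rec = griffin_lim(np.abs(stft(tone)), length=len(tone))
```

The reconstruction is not sample-identical to the original (the phase is re-estimated, not stored), but its dominant frequency matches, which is why spectrogram-based generation can produce convincing audio.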
