Deep Learning for Generating Hindustani Classical Music

Exploring deep learning in the context of non-Western traditional music.

I created an audio-responsive mandala in p5.js driven by AI-generated audio. To generate the Hindustani classical music, I used Dance Diffusion with a dataset from MTG Saraga.

Iterating on Dance Diffusion’s glitch-440k model, I generated around an hour of audio and remixed the files into a song that was entirely AI-generated.
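If you want to reproduce the generation step, here is a minimal sketch using the DanceDiffusionPipeline from Hugging Face diffusers with the pretrained glitch-440k checkpoint. My actual runs went through Harmonai’s Colab notebooks, so treat this as an equivalent sketch rather than my exact code; the clip length and step count are illustrative.

```python
# Minimal sketch: unconditional generation with the pretrained glitch-440k
# checkpoint via Hugging Face diffusers. Not my exact pipeline; an equivalent.
import torch
import soundfile as sf
from diffusers import DanceDiffusionPipeline

pipe = DanceDiffusionPipeline.from_pretrained("harmonai/glitch-440k")
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

# Each call yields one short clip; looping like this is how raw material
# accumulates into something you can remix into a longer piece.
for i in range(4):
    output = pipe(num_inference_steps=100, audio_length_in_s=4.0)
    clip = output.audios[0].T  # (samples, channels) for soundfile
    sf.write(f"glitch_{i}.wav", clip, pipe.unet.config.sample_rate)
```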

The Data

Let’s start by listening to what Hindustani classical music actually sounds like.

This is a sample from my data, taken from the MTG Saraga dataset, which I fed into my models.
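If you want to work with the same data, here is a minimal sketch of pulling Saraga recordings. I’m assuming mirdata’s saraga_hindustani loader and its track fields here, and the 16 kHz resampling target is also an assumption; in practice, match it to whichever checkpoint you train.

```python
# Minimal sketch: loading Hindustani recordings from Saraga via mirdata.
# Dataset name and track fields follow mirdata's published API; treat the
# details as assumptions.
import librosa
import mirdata

dataset = mirdata.initialize("saraga_hindustani")
dataset.download()  # fetches audio and metadata

tracks = dataset.load_tracks()
track = next(iter(tracks.values()))

# Assumption: resample to the model's sample rate. Read the real value from
# the checkpoint config (pipe.unet.config.sample_rate) rather than hardcoding.
audio, sr = librosa.load(track.audio_path, sr=16000, mono=True)
print(audio.shape, sr)
```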

The Models

I used three primary deep learning techniques and models. Check out my code with the links below!

  1. Dance Diffusion

  2. Fine-tuned Dance Diffusion (see the sketch after this list)

  3. Custom Model (created by me!)
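To give a flavor of step 2, here is a hedged sketch of a fine-tuning loop over the pretrained checkpoint. I fine-tuned with Harmonai’s own training tooling; this version is a generic diffusers-style approximation (epsilon prediction with a DDPM schedule, not Harmonai’s v-diffusion objective), so everything here is illustrative.

```python
# Hedged sketch: generic diffusion fine-tuning of the glitch-440k UNet on
# audio excerpts. An approximation of the idea, not Harmonai's actual recipe.
import torch
import torch.nn.functional as F
from diffusers import DanceDiffusionPipeline, DDPMScheduler

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DanceDiffusionPipeline.from_pretrained("harmonai/glitch-440k")
unet = pipe.unet.to(device)
scheduler = DDPMScheduler(num_train_timesteps=1000)
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

def train_step(clean_audio):
    """clean_audio: (batch, channels, samples) tensor of dataset excerpts,
    cut to the sample length the checkpoint expects (unet.config.sample_size)."""
    noise = torch.randn_like(clean_audio)
    timesteps = torch.randint(
        0, scheduler.config.num_train_timesteps,
        (clean_audio.shape[0],), device=clean_audio.device,
    )
    noisy = scheduler.add_noise(clean_audio, noise, timesteps)
    pred = unet(noisy, timesteps).sample  # model predicts the added noise
    loss = F.mse_loss(pred, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```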

The Tools

I used Google Colab, Python, Audacity, Adobe Premiere, p5.js, and JavaScript for this project.

I went through several iterations, using different pretrained checkpoints of the popular audio generation model Dance Diffusion to generate different sounds. I also built a model myself on the WaveNet architecture, which generated Mel spectrograms and then converted them to audio.
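To make the custom model concrete, here is an untrained, minimal sketch of that two-stage pipeline: a WaveNet-style stack of dilated 1-D convolutions over mel frames, followed by Griffin-Lim inversion via librosa. All layer sizes and audio parameters are made-up placeholders, not my actual architecture.

```python
# Minimal sketch of the two stages: WaveNet-style dilated convolutions
# producing mel frames, then Griffin-Lim to recover a waveform.
# Hyperparameters are illustrative placeholders.
import numpy as np
import torch
import torch.nn as nn
import librosa

N_MELS = 80   # assumed mel-band count
SR = 22050    # assumed sample rate
HOP = 256     # assumed STFT hop length

class DilatedStack(nn.Module):
    """WaveNet-style stack of dilated 1-D convolutions over mel frames."""
    def __init__(self, channels=N_MELS, layers=6):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size=2,
                      dilation=2 ** i, padding=2 ** i)
            for i in range(layers)
        ])

    def forward(self, x):  # x: (batch, n_mels, frames)
        for conv in self.convs:
            # trim so output length matches input (causal-ish)
            x = torch.tanh(conv(x))[..., : x.shape[-1]]
        return x

model = DilatedStack()  # untrained; output is noise, for shape illustration
seed = torch.randn(1, N_MELS, 400)
with torch.no_grad():
    mel = model(seed).squeeze(0).numpy()

# Convert the mel spectrogram back to a waveform with Griffin-Lim. librosa
# expects a power mel spectrogram, so exponentiate the (assumed) log output.
audio = librosa.feature.inverse.mel_to_audio(np.exp(mel), sr=SR, hop_length=HOP)
```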

You can listen to all the different iterations below.

  1. Dance Diffusion on maestro-150k.

  2. Dance Diffusion on glitch-440k.

  3. Dance Diffusion fine-tuned on my custom dataset.

  4. Custom model using the WaveNet architecture to generate audio.

AI can be difficult to understand.

Here, I explain the iterations you can hear above and offer some reasoning as to why the audio outputs from my earlier models may not have worked well. You can also listen to generated audio from other models, alongside my explanations, to get a better sense of how AI works.