If you are only interested in inference, you can download just the Slakh2100 test set that we used. It is available at the following link.
    # Move into this directory
    cd data/

    # Extract dataset here
    tar -xvf slakh2100-testset-22050.tar.xz

The test set alone should occupy around 7 GB of disk space.
If you are interested in training some models of your own, you need to download the complete dataset.
The download steps are as follows:
- Download the compressed data for each stem (bass, drums, etc.)
- Extract the data in this folder (i.e. `data/`)
- [optional] Delete the compressed data
- Run the shell script `convert_data_format.sh`
In the sections below, you can find a more precise description for each of these steps.
You can download the data we used in our experiments from the following links:
Move the downloaded files into the data/ directory.
    data/bass_22050.tar.xz
    data/drums_22050.tar.xz
    data/guitar_22050.tar.xz
    data/piano_22050.tar.xz

Then extract each archive:

    # Move inside this directory
    cd data/

    # Decompress and extract data
    tar -xvf bass_22050.tar.xz
    tar -xvf drums_22050.tar.xz
    tar -xvf guitar_22050.tar.xz
    tar -xvf piano_22050.tar.xz

This step might take a while, depending on your hardware. If you have a fast internet connection, consider instead downloading the zipped versions from here.
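The four `tar` invocations can also be written as a loop. The following is a minimal sketch, not part of the repository: `extract_stems` is a hypothetical helper name, and it assumes the archives keep the `<stem>_22050.tar.xz` names listed above. It uses `-xf` instead of `-xvf` to keep the output quiet.

```shell
# Hypothetical helper (an assumption, not shipped with the repository):
# extract every stem archive found in the given directory (default: data)
# and report any archive that is missing.
extract_stems() {
    data_dir="${1:-data}"
    status=0
    for stem in bass drums guitar piano; do
        archive="$data_dir/${stem}_22050.tar.xz"
        if [ -f "$archive" ]; then
            # -C extracts into data_dir without changing the current directory
            tar -xf "$archive" -C "$data_dir" || status=1
        else
            echo "missing archive: $archive" >&2
            status=1
        fi
    done
    # Non-zero if any archive was missing or failed to extract
    return "$status"
}
```

Calling `extract_stems data` from the repository root should be equivalent to the commands above, with the addition that a missing archive is reported instead of aborting the whole run.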
After extracting all the source archives, you should have four directories:

    data/bass_22050/
    data/drums_22050/
    data/guitar_22050/
    data/piano_22050/

To free up some space, you can now delete the compressed versions of the data. They are no longer needed.
    rm data/*_22050.tar.xz

Before you can use the dataset for training, you need to run the following commands:
    # Move inside this directory
    cd data/

    # Make script executable
    chmod +x ./convert_data_format.sh

    # Convert the format of your data
    ./convert_data_format.sh

This script converts the downloaded data into a format that the training script can digest. In particular, after running everything, your `data/` directory should contain the `slakh2100` folder, organized in the following fashion:
    data/
     └─── slakh2100/
           └─── train/
                 └─── Track00001/
                       └─── bass.wav
                       └─── drums.wav
                       └─── guitar.wav
                       └─── piano.wav
                 ...
           ...
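Once the conversion script has run, a quick check of this layout can catch tracks with missing stems before training starts. The following is only a sketch: `find_incomplete_tracks` is a hypothetical helper, and it assumes every track directory is expected to contain all four stem files shown in the tree.

```shell
# Hypothetical helper (an assumption, not shipped with the repository):
# print every expected stem file that is absent from a track directory
# inside the given split directory (default: data/slakh2100/train).
find_incomplete_tracks() {
    split_dir="${1:-data/slakh2100/train}"
    incomplete=0
    for track in "$split_dir"/*/; do
        # Skip the unexpanded glob pattern when the split is empty
        [ -d "$track" ] || continue
        for stem in bass drums guitar piano; do
            if [ ! -f "${track}${stem}.wav" ]; then
                echo "missing: ${track}${stem}.wav"
                incomplete=1
            fi
        done
    done
    # Non-zero if at least one stem file is missing
    return "$incomplete"
}
```

Running `find_incomplete_tracks data/slakh2100/train` prints nothing and returns zero when every track is complete. Note that it also returns zero for an empty split directory, so it does not replace checking that the directory was created at all.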
⚠️ NOTE: After running the script, the space occupied by `data/` should not change drastically, since all the files are hard links and are not actually copied.
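If you want to confirm that the converted files really are hard links, one option is to look at their link counts. The sketch below uses the standard `find -links` test; `find_unlinked_wavs` is a hypothetical helper name, and the check assumes the original `*_22050/` directories are still present, since deleting them would drop the link count back to 1.

```shell
# Hypothetical helper (an assumption, not shipped with the repository):
# print .wav files under the given directory (default: data/slakh2100)
# whose link count is 1, i.e. files that are not hard-linked to any other path.
find_unlinked_wavs() {
    find "${1:-data/slakh2100}" -type f -name '*.wav' -links 1
}
```

An empty output from `find_unlinked_wavs data/slakh2100` suggests that every converted file shares its data with another path, which is consistent with the note above.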