This visualizer explores BigGAN (Brock et al., 2018) latent space by using pitch/tempo of an audio file to generate and interpolate between noise/class vector inputs to the model. Classes are chosen manually or optionally using semantic similarity on BERT encodings of a lyrics corpus.
usage: visualizer.py [-h] -s SONG [-r {128,256,512}] [-d DURATION] [-ps [200-295]] [-ts [0.05-0.8]] [-c CLASSES [CLASSES ...]] [-n NUM_CLASSES] [-j [0-1]] [-fl i*2^6] [-t [0.1-1]] [-sf [10-30]] [-bs BATCH_SIZE] [-o OUTPUT_FILE] [--use_last_vectors] [--use_last_classes] [--sort_pitch] [-l LYRICS] [-e {sbert,doc2vec}] [-es {best,random,ransac}] - In order to speed up runtime, code can be run on Google Colab GPUs (or other cloud notebook providers) using
biggan_music_visualizer.ipynb(hosted here). - The
[-n NUM_CLASSES]parameter selects the number of classes to interpolate between. - Default behavior is to select
[-n NUM_CLASSES]random classes. The[-c CLASSES [CLASSES ...]]parameter can be used to select specific ImageNet classes. A full list can be found here, and a list categorized by coarse descriptors here. Be sure to use theintids and not thestringlabels, and set[-n NUM_CLASSES]to the number of chosen classes. - Use the
[--sort_by_power]flag to map classes to the[-n NUM_CLASSES]highest power pitches. By default, classes are mapped to a chromatic scale. - The
[-d DURATION]parameter can be useful to generate short videos while tweaking other parameters. Once the desired parameters are set, use the[--use_last_vector]flag and remove the[-d DURATION]parameter to generate the same video at full length. - Reducing the output resolution with
[-r {128,256,512}]and/or increasing the frame length with[-fl i*2^6]can help reduce the runtime. - To compute classes through semantic similarity to a lyrics file, use the
[-l LYRICS]parameter. The embedding technique and strategy for choosing classes can be set with[-e {sbert,doc2vec}]and[-es {best,random,ransac}]respectively. - Pitch and tempo sensitivity can be set with
[-ps [200-295]]and[-ts [0.05-0.8]]respectively. Jitter, truncation and smooth factor can be set with[-j [0-1]],[-t [0.1-1]]and[-sf [10-30]]respectively. - See the help column of the
argumentssection for details on all parameters.
| short | long | default | range | help |
|---|---|---|---|---|
-h | --help | show this help message and exit | ||
-s | --song | path to input audio file [REQUIRED] | ||
-r | --resolution | 512 | {128,256,512} | output video resolution |
-d | --duration | None | int | output video duration |
-ps | --pitch_sensitivity | 220 | [200-295] | controls the sensitivity of the class vector to changes in pitch |
-ts | --tempo_sensitivity | 0.25 | [0.05-0.8] | controls the sensitivity of the noise vector to changes in volume and tempo |
-c | --classes | None | manually specify [--num_classes] ImageNet classes | |
-n | --num_classes | 12 | [1-12] | number of unique classes to use |
-j | --jitter | 0.5 | [0-1] | controls jitter of the noise vector to reduce repitition |
-fl | --frame_length | 512 | i*2^6 | number of audio frames to video frames in the output |
-t | --truncation | 1 | [0.1-1] | BigGAN truncation parameter controls complexity of structure within frames |
-sf | --smooth_factor | 20 | [10-30] | controls interpolation between class vectors to smooth rapid flucations |
-bs | --batch_size | 20 | int | BigGAN batch_size |
-o | --output_file | name of output file stored in output/, defaults to [--song] path base_name | ||
--use_last_vectors | False | bool | set flag to use previous saved class/noise vectors | |
--use_last_classes | False | bool | set flag to use previous classes | |
--sort_pitches | False | bool | set flag to sort pitches by the ordering of classes | |
-l | --lyrics | None | path to lyrics file; setting [--lyrics LYRICS] computes classes by semantic similarity under BERT encodings | |
-e | --encoding | sbert | {sbert,doc2vec} | controls choice of sentence embeddings technique |
-es | --encoding_strategy | None | {random,best,ransac} | controls strategy for choosing classes: [-e sbert] can use best or random while [-e doc2vec] can use ransac |
Thanks to Matt Siegelman for providing the inspiration as well as a boilerplate for the project.
