
BigGAN Audio Visualizer

[example output: biggan-visualizer-example]

Description

This visualizer explores the BigGAN (Brock et al., 2018) latent space by using the pitch and tempo of an audio file to generate, and interpolate between, noise and class vector inputs to the model. Classes are chosen manually, or optionally selected by semantic similarity over BERT encodings of a lyrics corpus.
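
As an illustrative sketch of this pipeline (not the repository's actual code), assuming librosa for audio analysis and the pytorch-pretrained-biggan package for the generator; the file path, class ids, and the exact pitch/tempo mappings below are placeholder choices:

  import numpy as np
  import torch
  import librosa
  from pytorch_pretrained_biggan import BigGAN, truncated_noise_sample

  SONG, TRUNCATION, HOP = "audio/song.mp3", 1.0, 512   # placeholder path and settings
  CLASSES = [402, 530, 988]                            # arbitrary ImageNet ids for illustration

  # Audio features: per-frame pitch energy (chromagram) and onset strength (tempo/volume proxy)
  y, sr = librosa.load(SONG)
  chroma = librosa.feature.chroma_cqt(y=y, sr=sr, hop_length=HOP)    # shape (12, n_frames)
  onset = librosa.onset.onset_strength(y=y, sr=sr, hop_length=HOP)   # shape (n_frames,)

  model = BigGAN.from_pretrained("biggan-deep-512").eval()
  noise = truncated_noise_sample(truncation=TRUNCATION, batch_size=1)[0]   # (128,)

  frames = []
  for i in range(chroma.shape[1]):
      # Class vector: weight the chosen classes by the power of the pitches they map to
      weights = chroma[:len(CLASSES), i]
      weights = weights / (weights.sum() + 1e-8)
      class_vec = np.zeros((1, 1000), dtype=np.float32)
      class_vec[0, CLASSES] = weights
      # Noise vector: drift proportionally to onset strength (tempo sensitivity)
      noise = noise + 0.25 * (onset[i] / onset.max()) * np.random.randn(128)
      with torch.no_grad():
          img = model(torch.tensor(noise[None, :], dtype=torch.float32),
                      torch.from_numpy(class_vec),
                      TRUNCATION)
      frames.append(img)   # (1, 3, 512, 512) tensor in [-1, 1]; encode to video with e.g. moviepy

The actual script additionally applies smoothing, jitter, and interpolation controlled by the flags documented below.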

Usage:

usage: visualizer.py [-h] -s SONG [-r {128,256,512}] [-d DURATION]
                     [-ps [200-295]] [-ts [0.05-0.8]] [-c CLASSES [CLASSES ...]]
                     [-n NUM_CLASSES] [-j [0-1]] [-fl i*2^6] [-t [0.1-1]]
                     [-sf [10-30]] [-bs BATCH_SIZE] [-o OUTPUT_FILE]
                     [--use_last_vectors] [--use_last_classes] [--sort_pitch]
                     [-l LYRICS] [-e {sbert,doc2vec}] [-es {best,random,ransac}]
  • To reduce runtime, the code can be run on Google Colab GPUs (or other cloud notebook providers) using biggan_music_visualizer.ipynb (hosted here).
  • The [-n NUM_CLASSES] parameter selects the number of classes to interpolate between.
  • Default behavior is to select [-n NUM_CLASSES] random classes. The [-c CLASSES [CLASSES ...]] parameter can be used to select specific ImageNet classes. A full list can be found here, and a list categorized by coarse descriptors here. Be sure to use the int ids and not the string labels, and set [-n NUM_CLASSES] to the number of chosen classes.
  • Use the [--sort_pitch] flag to map classes to the [-n NUM_CLASSES] highest power pitches. By default, classes are mapped to a chromatic scale.
  • The [-d DURATION] parameter can be useful for generating short videos while tweaking other parameters. Once the desired parameters are set, use the [--use_last_vectors] flag and remove the [-d DURATION] parameter to generate the same video at full length (see the example invocations after this list).
  • Reducing the output resolution with [-r {128,256,512}] and/or increasing the frame length with [-fl i*2^6] can help reduce the runtime.
  • To compute classes through semantic similarity to a lyrics file, use the [-l LYRICS] parameter. The embedding technique and strategy for choosing classes can be set with [-e {sbert,doc2vec}] and [-es {best,random,ransac}] respectively (see the sketch after the Arguments table).
  • Pitch and tempo sensitivity can be set with [-ps [200-295]] and [-ts [0.05-0.8]] respectively. Jitter, truncation and smooth factor can be set with [-j [0-1]], [-t [0.1-1]] and [-sf [10-30]] respectively.
  • See the help column of the arguments section for details on all parameters.
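
For example (the audio path and class ids below are placeholders), a short low-resolution preview followed by a full-length render that reuses the saved vectors:

  python visualizer.py -s audio/song.mp3 -d 30 -r 128 -c 530 730 963 -n 3
  python visualizer.py -s audio/song.mp3 -r 512 -c 530 730 963 -n 3 --use_last_vectors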

Arguments

short | long | default | range | help
-h | --help | | | show this help message and exit
-s | --song | | | path to input audio file [REQUIRED]
-r | --resolution | 512 | {128,256,512} | output video resolution
-d | --duration | None | int | output video duration
-ps | --pitch_sensitivity | 220 | [200-295] | controls the sensitivity of the class vector to changes in pitch
-ts | --tempo_sensitivity | 0.25 | [0.05-0.8] | controls the sensitivity of the noise vector to changes in volume and tempo
-c | --classes | None | | manually specify [--num_classes] ImageNet classes
-n | --num_classes | 12 | [1-12] | number of unique classes to use
-j | --jitter | 0.5 | [0-1] | controls jitter of the noise vector to reduce repetition
-fl | --frame_length | 512 | i*2^6 | number of audio frames per video frame in the output
-t | --truncation | 1 | [0.1-1] | BigGAN truncation parameter; controls complexity of structure within frames
-sf | --smooth_factor | 20 | [10-30] | controls interpolation between class vectors to smooth rapid fluctuations
-bs | --batch_size | 20 | int | BigGAN batch size
-o | --output_file | | | name of output file stored in output/; defaults to the [--song] path basename
 | --use_last_vectors | False | bool | set flag to use previously saved class/noise vectors
 | --use_last_classes | False | bool | set flag to use previous classes
 | --sort_pitch | False | bool | set flag to sort pitches by the ordering of classes
-l | --lyrics | None | | path to lyrics file; setting [--lyrics LYRICS] computes classes by semantic similarity under BERT encodings
-e | --encoding | sbert | {sbert,doc2vec} | controls choice of sentence embedding technique
-es | --encoding_strategy | None | {random,best,ransac} | controls strategy for choosing classes: [-e sbert] can use best or random, while [-e doc2vec] can use ransac
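
To illustrate the [-l LYRICS] / [-e sbert] path, below is a minimal sketch (not the repository's actual code) that ranks ImageNet classes by semantic similarity to lyric lines using the sentence-transformers package; the model name and the "best" selection strategy shown here are assumptions:

  from sentence_transformers import SentenceTransformer, util

  def classes_from_lyrics(lyrics_path, imagenet_labels, num_classes=12):
      """Rank ImageNet classes by cosine similarity between lyric lines and class labels."""
      with open(lyrics_path) as f:
          lines = [ln.strip() for ln in f if ln.strip()]
      model = SentenceTransformer("all-MiniLM-L6-v2")       # an SBERT encoder (assumed choice)
      lyric_emb = model.encode(lines, convert_to_tensor=True)
      label_emb = model.encode(imagenet_labels, convert_to_tensor=True)
      # For every class label, keep its similarity to the closest lyric line
      sims = util.cos_sim(label_emb, lyric_emb).max(dim=1).values
      top = sims.argsort(descending=True)[:num_classes]     # "best" strategy: top-k labels
      return [int(i) for i in top]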

Acknowledgments

Thanks to Matt Siegelman for providing the inspiration as well as a boilerplate for the project.

References

Brock, A., Donahue, J. and Simonyan, K. (2018). Large Scale GAN Training for High Fidelity Natural Image Synthesis. arXiv:1809.11096.