- Added invert-ga-overengineered.py. Usage: same as the other scripts; see the examples below or the arguments defined in the code.
- Better text embeddings -> Better model inversion:
~ Who needs a diffusion model? Img2Img with the 'text encoder'! 😉 ~
- Added Gradient Ascent (GA): Uses an input image instead of a text prompt.
- Optimizes text embeddings for cosine similarity with image embeddings
- Prints CLIP's 'opinion' about image to console
- Uses text embeddings for inversion image generation
⚠️ Same as without GA, innocent images (prompts) can lead to nefarious and NSFW inversions.
- Refer to the paper by the original authors for details (see below).
- ✅ Usage example (only use this code for --use_image; use invert.py for a text --prompt):
python invert-ga.py --num_iters 3400 --use_image "in/catshoe.jpg" --img_size 64 --tv 0.0005 --batch_size 13 --bri 0.4 --con 0.4 --sat 0.4 --save_every 10 --print_every 10 --model_name ViT-L/14
- ✅ Added support for ViT-L/14@336px (to all code), usage example:
python invert.py --num_iters 3400 --prompt "an ai robot" --img_size 64 --tv 0.005 --batch_size 13 --bri 0.4 --con 0.4 --sat 0.4 --save_every 10 --print_every 10 --model_name ViT-L/14@336px
- GA + Inversion examples (generated with my improved ViT-L/14 fine-tune):
- Original CLIP Gradient Ascent Script: used with permission from @advadnoun (Twitter / X)
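The Gradient Ascent step described above (optimizing text embeddings for cosine similarity with image embeddings) can be illustrated with a toy sketch. This is not the code from invert-ga.py: it uses only the Python standard library, random vectors stand in for CLIP embeddings, and the gradient is written out by hand instead of using autograd. The names `cosine`, `cosine_grad`, and `gradient_ascent` are hypothetical and exist only for this example.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_grad(u, v):
    """Analytic gradient of cosine(u, v) with respect to u."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    c = cosine(u, v)
    # d/du [u.v / (|u||v|)] = v / (|u||v|) - cos(u, v) * u / |u|^2
    return [b / (norm_u * norm_v) - c * a / (norm_u ** 2) for a, b in zip(u, v)]


def gradient_ascent(image_emb, steps=500, lr=0.5, seed=0):
    """Move a random 'text embedding' toward a fixed 'image embedding'.

    Both vectors are stand-ins (assumption): a real run would take them
    from CLIP's text and image encoders.
    """
    rng = random.Random(seed)
    text_emb = [rng.gauss(0.0, 1.0) for _ in image_emb]
    history = [cosine(text_emb, image_emb)]
    for _ in range(steps):
        grad = cosine_grad(text_emb, image_emb)
        # Ascent: step along the gradient to increase the similarity.
        text_emb = [t + lr * g for t, g in zip(text_emb, grad)]
        history.append(cosine(text_emb, image_emb))
    return text_emb, history


if __name__ == "__main__":
    rng = random.Random(1)
    fake_image_emb = [rng.gauss(0.0, 1.0) for _ in range(16)]
    _, history = gradient_ascent(fake_image_emb)
    print(f"cosine similarity: {history[0]:.3f} -> {history[-1]:.3f}")
```

In the real scripts the optimized embeddings are then used as the target for inversion image generation, as the list above describes; the sketch stops at the embedding-optimization step.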
Warning: This paper contains sexually explicit images and language, offensive visuals and terminology, discussions on pornography, gender bias, and other potentially unsettling, distressing, and/or offensive content for certain readers.
Installing requirements:
pip install -r requirements.txt
How to run:
python invert.py \
    --num_iters 3400 \
    --prompt "The map of the African continent" \
    --img_size 64 \
    --tv 0.005 \
    --batch_size 13 \
    --bri 0.4 \
    --con 0.4 \
    --sat 0.4 \
    --save_every 100 \
    --print_every 100 \
    --model_name ViT-B/16

Arguments:
--num_iters: Number of iterations during the inversion process.
--prompt: The text prompt to invert.
--img_size: Size of the image at iteration 0.
--tv: Total Variation weight.
--batch_size: How many augmentations to use at each iteration.
--bri: ColorJitter Augmentation brightness degree.
--con: ColorJitter Augmentation contrast degree.
--sat: ColorJitter Augmentation saturation degree.
--save_every: Frequency at which to save intermediate results.
--print_every: Frequency at which to print intermediate information.
--model_name: One of ['RN50', 'RN101', 'RN50x4', 'RN50x16', 'ViT-B/32', 'ViT-B/16'].
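To give a feel for what the --tv weight controls, here is a minimal sketch of a total variation penalty on a small grayscale image. It assumes the common anisotropic (absolute difference) form; the exact formula used in invert.py may differ, and `total_variation` and `regularized_loss` are hypothetical names used only here.

```python
def total_variation(img):
    """Sum of absolute differences between neighboring pixels.

    `img` is a list of rows of floats (a grayscale image). Smooth images
    score low; noisy, high-frequency images score high.
    """
    height, width = len(img), len(img[0])
    tv = 0.0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:   # horizontal neighbor
                tv += abs(img[y][x + 1] - img[y][x])
            if y + 1 < height:  # vertical neighbor
                tv += abs(img[y + 1][x] - img[y][x])
    return tv


def regularized_loss(clip_loss, img, tv_weight=0.005):
    """Combine a similarity loss with the weighted TV penalty (assumed form)."""
    return clip_loss + tv_weight * total_variation(img)


if __name__ == "__main__":
    flat = [[0.5, 0.5], [0.5, 0.5]]
    checker = [[0.0, 1.0], [1.0, 0.0]]
    print(total_variation(flat), total_variation(checker))
```

Under this assumed form, a larger --tv value pushes the optimization toward smoother images, while a smaller value (such as the 0.0005 used in the GA example) leaves more fine detail and noise.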

