Official inference codebase for the FLUX family of flow-matching text-to-image models by Black Forest Labs. FLUX models use a transformer architecture that performs flow matching in latent space, conditioned on text encoded by T5 and CLIP.
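As a toy illustration of the sampling idea (not FLUX code: the velocity field here is a hypothetical 1-D stand-in for the transformer, and the loop is plain Euler integration), flow matching generates a sample by integrating a velocity field from noise at t=0 to data at t=1:

```python
# Toy Euler integration of a velocity field, illustrating the sampling
# loop that flow-matching models use. The "model" here is a hypothetical
# closed-form velocity field, not a neural network.

def euler_sample(x0: float, velocity, num_steps: int) -> float:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x, dt = x0, 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + velocity(x, t) * dt
    return x

# For a straight-line ("rectified") path from x0 to a target, the
# velocity along the path is constant: v = target - x0.
x0, target = 0.0, 5.0
v = lambda x, t: target - x0
print(euler_sample(x0, v, num_steps=4))  # constant velocity: Euler lands exactly on 5.0
```

In FLUX the state is a latent tensor rather than a scalar, the velocity comes from the transformer conditioned on the text embeddings, and the timestep spacing follows a resolution-dependent schedule, but the integration loop has this shape.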
| Name | Description |
|---|---|
| `flux-dev` | Guidance-distilled 12B model, high quality |
| `flux-dev-krea` | FLUX.1-Krea-dev variant |
| `flux-schnell` | Distilled 4-step model, fast inference |
| `flux-dev-canny` | ControlNet-style Canny edge conditioning |
| `flux-dev-depth` | Depth map conditioning |
| `flux-dev-fill` | Inpainting / outpainting |
| `flux-dev-redux` | Image-conditioned generation |
| `flux-dev-kontext` | In-context image editing |
Models are downloaded automatically from HuggingFace on first use. Several are gated and require you to accept their terms on HuggingFace before downloading.
- Python >= 3.10
- CUDA GPU (recommended)
```shell
# Clone the repo
git clone https://github.com/NhuGiap04/Test-time-steering.git
cd Test-time-steering

# Install with PyTorch support
pip install -e ".[torch]"

# Or install everything (Gradio + Streamlit demos too)
pip install -e ".[all]"
```

Gated models (e.g. flux-dev) require a HuggingFace account with access granted. Authenticate before running:
```shell
huggingface-cli login
# or set the environment variable
export HF_TOKEN=your_token_here
```

Generate an image from the command line:

```shell
flux --name flux-schnell \
    --prompt "a serene mountain lake at sunrise" \
    --width 1360 \
    --height 768 \
    --output_dir output/
```

For flux-dev, which supports guidance:
```shell
flux --name flux-dev \
    --prompt "a photo of an astronaut riding a horse on Mars" \
    --guidance 3.5 \
    --num_steps 50 \
    --output_dir output/
```

Generated images are saved to `output/img_<idx>.jpg`.
```shell
flux --name flux-dev --loop
```

In loop mode you can change the prompt, resolution, guidance, seed, and number of steps between generations using slash commands (`/w`, `/h`, `/g`, `/s`, `/n`).
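The Python API examples below stop at `ae.decode(x)`; to save the result, the decoded tensor must be converted to an 8-bit image. A minimal sketch, assuming the decoder output is a float array of shape (1, 3, H, W) with values in [-1, 1] (for a torch tensor, first call `x.float().cpu().numpy()`); the helper name `to_image` is illustrative, not part of the flux API:

```python
import numpy as np
from PIL import Image

def to_image(x: np.ndarray) -> Image.Image:
    """Map a (1, 3, H, W) float array in [-1, 1] to a PIL image."""
    x = np.clip(x[0], -1.0, 1.0)    # drop the batch dim, clamp to valid range
    x = np.transpose(x, (1, 2, 0))  # CHW -> HWC
    return Image.fromarray((127.5 * (x + 1.0)).astype(np.uint8))

# Mid-gray placeholder standing in for a decoded latent
img = to_image(np.zeros((1, 3, 768, 1360), dtype=np.float32))
print(img.size)  # (1360, 768); PIL reports (width, height)
```

The resulting `Image` can then be written out with `img.save("img_0.jpg")`.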
```python
import torch

from flux.util import load_ae, load_clip, load_flow_model, load_t5
from flux.sampling import denoise, get_noise, get_schedule, prepare, unpack

device = torch.device("cuda")
name = "flux-schnell"

t5 = load_t5(device, max_length=256)  # shorter context for schnell
clip = load_clip(device)
model = load_flow_model(name, device=device)
ae = load_ae(name, device=device)

H, W = 768, 1360
x = get_noise(1, H, W, device=device, dtype=torch.bfloat16, seed=42)
inp = prepare(t5, clip, x, prompt="a cat sitting on a windowsill")
timesteps = get_schedule(4, inp["img"].shape[1], shift=False)  # 4 steps, no shift

with torch.inference_mode():
    x = denoise(model, **inp, timesteps=timesteps, guidance=0.0)

x = unpack(x.float(), H, W)
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    x = ae.decode(x)
```

The flow for flux-dev is the same, but with a longer text context, more steps, a shifted schedule, and a nonzero guidance value:

```python
import torch

from flux.util import load_ae, load_clip, load_flow_model, load_t5
from flux.sampling import denoise, get_noise, get_schedule, prepare, unpack

device = torch.device("cuda")
name = "flux-dev"

t5 = load_t5(device, max_length=512)  # longer context for dev
clip = load_clip(device)
model = load_flow_model(name, device=device)
ae = load_ae(name, device=device)

H, W = 768, 1360
x = get_noise(1, H, W, device=device, dtype=torch.bfloat16, seed=42)
inp = prepare(t5, clip, x, prompt="a photo of an astronaut riding a horse on Mars")
timesteps = get_schedule(50, inp["img"].shape[1], shift=True)  # 50 steps with shift

with torch.inference_mode():
    x = denoise(model, **inp, timesteps=timesteps, guidance=3.5)

x = unpack(x.float(), H, W)
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    x = ae.decode(x)
```

flux-dev-krea is a fine-tune of flux-dev and shares the exact same architecture and parameter count (~12B). Usage is identical; just swap the model name:
```python
import torch

from flux.util import load_ae, load_clip, load_flow_model, load_t5
from flux.sampling import denoise, get_noise, get_schedule, prepare, unpack

device = torch.device("cuda")
name = "flux-dev-krea"  # only this line changes vs flux-dev

t5 = load_t5(device, max_length=512)
clip = load_clip(device)
model = load_flow_model(name, device=device)
ae = load_ae(name, device=device)

H, W = 768, 1360
x = get_noise(1, H, W, device=device, dtype=torch.bfloat16, seed=42)
inp = prepare(t5, clip, x, prompt="a serene mountain lake at sunrise")
timesteps = get_schedule(50, inp["img"].shape[1], shift=True)

with torch.inference_mode():
    x = denoise(model, **inp, timesteps=timesteps, guidance=3.5)

x = unpack(x.float(), H, W)
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    x = ae.decode(x)
```

Interactive demos are also included:

```shell
python demo_gr.py       # Gradio UI
python demo_st.py       # Streamlit UI
python demo_st_fill.py  # Streamlit inpainting UI
```

For faster inference with TensorRT:
```shell
pip install -e ".[tensorrt]"
flux --name flux-dev --trt --prompt "your prompt here"
```

Pre-exported ONNX models are downloaded automatically from the corresponding HuggingFace ONNX repository.