Technical Report • Project page • Demo (Temporarily Unavailable)
minidalle3.mp4
An experimental attempt to obtain the interactive and interleave text-to-image and text-to-text experience of DALL•E 3 and ChatGPT.
- Download the checkpoint and save it as following
checkpoints - models - sdxl_models- run the following commands, and you will get a gradio-based web demo.
export OPENAI_API_KEY="your key" python -m minidalle3.web - To use other LLM rather than ChatGPT, such as
baichuan.
python -m minidalle3.llm.baichuan export OPENAI_API_BASE="http://0.0.0.0:10039/v1" python -m minidalle3.web
chatglm,baichuan,internlmare tested. llama have not supported yet. qwen is not tested.
- Support generating image interleaved in the conversations.
- Support generating multiple images at once.
- Support selecting image.
- Support refinement.
- Support prompt refinement/variation.
- Instruct tuned LLM/SD.
If you find this repo helpful, please consider citing us.
@misc{minidalle3, author={Lai, Zeqiang and Zhu, Xizhou and Dai, Jifeng and Qiao, Yu and Wang, Wenhai}, title={Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models}, year={2023}, url={https://github.com/Zeqiang-Lai/Mini-DALLE3}, }
