chenllliang/DreamEngine

DreamEngine

[Image: Screenshot 2025-02-23 22 38 04]

DreamEngine is a unified framework that connects multimodal encoders such as QwenVL to diffusion models through a two-stage training approach. It enables text-image interleaved control and reports state-of-the-art performance when generating images from complex, concept-merged inputs.

demo.mp4

Updates:

  • 2025-03-03: Released the checkpoint and a demo for text-guided object fusion.

Run the Demo locally

bash setup.sh
# set up the paths in demo.py
python src/scripts/eval/demo.py

Model Structure

[Image: Screenshot 2025-02-27 23 14 47]

Training

[Image: Screenshot 2025-02-27 23 15 16]
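
As a rough illustration of what a two-stage schedule can look like, the sketch below records which components receive gradient updates in each stage. Everything in it is an assumption made for illustration: the component names (`multimodal_encoder`, `adapter`, `diffusion_backbone`), the stage names, and the choice of what is frozen in each stage are hypothetical and are not taken from this repository's code. Consult the paper and the training scripts for the actual setup.

```python
from dataclasses import dataclass

# Hypothetical component names; this README does not list them.
COMPONENTS = ("multimodal_encoder", "adapter", "diffusion_backbone")


@dataclass(frozen=True)
class StageConfig:
    name: str
    trainable: frozenset  # components that receive gradient updates


# Assumed schedule (not confirmed by this README):
# stage 1 trains only a small adapter, stage 2 also tunes the diffusion model.
STAGES = (
    StageConfig("stage1_alignment", frozenset({"adapter"})),
    StageConfig(
        "stage2_interleaved_tuning",
        frozenset({"adapter", "diffusion_backbone"}),
    ),
)


def freeze_plan(stage: StageConfig) -> dict:
    """Map each component to True if it is trainable in the given stage."""
    unknown = stage.trainable - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    return {name: name in stage.trainable for name in COMPONENTS}


if __name__ == "__main__":
    for stage in STAGES:
        print(stage.name, freeze_plan(stage))
```

In a real training loop, a plan like this would typically drive which parameters have gradients enabled before the optimizer is built for each stage.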

Demos

[Image: Screenshot 2025-02-27 23 15 03]
[Image: Screenshot 2025-02-27 23 15 24]
[Image: Screenshot 2025-02-27 23 15 30]

Citation

If you find this work helpful, please cite:

@misc{chen2025multimodalrepresentationalignmentimage,
      title={Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think},
      author={Liang Chen and Shuai Bai and Wenhao Chai and Weichu Xie and Haozhe Zhao and Leon Vinci and Junyang Lin and Baobao Chang},
      year={2025},
      eprint={2502.20172},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.20172},
}
