DreamEngine

DreamEngine is a unified framework that integrates multimodal encoders such as QwenVL with diffusion models through a two-stage training approach. It enables text-image interleaved control and achieves state-of-the-art performance when generating images from complex, concept-merged inputs.

demo.mp4

Updates:

2025-03-03: Release checkpoint and a demo for text-guided object fusion.

Run the Demo locally

bash setup.sh
# set up the paths in demo.py
python src/scripts/eval/demo.py
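The same steps can be driven from a small Python helper that checks for the two scripts before launching anything. This is only a sketch: the script paths come from the commands above, while the names `demo_commands`, `missing_files`, and `run_step` are hypothetical and not part of the repository. It assumes it is run against the repository root.

```python
import subprocess
import sys
from pathlib import Path

# Script locations taken from the commands above (relative to the repo root).
SETUP_SCRIPT = Path("setup.sh")
DEMO_SCRIPT = Path("src/scripts/eval/demo.py")


def demo_commands(python_exe=sys.executable):
    """Map each step name to the command line that runs it."""
    return {
        "setup": ["bash", SETUP_SCRIPT.as_posix()],
        "demo": [python_exe, DEMO_SCRIPT.as_posix()],
    }


def missing_files(repo_root):
    """Return the expected scripts that do not exist under repo_root."""
    root = Path(repo_root)
    return [p.as_posix() for p in (SETUP_SCRIPT, DEMO_SCRIPT) if not (root / p).is_file()]


def run_step(step, repo_root="."):
    """Run one step ("setup" or "demo") and raise if it fails."""
    missing = missing_files(repo_root)
    if missing:
        raise FileNotFoundError(f"not found under {repo_root}: {missing}")
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(demo_commands()[step], cwd=repo_root, check=True)
```

Because the paths in demo.py have to be edited by hand between the two commands, the steps are kept separate: call `run_step("setup")`, edit demo.py, then call `run_step("demo")`.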

Model Structure

Training

Demos

Citation

If you find this work helpful, please cite:

@misc{chen2025multimodalrepresentationalignmentimage,
      title={Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think},
      author={Liang Chen and Shuai Bai and Wenhao Chai and Weichu Xie and Haozhe Zhao and Leon Vinci and Junyang Lin and Baobao Chang},
      year={2025},
      eprint={2502.20172},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.20172},
}
