
Conversation

@nihalgeorge01

This PR adds support for the Qwen-2-VL (vision-language) model.

@nihalgeorge01 changed the title from [MODEL] Qwen-2-VL Support to [Model] Qwen-2-VL Support on Feb 10, 2025
@buqimaolvshangxue

Does qwen2_vl work with this commit? @nihalgeorge01, I also need qwen2_vl support.

@nihalgeorge01
Author

Not yet; we are fixing some bugs in the code locally and working on pushing this out soon.

@buqimaolvshangxue

Thank you very much for your work! While thinking about this problem, I found that when MLC processes the LLaVA model, the text embeddings and image embeddings are directly concatenated to produce the final embedding. In vLLM's Qwen2-VL implementation, however, the image embeddings replace specific placeholder positions within the expanded text embedding. Is LLaVA's direct-concatenation approach feasible here? If the concatenation approach is not adopted, it seems the public interface functions would need to be modified. @nihalgeorge01
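For reference, here is a minimal sketch of the two merging strategies being compared, written with plain PyTorch tensors. The function names, shapes, and the `image_token_mask` argument are illustrative assumptions, not the actual mlc-llm or vLLM interfaces:

```python
import torch

def merge_by_concat(image_embeds: torch.Tensor,
                    text_embeds: torch.Tensor) -> torch.Tensor:
    """LLaVA-style splicing: image embeddings are concatenated
    directly onto the text embeddings along the sequence axis."""
    return torch.cat([image_embeds, text_embeds], dim=0)

def merge_by_scatter(prompt_embeds: torch.Tensor,
                     image_embeds: torch.Tensor,
                     image_token_mask: torch.Tensor) -> torch.Tensor:
    """Qwen2-VL-style replacement (as described for vLLM): the
    prompt already contains placeholder image tokens, and their
    embeddings are overwritten by the vision encoder outputs."""
    merged = prompt_embeds.clone()
    merged[image_token_mask] = image_embeds  # scatter into placeholder rows
    return merged

# Toy usage with hidden size 4 and a 2-token image.
text = torch.randn(5, 4)                       # text-only embeddings
prompt = torch.randn(5, 4)                     # prompt incl. placeholder tokens
vision = torch.randn(2, 4)                     # vision encoder output
mask = torch.tensor([False, True, True, False, False])

print(merge_by_concat(vision, text).shape)          # torch.Size([7, 4])
print(merge_by_scatter(prompt, vision, mask).shape) # torch.Size([5, 4])
```

The practical difference: concatenation changes the sequence length after the fact (so positions, attention masks, and KV-cache sizing must account for the extra image tokens), while the replacement approach fixes the sequence layout at tokenization time by expanding placeholder tokens first, which is why adopting it may require changes to the shared embedding interface.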

@pillar02

@nihalgeorge01 Any progress on this PR?

Really appreciate your work.

@grf53

grf53 commented Jun 10, 2025

+1

@Lingrongye

Does this merge branch work now?

