Hi! Thanks very much for the excellent work!
I am working on a vision-QA task using BLIP-2, which consists of three modules:
a ViT that extracts vision features;
a Q-Former that narrows the gap between the vision and language modalities;
a T5-XXL that receives the question and the Q-Former output to generate answers.
I wonder whether it is possible to use mm-cot as a utility library on top of the BLIP-2 model to enhance vision-QA inference?
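For context, here is a toy sketch of the three-module data flow described above. All class names, dimensions, and internals here are illustrative placeholders I made up for this sketch, not the real BLIP-2 configuration or the mm-cot API:

```python
import torch
import torch.nn as nn

class ToyViT(nn.Module):
    """Stands in for the frozen ViT image encoder: patches -> vision features."""
    def __init__(self, dim=64):
        super().__init__()
        self.proj = nn.Linear(3 * 16 * 16, dim)

    def forward(self, patches):           # (B, num_patches, 3*16*16)
        return self.proj(patches)         # (B, num_patches, dim)

class ToyQFormer(nn.Module):
    """Learned queries cross-attend to vision features to bridge modalities."""
    def __init__(self, num_queries=8, dim=64):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, vision_feats):      # (B, num_patches, dim)
        b = vision_feats.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        out, _ = self.cross_attn(q, vision_feats, vision_feats)
        return out                        # (B, num_queries, dim)

class ToyT5(nn.Module):
    """Stands in for the frozen T5-XXL: question embedding + query output -> answer logits."""
    def __init__(self, dim=64, vocab=100):
        super().__init__()
        self.head = nn.Linear(dim, vocab)

    def forward(self, question_emb, query_out):   # (B, L, dim), (B, Q, dim)
        fused = torch.cat([query_out, question_emb], dim=1)
        return self.head(fused)           # (B, Q+L, vocab)

vit, qformer, t5 = ToyViT(), ToyQFormer(), ToyT5()
image_patches = torch.randn(2, 16, 3 * 16 * 16)   # 2 images, 16 patches each
question_emb = torch.randn(2, 5, 64)              # 2 questions, 5 tokens each
logits = t5(question_emb, qformer(vit(image_patches)))
print(logits.shape)   # torch.Size([2, 13, 100])
```

An mm-cot-style integration would presumably sit between the Q-Former output and the T5 input, injecting a generated rationale into the language-model prompt, but the exact hook point is exactly what this issue is asking about.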