Open
Description
Hi!
Thank you for the really cool research and for making the code available. I was wondering, would it be possible / feasible / interesting to train LLM2CLIP's vision encoder from scratch, using the CC-LLM as the text encoder?
I noticed that in the paper you only fine-tuned vision encoders with the CC-LLM, but I don't see why we couldn't directly train a blank (randomly initialized) vision encoder. Is it because generating that many embeddings with the CC-LLM would cost too much?
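To make the question concrete, here is a rough sketch of the setup I have in mind. It assumes the CC-LLM stays frozen, so its caption embeddings can be computed once and cached, and that the vision encoder is trained against them with a standard CLIP-style symmetric contrastive loss. This is not taken from the LLM2CLIP code: the function names and the `cached_text_embs` argument are hypothetical, and the 0.07 temperature is only the usual CLIP starting value.

```python
import math


def l2_normalize(vec):
    # Scale a vector to unit length (assumes the vector is not all zeros).
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def clip_contrastive_loss(image_embs, cached_text_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched pairs.

    image_embs:       outputs of the vision encoder being trained (hypothetical).
    cached_text_embs: caption embeddings precomputed once by the frozen text
                      encoder (hypothetical cache); row i matches image i.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in cached_text_embs]
    n = len(imgs)

    # Cosine similarity of every image with every caption, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_on_diagonal(rows):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            row_max = max(row)  # subtract the max for numerical stability
            log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    image_to_text = cross_entropy_on_diagonal(logits)
    text_to_image = cross_entropy_on_diagonal([list(col) for col in zip(*logits)])
    return 0.5 * (image_to_text + text_to_image)
```

In this sketch the text side costs nothing per training step once the cache exists, so the embedding cost I am asking about would be a one-time pass over the captions rather than something that scales with the number of epochs.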