Training Vision Encoder from scratch #52

Hi!
Thank you for the really cool research and for making the code available. I was wondering: would it be possible / feasible / interesting to train LLM2CLIP's vision encoder from scratch, using the CC-LLM as the text encoder?
I noticed that in the paper you only fine-tuned existing vision encoders with the CC-LLM, but I don't see why we couldn't directly train a randomly initialized vision encoder instead. Is it because generating so many embeddings with the CC-LLM would be too costly?
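To make the cost question concrete, here is a minimal sketch, assuming the CC-LLM stays frozen (as in the fine-tuning setup) and captions are not augmented between epochs. Under that assumption each caption only needs one text-encoder forward pass, and the cached embeddings can be reused every epoch while only the vision side is trained with a CLIP-style symmetric contrastive loss. Everything here is hypothetical and not taken from the LLM2CLIP code: `frozen_text_encoder` is a deterministic stand-in for the CC-LLM, `TextEmbeddingCache` and `clip_loss` are illustrative helpers, and the "vision encoder" is a random placeholder because real training would need an autograd framework.

```python
import math
import random


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm (CLIP compares normalized embeddings)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def frozen_text_encoder(caption: str, dim: int = 4) -> list[float]:
    """Hypothetical stand-in for the frozen CC-LLM: a deterministic
    pseudo-embedding per caption. A real call would be an LLM forward pass."""
    rng = random.Random(caption)  # same caption -> same vector, like frozen weights
    return normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])


class TextEmbeddingCache:
    """Runs the (expensive) text encoder at most once per unique caption."""

    def __init__(self, encoder):
        self.encoder = encoder
        self.store: dict[str, list[float]] = {}
        self.encoder_calls = 0

    def get(self, caption: str) -> list[float]:
        if caption not in self.store:
            self.store[caption] = self.encoder(caption)
            self.encoder_calls += 1
        return self.store[caption]


def clip_loss(img: list[list[float]], txt: list[list[float]], temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss as in CLIP: matching pairs sit on the diagonal."""
    n = len(img)
    logits = [[sum(a * b for a, b in zip(img[i], txt[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy_diag(rows: list[list[float]]) -> float:
        # Mean over rows of -log softmax(row)[i], computed with log-sum-exp.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            lse = m + math.log(sum(math.exp(x - m) for x in row))
            total += lse - row[i]
        return total / len(rows)

    transposed = [list(col) for col in zip(*logits)]
    # Average the image->text and text->image directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


captions = ["a dog on grass", "a red car", "two cats sleeping", "a city at night"]
cache = TextEmbeddingCache(frozen_text_encoder)

for epoch in range(3):
    txt = [cache.get(c) for c in captions]  # cache hits after epoch 0
    # Placeholder for the trainable vision encoder's outputs (random here;
    # a real setup would backpropagate clip_loss into this encoder only).
    rng = random.Random(epoch)
    img = [normalize([rng.gauss(0.0, 1.0) for _ in range(4)]) for _ in captions]
    loss = clip_loss(img, txt)
    print(f"epoch {epoch}: loss={loss:.3f}")

print(cache.encoder_calls)  # -> 4: one text-encoder pass per caption, not per epoch
```

If this assumption holds, the LLM compute is paid once per caption, and the remaining cost is storing the embeddings plus the longer training a randomly initialized vision encoder would presumably need.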
