A lightweight deep learning model with a web application that answers image-based questions using a non-generative approach, built for the VizWiz Grand Challenge 2023. It carefully curates the answer vocabulary and adds a linear layer on top of OpenAI's CLIP model, which serves as the image and text encoder.
machine-learning deep-learning vqa clip text-encoding image-and-text visual-question-answering vqa-dataset image-encoding vizwiz clip-model vizwiz-vqa visual-question-anwsering open-ai-clip vqa-2023
Updated Jun 27, 2023 - Jupyter Notebook
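
The description mentions a curated answer vocabulary and a non-generative (classification-style) answer step. The sketch below shows one way those two ideas could fit together; it is not taken from the repository. It assumes each question comes with several crowd-sourced answers, that one answer per question is picked by majority vote, and that the linear layer yields one score per vocabulary entry. The names `build_answer_vocab`, `predict`, and `min_count` are hypothetical.

```python
from collections import Counter


def build_answer_vocab(samples, min_count=1):
    """Build an answer-to-index vocabulary (hypothetical helper).

    samples: list of answer lists, one list of crowd answers per question.
    min_count: assumed threshold; chosen answers seen fewer times are dropped.
    """
    chosen = []
    for answers in samples:
        # Normalize case and whitespace so "Red" and "red " count as one answer.
        normalized = [a.strip().lower() for a in answers]
        # Majority vote; ties go to the answer that appeared first.
        chosen.append(Counter(normalized).most_common(1)[0][0])
    counts = Counter(chosen)
    # Sort for a deterministic class order.
    kept = sorted(a for a, c in counts.items() if c >= min_count)
    vocab = {answer: idx for idx, answer in enumerate(kept)}
    return vocab, chosen


def predict(scores, vocab):
    """Return the vocabulary answer with the highest score.

    scores: one number per vocabulary entry, standing in for the output
    of a linear layer over the encoded image and question (assumption).
    """
    idx_to_answer = {idx: answer for answer, idx in vocab.items()}
    best = max(range(len(scores)), key=scores.__getitem__)
    return idx_to_answer[best]


if __name__ == "__main__":
    samples = [
        ["Red", "red ", "blue"],
        ["unanswerable", "unanswerable"],
        ["red", "dog", "dog"],
    ]
    vocab, chosen = build_answer_vocab(samples)
    print(vocab)
    print(predict([0.1, 2.0, -1.0], vocab))
```

Because the model only ever selects an index from a fixed vocabulary, the answer step reduces to an argmax over scores, which is what keeps this kind of approach lightweight compared with generating text.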