Stars
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Code for the paper "CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification", Computers in Biology and Medicine, 2025.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks, for both inference and training.
Academic junk from my undergraduate years