This repository is the PyTorch implementation of the paper *A Deep Multi-modal Explanation Model for Zero-shot Learning*, published in IEEE Transactions on Image Processing (TIP), 2020. It provides the data, source code, and Grad-CAM visualizations.
- PyTorch
- Python
Download the data from here.
Note that we extract new visual features with ResNet-101 instead of reusing the features from previous works. For each image, we extract a single visual feature without any data augmentation such as cropping or flipping, because augmentation would break the spatial alignment of the visual explanations generated later.
- Run `DME.py` to train the visual-semantic embedding module.
- Run `DME_joint.py` to train the textual explanation module.
- Run `.\Grad-CAM\gradcam_resnet101.py` to generate the visual explanation.
This repo is based on the codebase of f-CLSWGAN.
More instructions will be provided later.
