This repository contains the code for the paper "COBRA: Contrastive Bi-Modal Representation Algorithm" (arXiv) by Vishaal Udandarao, Abhishek Maiti, Deepak Srivatsav, Suryatej Reddy, Yifang Yin and Rajiv Ratn Shah.
We present COBRA, a novel framework that trains two modalities (image and text) jointly, inspired by the Contrastive Predictive Coding (CPC) and Noise Contrastive Estimation (NCE) paradigms, preserving both inter- and intra-class relationships. We empirically show that this framework reduces the modality gap significantly and generates a robust, task-agnostic joint embedding space. We outperform existing work on four diverse downstream tasks spanning seven benchmark cross-modal datasets.
A visualisation of the loss function:
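For intuition, here is a minimal PyTorch sketch of an NCE-style cross-modal contrastive objective of the kind COBRA builds on. This is an illustrative approximation, not the paper's exact loss: COBRA's formulation also involves anchor points and a configurable number of negative samples (see the `num_anchors` and `num_negative_samples` options below), which this sketch replaces with simple in-batch negatives.

```python
import torch
import torch.nn.functional as F

def cross_modal_nce_loss(img_emb, txt_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss over a batch of paired image/text
    embeddings, each of shape (batch_size, embed_dim). Matched pairs on
    the diagonal are positives; all other in-batch pairs act as negatives."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature                 # (batch, batch) cosine similarities
    targets = torch.arange(img.size(0), device=img.device)
    loss_i2t = F.cross_entropy(logits, targets)          # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)      # text -> image direction
    return 0.5 * (loss_i2t + loss_t2i)
```

Minimizing a loss of this form pulls each matched image/text pair together in the joint embedding space while pushing mismatched pairs apart, which is what reduces the modality gap.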
The seven datasets used to empirically validate our results are:
- PKU-XMedia
- MS-COCO
- NUS-Wide 10k
- Wikipedia
- FakeNewsNet
- MeTooMA
- CrisisMMD
The code has been tested on Python 3.6.8 and PyTorch 1.5.1.
- Install all the dependencies using the following command: `pip install -r requirements.txt`
- Create a folder `features` to save the trained models.
- To train COBRA, use the following command: `python main.py`
- To switch between the NCE contrastive loss and the softmax contrastive loss, change the `use_nce` flag. To change the number of anchor points and the number of negative samples, modify `num_anchors` and `num_negative_samples` respectively (see the illustrative snippet below).
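The README does not specify whether these are command-line options or in-file variables, so the snippet below is only a hedged illustration that assumes they are module-level configuration values in `main.py`; the names come from this README, and the values are purely illustrative.

```python
# Hypothetical configuration block; locate the actual definitions in
# main.py before editing. The values below are illustrative only.
use_nce = True              # True: NCE contrastive loss; False: softmax contrastive loss
num_anchors = 10            # number of anchor points used by the loss
num_negative_samples = 15   # number of negative samples drawn per positive
```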
In case of any queries, please open an issue. We will respond as soon as possible.