Pythia-VQA

Pythia is a modular framework for vision-and-language multimodal research from Facebook AI Research (FAIR). This repo builds on it to run several baselines for visual question answering (VQA).

Quickstart

In this quickstart, we train the LoRRA model on TextVQA. To train other models in Pythia, follow the instructions at the bottom.

Installation

Clone the Pythia repository:

git clone https://github.com/facebookresearch/pythia ~/pythia

Install the dependencies and set up Pythia:

cd ~/pythia
python setup.py develop

.. note::

  1. If you face any issues with the setup, check the Troubleshooting/FAQs section below.
  2. You can also create and activate your own conda environment before running
     the above commands.

Getting Data

Datasets currently supported in Pythia require two kinds of data: features and an ImDB. Features are pre-extracted object features produced by an object detector. The ImDB is the image database for a dataset; for TextVQA, it contains information such as the questions and answers.

For TextVQA, we need to download the features for the OpenImages images it includes, along with the TextVQA 0.5 ImDB. We assume that all data is kept inside the data folder under the Pythia root folder. The table at the bottom lists the features and ImDB links for each dataset supported in Pythia.

cd ~/pythia;
# Create data folder
mkdir -p data && cd data;

# Download and extract the features
wget https://dl.fbaipublicfiles.com/pythia/features/open_images.tar.gz
tar xf open_images.tar.gz

# Get vocabularies
wget https://dl.fbaipublicfiles.com/pythia/data/vocab.tar.gz
tar xf vocab.tar.gz

# Download detectron weights required by some models
wget https://dl.fbaipublicfiles.com/pythia/data/detectron_weights.tar.gz
tar xf detectron_weights.tar.gz

# Download and extract ImDB
mkdir -p imdb && cd imdb
wget https://dl.fbaipublicfiles.com/pythia/data/imdb/textvqa_0.5.tar.gz
tar xf textvqa_0.5.tar.gz
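The download-and-extract steps above can also be scripted. The following is a minimal Python sketch, not part of Pythia: the helper name `fetch_and_extract` is hypothetical, and it assumes each URL points to a tar archive that Python's `tarfile` module can open.

```python
import tarfile
import urllib.request
from pathlib import Path


def fetch_and_extract(url, dest_dir):
    """Download a tar archive into dest_dir and unpack it there.

    Hypothetical helper (not part of Pythia): it mirrors `wget <url>`
    followed by `tar xf <archive>` run from inside dest_dir.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    archive = dest / url.rsplit("/", 1)[-1]  # keep the remote file name
    if not archive.exists():  # skip the download if the archive is already there
        urllib.request.urlretrieve(url, str(archive))
    with tarfile.open(archive) as tar:  # compression is detected automatically
        tar.extractall(dest)
    return archive
```

For example, passing the TextVQA 0.5 ImDB URL together with `Path.home() / "pythia" / "data" / "imdb"` would reproduce the last three shell commands. `extractall` trusts the archive contents, so this sketch should only be used with archives from a source you trust.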

Training

Once we have the data in place, we can start training by running the following command:

cd ~/pythia;
python tools/run.py --tasks vqa --datasets textvqa --model lorra --config \
configs/vqa/textvqa/lorra.yml

Inference

To run inference or generate predictions for EvalAI, download the corresponding pretrained model and then run the following commands:

cd ~/pythia/data
mkdir -p models && cd models;
wget https://dl.fbaipublicfiles.com/pythia/pretrained_models/textvqa/lorra_best.pth
cd ../..
python tools/run.py --tasks vqa --datasets textvqa --model lorra --config \
configs/vqa/textvqa/lorra.yml --resume_file data/models/lorra_best.pth \
--evalai_inference 1 --run_type inference

To run inference on the val set, use --run_type val and keep the rest of the arguments the same. See the pretrained models section for more details.
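The training and inference commands differ only in a few trailing flags, which a small wrapper can make explicit. This is a sketch under stated assumptions: `build_run_command` is a hypothetical helper, it uses only flags that appear in the commands above, and the config path pattern `configs/<task>/<dataset>/<model>.yml` is generalized from `configs/vqa/textvqa/lorra.yml` and may not hold for every model.

```python
import shlex


def build_run_command(dataset="textvqa", model="lorra", task="vqa",
                      run_type=None, resume_file=None, evalai_inference=False):
    """Assemble the argument list for tools/run.py (hypothetical helper).

    Only flags shown in this README are used. The config path pattern is an
    assumption generalized from configs/vqa/textvqa/lorra.yml.
    """
    cmd = [
        "python", "tools/run.py",
        "--tasks", task,
        "--datasets", dataset,
        "--model", model,
        "--config", f"configs/{task}/{dataset}/{model}.yml",
    ]
    if resume_file is not None:
        cmd += ["--resume_file", resume_file]
    if evalai_inference:
        cmd += ["--evalai_inference", "1"]
    if run_type is not None:
        cmd += ["--run_type", run_type]
    return cmd


# Training: no extra flags.
train_cmd = build_run_command()
# Val-set inference: same arguments as the inference command, with run_type "val".
val_cmd = build_run_command(run_type="val",
                            resume_file="data/models/lorra_best.pth",
                            evalai_inference=True)
print(shlex.join(train_cmd))
print(shlex.join(val_cmd))
```

The returned list can be passed to `subprocess.run(cmd, cwd=...)` with the Pythia root folder as the working directory, since the paths are relative to it.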

These commands should be enough to get you started with training and performing inference using Pythia.

Troubleshooting/FAQs

If setup.py causes any issues, install fastText directly from source first and then run python setup.py develop again. To install fastText, run the following commands:

git clone https://github.com/facebookresearch/fastText.git
cd fastText
pip install -e .
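To confirm that the fastText install worked before rerunning setup.py, one option is to check whether its Python module can be found. This is a small sketch, not a Pythia utility: `is_installed` is a hypothetical helper, and `fasttext` is the import name of the fastText Python bindings.

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


# "fasttext" is the import name of the fastText Python bindings.
if not is_installed("fasttext"):
    print("fastText not found: install it from source, then rerun setup.py")
```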

Tasks and Datasets


+--------------+---------------+------------+----------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+------------------------------------+----------------------------+
| Dataset      | Key           | Task       | ImDB Link                                                                              | Features Link                                                                   | Features checksum                  | Notes                      |
+--------------+---------------+------------+----------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+------------------------------------+----------------------------+
| TextVQA      | textvqa       | vqa        | `TextVQA 0.5 ImDB`_                                                                    | `OpenImages`_                                                                   | `b22e80997b2580edaf08d7e3a896e324` |                            |
+--------------+---------------+------------+----------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+------------------------------------+----------------------------+
| VQA 2.0      | vqa2          | vqa        | `VQA 2.0 ImDB`_                                                                        | `COCO`_                                                                         | `ab7947b04f3063c774b87dfbf4d0e981` |                            |
+--------------+---------------+------------+----------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+------------------------------------+----------------------------+
| VizWiz       | vizwiz        | vqa        | `VizWiz ImDB`_                                                                         | `VizWiz`_                                                                       | `9a28d6a9892dda8519d03fba52fb899f` |                            |
+--------------+---------------+------------+----------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+------------------------------------+----------------------------+
| VisualDialog | visdial       | dialog     | Coming soon!                                                                           | Coming soon!                                                                    | Coming soon!                       |                            |
+--------------+---------------+------------+----------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+------------------------------------+----------------------------+
| VisualGenome | visual_genome | vqa        | Automatically downloaded                                                               | Automatically downloaded                                                        | Coming soon!                       | Also supports scene graphs |
+--------------+---------------+------------+----------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+------------------------------------+----------------------------+
| CLEVR        | clevr         | vqa        | Automatically downloaded                                                               | Automatically downloaded                                                        |                                    |                            |
+--------------+---------------+------------+----------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+------------------------------------+----------------------------+
| MS COCO      | coco          | captioning | `COCO Caption`_                                                                        | `COCO`_                                                                         | `ab7947b04f3063c774b87dfbf4d0e981` |                            |
+--------------+---------------+------------+----------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+------------------------------------+----------------------------+

.. _TextVQA 0.5 ImDB: https://dl.fbaipublicfiles.com/pythia/data/imdb/textvqa_0.5.tar.gz
.. _OpenImages: https://dl.fbaipublicfiles.com/pythia/features/open_images.tar.gz
.. _COCO: https://dl.fbaipublicfiles.com/pythia/features/coco.tar.gz
.. _VQA 2.0 ImDB: https://dl.fbaipublicfiles.com/pythia/data/imdb/vqa.tar.gz
.. _VizWiz: https://dl.fbaipublicfiles.com/pythia/features/vizwiz.tar.gz
.. _VizWiz ImDB: https://dl.fbaipublicfiles.com/pythia/data/imdb/vizwiz.tar.gz
.. _COCO Caption: https://dl.fbaipublicfiles.com/pythia/data/imdb/coco_captions.tar.gz
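For scripted downloads, the table can be expressed as a lookup from dataset key to archive links. This is only an illustrative sketch: the names `DATASET_LINKS` and `links_for` are hypothetical, the URLs are copied from the link targets above, and datasets marked "Coming soon!" or "Automatically downloaded" are left out.

```python
BASE = "https://dl.fbaipublicfiles.com/pythia"

# Dataset key -> (ImDB archive URL, features archive URL), copied from the table.
DATASET_LINKS = {
    "textvqa": (f"{BASE}/data/imdb/textvqa_0.5.tar.gz",
                f"{BASE}/features/open_images.tar.gz"),
    "vqa2": (f"{BASE}/data/imdb/vqa.tar.gz",
             f"{BASE}/features/coco.tar.gz"),
    "vizwiz": (f"{BASE}/data/imdb/vizwiz.tar.gz",
               f"{BASE}/features/vizwiz.tar.gz"),
    "coco": (f"{BASE}/data/imdb/coco_captions.tar.gz",
             f"{BASE}/features/coco.tar.gz"),
}


def links_for(key):
    """Return (imdb_url, features_url) for a dataset key from the table."""
    try:
        return DATASET_LINKS[key]
    except KeyError:
        raise ValueError(f"no manual download links listed for {key!r}") from None
```

Note that VQA 2.0 and MS COCO captioning share the same COCO features archive, so it only needs to be downloaded once.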

After downloading the features, verify the download by checking its md5sum against the checksum in the table:

echo "<checksum>  <dataset_name>.tar.gz" | md5sum -c -
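On systems without the md5sum command, the same check can be done in Python. This is a minimal sketch using the standard `hashlib` module; the function names `md5_of_file` and `verify_download` are hypothetical and not part of Pythia.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that large feature archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches the checksum from the table."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `verify_download("open_images.tar.gz", "b22e80997b2580edaf08d7e3a896e324")` checks the OpenImages features archive against the checksum listed for TextVQA.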

Demo

Pythia VQA