A PyTorch implementation of the CVPR 2020 paper: Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text
Website for the TextVQA dataset.
#Computer Science# A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps [AAAI 2021]
✨✨ Latest Research on Multimodal Large Language Models on Scene-Text VQA Tasks
MLCI model for TextVQA
STVQA and TextVQA OCR results from Amazon Text in Image pipeline