#Large Language Models# The Cradle framework is a first attempt at General Computer Control (GCC). Cradle helps agents ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, ...
#Natural Language Processing# awesome grounding: A curated list of research papers in visual grounding
#Large Language Models# [ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
#Natural Language Processing# CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
#Natural Language Processing# CLIPort: What and Where Pathways for Robotic Manipulation
Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"
#Large Language Models# PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
We perform functional grounding of LLMs' knowledge in BabyAI-Text
[TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.
[TPAMI reviewing] Towards Visual Grounding: A Survey
#Large Language Models# Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)
#Natural Language Processing# Hierarchical Universal Language Conditioned Policies
#Natural Language Processing# [CVPR20] Video Object Grounding using Semantic Roles in Language Description (https://arxiv.org/abs/2003.10606)
#Natural Language Processing# [CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
[Paper][AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning
#Natural Language Processing# [ICRA2023] Grounding Language with Visual Affordances over Unstructured Data
[ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding.
#Computer Science# This is the official implementation for our paper, "LAR: Look Around and Refer".
Code for CVPR'18 "Grounding Referring Expressions in Images by Variational Context"