grounding · GitHub Topics

Agent S: an open agentic framework that uses computers like a human

agent-computer-interface ai-agents computer-automation gui-agents memory mllm planning retrieval-augmented-generation in-context-reinforcement-learning computer-use grounding

Python 2.2 k

1 day ago

BAAI-Agents / Cradle

#LLM#The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to ace any computer task by providing strong reasoning abilities, self-improvement, and skill curation, ...

ai-agent ai-agents-framework computer-control cradle gcc generative-ai grounding large-language-models llm lmm multimodality vision-language-model vlm artificial-intelligence

Python 2.07 k

5 months ago

TheShadow29 / awesome-grounding

#NLP#awesome grounding: A curated list of research papers in visual grounding

computer-vision nlp grounding Awesome Lists papers arxiv video-understanding captioning-videos embodied-agent multimodal-deep-learning language-grounding Bukkit

1.07 k

2 years ago

FoundationVision / Groma

#LLM#[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

grounding llm mllm large-language-models foundation-models llama llama2 multimodal vision-language-model

Python 556

10 months ago

mees / calvin

#NLP#CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

nlp Robotics deep-learning grounding vision-language manipulation computer-vision PyTorch vision vision-and-language

Python 532

2 months ago

cliport / cliport

#NLP#CLIPort: What and Where Pathways for Robotic Manipulation

clip Robotics vision deep-learning nlp grounding vision-language manipulation PyTorch rearrangement computer-vision

Jupyter Notebook 487

1 year ago

allenai / lumos

Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"

decision-making grounding maths planning question-answering reasoning web-agent

Python 463

1 year ago

mbzuai-oryx / Video-LLaVA

#LLM#PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

llm lmm Video grounding transcription

Python 257

1 year ago

flowersteam / Grounding_LLMs_with_online_RL

We perform functional grounding of LLMs' knowledge in BabyAI-Text

grounding language-model reinforcement-learning

Python 254

8 months ago

linhuixiao / Awesome-Visual-Grounding

[TPAMI reviewing] Towards Visual Grounding: A Survey

grounding Awesome Lists survey

Shell 136

23 days ago

linhuixiao / CLIP-VG

[TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.

grounding clip

Jupyter Notebook 119

3 months ago

TIGER-AI-Lab / StructLM

#LLM#Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)

grounding llm reasoning

Python 76

6 months ago

TheShadow29 / zsgnet-pytorch

#NLP#Official implementation of the ICCV 2019 oral paper "Zero-Shot Grounding of Objects from Natural Language Queries" (https://arxiv.org/abs/1908.07129)

grounding vision nlp objects

Python 71

5 years ago

lukashermann / hulc

#NLP#Hierarchical Universal Language Conditioned Policies

computer-vision deep-learning grounding manipulation nlp PyTorch Robotics vision vision-and-language vision-language

Python 71

1 year ago

TheShadow29 / vognet-pytorch

#NLP#[CVPR20] Video Object Grounding using Semantic Roles in Language Description (https://arxiv.org/abs/2003.10606)

grounding Video pytorch-implementation vision vision-and-language nlp captioning-videos

Python 67

5 years ago

TheShadow29 / VidSitu

#NLP#[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)

vision vision-and-language grounding nlp Video srl captioning-videos captioning

Python 59

4 years ago

zjukg / DUET

[Paper][AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

pretrained-language-model PyTorch transformer zero-shot-learning cross-modal grounding semantic

Python 50

1 year ago

linhuixiao / HiVG

[ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding.

clip grounding

Python 45

4 days ago

mees / hulc2

#NLP#[ICRA2023] Grounding Language with Visual Affordances over Unstructured Data

computer-vision deep-learning grounding manipulation nlp PyTorch Robotics vision vision-and-language vision-language

Python 42

1 year ago

yuleiniu / vc

Code for CVPR'18 "Grounding Referring Expressions in Images by Variational Context"

cvpr2018 Tensorflow grounding

Python 30

7 years ago