Search results for "gpt-4v"

LLaVA

@haotian-liu

#Large Language Model# LLaVA is a large language-and-vision assistant with GPT-4V-level capabilities

gpt-4 chatbot ChatGPT llama multimodal

Python · 22.99k

1 year ago

vimGPT

@ishan0102

Browse the web with GPT-4V and Vimium

Python · 2.67k

9 months ago

GPT-4V-Act

@ddupont808

AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI

JavaScript · 1.05k

7 months ago

SoM

Microsoft (@microsoft)

[arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs

Python · 1.42k

1 year ago

gpt4v-emotion

@zeroQiaoba

GPT-4V with Emotion

Python · 93

2 years ago

feishu-openai

@ConnectAI-E

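#Large Language Model# 🎒 Feishu × (GPT-4 + GPT-4V + DALL·E-3 + Whisper) = a work experience that flies 🚀 Voice chat, role-play, multi-topic discussions, image creation, spreadsheet analysis, document export 🚀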

ChatGPT feishu-bot Go openai ChatGPT API

Go · 5.58k

4 months ago

Medical-Help-App-using-GPT-4V

@AIAnytime

Medical Help App using GPT-4V

Python · 25

2 years ago

RLAIF-V

@RLHF-V

[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

chatbot gpt-4v multimodal

Python · 385

2 months ago

GPT-4V_OCR

@SCUT-DLVCLab

Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)

Python · 125

2 years ago

GPT4V-AD-Exploration

@PJLab-ADG

On the Road with GPT-4V(ision): Explorations of Utilizing Visual-Language Model as Autonomous Driving Agent

296

1 year ago

MM-Navigator

@zzxslp

GPT-4V in Wonderland: LMMs as Smartphone Agents

gpt4v llm-agents

Python · 133

1 year ago

Awesome-Multimodal-Prompts

@langgptai

#Awesome# Prompts for GPT-4V & DALL-E3 to fully utilize their multimodal abilities. GPT4V Prompts, DALL-E3 Prompts.

ChatGPT gpt4 multimodal prompt-engineering prompts

256

2 years ago

Q-Bench

@Q-Future

①[ICLR2024 Spotlight] (GPT-4V/Gemini-Pro/Qwen-VL-Plus+16 OS MLLMs) A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment.

image-quality-assessment large-language-models low-level-vision

Jupyter Notebook · 269

1 year ago

OmniParser

Microsoft (@microsoft)

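OmniParser is a screen-parsing tool that converts user screenshots into structured, easy-to-understand elements, significantly enhancing GPT-4V's recognition capabilities.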

Jupyter Notebook · 22.58k

3 months ago

SeeAct

@OSU-NLP-Group

[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).

agent

Python · 759

5 months ago

HallusionBench

@tianyi-lab

#Large Language Model# [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

benchmark vlms gpt-4 gpt-4v llava

Python · 287

8 months ago