Browse the web with GPT-4V and Vimium
An AI agent that uses GPT-4V(ision) to interact with web UIs through the mouse and keyboard
[arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs
GPT-4V with Emotion
#LLM#🎒 Feishu × (GPT-4 + GPT-4V + DALL·E-3 + Whisper) = a work experience that flies 🚀 Voice chat, role-play, multi-topic discussions, image creation, spreadsheet analysis, document export 🚀
Medical Help App using GPT-4V
[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)
On the Road with GPT-4V(ision): Explorations of Utilizing Visual-Language Model as Autonomous Driving Agent
#Awesome# Prompts for GPT-4V & DALL-E3 to fully utilize their multimodal abilities. GPT4V Prompts, DALL-E3 Prompts.
①[ICLR2024 Spotlight] (GPT-4V/Gemini-Pro/Qwen-VL-Plus+16 OS MLLMs) A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment.
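OmniParser is a screen-parsing tool that converts screenshots of a user's screen into structured, easy-to-understand elements, significantly improving GPT-4V's recognition ability.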
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
#LLM#[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
#LLM#OpenAI ChatGPT, GPT-3, GPT-4, DALL·E, Whisper API wrapper for Go
#LLM#OpenAI ChatGPT/GPT-4/GPT-3 SDK Go client for interacting with the GPT-4/GPT-3 APIs.
#LLM#Code and models for the ICML 2024 paper NExT-GPT: Any-to-Any Multimodal Large Language Model
Plugins for Auto-GPT