Browse the web with GPT-4V and Vimium
AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI
Set-of-Mark Prompting for GPT-4V and LMMs
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
GPT-4V with Emotion
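#Large Language Models# 🎒 Feishu × (GPT-4 + GPT-4V + DALL·E-3 + Whisper) = a lightning-fast work experience 🚀 Voice conversations, role-play, multi-topic discussions, image creation, table analysis, document export 🚀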
Medical Help App using GPT-4V
Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)
On the Road with GPT-4V(ision): Explorations of Utilizing Visual-Language Model as Autonomous Driving Agent
Prompts for GPT-4V & DALL-E3 that fully utilize their multimodal abilities. GPT4V Prompts, DALL-E3 Prompts.
#Large Language Models# Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
#Large Language Models# OpenAI ChatGPT, GPT-3, GPT-4, DALL·E, Whisper API wrapper for Go
Platform to experiment with the AI Software Engineer. Terminal based. NOTE: Very different from https://gptengineer.app
Multimodal-GPT
#Large Language Models# OpenAI ChatGPT/GPT-4/GPT-3 SDK: Go client for interacting with the GPT-4/GPT-3 APIs.
JavaScript BPE Encoder Decoder for GPT-2 / GPT-3
Plugins for Auto-GPT
GPT RStudio addins that enable GPT-assisted coding, writing & analysis
#Large Language Models# Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model