#LLM#AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
#Android#Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
#LLM#Control Any Computer Using LLMs.
Vision utilities for web interaction agents 👀
LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports understanding of images, high-resolution images, and videos.
#Awesome# Prompts for GPT-4V & DALL-E3 to fully utilize their multimodal abilities. GPT-4V Prompts, DALL-E3 Prompts.
#LLM#[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
Draw your projects to life
Convert different model APIs into the OpenAI API format out of the box.
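Tools in this category typically work by exposing an OpenAI-compatible endpoint, so the official SDK only needs a different base_url. The sketch below assumes a hypothetical local proxy at localhost:8000 and an arbitrary upstream model name; none of these values come from the repository itself.

```python
# Minimal sketch: once a proxy exposes an OpenAI-compatible endpoint,
# the official SDK only needs a different base_url. The URL, key, and
# model name below are placeholders, not values from this repository.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical local proxy address
    api_key="sk-placeholder",             # many proxies accept any key
)

response = client.chat.completions.create(
    model="claude-3-sonnet",  # upstream model, routed through the proxy
    messages=[{"role": "user", "content": "Summarize RAG in one sentence."}],
)
print(response.choices[0].message.content)
```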
#Computer Science#Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta
The ultimate sketch-to-code app, built with GPT-4o and serving 25k+ users. Choose your desired framework (React, Next, React Native, Flutter) for your app. It will instantly generate code and a preview (sandb...
Early Alpha Release: Chat with Your Image - Leveraging GPT-4 Vision and Function Calls for AI-Powered Image Analysis and Description
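As a rough illustration of the pattern this project describes (a vision input combined with function calling), the hedged sketch below sends an image URL to a vision-capable chat model and offers one tool the model may call. The tool name, schema, model, and image URL are illustrative assumptions, not the project's actual code.

```python
# Minimal sketch, not the repository's implementation: send an image to a
# GPT-4-class vision model and expose one tool the model may choose to call.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "describe_object",  # hypothetical tool name
        "description": "Record an object detected in the image.",
        "parameters": {
            "type": "object",
            "properties": {
                "label": {"type": "string"},
                "confidence": {"type": "number"},
            },
            "required": ["label"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable chat model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this picture?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/cat.jpg"}},  # placeholder image
        ],
    }],
    tools=tools,
)
print(response.choices[0].message)
```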
Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine
Video Voiceover with gpt-4o-mini
Mark web pages for use with vision-language models
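One common way to mark pages for a vision-language model is Set-of-Marks-style labeling: overlay a numbered tag on each interactive element, screenshot the page, and let the model refer to elements by index. The sketch below is a minimal Playwright-based approximation under that assumption; the selector list, styling, and URL are illustrative and not taken from this repository.

```python
# Minimal sketch, assuming a Playwright-driven browser: overlay numbered
# tags on interactive elements so a vision-language model can refer to
# them by index in the screenshot.
from playwright.sync_api import sync_playwright

LABEL_JS = """
(el, i) => {
  const r = el.getBoundingClientRect();
  const tag = document.createElement('div');
  tag.textContent = String(i);
  tag.style.cssText = 'position:fixed;left:' + r.left + 'px;top:' + r.top + 'px;' +
    'background:red;color:white;font-size:12px;z-index:99999;padding:1px 3px;';
  document.body.appendChild(tag);
}
"""

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")  # placeholder URL
    elements = page.query_selector_all("a, button, input, select, textarea")
    for i, el in enumerate(elements):
        el.evaluate(LABEL_JS, i)          # draw the numbered tag over the element
    page.screenshot(path="marked.png")    # screenshot is what the VLM sees
    browser.close()
```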
#LLM#This repository offers a Python framework for a retrieval-augmented generation (RAG) pipeline using text and images from MHTML documents, leveraging Azure AI and OpenAI services. It includes ingestion...
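Because MHTML is a MIME multipart archive, its HTML text and embedded images can be separated with the standard-library email parser before any embedding or indexing happens. The sketch below shows only that ingestion step under that assumption; the filename is a placeholder, and the Azure AI / OpenAI indexing stages are omitted.

```python
# Minimal ingestion sketch (not the repository's pipeline): an MHTML file
# is a MIME multipart archive, so the standard-library email parser can
# split it into HTML text and embedded images before indexing.
import email
from email import policy

def load_mhtml_parts(path: str):
    """Yield (content_type, payload_bytes) for each leaf part of an MHTML file."""
    with open(path, "rb") as f:
        msg = email.message_from_binary_file(f, policy=policy.default)
    for part in msg.walk():
        if part.is_multipart():
            continue
        yield part.get_content_type(), part.get_payload(decode=True)

# Example: collect HTML for text chunking and images for a vision model.
texts, images = [], []
for ctype, payload in load_mhtml_parts("page.mhtml"):  # placeholder filename
    if ctype == "text/html":
        texts.append(payload.decode("utf-8", errors="replace"))
    elif ctype.startswith("image/"):
        images.append(payload)
```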
#LLM#Language instructions to mycobot using GPT-4V