Web Scraping with GPT-4 Vision API and Puppeteer
GPT4V-level open-source multi-modal model based on Llama3-8B
#Large Language Models# AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
AI agent that can SEE 👁️, control, navigate, & do stuff for you on your browser.
On the Road with GPT-4V(ision): Explorations of Utilizing Visual-Language Model as Autonomous Driving Agent
Official Code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation
Prompts for GPT-4V & DALL-E3 to fully utilize their multi-modal abilities. GPT4V Prompts, DALL-E3 Prompts.