Accelerating the development of large multimodal models (LMMs) with lmms-eval, a one-click evaluation module.
A video and audio player with replaceable UI components.
Set-of-Mark Prompting for GPT-4V and LMMs
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Aligning LMMs with Factually Augmented RLHF
Tutorial files to accompany the Sorensen, Hohenstein, and Vasishth paper: http://www.ling.uni-potsdam.de/~vasishth/statistics/BayesLMMs.html
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, qwen-vl, qwen2-vl, phi3-v etc.
A faster LMM for GWAS, with GPU backend support.
【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".
Workshop 7 - General and generalized linear mixed models (LMM and GLMM)