#大语言模型#Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
#自然语言处理#Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
a state-of-the-art-level open visual language model | 多模态预训练模型
The simplest, fastest repository for training/finetuning small-sized VLMs.
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
#大语言模型#Solve Visual Understanding with Reinforced VLMs
#计算机科学#Collection of AWESOME vision-language models for vision tasks
Inference Microsoft Florence2 VLM
Witness the aha moment of VLM with less than $3.
#前端开发#Flame is an open-source multimodal AI system designed to translate UI design mockups into high-quality React code. It leverages vision-language modeling, automated data synthesis, and structured train...
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
#大语言模型#MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
#大语言模型#🚀 「大模型」1小时从0训练26M参数的视觉多模态VLM!🌏 Train a 26M-parameter VLM from scratch in just 1 hours!
Easily fine-tune, evaluate and deploy Qwen3, DeepSeek-R1, Llama 4 or any open source LLM / VLM!
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
#大语言模型#Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR...
#数据仓库#🚀🚀🚀A collection of some awesome public projects about Large Language Model(LLM), Vision Language Model(VLM), Vision Language Action(VLA), AI Generated Content(AIGC), the related Datasets and Applic...
JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.
#大语言模型#VLM driven tool that processes surveillance videos, extracts frames, and generates insightful annotations using a fine-tuned Florence-2 Vision-Language Model. Includes a Gradio-based interface for que...
LLM Agent Framework in ComfyUI includes MCP sever, Omost,GPT-sovits, ChatTTS,GOT-OCR2.0, and FLUX prompt nodes,access to Feishu,discord,and adapts to all llms with similar openai / aisuite interfaces,...