qwen2-vl · GitHub Topics

#Large Language Models# Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen2.5, Llama4, InternLM3, GLM4, Mistral, Yi1.5, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL2.5,...

large-language-model lora llama sft deploy multimodal peft internvl liger qwen2-vl qwen2-5 distill rft deepseek-r1 embedding grpo open-r1 llama4

Python 6.9 k

12 hours ago

langmanus / langmanus

#Large Language Models# A community-driven AI automation framework that builds upon the incredible work of the open source community. Our goal is to combine language models with specialized tools for tasks like web search, c...

agi automation deep-research langchain langgraph large-language-model qwen qwen2-vl agent agents artificial-intelligence multi-agent multi-agent-systems deepseek deepseek-r1

Python 5.13 k

18 days ago

roboflow / maestro

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

captioning fine-tuning florence-2 multimodal objectdetection paligemma phi-3-vision transformers vision-and-language vqa qwen2-vl

Python 2.54 k

5 days ago

2U1 / Qwen2-VL-Finetune

An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.

chatbot multimodal qwen2-vl vision-language vision-language-model qwen2-5

Python 616

10 days ago

PaddlePaddle / PaddleMIX

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high per...

aigc stable-diffusion clip image-to-text text-to-image controlnet multimodal text-to-video dit llava sora qwen2-vl minicpm-v

Python 614

3 days ago

NetEase-Media / grps_trtllm

#Large Language Models# Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GRPS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function call, AI agents, d...

large-language-model openai tensorrt-llm chatglm llama3 qwen2 function-call ai-agent llama-index multi-modal deepseek-r1 phi qwq qwen2-vl minicpm-v

Python 128

19 days ago

lucasjinreal / Crane

A pure Rust-based LLM inference engine (supporting any LLM-based MLLM, such as Spark-TTS), powered by the Candle framework.

llama-cpp mllm qwen2-vl Rust

Rust 93

18 days ago

drive-bench / toolkit

#Large Language Models# Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

autonomous-driving ChatGPT internvl qwen2-vl

Python 64

2 months ago

arcstep / illufly

#Large Language Models# ✨🦋 illufly is a self-evolving Agent framework: create value quickly through self-evolution

agent artificial-intelligence glm-4 gpt large-language-model multiagent openai qwen qwen2 qwen2-vl rag growth

Python 60

3 days ago

soulteary / dify-with-qwen-vl

Video understanding: Qwen multimodal video model & Dify

dify qwen2 qwen2-vl

Python 47

7 months ago

fireicewolf / wd-llm-caption-cli

A Python-based CLI tool for captioning images with WD series, Joy-caption-pre-alpha, Meta Llama 3.2 Vision Instruct, and Qwen2 VL Instruct models.

qwen2-vl florence-2

Python 34

1 month ago

see2023 / autoXHS

#Web Crawler# An intelligent search assistant based on multimodal large models, enabling smart information retrieval and knowledge integration on the Xiaohongshu platform.

large-language-model qwen2-vl Selenium xiaohongshu spider

Python 12

5 months ago

shaadclt / Qwen2-VL-OCR-VQA

This project demonstrates how to use the Qwen2-VL model from Hugging Face for Optical Character Recognition (OCR) and Visual Question Answering (VQA). The model combines vision and language capabiliti...

optical-character-recognition qwen2-vl visual-question-answering

Jupyter Notebook 11

6 months ago