minicpm-v · GitHub Topics

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 19.2 k

1 个月前

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high per...

aigc stable-diffusion clip image-to-text text-to-image controlnet multimodal text-to-video dit llava sora qwen2-vl minicpm-v

Python 614

3 天前

RLHF-V / RLAIF-V

[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

聊天机器人 gpt-4v multimodal llava minicpm-v

Python 346

1 个月前

NetEase-Media / grps_trtllm

#大语言模型#Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GPRS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function call, AI agents, d...