multi-modality · GitHub Topics

#Large Language Models# LLaVA is a large language-and-vision assistant with GPT-4V-level capabilities

gpt-4 chatbot ChatGPT llama multimodal llava foundation-models instruction-tuning multi-modality visual-language-learning llama-2 llama2 vision-language-model

Python 22.17 k

8 months ago

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

instruction-tuning instruction-following large-vision-language-model visual-instruction-tuning multi-modality in-context-learning large-language-models large-vision-language-models multimodal-chain-of-thought multimodal-in-context-learning multimodal-large-language-models chain-of-thought

14.67 k

2 days ago

jina-ai / clip-as-service

#Computer Science# 🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP

Translated description: maps variable-length sentences to fixed-length vectors using a BERT model

bert sentence-encoding deep-learning clip-model clip-as-service bert-as-service cross-modal-retrieval multi-modality neural-search openai PyTorch onnx cross-modality

Python 12.63 k

1 year ago

kyegomez / swarms

#Large Language Models# The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai

artificial-intelligence attention-mechanism gpt4 langchain machine-learning multi-modal-imaging multi-modality multimodal swarms transformer-models agents prompt-engineering prompt-toolkit prompting tree-of-thoughts ChatGPT gpt4all huggingface langchain-python

Python 4.79 k

6 days ago

lucidrains / deep-daze

#Computer Science# Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun

artificial-intelligence deep-learning transformers siren implicit-neural-representation text-to-image multi-modality

Python 4.37 k

3 years ago

EvolvingLMMs-Lab / Otter

#Large Language Models# 🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

gpt-4 visual-language-learning artificial-inteligence deep-learning foundation-models multi-modality machine-learning ChatGPT instruction-tuning large-scale-models embodied-ai

Python 3.25 k

1 year ago

InternLM / InternLM-XComposer

#Large Language Models# InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

ChatGPT visual-language-learning multi-modality foundation gpt-4 instruction-tuning mllm multimodal vision-language-model language-model large-language-model large-vision-language-model large-language-models vision-transformer gpt

Python 2.81 k

3 months ago

DLR-RM / 3DObjectTracking

Algorithms and Publications on 3D Object Tracking

pose-estimation machine-vision Bukkit cvpr2022 real-time object-tracking multi-modality rgbd tracking

C++ 862

1 year ago

OpenBMB / VisRAG

Parsing-free RAG supported by VLMs

rag retrieval retrieval-augmented-generation vision-language-model multi-modal multi-modality document-retrieval document-understanding

Python 662

2 months ago

OpenGVLab / Multi-Modality-Arena

#Large Language Models# Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP...

chat chatbot ChatGPT gradio large-language-models llms vqa multi-modality vision-language-model

Python 512

1 year ago

kyegomez / Gemini

#Computer Science# The open source implementation of Gemini, the model that will "eclipse ChatGPT" by Google

artificial-intelligence gemini gpt4 machine-learning multi-modality multimodla

Python 447

1 month ago