multimodal · GitHub Topics

#Large Language Model#The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.

rag lmstudio localai vector-database ollama local-llm llama3 large-language-model webui ai-agents crewai multimodal agent-framework-javascript custom-ai-agents deepseek deepseek-r1 mcp mcp-servers

JavaScript 42.71 k

1 day ago

haotian-liu / LLaVA

#Large Language Model#LLaVA is a large language-and-vision model assistant with GPT-4V-level capabilities

gpt-4 chatbot ChatGPT llama multimodal llava foundation-models instruction-tuning multi-modality visual-language-learning llama-2 llama2 vision-language-model

Python 22.17 k

8 months ago

jina-ai / serve

#Computer Science#Jina is a deep-learning-based search framework that supports many data types, such as images, video, long text, and PDFs.

neural-search cloud-native deep-learning machine-learning framework gRPC Kubernetes multimodal mlops pipeline FastAPI generative-ai Docker jaeger llmops OpenTelemetry cncf microservices orchestration prometheus

Python 21.51 k

19 days ago

microsoft / unilm

#Natural Language Processing#Unilm is a large-scale self-supervised pre-trained model spanning tasks, languages, and modalities

natural-language-processing pre-trained-model unilm minilm layoutlm layoutxlm beit document-ai trocr beit-3 foundation-models large-language-model multimodal mllm

Python 21.05 k

1 month ago

deepseek-ai / Janus

#Large Language Model#Janus-Series: Unified Multimodal Understanding and Generation Models

any-to-any foundation-models large-language-model multimodal vision-language-pretraining unified-model

Python 17.09 k

2 months ago

NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Translation - NeMo: a toolkit for conversational AI

machine-translation speaker-recognition asr tts generative-ai multimodal deep-learning neural-networks speaker-diariazation speech-translation speech-synthesis large-language-models

Python 13.62 k

4 hours ago

mediar-ai / screenpipe

#Large Language Model#24/7 AI screen and microphone recording. Build AI apps with full context. Works with Ollama. An alternative to Rewind.ai. Open. Secure. You own your data. Built in Rust.

artificial-intelligence computer-vision large-language-model machine-learning multimodal vision agents agi

TypeScript 13.26 k

1 day ago

rerun-io / rerun

Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.

visualization computer-vision Python Robotics Rust multimodal C++

Rust 8.18 k

21 hours ago

bentoml / BentoML

#Large Language Model#The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

Translation - Model serving made easy

model-serving mlops llmops generative-ai llm-inference deep-learning llm-serving machine-learning Python multimodal ml-engineering large-language-model

Python 7.6 k

16 hours ago

modelscope / ms-swift

#Large Language Model#Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen2.5, Llama4, InternLM3, GLM4, Mistral, Yi1.5, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL2.5,...

large-language-model lora llama sft deploy multimodal peft internvl liger qwen2-vl qwen2-5 distill rft deepseek-r1 embedding grpo open-r1 llama4

Python 6.9 k

9 hours ago

enricoros / big-AGI

#Large Language Model#AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlight...

ChatGPT generative-ai ui chatgpt-ui agi large-language-models stable-diffusion gpt gpt-4 openai openai-api anthropic beam gpt-5 multimodal groq mistral

TypeScript 6.31 k

1 day ago

SkalskiP / courses

#Natural Language Processing#This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)

computer-vision deep-learning deep-neural-networks machine-learning mlops multimodal transformers tutorial natural-language-processing generative-model stable-diffusion

Python 5.96 k

1 year ago

swyxio / ai-notes

notes for software engineers getting up to speed on new AI developments. Serves as datastore for https://latent.space writing, and product brainstorming, but has cleaned up canonical references under ...

artificial-intelligence prompt-engineering stable-diffusion openai gpt gpt-3 multimodal

HTML 5.62 k

2 days ago

TEN-framework / TEN-Agent

#Large Language Model#TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaking...

agent gemini gpt-4 large-language-model multimodal nextjs14 openai realtime voice-assistant C++ Go Python artificial-intelligence gpt-4o rag vision real-time asr low-latency tts

Python 5.59 k

1 day ago

facebookresearch / mmf

#Computer Science#A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

PyTorch vqa pretrained-models multimodal deep-learning captioning dialog textvqa hateful-memes multi-tasking

Python 5.56 k

5 days ago

kyegomez / swarms

#Large Language Model#The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai

artificial-intelligence attention-mechanism gpt4 langchain machine-learning multi-modal-imaging multi-modality multimodal swarms transformer-models agents prompt-engineering prompt-toolkit prompting tree-of-thoughts ChatGPT gpt4all huggingface langchain-python

Python 4.79 k

6 days ago

om-ai-lab / VLM-R1

#Large Language Model#Solve Visual Understanding with Reinforced VLMs

deepseek-r1 grpo large-language-model multimodal vlm qwen reinforcement-learning

Python 4.56 k

2 days ago

kyegomez / tree-of-thoughts

#Large Language Model#Plug in and Play Implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models that Elevates Model Reasoning by at least 70%

artificial-intelligence ChatGPT gpt4 multimodal prompt-engineering deep-learning prompt prompt-learning prompt-tuning

Python 4.48 k

5 months ago