multimodality · GitHub Topics

A simple command-line tool for text-to-image generation, using OpenAI's CLIP and a BigGAN. The technique was originally created by https://twitter.com/advadnoun


artificial-intelligence deep-learning text-to-image Generative Adversarial Network multimodality

Python 2.57 k

3 years ago

BAAI-Agents / Cradle

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle helps agents ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, ...

ai-agent ai-agents-framework computer-control cradle gcc generative-ai grounding large-language-models llm lmm multimodality vision-language-model vlm artificial-intelligence

Python 2.07 k

5 months ago

hymie122 / RAG-Survey

Collecting awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".

aigc rag survey diffusion-models llm multimodality

1.59 k

8 months ago

PreferredAI / cornac

A Comparative Framework for Multimodal Recommender Systems

recommender-system recommendation-algorithms recommendation-engine matrix-factorization collaborative-filtering multimodal-learning recommendation-system multimodality

Python 944

1 month ago

ArrowLuo / CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

multimodal-learning multimodality multimodal search ranking retrieval-model retrieval activitynet clip

Python 933

1 year ago

AIDC-AI / Ovis

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

chatbot llama3 multimodal multimodal-large-language-models multimodality qwen vision-language-model

Python 885

19 days ago

fnzhan / Generative-AI

[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era

aigc diffusion-model gans multimodality

TeX 754

1 year ago

aimclub / FEDOT

Automated modeling and machine learning framework FEDOT

automl machine-learning evolutionary-algorithms automated-machine-learning hyperparameter-optimization parameter-tuning automation multimodality

Python 665

3 days ago

VITA-MLLM / Woodpecker

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

hallucination hallucinations large-language-models llm mllm multimodal-large-language-models multimodality

Python 635

4 months ago

jshilong / GPT4RoI

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

gpt llm multimodality roi computer-vision

Python 525

10 months ago

microsoft / LLM2CLIP

LLM2CLIP makes the SOTA pretrained CLIP model even more SOTA.

clip multimodality

Python 505

19 days ago

zengyan-97 / X-VLM

X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)

multimodality vision-and-language

Python 477

2 years ago

afiaka87 / clip-guided-diffusion

A CLI tool/Python module for generating images from text using guided diffusion and CLIP from OpenAI.

multimodal image-generation text-to-image-synthesis text-to-image openai deep-learning artificial-intelligence diffusion multimodality

Python 462

3 years ago

YingqingHe / Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).

aigc large-language-models large-vision-language-models multimodal-generation multimodal-large-language-models multimodal-models multimodality text-to-3d text-to-audio text-to-image text-to-speech text-to-video llm mllm

HTML 453

9 days ago

MMMU-Benchmark / MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

computer-vision deep-learning deep-neural-networks evaluation foundation-models large-language-models large-multimodal-models llm llms machine-learning multimodal multimodal-deep-learning multimodal-learning multimodality natural-language-processing question-answering STEM visual-question-answering

Python 412

1 month ago

HazyResearch / fonduer

A knowledge base construction engine for richly formatted data

multimodality machine-learning

Python 409

4 years ago

lium-lst / nmtpytorch

Sequence-to-Sequence Framework in PyTorch

deep-learning PyTorch seq2seq nmt neural-machine-translation asr speech-recognition multimodality cnn

Jupyter Notebook 391

2 years ago

kyegomez / Med-PaLM

Towards Generalist Biomedical AI

biomedical deep-learning gpt4 multimodal multimodal-deep-learning multimodality Open Source

Python 373

1 year ago

kyegomez / CM3Leon

An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multimodal AI that uses just a decoder to generate both text and images

attention attention-is-all-you-need dalle multimodal multimodal-learning multimodality

Python 360

1 year ago

OmicsML / dance

DANCE: a deep learning library and benchmark platform for single-cell analysis

Bioinformatics data-science deep-learning graph-neural-networks machine-learning multimodality Python benchmark computational-biology

Python 356

19 days ago