clip · GitHub Topics

mikel-brostrom / boxmot

#Computer Science# BoxMOT: pluggable SOTA tracking modules for segmentation, object detection and pose estimation models

Translation - Real-time multi-object tracker using YOLO v5 and Deep SORT

Python 7.18 k

17 hours ago

CVHub520 / X-AnyLabeling

#Large Language Models# Effortless data labeling with AI support from Segment Anything and other awesome models.

labeling-tool paddle PyTorch resnet sam yolo deep-learning onnx clip llm annotation-tool classification depth-estimation grounding-dino image-segmentation matting object-detection pose-estimation vlm

Python 5.24 k

15 hours ago

OFA-Sys / Chinese-CLIP

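#Natural Language Processing# This project is the Chinese version of the CLIP model, trained on large-scale Chinese data (~200 million image-text pairs). It aims to help users quickly carry out image-text feature and similarity computation, cross-modal retrieval, zero-shot image classification, and similar tasks in the Chinese domain.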

chinese computer-vision multi-modal-learning natural-language-processing PyTorch vision-and-language-pre-training image-text-retrieval clip pretrained-models vision-language deep-learning multi-modal contrastive-loss transformers coreml-models

Python 5.07 k

8 months ago

marqo-ai / marqo

#Search# Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

deep-learning information-retrieval machine-learning vector-search tensor-search clip multi-modal search-engine transformers vision-language semantic-search visual-search natural-language-processing hnsw knn Hacktoberfest ChatGPT gpt large-language-models

Python 4.83 k

13 hours ago

easychen / pushdeer

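Open-source push notification service that needs no app: on iOS 14+, scan a QR code and it is ready to use. Also supports Quick Apps, iOS and Mac clients, Android clients, and self-made devices.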

App push clip notification-service

C 4.78 k

1 month ago

open-mmlab / mmpretrain

#Computer Science# OpenMMLab Pre-training Toolbox and Benchmark

Translation - OpenMMLab image classification toolbox and benchmark

image-classification resnet mobilenet PyTorch deep-learning swin-transformer beit clip constrastive-learning convnext masked-image-modeling moco pretrained-models self-supervised-learning vision-transformer multimodal

Python 3.62 k

5 months ago

yuanzhoulvpi2017 / zero_nlp

#Natural Language Processing# Chinese NLP solutions (large models, data, models, training, inference)

bert natural-language-processing transformers gpt2 chatglm-6b clip gpt PyTorch text-generation huggingface-transformers llama2 llama llava

Jupyter Notebook 3.37 k

2 months ago

pharmapsychotic / clip-interrogator

Image to prompt with BLIP and CLIP

clip PyTorch

Python 2.8 k

1 year ago

jingyi0000 / VLM_survey

#Computer Science# Collection of AWESOME vision-language models for vision tasks

computer-vision deep-learning knowledge-distillation survey transfer-learning vision-language-model clip

2.66 k

20 days ago

rom1504 / clip-retrieval

#Computer Science# Easily compute clip embeddings and build a clip retrieval system with them

semantic-search deep-learning multimodal artificial-intelligence clip knn

Jupyter Notebook 2.54 k

1 year ago

open-compass / VLMEvalKit

#Large Language Models# Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks

gpt-4v large-language-models llava multi-modal openai vqa llm openai-api qwen gpt computer-vision PyTorch gpt4 ChatGPT clip vit evaluation claude gemini

Python 2.19 k

17 hours ago

RuffianZhong / RWidgetHelper

Rapid Android UI development that tames the many quirks of native widgets

state selector circle textview imageview gradient shape ripper shadow clip

Java 1.93 k

1 year ago

cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

chatbot clip computer-vision dino instruction-tuning large-language-models llms mllm multimodal-large-language-models representation-learning

Python 1.89 k

5 months ago

roboflow / awesome-openai-vision-api-experiments

#Large Language Models# Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥

ChatGPT computer-vision openai classification clip zero-shot grounding-dino open-vocabulary-detection open-vocabulary-segmentation segment-anything

Python 1.68 k

3 months ago

QIN2DIM / hcaptcha-challenger

#Large Language Models# 🥂 Gracefully face hCaptcha challenge with multimodal large language model.

hcaptcha hcaptcha-solver yolo Playwright clip agent gemini llm ai-agents ChatGPT openai captcha-solver captcha captcha-solving

Python 1.6 k

14 hours ago

mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for ...

chatbot clip gpt-4 llama llava vicuna vision-language vision-language-pretraining

Python 1.34 k

14 days ago

yzhuoning / Awesome-CLIP

Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).

clip contrastive-learning pre-training

1.19 k

10 months ago

unum-cloud / uform

#Vector Search Engine# Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️