#Natural Language Processing#🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
Apache Lucene.NET
MTEB: Massive Text Embedding Benchmark
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
Study guides for MIT's 15.003 Data Science Tools
#Vector Search Engine#Fast, accurate, lightweight Python library for generating state-of-the-art embeddings
#Natural Language Processing#A heterogeneous benchmark for information retrieval. Easy to use; evaluate your models across 15+ diverse IR datasets.
#Large Language Models#The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Fast lexical search implementing BM25 in Python using NumPy, Numba, and SciPy
#Large Language Models#A real-time serving engine for data-intensive generative AI applications
#Natural Language Processing#Superlinked is a Python framework for AI engineers building high-performance search and recommendation applications that combine structured and unstructured data.
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
#Large Language Models#Profile-Based Long-Term Memory for AI Applications
SGPT: GPT Sentence Embeddings for Semantic Search
#Computer Science#Implementation of RETRO, DeepMind's retrieval-based attention net, in PyTorch
#Large Language Models#Neum AI is a framework for managing the creation and synchronization of vector embeddings at large scale.
#Search#Epsilla is a high-performance vector database management system
#Natural Language Processing#Use late-interaction multimodal models such as ColPali in just a few lines of code.
#Natural Language Processing#Grounded search engine (i.e., with source references) based on the LLM / ChatGPT / OpenAI API. It supports web search, file content search, etc.
#Awesome#My personal notes on local and global descriptors