information-retrieval · GitHub Topics

#Computer Science# Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

OCR deep-learning crnn PyTorch lstm machine-learning scene-text scene-text-recognition optical-character-recognition cnn data-mining image-processing Python easyocr information-retrieval

Python 27.18 k

9 months ago

deepset-ai / haystack

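#Natural Language Processing# Haystack is an open-source NLP framework that uses pretrained Transformer models. It helps developers quickly build production-grade NLP applications for semantic search, question answering, summarization, and document ranking.

Python 21.4 k

1 day ago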

piskvorky / gensim

#Natural Language Processing# Topic Modelling for Humans

gensim topic-modeling information-retrieval machine-learning natural-language-processing data-science Python data-mining word2vec word-embeddings neural-network fasttext

Python 16.08 k

25 days ago

arc53 / DocsGPT

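#Natural Language Processing# DocsGPT is a GPT-based chat assistant for documentation. It quickly searches project documentation so that developers can easily ask project-related questions and get accurate answers.

artificial-intelligence Python natural-language-processing React Web app ChatGPT docsgpt information-retrieval language-model llm machine-learning PyTorch rag semantic-search transformers Hacktoberfest

TypeScript 15.86 k

1 day ago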

weaviate / weaviate

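#Search# Weaviate is an open-source vector database that stores both objects and vectors, combining vector search and structured filtering with the fault tolerance and scalability of a cloud-native database, all accessible through GraphQL, REST, and various language clients.

Go 13.82 k

1 day ago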

onyx-dot-app / onyx

#Large Language Models# Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.

enterprise-search rag ai-chat ChatGPT gen-ai Next Python information-retrieval

Python 13.11 k

8 hours ago

Unstructured-IO / unstructured

#Natural Language Processing# Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to...

deep-learning document-parsing machine-learning natural-language-processing OCR information-retrieval data-pipelines preprocessing pdf-to-text pdf pdf-to-json document-image-analysis donut document-image-processing document-parser docx langchain llm

HTML 11.81 k

2 days ago

neuml / txtai

#Search# All-in-one embeddings database for semantic search, LLM orchestration, and language model workflows

Python search machine-learning natural-language-processing semantic-search vector-search txtai llm vector-database language-model transformers sentence-embeddings large-language-models information-retrieval search-engine embeddings retrieval-augmented-generation rag artificial-intelligence

Python 11.16 k

12 hours ago

FlagOpen / FlagEmbedding

#Large Language Models# Retrieval and Retrieval-augmented LLMs

embeddings information-retrieval llm sentence-embeddings text-semantic-similarity retrieval-augmented-generation

Python 10.07 k

1 month ago

marqo-ai / marqo

#Search# Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

deep-learning information-retrieval machine-learning vector-search tensor-search clip multi-modal search-engine transformers vision-language semantic-search visual-search natural-language-processing hnsw knn Hacktoberfest ChatGPT gpt large-language-models

Python 4.9 k

2 days ago

apache / lucene-solr

#Search# Apache Lucene and Solr have moved to their own separate repositories

lucene solr search NoSQL Java backend search-engine information-retrieval

Java 4.37 k

9 months ago

KittyKatt / screenFetch

Fetches system/theme information in terminal for Linux desktop screenshots.

Shell Bash Desktop information-retrieval

Shell 3.98 k

7 months ago

langroid / langroid

#Large Language Models# Harness LLMs with Multi-Agent Programming

agents ChatGPT gpt gpt-4 gpt4 language-model llm llm-agent multi-agent-systems openai-api artificial-intelligence llm-framework llama local-llm function-calling information-retrieval rag retrieval-augmented-generation

Python 3.45 k

1 day ago

SylphAI-Inc / AdalFlow

#Natural Language Processing# AdalFlow: The library to build & auto-optimize LLM applications.

Python 3.38 k

1 day ago

apache / lucene

#Search# Apache Lucene is a full-text search engine written in Java

lucene search NoSQL Java backend search-engine information-retrieval

Java 3.05 k

2 days ago

tensorflow / ranking

#Computer Science# Learning to Rank in TensorFlow

ranking machine-learning deep-learning information-retrieval learning-to-rank recommender-systems

Python 2.78 k

1 year ago

embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark

benchmark clustering information-retrieval sentence-transformers sts text-embedding retrieval neural-search semantic-search sbert text-classification reranking

Python 2.66 k

1 day ago

ashvardanian / StringZilla

Up to 10x faster strings for C, C++, Python, Rust, Swift & Go, leveraging NEON, AVX2, AVX-512, SVE, & SWAR to accelerate search, hashing, sort, edit distances, and memory ops 🦖