#Natural Language Processing# ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.
Generic framework for historical document processing
#Natural Language Processing# ⚡ Cloud-native, AI-powered document processing pipelines on AWS.
A full-featured Document Management Platform / Document Layer for your application, providing storage, discovery, processing, and retrieval. Deploys directly into your Amazon Web Services Cloud. Pleas...
A Python framework for multi-modal document understanding with Amazon Bedrock
#Large Language Models# Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized in PDFs and HTML files. Extensive support for tabular data extraction and multimodal queries.
An include filter for Pandoc
Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced info...
Enhanced Document Understanding on AWS delivers an easy-to-use web application that ingests and analyzes documents, extracts content, identifies and redacts sensitive customer information, and creates...
Unofficial mirror of git://git.lyx.org/lyx.git (updated daily; not affiliated with lyx.org).
#Natural Language Processing# A comprehensive list of annotated training datasets classified by use case.
Semantic extraction from conference proceedings.
#Natural Language Processing# An advanced distributed knowledge fabric for intelligent document processing, featuring multi-document agents, optimized query handling, and semantic understanding.
This library builds a graph representation of the content of PDFs. The graph is then clustered, and the resulting page segments are classified and returned. Tables are returned formatted as CSV.
tokyo, a REST API that, when given any type of document 📄, identifies its MIME type 🧐, suggests an extension 🦔, and also extracts text 💪.
#Natural Language Processing# A module for creating stopword lists for any language, based on a set of documents.
#Large Language Models# DocGenius AI - Generative AI Chatbot for your Documents
#Large Language Models# FileFusion is a powerful file concatenation tool designed specifically for Large Language Model (LLM)
Text line detection for Urdu OCR (UTRNet)
A Python command-line utility intended for automating some copyediting tasks in documents. It allows editing zipped, XML-based files (e.g. docx, odt, or epub), through XSLT stylesheets. Can be rather ...