A one-stop, open-source, high-quality data extraction tool for converting PDFs to Markdown and JSON.
Read and extract text and other content from PDFs in C# (port of PDFBox)
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
#Natural Language Processing#A curated list of resources on Document Understanding (DU)
Open-source platform for extracting structured data from documents using AI.
#Computer Science#This repository provides training and testing code, a dataset, detection and recognition annotations, an evaluation script, an annotation tool, and a ranking.
Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
#Natural Language Processing#Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)
AssemblyLine 4: File triage and malware analysis
#Natural Language Processing#A package for parsing PDFs and analyzing their content using LLMs.
Pandora is an analysis framework for discovering whether a file is suspicious and conveniently showing the results
Dedoc is a library (service) for automating document parsing and bringing documents to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic...
#Computer Science#RObust document image BINarization
#Computer Science#Document Visual Question Answering
#Natural Language Processing#A powerful web application that combines Streamlit, LangChain, and Pinecone to simplify document analysis. Powered by OpenAI's GPT-3, RAG enables dynamic, interactive document conversations, making it i...
Post-process Amazon Textract results with Hugging Face transformer models for document understanding
YOLO models trained on DocLayNet: power your document intelligence with layout analysis
(ICFHR 2020 oral) Code for "docExtractor: An off-the-shelf historical document element extraction" paper
#Natural Language Processing#Effortlessly extract information from unstructured data with this library, which uses advanced AI techniques. Compose AI into customizable pipelines that draw on diverse sources for your projects.