node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
#Computer Science# 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
⚠️ ARCHIVED ⚠️ Search across and get full text for OA & closed journals
#Natural Language Processing# Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction & named entity recognition) & data enrichment (annotation) pipelin...
Text extraction from multiple and large PDF documents.
Extract text from plaintext, .docx, .odt and .rtf files. Pure Go.
C# and VB.NET samples for Docotic.Pdf library
R Interface to Apache Tika
Build search across multiple documents client-side in your file storage
#Natural Language Processing# Simple rule-based named entity recognition
A collection of tools for OCR (optical character recognition).
pdfRest API Toolkit is a REST API service for processing PDF documents, made by developers, for developers. Rapidly integrate PDF workflows with your existing projects and applications, simply and sea...
Repo containing a small demo that extracts text from images via OCR, using the Google Vision API in Python
#Natural Language Processing# Text Processing & Segmentation Framework
View PDFs on X11 and the Linux framebuffer; resize PDFs; convert PDF to text, HTML, TeX, groff