Read and extract text and other content from PDFs in C# (port of PDFBox)
A Gtk/Qt front-end to tesseract-ocr.
OCR engine for all the languages
Document Layout Analysis resources repository for development with PdfPig.
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
Text Overlay plugin for Mirador 3
Ergonomic line-by-line transcription of scanned text.
Probabilistic key-value pair extraction from invoices (non-searchable PDFs) using word weights
✏️ Integration of Tesseract for Python using a shared library
Some basic data and text extraction from the New York City Directories
Data for guides to breweries across the United States from 1896 to 1918
A gem that parses positional text from hOCR output and provides convenience methods to find text.
CLI tool that recognises handwritten text from answer sheets using Tesseract OCR, then uses NLP on the extracted text to evaluate marks.