Read and extract text and other content from PDFs in C# (port of PDFBox)
A Gtk/Qt front-end to tesseract-ocr.
OCR engine for all the languages
Document layout analysis resources for development with PdfPig.
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
Ergonomic line-by-line transcription of scanned text.
Text Overlay plugin for Mirador 3
Probabilistic key-value pair extraction from invoices (non-searchable PDFs) using word weights
✏️ Integration of Tesseract for Python using a shared library
Some basic data and text extraction from the New York City Directories
The data for guides to breweries across the United States from 1896 to 1918
A gem that parses positional text from hOCR output and provides convenience methods to find text.
CLI tool that recognises handwritten text from answer sheets using Tesseract OCR, then evaluates marks from the extracted text using NLP