hocr · GitHub Topics

Read and extract text and other content from PDFs in C# (port of PDFBox)

pdfbox pdf pdf-document C# netstandard pdf-extractor pdf-document-processor pdf-files alto-xml hocr layout-analysis document-analysis page-xml pdf-generation

C# 1.96 k

1 day ago

manisandro / gImageReader

A Gtk/Qt front-end to tesseract-ocr.

Qt OCR pdf-document C++ tesseract-ocr GTK hocr scanner

C++ 1.73 k

4 days ago

mittagessen / kraken

OCR engine for all the languages

OCR neural-networks alto-xml hocr handwritten-text-recognition layout-analysis optical-character-recognition page-xml

Python 809

2 days ago

BobLd / DocumentLayoutAnalysis

Document Layout Analysis resources repos for development with PdfPig.

document-layout-analysis layout-analysis table-extraction pdf C# hocr page-xml alto-xml

C# 607

2 years ago

UB-Mannheim / ocr-fileformat

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

OCR hocr page-xml validation transformation

JavaScript 188

2 months ago

cneud / ocr-conversion

Conversions between various OCR formats

alto-xml hocr page-xml OCR

2 years ago

filak / hOCR-to-ALTO

Convert between Tesseract hOCR and ALTO XML using XSL stylesheets

hocr

XSLT 55

9 months ago

dbmdz / mirador-textoverlay

Text Overlay plugin for Mirador 3

OCR optical-character-recognition hocr alto-xml

JavaScript 54

1 month ago

UB-Mannheim / ocr-gt-tools

Ergonomic line-by-line transcription of scanned text.

OCR hocr transcription ground-truth web-interface

JavaScript 51

4 years ago

dmi3kno / hocr

Text-to-tibble

OCR tesseract tesseract-ocr R rstats hocr

R 36

5 years ago

fakabbir / OCR

Probabilistic key-value pair extraction from invoices (non-searchable PDFs) using word weights

OCR hocr tesseract Python

Python 18

4 years ago

macabeus / pyslibtesseract

✏️ Integration of Tesseract for Python using a shared library

tesseract hocr OCR

Python 12

9 years ago

GeReV / hocr-editor-ts

A visual hOCR file editor

OCR hocr tesseract-ocr

TypeScript 10

1 year ago

iilei / hocr-to-json

OCR hocr

JavaScript 4

2 years ago

GeReV / HocrEditor

A visual editor for .hocr files.

hocr tesseract-ocr OCR

C# 4

2 months ago

hadro / new-york-city-directories

Some basic data and text extraction from the New York City Directories

digital-humanities pdfs OCR hocr

8 years ago

hadro / brewery-guides

The data for guides to breweries across the United States from 1896 to 1918

hocr data dataset digital-humanities Open Data

8 years ago

jlieth / hocr-parser

Python parser for hOCR files using lxml

Python hocr OCR parsing-library

Python 3

5 years ago

emmeryn / hocr-turtletext

A gem that parses positional text from hOCR output and provides convenience methods to find text.

hocr extract-text gem Rails

Ruby 3

2 years ago

mayurcybercz / AI-Exam-evaluation

CLI tool to recognise handwritten text from answer sheets using Tesseract OCR, then evaluate marks from the extracted text using NLP

tesseract-ocr hocr natural-language-processing command-line-interface JSON Python nltk

Jupyter Notebook 3

6 years ago