Read and extract text and other content from PDFs in C# (port of PDFBox)
OCR engine for all the languages
Document layout analysis resources for development with PdfPig.
ALTO XML schema - latest and all former versions
Text Overlay plugin for Mirador 3
Python tools for performing various operations on ALTO XML files
Kitodo.Presentation is a feature-rich framework for building a METS- or IIIF-based digital library. It is part of the Kitodo Digital Library Suite.
Image Retrieval in Digital Libraries - A Multicollection Experimentation of Machine Learning techniques
Data Mining Historical Newspaper Metadata (METS/ALTO formats)
Convert ALTO XML to plain text + minimal metadata
Command Line Interface (CLI) to export METS/ALTO documents to other formats.
Extract the MODS/ALTO metadata from a set of METS/ALTO files into pandas DataFrames for data analysis
A pipeline to transfer ground truth from Transkribus to eScriptorium.
Helper functions and web app for METS/ALTO archive viewing.
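
Several of the tools listed above convert ALTO XML to plain text. As a rough illustration of what that involves (not the code of any listed project), the sketch below joins the CONTENT attributes of the String elements in each TextLine, using only the Python standard library. The function names alto_to_text and local_name and the SAMPLE document are hypothetical. Tags are matched by local name because the namespace URI differs between ALTO versions, and hyphenation (HYP) elements are ignored here.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def alto_to_text(xml_string):
    """Return one line of text per ALTO TextLine element.

    Assumption: each word is a String element whose text is stored
    in its CONTENT attribute, as in the ALTO schema.
    """
    root = ET.fromstring(xml_string)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Collect the words of this line in document order.
        words = [
            child.attrib.get("CONTENT", "")
            for child in elem
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(w for w in words if w))
    return "\n".join(lines)


# Hypothetical minimal ALTO v3 document, for illustration only.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock>
          <TextLine>
            <String CONTENT="Hello"/>
            <SP/>
            <String CONTENT="world"/>
          </TextLine>
          <TextLine>
            <String CONTENT="second"/>
            <SP/>
            <String CONTENT="line"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

if __name__ == "__main__":
    print(alto_to_text(SAMPLE))
```

A real converter would likely also rejoin hyphenated words and keep block or page boundaries, which this sketch leaves out.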