#Natural Language Processing#Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
#Computer Science#A Unified Toolkit for Deep Learning Based Document Image Analysis
Translation - A Python library for document layout understanding
#Computer Science#A comprehensive list of awesome document image rectification papers.
The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.
The official repo for “DocScanner: Robust Document Image Rectification with Progressive Learning”.
The official code for “Geometric Representation Learning for Document Image Rectification”, ECCV, 2022.
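Document image processing tool, including bleaching / text orientation correction / sharpening / handwritten note denoising and beautification / shadow removal / distortion correction / edge trimming enhancement (DocBleach / TextOrientationCorrection / DocSharpening / HandwritingDenoisingBeautifying / DocShadowRemoval / ...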
Android App for English Handwritten Text Recognition
#Computer Science#Process Caltech Archives' digital documents and photos, and annotate each page or image with information about its contents
The ScriptNet / competitions site.
#Data Warehouse#Python wrapper to facilitate data manipulation for the SmartDoc 2015 - Challenge 1 Dataset.
Complex-background image bleaching, text orientation correction, sharpening, handwritten note denoising and beautification, shadow removal, distortion correction, black-spot removal, and edge trimming enhancement.
A web app for evaluating the quality of scanned document images
Sophia Trikoupi dataset (Collection of 46 handwritten, annotated pages)
This script automates the process of extracting text from various file formats (images, PDFs, DOCX) using Optical Character Recognition (OCR) powered by Azure Cognitive Services. The script supports i...