Using GPT to parse PDFs
Python PDF Parser (Not actively maintained). Check out pdfminer.six.
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
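PDF parsing (text, sections, tables, images, references), plus PDF question answering, summarization, and information extraction built on large language models (ChatGLM2-6B, RWKV), LangChain, and Streamlit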
High-performance library for creating, modifying, and parsing PDF files in C++
The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe.
Parsing PDF tables using YOLOv3
Parsing LinkedIn resumes in PDF format
Node.js module for high-performance creation, modification, and parsing of PDF files and streams
🚜 Parse text and tables from PDF files.
SaralGyaan PDF Parser: a command-line parsing tool for PDFs
Swift PDFParser for PDF parsing and text mining. Includes a TrueType font parser
Node.js body parsing middleware
A Ruby-based parsing DSL based on parsing expression grammars.
Pythonic HTML Parsing for Humans™
📕 Chinese translation of Parsing Techniques (《解析技术》)
A pure-Python module that implements an LR(1) parser generator, as well as CFSM and GLR parser drivers.
Parsing HTML at the command line
A PDF reader built with HTML5
ECMAScript parsing infrastructure for multipurpose analysis
A library for turning nebulous data into well-structured data, with a focus on composition, performance, generality, and ergonomics.
A list of generic tools for parsing binary data structures, such as file formats, network protocols or bitstreams
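The last entry describes tools for parsing binary data structures such as file formats. As a minimal sketch of the core idea, assuming a made-up record layout (a 4-byte magic tag, a big-endian 16-bit version, and a big-endian 32-bit payload length), Python's standard `struct` module can decode a fixed-size header. `Header`, `parse_header`, and the `DEMO` tag are hypothetical names chosen for illustration, not part of any tool listed here.

```python
import struct
from typing import NamedTuple, Tuple

# Hypothetical layout, for illustration only:
#   4-byte magic tag, big-endian uint16 version, big-endian uint32 payload length.
HEADER = struct.Struct(">4sHI")


class Header(NamedTuple):
    magic: bytes
    version: int
    length: int


def parse_header(data: bytes) -> Tuple[Header, bytes]:
    """Decode the fixed-size header and return it with its payload bytes."""
    if len(data) < HEADER.size:
        raise ValueError(f"need at least {HEADER.size} bytes, got {len(data)}")
    header = Header(*HEADER.unpack_from(data, 0))
    # The payload follows the header and is exactly `length` bytes long.
    payload = data[HEADER.size:HEADER.size + header.length]
    if len(payload) != header.length:
        raise ValueError("truncated payload")
    return header, payload


# Usage: build a sample record, then parse it back.
record = HEADER.pack(b"DEMO", 1, 5) + b"hello"
header, payload = parse_header(record)
print(header, payload)  # Header(magic=b'DEMO', version=1, length=5) b'hello'
```

The `>` prefix selects big-endian byte order with no alignment padding, so this header is exactly 10 bytes. Real formats usually add variable-length fields, offsets, or checksums, which is where dedicated binary-parsing tools become useful.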