pdf-parsing · GitHub Topics

py-pdf / pypdf

A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

pypdf2 pdf Python pdf-parser pdf-parsing pdf-manipulation pdf-documents help-wanted

Python 8.93 k

4 days ago

jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

pdf pdf-parsing table-extraction

Python 7.55 k

16 days ago

galkahana / HummusJS

Node.js module for high performance creation, modification and parsing of PDF files and streams

pdf-generation pdf-parsing Node.js pdf-manipulation

C 1.16 k

2 months ago

adithya-s-k / marker-api

Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy.

FastAPI pdf-converter pdf-files pdf-parser pdf-parsing API REST API

Python 833

6 months ago

drmingler / docling-api

Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc) into Markdown. With support for both CPU and GPU processing, it is...

API FastAPI markdown-parser pdf-conversion pdf-converter pdf-parser pdf-parsing pdf-to-markdown

Python 499

1 month ago

jstockwin / py-pdf-parser

A Python tool to help extract information from structured PDFs.

pdf Parsing pdf-parsing

Python 402

12 days ago

chunyenHuang / hummusRecipe

A powerful PDF tool for NodeJS based on HummusJS.

pdf pdf-files pdf-generation pdf-parsing pdf-manipulation Node.js

JavaScript 346

2 years ago

thoqbk / traprange

(Java) A Method to Extract Tabular Content from PDF Files

Java pdf pdfbox Parser pdf-parsing pdf-manipulation pdf-files

HTML 332

2 years ago

ck-unifr / pdf_parsing

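#Large language models# PDF parsing (text, sections, tables, images, references); PDF question answering, summarization, and information extraction based on large models (ChatGLM2-6B, RWKV) + langchain + streamlit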

langchain large-language-model pdf pdf-parsing rwkv Python chatglm2-6b information-extraction chatpdf Streamlit

Python 192

1 year ago

ScientaNL / pdf-extractor

Node.js module for rendering pdf pages to images, svgs, html files, text files and json metadata

pdf-parsing Node.js image-generation

JavaScript 97

2 years ago

rostrovsky / pdf-table

Java utility for parsing PDF tabular data using Apache PDFBox and OpenCV

OpenCV opencv3 pdfbox tables table Java java-library pdf-parsing

Java 72

2 years ago

iamarunbrahma / pdf-to-markdown

Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced info...

document-processing information-retrieval pdf-parsing pdf-to-markdown Python rag retrieval-augmented-generation text-extraction pdf-converter

Python 69

5 months ago

hellpanderrr / linkedin-pdf-parsing

Parsing resumes in PDF format from LinkedIn

linkedin Python pdf-parsing resume-parser

Python 68

9 years ago

tuffstuff9 / nextjs-pdf-parser

Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.

content-extraction filepond Next pdf-parser pdf-parsing

TypeScript 59

1 year ago

dipietrantonio / pdf4py

A PDF parser written in Python 3 with no external dependencies.

pdf Parser pdf-parsing Python information-extraction

Python 57

5 years ago

DQ-Zhang / refchaser

Written in Python for checking reference lists in systematic reviews and literature reviews; helps with reference list searching both backward and forward by extracting references and creating search que...

research-paper text-mining pdf-parsing

Python 23

5 years ago

adrienjoly / npm-pdfreader-example

Example of use of pdfreader: parse a PDF résumé

pdf-parsing Example

JavaScript 16

3 years ago

malice-plugins / pdf

Malice PDF Plugin

malice Malware pdf plugin pdf-parsing Docker malware-analysis

Python 16

6 years ago

abdullahshafiq-20 / ResumeConvertorLatex

ResumeTex is an AI-powered tool that converts standard PDF resumes into professionally formatted LaTeX documents. This service helps you create elegant, structured resumes without needing to learn LaT...

automation developer-tools document-processing Express LaTeX Node.js Open Source pdf-parsing React resume Tailwind CSS TeX

JavaScript 14

1 day ago

meldonization / depdf

An ultimate pdf file disintegration tool

pdf pdf-parsing table-extraction pdftk

Python 11

5 years ago