pdf-to-markdown · GitHub Topics

Knowledge Agents and Management in the Cloud

document Parsing pdf pdf-document-processor pptx structured-data document-parser document-parsing docx-to-markdown pdf-to-excel pdf-to-json pdf-to-text ppt-to-json tables ppt-to-markdown pdf-to-markdown

Python 3.88 k

2 days ago

wisupai / e2m

#LLM# E2M converts various file types (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, m4a) into Markdown. It’s easy to install, with dedicated parsers and converters, supporting custom configs. E2M o...

LLM Markdown pdf-to-markdown

Jupyter Notebook 1.06 k

7 months ago

drmingler / docling-api

Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc.) into Markdown. With support for both CPU and GPU processing, it is...

API FastAPI markdown-parser pdf-conversion pdf-converter pdf-parser pdf-parsing pdf-to-markdown

Python 499

1 month ago

iamarunbrahma / vision-parse

Parse PDFs into markdown using Vision LLMs

document-parser pdf-parser pdf-to-markdown text-extraction

Python 339

2 months ago

shoryasethia / markdrop

#LLM# A Python package for converting PDFs to Markdown while extracting images and tables, and for generating text descriptions of the extracted tables/images using several LLM clients. And many more functio...

Open Source pypi-package image-to-text LLM pdf-to-markdown pdf-to-text table-to-text agents

Python 89

17 days ago

iamarunbrahma / pdf-to-markdown

Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced info...

document-processing information-retrieval pdf-parsing pdf-to-markdown Python rag retrieval-augmented-generation text-extraction pdf-converter

Python 69

5 months ago

drmingler / smart-llm-loader

smart-llm-loader is a lightweight yet powerful Python package that transforms any document into LLM-ready chunks. Spend less time on preprocessing headaches and more time building what matters. From R...

chatbot chunking claude gemini langchain llama-index Markdown openai pdf-converter pdf-parser pdf-to-markdown rag

Python 63

2 months ago
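smart-llm-loader's own API is not reproduced here. As a rough illustration of what "LLM-ready chunks" usually means, the sketch below applies a generic heading-then-paragraph chunking strategy to Markdown text. `chunk_markdown` and its `max_chars` parameter are hypothetical names, and this is not the package's implementation.

```python
import re
from typing import List


def chunk_markdown(markdown: str, max_chars: int = 1000) -> List[str]:
    """Hypothetical chunker: split at ATX headings, then pack paragraphs.

    Generic sketch only; not smart-llm-loader's actual algorithm.
    """
    # Zero-width split before every line starting with 1-6 '#' and a space,
    # so each heading stays attached to the text that follows it.
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)
    chunks: List[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue  # skip the empty piece produced before a leading heading
        current = ""
        for para in section.split("\n\n"):
            candidate = f"{current}\n\n{para}" if current else para
            # A single paragraph longer than max_chars is kept whole.
            if len(candidate) <= max_chars or not current:
                current = candidate
            else:
                chunks.append(current)
                current = para
        if current:
            chunks.append(current)
    return chunks


doc = "# Intro\nHello.\n\n## Details\nPara one.\n\nPara two."
print(chunk_markdown(doc, max_chars=20))
```

Splitting on headings first keeps each chunk topically coherent; the `max_chars` cap then bounds chunk size for an embedding model or context window.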

muchdogesec / file2txt

Turns a supported list of file types (e.g. .docx) into a Markdown-structured text file. It can also optionally defang indicators and extract text from images. Built for threat-intel use cases.

html-to-markdown image-to-text Markdown OCR pdf-to-markdown

Python 12

4 months ago
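"Defanging" an indicator means rewriting URLs, domains, and IP addresses so they cannot be clicked or resolved by accident. The sketch below shows one common convention (`http` becomes `hxxp`, and `.` becomes `[.]`). The `defang` helper is hypothetical and not taken from file2txt, whose exact rules may differ; the domain pattern is deliberately simple and will also bracket things like file names.

```python
import re

# Hypothetical patterns illustrating a common defanging convention;
# not file2txt's actual implementation.
_SCHEME = re.compile(r"\bhttp(s?)://", re.IGNORECASE)
_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
_DOMAIN = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def _bracket_dots(match: re.Match) -> str:
    # Replace every '.' in the matched indicator with '[.]'.
    return match.group(0).replace(".", "[.]")


def defang(text: str) -> str:
    # 1. Neutralise URL schemes: http -> hxxp, https -> hxxps.
    text = _SCHEME.sub(r"hxxp\1://", text)
    # 2. Bracket dots in IPv4 addresses before the domain pass runs.
    text = _IPV4.sub(_bracket_dots, text)
    # 3. Bracket dots in domain-like tokens.
    return _DOMAIN.sub(_bracket_dots, text)


print(defang("See http://evil.example.com/x and 10.0.0.1"))
```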

iw4p / url-to-markdown

#LLM# URL to Markdown API is a service that converts web content into clean, structured Markdown format through a simple HTTP GET request. It's built using FastAPI and the MarkItDown library, offering a stra...

html-to-markdown LLM Markdown pdf-to-markdown vector

Python 9

2 months ago

hparreao / doclingconverter

Quick way to convert files (PDF, DOCX, HTML, PPTX, Images) to (MD, JSON, YAML) using Docling and Streamlit

markdown-converter pdf-converter pdf-to-json pdf-to-markdown Streamlit

Python 8

5 months ago

iamarunbrahma / rag-ingest

RAG-Ingest: A tool for converting PDFs to markdown and indexing them for enhanced Retrieval Augmented Generation (RAG) capabilities.

aws-s3 hybrid-search information-retrieval llamaindex ollama pdf-to-markdown qdrant retrieval-augmented-generation

Python 3

5 months ago

MansurPro / DocuParse

DocuParse is a high-performance tool for converting PDF documents into clean, structured Markdown files. Designed for speed and accuracy, it extracts and formats content while minimizing errors like h...

document-layout-analysis google-colab huggingface-transformers pdf-parsing pdf-to-markdown tesseract-ocr text-extraction

4 months ago

olegiv / pdf_2_md

automation CLI Markdown pdf pdf-to-markdown Python summarization toc

Python 1

3 days ago

Jarus77 / markdrop

#LLM# A Python package for converting PDFs to Markdown while extracting images and tables, and for generating text descriptions of the extracted tables/images using several LLM clients. And many more functi...

agents image-to-text LLM Markdown Open Source pdf-to-markdown pypi-package table-to-text

Python 1

8 days ago

aidayang / Marker-OneClick

A portable, one-click-launch bundle of Marker, the PDF-to-Markdown tool (no installation required)

pdf-to-json pdf-to-markdown Python

22 days ago

laurentvv / pdf2md-ai

#LLM# A powerful Python tool that extracts text and images from PDF documents and converts them to clean, well-formatted Markdown files.

image-processing LLM pdf-processing pdf-to-markdown Python

Python 0

23 days ago

LatentSpaceIITB / markdrop

A Python package for converting PDFs to Markdown while extracting images and tables, and for generating text descriptions of the extracted tables/images using several LLM clients. And many more functio...

Markdown pdf-to-markdown

Python 0

8 days ago