Official Firecrawl MCP Server - Adds powerful web scraping to Cursor, Claude and any other LLM clients.
Readability2 converts HTML to plain text.
Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.
Model Context Protocol (MCP) Server for Graphlit Platform
Make PDF Files Accessible, Extract Data from PDF, Convert PDF to HTML, Fill-in PDF Form, Stamp PDF and more...
#Natural Language Processing#Tool that extracts the text from web article URLs and provides word frequencies, entity recognition, automatic summaries and more
FileGazer - deep file analysis and categorisation
This Python-based repository hosts a sophisticated service designed for scraping web articles and converting them into Markdown format. The core functionality of this service includes extracting the m...
#Computer Science#Opinionated and Sophisticated Document Region Analyzer.
#Natural Language Processing#The metadata and text content extractor for almost every file type.