Official Firecrawl MCP Server - Adds powerful web scraping to Cursor, Claude, and any other LLM client.
#Computer Science# A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package
Model Context Protocol (MCP) Server for Graphlit Platform
Readability2 converts HTML to plain text.
Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.
Pure Ruby implementation of the Boilerpipe content extraction algorithm, tuned for online articles
#Computer Science# Web content extraction using machine learning
#Web Crawler# DOM Based Content Extraction via Text Density
🔍 Model Context Protocol (MCP) tool for parsing websites using the Jina.ai Reader
Make PDF Files Accessible, Extract Data from PDF, Convert PDF to HTML, Fill-in PDF Form, Stamp PDF and more...
#Natural Language Processing# Tool that extracts the text from web article URLs and provides word frequencies, entity recognition, automatic summaries, and more
Benson turns a list of URLs into MP3s of the contents of each web page - take control of your reading backlog!
#Natural Language Processing# This repository houses a Python application for extracting YouTube video transcripts and summarizing their content.
Seize is a light Node or browser web-page content extractor inspired by Arc90 Readability and Safari Reader
Mobile First Indexing Tool
#Web Crawler# Simple web crawler in Go using text density
This Python-based repository hosts a sophisticated service designed for scraping web articles and converting them into Markdown format. The core functionality of this service includes extracting the m...
This repository is an implementation of 📄 DOM based content extraction via text density. Tested on Korean web pages.
#Large Language Model# A web application that scrapes web pages, extracts main content, and uses OpenLLaMA to convert the content into specified formats.
DataDigger is a powerful and intuitive web application designed to extract and analyze data from web pages.