#Web crawler# Python & command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
#Web crawler# Extract the main article from a given URL with Node.js
#Web crawler# Readability / HTML Content / Article Extractor & Web Scraping library written in PHP
SmartReader is a library to extract the main content of a web page, based on a port of the Readability library by Mozilla
Parse a Markdown article, download its images, and replace the image URLs with local paths
Reddit bot to preview and post hyperlinks as comments
#Natural language processing# NLP Web Service
#Web crawler# The best HTML to Markdown library: ESM-native, with useful utilities; simple, lightweight, and epic quality.
#Natural language processing# Laravel wrapper for common NLP tasks
Extract an article or news item from a URL or HTML, parse the title and content, and output in Markdown format.
Involution King Fun Book (IKFB, Chinese: 快卷, 卷王快乐本) is an integrated management system for papers and literature. Powered by Electron.
This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around the world in over 50 languages.
This is a small and easy-to-use desktop application that allows exporting Web of Science API Expanded and InCites API data in Excel/CSV/JSON/XML with a configurable and flexible data export structure.
#Web crawler# [Spring Boot in Practice] Build your own technical article blog in 10 minutes
#Web crawler# A web page content extractor
#Natural language processing# 📚 A collection of useful things from Natural Language Processing: detecting the language of a text, splitting text into sentences, extracting the main content from an HTML document
The program scrapes the content of a web article, given either a single URL or a text file containing a set of URLs. This project uses the newspaper3k and python-docx libraries. The output of this...