Collections & Topics Trending Rankings

crawling

scrapy / scrapy

#Crawler Framework# A popular, efficient Python crawler framework with a rich ecosystem

Python scraping crawling framework crawler Hacktoberfest web-scraping web-scraping-python

Python 58.67 k

3 days ago

gocolly / colly

#Crawler Framework# A fast and elegant Golang crawler framework

Go scraper framework crawler scraping crawling spider

Go 24.73 k

4 days ago

apify / crawlee

#Web Crawler# Crawlee: a web scraping and browser automation library for Node.js

web-scraping web-crawling npm headless-chrome Puppeteer automation apify scraping crawling crawler headless scraper web-crawler JavaScript Node.js Playwright TypeScript

TypeScript 19.96 k

2 days ago

codelucas / newspaper

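#Web Crawler# A Python data-collection framework that automatically extracts metadata such as the title, keywords, authors, summary, and body text of news stories and articles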

Python news crawler crawling scraper news-aggregator

HTML 14.82 k

5 days ago

D4Vinci / Scrapling

#Web Crawler# 🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

crawler crawling crawling-python Playwright Python scraping selectors stealth-game web-scraper web-scraping web-scraping-python webscraping xpath automation artificial-intelligence ai-scraping data data-extraction mcp mcp-server

Python 7.6 k

8 hours ago

lorien / awesome-web-scraping

#Web Crawler# List of libraries, tools and APIs for web scraping and data processing.

web-scraping captcha-recaptcha crawling crawling-python scraping scraping-framework scraping-python scraping-tool webscraping crawler spider

Makefile 7.38 k

5 days ago

apify / crawlee-python

#Web Crawler# Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...

apify automation beautifulsoup crawler crawling headless headless-chrome pip Playwright Python scraper scraping web-crawler web-crawling web-scraping Hacktoberfest

Python 6.93 k

2 days ago

go-rod / rod

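#Web Crawler# Rod is a high-level driver built directly on the DevTools Protocol. It is designed for web automation and scraping, for both high-level and low-level use: advanced developers can use the low-level packages and functions to easily customize or build their own version of Rod, and the high-level functions are just examples of how to build the default version of Rod.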

cdp chrome-headless chrome-devtools chrome-devtools-protocol headless web-scraping automation scraper devtools devtools-protocol rod Go Testing Web gorod crawling

Go 6.33 k

20 days ago

MontFerret / ferret

#Web Crawler# Declarative web scraping

Go query-language data-mining scraping scraping-websites dsl cdp crawling scraper crawler Chrome cli Library

Go 5.88 k

1 month ago

yujiosaka / headless-chrome-crawler

#Web Crawler# Distributed crawler powered by Headless Chrome

headless-chrome Puppeteer jQuery crawler crawling scraper scraping Chrome Chromium Promise

JavaScript 5.62 k

2 years ago

hakluke / hakrawler

#Web Crawler# Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application

Bug Bounty crawling Hacking OSINT pentesting Reconnaissance reconnaissance

Go 4.89 k

10 months ago

hardkoded / puppeteer-sharp

#Web Crawler# Headless Chrome .NET API

Puppeteer Chrome Chromium automation crawler crawling C# e2e e2e-testing webautomation

C# 3.78 k

2 days ago

ai-robots-txt / ai.robots.txt

#Web Crawler# A list of AI agents and robots to block.

artificial-intelligence crawlers crawling privacy

Python 3.16 k

3 days ago

apache / nutch

#Web Crawler# Apache Nutch is an extensible and scalable web crawler

Java nutch web-crawler crawling hadoop apache

Java 3.08 k

4 days ago

edoardottt / cariddi

#Web Crawler# Take a list of domains, crawl urls and scan for endpoints, secrets, api keys, file extensions, tokens and more

endpoints endpoint-discovery Bug Bounty crawler secret-keys secrets-detection Cybersecurity reconnaissance Reconnaissance crawling Go pentesting security OSINT penetration-testing scraper Hacktoberfest redteam

Go 2.81 k

19 days ago

transitive-bullshit / awesome-puppeteer

#Web Crawler# A curated list of awesome puppeteer resources.

Puppeteer headless-chrome Awesome Lists scraping crawling automation

2.52 k

1 year ago

lorien / grab

#Web Crawler# Web Scraping Framework

web-scraping http-client framework Python pycurl asynchronous Network urllib3 spider crawler crawling scraping python-library

Python 2.42 k

1 month ago

zorlan / skycaiji

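#Web Crawler# SkyCaiji (蓝天采集器) is a free, open-source crawler system: you collect data simply by pointing and clicking to edit rules. It runs locally, on virtual hosting, or on cloud servers, can collect almost any type of web page, integrates seamlessly with all kinds of CMS site builders, and publishes data in real time without logging in, fully automatically with no manual intervention. It is a fully cross-platform, cloud-based crawler system for large-scale web data collection.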

crawler crawling spider webcrawler PHP

PHP 2.04 k

2 months ago

NateScarlet / holiday-cn

#Web Crawler# 📅🇨🇳 Chinese statutory public holiday data, scraped automatically every day from State Council announcements

data nlp crawling holiday china

Python 1.64 k

3 days ago

roach-php / core

#Web Crawler# The complete web scraping toolkit for PHP.

PHP web-scraping crawling

PHP 1.43 k

13 days ago
