GitHub Chinese Community
#web-scraping

scrapy / scrapy

#Crawler framework# A popular, efficient Python crawler framework with a rich ecosystem

Python, scraping, crawling, framework, crawler, Hacktoberfest, web-scraping, web-scraping-python
Python 58.27 k
4 days ago
Mintplex-Labs / anything-llm

#Large language model# The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.

rag, lmstudio, localai, vector-database, ollama, local-llm, llama3, large language model, ai-agents, multimodal, custom-ai-agents, deepseek, mcp, mcp-servers, no-code, qwen3, web-scraping, kimi, moonshot
JavaScript 49.18 k
4 hours ago
dgtlmoon / changedetection.io

changedetection.io is a tool for monitoring changes to web page content. It can send notifications through an API, email, messaging, and other channels.

website-monitor, website-monitoring, change-detection, monitoring, self-hosted, change-alert, change-monitoring, website-change-monitor, url-monitor, website-change-detector, website-change-detection, website-change-tracker, website-change-notification, notifications, web-scraping, restock-monitor, website-defacement-monitoring, back-in-stock, website-watcher, RSS
Python 26.58 k
14 hours ago
ScrapeGraphAI / Scrapegraph-ai

#Web crawler# Python scraper based on AI

scraping, scraping-python, automated-scraper, large language model, artificial intelligence, web-crawler, web-scraping, ai-scraping, crawler, html-to-markdown, Markdown, rag
Python 21.33 k
1 month ago
apify / crawlee

#Web crawler# Crawlee: a web crawling and browser automation library for Node.js

web-scraping, web-crawling, npm, headless-chrome, Puppeteer, automation, apify, scraping, crawling, crawler, headless, scraper, web-crawler, JavaScript, Node.js, Playwright, TypeScript
TypeScript 19.51 k
1 day ago
Evil0ctal / Douyin_TikTok_Download_API

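#Web crawler# 🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.

Python, pywebio, TikTok, douyin, API, scraper, FastAPI, no-watermark, online-parsing, async, douyin-tiktok-api, douyin-tiktok-download, crawler, spider, web-scraping, tiktok-scraper, douyin-scraper, douyin-api, tiktok-api, tiktok-signature
Python 14.31 k
6 months ago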
getmaxun / maxun

#Web crawler# A visual scraping platform that collects data through point-and-click

automation, no-code, scraper, web-automation, web-scraper, web-scraping, API, browser, browser-automation, Playwright, self-hosted, website-to-api, robotic-process-automation, rpa, no-code-web-scraper, agents, data-extraction, webscraping
TypeScript 13.62 k
11 hours ago
seleniumbase / SeleniumBase

SeleniumBase is a Python browser automation library for web automation, testing, and CAPTCHA bypassing

Python, Selenium, webdriver, selenium-python, e2e-testing, seleniumbase, pytest-plugin, web-automation, pytest, WebKit, chromedriver, anti-detection, bot-detection, cloudflare-bypass, web-scraping-python, python-scraper, web-scraping, Test automation, cdp, behave
Python 11.64 k
2 hours ago
mherrmann / helium

helium is a Python library for automating browsers such as Chrome and Firefox

Selenium, selenium-python, Python, webdriver, Chrome, Firefox, web-automation, web-scraping, helium
Python 8.02 k
5 months ago
lorien / awesome-web-scraping

#Web crawler# List of libraries, tools and APIs for web scraping and data processing.

web-scraping, captcha-recaptcha, crawling, crawling-python, scraping, scraping-framework, scraping-python, scraping-tool, webscraping, crawler, spider
Makefile 7.32 k
9 months ago
D4Vinci / Scrapling

#Web crawler# 🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

crawler, crawling, crawling-python, Playwright, Python, scraping, selectors, stealth-game, web-scraper, web-scraping, web-scraping-python, webscraping, xpath, automation, artificial intelligence, ai-scraping, data, data-extraction, mcp, mcp-server
Python 7.32 k
13 hours ago
alirezamika / autoscraper

#Web crawler# A Smart, Automatic, Fast and Lightweight Web Scraper for Python

scraping, scraper, scrape, webscraping, crawler, web-scraping, artificial intelligence, Python, webautomation, automation, machine learning
Python 6.93 k
3 months ago
apify / crawlee-python

#Web crawler# Crawlee: A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...

apify, automation, beautifulsoup, crawler, crawling, headless, headless-chrome, pip, Playwright, Python, scraper, scraping, web-crawler, web-crawling, web-scraping, Hacktoberfest
Python 6.31 k
9 hours ago
go-rod / rod

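#Web crawler# Rod is a high-level driver built directly on the DevTools Protocol. It is designed for web automation and scraping, and suits both high-level and low-level use: senior developers can use the low-level packages and functions to easily customize or build their own version of Rod, while the high-level functions are just examples of how the default version of Rod is built.

cdp, chrome-headless, chrome-devtools, chrome-devtools-protocol, headless, web-scraping, automation, scraper, devtools, devtools-protocol, rod, Go, Testing, Web, gorod, crawling
Go 6.26 k
17 hours ago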
autoscrape-labs / pydoll

#Web crawler# Pydoll is a library for automating Chromium-based browsers without a WebDriver, offering realistic interactions.

cdp, Chromium, Playwright, Puppeteer, Selenium, webdriver, browser-automation, anti-detection, automation, crawler, e2e-tests, headless, scraping, Testing, fingerprinting, web-scraping
Python 5.25 k
22 days ago
adbar / trafilatura

#Web crawler# Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

web-scraping, text-extraction, natural language processing, text-mining, crawler, text-preprocessing, article-extractor, readability, scraping, html-to-markdown, corpus-tools, rss-feed, news-aggregator, rag, large language model
Python 4.68 k
6 days ago
firecrawl / firecrawl-mcp-server

🔥 Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude and any other LLM clients.

batch-processing, claude, content-extraction, data-collection, firecrawl, firecrawl-ai, llm-tools, mcp-server, model-context-protocol, search-api, web-crawler, web-scraping, javascript-rendering, mcp
JavaScript 4.54 k
1 day ago
jaypyles / Scraperr

#Web crawler# Self-hosted webscraper.

Open Source, self-hosted, webscraper, Docker, helm, Kubernetes, Playwright, Python, scraping, web-scraper, web-scrapers, web-scraping, webscraping
TypeScript 4.27 k
2 months ago
lexiforest / curl_cffi

Python binding for the curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.

cURL, http-client, curl-impersonate, HTTP, ja3, ja3-fingerprint, tls-fingerprint, fingerprinting, web-scraping, akamai-fingerprint
Python 4.18 k
12 hours ago
snooppr / snoop

#Web crawler# Snoop: an open-source intelligence reconnaissance tool (OSINT world)

OSINT, Termux, username-search, username-checker, pentest, web-scraping, ctf, scanner, redteam, blueteam, Cybersecurity, security, nickname, ip, geo, police, Parser, scraping, geocoder, username
Python 3.45 k
6 days ago