GitHub Chinese Community
web-crawling

apify / crawlee

#Web Crawler# Crawlee - a web crawling and browser automation library for Node.js

web-scraping, web-crawling, npm, headless-chrome, Puppeteer, automation, apify, scraping, crawling, crawler, headless, scraper, web-crawler, JavaScript, Node.js, Playwright, TypeScript
TypeScript 18.23 k
1 day ago
apify / crawlee-python

#Web Crawler# Crawlee: A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...

apify, automation, beautifulsoup, crawler, crawling, headless, headless-chrome, pip, Playwright, Python, scraper, scraping, web-crawler, web-crawling, web-scraping, Hacktoberfest
Python 5.79 k
12 hours ago
omkarcloud / botasaurus

The All in One Framework to Build Undefeatable Scrapers

anti-bot, anti-detection, cloudflare-bypass, cloudflare-scrape, antidetect-browser, undetected, bypass-cloudflare, scraping-framework, scraping-tool, undetectable, web-scraping-python, bot-detection, scraping-python, web-crawling, python-scraper
Python 2.04 k
1 month ago
platonai / PulsarRPA

#LLM# PulsarRPA: An AI-Enabled, Super-Fast, Thread-Safe Browser Automation Solution! 💖

ai-agents, browser-automation, LLM, browser-use, dom-api, dom-manipulation, rpa, web-crawler, web-crawling, web-scraper, web-scraping
Kotlin 892
10 days ago
brightdata / brightdata-mcp

#Web Crawler# A powerful Model Context Protocol (MCP) server that provides an all-in-one solution for public web access.

LLM, mcp, modelcontextprotocol, scraping, ai-agents, browser-automation, data-collection, data-extraction, mcp-server, structured-data, web-crawling, web-scraping
JavaScript 862
2 days ago
cxcscmu / Craw4LLM

#Web Crawler# Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

crawler, crawling, large-language-models, LLM, pre-training, pretraining, web-crawler, web-crawling
Python 631
4 months ago
scrapehero-code / amazon-scraper

A simple web scraper to extract Product Data and Pricing from Amazon

web-scraping, web-crawling
Python 393
2 years ago
crwlrsoft / crawler

#Web Crawler# Library for Rapid (Web) Crawler and Scraper Development

crawling, PHP, scraper, scraping, scraping-websites, web-crawler, web-crawling, web-scraping, Hacktoberfest, crawler, web-scraper
PHP 365
1 month ago
godkingjay / selenium-twitter-scraper

#Web Crawler# This is a Twitter scraper that uses Selenium to scrape tweets. It can scrape tweets from the home timeline, user profiles, hashtags, queries or searches, and advanced searches.

scraper, X (Twitter), web-crawling, Hacktoberfest, hacktoberfest-accepted, collaborate, Selenium
Jupyter Notebook 276
3 months ago
spyboy-productions / omnisci3nt

Omnisci3nt: See What They’ve Tried to Hide. Extract deep intelligence from any domain. From subdomains to SSL certs, archived secrets to exposed ports, Omnisci3nt gives you the full picture in second...

ip-lookup, port-scanning, ssl-certificate, subdomain-enumeration, web-crawling, web-reconnaissance, whois, OSINT, pentesting-tools
Python 265
3 months ago
jrbadiabo / Bet-on-Sibyl

#Algorithm Practice# Machine Learning Model for Sport Predictions (Football, Basketball, Baseball, Hockey, Soccer & Tennis)

machine learning, sports-stats, algorithm, Selenium, beautifulsoup, Python, scikit-learn, web-scraping, web-crawling
Jupyter Notebook 264
8 years ago
TurnerSoftware / InfinityCrawler

#Web Crawler# A simple but powerful web crawler library for .NET

crawler, web-crawler, web-crawling, robots-txt, spider
C# 253
2 years ago
ayakashi-io / ayakashi

⚡ Ayakashi.io - The next generation web scraping framework

web-scraping, automation, headless-chrome, data-mining, web-crawling
TypeScript 214
2 years ago
serpapi / clauneck

A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.

automation, command-line interface, command-line-tool, data-extraction, data-extractor, email, email-extract-with-proxy, email-extraction, email-extractor, email-marketing, Open Source, Ruby, rubygem, web-crawling, webscraping
Ruby 181
1 year ago
scrapinghub / scrapy-training

Scrapy Training companion code

scrapy, Python, training, web-scraping, web-crawling
Python 174
6 years ago
brianmadden / krawler

A web crawling framework written in Kotlin

webcrawler, Kotlin, framework, link-checker, web-crawler, web-crawling
Kotlin 130
4 years ago
leogregianin / bancocentralbrasil

💵 💰 🇧🇷 Information on official daily rates for inflation, Selic, savings (Poupança), dollar, PTAX dollar, euro, and PTAX euro, taken from the Banco Central do Brasil website

money, web-scraping, web-crawling, brasil, brazil
Python 125
4 years ago
my8100 / scrapyd-cluster-on-heroku

Set up a free and scalable Scrapyd cluster for distributed web crawling with just a few clicks.

scrapy, scrapyd, cluster, Heroku, Python, web-crawling, web-scraping
Python 123
5 years ago
MaxValue / Terpene-Profile-Parser-for-Cannabis-Strains

#Web Crawler# Parser and database to index the terpene profiles of different cannabis strains from online databases

data science, web-crawler, web-crawling, Python, plants, scrapy, health, crawler, Bioinformatics, analysis, database
Python 121
2 years ago
maxmindlin / scout-lang

#Web Crawler# A web crawling programming language

dsl, programming language, web-crawling, web-scraping, scraper, scraping, scraping-websites
Rust 113
1 year ago