web-scraping · GitHub Topics

scrapy / scrapy

#Crawler framework# A popular, efficient Python crawling framework with a rich ecosystem

Python scraping crawling framework crawler Hacktoberfest web-scraping web-scraping-python

Python 58.27 k

4 days ago

Mintplex-Labs / anything-llm

#Large language models# The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.

rag lmstudio localai vector-database ollama local-llm llama3 large-language-models ai-agents multimodal custom-ai-agents deepseek mcp mcp-servers no-code qwen3 web-scraping kimi moonshot

JavaScript 49.18 k

4 hours ago

dgtlmoon / changedetection.io

changedetection.io is a tool for monitoring changes to web page content, with notifications sent via API, email, messaging, and other channels

Python 26.58 k

14 hours ago

ScrapeGraphAI / Scrapegraph-ai

#Web crawler# Python scraper based on AI

scraping scraping-python automated-scraper large-language-models artificial-intelligence web-crawler web-scraping ai-scraping crawler html-to-markdown Markdown rag

Python 21.33 k

1 month ago

apify / crawlee

#Web crawler# Crawlee - a web scraping and browser automation library for Node.js

web-scraping web-crawling npm headless-chrome Puppeteer automation apify scraping crawling crawler headless scraper web-crawler JavaScript Node.js Playwright TypeScript

TypeScript 19.51 k

1 day ago

Evil0ctal / Douyin_TikTok_Download_API

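#Web crawler# 🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.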

Python 14.31 k

6 months ago

getmaxun / maxun

#Web crawler# A visual scraping platform that collects data through point-and-click

automation no-code scraper web-automation web-scraper web-scraping API browser browser-automation Playwright self-hosted website-to-api robotic-process-automation rpa no-code-web-scraper agents data-extraction webscraping

TypeScript 13.62 k

11 hours ago

seleniumbase / SeleniumBase

SeleniumBase is a Python browser automation library for web automation, testing, and CAPTCHA bypassing

Python 11.64 k

2 hours ago

mherrmann / helium

helium is a Python library for automating browsers such as Chrome and Firefox

Selenium selenium-python Python webdriver Chrome Firefox web-automation web-scraping helium

Python 8.02 k

5 months ago

lorien / awesome-web-scraping

#Web crawler# List of libraries, tools and APIs for web scraping and data processing.

web-scraping captcha-recaptcha crawling crawling-python scraping scraping-framework scraping-python scraping-tool webscraping crawler spider

Makefile 7.32 k

9 months ago

D4Vinci / Scrapling

#Web crawler# 🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

crawler crawling crawling-python Playwright Python scraping selectors stealth-game web-scraper web-scraping web-scraping-python webscraping xpath automation artificial-intelligence ai-scraping data data-extraction mcp mcp-server

Python 7.32 k

13 hours ago

alirezamika / autoscraper

#Web crawler# A Smart, Automatic, Fast and Lightweight Web Scraper for Python

scraping scraper scrape webscraping crawler web-scraping artificial-intelligence Python webautomation automation machine-learning

Python 6.93 k

3 months ago

apify / crawlee-python

#Web crawler# Crawlee - A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...

apify automation beautifulsoup crawler crawling headless headless-chrome pip Playwright Python scraper scraping web-crawler web-crawling web-scraping Hacktoberfest

Python 6.31 k

9 hours ago

go-rod / rod

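#Web crawler# Rod is a high-level driver built directly on the DevTools Protocol. It is designed for web automation and scraping, for both high-level and low-level use. Senior developers can use the low-level packages and functions to easily customize or build their own version of Rod; the high-level functions are just examples of how to build a default version of Rod.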

cdp chrome-headless chrome-devtools chrome-devtools-protocol headless web-scraping automation scraper devtools devtools-protocol rod Go Testing Web gorod crawling

Go 6.26 k

17 hours ago

autoscrape-labs / pydoll

#Web crawler# Pydoll is a library for automating Chromium-based browsers without a WebDriver, offering realistic interactions.

cdp Chromium Playwright Puppeteer Selenium webdriver browser-automation anti-detection automation crawler e2e-tests headless scraping Testing fingerprinting web-scraping

Python 5.25 k

22 days ago

adbar / trafilatura

#Web crawler# Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

web-scraping text-extraction natural-language-processing text-mining crawler text-preprocessing article-extractor readability scraping html-to-markdown corpus-tools rss-feed news-aggregator rag large-language-models

Python 4.68 k

6 days ago

firecrawl / firecrawl-mcp-server

🔥 Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude and any other LLM clients.

batch-processing claude content-extraction data-collection firecrawl firecrawl-ai llm-tools mcp-server model-context-protocol search-api web-crawler web-scraping javascript-rendering mcp

JavaScript 4.54 k

1 day ago

jaypyles / Scraperr

#Web crawler# Self-hosted web scraper.

Open Source self-hosted webscraper Docker helm Kubernetes Playwright Python scraping web-scraper web-scrapers web-scraping webscraping

TypeScript 4.27 k

2 months ago

lexiforest / curl_cffi

Python binding for the curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.

cURL http-client curl-impersonate HTTP ja3 ja3-fingerprint tls-fingerprint fingerprinting web-scraping akamai-fingerprint

Python 4.18 k

12 hours ago

snooppr / snoop

#Web crawler# Snoop: an intelligence-gathering tool based on open data (OSINT world)

OSINT Termux username-search username-checker pentest web-scraping ctf scanner redteam blueteam Cybersecurity security nickname ip geo police Parser scraping geocoder username

Python 3.45 k

6 days ago