webscraping · GitHub Topics

firecrawl / firecrawl

#Web crawler# Firecrawl is an API service that crawls a URL and converts it into clean markdown or structured data

AI crawler data Markdown scraper html-to-markdown LLM rag scraping web-crawler ai-scraping webscraping

TypeScript 58.17 k

2 hours ago

huginn / huginn

#Automation# Your agents, standing by. Huginn is a web platform for building automated tasks.

automation notifications scraper webscraping feedgenerator RSS agent monitoring feed twitter-streaming huginn X (Twitter)

Ruby 47.45 k

4 hours ago

assafelovic / gpt-researcher

LLM based autonomous agent that conducts deep local and web research on any topic and generates a long report with citations.

AI Python agent automation research search webscraping LLM deepresearch mcp mcp-server

Python 23.51 k

1 day ago

getmaxun / maxun

#Web crawler# A visual crawler platform that collects data through point-and-click

automation no-code scraper web-automation web-scraper web-scraping API browser browser-automation Playwright self-hosted website-to-api robotic-process-automation rpa no-code-web-scraper agents data-extraction webscraping

TypeScript 13.62 k

2 days ago

pystardust / ani-cli

A cli tool to browse and play anime

Shell CLI Anime posix steamdeck Termux webscraping fzf Linux macOS rofi terminal Windows

Shell 9.78 k

3 days ago

lorien / awesome-web-scraping

#Web crawler# List of libraries, tools and APIs for web scraping and data processing.

web-scraping captcha-recaptcha crawling crawling-python scraping scraping-framework scraping-python scraping-tool webscraping crawler spider

Makefile 7.32 k

9 months ago

D4Vinci / Scrapling

#Web crawler# 🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

crawler crawling crawling-python Playwright Python scraping selectors stealth-game web-scraper web-scraping web-scraping-python webscraping xpath automation AI ai-scraping data data-extraction mcp mcp-server

Python 7.31 k

16 hours ago

alirezamika / autoscraper

#Web crawler# A Smart, Automatic, Fast and Lightweight Web Scraper for Python

scraping scraper scrape webscraping crawler web-scraping AI Python webautomation automation machine-learning

Python 6.93 k

3 months ago

niespodd / browser-fingerprinting

#Web crawler# Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot systems 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?

Bot detection Chromium stealth-game Puppeteer scraper webscraping Web automation chromium-browser bot-detection chromedriver fingerprinting crawler recaptcha spider browser-fingerprinting

JavaScript 4.86 k

1 year ago

jaypyles / Scraperr

#Web crawler# Self-hosted webscraper.

Open Source self-hosted webscraper Docker helm Kubernetes Playwright Python scraping web-scraper web-scrapers web-scraping webscraping

TypeScript 4.27 k

2 months ago

daijro / camoufox

#Web crawler# 🦊 Anti-detect browser

antidetect antidetect-browser fingerprint Firefox Playwright webscraping Network scraping

C++ 3.34 k

6 months ago

scrapoxy / scrapoxy

Scrapoxy is a super proxies manager that orchestrates all your proxies into one place, rather than spreading management across multiple scrapers. It manages IP rotation and fingerprinting, and smartly...

antibot proxies webscraping

TypeScript 2.36 k

1 month ago