web-crawling · GitHub Topics

#Web crawler#Crawlee: a web scraping and browser automation library for Node.js development

web-scraping web-crawling npm headless-chrome Puppeteer automation apify scraping crawling crawler headless scraper web-crawler JavaScript Node.js Playwright TypeScript

TypeScript 18.23 k

1 day ago

apify / crawlee-python

#网络爬虫#Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...

apify automation beautifulsoup crawler crawling headless headless-chrome pip Playwright Python scraper scraping web-crawler web-crawling web-scraping Hacktoberfest

Python 5.79 k

12 hours ago

omkarcloud / botasaurus

The All in One Framework to Build Undefeatable Scrapers

anti-bot anti-detection cloudflare-bypass cloudflare-scrape antidetect-browser undetected bypass-cloudflare scraping-framework scraping-tool undetectable web-scraping-python bot-detection scraping-python web-crawling python-scraper

Python 2.04 k

1 month ago

platonai / PulsarRPA

#LLM#PulsarRPA: An AI-Enabled, Super-Fast, Thread-Safe Browser Automation Solution! 💖

ai-agents browser-automation llm browser-use dom-api dom-manipulation rpa web-crawler web-crawling web-scraper web-scraping

Kotlin 892

10 days ago

brightdata / brightdata-mcp

#Web crawler#A powerful Model Context Protocol (MCP) server that provides an all-in-one solution for public web access.

llm mcp modelcontextprotocol scraping ai-agents browser-automation data-collection data-extraction mcp-server structured-data web-crawling web-scraping

JavaScript 862

2 days ago

cxcscmu / Craw4LLM

#Web crawler#Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

crawler crawling large-language-models llm pre-training pretraining web-crawler web-crawling

Python 631

4 months ago

scrapehero-code / amazon-scraper

A simple web scraper to extract Product Data and Pricing from Amazon

web-scraping web-crawling

Python 393

2 years ago

crwlrsoft / crawler

#Web crawler#Library for Rapid (Web) Crawler and Scraper Development

crawling PHP scraper scraping scraping-websites web-crawler web-crawling web-scraping Hacktoberfest crawler web-scraper

PHP 365

1 month ago

godkingjay / selenium-twitter-scraper

#Web crawler#This is a Twitter scraper that uses Selenium to scrape tweets. It can scrape tweets from the home timeline, user profiles, hashtags, queries or searches, and advanced searches.

scraper X (Twitter) web-crawling Hacktoberfest hacktoberfest-accepted collaborate Selenium

Jupyter Notebook 276

3 months ago

spyboy-productions / omnisci3nt

Omnisci3nt – See What They’ve Tried to Hide. Extract deep intelligence from any domain. From subdomains to SSL certs, archived secrets to exposed ports, Omnisci3nt gives you the full picture in second...

ip-lookup port-scanning ssl-certificate subdomain-enumeration web-crawling web-reconnaissance whois OSINT pentesting-tools

Python 265

3 months ago

jrbadiabo / Bet-on-Sibyl

#Algorithm practice#Machine Learning Model for Sport Predictions (Football, Basketball, Baseball, Hockey, Soccer & Tennis)

machine-learning sports-stats algorithm Selenium beautifulsoup Python scikit-learn web-scraping web-crawling

Jupyter Notebook 264

8 years ago

TurnerSoftware / InfinityCrawler

#Web crawler#A simple but powerful web crawler library for .NET

crawler web-crawler web-crawling robots-txt spider

C# 253

2 years ago

ayakashi-io / ayakashi

⚡ Ayakashi.io - The next generation web scraping framework

web-scraping automation headless-chrome data-mining web-crawling

TypeScript 214

2 years ago

serpapi / clauneck

A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.

automation cli command-line-tool data-extraction data-extractor email email-extract-with-proxy email-extraction email-extractor email-marketing Open Source Ruby rubygem web-crawling webscraping

Ruby 181

1 year ago

scrapinghub / scrapy-training

Scrapy Training companion code

scrapy Python training web-scraping web-crawling

Python 174

6 years ago

brianmadden / krawler

A web crawling framework written in Kotlin

webcrawler Kotlin framework link-checker web-crawler web-crawling

Kotlin 130

4 years ago

leogregianin / bancocentralbrasil

💵 💰 🇧🇷 Information on the official daily rates for inflation, Selic, savings (Poupança), Dollar, Dollar PTAX, Euro, and Euro PTAX from the Banco Central do Brasil website

money web-scraping web-crawling brasil brazil

Python 125

4 years ago