🤖/👨🦰 Detect bots/crawlers/spiders using the user agent string
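The usual technique, independent of any particular library, is to match the User-Agent header against known bot tokens. A minimal sketch in Python; the pattern list below is illustrative only and not the library's actual ruleset:

```python
import re

# Illustrative subset of tokens that commonly appear in bot/crawler user agents.
BOT_PATTERN = re.compile(
    r"bot|crawler|spider|crawling|slurp|headless|wget|curl|python-requests",
    re.IGNORECASE,
)

def is_bot(user_agent: str) -> bool:
    """Return True if the user agent string looks like an automated client."""
    return bool(user_agent) and BOT_PATTERN.search(user_agent) is not None

print(is_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # True
print(is_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"))              # False
```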
#Web Crawler#An R web crawler and scraper
#Web Crawler#Open source SEO audit tool.
#Search#Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem into various data repositories such as search engines.
Astray is a Lua-based maze, room, and dungeon generation library for dungeon crawlers and roguelike video games
#Web Crawler#Spiderbuf is a website dedicated to Python web scraping practice, offering extensive scraping tutorials, worked case studies, and practice exercises. It provides intensive crawler development training: you sharpen your skills in the back-and-forth between scraping and anti-scraping, and master common crawling and anti-crawling techniques through large amounts of hands-on practice. Guided scraping cases plus free video tutorials let you take on crawling tasks level by level, building intuition and experience in crawler development and putting your scraping and anti-scraping skills to the test.
#Search#Simple robots.txt template. Keeps unwanted robots out (disallow) and whitelists (allows) legitimate user agents. Useful for all websites.
#Web Crawler#Vietnamese text data crawler scripts for various sites (including YouTube, Facebook, 4rum, news, ...)
#Web Crawler#hproxy - an asynchronous IP proxy pool for crawlers that aims to make getting a proxy as convenient as possible.
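For illustration, a minimal robots.txt along these lines; the allowed crawlers named here are examples, not the template's actual contents:

```
# Whitelist well-behaved search engine crawlers
User-agent: Googlebot
Disallow:

User-agent: Bingbot
Disallow:

# Everyone else is disallowed
User-agent: *
Disallow: /
```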
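The core idea of such a pool, validating many candidate proxies concurrently, can be sketched with aiohttp; the proxy addresses and test URL below are placeholders, and this is not hproxy's API:

```python
import asyncio
import aiohttp

TEST_URL = "https://httpbin.org/ip"  # placeholder endpoint used to probe a proxy

async def check_proxy(session: aiohttp.ClientSession, proxy: str) -> bool:
    """Return True if the proxy answers the probe within the timeout."""
    try:
        async with session.get(TEST_URL, proxy=proxy,
                               timeout=aiohttp.ClientTimeout(total=5)) as resp:
            return resp.status == 200
    except Exception:
        return False

async def filter_alive(proxies: list[str]) -> list[str]:
    """Check all candidate proxies concurrently and keep the working ones."""
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*(check_proxy(session, p) for p in proxies))
    return [p for p, ok in zip(proxies, results) if ok]

if __name__ == "__main__":
    candidates = ["http://127.0.0.1:8080", "http://127.0.0.1:8081"]  # hypothetical proxies
    print(asyncio.run(filter_alive(candidates)))
```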
Block crawlers and high-traffic users on your site by IP using Redis
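A common way to do this is a fixed-window request counter per IP stored in Redis, with requests over the threshold rejected. A minimal sketch with redis-py; the key prefix and limits are arbitrary assumptions, not the project's configuration:

```python
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

WINDOW_SECONDS = 60   # length of the counting window
MAX_REQUESTS = 100    # requests allowed per IP per window

def allow_request(ip: str) -> bool:
    """Increment the per-IP counter and decide whether to serve the request."""
    key = f"ratelimit:{ip}"
    count = r.incr(key)                # atomic increment; creates the key at 1
    if count == 1:
        r.expire(key, WINDOW_SECONDS)  # start the window on the first hit
    return count <= MAX_REQUESTS

# Example: reject the request (e.g. return HTTP 429) when this is False.
print(allow_request("203.0.113.7"))
```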
#Web Crawler#Raven is a powerful and customizable web crawler written in Go.
#Web Crawler#Tiny PHP script to crawl information about a specific application on the Google Play store.
#Web Crawler#Sneakpeek is a framework that helps you quickly and conveniently develop scrapers. It's the best choice for scrapers with specific, complex scraping logic that needs to run on a constant basis.
#Web Crawler#Serritor is an open-source web crawler framework built on Selenium and written in Java. It can be used to crawl dynamic web pages that require JavaScript to render their data.
User agent database in JSON format covering bots, crawlers, certain malware, automated software, scripts, and other uncommon user agents.
#Web Crawler#Python script to check whether an application responds differently when the request comes from a search engine's crawler.
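The underlying idea of driving a real browser so that JavaScript-rendered content becomes scrapable can be sketched with plain Selenium in Python; this shows the general pattern only, not Serritor's Java API, and the URL is a placeholder:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")   # run Chrome without a visible window

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com")    # placeholder URL
    # The browser has executed the page's JavaScript by now,
    # so page_source contains the rendered DOM rather than the raw HTML.
    html = driver.page_source
    print(len(html), "characters of rendered HTML")
finally:
    driver.quit()
```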
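The check itself amounts to fetching the same URL with a regular browser user agent and with a search engine user agent, then comparing the responses. A minimal sketch with requests; the script's real heuristics are not documented here, so this only compares status codes and body length:

```python
import requests

BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def fetch(url: str, user_agent: str) -> requests.Response:
    """Fetch the URL while presenting the given user agent."""
    return requests.get(url, headers={"User-Agent": user_agent}, timeout=10)

def differs(url: str) -> bool:
    """Report whether the site serves something different to a crawler UA."""
    normal = fetch(url, BROWSER_UA)
    crawler = fetch(url, CRAWLER_UA)
    return (normal.status_code != crawler.status_code
            or abs(len(normal.content) - len(crawler.content)) > 500)

if __name__ == "__main__":
    print(differs("https://example.com"))  # placeholder target
```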