scraping

#Web crawler# Firecrawl is an API service that crawls a URL and converts it into clean markdown or structured data

AI crawler Markdown scraper html-to-markdown LLM scraping web-crawler ai-scraping webscraping web-scraping web-data web-data-extraction ai-agents data-extraction ai-crawler ai-search web-scraper web-search

TypeScript 63 k

12 hours ago

scrapy / scrapy

#Crawler framework# A popular, efficient Python crawler framework with a rich ecosystem

Python scraping crawling framework crawler Hacktoberfest web-scraping web-scraping-python

Python 58.62 k

6 days ago

feder-cr / Jobs_Applier_AI_Agent_AIHawk

#Web crawler# AIHawk aims to ease the job hunt by automating the job application process. Using artificial intelligence, it enables users to apply for multiple jobs in a tailored way.

automation Bot ChatGPT gpt job jobsearch jobseeker opeai Python resume scraper scraping application-resume Selenium Chrome human-resources jobs agent AI

Python 28.98 k

5 months ago

gocolly / colly

#Crawler framework# A fast and elegant crawler framework for Golang

Go scraper framework crawler scraping crawling spider

Go 24.73 k

2 days ago

ScrapeGraphAI / Scrapegraph-ai

#Web crawler# Python scraper based on AI

scraping scraping-python automated-scraper LLM AI web-crawler web-scraping ai-scraping crawler html-to-markdown Markdown rag web-crawlers

Python 21.57 k

11 days ago

apify / crawlee

#Web crawler# Crawlee - a web scraping and browser automation library for Node.js

web-scraping web-crawling npm headless-chrome Puppeteer automation apify scraping crawling crawler headless scraper web-crawler JavaScript Node.js Playwright TypeScript

TypeScript 19.86 k

11 hours ago

soxoj / maigret

#Web crawler# Maigret is an OSINT username checker. Given a target username, it collects information about that user from major social networking sites. Forked from the open-source sherlock project.

OSINT social-network identification socmint sherlock investigation namechecker Python Open Source Cybersecurity scraping osint-python redteam blueteam osint-framework CLI reconnaissance pentesting

Python 17.74 k

18 hours ago

psf / requests-html

#Web crawler# Pythonic HTML Parsing for Humans™

HTML scraping Python requests HTTP kennethreitz lxml pyquery css-selectors beautifulsoup

Python 13.86 k

1 year ago

ultrafunkamsterdam / undetected-chromedriver

#Web crawler# Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)

chromedriver Selenium webdriver Chrome anti-detection anti-bot distil browser automation scraping Python captcha navigator Testing Cloudflare cloudflare-bypass bot-detection

Python 11.86 k

3 months ago

code4craft / webmagic

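#Web crawler# webmagic is an open-source vertical crawler framework for Java that aims to simplify crawler development so that developers can focus on business logic. Its core is very simple yet covers the entire crawling workflow, which also makes it good material for learning crawler development.

crawler Java scraping framework

Java 11.65 k

2 months ago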

D4Vinci / Scrapling

#Web crawler# 🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

crawler crawling crawling-python Playwright Python scraping selectors stealth-game web-scraper web-scraping web-scraping-python webscraping xpath automation AI ai-scraping data data-extraction mcp mcp-server

Python 7.47 k

2 days ago

lorien / awesome-web-scraping

#Web crawler# List of libraries, tools and APIs for web scraping and data processing.

web-scraping captcha-recaptcha crawling crawling-python scraping scraping-framework scraping-python scraping-tool webscraping crawler spider

Makefile 7.37 k

2 days ago

tabulapdf / tabula

#Web crawler# Tabula is a tool for liberating data tables trapped inside PDF files

pdf CSV excel tables scraping

CSS 7.22 k

7 months ago

alirezamika / autoscraper

#Web crawler# A Smart, Automatic, Fast and Lightweight Web Scraper for Python

scraping scraper scrape webscraping crawler web-scraping AI Python webautomation automation machine-learning

Python 6.99 k

4 months ago

apify / crawlee-python

#Web crawler# Crawlee - A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...