text-mining · GitHub Topics

#Natural Language Processing# 📖 A curated list of resources dedicated to Natural Language Processing (NLP)

natural-language-processing deep-learning machine-learning language Awesome Lists text-mining

17.09 k

1 year ago

#Web Crawler# Python and command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

web-scraping text-extraction natural-language-processing text-mining crawler text-preprocessing article-extractor readability scraping html-to-markdown corpus-tools rss-feed news-aggregator rag large-language-models

Python 4.12 k

1 month ago

deanmalmgren / textract

#Natural Language Processing# extract text from any document. no muss. no fuss.

Python natural-language-processing data-mining text-mining

HTML 4.06 k

4 months ago

jbesomi / texthero

#Natural Language Processing# Text preprocessing, representation and visualization from zero to hero.

text-preprocessing text-representation text-visualization natural-language-processing word-embeddings machine-learning text-mining nlp-pipeline text-clustering

Python 2.9 k

2 years ago

JasonKessler / scattertext

#Natural Language Processing# Beautiful visualizations of how language differs among document types.

natural-language-processing d3 word-embeddings machine-learning visualization word2vec text-visualization text-mining japanese-language computational-social-science sentiment eda exploratory-data-analysis scatter-plot topic-modeling

Python 2.29 k

7 months ago

chiphuyen / lazynlp

#Natural Language Processing# Library to scrape and clean web pages to create massive datasets.

artificial-intelligence natural-language-processing text-mining language-model Python open data-science

Python 2.18 k

4 years ago

ujjwalkarn / DataScienceR

A curated list of R tutorials for Data Science, NLP and Machine Learning

datascience data-science R text-mining

R 2.04 k

2 years ago

konlpy / konlpy

#Natural Language Processing# Python package for Korean natural language processing.

Python natural-language-processing text-mining korean Hacktoberfest

Python 1.44 k

2 years ago

juliasilge / tidy-text-mining

Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson

book text-mining tidyverse bookdown R

TeX 1.34 k

6 days ago

juliasilge / tidytext

#Natural Language Processing# Text mining using tidy tools ✨📄✨

text-mining R tidyverse natural-language-processing

R 1.19 k

1 year ago

shangjingbo1226 / AutoPhrase

AutoPhrase: Automated Phrase Mining from Massive Text Corpora

text-mining multi-language automatic phrase

C++ 1.18 k

3 years ago

kavgan / nlp-in-practice

#Natural Language Processing# Starter code to solve real-world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre...

natural-language-processing word2vec text-classification gensim machine-learning text-mining

Jupyter Notebook 1.17 k

4 years ago

DemonDamon / FinnewsHunter

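#Computer Science# Crawls historical news text about listed companies (individual stocks) from Sina Finance, National Business Daily (NBD), JRJ, China Securities Network, and Securities Times, runs text analysis and extracts feature sets, trains classifiers such as SVM and random forest, and finally uses them to classify newly crawled news data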

machine-learning text-mining

Python 1.11 k

4 months ago

csurfer / rake-nltk

#Algorithm Practice# Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

nltk algorithm Python text-mining keyword-extraction

Python 1.07 k

2 years ago

opensemanticsearch / open-semantic-search

#Search# Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document...