#Natural Language Processing#📖 A curated list of resources dedicated to Natural Language Processing (NLP)
#Web Crawler#Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
#Natural Language Processing#extract text from any document. no muss. no fuss.
#Natural Language Processing#Text preprocessing, representation and visualization from zero to hero.
#Natural Language Processing#Beautiful visualizations of how language differs among document types.
#Natural Language Processing#Library to scrape and clean web pages to create massive datasets.
A curated list of R tutorials for Data Science, NLP and Machine Learning
#Natural Language Processing#Python package for Korean natural language processing.
Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson
AutoPhrase: Automated Phrase Mining from Massive Text Corpora
#Natural Language Processing#Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre...
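#Computer Science#Crawls historical news text for listed companies (individual stocks) from Sina Finance, NBD (每经网), JRJ (金融界), China Securities Network (中国证券网), and Securities Times (证券时报网), performs text analysis and extracts feature sets, trains classifiers such as SVM and random forest, and finally classifies and predicts on newly crawled news data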
#Algorithm Practice#Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.
#Search#Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document...
#Web Crawler#A configurable web spider with an easy-to-use web console
#Natural Language Processing#A collection of notebooks for Natural Language Processing from NLP Town
#Natural Language Processing#Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
#Computer Science#Fast topic modeling platform
#Awesome#A list of awesome resources for Computational Social Science