corpus-processing · GitHub Topics

An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation

corpus corpus-linguistics corpus-tools corpus-processing literature translation Parsing tagger lemmatizer dependency-parser

Python 716

17 days ago

bitextor / bitextor

#Web crawler# Bitextor generates translation memories from multilingual websites

dictionaries crawler wget Parsing warc corpus-tools corpus-processing machine-translation neural-machine-translation statistical-machine-translation

Python 292

5 months ago

hankcs / TreebankPreprocessing

#Natural language processing# Python scripts preprocessing Penn Treebank and Chinese Treebank

natural-language-processing corpus-processing

Python 161

5 years ago

Helsinki-NLP / OpusFilter

#Natural language processing# OpusFilter - Parallel corpus processing toolkit

corpus-tools corpus-processing natural-language-processing machine-translation

Python 104

18 days ago

NathanDuran / Switchboard-Corpus

Utilities for Processing the Switchboard Dialogue Act Corpus

corpus corpus-processing corpus-data corpus-tools dialogue

Python 68

4 years ago

OHNLP / MedTator

#Natural language processing# A Serverless Text Annotation Tool for Corpus Development

corpus-processing natural-language-processing Serverless

JavaScript 55

2 months ago

johentsch / ms3

A parser for annotated MuseScore 3 files.

corpus corpus-data corpus-processing corpus-tools musescore Parser sheet-music tsv xml-parser xml-parser-library xml-parsing

Python 47

19 days ago

uma-pi1 / OPIEC

#Natural language processing# Reading the data from OPIEC - an Open Information Extraction corpus

information-extraction corpus corpus-data corpus-tools natural-language-processing natural-language-understanding wikipedia Wiki corpus-processing dataset

Java 37

6 years ago

NathanDuran / MRDA-Corpus

Utilities for Processing the Meeting Recorder Dialogue Act Corpus

corpus corpus-data corpus-processing corpus-tools dialogue

Python 32

4 years ago

versotym / rhymetagger

A simple collocation-driven recognition of rhymes. Contains pre-trained models for Czech, Dutch, English, French, German, Russian, and Spanish poetry

corpus-processing language-processing

Python 29

3 years ago

notesjor / corpusexplorer2.0

#Natural language processing# Corpus linguistics has never been so easy...

corpus-linguistics data-science text-mining text-processing text-analysis natural-language-processing data-mining SDK corpus-processing natural-language-understanding big-data tagger visualization journalism datajournalism

C# 23

1 month ago

jaytimm / corpuslingr

A library of functions enabling complex corpus search in context (KWIC), search aggregation, bag-of-words building & keyphrase extraction.

corpus-tools corpus-processing

R 20

6 years ago

zgornel / StringAnalysis.jl

Hard-Forked from JuliaText/TextAnalysis.jl

corpus-processing text-processing text-analysis

Julia 17

2 years ago

Bibliome / alvisnlp

#Natural language processing# ALvisNLP corpus processing engine

natural-language-processing pipeline corpus-processing workflow Java workflow-engine machine-learning

Java 17

5 months ago

uma-pi1 / OPIEC-pipeline

#Natural language processing#

text-processing corpus-data corpus-tools corpus-linguistics corpus-processing wikipedia Wiki information-extraction big-data bigdata natural-language-processing natural-language-understanding

Java 14

3 years ago

jonathandunn / corpus_similarity

#Natural language processing# Measure the similarity of text corpora for 74 languages

corpus text corpus-linguistics corpus-processing corpus-tools language natural-language-processing

Python 13

1 year ago

kennedyCzar / NLP-PROJECT-BOOK-INSIGHTS-WITH-PLOTLY

#Natural language processing# Plotly-Dash NLP project. Document similarity measure using Latent Dirichlet Allocation, principal component analysis and finally follow with KMeans clustering. Project is completed with dynamic visual...

pca plotly-dash natural-language-processing corpus-processing dash plotly plotly-python callbacks

Python 12

3 years ago

jonathandunn / common_crawl_corpus

Scripts for building a geo-located web corpus using Common Crawl data

corpus-linguistics corpus-processing corpus-tools web-crawling

Python 11

11 days ago

felipetovarhenao / exquisitecorpus

A set of corpus-based sampling & analysis M4L devices

maxforlive corpus-processing sampling

Max 11

3 years ago

Linguista / CQPweb-Instabox

Script that sets up and configures an entire CQPweb server installation

corpus-linguistics corpus-tools corpus-processing cqp

Shell 11

5 years ago