ngram · GitHub Topics

zhezhaoa / ngram2vec

Four word embedding models implemented in Python, supporting arbitrary context features.

ngram word2vec embedding chinese glove svd word word-embedding

Python 849

6 years ago

lonePatient / albert_pytorch

#Natural Language Processing# A Lite Bert For Self-Supervised Learning Language Representations

albert bert PyTorch ngram mask natural-language-processing language-model

Python 716

5 years ago

wintermute-cell / ngrrram

A TUI tool to help you type faster and learn new layouts. Includes a free cat.

cat command-line-interface colemak dvorak layout ngram Rust touchtyping tui typing

Rust 662

5 months ago

ranelpadon / ngram-type

Touch typing trainer using N-grams as data source, with options to customize the auto-generated lessons and specify the minimum typing performance needed. There are sound/color effects as well.

ngram colemak dvorak Vue.js monkeytype

JavaScript 225

8 months ago

lonePatient / daguan_2019_rank9

datagrand 2019 information extraction competition rank9

bert ie information-extraction ner lstm crf span dropout lookahead PyTorch ngram

Python 130

5 years ago

proycon / colibri-core

#Natural Language Processing# Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e., patterns with one or more gaps, either of fixed or dy...

C++ Python natural-language-processing ngrams skipgram ngram corpus Library text-processing computational-linguistics pattern-recognition

C++ 126

4 months ago

ChrisMuir / refinr

Cluster and merge similar string values: an R implementation of the OpenRefine clustering algorithms

openrefine fuzzy-matching ngram approximate-string-matching data-cleaning clustering R rstats

C++ 104

1 year ago

joshualoehr / ngram-language-model

#Natural Language Processing# Python implementation of an N-gram language model with Laplace smoothing and sentence generation.

ngram perplexity natural-language-processing language-model Python ngrams language-models

Python 83

7 years ago
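
For readers unfamiliar with the technique named in this entry, the following is a minimal sketch of add-one (Laplace) smoothing for a bigram model. It is not taken from the joshualoehr / ngram-language-model repository; the function names `train_bigram` and `laplace_prob` and the toy sentence are assumptions made for illustration only.

```python
from collections import Counter

def train_bigram(tokens):
    """Collect the vocabulary, context counts, and bigram counts from a token list."""
    vocab = set(tokens)
    # Count each token only where it is followed by another token.
    context_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return vocab, context_counts, bigram_counts

def laplace_prob(prev, word, vocab, context_counts, bigram_counts):
    """Estimate P(word | prev) with add-one smoothing over the vocabulary."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))

# Hypothetical toy corpus, whitespace-tokenized.
tokens = "the cat sat on the mat".split()
vocab, context_counts, bigram_counts = train_bigram(tokens)

p_seen = laplace_prob("the", "cat", vocab, context_counts, bigram_counts)
p_unseen = laplace_prob("the", "sat", vocab, context_counts, bigram_counts)
```

Adding one to every bigram count gives unseen pairs such as ("the", "sat") a small non-zero probability, while the conditional distribution for each context still sums to 1 over the vocabulary.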

words / n-gram

Get n-grams from text

ngram unigram

JavaScript 79

2 years ago

vickumar1981 / stringdistance

A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard si...

levenshtein-distance levenshtein ngram jaro jaro-winkler dice-coefficient hamming-distance string-similarity cosine-similarity fuzzy-matching Hacktoberfest

Scala 78

3 years ago

wrathematics / ngram

Fast n-Gram Tokenization

R ngram text text-mining

C 71

1 year ago

suggest-go / suggest

#Search# Top-k Approximate String Matching.

golang-library ngram fuzzy-search search-engine language-model spellchecker autocomplete

Go 67

3 years ago

jiangnanboy / llm_corpus_quality

#Large Language Models# Cleaning and quality evaluation of Chinese corpora for large-model pre-training

Java large-language-models ngram

Java 57

9 months ago

BitSpeech / SRILM

Mirror of SRILM

language-model ngram

Roff 55

5 years ago

myazi / NLP

natural language processing

ngram crf

C++ 36

7 years ago

JackHCC / Chinese-Tokenization

#Natural Language Processing# Chinese word segmentation implemented with traditional methods (n-gram, HMM, etc.), neural network methods (CNN, LSTM, etc.), and pre-trained methods (BERT, etc.)

hmm-viterbi-algorithm ngram natural-language-processing tokenization

Python 32

3 years ago