Four word embedding models implemented in Python, supporting arbitrary context features.
#Natural Language Processing# A Lite BERT for Self-Supervised Learning of Language Representations
Touch typing trainer using N-grams as data source, with options to customize the auto-generated lessons and specify the minimum typing performance needed. There are sound/color effects as well.
#Natural Language Processing# Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dy...
Cluster and merge similar string values: an R implementation of OpenRefine clustering algorithms
#Natural Language Processing# Python implementation of an N-gram language model with Laplace smoothing and sentence generation.
A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard si...
#Search# Top-k Approximate String Matching.
#Large Language Models# Cleaning and quality evaluation of Chinese corpora for large model pre-training
#Natural Language Processing# Chinese word segmentation implemented with traditional methods (N-gram, HMM, etc.), neural network methods (CNN, LSTM, etc.), and pre-trained methods (BERT, etc.)
Create n-grams of wordlists based on words, characters, or charsets to use in offline password attacks and data analysis
#Natural Language Processing# Calculating n-grams with PySpark for Wikipedia text
multiprocess unsupervised chinese_detect_words ngram_combination
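
Several entries above mention n-grams and Laplace smoothing. As a rough illustration of those two ideas only, here is a minimal sketch in plain Python. It is not taken from any of the listed projects; the function names `ngrams` and `laplace_bigram_prob` and the sample sentence are hypothetical and chosen for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-token tuples found in `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def laplace_bigram_prob(tokens, prev, word):
    """Add-one (Laplace) estimate of P(word | prev) from a token list.

    Hypothetical helper for illustration; the vocabulary is simply the
    set of tokens seen in the input.
    """
    vocab = set(tokens)
    bigrams = ngrams(tokens, 2)
    bigram_counts = Counter(bigrams)
    # Count `prev` only where it starts a bigram, so that the smoothed
    # probabilities over the vocabulary sum to 1 for each context.
    context_counts = Counter(first for first, _ in bigrams)
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 2))
print(laplace_bigram_prob(tokens, "the", "cat"))  # seen bigram
print(laplace_bigram_prob(tokens, "the", "sat"))  # unseen bigram, still non-zero
```

Add-one smoothing gives every unseen bigram a small non-zero probability, which is what lets a model of this kind score or generate word sequences it never observed in training.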