#Android#Open-source real-time translation app for Android that runs locally
#NLP#Fast and customizable text tokenization library with BPE and SentencePiece support
#NLP#🌿 An easy-to-use Japanese text-processing tool that lets you switch tokenizers with minimal code changes.
Train a Chinese vocabulary with SentencePiece's BPE and use it in transformers.
#NLP#Free and open-source pretrained translation models for languages including Kurdish, Samoan, Xhosa, Lao, Corsican, Cebuano, Galician, Yiddish, Swahili, Russian, Belarusian and Yoruba.
#NLP#Minimal example of running a traced Hugging Face transformers model with LibTorch
#NLP#A Robustly Optimized BERT Pretraining Approach for Vietnamese
#LLM#Go implementation of the SentencePiece tokenizer
#NLP#R package for Byte Pair Encoding / Unigram modelling based on SentencePiece
Extremely simple, easy-to-understand GPT-2 implementation with minor tweaks
Learning BPE embeddings by first learning a segmentation model and then training word2vec
#NLP#Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tokenization in JavaScript, Python and Rust.
SentencePiece port to WebAssembly with browser compatibility
#NLP#BERT implementation in PyTorch
#NLP#Investigates various DNN text classifiers, including MLP, CNN, RNN and BERT approaches.
Use SentencePiece in Swift for tokenization and detokenization.
Bengali language tokenizer (SentencePiece)
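Several of the projects above are built around byte pair encoding (BPE). The following is a minimal pure-Python sketch of the classic BPE merge-learning loop, offered as a rough illustration. Each word starts as individual characters. The loop then repeatedly merges the most frequent adjacent symbol pair. Finally, new words are segmented by replaying the learned merges in order. This is not the implementation used by SentencePiece or by any project listed here. The function names (`merge_symbols`, `learn_bpe`, `segment`), the `</w>` end-of-word marker, the tie-breaking rule and the toy word counts are all illustrative assumptions. SentencePiece itself trains on raw text and marks whitespace with the `▁` meta-symbol instead.

```python
from collections import Counter


def merge_symbols(symbols, pair):
    """Return a new tuple with every adjacent occurrence of `pair` fused into one symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a {word: frequency} dict (hypothetical helper)."""
    # Each word starts as its characters plus an assumed end-of-word marker.
    vocab = Counter({tuple(w) + ("</w>",): f for w, f in word_freqs.items()})
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair; ties go to the lexicographically smallest pair.
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        # Apply the chosen merge to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_symbols(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    rules = learn_bpe(corpus, num_merges=5)
    print(rules)
    print(segment("lowest", rules))
```

Run as a script, it prints the five learned merges `('e', 's')`, `('es', 't')`, `('est', '</w>')`, `('l', 'o')`, `('lo', 'w')`. It then prints the segmentation `['low', 'est</w>']` for the unseen word `lowest`, which shows how BPE splits a new word into previously learned subword units.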