Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
#Natural Language Processing#Unsupervised text tokenizer focused on computational efficiency
#Natural Language Processing#Fast and customizable text tokenization library with BPE and SentencePiece support
Ready-made tokenizer library for working with GPT and tiktoken
#Natural Language Processing#Explains NLP building blocks in a simple manner.
nfelib - Python bindings for reading and managing XML for NF-e, national NFS-e, CT-e, MDF-e, BP-e
#Computer Science#Machine Learning for Phishing Website Detection
Subword Encoding in Lattice LSTM for Chinese Word Segmentation
Simple-to-use scoring function for arbitrarily tokenized texts.
#Large Language Models#GPT-3 encoder & decoder tool written in Swift
High performance unsupervised text tokenization for Ruby
Learning BPE embeddings by first learning a segmentation model and then training word2vec
Sentiment-based classification of stock article titles using PhoBERT