The community-maintained Solana token registry
Ethereum Token Contracts
Token-based AngularJS Authentication
CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
Blockchain coin and token profile collection
Robust and fast tokenizations alignment library for Rust and Python: https://tamuhey.github.io/tokenizations/
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
Fast n-Gram Tokenization
High speed text tokenization for Ruby
Chinese word segmentation implemented with traditional methods (n-gram, HMM, etc.), neural network methods (CNN, LSTM, etc.), and pre-trained methods (BERT, etc.).
Data cleaning, Tokenization, Regular Expressions and Pandas guide.
Fast and customizable text tokenization library with BPE and SentencePiece support
Isomorphic utilities for GPT-3 tokenization and prompt building.
This is a Java version of the Chinese tokenization described in BERT.
Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing
A full text search engine with tokenization, stemming, typo tolerance, filters and geo support based on only PHP and SQLite.
This repo contains the data preparation, tokenization, training and inference code for BLOOMChat. BLOOMChat is a 176 billion parameter multilingual chat model based on BLOOM.
Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashta...