Ethereum Token Contracts
Token-based AngularJS Authentication
#Natural Language Processing#CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
A comprehensive deep dive into the world of tokens
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
Robust and fast tokenizations alignment library for Rust and Python https://tamuhey.github.io/tokenizations/
#Natural Language Processing#Fast, Consistent Tokenization of Natural Language Text
High speed text tokenization for Ruby
#Natural Language Processing#Implementation of the Chinese word segmentation task using traditional methods (N-gram, HMM, etc.), neural network methods (CNN, LSTM, etc.), and pre-trained methods (BERT, etc.)
#Computer Science#A Thai word tokenization library using a Deep Neural Network
#Large Language Model#[ECCV 2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Data cleaning, Tokenization, Regular Expressions and Pandas guide.
#Natural Language Processing#Fast and customizable text tokenization library with BPE and SentencePiece support
#Natural Language Processing#Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification
[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.
Isomorphic utilities for GPT-3 tokenization and prompt building.
This is a Java version of the Chinese tokenization described in BERT.