#Natural Language Processing# 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
high performance tokenizer for Vietnamese language
The community maintained Solana token registry
A small library for converting tokenized PHP source code into XML (and potentially other formats)
Fast and customizable text tokenization library with BPE and SentencePiece support
Ethereum Token Contracts
Token-based AngularJS Authentication
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
Universal cross-platform tokenizers binding to HF and sentencepiece
A library that helps tokenize text using TextMate grammars.
A wrapper around the stdlib `tokenize` which roundtrips.
An Arduino library to tokenize and parse commands received over a serial port.
A Wiring/Arduino library to tokenize and parse commands received over a serial port.
Tokenize tweets to determine net sentiment and location, and generate visualizations of mean sentiment by state
Collect news feeds from RSS and tokenize them in preparation for textual statistical analysis.
Twitter Text Libraries. This code is used at Twitter to tokenize and parse text to meet the expectations for what can be used on the platform.
There will be 2 projects: an interactive Arduino interpreter, and one that is not interactive but uses flash memory instead of RAM. I would appreciate it if someone helped tokenize the code. The code is ...