YAYI 2 is the new-generation open-source large language model developed by Zhongke Wenge (中科闻歌), pretrained on a high-quality, multilingual corpus of more than 2 trillion tokens. (Repo for YaYi 2 Chinese LLMs)
#Natural Language Processing# Foundation Architecture for (M)LLMs
#Natural Language Processing# A curated list of pretrained sentence and word embedding models
#Natural Language Processing# An optimized deep prompt tuning strategy comparable to fine-tuning across scales and tasks
#Natural Language Processing# A plug-and-play library for parameter-efficient tuning (Delta Tuning); see the minimal sketch after this list
#Natural Language Processing# Summarization Papers
#Natural Language Processing# Chinese Legal LLaMA (LLaMA for the Chinese legal domain)
word2vec, sentence2vec, machine reading comprehension, dialog systems, text classification, pretrained language models (e.g., XLNet, BERT, ELMo, GPT), sequence labeling, information retrieval, informati...
#Natural Language Processing# Code associated with the Don't Stop Pretraining ACL 2020 paper
#Natural Language Processing# Live Training for Open-source Big Models
#Data Warehouse# Papers and Datasets on Instruction Tuning and Following. ✨✨✨
ACL 2023: DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models
#Computer Science# MWPToolkit is an open-source framework for math word problem (MWP) solvers.
#Natural Language Processing# [NeurIPS 2023] Code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`.
Worth-reading papers and related resources on attention mechanisms, Transformers, and pretrained language models (PLMs) such as BERT.
#Natural Language Processing# EMNLP'23 survey: a curation of awesome papers and resources on refreshing large language models (LLMs) without expensive retraining.
#Natural Language Processing# [NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
[ACM Computing Surveys 2025] This repository collects awesome surveys, resources, and papers on Lifelong Learning with Large Language Models. (Updated regularly)
#Natural Language Processing# On Transferability of Prompt Tuning for Natural Language Processing
#Blockchain# BERT4ETH: A Pre-trained Transformer for Ethereum Fraud Detection (WWW 2023)
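Several entries above (the deep prompt tuning strategy, the plug-and-play delta-tuning library, and the prompt-tuning transferability paper) share one idea: freeze the pretrained backbone and train only a small set of injected parameters. Below is a minimal sketch of that pattern; it uses the Hugging Face `peft` library with LoRA adapters purely for illustration, not the listed repos' own APIs, and the checkpoint name and hyperparameters are assumptions.

```python
# Minimal parameter-efficient tuning sketch (illustrative; not the API of the repos listed above).
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

# Any encoder checkpoint works; "bert-base-uncased" is an illustrative choice.
base_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Freeze the backbone and inject small trainable low-rank matrices (LoRA).
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # sequence classification task
    r=8,                         # rank of the low-rank update (assumed value)
    lora_alpha=16,
    lora_dropout=0.1,
)
model = get_peft_model(base_model, peft_config)

# Only the injected parameters (typically well under 1% of the model) require
# gradients; the wrapped model trains with a normal Trainer or training loop.
model.print_trainable_parameters()
```

LoRA stands in here only because it is a widely supported adapter type in `peft`; the repos above implement their own variants (deep prompts, deltas), but the freeze-and-inject pattern is the same.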