llama-4bit-colab
simple 4-BIT CPU with 74-serials chip,origin by Kaoru Tonami in his book “How to build a CPU”
#大语言模型#FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.
使用peft库,对chatGLM-6B/chatGLM2-6B实现4bit的QLoRA高效微调,并做lora model和base model的merge及4bit的量化(quantize)。
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
A finetuning pipeline for instruct tuning Raven 14bn using QLORA 4bit and the Ditty finetuning library
MikroLeo project files (schematic, PCB, assembler, emulator/debugger, circuit simulation file, documentation, example of programs etc). MikroLeo is a 4-bit microcomputer developed mainly for education...