Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
#计算机科学#🐢 Open-Source Evaluation & Testing for ML & LLM systems
#大语言模型#Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
Evaluation tool for LLM QA chains
#大语言模型#The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM Observability all in one place.
The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey.
An open-source visual programming environment for battle-testing prompts to LLMs.
⚡LLM Zoo is a project that provides data, models, and evaluation benchmark for large language models.⚡
#大语言模型#OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2,GPT-4,LLaMa2, Qwen,GLM, Claude, etc) over 100+ datasets.
Universal and Transferable Attacks on Aligned Language Models
Arbitrary expression evaluation for golang
翻译 - golang的任意表达式求值
the LLM vulnerability scanner
#大语言模型#LLM Finetuning with peft
#大语言模型#[ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Neutralinojs vs Electron vs Nw.js
LLM as a Chatbot Service
Evaluation of Deep Learning Frameworks