The LLM Evaluation Framework
#LLM# Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks, including the OpenAI Agents SDK, CrewAI, LangChain, AutoGen, AG2, and CamelAI
"A White-Box Guide to Building Large Models" (《大模型白盒子构建指南》): a completely hand-built Tiny-Universe
#Computer Science# (IROS 2020, ECCVW 2020) Official Python implementation of "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
#LLM# Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
#Computer Science# [NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.
#NLP# OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at the EACL 2021 demo track)
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
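To illustrate what such a metric computes, here is a minimal from-scratch WER sketch (the helper name `word_error_rate` is ours, not an API of any library above): WER is the number of word-level substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length.

```python
# Minimal illustration of word error rate (WER):
# WER = (substitutions + deletions + insertions) / reference length,
# computed via Levenshtein distance over word sequences.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[-1][-1] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # ~0.167 (one deletion)
```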
#Computer Science# 📈 Implementation of eight evaluation metrics to assess the similarity between two images: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.
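Two of those eight metrics have closed forms compact enough to sketch directly. A minimal NumPy version, assuming 8-bit images (peak value 255):

```python
import numpy as np

def rmse(a: np.ndarray, b: np.ndarray) -> float:
    """Root mean squared error between two same-shape images; lower is more similar."""
    return float(np.sqrt(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)))

def psnr(a: np.ndarray, b: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 20*log10(peak) - 10*log10(MSE); higher is more similar."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(20 * np.log10(peak) - 10 * np.log10(mse))

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64), dtype=np.uint8)
noisy = np.clip(img.astype(int) + rng.integers(-5, 6, img.shape), 0, 255).astype(np.uint8)
print(rmse(img, noisy), psnr(img, noisy))
```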
#NLP# A Neural Framework for MT Evaluation
⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
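For a sense of the workflow, a small sketch following the usage pattern shown in ranx's README (exact API details vary by version, so treat this as an assumption rather than a guaranteed interface):

```python
# Ranking evaluation in the style of ranx's README (version-dependent API).
from ranx import Qrels, Run, evaluate

# Graded relevance judgments: query -> {doc_id: relevance}
qrels = Qrels({
    "q_1": {"d_12": 5, "d_25": 3},
    "q_2": {"d_11": 6, "d_22": 1},
})

# System scores: query -> {doc_id: score}
run = Run({
    "q_1": {"d_12": 0.9, "d_25": 0.7, "d_30": 0.4},
    "q_2": {"d_22": 0.8, "d_11": 0.6},
})

# Returns metric -> score averaged over queries
print(evaluate(qrels, run, ["ndcg@5", "mrr", "map@5"]))
```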
#NLP# PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks su...
Data-Driven Evaluation for LLM-Powered Applications
Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)
[RA-L 2025] MapEval: Towards a Unified, Robust, and Efficient SLAM Map Evaluation Framework.
Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper
#LLM# Metrics to evaluate the quality of responses from your Retrieval-Augmented Generation (RAG) applications.
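Metrics in such toolkits typically score a generated answer against the retrieved contexts. As a purely illustrative toy (not the library's actual implementation, which relies on LLM-based judging), a token-overlap "faithfulness" proxy might look like:

```python
# Toy illustration only: a lexical-overlap proxy for RAG faithfulness.
# Real RAG-evaluation metrics typically use LLM judges, not token overlap.
def faithfulness_proxy(answer: str, contexts: list[str]) -> float:
    answer_tokens = set(answer.lower().split())
    context_tokens = set(" ".join(contexts).lower().split())
    if not answer_tokens:
        return 0.0
    # Fraction of answer tokens that appear somewhere in the retrieved contexts
    return len(answer_tokens & context_tokens) / len(answer_tokens)

contexts = ["The Eiffel Tower is 330 metres tall and located in Paris."]
print(faithfulness_proxy("The Eiffel Tower is 330 metres tall", contexts))  # 1.0
```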
#LLM# [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Python SDK for running evaluations on LLM-generated responses