The LLM Evaluation Framework
Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including OpenAI Agents SDK, CrewAI, Langchain, Autogen, AG2, and CamelAI
"A White-Box Guide to Building Large Models" (《大模型白盒子构建指南》): a Tiny-Universe built entirely from scratch
(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.
OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
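WER is the word-level edit distance between a hypothesis transcript and a reference, divided by the reference length. A minimal sketch of that computation (a generic implementation, not this repo's API; the function name `wer` is illustrative):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution out of two reference words -> WER of 0.5.
print(wer("hello world", "hello word"))
```

Production libraries add normalization (case folding, punctuation stripping) before scoring, which this sketch omits.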
📈 Implementation of eight evaluation metrics to assess the similarity between two images. The eight metrics are as follows: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.
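The two simplest metrics on that list, RMSE and PSNR, can be sketched directly with NumPy (a generic illustration under the assumption of 8-bit images, not this package's API):

```python
import numpy as np

def rmse(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square error between two same-shaped images."""
    diff = a.astype(float) - b.astype(float)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = float(np.mean((a.astype(float) - b.astype(float)) ** 2))
    if mse == 0.0:
        return float("inf")
    return float(10 * np.log10(max_val ** 2 / mse))

img_a = np.zeros((2, 2))
img_b = np.full((2, 2), 2.0)
print(rmse(img_a, img_b))  # every pixel differs by 2 -> RMSE 2.0
```

SSIM, FSIM, and the other perceptual metrics are considerably more involved; that is what motivates a dedicated package.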
A Neural Framework for MT Evaluation
⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
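Ranking evaluation scores how well a system orders results per query. A minimal sketch of one standard metric, mean reciprocal rank, written generically (not this library's API; the function name is illustrative):

```python
def mean_reciprocal_rank(rankings: list[list[str]],
                         relevant: list[set[str]]) -> float:
    """MRR: average of 1/rank of the first relevant document per query."""
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)

# Query 1 hits at rank 1, query 2 at rank 2 -> (1 + 0.5) / 2 = 0.75.
print(mean_reciprocal_rank([["a", "b"], ["b", "a"]], [{"a"}, {"a"}]))
```

Libraries in this space typically also implement graded metrics such as nDCG and MAP, plus significance tests for comparing runs.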
Data-Driven Evaluation for LLM-Powered Applications
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks su...
Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)
[RA-L 2025] MapEval: Towards Unified, Robust and Efficient SLAM Map Evaluation Framework.
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper
Python SDK for running evaluations on LLM generated responses
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning