Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
Langtrace 🔍 is an open-source, OpenTelemetry-based end-to-end observability tool for LLM applications, providing real-time tracing, evaluations, and metrics for popular LLMs, LLM frameworks, vectorD...
An example of applying LLM evaluation metrics using PromptFlow and Azure AI Studio.
This is a new metric for evaluating the faithfulness of text generated by LLMs. The work behind this repository can be found here.
Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claude, GPT-4, Gemini, Llama, etc.) with standardized evaluation met...
Repo for LLM evaluation metrics code
Unlock LLM evaluation power! This comprehensive toolkit offers diverse metrics for analyzing and comparing large language model outputs. Ideal for developers, researchers, and AI enthusiasts aiming to...
Evaluation metrics for NLP tasks and LLM performance
Evaluation tool for LLM QA chains
HOTA (and other) evaluation metrics for Multi-Object Tracking (MOT).
An empirical study on evaluation metrics of generative adversarial networks.
Machine learning evaluation metrics, implemented in Python, R, Haskell, and MATLAB / Octave
An open-source visual programming environment for battle-testing prompts to LLMs.
Evaluation Metrics for the Hewlett Foundation's Automated Essay Scoring competition
A Systematic Evaluation and Benchmark for Person Re-Identification: Features, Metrics, and Datasets
Evaluation metrics for image segmentation inspired by paper Fully Convolutional Networks for Semantic Segmentation
⚡LLM Zoo is a project that provides data, models, and evaluation benchmark for large language models.⚡
(IROS 2020, ECCVW 2020) Official Python implementation of "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
Simple TensorFlow implementation of metrics for GAN evaluation (Inception Score, Fréchet Inception Distance, Kernel Inception Distance)
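As a rough illustration of what the Fréchet Inception Distance computes, here is a minimal NumPy/SciPy sketch of the Fréchet distance between two Gaussians fitted to feature activations (illustrative only, not this repository's TensorFlow implementation; in real FID the features come from an Inception network):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: (n_samples, n_features) activation arrays;
    in actual FID these would be Inception pool3 features.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Matrix square root of the covariance product; tiny imaginary
    # parts from numerical error are discarded.
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```

Identical feature sets give a distance of (numerically) zero; a pure mean shift contributes the squared norm of the shift.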
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.
ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
📈 Implementation of eight evaluation metrics to assess the similarity between two images: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.
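Two of the listed metrics, RMSE and PSNR, reduce to a few lines of NumPy. A minimal sketch under standard definitions (not the repository's own implementation):

```python
import numpy as np

def rmse(a, b):
    """Root-mean-square error between two images (any matching shapes)."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means more similar."""
    err = rmse(a, b)
    if err == 0:
        return float("inf")  # identical images
    return float(20.0 * np.log10(max_val / err))
```

Maximally different 8-bit images give 0 dB, and PSNR grows without bound as the images converge.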
Implementations of various algorithm evaluation metrics (mAP/FLOPs/params/FPS/error-rate/accuracy)
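Of the metrics named above, average precision (the per-class ingredient of mAP) is the least obvious; a minimal single-class sketch using the standard ranked-precision formulation (illustrative, not this repository's code):

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one class: mean of precision values at each true positive,
    given detection confidence scores and binary ground-truth labels.
    mAP is then the mean of this quantity over classes."""
    order = np.argsort(-np.asarray(scores, dtype=np.float64))
    labels = np.asarray(labels, dtype=np.float64)[order]
    tp = np.cumsum(labels)                      # true positives so far
    ranks = np.arange(1, len(labels) + 1)       # predictions considered
    precision = tp / ranks
    # Average precision only over ranks where a positive was found.
    return float((precision * labels).sum() / labels.sum())
```

For scores [0.9, 0.8, 0.7] with labels [1, 0, 1], the precisions at the two hits are 1 and 2/3, so AP = 5/6.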