Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
#Large Language Models# A framework to evaluate the generalization capability of safety alignment for LLMs
#Computer Science# Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
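As a rough illustration of how such an observability report is typically produced with Evidently (a minimal sketch assuming the pre-0.7 `Report`/`DataDriftPreset` API and two hypothetical CSV files sharing the same schema):

```python
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Hypothetical reference (training-time) and current (production) samples.
reference = pd.read_csv("reference.csv")
current = pd.read_csv("current.csv")

# Compare the two samples with the built-in data-drift preset
# and export the result as a standalone HTML report.
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("data_drift_report.html")
```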
Framework for LLM evaluation, guardrails and security
Python SDK for Agent AI observability, monitoring, and evaluation. Includes features such as agent, LLM, and tool tracing, multi-agent system debugging, a self-hosted dashboard, and advanced analytics.
Langtrace 🔍 is an open-source, OpenTelemetry-based end-to-end observability tool for LLM applications, providing real-time tracing, evaluations, and metrics for popular LLMs, LLM frameworks, vector databases, and more.
Framework to evaluate LLM generated ReactJS code.
Evaluation tool for LLM QA chains
An open-source visual programming environment for battle-testing prompts to LLMs.
SLAM performance evaluation framework
#Large Language Models# LlamaIndex is a data framework for your LLM applications
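A minimal sketch of the LlamaIndex quickstart pattern (assuming the post-0.10 `llama_index.core` package layout, a local `data/` folder of documents, and an `OPENAI_API_KEY` in the environment):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load local documents and embed them into an in-memory vector index.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Ask a question over the indexed documents.
query_engine = index.as_query_engine()
response = query_engine.query("Summarize the evaluation setup described in these files.")
print(response)
```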
A lightweight LLM inference framework
Project DELTA: SDN security evaluation framework
#Large Language Models# A unified evaluation framework for large language models
A framework for few-shot evaluation of language models.
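For the few-shot evaluation harness above, a minimal sketch of its Python entry point, assuming the `lm_eval` package and a small Hugging Face model chosen only for illustration:

```python
import lm_eval

# Zero-shot HellaSwag run on a small Hugging Face model.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["hellaswag"],
    num_fewshot=0,
    batch_size=8,
)

# Aggregated metrics (accuracy, normalized accuracy, ...) are keyed by task name.
print(results["results"]["hellaswag"])
```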
⚡LLM Zoo is a project that provides data, models, and evaluation benchmarks for large language models.⚡
Benchmark Framework for fair evaluation of rPPG
Python Framework for Saliency Modeling and Evaluation
#Computer Science# 🐢 Open-Source Evaluation & Testing for ML & LLM systems
Data framework for your LLM applications, focused on server-side solutions
Well-tested, multi-language evaluation framework for text summarization.