#LLM# 🪢 Open-source LLM engineering platform: LLM observability, metrics, evals, prompt management, playground, and datasets. Integrates with LlamaIndex, Langchain, the OpenAI SDK, LiteLLM, and more. 🍊 YC W23
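As an illustration of the integration style, here is a minimal sketch of tracing an OpenAI call with the Langfuse Python SDK's `@observe` decorator. The import path and model name are assumptions (the path below is the v2-style one and varies across SDK versions), so treat this as a sketch rather than the canonical recipe.

```python
# Minimal sketch, assuming the Langfuse Python SDK's @observe decorator;
# the import path is the v2-style one and may differ in other versions.
from langfuse.decorators import observe
from openai import OpenAI

client = OpenAI()

@observe()  # records this function call as a trace in Langfuse
def summarize(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return response.choices[0].message.content
```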
#LLM# Test your prompts, agents, and RAG pipelines. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare the performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...
#LLM# OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Arbitrary expression evaluation for golang
Python package for the evaluation of odometry and SLAM
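As a sketch of how such an evaluation typically looks with evo's Python API (module layout taken from evo's docs; the trajectory files are placeholders, so verify against your installed version):

```python
# Hedged sketch: compute Absolute Pose Error (APE) with evo's Python API.
# File names are placeholders; check the module layout against your version.
from evo.core import metrics, sync
from evo.tools import file_interface

# Load ground-truth and estimated trajectories in TUM format (assumed files).
traj_ref = file_interface.read_tum_trajectory_file("groundtruth.txt")
traj_est = file_interface.read_tum_trajectory_file("estimate.txt")

# Associate poses by timestamp before comparing.
traj_ref, traj_est = sync.associate_trajectories(traj_ref, traj_est)

# Translational APE, reported as RMSE.
ape = metrics.APE(metrics.PoseRelation.translation_part)
ape.process_data((traj_ref, traj_est))
print("APE RMSE:", ape.get_statistic(metrics.StatisticsType.rmse))
```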
#LLM# AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
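A hedged sketch of what starting an optimization trial looks like, based on the `Evaluator` entry point shown in AutoRAG's README; the dataset paths and config file are placeholders, and the API may differ in your installed version.

```python
# Sketch based on AutoRAG's documented Evaluator entry point; paths are
# placeholders and the exact API may vary between releases.
from autorag.evaluator import Evaluator

evaluator = Evaluator(
    qa_data_path="data/qa.parquet",          # placeholder QA dataset
    corpus_data_path="data/corpus.parquet",  # placeholder corpus
)
evaluator.start_trial("config.yaml")  # declarative RAG pipeline config
```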
#LLM# 🧊 Open-source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓
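The "one line" typically means routing an existing OpenAI client through the platform's gateway. A sketch assuming Helicone's documented proxy base URL and auth header; confirm both against the current docs before relying on them.

```python
# Hedged sketch of Helicone's proxy-style integration: point the OpenAI
# client at Helicone's gateway and authenticate with a Helicone API key.
# Base URL and header name follow Helicone's documented pattern (assumption).
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://oai.helicone.ai/v1",  # route requests through Helicone
    default_headers={
        "Helicone-Auth": f"Bearer {os.environ['HELICONE_API_KEY']}",
    },
)

# Every call made through this client is now logged for observability.
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```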
#LLM# A unified evaluation framework for large language models
An open-source visual programming environment for battle-testing prompts to LLMs.
#Computer Science# 🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
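A minimal usage example (the metric name and toy data are illustrative): metrics are loaded by name and computed over predictions and references.

```python
import evaluate

# Load a metric by name and compute it over toy data.
accuracy = evaluate.load("accuracy")
result = accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(result)  # {'accuracy': 0.75}
```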
#Computer Science# (IROS 2020, ECCVW 2020) Official Python implementation of "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
#NLP# An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
FuzzBench - Fuzzer benchmarking as a service.
NCalc is a fast and lightweight expression evaluator library for .NET, designed for flexibility and high performance. It supports a wide range of mathematical and logical operations.
Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".