#LLM#Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...
#Data Warehouse#AI Observability & Evaluation
#LLM#🐢 Open-Source Evaluation & Testing for AI & LLM systems
#LLM#ETL, Analytics, Versioning for Unstructured Data
#Computer Science#UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform ro...
Python SDK for running evaluations on LLM-generated responses
Generate ideal question-answer pairs for testing RAG
#LLM#A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
#LLM#Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
#LLM#🎯 A free LLM evaluation toolkit for assessing factual accuracy, context understanding, tone, and more, so you can see how well your LLM applications perform.
#LLM#Develop reliable AI apps
#NLP#An open-source library for asynchronous querying of LLM endpoints
Realign is a testing and simulation framework for AI applications.
Code for "Prediction-Powered Ranking of Large Language Models", NeurIPS 2024.
#LLM#Create an evaluation framework for your LLM-based app. Incorporate it into your test suite (see the sketch after this list). Lay the monitoring foundation.
#LLM#The prompt engineering, prompt management, and prompt evaluation tool for Python
#LLM#The prompt engineering, prompt management, and prompt evaluation tool for TypeScript, JavaScript, and Node.js.
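Several of the entries above, such as the one about building an evaluation framework into your test suite, come down to the same pattern: express each check as an assertion so it runs in CI like any other test. Below is a minimal, generic sketch of that pattern using pytest. It is not tied to any specific tool listed here; `call_llm`, the example cases, and the keyword/length checks are illustrative assumptions you would replace with your own application call and graded metrics.

```python
# Minimal sketch: fold LLM evaluation into a pytest suite.
# `call_llm` is a hypothetical stand-in for your application's LLM call;
# the keyword/length checks are illustrative, not prescriptive.
import pytest


def call_llm(prompt: str) -> str:
    """Hypothetical application entry point; replace with your real LLM call."""
    return "Paris is the capital of France."


# Each case: prompt, keywords the answer must contain, max length in characters.
EVAL_CASES = [
    ("What is the capital of France?", ["Paris"], 200),
    ("Name the capital city of France.", ["Paris"], 200),
]


@pytest.mark.parametrize("prompt,required,max_len", EVAL_CASES)
def test_llm_response_quality(prompt, required, max_len):
    answer = call_llm(prompt)
    # Deterministic, assertion-style checks keep the eval runnable in CI.
    for keyword in required:
        assert keyword.lower() in answer.lower(), f"missing '{keyword}' in: {answer!r}"
    assert len(answer) <= max_len, "answer exceeds the expected length budget"
```

Running `pytest` on a file like this gives a pass/fail signal per case; the tools in this list layer richer scoring (faithfulness, context relevance, tone) on top of the same test-suite workflow.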