#Large Language Model#🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
#Large Language Model#Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...
#Large Language Model#Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
The LLM Evaluation Framework
#Data Warehouse#AI Observability & Evaluation
#Large Language Model#🐢 Open-Source Evaluation & Testing for AI & LLM systems
the LLM vulnerability scanner
#Large Language Model#AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
#Large Language Model#🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓
#Large Language Model#The LLM's practical guide: From the fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps best practices
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM Observability all in one place.
Laminar - open-source all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.
Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪
Prompty makes it easy to create, manage, debug, and evaluate LLM prompts for your AI applications. Prompty is an asset class and format for LLM prompts designed to enhance observability, understandab...
#Natural Language Processing#Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for the evaluation of foundation LLMs, aiming to explore the technical boundaries of generative AI.
Data-Driven Evaluation for LLM-Powered Applications
Awesome papers involving LLMs in Social Science.
#Awesome#A curated list of 🌌 Azure OpenAI, 🦙 Large Language Models (incl. RAG, Agent), and references with memos.
#Large Language Model#Build, Improve Performance, and Productionize your LLM Application with an Integrated Framework
Python SDK for running evaluations on LLM generated responses