#LLM#Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...
#Data Warehouse#AI Observability & Evaluation
#LLM#🐢 Open-Source Evaluation & Testing for AI & LLM systems
#LLM#ETL, Analytics, Versioning for Unstructured Data
#Computer Science#UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform ro...
Python SDK for running evaluations on LLM-generated responses
Generate ideal question-answer pairs for testing RAG
#LLM#A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
#LLM#Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
#LLM#🎯 A free LLM evaluation toolkit for assessing factual accuracy, context understanding, tone, and more, so you can see how well your LLM applications perform.
#LLM#Develop reliable AI apps
#NLP#An open-source library for asynchronous querying of LLM endpoints
Realign is a testing and simulation framework for AI applications.
Code for "Prediction-Powered Ranking of Large Language Models", NeurIPS 2024.
#LLM#Create an evaluation framework for your LLM-based app. Incorporate it into your test suite (see the sketch after this list). Lay the monitoring foundation.
#LLM#The prompt engineering, prompt management, and prompt evaluation tool for Python
#LLM#The prompt engineering, prompt management, and prompt evaluation tool for TypeScript, JavaScript, and Node.js.
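Several of the entries above, such as the one about building an evaluation framework into your test suite, come down to the same pattern: express each check as an assertion so it runs in CI like any other test. Below is a minimal, generic sketch of that pattern using pytest. It is not tied to any specific tool listed here; `call_llm`, the example cases, and the keyword/length checks are illustrative assumptions you would replace with your own application call and graded metrics.

```python
# Minimal sketch: fold LLM evaluation into a pytest suite.
# `call_llm` is a hypothetical stand-in for your application's LLM call;
# the keyword/length checks are illustrative, not prescriptive.
import pytest


def call_llm(prompt: str) -> str:
    """Hypothetical application entry point; replace with your real LLM call."""
    return "Paris is the capital of France."


# Each case: prompt, keywords the answer must contain, max length in characters.
EVAL_CASES = [
    ("What is the capital of France?", ["Paris"], 200),
    ("Name the capital city of France.", ["Paris"], 200),
]


@pytest.mark.parametrize("prompt,required,max_len", EVAL_CASES)
def test_llm_response_quality(prompt, required, max_len):
    answer = call_llm(prompt)
    # Deterministic, assertion-style checks keep the eval runnable in CI.
    for keyword in required:
        assert keyword.lower() in answer.lower(), f"missing '{keyword}' in: {answer!r}"
    assert len(answer) <= max_len, "answer exceeds the expected length budget"
```

Running `pytest` on a file like this gives a pass/fail signal per case; the tools in this list layer richer scoring (faithfulness, context relevance, tone) on top of the same test-suite workflow.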