#Large Language Model# Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...
#Large Language Model# OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, Llama2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Arbitrary expression evaluation for golang
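To make the idea concrete, here is a minimal sketch of safe arbitrary expression evaluation. The listed library is for Go; this is a conceptual Python illustration using the standard-library ast module, not that library's API:

```python
import ast
import operator

# Map AST operator node types to their Python implementations.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Gt: operator.gt,
    ast.Lt: operator.lt,
}

def eval_expr(source, variables=None):
    """Safely evaluate an arithmetic/comparison expression string."""
    variables = variables or {}

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]  # variable lookup
        if isinstance(node, ast.BinOp):
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.Compare):
            left, result = _eval(node.left), True
            for op, comp in zip(node.ops, node.comparators):
                right = _eval(comp)
                result = result and _OPS[type(op)](left, right)
                left = right
            return result
        raise ValueError(f"unsupported node: {type(node).__name__}")

    return _eval(ast.parse(source, mode="eval"))

print(eval_expr("price * qty > 100", {"price": 12.5, "qty": 10}))  # True
```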
Python package for the evaluation of odometry and SLAM
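For intuition, a central metric in odometry/SLAM evaluation is the absolute trajectory error (ATE). A hedged sketch of its RMSE form, as a conceptual illustration rather than this package's API (real tools also SE(3)-align the two trajectories before comparing):

```python
import numpy as np

# ATE RMSE: root-mean-square of translational differences between
# time-aligned reference and estimated positions.
def ate_rmse(ref_xyz, est_xyz):
    diffs = np.asarray(ref_xyz) - np.asarray(est_xyz)
    return float(np.sqrt((diffs ** 2).sum(axis=1).mean()))

print(ate_rmse([[0, 0, 0], [1, 0, 0]],
               [[0, 0.1, 0], [1, -0.1, 0]]))  # 0.1
```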
#Large Language Model# AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
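As a rough illustration of what AutoML-style RAG optimization involves, the sketch below grid-searches retrieval settings against a small QA set and keeps the configuration with the best retrieval recall. It is conceptual only; make_retriever, the parameter grid, and qa_pairs are hypothetical, not AutoRAG's actual API:

```python
from itertools import product

def retrieval_recall(retrieve, qa_pairs, top_k):
    """Fraction of questions whose gold passage appears in the top_k hits."""
    hits = 0
    for question, gold_passage_id in qa_pairs:
        retrieved_ids = retrieve(question, top_k=top_k)
        hits += gold_passage_id in retrieved_ids
    return hits / len(qa_pairs)

def tune(make_retriever, qa_pairs, chunk_sizes=(256, 512), top_ks=(3, 5)):
    """Try each (chunk_size, top_k) combination; return the best by recall."""
    best = None
    for chunk_size, top_k in product(chunk_sizes, top_ks):
        retrieve = make_retriever(chunk_size=chunk_size)  # hypothetical factory
        score = retrieval_recall(retrieve, qa_pairs, top_k)
        if best is None or score > best[0]:
            best = (score, {"chunk_size": chunk_size, "top_k": top_k})
    return best

# Toy usage: a fake retriever that always returns the same passage ids.
fake_factory = lambda chunk_size: (lambda q, top_k: ["p1", "p2", "p3"][:top_k])
print(tune(fake_factory, [("q1", "p2"), ("q2", "p9")]))  # (0.5, {...})
```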
#Large Language Model# A unified evaluation framework for large language models
An open-source visual programming environment for battle-testing prompts to LLMs.
#Computer Science# 🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
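Typical usage is to load a metric from the Hub by name and score predictions against references; a minimal sketch, assuming the package is installed and the accuracy metric is available:

```python
import evaluate

# Load a metric by name from the Hugging Face Hub, then compute it
# over parallel lists of predictions and references.
accuracy = evaluate.load("accuracy")
results = accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(results)  # {'accuracy': 0.75}
```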
#Computer Science# (IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
#Natural Language Processing# An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
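The headline number such auto-evaluators report is a win rate over a baseline model. A hedged sketch of that bookkeeping, where judge is a hypothetical stand-in for an LLM pairwise preference call, not AlpacaEval's API:

```python
# Win rate: fraction of instructions where a pairwise judge prefers the
# model's output over the baseline's.
def win_rate(judge, pairs):
    """pairs: iterable of (model_output, baseline_output) strings."""
    pairs = list(pairs)
    wins = sum(judge(m, b) == "model" for m, b in pairs)
    return wins / len(pairs)

# Example with a trivial length-based judge:
toy_judge = lambda m, b: "model" if len(m) >= len(b) else "baseline"
print(win_rate(toy_judge, [("longer answer", "short"), ("a", "bb")]))  # 0.5
```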
FuzzBench - Fuzzer benchmarking as a service.
Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".