:metal: awesome-semantic-segmentation
#LLM#🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
#LLM#Supercharge Your LLM Application Evaluations 🚀
#LLM#Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command ...
#LLM#OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Arbitrary expression evaluation for golang
Python package for the evaluation of odometry and SLAM
#LLM#AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
#LLM#🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓
Building a modern functional compiler from first principles. (http://dev.stephendiehl.com/fun/)
Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs.
#LLM#SuperCLUE: A Comprehensive Benchmark for General-Purpose Foundation Models in Chinese
#Computer Science#End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
#LLM#A unified evaluation framework for large language models
An open-source visual programming environment for battle-testing prompts to LLMs.
#Computer Science#UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform ro...
Accelerating the development of large multimodal models (LMMs) with a one-click evaluation module, lmms-eval.
#Computer Science#🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
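Metric libraries like 🤗 Evaluate expose measures such as accuracy behind a load-and-compute interface. As a stdlib-only sketch of the kind of metric such a library packages (not the library's own code; the function name here is illustrative):

```python
def compute_accuracy(predictions, references):
    """Fraction of predictions that exactly match their references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be the same length")
    if not references:
        return 0.0
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(references)

# Three of the four predictions match the references.
print(compute_accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # → 0.75
```

In the real library, the equivalent call would load a named metric and pass `predictions` and `references` to its compute step; the sketch above only shows the metric's arithmetic.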
#LLM#Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
#Computer Science#Avalanche: an End-to-End Library for Continual Learning based on PyTorch.