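#Natural Language Processing#Learn how to design, develop, deploy, and iterate on production-grade machine learning applications
#Natural Language Processing#A curated collection of papers, tech blogs, and other resources in which major companies share how they use data science & machine learning in production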
#Computer Science#One line of code for data quality profiling & exploratory data analysis of Pandas and Spark DataFrames.
Translation - Create HTML profiling reports from pandas DataFrame objects
Always know what to expect from your data.
#Data Warehouse#The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Translation - Find label errors in datasets and learn with noisy labels.
#Computer Science#Refine high-quality datasets and visual AI models
Translation - An open-source tool for building high-quality datasets and computer vision models
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team colla...
Translation - An open standard for metadata. A single place to discover data, collaborate on it, and get it right.
#Computer Science#The Open Source Feature Store for AI/ML
Translation - A feature store for machine learning
#Large Language Models#Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
lakeFS - Data version control for your data lake | Git for data
Translation - An open-source platform that brings resilience and manageability to object-storage-based data lakes
#Natural Language Processing#Learn how to design, develop, deploy, and iterate on production-grade ML applications.
Compare tables within or across databases
#Computer Science#An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collect...
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
#Computer Science#Feathr: a scalable, unified data and AI engineering platform for enterprise
#Computer Science#The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
Translation - A storage engine for vector machine learning embeddings.
re_data - fix data issues before your users & CEO discover them 😊
The first open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
#Computer Science#A curated, but incomplete, list of data-centric AI resources.
#Computer Science#Automatically find issues in image datasets and practice data-centric computer vision.