Always know what to expect from your data.
#Data Warehouse# The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Alternate description: Find label errors in datasets and learn with noisy labels.
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team colla...
Alternate description: An open standard for metadata. A single place to discover, collaborate, and get your data right.
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Compare tables within or across databases
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
re_data - fix data issues before your users and CEO discover them 😊
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Alternate description: Scalable fuzzy matching for data mastering, deduplication, and entity resolution.
#Computer Science# ML-powered analytics engine for outlier detection and root cause analysis.
The premier open source Data Quality solution
Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.
#Computer Science# Library for Semi-Automated Data Science
Possibly the fastest DataFrame-agnostic quality check library in town.
Open Source Data Quality Monitoring.
#Computer Science# Find data quality issues and clean your data in a single line of code with a Scikit-Learn compatible Transformer.
Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility acr...
#Large Language Models# Dingo: A Comprehensive Data Quality Evaluation Tool
DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data quality test generation and execution by data profiling, new dataset...