⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
re_data - fix data issues before your users & CEO discover them 😊
Various files useful for manual testing and test automation etc.
Great Expectations Airflow operator
Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data quality test generation and execution by data profiling, new dataset...
A simple, easy-to-use data validation library for Python.
Find out if your data is what you think it is
⚡ Prevent downstream data quality issues by integrating the Soda Library into your CI/CD pipeline.
This library is inspired by the Great Expectations library. It makes the expectations found in Great Expectations available for use with Python's built-in unittest assertions.
Develop a data science project using historical sales data to build a regression model that accurately predicts future sales. Preprocess the dataset, conduct exploratory analysis, select relevant feat...
Data generation and validation tool for any data source
Data and pipeline testing with and for SQL
Software Testing in Open Source and Data Science: A talk delivered at the Data Umbrella speaker series
Example API implementation for Data Caterer
Documentation for Data Caterer
A sample repository showcasing the implementation of testing for an ETL pipeline developed with Apache Spark
Credit Risk Classification