🌀 𝗧𝗵𝗲 𝗙𝘂𝗹𝗹 𝗦𝘁𝗮𝗰𝗸 𝟳-𝗦𝘁𝗲𝗽𝘀 𝗠𝗟𝗢𝗽𝘀 𝗙𝗿𝗮𝗺𝗲𝘄𝗼𝗿𝗸 | 𝗟𝗲𝗮𝗿𝗻 𝗠𝗟𝗘 & 𝗠𝗟𝗢𝗽𝘀 for free by designing, building and deploying an end-to-end ML batch system ~ 𝘴𝘰𝘶𝘳𝘤𝘦 𝘤...
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Prod...
Sample project to demonstrate data engineering best practices
Nyc_Taxi_Data_Pipeline - DE Project
#计算机科学#Learn how to create reliable ML systems by testing code, data and models.
Data Quality Gate based on AWS
#计算机科学#Tutorial for implementing data validation in data science pipelines
Prefect integrations for interacting with Great Expectations
How to evaluate the Quality of your Data with Great Expectations and Spark.
This repository serves as a comprehensive guide to effective data modeling and robust data quality assurance using popular open-source tools
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via Github Actions.
A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in Airflow.
BirdiDQ leverages the power of the Python Great Expectations open-source library and combines it with the simplicity of natural language queries to effortlessly identify and report data quality issues...
📈 A scalable, production-ready data pipeline for real-time streaming & batch processing, integrating Kafka, Spark, Airflow, AWS, Kubernetes, and MLflow. Supports end-to-end data ingestion, transforma...
Code to demonstrate data engineering metadata & logging best practices
Run greatexpectations.io on ANY SQL Engine using REST API. Supported by FastAPI, Pydantic and SQLAlchemy as best data quality tool
Integrating Apache Airflow, dbt, Great Expectations and Apache Superset to develop a modern open source data stack.
This library is inspired by the Great Expectations library. The library has made the various expectations found in Great Expectations available when using the inbuilt python unittest assertions.
Using Great Expectations and Notion's API, this repo aims to provide data quality for our databases in Notion.