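#Computer Science# Kedro is a Python framework for building reproducible, maintainable, and modular data science code. It borrows concepts from software engineering and applies them to machine learning.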
Template for Data Engineering and Data Pipeline projects
A project demonstrating the usefulness of data engineering and ML pipelines.
DBND is an agile pipeline framework that helps data engineering teams track and orchestrate their data processes.
Data Engineering Capstone Project: ETL Pipelines and Data Warehouse Development
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.
Solution to all projects of Udacity's Data Engineering Nanodegree: Data Modeling with Postgres & Cassandra, Data Warehouse with Redshift, Data Lake with Spark and Data Pipeline with Airflow.
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All compone...
A high-performance observability data pipeline.
Translation: A lightweight, ultra-fast tool for building observability pipelines
Netflix's distributed Data Pipeline
Data Engineering Practice Problems
Personal Data Engineering Projects
The best place to learn data engineering. Built and maintained by the data engineering community.
Automated Payload Reverse Engineering Pipeline for the Controller Area Network (CAN) protocol
Getting Started with Data Engineering
Easy Amplicon data analysis pipeline
Roadmap for Data Engineering
Ringbuffer-backed interactive data pipeline
Data for Scriptable Render Pipeline
A data pipeline framework for machine learning
A distributed, fault-tolerant pipeline for observability data
A Data Engineering & Machine Learning Knowledge Hub
Data Engineering on Google Cloud Platform
Example end-to-end data engineering project.