A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
One framework to develop, deploy and operate data workflows with Python and SQL.
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate
Code examples showing flow deployment to various types of infrastructure
Classwork projects and homework completed for the Udacity Data Engineering Nanodegree
Data Engineering Project with Hadoop HDFS and Kafka
Let your pipe lines flow through the Python code in xonsh.
Deploy a Prefect flow to a serverless AWS Lambda function
An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apache Kafka and stored in a local Cassandra database.
F1 Data Pipeline
Analysis of 311 service requests for New York City (2010 to 2023). Tech: Prefect Cloud, dbt Core, BigQuery, Compute Engine, Cloud Run, Artifact Registry, Terraform, Docker
A data engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data comes from Kaggle and the YouTube API.
ETL pipeline combined with supervised learning and grid search to classify text messages sent during a disaster event
A batch data pipeline that retrieves data from a user purchase table and a movie review table and transforms it into a user behaviour metric table.
Reusable data engineering toolkit
An end-to-end Twitter Data Pipeline that extracts data from Twitter and loads it into AWS S3.