An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, TensorFlow, Mathematics
Reference Architectures for Data Lakes on AWS
Classwork projects and homework completed for the Udacity Data Engineering Nanodegree
Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS
This project offers a ready-to-use AWS EMR template built on Spot Fleet and On-Demand Instances, so you can focus on writing PySpark code.
An ETL application on AWS that uses open sales and customer data, available here: https://github.com/camposvinicius/data/blob/main/AdventureWorks.zip. It's a zipped file with some .c...
Apache Spark TPC-DS benchmark with EMR launch setup
A Cassandra Architecture for GDELT Database 🌍
Uses EMR clusters to export DynamoDB tables to S3 and generates import steps
An end-to-end data pipeline for building a data lake and supporting reporting using Apache Spark.
A boilerplate for Spark projects with Docker support for local development and scripts for EMR support.
Create a data pipeline on AWS to execute batch processing in a Spark cluster provisioned by Amazon EMR. ETL using managed Airflow: extract data from S3, transform data using Spark, load transformed d...
This project demonstrates the use of Amazon Elastic MapReduce (EMR) for processing large datasets using Apache Spark. It includes a Spark script for ETL (Extract, Transform, Load) operations, AWS com...
A large-scale data framework that will enable us to store and analyze financial market data and drive future predictions for investment.
Generic Python library for provisioning EMR clusters from YAML config files (Configuration as Code)
Collection of code for submitting Spark/Hadoop/Hive/Pig tasks to EMR (AWS Elastic MapReduce) | #DE
Event-driven EMR via Serverless