Implementing best practices for PySpark ETL jobs and applications.
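A minimal sketch of the structure such best-practice guides typically advocate: pure transform functions with explicit extract and load boundaries. Paths and column names are illustrative assumptions, not the repo's own code.

```python
# Minimal PySpark ETL skeleton; all paths and columns are placeholders.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

def extract(spark: SparkSession, path: str) -> DataFrame:
    """Read raw input; an explicit format keeps the job reproducible."""
    return spark.read.parquet(path)

def transform(df: DataFrame) -> DataFrame:
    """Keep transforms pure (DataFrame in, DataFrame out) so they unit-test easily."""
    return (df
            .dropDuplicates(["id"])
            .withColumn("loaded_at", F.current_timestamp()))

def load(df: DataFrame, path: str) -> None:
    df.write.mode("overwrite").parquet(path)

if __name__ == "__main__":
    spark = SparkSession.builder.appName("etl_job").getOrCreate()
    load(transform(extract(spark, "s3://bucket/raw/")), "s3://bucket/clean/")
    spark.stop()
```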
An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
Mass data processing with a complete ETL framework for .NET developers
Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySQL or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.
A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.
Terraform modules for provisioning and managing AWS Glue resources
This code creates a Kinesis Data Firehose delivery stream in AWS to send CloudWatch log data to S3.
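For reference, a hypothetical boto3 sketch of the same architecture (the repo itself may provision these resources differently, e.g. via CloudFormation); every name and ARN below is a placeholder.

```python
import boto3

firehose = boto3.client("firehose")
logs = boto3.client("logs")

# 1) Firehose delivery stream that lands records in S3.
firehose.create_delivery_stream(
    DeliveryStreamName="cw-logs-to-s3",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",  # placeholder
        "BucketARN": "arn:aws:s3:::my-log-bucket",                  # placeholder
    },
)

# 2) Subscription filter that streams a log group into the delivery stream.
logs.put_subscription_filter(
    logGroupName="/aws/lambda/my-function",                         # placeholder
    filterName="to-firehose",
    filterPattern="",  # empty pattern forwards every event
    destinationArn="arn:aws:firehose:us-east-1:123456789012:deliverystream/cw-logs-to-s3",
    roleArn="arn:aws:iam::123456789012:role/cwlogs-to-firehose",    # placeholder
)
```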
This repo guides you step by step through creating a star-schema dimensional model.
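As a taste of one step in that method, a hedged PySpark sketch deriving a product dimension with a surrogate key plus a fact table from a flat sales extract; the column names are assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("star_schema_demo").getOrCreate()
sales = spark.read.parquet("s3://bucket/raw/sales/")  # placeholder path

# Dimension: one row per product, with a generated surrogate key.
dim_product = (sales.select("product_name", "category").distinct()
               .withColumn("product_key", F.monotonically_increasing_id()))

# Fact: measures plus foreign keys into the dimension.
fact_sales = (sales.join(dim_product, ["product_name", "category"])
              .select("product_key", "order_date", "quantity", "amount"))
```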
A PySpark project managed with Poetry
A declarative, SQL-like DSL for data integration tasks.
An end-to-end Twitter Data Pipeline that extracts data from Twitter and loads it into AWS S3.
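A hedged sketch of the extract-and-load hop, assuming the Twitter v2 recent-search endpoint and a bearer token; the bucket, key, and query are placeholders.

```python
import json, os, requests, boto3

resp = requests.get(
    "https://api.twitter.com/2/tweets/search/recent",
    headers={"Authorization": f"Bearer {os.environ['TWITTER_BEARER_TOKEN']}"},
    params={"query": "data engineering", "max_results": 100},
)
resp.raise_for_status()

boto3.client("s3").put_object(
    Bucket="my-twitter-lake",          # placeholder bucket
    Key="raw/tweets/batch.json",
    Body=json.dumps(resp.json()).encode("utf-8"),
)
```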
#ComputerScience# Airflow POC demo: 1) environment setup 2) Airflow DAG 3) Spark/ML pipeline | #DE
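A minimal Airflow DAG sketch of the shape such a POC demonstrates (Airflow 2.4+ syntax); the task callables are stubs, not the repo's pipeline code.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():  # stub standing in for the real extract step
    print("extracting...")

def run_spark_job():  # stub; a real DAG might use SparkSubmitOperator instead
    print("submitting Spark/ML job...")

with DAG(
    dag_id="etl_poc",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    spark_task = PythonOperator(task_id="spark_ml", python_callable=run_spark_job)
    extract_task >> spark_task
```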
Built a data pipeline for a retail store using AWS services that collects data from its transactional database (OLTP) in Snowflake and transforms the raw data (ETL process) using Apache Spark to meet ...
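A sketch of the Snowflake-to-Spark extract step, assuming the spark-snowflake connector is on the classpath; connection options and table names are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("retail_etl").getOrCreate()

sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",  # placeholder account
    "sfUser": "etl_user",
    "sfPassword": "...",                          # supply via a secrets manager
    "sfDatabase": "RETAIL",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "ETL_WH",
}

orders = (spark.read.format("net.snowflake.spark.snowflake")
          .options(**sf_options)
          .option("dbtable", "ORDERS")            # placeholder OLTP table
          .load())

# Transform in Spark, then write curated output back to the lake.
(orders.dropDuplicates(["ORDER_ID"])
       .write.mode("append").parquet("s3://bucket/curated/orders/"))
```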
A PHP project that combines ETL with different strategies to extract data from multiple databases, files, and services, transform it, and load it into multiple destinations.
A simple in-memory, configuration driven, data processing pipeline for Apache Spark.
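A toy illustration of the configuration-driven idea; the repo's actual config schema will differ, but the shape (a declarative step list interpreted against a DataFrame) is the point.

```python
from pyspark.sql import SparkSession, DataFrame

config = {
    "source": "s3://bucket/raw/events/",          # placeholder
    "steps": [
        {"op": "filter", "condition": "status = 'ok'"},
        {"op": "select", "columns": ["user_id", "event_type", "ts"]},
    ],
    "sink": "s3://bucket/clean/events/",
}

def apply_step(df: DataFrame, step: dict) -> DataFrame:
    if step["op"] == "filter":
        return df.filter(step["condition"])
    if step["op"] == "select":
        return df.select(*step["columns"])
    raise ValueError(f"unknown op: {step['op']}")

spark = SparkSession.builder.appName("config_pipeline").getOrCreate()
df = spark.read.parquet(config["source"])
for step in config["steps"]:
    df = apply_step(df, step)
df.write.mode("overwrite").parquet(config["sink"])
```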
Sentiment analysis of tweets using an ETL process and Elasticsearch
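A sketch of the load step, assuming the elasticsearch-py 8.x client, with a TextBlob polarity score standing in for whatever sentiment model the project actually uses.

```python
from elasticsearch import Elasticsearch
from textblob import TextBlob

es = Elasticsearch("http://localhost:9200")       # placeholder endpoint

tweet = {"id": "1", "text": "Loving this new data pipeline!"}
tweet["sentiment"] = TextBlob(tweet["text"]).sentiment.polarity  # -1.0 .. 1.0

es.index(index="tweets", id=tweet["id"], document=tweet)
```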
Comms processing (ETL) with Apache Flink.
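The repo is Flink-based; as a minimal PyFlink analogue of a streaming transform (the real job will be considerably richer):

```python
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
messages = env.from_collection(["hello world", "etl with flink"])
messages.map(lambda m: m.upper()).print()   # transform + sink to stdout
env.execute("comms_etl_demo")
```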
A data pipeline from source to data warehouse using Taipei Metro Hourly Traffic data
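A compact sketch of the source-to-warehouse hop with pandas and SQLAlchemy; the file, column names, and connection string are placeholders, not the repo's.

```python
import pandas as pd
from sqlalchemy import create_engine

df = pd.read_csv("metro_hourly_traffic.csv")      # placeholder local extract
df["hour"] = pd.to_datetime(df["hour"])           # assumed timestamp column

engine = create_engine("postgresql://user:pass@warehouse:5432/transit")  # placeholder
df.to_sql("metro_hourly_traffic", engine, if_exists="append", index=False)
```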
An ETL pipeline where data is captured from REST APIs (Remotive, Adzuna & GitHub) and RSS feeds (StackOverflow). The data collected from the APIs is stored on local disk. The files are preprocessed and ...
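A hedged sketch of the capture step, with one REST pull and one RSS pull saved to local disk; the endpoints are assumptions, so check each provider's docs.

```python
import json, pathlib, requests, feedparser

out = pathlib.Path("data/raw")
out.mkdir(parents=True, exist_ok=True)

# REST source (Remotive's public job API is assumed here).
jobs = requests.get("https://remotive.com/api/remote-jobs", timeout=30).json()
(out / "remotive.json").write_text(json.dumps(jobs))

# RSS source (this Stack Overflow feed URL is a placeholder).
feed = feedparser.parse("https://stackoverflow.com/jobs/feed")
(out / "stackoverflow.json").write_text(json.dumps([e.title for e in feed.entries]))
```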