Implementing best practices for PySpark ETL jobs and applications.
Using Flink SQL to build ETL jobs
ETL jobs for Firefox Telemetry
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Go...
Spark Streaming ETL jobs for Mozilla Telemetry
This solution helps you deploy ETL jobs on a data lake using CDK Pipelines.
A PySpark job to handle upserts, convert data to Parquet, and create partitions on S3
ETL jobs, maintained by DoltHub, that load public data into DoltHub.
Configurable data bridge for permanent ETL jobs
BigQuery ETL
LinkedPipes ETL is a lightweight, RDF-based ETL tool
Pentaho Data Integration (ETL), a.k.a. Kettle
Distributed scheduled job
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Visual crawler and ETL IDE written in C#/WPF
ETL best practices with Airflow, with examples
Python job scheduling for humans.
Distributed Scheduled Job Framework
Extract, Transform, and Load data with Ruby