API to the GPT4All Datalake
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
A workshop demonstrating the capabilities of S3, Athena, Glue, Kinesis, and QuickSight.
Reference Architectures for Datalakes on AWS
Datalake
Content for the AWS Data Lake Hands-on workshop, in both Japanese and English.
This solution helps you deploy Data Lake Infrastructure on AWS using CDK Pipelines.
This solution helps you deploy ETL jobs on data lake using CDK Pipelines.
a metadata-aware file archive
Russia/Ukraine 2022 conflict-related IOCs from the CERT Orange Cyberdefense Threat Intelligence Datalake
⚙️ Maintenance code for the datalake (metadata and access packages) | 📖 Docs: https://basedosdados.github.io/mais/
Data lake implementation demos, including Iceberg on Flink, Iceberg on Spark, Hudi on Flink, and Hudi on Spark.
Demos of using Spark to write MySQL and Kafka data to data lake formats such as Delta, Hudi, and Iceberg.
A Hadoop API package that supports multi-layer data storage, such as an erasure-coded blobstore, for use with a datalake.
Sample code for the CCF BDCI 2022 Data Lake Unified Stream and Batch Processing Performance Challenge