Doris is an MPP database open-sourced by Baidu that supports fast analysis of massive data sets.
Trino is a distributed SQL query engine for big data (formerly PrestoSQL).
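StarRocks is a next-generation, high-speed MPP (Massively Parallel Processing) database for all analytics scenarios. StarRocks's vision is to make data analysis simpler and more agile for users. Without complex preprocessing, users can rely on StarRocks for fast analysis across a wide range of data analytics scenarios.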
Delta Lake is an open-source storage framework for building a Lakehouse architecture with compute engines such as Spark, PrestoDB, Flink, Trino, and Hive, and with APIs for Scala, Java, Rust, Ruby, and Python.
#Data Warehouse# Create full-fledged APIs for slowly moving datasets without writing a single line of code.
A native Rust library for Delta Lake, with bindings into Python
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Postgres-native Data Warehouse
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
An open protocol for secure data sharing
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
Analytical database for data-driven Web applications 🪶
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Prod...
The Internals of Delta Lake
Sample project to demonstrate data engineering best practices
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
A Minimalistic Rust Implementation of Delta Sharing Server.
This repository exemplifies a simple ELT process that uses Delta to perform upserts and to remove data files that are no longer in the latest state of the table's transaction log.
Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline