delta-lake · GitHub Topics

Doris is an MPP database, open-sourced by Baidu, for fast analytics on massive datasets.

olap database hadoop hive hudi iceberg real-time SQL BigQuery dbt delta-lake elt etl lakehouse query-engine redshift snowflake Apache Spark

Java 13.46 k

1 day ago

trinodb / trino

Trino is a distributed SQL query engine for big data (formerly PrestoSQL).

Java presto hive hadoop big-data SQL prestodb database distributed-systems distributed-database data-science datalake jdbc query-engine trino analytics delta-lake iceberg

Java 11.12 k

14 hours ago

StarRocks / starrocks

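StarRocks is a next-generation, high-speed MPP (Massively Parallel Processing) database for all analytics scenarios. Its goal is to make data analytics simpler and more agile: users can run fast analytics across many kinds of workloads on StarRocks without complex data preprocessing.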

Java 9.81 k

14 hours ago

delta-io / delta

Delta Lake is an open-source storage framework for building a Lakehouse architecture with compute engines such as Spark, PrestoDB, Flink, Trino, and Hive, and with APIs for Scala, Java, Rust, Ruby, and Python.

Apache Spark acid big-data analytics delta-lake

Scala 7.94 k

1 day ago

roapi / roapi

#data-warehouse# Create full-fledged APIs for slowly moving datasets without writing a single line of code.

SQL GraphQL arrow REST API analytics query columnar Rust in-memory-database datafusion blob-storage cloud-native parquet dataset s3 delta-lake

Rust 3.28 k

20 days ago

delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python

delta Rust delta-lake databricks Python pandas pandas-dataframe

Rust 2.67 k

12 hours ago

databricks / LearningSparkV2

This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Apache Spark spark-sql spark-mllib mlflow delta-lake

Scala 1.28 k

3 months ago

Mooncake-Labs / pg_mooncake

Postgres Data Warehouse, built on Iceberg

analytics columnstore delta-lake iceberg lakehouse parquet PostgreSQL

C++ 1.27 k

10 hours ago

apache / incubator-xtable

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.

apache-hudi apache-iceberg delta-lake

Java 1.02 k

4 days ago

delta-io / delta-sharing

An open protocol for secure data sharing

big-data Apache Spark pandas delta-lake

Scala 821

2 days ago

Nike-Inc / koheesio

Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.

data-engineering delta-lake pydantic pyspark Python

Python 630

3 days ago

splitgraph / seafowl

Analytical database for data-driven Web applications 🪶

database HTTP SQL API Edge Serverless visualization Rust datafusion delta-lake

Rust 482

2 months ago

aws-samples / amazon-sagemaker-local-mode

#computer-science# Amazon SageMaker Local Mode Examples

sagemaker amazon-sagemaker PyTorch catboost lightgbm PyCharm tensorflow-training prophet scikit-learn huggingface huggingface-transformers machine-learning delta-lake gensim-word2vec dask Tensorflow

Python 255

2 months ago

adidas / lakehouse-engine

The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Prod...

big-data configuration-driven data-engineering data-quality databricks delta-lake framework great-expectations lakehouse Apache Spark

Python 241

2 months ago

josephmachado / data_engineering_best_practices

Sample project to demonstrate data engineering best practices

data-engineering delta-lake etl great-expectations minio pyspark Apache Spark

Python 184

1 year ago

japila-books / delta-lake-internals

The Internals of Delta Lake

deltalake book internals delta-lake books datalake

183

3 months ago

izhangzhihao / Real-time-Data-Warehouse

Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi

flink data-warehouse data-warehousing flink-sql debezium kafka elasticsearch delta-lake cdc change-data-capture hudi iceberg SQL datalake delta deltalake Apache Spark spark-sql

Dockerfile 113

1 year ago

delta-incubator / delta-sharing-rs

A Minimalistic Rust Implementation of Delta Sharing Server.

axum data-engineering delta-lake Rust

Rust 89

1 month ago

anneglienke / 101_upsert-delta

This repository demonstrates a simple ELT process that uses Delta to perform upserts and to remove data files that are no longer part of the latest state of the table's transaction log.

delta delta-lake deltalake

Python 86

3 years ago

tikal-fuseday / delta-architecture

Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline

debezium delta-lake database streams kafka data-pipeline Apache Spark

HTML 75

2 years ago