
apachespark

DataExpert-io / data-engineer-handbook

A list of learning resources for data engineers

apachespark Awesome Lists bigdata data dataengineering SQL

Jupyter Notebook 38.21 k

4 days ago

apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.

hudi apachehudi datalake bigdata apachespark incremental-processing stream-processing data-integration apacheflink

Java 5.99 k

14 hours ago

holdenk / sparkProjectTemplate.g8

Template for Spark Projects

apachespark Apache Spark

Scala 102

1 year ago

martandsingh / ApacheSpark

This repository will help you learn Databricks concepts through examples. It will cover all the important topics that we need in our real-life experience as data engineers. We wil...

apachespark Data Analysis data-engineering Database databricks datalake deltalake etl-pipeline hadoop hive Apache Spark spark-sql spark-streaming timetravel etl pyspark SQL

Python 102

21 days ago

funkyminds / cleanframes

Type-class-based data cleansing library for Apache Spark SQL

Apache Spark sparksql Scala bigdata apachespark

Scala 78

6 years ago

josephmachado / docker_for_data_engineers

Code for blog at: https://www.startdataengineering.com/post/docker-for-de/

apachespark Docker Docker Compose pyspark

C 40

1 year ago

propelledanalytics / SparkSQL.jl

SparkSQL.jl enables Julia programs to work with Apache Spark data using just SQL.

Apache Spark Julia language apachespark

Julia 25

2 years ago

tspannhw / FLiPStackWeekly

FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more...

apacheflink apachespark cloudera lakehouse streaming

1 day ago

aravinthsci / Spark_Delta_Lake

Delta Lake Examples

Apache Spark apachespark delta-lake deltalake datalake

Jupyter Notebook 12

5 years ago

SmartDataAnalytics / MA-INF-4223-DBDA-Lab

#Computer Science# Repository for Lab “Distributed Big Data Analytics” (MA-INF 4223), University of Bonn

teaching apachespark bigdata semantics Machine Learning RDF (Resource Description Framework) university

Jupyter Notebook 10

3 years ago

SandeepAswathnarayana / professional-certificate-programs

This repository contains all the projects and labs I worked on while pursuing professional certificate programs, specializations, and bootcamp. [Areas: Deep Learning, Machine Learning, Applied Data Sc...

Deep Learning Machine Learning datascience recurrent-neural-networks Python PyTorch Tensorflow pandas NumPy matplotlib SciPy scikit-learn recommender-system restricted-boltzmann-machine seaborn autoencoder image-classification apachespark

Jupyter Notebook 9

5 years ago

CarolinaNicasio / APACHESPARK-PYSPARK-2023

PySpark is a distributed data processing library for Python that makes it possible to process large volumes of data on clusters using the Apache Spark framework, offering high perform...

apache apachespark Data Science dataframe Actions pyspark Python Apache Spark

2 years ago

datumbrain / gossub

Trigger spark-submit in Golang. A Go implementation of the famous SparkLauncher.java.

Apache Spark apachespark Go

Go 7

5 years ago

sfrechette / spark-jdbc-mssql

Connect to SQL Server using Apache Spark

sql-server jdbc-driver Apache Spark Scala apachespark

Scala 7

9 years ago

lensesio / lenses-jdbc-spark

Apache Spark with Kafka via JDBC !!!

kafka apachespark jdbc-driver

Java 6

7 years ago

funkyminds / cleanframes-examples

Example usages for the cleanframes library

Apache Spark sparksql bigdata Scala apachespark

Scala 5

6 years ago

sahith / Link-Prediction-for-Citation-Networks-using-Apache-Spark

Link prediction is about predicting future connections in a graph. In this project, link prediction means predicting whether two authors will collaborate on a future paper or not give...

Scala Amazon Web Services emr apachespark dataframes s3 bigdata big-data big-data-analytics databricks

Scala 5

6 years ago