hadoop-hdfs

SeaweedFS is a distributed storage system for blobs, objects, files, and data lakes that can quickly store and serve billions of files.

distributed-storage distributed-systems s3 hdfs fuse distributed-file-system hadoop-hdfs posix tiered-file-system Kubernetes replication object-storage s3-storage seaweedfs erasure-coding blob-storage cloud-drive

Go 26.03 k

3 hours ago

OBenner / data-engineering-interview-questions

#Interview# More than 2000 data engineer interview questions.

data-engineering interview hadoop hadoop-hdfs Apache Spark flink SQL kafka hive impala airflow Amazon Web Services Azure Apache Cassandra flume hbase avro nifi data-structures

1.45 k

1 month ago

Morphl-AI / MorphL-Community-Edition

#Computer Science# MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc....

Artificial Intelligence Machine Learning User Experience (UX) front-end-development pyspark Apache Cassandra Kubernetes hadoop-hdfs pipeline

Python 259

6 years ago

linkedin / dynamometer

A tool for scale and performance testing of HDFS with a specific focus on the NameNode.

hadoop hadoop-filesystem hdfs Testing scale performance-testing performance-test performance-analysis performance-metrics hadoop-hdfs

Java 133

2 years ago

AhmetFurkanDEMIR / Data-Engineering-Project-with-HDFS-and-Kafka

Data Engineering Project with Hadoop HDFS and Kafka

data data-engineer data-engineering data-engineering-pipeline Docker Docker Compose hadoop hadoop-filesystem hadoop-hdfs hdfs kafka kafka-consumer kafka-producer kafka-ui Python

Python 119

2 years ago

groda / big_data

Big Data essentials: Hadoop, MapReduce, Spark. Explore tutorials and demos in Jupyter notebooks—most are self-contained and live, ready to run with a click.

big-data bigdata Apache Spark spark-sql Docker mapreduce pyspark hadoop Jupyter Notebook hadoop-hdfs hadoop-mapreduce

Jupyter Notebook 84

3 days ago

IBM / sparksql-for-hbase

Learn how to use Spark SQL and HSpark connector package to create / query data tables that reside in HBase region servers

hbase Apache Spark SQL NoSQL hadoop-hdfs

1 month ago

vim89 / datapipelines-essentials-python

Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformati...

Apache Spark spark-sql Python pyspark etl etl-pipeline etl-framework XML xml-parsing datalake big-data hadoop hadoop-mapreduce hadoop-hdfs data-pipeline

Python 55

2 years ago