SeaweedFS is a distributed storage system for blobs, objects, files, and data lakes that can quickly store and serve billions of files.
#Interview#More than 2,000 data engineer interview questions.
#Computer Science#MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc....
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.
Learn how to use Spark SQL and HSpark connector package to create / query data tables that reside in HBase region servers
Data Engineering Project with Hadoop HDFS and Kafka
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformati...
Big data projects implemented by Maniram Yadav
Big data analysis of a travel website (partial Ctrip data): a Hadoop course project (undergraduate coursework level)
HokStack - Run Hadoop Stack on Kubernetes
Open source data infrastructure platform. Designed for developers, built for speed.
Ansible playbook for setting up Hadoop HDFS
A fully functional Hadoop YARN cluster as a docker-compose deployment.
Twitter + Flume + Hadoop (HDFS, MapReduce) + Neo4j + Python
Toy Hadoop cluster combining various SQL-on-Hadoop variants
A Java HDFS client example and a full Kerberos example for calling Hadoop commands directly from Java code or on your local machine.
A MapReduce program to conduct sentiment analysis of a keyword from a list of comments.
Mammoth is a container-based Hadoop distributed system log analyzer. Sponsored by Mantech and Naver Cloud Platform.