#Computer Science# MapReduce, Spark, Java, and Scala for Data Algorithms Book
Cloud Shuffle Service (CSS) is a general-purpose remote shuffle solution for compute engines, including Spark, Flink, and MapReduce.
#Web Crawler# Use MapReduce's Java interface to crawl data on Chinese universities in a distributed fashion and learn the basics of HDFS.
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark. Explore a variety of tutorials and demonstrations on Big Data technologies, primarily in the form of Jupyter notebooks. Most notebooks are s...
Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformati...
Big data projects implemented by Maniram Yadav
K-Means algorithm implementation with Hadoop and Spark for the course of Cloud Computing of the MSc AIDE at the University of Pisa.
A collection of MapReduce problems and solutions
This repository covers how to install Hadoop on Google Colab and how to run MapReduce in Hadoop
Projects done in the Cloud Computing course.
Source code for the examples in the book Cloud Computing Solutions Architect: A Hands-On Approach by Arshdeep Bahga and Vijay Madisetti
Student projects in Big Data field.
Data Engineering Course
2021 Spring (Distributed Computing Systems): Distributed Systems and Programming
Twitter + Flume + Hadoop (HDFS, MapReduce) + Neo4j + Python
This repository contains a simple Hadoop-like (MapReduce) distributed computing platform implemented in Java. It is extended from a course project at UIUC awarded the best Java version implementation ...