lakeFS - Data version control for your data lake | Git for data
An open-source platform that provides resilience and manageability for object-storage-based data lakes.
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Kafka Connect FileSystem Connector
Data Engineering Project with Hadoop HDFS and Kafka
This repository gathers a curated list of resources for learning Hadoop for free.
Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.
SFTP server that runs on top of HDFS. It is based on Apache sshd and provides access to HDFS operations through the SFTP protocol.
Toy Hadoop cluster combining various SQL-on-Hadoop variants
The OctopuFS library helps manage cloud storage, ADLSgen2 specifically. It allows you to operate on files (moving, copying, setting ACLs) in a very efficient manner. Designed to work on Databricks, but sh...
MapReduce Java Code Examples to learn Hadoop
Data pipeline to process and analyse Twitter data in a distributed fashion using Apache Spark and Airflow in an AWS environment
Hadoop Filesystem Driver for Manta
In this task, we had to calculate the average temperature for each year from the given dataset using Hadoop HDFS. We had to create a MapReduce function to perform this task.
A neat and handy place for all Hadoop code