The portable Python dataframe library
#Computer Science# Simple and Distributed Machine Learning
#Natural Language Processing# State of the Art Natural Language Processing
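Linkis builds a computation middleware layer between upper-layer applications and underlying engines. Through the standard interfaces Linkis provides, such as REST, WebSocket, and JDBC, upper-layer applications can easily connect to and access underlying engines such as Spark, Presto, and Flink, while also gaining cross-engine context sharing and unified governance and orchestration of computation tasks and engines.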
Implementing best practices for PySpark ETL jobs and applications.
#Computer Science# The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, a...
A curated list of awesome Apache Spark packages and resources.
#Computer Science# Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
#Computer Science# 🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and PySpark.
Jupyter magics and kernels for working with remote Spark clusters
PySpark-Tutorial provides basic algorithms using PySpark
#Computer Science# Hopsworks - Data-Intensive AI platform with a Feature Store
#Computer Science# MapReduce, Spark, Java, and Scala for the Data Algorithms book
#Computer Science# Sparkling Water provides H2O functionality inside a Spark cluster
#Web Crawler# Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring state-of-the-art data engineering tools you love, such as ...
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML...