dask · GitHub Topics

dask / dask

Parallel computing with task scheduling

dask Python pydata NumPy pandas scikit-learn SciPy

Python 13.12 k

4 days ago

rapidsai / cudf

cuDF - GPU DataFrame Library

gpu rapids cudf arrow CUDA pandas dataframe dask data-analysis data-science pydata C++ Python

C++ 8.85 k

1 day ago

TDAmeritrade / stumpy

STUMPY is a powerful and scalable Python library for modern time series analysis

data-science time-series-analysis dask numba Python anomaly-detection pattern-matching pydata matrix-profile motif-discovery

Python 3.89 k

5 days ago

pydata / xarray

N-D labeled arrays and datasets in Python

Python netcdf NumPy pandas xarray dask

Python 3.76 k

3 days ago

mars-project / mars

#Computer Science# Mars is a tensor-based unified framework for large-scale data computation which scales NumPy, pandas, scikit-learn and Python functions.

Python NumPy tensor pandas machine-learning scikit-learn Tensorflow PyTorch xgboost lightgbm ray dataframe dask

Python 2.72 k

1 year ago

jmcarpenter2 / swifter

A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner

pandas pandas-dataframe parallel-computing parallelization dask modin

Python 2.6 k

1 year ago

fugue-project / fugue

#Computer Science# A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

Apache Spark dask machine-learning distributed-systems distributed-computing distributed SQL pandas

Python 2.07 k

14 days ago

dask / distributed

A distributed task scheduler for Dask

pydata dask distributed-computing Python Hacktoberfest

Python 1.62 k

18 hours ago

hi-primus / optimus

#Computer Science# 🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Apache Spark pyspark data-wrangling bigdata data-science data-transformation machine-learning data-profiling data-extraction data-exploration data-analysis data-preparation cudf dask data-cleaning

Python 1.5 k

4 months ago

itamarst / eliot

Eliot: the logging system that tells you *why* it happened

Python Logging logging-library tracing causality twisted elasticsearch asyncio scientific-computing dask NumPy

Python 1.14 k

1 month ago

pytroll / satpy

Python package for earth-observing satellite data processing

Python satellite weather Hacktoberfest dask xarray closember

Python 1.1 k

2 days ago

Nixtla / mlforecast

#Computer Science# Scalable machine 🤖 learning for time series forecasting.

forecast forecasting machine-learning lightgbm xgboost dask Python time-series

Python 1.01 k

13 days ago

narwhals-dev / narwhals

Lightweight and extensible compatibility layer between dataframe libraries!

cudf pandas polars dask duckdb pyspark

Python 922

9 hours ago

ranaroussi / pystore

Fast data store for Pandas time-series data

datastore dask parquet pandas timeseries database dataframe

Python 575

9 months ago

capitalone / datacompy

Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!

Python pandas Apache Spark data data-science compare dataframes NumPy pyspark dask polars snowflake

Python 539

12 days ago

polyaxon / traceml

Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.

pandas dataframes data-science Apache Spark dask plotly statistics matplotlib data-profiling data-visualization data-exploration DataOps mlops data-quality data-quality-checks explainable-ai PyTorch Tensorflow tracking

Python 515

5 days ago

dask-contrib / dask-sql

Distributed SQL Engine in Python using Dask

sql-server SQL dask Python distributed machine-learning

Python 401

7 months ago

pytroll / pyresample

Geospatial image resampling in Python

Python NumPy resampling kd-tree Hacktoberfest dask xarray closember

Python 361

5 days ago

Ouranosinc / xclim

Library of derived climate variables, i.e. climate indicators, based on xarray.

Python xarray dask

Python 353

1 day ago

DataCanvasIO / HyperGBM

A full pipeline AutoML tool for tabular data

automl gbm xgboost lightgbm catboost semi-supervised-learning datacleaning preprocessing ensemble-learning tabular-data distributed-training dask gpu-acceleration rapidsai scikit-learn

Python 347

13 hours ago