#Computer Science#🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
#Computer Science#Machine learning with dataframes
#Large Language Models#Scalable data preprocessing and curation toolkit for LLMs
Visual Data Preparation and Transformation. Low-Code Python-based ETL.
#Large Language Models#Open source project for data preparation for GenAI applications
#Computer Science#Data Preparation for Satellite Machine Learning
#Computer Science#A New, Interactive Approach to Learning Data Science
#Learning and Skill Development#An open source book to learn data science, data analysis and machine learning, suitable for all ages!
#Data Warehouse#🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
[AAAI 2021] MVFNet: Multi-View Fusion Network for Efficient Video Recognition
Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
ABAP unit testing framework: prepare test data in Excel, reuse it in ABAP code
#Large Language Models#Go web crawler to scrape documentation sites and convert content to clean Markdown for LLM ingestion (RAG, training data).
#Computer Science#Accelerating AI Training and Inference from Storage Perspective (Must-read Papers on Storage for AI)
“Data science” is just about as broad of a term as they come. It may be easiest to describe what it is by listing its more concrete components: Data exploration & analysis. Included here: Pandas; NumP...
Market Mix Modelling for an eCommerce firm to estimate the impact of various marketing levers on sales
QC tool for GWAS summary statistics files
Data preparation for data science projects.
Extract and evaluate radiomics for liver cancer tumors from DICOM segmentation masks. Using SimpleITK, PyRadiomics and PyDicom.