A curated list of awesome responsible machine learning resources.
Model interpretability and understanding for PyTorch
Book about interpretable machine learning
A collection of infrastructure and tools for research in neural network interpretability.
#Computer Science#Interpretability and explainability of data and machine learning models
#Awesome#A curated list of Large Language Model (LLM) Interpretability resources.
A library for mechanistic interpretability of GPT-style language models
Highly cited and top-conference papers from recent years on the interpretability of deep neural network models (with code)
PAIR.withgoogle.com and friends' work on interpretability methods
H2O.ai Machine Learning Interpretability Resources
Code for the TCAV ML interpretability project
Interpretability for sequence generation models 🐛 🔍
Interpretability Methods for tf.keras models with TensorFlow 2.x
Network Dissection http://netdissect.csail.mit.edu for quantifying interpretability of deep CNNs.
#Computer Science#🔅 Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent Machine Learning Models
Translation - Shapash makes machine learning models transparent and understandable by everyone
🏥 Visualizing Convolutional Networks for MRI-based Diagnosis of Alzheimer’s Disease
A Bayesian Neural Network with a horseshoe prior for improved interpretability
#Natural Language Processing#The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework-agnostic interface.
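Translation - The Language Interpretability Tool: interactively analyze NLP models for model understanding in an extensible and framework-agnostic interface.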
PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer-based networks.
Interpret Community extends Interpret repository with additional interpretability techniques and utility functions to handle real-world datasets and workflows.
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.