A curated list of Site Reliability and Production Engineering resources.
Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd....
Litmus is a toolset for practicing chaos engineering in a Kubernetes-native way. It provides chaos CRDs that let cloud-native developers and SREs inject, orchestrate, and monitor chaos to find weaknesses in production Kubernetes deployments.
A checklist for anyone practicing Site Reliability Engineering
Hands on labs and code to help you learn, measure, and build using architectural best practices.
Chaos Engineering Toolkit & Orchestration for Developers
A curated list of Site Reliability and Production Engineering Tools
AlwaysOn: this repository provides a design methodology and approach to building highly reliable applications on Microsoft Azure for mission-critical workloads.
Reliability engineering toolkit for Python - https://reliability.readthedocs.io/en/latest/
Serverless chaos monkey for AWS (runs on AWS Lambda) ☁️ 💥
OpenShift Guide. Learn about the Red Hat OpenShift Container Platform, Data Science, CodeReady Containers, Podman, Buildah, and Kubernetes.
Probabilistic Risk Analysis Tool (fault tree analysis, event tree analysis, etc.)
The k6 documentation website.
The Chaos Toolkit core library
An opinionated list of attributes and policies that need to be met in order to establish a stable software system.
A Terraform provider for Concourse
A Python package for survival analysis. The most flexible survival analysis package available. SurPyval can work with arbitrary combinations of observed, censored, and truncated data. SurPyval can als...
A collection of templates ported from the SRE Workbook