Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
At LinkedIn, we are using this curriculum to onboard entry-level talent into the SRE role.
Terraform Pull Request Automation
Site Reliability Engineer Interview Preparation Guide
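⭐ [Open-source book] An in-depth explanation of cloud-native technologies such as kernel networking, Kubernetes, ServiceMesh, and containers. A practice-tested guide to DevOps and SRE. If you find an error, please open an issue.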
Compilation of public failure/horror stories related to Kubernetes
A checklist for anyone practicing Site Reliability Engineering
#Awesome# A curated list of awesome DevOps platforms, tools, practices and resources
Cloud Native DataOps & AIOps Platform | Cloud-native intelligent operations platform
Layerform helps engineers create reusable environment stacks using plain .tf files. Ideal for multiple "staging" environments.
A framework for gradual system automation
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world use Amazon Web Services (AWS)
Kubernetes utility for exposing image versions in use, compared to the latest available upstream, as metrics.