Pull in some older stuff from the wiki
@@ -0,0 +1,27 @@
**Kaizen Handout / Cheat-sheet**

This handout describes the three criteria an improvement must meet to be “[[kaizen]]”. An improvement that doesn’t meet all three criteria cannot be considered kaizen.

**1) Other people, processes, or areas the improvement might affect have been considered.**

**2) The improvement is a one-time change, provides lasting value, and can be acted on right now.**

**3) There is a way to measure the improvement.**

**_Elaboration / Whys_**

**1) Other people, processes, or areas the improvement might affect have been fully considered.**

This ensures the improvement does not cause new problems in other, unrelated areas. An improvement can seem great yet still cause problems for other people.

Example of the problem: Multiple people access a supply cabinet on a regular basis, and they have trouble finding what they’re looking for. If one person reorganizes the cabinet without speaking with others first, they might make it harder for other people to find what they need.

**2) The improvement is a one-time change, provides lasting value, and can be acted on right now.**

This is to make sure:

- The improvement doesn’t just add an extra step or more work, unless that extra step or work decreases the _overall_ amount of work in some lasting way. One exception is creating a schedule for something that doesn’t already have one.
- The improvement won’t be delayed or forgotten because we’re saving up money, waiting for something to go on sale, or expecting some other person or group to help. It can also be a problem when the improvement has many complicated steps or requires multiple people to put it in place.
@@ -0,0 +1,26 @@
---
author: Jason Thistlethwaite
tags:
- CI
lastRevised: 2021-08-05
---

[Kanban](https://en.wikipedia.org/wiki/Kanban) is a method of communicating requirements in lean manufacturing, invented by Taiichi Ohno, an engineer at Toyota. It aims to improve efficiency in production environments by preventing overproduction and reducing overhead.

![[kanban.png]]

Kanban works by implementing a few basic concepts:

1. All work is accompanied by a “card” that has the instructions on it.
2. The cards are numbered, and there is a limited number of them.
3. When a card is completed, it is returned to the person who wrote the instructions on it. This tells them the work is done and that they can provide new instructions.

Often, the card isn't a physical piece of paper.
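The three rules above can be sketched in code as a fixed pool of numbered cards. This is a minimal, hypothetical illustration: the `CardPool` class and its `issue` and `complete` methods are invented for this example and do not come from any kanban tool. Work can only start when a free card exists, and completing the work returns the card so that new instructions can be issued.

```python
from collections import deque
from typing import Deque, Dict, Optional


class CardPool:
    """A fixed set of numbered kanban cards (hypothetical illustration)."""

    def __init__(self, size: int) -> None:
        # Rule 2: cards are numbered 1..size and no more can be created.
        self.free: Deque[int] = deque(range(1, size + 1))
        self.in_use: Dict[int, str] = {}

    def issue(self, instructions: str) -> Optional[int]:
        """Rule 1: work starts only when a free card can carry its instructions."""
        if not self.free:
            return None  # every card is in use, so no new work is started
        card = self.free.popleft()
        self.in_use[card] = instructions
        return card

    def complete(self, card: int) -> str:
        """Rule 3: finished work returns its card, freeing it for new instructions."""
        instructions = self.in_use.pop(card)
        self.free.append(card)
        return instructions
```

With `CardPool(2)`, a third call to `issue` returns `None` until one of the first two cards is completed. That is how a limited card count keeps upstream steps from overproducing.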
## Kanban Board

A kanban board is like a dashboard showing where all the cards are currently located in each production line, and how many exist for each step of production.

|Workflow|Needs Done (10)|In Progress (3)|Finished|
|---|---|---|---|
|Make a Pizza||||
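A single workflow row of the board can be modeled as a set of columns, each of which may cap how many cards it holds. This sketch assumes that the numbers in the column headers (10 and 3) are per-column card limits and that "Finished" has no limit. The `Board` class and its methods are hypothetical.

```python
from typing import Dict, List, Optional


class Board:
    """One workflow row of a kanban board (hypothetical illustration)."""

    def __init__(self, limits: Dict[str, Optional[int]]) -> None:
        # Assumed: each value is the most cards a column may hold; None = no limit.
        self.limits = limits
        self.columns: Dict[str, List[str]] = {name: [] for name in limits}

    def add(self, column: str, card: str) -> bool:
        """Place a card in a column if that column still has room."""
        limit = self.limits[column]
        if limit is not None and len(self.columns[column]) >= limit:
            return False  # column is full; finish something before pulling more
        self.columns[column].append(card)
        return True

    def move(self, card: str, src: str, dst: str) -> bool:
        """Move a card from src to dst, respecting dst's limit."""
        if card not in self.columns[src] or not self.add(dst, card):
            return False
        self.columns[src].remove(card)
        return True

    def counts(self) -> Dict[str, int]:
        """How many cards currently sit in each column (the dashboard view)."""
        return {name: len(cards) for name, cards in self.columns.items()}
```

For the "Make a Pizza" row, `Board({"Needs Done": 10, "In Progress": 3, "Finished": None})` refuses a fourth card in "In Progress" until one of the three cards already there moves on to "Finished".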
@@ -0,0 +1,53 @@
---
aliases:
- site reliability engineering
author: Jason Thistlethwaite
lastRevised: 2021-12-18
---

# Site Reliability Engineering, aka SRE

Originating at Google in 2003, SRE is a methodology for ensuring the reliability and performance of interoperating systems.

Much of this article is the personal opinion of our current CEO and co-founder, Jason.

# Why

We see droves of people prepping for the end of the world in ways that probably won't help them, only to die in car crashes because they didn't wear seatbelts. During COVID, we saw people panic-buying household supplies they didn't need, then losing their jobs.

Humans struggle to be honest with themselves and one another about how well prepared they truly are for disasters, and about how probable those disasters truly are.

Although it comes from software engineering, SRE as a practice works to identify and implement reliable solutions to probable disasters.

# What

SRE as a practice focuses on the following four things:

- Automating or eliminating anything repetitive that is also cost-effective to automate or eliminate (also known as [[Toil]]).
- Avoiding the pursuit of much more reliability than is strictly necessary. Defining what's necessary is a practice in itself.
- Designing systems with a bias toward reducing risks to availability, latency, and efficiency.
- Observability: the ability to ask arbitrary questions about your system without having to know ahead of time what you would want to ask.
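One rough way to decide whether a repetitive task is "cost-effective to automate" is to compare two quantities over some horizon: the hours the task would consume if done manually, and the hours needed to build and maintain the automation. This is a simplified sketch. The function name, its parameters, and the one-year default horizon are assumptions, and a real decision would also weigh error rates and risk.

```python
def automation_pays_off(
    minutes_per_run: float,
    runs_per_week: float,
    build_hours: float,
    upkeep_hours_per_week: float = 0.0,
    horizon_weeks: float = 52.0,  # assumed horizon: one year
) -> bool:
    """Hypothetical break-even check: True if automating saves more hours than it costs."""
    # Hours of toil the task would consume if it stayed manual.
    manual_hours = minutes_per_run / 60 * runs_per_week * horizon_weeks
    # Hours spent building the automation plus keeping it working.
    automation_hours = build_hours + upkeep_hours_per_week * horizon_weeks
    return manual_hours > automation_hours
```

Consider a 15-minute task done five times a week. Over a year it costs about 65 hours, so spending 20 hours to automate it plus half an hour a week of upkeep (46 hours in total) pays off. The same task done once a week costs only 13 hours a year, which is less than the 20-hour build, so automating it does not pay off.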
# How

SRE was designed for software systems, but like many such practices, it can be helpful to adapt it for other uses. A proper SRE implementation includes the following aspects:

1. [[Toil]] management, which implements the first focus area above.
2. Defining and measuring reliability goals, such as SLIs, SLOs, and error budgets.
3. Designing for and implementing observability.
4. Defining, testing, and running an incident management process.
5. Capacity planning.
6. Change and release management.
7. [[Chaos Engineering]].
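For item 2, an error budget is the amount of failure an SLO allows. The following is a minimal sketch that assumes an availability SLI defined as good requests divided by total requests over a fixed window. The function names and the 99.9% target used in the example are illustrative and are not targets this article defines.

```python
def availability_sli(good: int, total: int) -> float:
    """SLI: the fraction of requests in the window that succeeded."""
    return good / total


def error_budget(slo: float, total: int) -> float:
    """How many failed requests the SLO (e.g. 0.999) tolerates in the window."""
    return (1 - slo) * total


def budget_remaining(slo: float, good: int, total: int) -> float:
    """Fraction of the error budget left; negative means the SLO was missed."""
    allowed = error_budget(slo, total)
    if allowed <= 0:
        raise ValueError("an SLO of 100% leaves no error budget")
    failed = total - good
    return 1 - failed / allowed
```

With a 99.9% SLO over 1,000,000 requests, the budget is about 1,000 failed requests. At 250 failures, roughly 75% of the budget is still unspent; at 2,000 failures, the budget is overspent.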
## Toil Management

Toil management is the practice of regularly identifying [[Toil]] in the business and ensuring steady progress in reducing it. LDR implements this in a few different ways: