Pull in some older stuff from the wiki

Jason Thistlethwaite
2024-01-31 16:23:01 -05:00
parent d844dafd2c
commit 7f7b840c35
37 changed files with 310 additions and 1 deletions
@@ -0,0 +1,27 @@
**Kaizen Handout / Cheat-sheet**
This handout describes the three criteria an improvement must meet to be “[[kaizen]]”. An improvement that doesn't meet all three criteria cannot be considered kaizen.
**1) Other people, processes, or areas the improvement might affect have been considered.**
**2) The improvement is a one-time change, provides lasting value, and can be acted on right now.**
**3) There is a way to measure the improvement.**
**_Elaboration / Whys_**
**1) Other people, processes, or areas the improvement might affect have been fully considered.**
This is to make sure the improvement does not cause new problems in other, unrelated areas. Sometimes an improvement can seem great but cause problems for other people.
Example of the problem: Multiple people access a supply cabinet on a regular basis, and they have trouble finding what they're looking for. If one person reorganizes the cabinet without speaking with the others first, they might make it harder for other people to find what they need.
**2) The improvement is a one-time change, provides lasting value, and can be acted on right now.**
This is to make sure:
- The improvement doesn't just add an extra step or more work, unless that extra step or work decreases the _overall_ amount of work in some lasting way. The exception can be making a schedule for something if one doesn't already exist.
- The improvement won't be delayed or forgotten because we're saving up money, waiting for something to go on sale, or expecting some other person or group to help. It can also be a problem when the improvement has a lot of complicated steps or requires multiple people to put it in place.
@@ -0,0 +1,26 @@
---
author: Jason Thistlethwaite
tags:
- CI
lastRevised: 2021-08-05
---
[Kanban](https://en.wikipedia.org/wiki/Kanban "https://en.wikipedia.org/wiki/Kanban") is a method of communicating requirements in lean manufacturing invented by Taiichi Ohno, an engineer for Toyota. It aims to improve efficiency in production environments by preventing overproduction and reducing overhead.
![[kanban.png]]
Kanban works by implementing a few basic concepts:
1. All work is accompanied by a “card” that has the instructions on it.
2. The cards are numbered, and there is a limited number of them.
3. When a card is completed, it gets returned to the person who put the instructions on it. This lets them know that the work is completed and that they can provide more instructions.
Often, the card isn't a physical piece of paper.
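The three concepts above can be sketched in a few lines of Python. This is only a minimal illustration, not part of any kanban standard: the `CardPool` class, its method names, and the example instructions are all hypothetical.

```python
from typing import Optional


class CardPool:
    """A fixed set of numbered cards shared by a requester and a worker."""

    def __init__(self, total_cards: int) -> None:
        # Concept 2: cards are numbered and there is a limited number of them.
        self.available = list(range(1, total_cards + 1))
        # Concept 1: every piece of work in progress is attached to a card.
        self.in_progress = {}  # card number -> instructions

    def issue(self, instructions: str) -> Optional[int]:
        """Attach instructions to a free card, or return None if none is free."""
        if not self.available:
            return None  # no free card, so no new work can start
        card = self.available.pop(0)
        self.in_progress[card] = instructions
        return card

    def complete(self, card: int) -> str:
        """Concept 3: return a finished card so more work can be requested."""
        instructions = self.in_progress.pop(card)
        self.available.append(card)
        return instructions


pool = CardPool(total_cards=2)
first = pool.issue("Make a pizza")
second = pool.issue("Make a salad")
blocked = pool.issue("Make a dessert")  # both cards are in use
pool.complete(first)
third = pool.issue("Make a dessert")  # works once a card has come back
```

The limited number of cards is what prevents overproduction in this sketch: new work cannot start until a finished card is returned.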
## Kanban Board
A kanban board is like a dashboard showing where all the cards are currently located in each production line, and how many exist for each step of production.
|Workflow|Needs Done (10)|In Progress (3)|Finished|
|---|---|---|---|
|Make a Pizza||||
@@ -0,0 +1,53 @@
---
aliases:
- site reliability engineering
author: Jason Thistlethwaite
lastRevised: 2021-12-18
---
# Site Reliability Engineering, aka SRE
A practice that originated at Google in 2003, SRE is a methodology for ensuring the reliability and performance of interoperating systems.
Much of this article is the personal opinion of our current CEO and co-founder, Jason.
# Why
We see droves of people prepping for the end of the world in ways that probably won't help them, only to die in car crashes because they don't wear seatbelts. During COVID, we saw people panic-buying household supplies they didn't need, then losing their jobs.
Humans struggle to be honest with themselves and one another about how well they are truly prepared for disasters, as well as how probable disasters truly are.
SRE as a practice, although it comes from software engineering, works to identify and implement reliable solutions to probable disasters.
# What
SRE as a practice focuses on the following 4 things:
- Automation or elimination of anything repetitive that's also cost-effective to automate or eliminate (also known as [[Toil]]).
- Avoiding the pursuit of more reliability than is strictly necessary. Defining what's necessary is a practice by itself.
- Systems design with a bias toward reduction of risks to availability, latency, and efficiency.
- Observability, as in, the ability to ask arbitrary questions about your system without having to know ahead of time what you would want to ask.
# How
SRE was designed for software systems, but like many such practices, it can be usefully adapted to other domains. A proper SRE implementation includes the following aspects:
1. [[Toil]] management as the implementation of principle 1.
2. Defining and measuring reliability goals, such as SLIs, SLOs, and error budgets.
3. Designing for and implementing observability.
4. Defining, testing, and running an incident management process.
5. Capacity planning.
6. Change and release management.
7. [[Chaos Engineering]]
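As a rough illustration of item 2, the relationship between an SLI, an SLO, and an error budget comes down to simple arithmetic. The sketch below assumes a request-based SLI (the fraction of requests that succeeded); the function name, the 99% target, and the request counts are made-up values for illustration only.

```python
def error_budget_report(slo_target, total_requests, failed_requests):
    """Summarize how much of the error budget has been used in one window."""
    # SLI: the measured fraction of requests that succeeded.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: the number of failures the SLO tolerates in this window.
    allowed_failures = (1 - slo_target) * total_requests
    return {
        "sli": sli,
        "slo_met": sli >= slo_target,
        "allowed_failures": allowed_failures,
        "budget_remaining": allowed_failures - failed_requests,
    }


# Hypothetical window: 10,000 requests, 40 failures, 99% SLO.
report = error_budget_report(slo_target=0.99, total_requests=10_000, failed_requests=40)
print(report)
```

With these numbers the SLO tolerates roughly 100 failed requests, so about 60 remain in the budget. A negative `budget_remaining` would mean the budget for the window has been overspent.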
## Toil Management
Toil management is the practice of regularly identifying [[Toil]] in the business and ensuring there is forward progress toward reducing it. LDR implements this in a few different ways