Working on the Right Things

Why Reliability Engineers Should Move to the Bottleneck, Not the Job Description

The reliability engineer’s most consequential decision in any given week is rarely a technical one. It is deciding which of the many available problems deserves the next hour of attention. Most reliability engineers are never asked this question explicitly. They are given a job description, a department, and a set of tools, and they work on whatever the inbox delivers. As a result, competent reliability work is routinely aimed at problems whose solution would deliver only marginal benefit, while much larger sources of organisational loss go unaddressed because they belong to someone else’s department. This paper sets out 10 principles for working on the right things. Among them are knowing what reliability work the certifications actually cover, recognising that the work is often led by people whose titles say something else, moving to the bottleneck rather than the job description, partnering with finance to defend the value of the work, distinguishing systemic change from incremental improvement, and treating the return on each hour as the central question of the role. The principles are drawn from a Speaking of Reliability conversation between Philip Sage and Fred Schenkelberg and have been translated here into a structured engineering doctrine in the voice of The Mantua Group (TMG).
