Far larger and more recent incidents have occurred since this talk was
given. This version hasn’t been updated because I’m preparing a
full website dedicated to the topic. It will have several 10-15 minute
videos, broken into sections, covering all the content from this talk as
well as updates. Check eliogrieco.com
for updates.
Who am I?
Elio Grieco
20+ years of professional programming
EDA (Electronic Design Automation) Software
Business Analytics as an Appliance
Data Audits
Code Audits
Seller Central Platform team at Amazon
What is a bug?
Undefined or Improperly Defined Behavior
Undefined/unanticipated behavior
A bug is a gap between the specified or intended behavior of a
program and its actual behavior.
Programming is Deterministic
Bugs are always the programmer’s fault.
That said, programming is hard. Telling the computer
exactly what needs to be done in a way that always works is
extremely difficult.
Categories of Lethal Bugs
System Defects
The program doesn’t do what is intended or expected.
Crashes, Hangs, and Delays
Incorrect results
Interface Confusion
The program “works” but pressing the “obvious” button causes an
action different from the one the user expected.
Interface is confusing, e.g. Norman doors
Scripting language is unclear
Sociological Interaction
The program functions “as intended”, but interacts with psychology,
sociology and social norms in ways that are harmful or unintended.
Facebook and other social media
Google and other ad networks
Slack and other distraction-ware
Failure Points and Propagation
Good systems should tolerate subsystem failures without a total
failure.
Single Points of Failure
Cascading errors
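One way to picture tolerating a subsystem failure is a caller that isolates each subsystem, so a single failure degrades the result instead of aborting everything. The sketch below is only an illustration: the `run_isolated` helper and the subsystem names are hypothetical and do not come from the talk.

```python
from typing import Callable, Dict, Tuple


def run_isolated(
    subsystems: Dict[str, Callable[[], str]]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run every subsystem; record failures instead of letting one abort the rest."""
    results: Dict[str, str] = {}
    failures: Dict[str, str] = {}
    for name, call in subsystems.items():
        try:
            results[name] = call()
        except Exception as exc:  # contain the failure to this one subsystem
            failures[name] = repr(exc)
    return results, failures


def broken_recommendations() -> str:
    # Hypothetical subsystem that always fails, standing in for an outage.
    raise TimeoutError("recommendation service timed out")


results, failures = run_isolated({
    "catalog": lambda: "catalog ok",
    "recommendations": broken_recommendations,
    "checkout": lambda: "checkout ok",
})
```

Here `results` still holds the catalog and checkout output, while `failures` names the one subsystem that broke. Without the `try`/`except`, the first exception would have turned a partial outage into a total one.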
Not a Bug
This isn’t a talk about adversarial software issues. We won’t be
covering things that are intentionally harmful.
Hacking and Cyber Warfare
Dark patterns
Lethal Bugs
Therac-25
Six known massive radiation overdoses between 1985 and 1987, several of them fatal
Race Condition
Designed by one engineer
No testing
No hardware interlocks
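For readers unfamiliar with the term, a race condition is a bug whose outcome depends on the order in which concurrent tasks happen to run. The sketch below is a generic "lost update" illustration, not the Therac-25 code. It uses Python generators and an explicit schedule (both assumptions of this example) so that the interleaving is reproducible.

```python
def increment(shared):
    """Non-atomic read-modify-write, split into two steps by a yield."""
    value = shared["count"]      # step 1: read the shared value
    yield                        # another task may run here
    shared["count"] = value + 1  # step 2: write back a possibly stale value


def run(schedule, tasks):
    """Advance the tasks one step at a time in the order given by schedule."""
    for index in schedule:
        next(tasks[index], None)  # default avoids StopIteration at task end


def final_count(schedule):
    """Run two increment tasks under the given schedule and return the count."""
    shared = {"count": 0}
    tasks = [increment(shared), increment(shared)]
    run(schedule, tasks)
    return shared["count"]


sequential = final_count([0, 0, 1, 1])   # task 0 finishes before task 1 starts
interleaved = final_count([0, 1, 0, 1])  # both read before either writes
```

The same two tasks give a count of 2 when run one after the other and 1 when interleaved. Nothing in the code changed, only the timing, which is why this class of bug can slip past testing that never hits the bad ordering.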
Patriot Missile System
Clock drift bug: the timing error grew the longer the system ran
Must restart every 24 hours
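The clock bug above can be reproduced with a little arithmetic. Public accounts of the failure describe a clock that counted tenths of a second and stored the value 1/10 in a 24-bit fixed-point register, which cuts off its infinite binary expansion. The sketch below assumes 23 fractional bits and truncation (not rounding). The function names are invented for this example.

```python
from fractions import Fraction

FRACTION_BITS = 23        # assumption: 24-bit register, 23 bits after the point
TICK = Fraction(1, 10)    # the clock counts tenths of a second


def truncated_tick(bits: int = FRACTION_BITS) -> Fraction:
    """Value of 1/10 after chopping its binary expansion to `bits` bits."""
    scale = 2 ** bits
    return Fraction(int(TICK * scale), scale)  # int() truncates toward zero


def drift_seconds(uptime_hours: int, bits: int = FRACTION_BITS) -> float:
    """Accumulated clock error after running continuously for uptime_hours."""
    ticks = uptime_hours * 3600 * 10           # ten ticks per second
    per_tick_error = TICK - truncated_tick(bits)
    return float(ticks * per_tick_error)


drift_after_100_hours = drift_seconds(100)     # roughly a third of a second
```

Each tick is short by less than a ten-millionth of a second, which is harmless on its own. The error only matters because it accumulates with uptime, which is why restarting the system resets it.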
Air France 447
Stall indicator came on and then went off
No haptic feedback between control sticks
Boeing 737 Max 8
Bad aerodynamics, software used to “compensate”
No way to disable the automated system
Lack of redundancy in sensors
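Sensor redundancy is commonly implemented as a voter: with three or more sensors, one bad reading can be outvoted and the disagreement can be flagged. The sketch below is a hypothetical median voter and not any aircraft's actual logic. The `vote` function, the `max_spread` threshold and the sample angle-of-attack values are all assumptions made for illustration.

```python
from statistics import median
from typing import Sequence, Tuple


def vote(readings: Sequence[float], max_spread: float) -> Tuple[float, bool]:
    """Return (median reading, trusted) for a set of redundant sensors.

    `trusted` is False when half or more of the sensors disagree with the
    median by more than max_spread, so automation should not act on the value.
    """
    if len(readings) < 3:
        raise ValueError("need at least three sensors to outvote one bad reading")
    value = median(readings)
    outliers = [r for r in readings if abs(r - value) > max_spread]
    trusted = len(outliers) < len(readings) / 2
    return value, trusted


# Hypothetical angle-of-attack readings in degrees; the third sensor has failed.
one_bad = vote([4.8, 5.1, 74.5], max_spread=2.0)
# No two sensors agree, so the result should not be trusted.
no_agreement = vote([4.8, 40.0, 74.5], max_spread=2.0)
```

With a single sensor there is nothing to compare against, so a failed sensor is indistinguishable from a real emergency. That is the design gap the bullet above points at.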
Telangana
Lack of testing
Deaths due to sociological interactions
Just because you are not building “critical systems” doesn’t mean
your work isn’t capable of killing
Unreported errors
There are at least a few more that are not known to the public. How
many exactly, we don’t know.
Toyota
NHS Ransomware
Why does this matter?
Software is Eating the World
Software will be in more critical systems
Systems that you don’t think are “critical” will be
Bad interactions will prove increasingly dangerous
Relying on undependable software without safety nets will cost
lives
Bigger, Faster, More Fragile
Reduced “friction” in and between systems will provide less
“damping” and lead to higher volatility
Damping, friction and firebreaks are good things
The more you connect, and the faster you connect it, the faster and
further failures propagate
Opaque and Unpredictable
Trendy technologies are a plague.
AI is opaque and some algorithms suffer catastrophic
interference
AI cheats
AI can be tricked
Blockchain
Tech Fatigue and the Techlash
People use what they trust
If software continues to be unreliable, people will likely stop
using it
The economics will fail: breaches
are expensive and increasingly common
Solutions for the Public
Prefer Simpler Solutions
Passive solutions beat active solutions: Tricycle > Segway
Static websites work best for most content that isn’t a simulation
or database
Prefer local compute to cloud
Prefer Open Source
It’s auditable: “given enough eyeballs, all bugs are shallow” (Linus’s Law)
It can be maintained if the company goes under
Systems are made of other systems. Unmaintained subsystems make the
overall system less reliable.
Add Damping and Firebreaks
Damping should slow the spread and impact of an issue
Firebreaks should contain a failure or malicious action to one
subsystem
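In software, a firebreak is often built as a circuit breaker and damping as retry backoff. The sketch below shows minimal versions of both. The `CircuitBreaker` class, `backoff_delays` function and their parameters are hypothetical names chosen for this example, and a production breaker would also close again after a cool-down period, which is omitted here.

```python
class CircuitBreaker:
    """Firebreak: stop calling a failing subsystem after repeated errors."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.max_failures

    def call(self, func, fallback):
        if self.open:
            return fallback        # breaker is open: do not touch the subsystem
        try:
            result = func()
        except Exception:
            self.failures += 1     # count the failure and degrade gracefully
            return fallback
        self.failures = 0          # a success resets the count
        return result


def backoff_delays(base: float, factor: float, attempts: int, cap: float):
    """Damping: each retry waits longer, up to a cap, instead of hammering."""
    return [min(cap, base * factor ** n) for n in range(attempts)]


calls = []


def flaky():
    # Hypothetical subsystem that is down; records each time it is contacted.
    calls.append(1)
    raise RuntimeError("subsystem down")


breaker = CircuitBreaker(max_failures=2)
answers = [breaker.call(flaky, "cached value") for _ in range(5)]
delays = backoff_delays(base=1.0, factor=2.0, attempts=5, cap=10.0)
```

Five requests arrive, but the failing subsystem is contacted only twice before the breaker opens, so the failure stays contained and callers still get a fallback answer. The growing delays play the damping role: retries slow down instead of amplifying the load on a struggling system.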