Doomsday Machines

Elio Grieco

1-July-2019

Preamble

This talk was originally given at the SouthWest Cyber Security Forum (SWCSF) on July 1st, 2019.

Although there have since been larger and more recent incidents, this version hasn’t been updated because I’m preparing a full website dedicated to the topic. It will have several 10-15 minute videos, broken into sections, covering all the content from this talk along with updates. Check eliogrieco.com for progress.

Who am I?

Elio Grieco

20+ years of professional programming

  • EDA (Electronic Design Automation) Software
  • Business Analytics as an Appliance
  • Data Audits
  • Code Audits
  • Seller Central Platform team at Amazon

What is a bug?

Undefined or Improperly Defined Behavior

Undefined/unanticipated behavior

A bug is a mismatch between a program’s specification or intended behavior and its actual behavior.

Programming is Deterministic

Bugs are always the programmer’s fault.

That said, programming is hard. Telling the computer exactly what needs to be done in a way that always works is extremely difficult.

Categories of Lethal Bugs

System Defects

The program doesn’t do what is intended or expected.

  • Crashes, Hangs, and Delays
  • Incorrect results

Interface Confusion

The program “works”, but pressing the “obvious” button triggers an action other than the one expected.

  • Interface is confusing, e.g. Norman doors
  • Scripting language is unclear

Sociological Interaction

The program functions “as intended”, but interacts with psychology, sociology and social norms in ways that are harmful or unintended.

  • Facebook and other social media
  • Google and other ad networks
  • Slack and other distraction-ware

Failure Points and Propagation

Good systems should tolerate subsystem failures without a total failure.

  • Single Points of Failure
  • Cascading errors

2003 Power Outage

Not a Bug

This isn’t a talk about adversarial software issues. We won’t be covering things that are intentionally harmful.

  • Hacking and Cyber Warfare
  • Dark patterns

Lethal Bugs

Therac-25

  • Killed 5 people
  • Race Condition
  • Designed by one engineer
  • No testing
  • No hardware interlocks
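In published accounts of the Therac-25, a fast operator edit to the treatment setup could go unnoticed by the task that had already started configuring the machine. A minimal, hypothetical sketch of one defensive pattern against that kind of check-then-act race (the names `Prescription`, `Setup`, and `fire_if_current` are invented here and do not come from the Therac-25 software): every edit bumps a version number, and the action runs only if the version still matches what setup configured, with the lock held across both the check and the action. A software check like this is no substitute for hardware interlocks.

```rust
use std::sync::Mutex;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Mode {
    Electron,
    XRay,
}

/// Shared prescription, edited by the operator's data-entry task.
struct Prescription {
    mode: Mode,
    version: u64, // bumped on every edit
}

/// What the setup task actually configured the hardware for.
struct Setup {
    mode: Mode,
    version: u64,
}

/// Operator edit: change the mode and record that the prescription changed.
fn edit(rx: &Mutex<Prescription>, mode: Mode) {
    let mut p = rx.lock().unwrap();
    p.mode = mode;
    p.version += 1;
}

/// Setup task: snapshot the prescription it is configuring for.
fn configure(rx: &Mutex<Prescription>) -> Setup {
    let p = rx.lock().unwrap();
    Setup { mode: p.mode, version: p.version }
}

/// Act only if nothing was edited since setup. The lock is held while acting,
/// so an edit cannot slip in between the check and the action.
fn fire_if_current(rx: &Mutex<Prescription>, setup: &Setup, fire: impl FnOnce()) -> bool {
    let p = rx.lock().unwrap();
    if p.version == setup.version {
        fire();
        true
    } else {
        false
    }
}

fn main() {
    let rx = Mutex::new(Prescription { mode: Mode::XRay, version: 0 });
    let setup = configure(&rx); // setup begins for X-ray mode
    edit(&rx, Mode::Electron); // operator edits before treatment starts
    let fired = fire_if_current(&rx, &setup, || println!("beam on"));
    println!("configured for {:?}, fired: {fired}", setup.mode);
}
```

Here the stale setup is refused; the setup task would have to reconfigure for the edited prescription before anything fires.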

Patriot Missile System

  • Clock skew bug
  • Must restart every 24 hours
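The clock problem is arithmetic. Time was counted in ticks of a tenth of a second, and 0.1 has no exact binary representation, so the stored constant was chopped and every tick carried a tiny error. A minimal sketch, assuming 23 fractional bits (chosen because it reproduces the roughly 9.5 × 10⁻⁸ s per-tick error cited in published analyses, not taken from the real Patriot code), shows how the error grows with uptime:

```rust
// ASSUMPTION: 0.1 is stored with 23 fractional bits, truncated (not rounded).
const FRAC_BITS: u32 = 23;

/// 0.1 chopped to FRAC_BITS fractional bits, as an integer scaled by 2^FRAC_BITS.
fn chopped_tenth() -> u64 {
    // Integer division truncates: 8_388_608 / 10 = 838_860.
    (1u64 << FRAC_BITS) / 10
}

/// Elapsed seconds as the system computes them: ticks times the chopped 0.1.
fn computed_seconds(ticks: u64) -> f64 {
    (ticks * chopped_tenth()) as f64 / (1u64 << FRAC_BITS) as f64
}

/// Exact elapsed seconds for 0.1 s ticks.
fn true_seconds(ticks: u64) -> f64 {
    ticks as f64 / 10.0
}

/// How far the computed clock lags the true clock after `hours` of uptime.
fn drift_after_hours(hours: u64) -> f64 {
    let ticks = hours * 3600 * 10; // one tick every 0.1 s
    true_seconds(ticks) - computed_seconds(ticks)
}

fn main() {
    for hours in [8, 24, 100] {
        println!("{hours:>3} h: clock drift = {:.4} s", drift_after_hours(hours));
    }
}
```

The drift grows linearly with the tick count, to about 0.34 s after 100 hours. Restarting resets the count, which is why periodic reboots mask the bug instead of fixing it. This models only the accumulation of the truncation error, not the actual Patriot software.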

Air France 447

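  • Stall warning sounded, then stopped once airspeed readings dropped so low that angle-of-attack data was rejected as invalid
  • No haptic feedback between the sidesticks: neither pilot could feel the other’s inputs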

Boeing 737 Max 8

  • Bad aerodynamics, software used to “compensate”
  • No way to disable the automated system
  • Lack of redundancy in sensors
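With a single sensor, software has no way to notice that its input is wrong. A hypothetical sketch of what extra sensors buy (the names, the 5.5° threshold, and the readings are invented for illustration; this is not Boeing’s flight-control logic): two sensors can detect a disagreement, which should make the automation stand down, and three let a median vote out-vote one faulty reading.

```rust
/// Result of cross-checking angle-of-attack (AoA) sensors, in degrees.
#[derive(Debug, PartialEq)]
enum Aoa {
    /// Sensors agree; the value is safe to feed to automation.
    Valid(f64),
    /// Sensors disagree and we cannot tell which is wrong: automation must stand down.
    Disagree,
}

/// Two sensors can detect a fault but cannot identify the faulty one.
fn cross_check(left: f64, right: f64, max_delta: f64) -> Aoa {
    if (left - right).abs() <= max_delta {
        Aoa::Valid((left + right) / 2.0)
    } else {
        Aoa::Disagree
    }
}

/// Three sensors let the median out-vote any single faulty reading.
fn median_of_three(mut readings: [f64; 3]) -> f64 {
    readings.sort_by(|a, b| a.total_cmp(b));
    readings[1]
}

fn main() {
    // Hypothetical readings: one vane stuck high, the others near 5 degrees.
    println!("{:?}", cross_check(5.0, 27.0, 5.5)); // Disagree
    println!("{}", median_of_three([5.0, 27.0, 5.2])); // 5.2
}
```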

Telangana

  • Lack of testing
  • Deaths due to sociological interactions
  • Just because you are not building “critical systems” doesn’t mean your work isn’t capable of killing

Unreported errors

There are almost certainly more that were never made public. Exactly how many, we don’t know.

  • Toyota
  • NHS Ransomware

Why does this matter?

Software is Eating the World

  • Software will be in more critical systems
  • Systems that you don’t think are “critical” will be
  • Bad interactions will prove increasingly dangerous
  • Relying on undependable software without safety nets will cost lives

Bigger, Faster, More Fragile

Reducing “friction” in and between systems reduces “damping” and leads to higher volatility

  • Damping, friction and firebreaks are a good thing
  • The more you connect, and the faster you connect it, the faster and further failures propagate
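Damping can be as simple as a rate limiter between two systems. A minimal token-bucket sketch, assuming a caller-supplied clock in seconds (the capacity of 5 and rate of 2 per second are arbitrary): a burst is absorbed up to the bucket’s capacity, and anything beyond the steady refill rate is shed instead of being passed downstream.

```rust
/// Token bucket rate limiter. Time is a caller-supplied clock in seconds,
/// which keeps the example deterministic.
struct TokenBucket {
    capacity: f64,       // largest burst that can pass at once
    tokens: f64,         // tokens currently available
    refill_per_sec: f64, // steady-state admission rate
    last_refill: f64,    // clock reading at the previous call
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        // Starts full, with the clock assumed to begin at t = 0.
        TokenBucket { capacity, tokens: capacity, refill_per_sec, last_refill: 0.0 }
    }

    /// Admit one request if a token is available; otherwise shed it.
    fn try_acquire(&mut self, now: f64) -> bool {
        let elapsed = (now - self.last_refill).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut bucket = TokenBucket::new(5.0, 2.0);
    // A spike of 20 requests at t = 0: only the burst capacity gets through.
    let first = (0..20).filter(|_| bucket.try_acquire(0.0)).count();
    // One second later, two more tokens have accumulated.
    let second = (0..20).filter(|_| bucket.try_acquire(1.0)).count();
    println!("t=0: {first} of 20 admitted; t=1: {second} of 20 admitted");
}
```

The spike reaches the downstream system as 5 requests, then 2 per second, rather than all 20 at once.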

Opaque and Unpredictable

Trendy technologies are a plague.

  • AI is opaque and some algorithms suffer catastrophic interference
  • AI cheats
  • AI can be tricked
  • Blockchain

Tech Fatigue and the Techlash

  • People use what they trust
  • If software continues to be unreliable, people will likely stop using it
  • The economics will fail: breaches are expensive and increasingly common

Solutions for the Public

Prefer Simpler Solutions

  • Passive solutions beat active solutions: Tricycle > Segway
  • Static websites work best for most content that isn’t a simulation or database
  • Prefer local compute to cloud

Prefer Open Source

  • It’s auditable: With enough eyes, all bugs are shallow
  • It can be maintained if the company goes under
  • Systems are made of other systems. Unmaintained subsystems make the overall system less reliable.

Add Damping and Firebreaks

  • Damping should slow the spread and impact of an issue
  • Firebreaks should contain a failure or malicious action to one subsystem
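A circuit breaker is one common firebreak between services. A minimal sketch (the threshold, cooldown, and `CallError` type are illustrative assumptions, not any particular library’s API): after a run of consecutive failures it stops calling the failing subsystem for a cooldown period, so callers fail fast instead of piling up behind it, then lets a trial call through.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
enum CallError<E> {
    /// The breaker is open: the call was not attempted at all.
    Rejected,
    /// The call was attempted and failed.
    Failed(E),
}

struct CircuitBreaker {
    threshold: u32,             // consecutive failures before opening
    cooldown: Duration,         // how long to stay open before a trial call
    failures: u32,              // current run of consecutive failures
    opened_at: Option<Instant>, // Some(t) while the breaker is open
}

impl CircuitBreaker {
    fn new(threshold: u32, cooldown: Duration) -> Self {
        CircuitBreaker { threshold, cooldown, failures: 0, opened_at: None }
    }

    fn call<T, E>(&mut self, now: Instant, f: impl FnOnce() -> Result<T, E>) -> Result<T, CallError<E>> {
        if let Some(opened) = self.opened_at {
            if now.duration_since(opened) < self.cooldown {
                // Contain the failure: don't touch the sick subsystem.
                return Err(CallError::Rejected);
            }
            // Cooldown elapsed: fall through and allow a trial call.
        }
        match f() {
            Ok(v) => {
                self.failures = 0;
                self.opened_at = None; // success closes the breaker
                Ok(v)
            }
            Err(e) => {
                self.failures += 1;
                if self.failures >= self.threshold {
                    self.opened_at = Some(now); // open (or re-open after a failed trial)
                }
                Err(CallError::Failed(e))
            }
        }
    }
}

fn main() {
    let mut breaker = CircuitBreaker::new(3, Duration::from_secs(30));
    let t0 = Instant::now();
    // The downstream subsystem is failing: after 3 failures the breaker opens.
    for i in 0..5 {
        let r: Result<(), CallError<&str>> = breaker.call(t0, || Err("timeout"));
        println!("call {i}: {r:?}");
    }
    // After the cooldown a trial call goes through; success closes the breaker.
    let r: Result<&str, CallError<&str>> = breaker.call(t0 + Duration::from_secs(31), || Ok("ok"));
    println!("after cooldown: {r:?}");
}
```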

Political Action and Regulation

  • Educate your legislators and regulators
  • Demand regulation of software
  • …just like a real engineering discipline

Postmortems and Accountability

Solutions for Engineers

Understand the Problem

Meets The Specs

Use Better Tooling

  • Prefer systems that help you catch bugs. Languages should help you find and avoid bugs.
  • Use well-defined languages: Rust, Erlang, Haskell, etc.
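One cheap way to make the language help: give values with different meanings different types. A sketch using hypothetical `Meters` and `Feet` wrapper types. Mixing them without an explicit conversion is a compile error instead of a silently wrong number.

```rust
use std::ops::Add;

/// Distinct wrapper types for distinct units. At runtime each is just an f64.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(f64);

#[derive(Debug, Clone, Copy, PartialEq)]
struct Feet(f64);

impl Add for Meters {
    type Output = Meters;
    fn add(self, rhs: Meters) -> Meters {
        Meters(self.0 + rhs.0)
    }
}

/// The only way to turn Feet into Meters is this explicit conversion.
impl From<Feet> for Meters {
    fn from(f: Feet) -> Meters {
        Meters(f.0 * 0.3048) // 1 ft = 0.3048 m exactly
    }
}

fn main() {
    let altitude = Meters(2500.0);
    let climb = Feet(1000.0);
    // let total = altitude + climb; // does not compile: expected `Meters`, found `Feet`
    let total = altitude + Meters::from(climb);
    println!("{:.1} m", total.0);
}
```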

Powerful Type Systems Catch Obvious Errors

Rust is a language designed to prevent memory errors and data races.
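Rust enforces the data-race rule at compile time. Code that lets several spawned threads mutate the same plain local variable through references is rejected by the compiler, so shared mutable state has to go through a synchronization type. A minimal sketch (`parallel_count` is a hypothetical name) using the standard library’s `Arc` for shared ownership and `Mutex` for exclusive access:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increment one shared counter from `threads` threads, `per_thread` times each.
fn parallel_count(threads: usize, per_thread: u64) -> u64 {
    // Arc gives every thread shared ownership; Mutex makes each update exclusive.
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // The value is reachable only through the lock guard.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // Bind the result so the guard is dropped before `counter` goes out of scope.
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", parallel_count(8, 10_000));
}
```

The total is always exactly threads × per_thread. Note that the type system rules out data races, not every race condition: an ordering bug like the Therac-25’s still has to be designed out.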

Microsoft: 70 percent of all security bugs are memory safety issues

Defense in Depth

Use additional tooling, such as static and dynamic analysis, to catch bugs.
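Dynamic checks complement the compiler. A minimal sketch of randomized differential testing: an overflow-safe `midpoint` (a hypothetical example; the naive `(a + b) / 2` is a classic overflow bug) is compared against a simple widened reference on boundary values and on pseudo-random inputs. The xorshift generator is inlined only so the example needs no external crates; a real project would more likely use a fuzzer or a property-testing library.

```rust
/// Midpoint of two u32 values without overflow.
/// The naive `(a + b) / 2` overflows when a + b > u32::MAX.
fn midpoint(a: u32, b: u32) -> u32 {
    let (lo, hi) = if a <= b { (a, b) } else { (b, a) };
    lo + (hi - lo) / 2
}

/// Reference implementation: widen to u64 so the sum cannot overflow.
fn midpoint_reference(a: u32, b: u32) -> u32 {
    ((a as u64 + b as u64) / 2) as u32
}

/// Tiny xorshift64 PRNG (seed must be nonzero).
struct XorShift64(u64);

impl XorShift64 {
    fn next_u32(&mut self) -> u32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        (x >> 32) as u32
    }
}

/// Compare implementation and reference on edge cases, then on random inputs.
/// Returns the first mismatching input pair, if any.
fn differential_test(iterations: usize, seed: u64) -> Result<(), (u32, u32)> {
    let edges = [0, 1, u32::MAX - 1, u32::MAX];
    for &a in &edges {
        for &b in &edges {
            if midpoint(a, b) != midpoint_reference(a, b) {
                return Err((a, b));
            }
        }
    }
    let mut rng = XorShift64(seed);
    for _ in 0..iterations {
        let (a, b) = (rng.next_u32(), rng.next_u32());
        if midpoint(a, b) != midpoint_reference(a, b) {
            return Err((a, b));
        }
    }
    Ok(())
}

fn main() {
    match differential_test(1_000_000, 0x9E37_79B9_7F4A_7C15) {
        Ok(()) => println!("midpoint agrees with reference"),
        Err((a, b)) => println!("mismatch at a={a}, b={b}"),
    }
}
```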

Thank You

Elio Grieco

grieco@egx.org