Principles behind the Speech Transmission Index
Speech is by nature a modulated signal. It contains noisy and tonal parts, covering the frequency spectrum between (roughly) 100 Hz and 10,000 Hz. This means that the so-called "phone band" from 300 to 3400 Hz covers only part of the relevant spectral range; this puts the upper STI limit of classical telephony systems (including GSM) at a value around 0.70.
Because the signal is modulated, speech also has an associated modulation spectrum: the range of amplitude modulation frequencies applied by the human vocal system. This stretches from roughly 0.5 to 30 Hz. The key fact on which the STI is based is this one:
In almost every case, loss of modulations (decrease of modulation depth) is equivalent to loss of intelligibility.
The same holds true in the optical domain. Optical systems are limited in their ability to transfer (spatial) modulation frequencies, in turn limiting the visibility of the signal. This is expressed through the Modulation Transfer Function, which became popular in optics in the 1970s. This inspired Steeneken and Houtgast to apply the same principle to speech, which turned out to produce amazingly accurate predictions of intelligibility.
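The notion of lost modulation depth can be illustrated with a small numerical sketch. It is not the standardized STI measurement procedure. It only assumes an idealized intensity envelope of the form mean × (1 + m·cos(2πFt)) and models steady background noise as a constant intensity added to that envelope. The function name modulation_depth, the 4 Hz modulation frequency, and the sample rate are illustrative choices, not taken from the text above.

```python
import math


def modulation_depth(intensity, mod_freq_hz, sample_rate_hz):
    """Estimate the modulation depth m of an intensity envelope.

    Assumes the envelope is roughly mean * (1 + m * cos(2*pi*F*t + phase))
    and that the record spans a whole number of modulation periods.
    """
    n = len(intensity)
    mean = sum(intensity) / n
    step = 2 * math.pi * mod_freq_hz / sample_rate_hz
    # Project the envelope onto a cosine and a sine at the modulation frequency.
    re = sum(x * math.cos(step * k) for k, x in enumerate(intensity))
    im = sum(x * math.sin(step * k) for k, x in enumerate(intensity))
    # The modulated component has amplitude 2*|projection|/n; divide by the mean.
    return 2 * math.hypot(re, im) / (n * mean)


sample_rate_hz = 1000      # envelope samples per second (illustrative)
mod_freq_hz = 4.0          # one modulation frequency inside the 0.5-30 Hz range
n_samples = 2000           # 2 s of envelope, i.e. 8 whole periods at 4 Hz

# Fully modulated "clean" envelope: mean intensity 1, modulation depth 1.
clean = [
    1.0 + math.cos(2 * math.pi * mod_freq_hz * k / sample_rate_hz)
    for k in range(n_samples)
]

# Steady noise with the same mean intensity as the speech (0 dB SNR),
# modelled here as a constant offset on the intensity envelope.
noise_intensity = 1.0
noisy = [x + noise_intensity for x in clean]

m_clean = modulation_depth(clean, mod_freq_hz, sample_rate_hz)
m_noisy = modulation_depth(noisy, mod_freq_hz, sample_rate_hz)

# Ratio of output to input modulation depth at this modulation frequency.
transfer = m_noisy / m_clean
print(round(m_clean, 3), round(m_noisy, 3), round(transfer, 3))
```

Under these assumptions the noise raises the mean intensity without adding any modulation, so the measured depth is halved. Repeating the measurement over a range of modulation frequencies would give one value per frequency, which is the kind of curve a modulation transfer function describes.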