---
title: "Against the coming world of listening machines"
has_lessons: []
---
# Against the coming world of listening machines
## Alexa, what is machine listening?
"Machine listening" is one common term for a fast-growing interdisciplinary field of science and engineering which uses audio signal processing and machine learning to "make sense" of sound and speech. Machine listening is what enables you to be "understood" by Siri and Alexa, to Shazam a song, and to interact with many audio-assistive technologies if you are blind or vision impaired. It was also, of course, a cornerstone of the mass surveillance programs revealed by Edward Snowden in 2013: SPIRITFIRE's "speech-to-text keyword search and paired dialogue transcription"; EViTAP's "automated news monitoring"; VoiceRT's "ingestion", according to one NSA slide, of Iraqi voice data into voiceprints. Domestically, machine listening technologies underpin the vast databases of vocal biometrics now held by many prison providers and, for instance, the Australian Tax Office. And they are quickly being integrated into infrastructures of development, security and policing.
Automatic speech recognition, transcription and translation - targeted keyword detection - vocal biometrics and audio fingerprinting - speaker verification, differentiation, enumeration and location - personality and emotion recognition - accent identification - sound recognition - audio object recognition - audio scene analysis - audio event analysis - audio context awareness - music mood analysis - music identification - music playlist generation - audio synthesis - speech synthesis - musical synthesis - adversarial music - audio brand recognition - aggression detection - depression detection - laughter detection - stress detection - distress detection - intoxication detection - scream detection - lie detection - gunshot detection - autism diagnosis - Parkinson's diagnosis - COVID diagnosis - machine fault diagnosis - bird sound identification - gender identification - ethnicity detection - age determination - voice likeability determination - risk assessment ...
These applications are all either currently in use by states, corporations and other entities around the world, or under development. The list is obviously not exhaustive. Nor does it convey the real diversity of markets, cyberphysical and political contexts into which these applications are quickly embedding themselves:
Digital voice assistants - voice user interfaces - state and corporate surveillance - profiling - border security - home security - pre-emptive policing - weapons systems - court systems - hospital systems - call centre optimisation - disability services - grocery store wayfinding[[1](https://edition.cnn.com/2020/08/27/business/amazon-fresh-first-grocery-store/index.html)] - ambient elderly monitoring - baby monitoring - house arrest monitoring - ![human rights monitoring](soundcite:static/audio/intro-to-pulse-and-radio-content-analysis.mp3)[^andre_audio_1] - remote education - school security - remote diagnostics - biomonitoring and personalised health - social distancing - music streaming - music education - composition - gaming - brand development - marketing - acoustic ecology - employee performance metrics - wearables - hearables - recruitment - banking - insurance ...
As with all forms of machine learning, questions of efficacy, access, privacy, bias, fairness and transparency arise with every use case. But machine listening also demands to be treated as an epistemic and political system in its own right: one that increasingly enables, shapes and constrains basic human possibilities; that is making our auditory worlds knowable in new ways, to new institutions, according to new logics; and that is remaking (sonic) life in the process.
Machine listening is much more, then, than just a new scientific discipline or vein of technical innovation. It is also an emergent field of knowledge-power, of data extraction and colonialism, of capital accumulation, automation and control. We must make it a field of political contestation and struggle.
## ~~Machine listening~~
Machine listening isn't just machinic.
Materially, it entails enormous exploitation of both human and planetary resources: to build, power and maintain the vast infrastructures on which it depends, along with all the microphones and algorithms which are its most visible manifestations. Even these are not so visible, however. One of the many political challenges machine listening presents is its tendency to disappear.
Scientifically, machine listening demands enormous volumes of data: exhorted, extracted and appropriated from auditory environments and cultures which, though numerous already, will never be diverse enough. This is why responding to machinic bias with a politics of inclusion is necessarily a trap. It means committing to the very system that is oppressing or occluding you: a "techno-politics of perfection".
Because machine listening is trained on (more-than) human auditory worlds, it inevitably encodes, invisibilises and reinscribes normative listenings, along with a range of more arbitrary artifacts of the datasets, statistical models and computational systems which are at once its lifeblood and fundamentally opaque. This combination means that machine listening is simultaneously an alibi or front for the proliferation and normalisation of specific auditory practices *as* machinic, and, conversely, often irreducible to human apprehension; which is to say the worst of both worlds.
Moreover, because machine listening is so deeply bound up with logics of automation and pre-emption, it is also recursive. It feeds its listenings back into the world - gendered and gendering, colonial and colonizing, ![raced and racializing](soundcite:static/audio/halcyon-siri-imperialism.mp3)[^halcyon_audio_1], classed and productive of class relations - as Siri's answer or failure to answer; by alerting the police, denying your claim for asylum, or continuing to play Autechre - and this incites an auditory response to which it listens in turn. The soundscape is increasingly cybernetic. Confronting machine listening means recognising that common-sense distinctions between human and machine simply fail to hold. We are all machine listeners now.
But machine listening isn't exactly listening either.
Technically, the methods of machine listening are diverse, but they bear little relationship to the biological processes of human audition or psychocultural processes of meaning making. Many are fundamentally imagistic. Many work by combining auditory with other forms of data and sensory inputs: machines that listen by looking, or by cross-referencing audio with geolocation data. In the field of Automatic Speech Recognition, for instance, it was only when researchers at IBM moved away from attempts to simulate human listening towards statistical data processing in the 1970s that the field began making decisive steps forward. Speech recognition needed to untether itself from "human sensory-motor phenomenon" in order to start recognising speech. Airplanes don't flap their wings.[^airplanes]
Even if machine listening did work by analogising human audition, the question of cognition would still remain. Insofar as "listening" implies a subjectivity, machines do not (yet) listen. But this kind of anthropocentrism simply begs the question. What is at stake with machine listening is precisely a new auditory regime: an analog of Paul Virilio's "sightless vision", the possibility of a listening without hearing or comprehension, a purely correlative listening, with the human subject decentered as privileged auditor.
One way of responding to this possibility would be to simply bracket the question of listening and think in terms of "listening effects" instead, so that the question is no longer whether machines *are* listening, but what it means to live in a world in which they act like it, and we do too.
Another response would be to say that when or if machines listen, they listen "operationally": not in order to understand, or even to facilitate human understanding, but to perform an operation: to diagnose, to identify, to recognize, to trigger. And we could notice that as listening becomes increasingly operational, sound does too. Operational acoustics: sounds made by machines for machine listeners. Adversarial acoustics: sounds made by machines *against* human listeners, and vice versa.
# Footnotes
[^airplanes]: ![](bib:6676af8a-7a4d-4aa8-af96-f26452f58753)
[^andre_audio_1]: Interview with [André Dao](https://andredao.com/) on September 4, 2020.
[^halcyon_audio_1]: Interview with [Halcyon Lawrence](http://www.halcyonlawrence.com/) on August 31, 2020.