|
@@ -8,11 +8,11 @@ has_lessons: [] |
|
|
|
|
|
|
|
|
"Machine listening" is one common term for a fast-growing interdisciplinary field of science and engineering that uses audio signal processing and machine learning to "make sense" of sound and speech [Cella, Serizel, Ellis]. Machine listening is what enables you to be "understood" by Siri and Alexa, to Shazam a song, and to interact with many audio-assistive technologies if you are blind or vision impaired [Alper]. As early as the 1990s, the term was already being used in computer music to describe the analytic dimension of "interactive music systems", whose behavior changes in response to live musical input [Rowe, Maier]. Machine listening was also, of course, a cornerstone of the mass surveillance programs revealed by Edward Snowden in 2013: SPIRITFIRE's "speech-to-text keyword search and paired dialogue transcription"; EViTAP's "automated news monitoring"; VoiceRT's "ingestion", according to one NSA slide, of Iraqi voice data into voiceprints. Domestically, machine listening technologies underpin the vast databases of vocal biometrics now held by many prison providers [ref] and, for instance, the Australian Taxation Office [ref]. And they are quickly being integrated into infrastructures of development, security and policing. |
|
|
|
|
|
|
|
|
|
|
Automatic speech recognition, transcription and translation [Kathy Reid audio] - targeted keyword detection - vocal biometrics [[1](https://www.nice.com/engage/real-time-technology/voice-biometrics/ "NICE leverages voice biometrics for safer and more secure customer authentication")] and audio fingerprinting - speaker verification, differentiation, enumeration and location - personality and emotion recognition - accent identification - sound recognition - audio object recognition - audio scene analysis - intelligent audio analysis [![](bib:827d1f44-5a35-4278-a527-4df67e5ba321)] - audio event analysis - audio context awareness - music mood analysis - music identification - music playlist generation - audio synthesis - speech synthesis - musical synthesis - adversarial music - audio brand recognition - aggression detection - depression detection - laughter detection - stress detection - distress detection - intoxication detection - scream detection - lie detection - gunshot detection - autism diagnosis - Parkinson's diagnosis - COVID-19 diagnosis - machine fault diagnosis - bird sound identification - gender identification - ethnicity detection - age determination - voice likeability determination - risk assessment [[1](https://www.clearspeed.com/ "Clearspeed: Using the Power of Voice for Good")]... |
|
|
|
|
|
|
|
|
Automatic speech recognition, transcription and translation [Kathy Reid audio] - targeted keyword detection - vocal biometrics [[1](https://www.nice.com/engage/real-time-technology/voice-biometrics/ "NICE leverages voice biometrics for safer and more secure customer authentication")] and audio fingerprinting - speaker verification, differentiation, enumeration and location - personality and emotion recognition - accent identification - sound recognition - audio object recognition - audio scene analysis - intelligent audio analysis [![](bib:827d1f44-5a35-4278-a527-4df67e5ba321)] - audio event analysis - audio context awareness - music mood analysis - music identification - music playlist generation - audio synthesis - speech synthesis - musical synthesis - adversarial music - audio brand recognition - aggression detection - depression detection - laughter detection - stress detection - distress detection - intoxication detection [[1](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3872081/ "Intoxicated Speech Detection: A Fusion Framework with Speaker-Normalized Hierarchical Functionals and GMM Supervectors")] - scream detection - lie detection - hoax detection [[1](https://amp.abc.net.au/article/12568084 "University of Southern Queensland gets $300k for hoax emergency call detection technology")] - gunshot detection - autism diagnosis - Parkinson's diagnosis - COVID-19 diagnosis - machine fault diagnosis - bird sound identification - gender identification - ethnicity detection - age determination - voice likeability determination - risk assessment [[1](https://www.clearspeed.com/ "Clearspeed: Using the Power of Voice for Good")]... |
|
|
|
|
|
|
|
|
These applications are all either already in use by states, corporations and other entities around the world, or under development. The list is obviously not exhaustive. Nor does it convey the real diversity of markets and of cyberphysical and political contexts into which these applications are quickly being embedded: |
|
|
|
|
|
|
|
|
|
|
Digital voice assistants - voice user interfaces - state and corporate surveillance - profiling - border security - home security - pre-emptive policing - weapons systems - court systems - hospital systems - call centre optimisation - disability services - grocery store wayfinding [[1](https://edition.cnn.com/2020/08/27/business/amazon-fresh-first-grocery-store/index.html "amazon fresh first grocery story")] - ambient elderly monitoring - baby monitoring - house arrest monitoring - ![human rights monitoring](soundcite:static/audio/intro-to-pulse-and-radio-content-analysis.mp3)[^andre_audio_1] - remote education - school security - remote diagnostics - biomonitoring and personalised health - social distancing - music streaming - music education - composition - gaming - brand development - marketing - acoustic ecology - employee performance metrics - wearables - hearables - recruitment - banking - insurance ... |
|
|
|
|
|
|
|
|
Digital voice assistants - voice user interfaces - state and corporate surveillance - profiling - border security - home security - pre-emptive policing - weapons systems - court systems - hospital systems - call centre optimisation - disability services - grocery store wayfinding [[1](https://edition.cnn.com/2020/08/27/business/amazon-fresh-first-grocery-store/index.html "amazon fresh first grocery story")] - ambient elderly monitoring - baby monitoring - house arrest monitoring - ![human rights monitoring](soundcite:static/audio/intro-to-pulse-and-radio-content-analysis.mp3)[^andre_audio_1] - remote education - school security - remote diagnostics - biomonitoring and personalised health [[1](https://twitter.com/voiceome "The Voiceome Project")] - social distancing - music streaming - music education - composition - gaming - brand development - marketing - acoustic ecology - employee performance metrics - wearables - hearables - recruitment - banking - insurance - gender vocal training [[1](https://github.com/project-spectra "Project Spectra: Vocal-gender training software for trans & gender non-conforming people")] |
|
|
|
|
|
|
|
|
As with all forms of machine learning, questions of efficacy, access, privacy, bias, fairness and transparency arise with every use case. But machine listening also demands to be treated as an epistemic and political system in its own right: one that increasingly enables, shapes and constrains basic human possibilities, that is making our auditory worlds knowable in new ways, to new institutions, according to new logics, and that is remaking (sonic) life in the process. |
|
|
|
|
|
|
|
|
|
|