
added interview

master
Sean Dockray 3 years ago
commit
b115a54b45
1 changed files with 270 additions and 0 deletions
content/interview/feldman-li-mills-pfeiffer.md (+270, -0)

@@ -0,0 +1,270 @@
---
title: "Mara Mills, Xiaochang Li, Jessica Feldman, Michelle Pfeifer"
Description: "Leading off from Shannon's essay \"Urban Auscultation; or, Perceiving the Action of the Heart\", which addresses machine listening in the pandemic, we talk about the stethoscope, the decibel and other histories of machine listening, along with its epistemic and political dimensions and artistic deployments."
aliases: []
author: "Machine Listening"
date: "2020-08-18T00:00:00-05:00"
episode_image: "images/ml.gif"
explicit: "no"
hosts: ["james-parker", "joel-stern", "sean-dockray"]
images: ["images/ml.gif"]
news_keywords: []
podcast_duration: "00:55:00"
podcast_file: "https://machinelistening.exposed/library/Shannon%20Mattern/Shannon%20Mattern%20(19)/Shannon%20Mattern%20-%20Shannon%20Mattern.mp3"
podcast_bytes: ""
youtube: ""
categories: []
series: []
tags: []
---
## Transcript
James Parker (00:00:00) - Okay, well, thanks so much for joining us, everybody, across all of the many different time zones. There are so many different ways we could begin, but perhaps we could just start by everybody introducing themselves. To begin with, maybe Mara, would you want to kick things off?
Mara Mills (00:00:19) - Sure. So I'm Mara Mills and I'm an Associate Professor of Media, Culture and Communication at NYU. And I also co-direct, and co-founded, the Center for Disability Studies at NYU, and I've been in the MCC department for 10 years. And I feel like I should say something about that department, because it's influenced all four of us. Jessica, Xiaochang, and Michelle, who will also introduce themselves, are either grad students or alums of that department. It's a department with a really unique history because it was founded by Neil Postman as the Department of Media Ecology in 1971 at the urging of Marshall McLuhan. And Neil Postman was the sole chair of the department until 2002, the year before he died. So it really heavily bears the imprint of this, like, McLuhanesque Canadian media studies moment from the seventies, even though we've renamed the department and it's changed in all sorts of ways.
Mara Mills (00:01:22) - So the department has grown quite massively and we do have a sound studies branch that all of us have been part of. And, in fact, we're even part of a smaller branch, which is voice studies. But I think the McLuhan, even though very few people cite him anymore, or Postman, the sort of emphasis on thinking about media technology as much as content is really present in the department still. And Postman used to describe media like Petri dishes, that kind of medium, you know, in which culture is grown. And I think people are still doing work along those lines, but also thinking about, you know, media ecology in bigger terms. So, you know, media as existing in environments with each other and with the natural environment and the social world. So the four of us today all have incubated to some degree in this,
Mara Mills (00:02:15) - what was formerly known as the Media Ecology department. And, you know, I came to the department with a degree in history of science. So I had the training in the Harvard department that was really based on historical epistemology, and it was quite a culture shock at first to be part of the media studies department with fewer historians and many more anthropologists and literary theorists. And I think I'm marked by those two moves. My work focuses, over the last 10 years, either on the question of sound and sound technologies and disability, both in terms of literal technologies related to disability, hearing aids, cochlear implants, and the ways people use them, but also the, you know, epistemology of sound seen from minoritarian viewpoints. I'm actually on Zoom right now. And for those of you who can see me on Zoom, the other interviewees, I'm sitting in front of a framed picture by the Deaf sound artist who lives in Berlin, Christine Sun Kim, who in addition to doing sound art, does charcoal drawings.
Mara Mills (00:03:19) - And she always describes herself as unlearning sound culture and trying to re-imagine the meaning of sound and the politics of sound through things like the image and through tactile vibrations. So that's one part of my work, along those lines, you know, epistemologies of sound understood through disability communities or so-called minor technologies of sound, which tend not to be minor at all. And the other part of my work has been, like, 10 years of research into the history of the Bell System. You know, the parent organization of that was, and still is, AT&T, and all of its subsidiaries, like the research branch Bell Labs. And, you know, AT&T was the largest corporation in the world for most of the 20th century. And many people describe AT&T as bringing many elements of electroacoustics and present-day sound culture into being.
Mara Mills (00:04:14) - So I've done work on, like, tiny components, like miniature vacuum tubes coming out of AT&T, and the beginnings of electronics and the beginnings of amplification, all the way through things like the sound spectrograph and the vocoder. And it's not just the technologies. It's the user groups that come out of them. It's techniques of listening that come from these technologies. It's also techniques like amplification or filtering that end up being much broader than just the world of sound. And so trying to understand these entirely new modes of thinking that come out of a tiny little component is something I track as well. And, in a non-deterministic way, also the culture and politics from which those things arose in the first place.
Mara Mills (00:05:00) - I'll stop there, 'cause there's four of us. I want to just say that I'm probably the most ancient historian in this group. And the lovely thing about working with people who have training in sound art and computation and anthropology is that I actually learn a lot more about what's happening now and the outcome of some of those components for the present.
James Parker (00:05:20) - Fantastic. Thanks so much, Mara. I mean, there are so many other aspects of your work that I really wanted to get to, but maybe let's move on. I don't know who should go next. Jessica, maybe, would you like to introduce yourself?
Jessica Feldman (00:05:34) - Yeah, so I am Jessica Feldman. I'm an assistant professor in the department of Communication, Culture and Media at the American University of Paris, based in Paris. And as Mara mentioned, I did my PhD in MCC at NYU studying under Mara, so I'm heavily influenced by this science and technology studies way of thinking. And I'm also an artist with a background in sound and sort of weird techie new media robotics stuff, originally trained as a composer. And I think this way of thinking about media from the perspective of what are we making, why are we making what we're making, and thinking about it from the perspective of the creator, the creating and the politics of what we're making, really carries through into my scholarly research. Right now, my work is sort of at the intersection of sound studies and values and design, and also social movement studies and sort of protest culture in some ways.
Jessica Feldman (00:06:51) - And right now I sort of have two parallel projects. One is maybe more related to machine listening, and we'll probably talk about it more later, but it's on emotion detection by apps, AI, IoT devices, and how the sound signal gets sort of monitored and then interpreted psychologically and emotionally by these algorithms. That's sort of one area of study, more sort of critical. And the other project I'm working on right now is a more kind of hopeful or proactive project, which is researching how left-wing activists and citizens' assemblies and sort of grassroots democratic groups listen to each other and design technology in order to do decision-making and coordination and things like that. That's been super fun, and I think it provides an interesting sort of prefigurative model for how we could think about design in a way that is outside of the sort of corporate and surveillance capitalism model that permeates most of what we're using, in fact, what we're using right now.
Jessica Feldman (00:08:06) - And I think that's really been amplified as something I want to think about right now, since COVID and since all forms of assembly have moved online. So I'm thinking about that a lot, and thinking about Zoom a lot. Yeah. So that's where I am.
James Parker - Fantastic. Thank you. And maybe Xiaochang?
Xiaochang Li - Sure. Yeah, so I am currently an Assistant Professor in Communication at Stanford University and, like everyone here, I have close ties to the Media, Culture and Communication program at NYU. I did my PhD there, and Mara was also on my dissertation committee and we've written together. Jess and I were in the same cohort. So we're a pretty tight-knit group of folks. In terms of my work, I kind of am the sort of accidental sound scholar, in that I never really set out to work with or think about sound per se.
Xiaochang Li (00:09:02) - I had started a project thinking about machine learning and language processing, which is still part of my project now, which is a question about how computation and language really get tethered together as an algorithmic process, in which the problem of trying to bring language under the purview of algorithmic processing both starts to get us towards what we would recognize as machine learning, right? It really starts to set up some of the technical and epistemic groundwork for thinking about a kind of radical data orthodoxy that we would now recognize as machine learning, in which everything is organized around large-scale pattern recognition that has to sift through large volumes of data. But at the same time it also opens the door to make it thinkable for computation to be the means by which we come to understand social expression.
Xiaochang Li (00:10:00) - Right. In terms of making it thinkable that we can not only sort of tally, but also determine meaning within language by sorting through it. Right. And so the sound piece comes in because part of that history is really about the development of speech recognition, and not only how language processing came to be, but specifically the challenge of doing acoustic signal processing alongside language. And I came upon it almost entirely by accident, because I made a mistake in my early research. So I was working on text prediction technologies, and I was reading up about T9, and I read in a passing interview with one of the creators of T9 that it originated as an assistive technology. And so I immediately assumed that the assistive technology in question was speech recognition, because that sort of made sense to me in terms of a legacy of assistive technology coming into something like autocorrect.
Xiaochang Li (00:10:59) - That was completely incorrect. It turned out that what he had originally designed T9 for, and what makes complete sense now as we think about it, was eye tracking: you have nine positions, et cetera. But at that point I had already started down the rabbit hole of the history of speech recognition and started to see the ways in which it was such a crucial piece in the puzzle of making language thinkable as a computational project. And sort of addressing the question of why it is that we keep trying to make computers do language things, even though they're extremely bad at precisely doing language things, but also at doing perceptual things. Things like listening, for instance, because they don't have the same kind of perceptual coordinates as people do. They don't recognize sounds as sounds per se. And so I had already gone so deep down that rabbit hole, and it turned out that it did in fact have so much to do with this history that I was interested in, that it just sort of worked out.
Mara Mills (00:12:00) - I want to just jump in and say: you mentioned, Xiaochang, that you and I had written together, but we didn't realize till very late, I think after you even finished your PhD, where our own research intersected, because I knew you as doing work on text prediction with a tiny fragment of it being around speech. And then all of a sudden our work converged around the sound spectrograph and understanding speech in terms of filtered speech in a spectrum view. I had done work on the pre-history of that machine, and you had done work after World War II, and it was just really fortuitous that we were able to unite, especially around a machine called Audrey, the automatic digit recognizer created at Bell Labs as arguably the first speech recognizer. But it was like, I couldn't have done that work on my own.
Mara Mills (00:12:52) - So just being able to collaborate with someone who had a totally different time period, and with that anchor, was really quite an amazing experience.
James Parker - Fantastic. I want to talk about that work at some point as well. But finally, and not least, Michelle.
Michelle Pfeifer - Yeah. My name is Michelle Pfeifer and I'm a PhD candidate at the department of Media, Culture and Communication at NYU, and Mara is also on my dissertation committee. So yeah, I'm still there. And really broadly, I would say my research is about the relationship between media technology and border and migration policing, and the different forms of surveillance that are associated with that. And the dissertation that I'm writing now looks more particularly at sound and different forms of listening that are, I would say, crucial for these forms of border and migration control.
Michelle Pfeifer (00:13:55) - And I'm mostly focusing on what's going on in Germany, but also Europe more broadly. I was very curious how Mara was describing the department's thinking about the media and the environment, because for me, I started by really thinking about what is referred to as the European border regime, which describes the different kinds of infrastructures, laws, technologies, geographies, and forms of policing that create these fortified borders of Europe. And then I started the PhD in media studies and I became really interested in looking at, yeah, the role of media technology. So probably most related to questions of machine listening is one big part of the dissertation that looks at dialect recognition and how it's used in asylum determination in Germany. And, I mean, I never really described myself like that, but as Xiaochang was saying that she came to sound accidentally, I think for me it was kind of similar.
Michelle Pfeifer (00:15:08) - Because basically I was coming from Germany, just after this moment that was referred to as the refugee crisis in 2015. And I started the PhD in 2016, and I started noticing that there were all these different legal and administrative shifts in asylum determination that were making use of these automated and semi-automated biometric tools, forensic tools. And one of them was this dialect recognition software. And at the time I didn't really know anyone who was working on it. Now I know a few people who are working on it. But I was very captivated by it for thinking about, I don't know, questions of, you know, personhood: who can make what kind of claims, and what happens when, if we think about the voice as this kind of, I guess, almost political infrastructure, right?
Michelle Pfeifer (00:16:20) - The thing that you need, in political theory, this thing you need to make a claim for something. What happens when that is reduced from content and is actually only thought about in terms of phonetics or acoustics? So yeah, maybe I can describe this technology a little bit more later, but for me, I stuck with that because I felt that it was really opening these questions about how you can make political claims, how they become inscribed in these technologies, but also this beginning of where I started: what is actually a border, and where is it? And, you know, how does it exist in bodies, in media, but also in voices?
Mara Mills - Michelle, you probably know that Lawrence Abu Hamdan was part of the Machine Listening project early on, I think. And, I mean, James can probably say more about what his contribution was, but I feel like your work is basically the ethnographic follow-up on some of the stuff he was creating at that time.
James Parker (00:17:28) - Yeah. We know Lawrence's work on, yeah, the LADO regime, but that's pre-automation. We actually recorded an interview with him. I mean, this may be a way in to thinking about machine listening, but we recorded an interview with him and it went all over the place. It was very wide-ranging, which is why it's not up online yet. But, you know, he said, well, I was working on accent recognition in relation to refugees before it was being automated. And to me, the interesting question is not so much the automation, it's the concept of a voice being a passport. Right. And so that became a kind of way, when we were talking with Lawrence, into thinking about, well, what is the problem with machine listening? What is machinic about the listening?
James Parker (00:18:16) - Is it the fact that the machine is involved, or the sort of desire to apprehend the voice or sound in this very, you know, quantified way? I mean, I don't know exactly how to describe it, but it became a question of what do we mean by machine listening, what's the problem with machine listening? And I think that's a really good question for everybody, because, you know, we've been trying to formulate machine listening as a kind of political problem. Machine listening seems to be a language that some people use. It seems to be a language that some scientists use. Composers and computer musicians, particularly, have used this language of machine listening, but it's not widespread. People don't already know what machine listening is, necessarily. You know, it sounds a little bit like machine learning, but the scientists also talk about computer audition and, you know, auditory scene analysis. And the machine, in some ways, is a very old figure that doesn't really suggest AI or machine learning, you know?
James Parker (00:19:27) - So what are the histories of machine listening? What do we understand by it? Do you recognize your work as being about machine listening in some way? Perhaps that'd be a good opening to everybody: what, if anything, does your work have to do with machine listening? Or, like Lawrence, do you sort of want to reject that that's really the problem, and say that we should be thinking about it in other kinds of ways?
Mara Mills (00:19:56) - I'm going to say, and this is me being mean, I feel like Xiaochang is the one who is working the most directly on the
Mara Mills (00:20:02) - internalist definition of machine listening. I think we all can expand that category into many broader realms, in fact possibly even to legal protocols, as Michelle was mentioning, because those protocols can be almost algorithmic, every bit as inflexible as machine learning. But Xiaochang, I mean, you have a computer science background, so I feel like you have a sense of the internalist definition as well, and you mentioned the phrase even just now. Am I putting you on the spot?
Xiaochang Li (00:20:35) - I'm happy to. So it's funny. Machine listening conjures a few things for me. I don't know that this is a term that appears historically very often, right? It's certainly a term that's in circulation now. And I think what is happening is that it animates a very particular fantasy about how sound and listening and automation come together, where on one hand, machine listening really taps into what is the kind of machine learning, big data fantasy of understanding and knowledge at a scale that is beyond the horizon of human scrutiny, and this promise of what that would entail. But at the same time, it comes into this kind of queasy combination alongside the popular fears about surveillance, the pervasive surveillance that is necessitated by the very promise of machine learning and big data, right?
Xiaochang Li (00:21:38) - That you need to constantly produce more data. Machine listening captures both the dream of the promise of what that has to offer, and also the fear of the voracious appetite for data that that requires. And I think that, for me, is maybe why it has such a clear circulation right now. Though I think it also, in my own work, taps into something that doesn't get articulated that often when we think about machine learning and the sorts of technologies around sound that we're thinking about today, especially in relation to voice, which is that machine listening implies something that is happening in terms of apprehension and interpretation on the end of the machine, as opposed to something like machine hearing. And so central to what we're looking at now is the development of what historically was called, like, new perceptual coordinates, right? We're not simply digitizing the voice. We're not simply reformatting it in computational form; we are actually creating a kind of new perceptual calculus, right? We're creating computationally differentiated categories for how we think about sound and the voice, in ways that aren't actually mappable onto what we think of as human categories. And I think that is kind of what machine listening conjures up for me.
James Parker (00:23:05) - So machine listening is a new kind of listening, not the machinic attempt to emulate listening?
Xiaochang Li (00:23:15) - Right. Yeah. So machine listening is something different from simply putting sound through a machine, right? Or simply reformatting sound in machine-processable form. In doing so, you have to do something to what it is that we think listening is.
Mara Mills (00:23:34) - Yeah. I mean, your work, Xiaochang, especially the post-World War II histories you tell, makes me realize that my pre-World War II histories are often more about machine hearing than machine listening, or about replicating in some sort of machinic form how human hearing works, but without the processing part, without the perception part, without the learning part, simply because it wasn't technically possible, even if some of the components or tools, like the idea of the sound spectrograph, are still informing machine listening today. My work has been more about machine hearing. So I've worked on things like the roots of the cochlear implant, attempts to recreate human hearing in a machine, or the sound spectrograph, creating a machine that captures a sound wave as it's traveling through air, so not the actual reception moment but the transmission moment, and visualizes what that sound would look like, rather than what it would sound like. When I was thinking about machine listening though, you know, as someone who trained a long time ago in biology and was in the sciences for a while, I remember all of my intro physics classes always started with the question of what is a machine. They start with, like, the lever or the inclined plane.
Mara Mills (00:24:56) - And, you know, to be the historian of science for a moment, there is a way in which, if we take the word machine to include even simple machines, something that's mechanical and not electronic, that just is able to modulate or change motion or force, then even some of my early work on, like, ear trumpets, which are, you know, channeling sound waves, channeling the motion of air, could in that sense be considered to be something related to, or in the neighborhood of, machine listening. You know, it's more listening and hearing through machines. I think when I've written about mechanical aids or amplifiers or tools related to sound in the 19th century, I usually don't use the word machine for them, but there's no reason why one couldn't, if one's just taking them by the basic definition of a simple machine. But with Xiaochang's definition, I often personally work on the history of machine hearing, or on hearing and listening through machines. So the mechanization of hearing, rather than its automation, that mechanization step that comes before automation.
James Parker (00:26:10) - One of the moves you make, Mara, is often to point to the fact that the reason for turning to the machine is often to supplement or delegate hearing in the context of assistance. Right. You know, so that the deaf need a machine to hear for them, and then the machine is coming in as a kind of supplement for a deficit. And you see that in your work and you critique that, and, you know, you talk about how that story is part of the history of sound technologies per se. And it's not a minor history or a marginal history, but we see that story a lot even today. We were speaking with somebody from Google recently, you know, saying that a lot of the ASR and audio assistive technologies that he's been working on first sort of get in because of the assistive function. But he was a bit suspicious, I think, that maybe that's a little bit of a marketing thing. So I'm really interested in how to think about that history, which precedes, you know, AI or machine learning, but that's about the relationship between assistive or delegated listening, that naturalizes a certain kind of privileged form of listening, hearing, speaking and so on.
Mara Mills (00:27:41) - Yeah, I mean, media theorists and especially media historians love this narrative of media as prosthesis, whether you're talking about McLuhan or Kittler or Paul Virilio: the sense that some human beings, or maybe even all human beings, have these deficits, which require some kind of supplementation, as you put it. Actually, if you look at the historical record, you know, first off, in the case of hearing, since that's what we're talking about, hearing loss is so common. Everyone loses hearing as they age. It's so prevalent, and it's such an internally diverse category, that tons and tons of inventors, whether you're talking about someone like Edison or Fleming, who was one of the developers of the vacuum tube, were hard of hearing or deaf, either from childhood or in later age. And it's not necessarily that their deafness caused them to be inventive. Surely it had some influence on their work, but I just think it's important to remember that this is such a common phenomenon that it appears all throughout history, and sometimes it's correlated with, but it's not causal of, some of these inventions.
Mara Mills (00:28:52) - And the other thing I've found is, you know, deaf people themselves, I mean, again, Edison being deaf is a great example, have done a lot of the invention themselves. They haven't necessarily needed someone to come create a machine to integrate them into oral culture. And a lot of deaf people reject that anyway and identify as linguistic minorities. So there is a rhetoric, and I think in the United States it can be linked to things like the NSF and the NIH, big granting organizations requiring someone to show broader impacts when they write a grant proposal. And a really easy way to do that is to talk about rehabilitating a disability or another kind of impairment or illness. In fact, that tends not to be where the market is and where the money comes from. So a lot of inventors will start by getting funding by talking about doing work on behalf of, say, deaf people, but then when they market their product, they often end up leaving disabled people completely behind, and they market for a broader audience, and their tools aren't necessarily even accessible.
Mara Mills (00:29:54) - So I write about that phenomenon as what I call the assistive pretext. There's, like, a pretext.
Mara Mills (00:30:00) - And I think if you're McLuhan or Kittler looking through the record, it's like, 'Oh look, they invented this for someone deaf.' And then as you peel back the layers of the history, they actually didn't, and the tool often wasn't even marketed that way. So yeah, I mean, historians always look at things in an almost annoyingly contrarian and fine-grained way, and I could go on and on with other examples, but maybe I'll pause myself there, because, you know, we have two other people who aren't historians and who I think are getting at the broader implications, the social implications beyond the technology, in other ways.
James Parker (00:30:39) - That's a very generous segue. Mara, I'd love to talk with you about the work you've done that relates to cybernetics and information theory. But maybe I could just jump off to ask Jessica a question, because, Mara, you talked about historians being, you know, really fine-grained, but Jessica, we were reading your paper on affective and emotion detection in relation to the voice. And I was just really struck when I was reading your paper by how fine-grained the analysis was. You weren't talking about machine listening, right? You were talking about this specific technology that this specific company is developing in this very specific way. It has a very specific imagination of, I can't remember what you called it, the human emotional structure or something like this. And you were tracking, again in this really fine-grained way, these very different technologies that might appear in the marketing as simply, you know, AI-led or similarly machinic, but actually work very differently, and then following through how they come to market, and their effects, and so on. I was just really struck by your text methodologically, actually.
James Parker (00:31:54) - And I wondered, do you think of your work as having to do with Machine Listening, and do you have an idea of Machine Listening in the background, or is it about always watching the technology in its specificity?
Jessica Feldman (00:32:05) - Hmm, I think when I was doing that work, well, I didn't know what I was going to find, and there were only a handful of companies working on this at the time. So I was able to do a really close patent analysis of, like, five different companies that were kind of pioneering this vocal emotion detection. I think it matters actually a lot, these rubrics of emotional and psychological constitution that are deeply embedded in these technologies, even though once they come to market, they're sort of being marketed all for the same thing. I think what matters here is how they're imagining the human soul, and they are doing it somewhat differently. I don't think there's a right way to do it necessarily. I'm quite concerned by all of these tools. But in the research, I was really curious to see these different paradigms and how they developed as they got closer and closer to the market.
James Parker (00:33:17) - Could you maybe give an example of some of the technologies in their specificity?
Jessica Feldman (00:33:23) - Sure, sure. Well, the first one that I looked at is developed by this Israeli company called Nemesysco that really focuses on security technology. Their work is really coming out of a history of lie detection technology, which has been proven to be inaccurate and is illegal in many places. But they're using that technology, basically. And what they're looking for is kind of micro-tremors in the voice that they read as indicative of a sort of discomfort that could mean that you are not telling the truth, or that you're a security risk. So the switch was in the rhetoric, from it being factual to it being a risk. I think this is a rhetorical change. This isn't a design change, because they couldn't actually claim that they were able to determine whether someone was lying or not. They could just determine whether it was probable that someone was lying, or they could claim to determine that it was probable that someone was lying. So, I mean, it's a bit like selling snake oil, I think. But it's important because it's being adopted so widely now. So they're just looking for these tiny tremors in the voice that operate kind of at the level of, I forget the name of the muscles, but you know, the muscles that we cannot control consciously.
Jessica Feldman (00:34:58) - So that's one paradigm, and they're really just lining this up with stress and lining up stress with truth. And then there are some other companies that have these entirely different mappings of the human soul, basically. And sort of information processing systems that they imagine as the human soul, and by analyzing your voice, they can claim to tell what your mood is or what your feelings are in that moment. And then there are others that really make a very pure affective science claim, that the voice is expressing something that is universal and uncontrollable and preconscious and pre-linguistic. And if you can just pull that out, then you will know something about what the person is reacting to. So this is less a model of ‘this is how the soul is structured’ and more a model of ‘there is some kind of universal pre-linguistic communication that's happening here that we don't even need a human to get at.’ So I would say those are the three main models that kind of emerged from the research, and they understand the human in different ways. And I think that matters because it's being used on us.
Mara Mills (00:36:35) - Jessica, I wanted to ask you, because I know you mentioned that one of these companies focused on surveillance, which is very similar to the work Michelle is doing. I spent so much time, and so did Xiaochang, on speech recognition that I wasn't thinking very much about the para-linguistic, about questions of accent or affect and the other aspects of voice, the things that count as voice. I was focusing on the things that count as speech. But what terrified me when I read your piece was how pervasive, as you mentioned, these tools are. It's not just border control. It's so many different automated systems for calling a call center to ask for help. And I feel like now, after reading your dissertation and this article, I always try to modulate my voice in certain ways in the hopes that someone will actually not think I'm as angry as I am as a customer. But I'm just curious, do you feel that the surveillance and policing function is where the initial money came from, or is it really consumer culture, or is it just that once this technology became available, it instantly flooded into all of these different markets?
Jessica Feldman (00:37:41) - Hmm. Well, the first thing I want to tell you is that if you want to escalate your claim, you should be more angry and you will more quickly get to a human. That's one of the most useful things that I've learned from my research. I think the money came from security, at least as far as I can tell from the early stuff. There's also an element of rhetoric, this sort of, what do you call it, the assistive pretext. There's a lot of that language too about helping autistic people. In any sort of emotion recognition, you start to get the disability language for autistic people, but it doesn't ever materialize in a tool that actually is useful to them, as you said.
Jessica Feldman (00:38:43) - I think what I'm seeing so far is that the money usually comes from surveillance. And then the deployments are first in games and children's tools, like, ‘Oh, a little robot that you can talk to,’ and some fun app you can play with on your phone, and that's how they build their training data set. And then it scales up into consumer products, like marketing and the call center and the neuromarketing sort of stuff. And then we'll see where it goes from there.
Xiaochang Li (00:39:20) - The stuff that you were bringing up reminded me of the kind of deep naturalism that is present in a lot of the speech technology research, right? There's a long legacy of it in many of these early speech recognition technologies too. There's this fantasy that what this would produce is a more natural form of writing that preceded the current form of writing. And it comes from the fact that a lot of these engineers were referencing material from these kinds of amateur scientists that had fashioned themselves as naturalists, because they were just, like, barons.
Mara Mills (00:40:02) - Right. And that's like a thing you do and so on. And so they carried forward all these ideas about how the vocalization of speech is a predecessor of writing, of course, but it's also the next level after the gesture, because you have to speak out when your hands are full. I mean, these kinds of, to us, very silly assumptions that were deeply embedded in these naturalists' ideas about primitivism as well. Right. In the kind of moment of colonialism, and it very much syncs up with, and I'm thinking of Fatimah Tobing Rony’s ... work on the third eye of cinema, the early anthropological film and the capture of motion as imagined to be the unfashioned evidence, right? Evidence that the person couldn't themselves consciously manipulate. It's the same kind of fantasy: your systems catch the micro-tremors of the vocal apparatus,
Mara Mills (00:41:01) - and therefore can't be controlled or manipulated by the person, and that leads us to discover evidence about them better, like beneath their capacity to control.
Mara Mills - Yeah, this idea of the media industry or the tech industry wanting to undo itself, wanting to somehow enhance the state of nature and make itself invisible, rather than creating all sorts of other technologies that we haven't even thought about, where we might use our bodies in all sorts of different ways. And not recognizing that speech is learned. Speech is a technology; quote-unquote unaided speech is a technology. I know this is at the heart of a lot of Michelle's work on accent too, and how absurd the idea is that you are going to find something innate about a person, unlearned and untouched, based on voice or speech. But it is just bizarre, what you're saying, Xiaochang, that all this rhetoric in the tech industry is about the sort of evils of non-natural tech.
Xiaochang Li - Writing is the artificial technology here, not the machine that now interprets language.
Jessica Feldman (00:42:05) - Yeah. And something that came up in my research, and it probably connects a lot to Michelle's work, is that these technologies do not at all accommodate inflected languages. So even though they're making these claims to be universal, in fact, it wouldn't work with Chinese, you know. They're very Western-centric.
Mara Mills - Cochlear implants, same thing. They were designed for phonetic languages, not tonal ones. That was one of the major, if you're doing it from a values-in-design perspective, major biases in those. Yeah. But it also, and I know I kind of,
Michelle Pfeifer - I keep on thinking about this question, not just what is Machine Listening, but also whether it actually matters that the machine is doing the listening, because in the case that I looked at, there was a kind of previous iteration where linguists were doing this work, linguists who were analyzing language
Mara Mills (00:43:14) - and were then, on the basis of that, trying to determine where someone is from, which is a really important indicator in an asylum claim. And then there's this move to this kind of semi-automation, and of course the assumption is the same. The assumption is still that language is actually an indicator for citizenship, which of course it's not, but also that language doesn't change, or that it's stable, it's not mobile. And also that people are not moving, that people wouldn't be immersed in different languages or be bilingual or all these different things. And especially when it comes to migration, usually a migrant biography doesn't really look like that.
Mara Mills (00:44:17) - Actually, if we think about how European borders work, it's intentionally made so that people cannot move from one point to another, so that they get stuck at many different places. So of course that assumption is the same, whether a linguist is doing this work or the machine. But then I think there are some things that I am trying to take seriously. I guess one thing is how there's this kind of way in which the machine can conceal what it's doing more easily. Not to use an overused metaphor like the black box, but I think it's really the case that
Mara Mills (00:45:05) - it’s really hard to figure out what actually happens in that determination of the dialect someone speaks, and what the impact of that is on the outcome of an asylum claim. There's actually that back and forth between human and machine, or there is supposed to be. The German government has always emphasized that the machine is not making decisions, that it's only supposed to assist decision-making, but it's also supposed to be a solution to this human error or failure, or something that the human is not capable of doing. If we think about it in the context of the kind of refugee crisis, it’s also important that they wanted something that they could scale up.
Mara Mills (00:46:02) - They wanted to solve all these logistical problems, that there were so many people that they couldn't process all the asylum claims. Every state representative I talked to, they were always just talking about how they needed to make something that they could scale up, that could do the work much more quickly. And the linguists who were doing this, nobody was agreeing on what was the right way to do it, or whether you should be doing it at all. And also of course the money: who is investing in the development of this kind of stuff? So even though the assumption of understanding the voice as the passport is still there and operative,
Mara Mills (00:46:52) - I do think that there's something about how the kind of violence that is enacted is maybe easier to conceal or smooth over in the machine. And then there's this idea of something that is supposed to be more objective, and I think that's really powerful. That's also something that needs to be confronted by actually making visible the effects.
Mara Mills - What is the tool right now that is the main one being used by German border control to identify people's accents? I guess my question is, are more people being turned away and denied asylum claims because of the machine, or is it actually just incidental?
Mara Mills (00:47:43) - Is it the same as it would have been with linguists, and it's just adding this air, as you say, of objectivity?
Michelle Pfeifer - I mean, I think it's kind of hard to answer that question because so many things happened that made it much harder to actually make it to Germany. So there are fewer people who are actually applying for asylum, which is sort of the more general development of externalizing border policing. And the other thing is that there was this backlog of asylum claims being processed. So it's just now that some of the cases make it to the courts, because there was a backlog there too. But, I mean, I don't know.
Mara Mills (00:48:32) - I wouldn't be able to say that more people are not getting asylum because of that. But there are, I think, more subtle ways: people describe how, when they're doing their asylum interviews, the ways of questioning are referring to these reports. So then we can also think about the influence of that kind of objective aura, how it influences the people who are asking the questions. So I think that this kind of stuff is happening, but it's kind of hard to quantify, to see that in numbers. But yeah, I don't know if that answered your question.
James Parker (00:49:19) - It's very helpful to think about the way in which the technical systems are embedded in and related to border regimes and economic systems. And Jessica, in your work, you're very clear about the way in which the coming to market is a really important point that shifts things, and it's not as if there's something called Machine Listening that is separate from all of these things. And actually I was wondering, to join a few dots, because it seems like what you're saying, Michelle, is that you're a bit hesitant about the black box, but the thing is we need to un-black-box and to teach
James Parker (00:49:58) - the technologists (or the border agents or the government) a lesson about voice. Sound studies can do political work here. It's not just sound studies, but their theory of the voice is wrong. And also their theory of migration and refugees is profoundly wrong. And Jessica, in your work, I was struck by what you said about snake oil, and I was reading one bit about Beyond Verbal, one of the companies you discuss. You say that their 2011 patent includes rubrics relating to vocal pitch. For example, the pitch of C is associated with the need for activity and survival, whereas E is associated with self-control and B with command and leadership. Now that's an example where there's no black box, because they said it in the patent and it is bananas! Apart from the fact that it's based on a theory of Western harmony, which is very recent and specific… I mean, it's just bananas. So part of the politics there is to offer a lesson from sound studies or music theory in saying this is just wrong, so profoundly wrong. Whereas in Xiaochang’s work,
James Parker (00:51:07) - and perhaps Mara’s too (although it depends a bit which part of your work we are talking about), the politics are a bit different. In general, Xiaochang, you're talking about the emergence of a new, I don't know if it's an episteme exactly, but you talk about the statistical turn, the emergence of statistical listening, and the way in which automatic speech recognition is kind of a precursor to, or kind of leads to, data knowledge, data power, or something more generally. So it strikes me that there are different ways that politics is happening in each of the different projects going on here. And I just wanted to draw that out and maybe also invite Xiaochang to say something about the politics of your work, where they are in your work, because it seems like there are really significant political stakes. But it's not of the same kind that Michelle and Jessica are dealing with, not to say that I've mapped the entire politics of your work.
Xiaochang Li (00:52:34) - Yeah, no, I think so. If I were to draw a through line in terms of the things that we're talking about that starts to edge into the realm of the political, it is that if we're thinking about something like Machine Listening, we all seem to be working with technologies that are trying to map some kind of measurement-meaning relation, right? In terms of how it's separate from the earlier voice technologies that are registering and quantifying the voice for different kinds of visual and mathematical scrutiny, what starts to happen with speech recognition, what we see with the emotion detection, and what we see with the accent detection is that they're trying to make the machine do the work of linking the acoustic measurements to meaningful categories of things, right? Be those phonetic categories of language, dialect, ‘C equals some kind of emotion’.
Mara Mills (00:53:36) - Right. Those are the sites where the politics really start to enter in a lot of ways. In some of the work it's much more explicit. I think in mine, it is either hovering in the background or sort of an undertow, whichever metaphor you want to go with, but I'm still actually trying to work out what the more explicit direct link is. Part of it is that it's part of the drive that makes data into a commercial commodity, right? It helps set up the imperative for constantly gathering data, because as speech recognition and natural language processing grow, all of a sudden data becomes very relevant to things like search engines, things like advertising, right? Things like surveillance, in new ways that it wasn't before. And that is really part of this move from data as a kind of scientific resource into an industrial commodity. And it starts to set up that kind of data arms race that you see with all the big tech companies, which are essentially all data brokering companies in one way or another. So it's tied in there, but it's also tied into the development of models, or the popularization of models, I should say, that cannot be assessed
Mara Mills (00:55:02) - except in how well they predict things, right? It's the shift from using statistics as a kind of descriptive quantification tool to one that is fundamentally about predicting outcomes. And so you set up this lens in which the outcomes are used to deal with population, with different kinds of groups. People like refugees, for instance, are made statistical in new ways that are not only descriptive, but fundamentally about predicting what they will do and assessing them in terms of risk, right? Then you get locked into those kinds of models because that's fundamentally all the mathematical model permits you to do, because it doesn't model anything in particular, right? You can't open that black box and say, ‘Oh, okay. So it categorizes the voice in this way, and this determines X.’ It's a black box not because it's enclosed and hidden from view; it's a black box because there's nothing in it. Right. It's just trying to discern patterns between inputs and outputs. And so it can only be assessed in terms of how well it predicts, which also means that it can only predict things that have happened before. So it's fundamentally conservative in that way. And it also ushers in a regime in which we're always assessing things in terms of risk. And so I think the politics really start to come together around those things.
James Parker (00:56:32) - This is why your doctoral thesis was about text prediction, right? So ASR becomes part of a story about the emergence of automated prediction as a kind of political force or something in its own right. Or am I reading too much into that?
Mara Mills (00:56:52) - No, I mean, I think part of what I was trying to understand was what this kind of epistemic regime of prediction was. Right. Part of what I'm trying to figure out now, and haven't quite locked in on, is how it is that a set of epistemic priorities gets plugged into systems of power and things like that, which is I think where a lot of Michelle's work and Jessica's work really hones in. And so I'm trying to figure out where those two pieces lock together.
Jessica Feldman (00:57:23) - One thing that I was thinking as you were speaking, Xiaochang, is that in my work you totally see everything that you're talking about. You start to see it happening in the rhetoric, and the only epistemology that is acceptable for this type of technology is that of risk and prediction. You elucidate that history through your research. And then I was thinking, well, risk and prediction of what? Maybe one of the things that we can work on articulating and critiquing are the categories into which we can be put by these technologies. And maybe that's a place of political intervention, and that's kind of why I'm interested in this middle layer where I'm like, okay, well, what rubrics of the soul even exist that we are being predicted into? Maybe that's an interesting and important question: what categories, what futures are being offered to us through these forms of machine prediction?
Xiaochang Li (00:58:35) - Yeah. And I think there's something really interesting about the production of machine categories, right? The rhetoric is often something like, well, because the machine doesn't have a conceptualization of humans as individuals, as meaningful units to produce categories around, it's therefore more objective, right? Because it doesn't care and doesn't know what a person is. So that makes it more suitable to produce categories of persons, which is a very strange line of reasoning. Right?
Jessica Feldman (00:59:11) - Yeah. I mean, I think that's the efficiency part, right? That's where we save time and money: instead of the psychoanalytic approach where you look at an individual's history, you can do this affective listening approach.
Xiaochang Li (00:59:27) - And the scale and logistics part that Michelle brought up. Right. Something about this really is fundamentally about the ability to scale, that is not so much about the relationship of the human to the machine as we sometimes thought of it, but perhaps the relationship between the machine and the infrastructure that it's supposed to support. Logistics, for sure.
James Parker (00:59:48) - I mean, this reminds me a little bit, Jessica, of your argument. You talk about this diversity at the level of design and technology, the different ways in which all of these different affect recognition technologies conceive the soul, or are made to conceive the soul, and so on. And then you said the moment they hit market, they all start predicting things that you would expect them to, and there's a kind of convergence. So the tail is wagging the dog. The market imperatives really take over. And I was struck that that's so similar to what Michelle was saying about efficiency. And then I was looking at Mara's work on the hearing glove and how you talk, Mara, about the importance of telephony and early information theory in the emergence of what you call an industrial conception of language. So it seems like the market imperatives just rush in and take over pretty quickly in all of the stories that we're telling, and in Mara’s story there's a pre-history there too, I suppose…
Mara Mills (01:01:08) - And the industrial conception is, uh, it's commodification along with technication. You can't forget the commodity piece. It's always there. I mean, the telephone system in the United States at least was a legally sanctioned monopoly. It wasn't like the post office. It wasn't a service the way it might have been in other places. So what I meant by the industrial conception was that there's always that commodification and market piece there too.
James Parker (01:01:33) - So that's a crucial piece of the history for you, telephony as part of the history of Machine Listening?
Mara Mills (01:01:39) - I mean, that's definitely part of the story that I was telling. Telling the history of telephony in a nutshell is quite hard because the phone system is bigger than just the phone. It's the beginning of electroacoustics, and the byproducts of research in the American telephone system at AT&T include sound film and loudspeakers and not just cellular networks. It's so massive. It's like someone in the next century trying to put Google into a nutshell, or Amazon. It's almost impossible to do that. But I do think that in my research on the history of things that were happening before 1948, because many people talk about 1948 as this miracle year of the transistor and cybernetics and Claude Shannon
Mara Mills (01:02:33) - that gave birth to the so-called information age. And I always want to look at what was happening before. In my work, like Xiaochang’s, I do think some of the bigger picture questions, as we were mentioning before, phonetic biases, or say the telephonic bias towards speech over music, actually ended up, in terms of the signal processing that was coming out of Bell Labs, like vocoding, getting built into the earliest cochlear implants, and many cochlear implant users complained, for instance, that they couldn't listen to music and they couldn't hear tonal languages. So I do think some of my work is useful for people doing work on technology today, because you can see where those biases have crept in that, as Michelle said, have been black boxed or completely invisibilized.
Mara Mills (01:03:23) - Even if the technology looks different, going through the patent chain, you often find that it has older roots than you would think. But then there are also just parallels to the sort of thing that Michelle was describing that maybe don't even exist anymore. The idea of a technological fix is a long-held dream across many different industries, including education, including governance. And it never turns out to be true, because the technology doesn't act on its own, even automated technology. People have to learn how to use cochlear implants. Finance, money, training, education go into all the deployment of these technologies in the 1960s and seventies, just as in Michelle's case legislation, police officers, actual laws, actual bodies migrating, all of these things have to take place. The tool doesn't just happen on its own. So I think we see some of the same stories playing out across our different domains, and even then they're just in parallel. And then in some cases, Xiaochang and I are providing sort of bedrock stories for components or ideas that are still proliferating in technologies today; it's more genealogical than parallel.
James Parker (01:04:44) - I wonder if I could spin off from there to a slightly different pre-history. Because, you know, Xiaochang, your work is about automatic speech recognition, so sort of content on some level. And then we've talked about kind of form, or the grain of the voice, with Michelle and Jessica's work. But Mara and Xiaochang, your collaboration on the voiceprint sort of takes us in a slightly different direction. I mean, it's sort of related to Michelle's story about the voice as passport, but it's about voice as identity, you know, the emergence of biometrics as a big field in relation to Machine Listening. I'd love to hear a little bit more about that project, because that's an historical project again, but I read that and all I can see is all of these crazy companies that exist now that claim to be doing voice biometrics, and you hinted at it. You're telling a much older story. So I just wondered if either or both of you would be interested in saying a little bit about that work and the history of the voiceprint, and if it has any lessons for us. Not that history only has lessons.
Mara Mills (01:05:59) - This is an example where people should have spent some time looking at the history, because we saw, across many decades, over a century in fact, and in many different national contexts and in different industries, I mean, again, whether it's government or policing or private industry, this fantasy that one could create a voiceprint, that a human being could be uniquely identified by their voice and their voice alone. Even though, even in the 1920s, speech scientists and engineers were realizing as they worked on these tools, like with the early oscillographs, that people's voices change with emotion. That's in Jessica's work. That's one of the things that thwarts the voiceprint. And people's voices change, you know, with age. And so in our study, I'm sure people are working on voice prints today, but what we found and what we argued was that the impulse toward voice printing shifted to an impulse towards speech recognition using the exact same tools instead, which is, you know, the idea that you can identify a word spoken by lots of different people.
Mara Mills (01:07:09) - It's also a huge technical challenge, but it's quite a different one than having someone uniquely identified by their voice as if it were an unchanging fingerprint, which it isn't. And, you know, the question of speech printing is something that supposedly has to work across accent. Now, one thing that we didn't raise was this sort of in-between category that Michelle is describing, where a group of people can be identified, a population gets identified based on accent. It's a different kind of identification puzzle. It's not recognition, it's identification. And then if your accent doesn't seem quite right for the place you're saying you're from, then you don't get to immigrate. You don't get asylum. And I don't think we came across anything quite like that in our research, Xiaochang?
Xiaochang Li - Yeah, no, I mean, I don't think we did, but I think we start to see hints of that, right?
Xiaochang Li (01:08:02) - Because, as you pointed out, what's really strange about what happens to voice printing as you hit the sound spectrograph is that they're using the same instrument and the same measurements to make completely sort of opposite arguments, right? One, that the voice is so unique that you can identify a single person, whatever they're saying, however they're saying it. And then one, that speech is sufficiently common, that the acoustic features of speech are sufficiently common, that you can identify specific words regardless of who's saying them, right? And so these don't seem like arguments that should fit together. And yet they're being produced using the same instrument and the same measurements. And part of how that comes to work is the way in which they end up sort of mathematically compositing measurements across different voices, across different sort of localized sounds, things like that. And I think that's where you start to see the beginnings of that same kind of population thinking, as the statistical manipulation starts to come in. Right. And that's where you start to see, like, ‘Oh, well, if we can map a pattern of behavior, there is then a tendency of a kind of voice, even if we cannot specify a particular word or a particular person.’ And I think, too, that there's another story in there that needs to be explored more, which is that this opens up a world of biometrics in a slightly different way than we've seen talked about, because Kersta, who Mara and I wrote about in this
Xiaochang Li (01:09:34) - story, he goes on to sort of found his own company doing voice prints. But one of the other things that the company was trying to do was to create all kinds of other biometric identifications based on sound. So like listening to heartbeats and blood movements and other sounds produced, or not even sounds, like other acoustic measurements produced in the body that were not actually perceptible, not things that a person could hear, that would serve as biometric identifiers.
Mara Mills - And he was doing stuff with birdsong. So of course, biometric doesn't have to be the human, there's lots of other biotic life forms out there. And doing, like, birdsong prints, and some of that work still exists at, like, the Cornell school of ornithology. It's not tended to be interpreted by critics in as anxious terms.
James Parker (01:10:30) - That's an interesting way of putting it. Do you think it should be? Because, you know, you were talking about the pretext, the assistive pretext. Every time we have this conversation, people will say, ‘Well, look, there are ecological uses, and so it can't be all bad.’ We haven't really got into the critique, or the possibility of critiquing, the ornithologist.
Mara Mills (01:11:00) - Well, I think some people in sound studies have looked at the misapplication of some of these sound tools for estimating populations of wildlife in the oceans, for instance. And if you misestimate, then you are saying something's endangered or not endangered. So like tracking whale song and trying to predict whale populations and whale migration. Maybe it's just human narcissism and anthropocentrism that we're not quite as anxious about that as we are about the human applications. This isn't something I do primary research on, but I have read work by my colleague, Alex Umi Hui, on some of the underwater listening tools, which are also militarized and also prone to bias and error.
James Parker (01:11:45) - There was just one other thread I wanted to pick up from your article on the voiceprint stuff. You talk about vocal criminology. And it just strikes me that there's a thread, you know, we talked about commercial imperatives driving research, but to tie a thread into Michelle's work again, about the concept of the criminal or the risky subject. Yeah, I'm not sure if it's part of a story or a long history of the emergence of criminal law or criminology, but it just struck me that there's something going on there, the idea that the voiceprint could be somehow at the birth of the concept of the criminal, or arrive at the same time as the idea of the criminal.
Mara Mills (01:12:36) - We dropped that thread because it was sort of physiognomic. Right, there was an early moment with, like, Doegen and other phoneticians working in Berlin who were creating these archives of phonograph recordings. And at first they did hope that they could identify a criminal type of voice. So that's different, that's a sort of typological thinking that's like physiognomy or something. You know, other people were interested in racialized types or gender types or all sorts of pathological types at that moment. And they weren't based on statistical data sets, the way the later population-based thinking around the voice that we describe in our article is. And they also weren't one-on-one. So yeah, there were a lot of attempts to identify what a criminal voice was like. And basically, in our study, we were like, ‘Oh, and then people stopped doing that.’
Mara Mills (01:13:31) - Because it's proven to not be that scientific. And instead the move is towards creating big data sets and having, you know, more statistical thinking. But actually, hearing some of the things that Jessica is describing, it sounds like that physiognomic kind of, not that mathematical, not even based on a big dataset kind of thinking. And also, I actually don't know how the technology that Michelle is describing is created. Michelle, I haven't asked. I mean, maybe that's not what you're researching, since you're doing ethnographic work, you're looking at how it's applied, but I really don't understand how they're coming up with this idea of an accent being particular to a particular place. That to me seems sort of physiognomic as well.
Mara Mills (01:14:17) - I don't know if they're asserting that certain people with certain accents are more prone to be criminals. I mean, that's certainly a 1920s kind of logic. I'm hoping it's gone now.
Xiaochang Li - Yeah. I mean, just to expand on that a little bit too, I think part of it is that our article kind of stops at a certain moment, and that kind of typological thinking does come back in big data and machine learning a little bit later. And it comes back in a slightly different form, which is that, rather than finding these kinds of direct… it's not that they know. So the early sort of physiognomic kind of thinking really thinks about the body as the site in which you can identify these things, and it seems to sort of foreclose the possibility of the social factors. And the machine learning imagination around creating these typologies is that actually machine learning is very good at this, because all of those sort of vast, complex social factors are then already embedded in the measurement itself.
Xiaochang Li (01:15:26) - And because we can't untangle it as humans, we can actually just ask the machine to sort the patterns for us if we get enough examples of this. Right? And so it gets to this black boxing thing, where I think what's actually more important than the ‘blacking’ mode of the black box, in terms of the obfuscation, is the boxing of all the factors together in such a way that we can no longer disentangle them, such that as the models then get ported to conditions that have very different factors involved, you're now sort of mapping things that don't actually fit together. But now you can't disentangle which ones are based on institutional factors and social factors and geographic things and all of that stuff.
Mara Mills - Well, and the physiognomic is based on human perceptions. As we were discussing in that article, like higher level human understandings of what a speech feature is.
Mara Mills (01:16:25) - And it's usually not based on a huge data set. It's like, ‘Oh, these characteristics seem to be common to criminals’ on a more ad hoc basis. And it's things that humans can perceive. And as Xiaochang mentions in her research, there are lower level acoustic elements, imperceptible to humans, that actually are the basis for making these identifications of speech or voice in the systems we're talking about now. So features that wouldn't be considered features by the human ear, wouldn't even be detected.
Michelle Pfeifer - Yeah. I was just going to say that I looked at your article again this week, Mara and Xiaochang, and I don't remember who you quote, but you quote someone and they're kind of saying that the problem with the body, like with fingerprinting, is that you can tamper with them.
Michelle Pfeifer (01:17:17) - Like, you can burn them, but with the voice, you can't do that. And for me, it was just so interesting, because looking at the voice to identify someone in the context of asylum is only one thing that states, especially in the Global North, are doing. There's a long history of looking at the body, doing age assessments, looking at DNA, or these different kinds of things. And of course also registering people with their fingerprints. So often people try to tamper with their fingerprints, because in the EU there's a fingerprint database. And if you get registered in one country in the EU, you have to apply for asylum there. And so that might be Greece, but maybe you actually want to go to Germany, right?
Michelle Pfeifer (01:18:12) - Or somewhere else. And so there's a reason you don't really want to get fingerprinted. And for me, it just made me think again about, yeah, what is it about the voice that there's this assumption that it gives you, I don't know, access to the soul? Maybe that's how Jessica would say it, right? Somewhere embedded is this idea that through the voice, you actually get to the identity of someone better than maybe through other parts of the body. So I just wanted to mention that, because I think it also kind of relates to the questions about risk. And I feel like in my research it's really that people who migrate are always seen as kind of risky to the nation state, you know? And they're always under suspicion. So the categories are always like,
Michelle Pfeifer (01:19:09) - ‘Are you trustworthy or are you not?’ You know, these are the two categories that are there. And why is the voice supposed to give us this kind of ultimate verification or truth about who you actually are?
James Parker (01:19:25) - Great point. And, you know, we probably should wrap up soon, but I can't help myself, because everything you were saying just reminds me of that Adriana Cavarero book, For More than One Voice: Toward a Philosophy of Vocal Expression, where she's trying to build a political philosophy basically on the idea of, I think she calls it, the phenomenology of vocalic uniqueness. And I know that this is extremely academic, but I was always struck in that book. I don't know if any of you have read the book, is it familiar? Yeah. You know, she's so committed to what she calls a phenomenology of uniqueness, and it's asserted. And I'm sort of like, okay, well, yeah, maybe. But it seems like what you're saying, from your research, Mara and Xiaochang, is that basically, as a sort of matter of science, that's highly questionable. And it's not to say that Adriana Cavarero is complicit with, you know, policing and border security forces and whatever, but that's kind of the cutting edge.
James Parker (01:20:41) - One of the biggest books in sound studies or voice studies for the last however many years. And it's very committed to that basic idea, which, I don't know, says something maybe about what the public and what people are going to be willing to accept in terms of the promise of voice identification, that somehow the voice really is tethered to our soul, and that this is philosophically defensible and sort of empirically justifiable and things. So yeah, it's a very esoteric note to end on, but I just wondered if you have any thoughts about that, like, is it that…
Mara Mills (01:21:20) - I think, I mean, what I loved about that book when it came out was that she was interested in voice, the grain of the voice, and not just in speech. But I think, you know, voice is learned. Vocalization is learned, including the vocalizations that count as speech. We know that. And I guess the question I would ask is what's at stake in arguing that there is vocal uniqueness. I mean, I think if we can parse vocal uniqueness from something that's essentialist, it actually might still have political force, but there are so many communities who don't want there to be an essentialism around one's vocal uniqueness. I'm thinking about all the interesting activist work and speech therapy work and also academic work that's happening around the trans voice right now. And people are using the sound spectrograph in its digital online form to retrain their voices, to change the gender of their voice. Gender's learned, and it can be relearned.
Mara Mills (01:22:18) - And so I would say disentangling vocal uniqueness from the essentialist piece is a really important step that needs to be taken. And across all of the studies that we are doing, you see that the voice can be made to tell a story about population, about universality, about speech, about affect, about an individual. It can almost be endlessly interpreted by whoever the scientist or the perceiver is, using actually the same tools, as I think Xiaochang stated at the beginning of our conversation. So yeah, I guess my question, and I don't know if anyone can even answer this question, is what is at stake right now in making an argument about individual vocal uniqueness, as opposed to something universal across all humans or something that's population-based? I'm not sure.
Xiaochang Li (01:23:11) - Yeah. I mean, I haven't read the book, so I can't really speak to the arguments within the book itself, but I will say that I think, you know, if it does coincide with some of the thinking that we see from these speech technology engineers, that is not an accident, insofar as the issue with a lot of these speech technologies, especially early on, is that they were neither particularly feasible nor particularly useful. And so part of what made them desirable to pursue was the already existing imagination of what the voice had to offer us. Right. So I think the alignment comes from the fact that the engineers' thinking is derivative of these fantasies already, and not so much that these kinds of political imaginings around the voice are coming out of the engineering fantasies.
Joel Stern (01:24:11) - Can I just give an anecdote? Because I was just thinking about how a few years ago I had to call the Australian tax office to sort through my finances. And it was the first time I'd encountered vocal biometrics as a security device. So previously, you know, you would have to give a password and give details about your date of birth and things like that. But in this instance, I encountered a recorded message, which informed me that they would make a recording of my voice, and that would, from that point on, be my passport into the sort of secure zone of discussing my taxation. And the words that the tax office system asked you to say three times, over and over, in order to take a sample of your voice and run it through the system, were: ‘In Australia, my voice identifies me. In Australia,
Joel Stern (01:25:09) - my voice identifies me.’ And it was so striking that you were not only giving them, in that moment, the vocal data to analyze, but you were also sort of saying the words that indoctrinate you into the system, you know, the ideological or political system that you have to believe in, in order for that to work. So it's sort of interesting, because in some senses it does work, you know, and I've been able to continue to access my details using that system. But in other ways, it's striking that, rather than just saying, you know, ‘Hello, my name is Joel’ or whatever, they want you to say ‘In Australia, my voice identifies me.’ And for you to believe it, as you're saying it.
Mara Mills (01:26:03) - Although it's verification, probably, not identification. Xiaochang and I talk about this in the article, and we probably don't have time to get into the details of which is which, but it's true, there's lots of speech or speaker verification systems now, but they're not the same as the hundred percent identity that one would need in, say, policing. Although I think it's a slippery slope, and it's probably convincing some people that it's possible that, again, we'll start all over again and get to speaker identification and voice printing.
Joel Stern (01:26:38) - No, that's right. But I was also just thinking, and I know we're going to wrap up, but just from before, when I think, Jessica, you were sort of talking about the proliferation of machine classifications, you know, and we were speaking with someone a couple of weeks ago about genre recognition in music. And, you know, the way that Spotify, for example, have tried to tag genres to songs using Machine Listening and different forms of audio analysis. And it's so often wrong, because the construction of genre is so much a cultural formation. And just because a song sounds like another song, it doesn't mean that it's part of the same genre, because, you know, the different subcultures that produced it have complex affectations, and there's irony, and all of those different things. So that's just an example of where someone who listens to a lot of music and is part of a musical subculture will have so much more of a capacity to identify genre than a Machine Listening algorithm that is attempting to do it using some statistical kind of process.
Mara Mills (01:27:58) - The disability historian in me actually wants to add one little final comment on the question. I was just realizing that I wanted to also mention, in terms of the question of the atypical or unique voice, again with my disability history hat, that so many voices that are called unique are actually called that in the spirit of the atypical, and it's a pathologizing claim. And so I think I would just say, with regard to your revisiting of Cavarero, even though this doesn't specifically come up in her reading, there is a hierarchy to vocal uniqueness. And a lot of people are told that they have unique voices and that's not good. They're told it in the spirit of it being somehow pathological, or having, you know, vocal absence, or not entering into speech in the appropriate way.
James Parker (01:28:53) - That's an excellent point. It does actually remind me of some companies that are specifically developing speech recognition tools and things for non-typical voices.
James Parker (01:29:07) - But I don't know whether they're well used or well liked by the communities they're supposed to be for.
