Update 'content/topic/improvisation-and-control.md'

master
james, 3 years ago
Commit fbafbb5869
1 changed file with 11 additions and 7 deletions

content/topic/improvisation-and-control.md


The following year the "Music and Cognition" group rebranded.

>"I have been unhappy with 'music (and) cognition' for some time. It's not even supposed to describe our group; it was the name of a larger entity including Barry, Tod, Marvin, Ken and Pattie that was dissolved almost two years ago. But I've shied away from the issue for fear of something worse. I like Machine Listening a lot. I've also thought about Auditory Processing, and I try to get the second floor to describe my demos as Machine Audition. I'm not sure of the precise shades of connotation of the different words, except I'm pretty confident that having 'music' in the title has a big impact on people's preconceptions, one I'd rather overcome."[^Ellis]

So what began, for Rowe, as a term to describe the so-called 'analytic layer' of an 'interactive music system'[^Rowe] became the name of a [new research group at MIT](https://web.archive.org/web/19961130111950/http://sound.media.mit.edu/) and something of a catchall to describe diverse forms of emerging computational auditory analysis, increasingly involving big data and machine learning techniques. As the term wound its way through the computer music literature, it also followed researchers at MIT as they left, finding its way into funding applications and the vocabularies of new centers at new institutions.

[Here is one such application](https://www.ee.columbia.edu/~dpwe/proposals/CAREER02-machlist.pdf), by a Professor at Columbia named [Dan Ellis](https://www.ee.columbia.edu/~dpwe/). This is the man sitting at the desk and the author of the email we just read. [Today he works at Google](https://research.google/people/DanEllis/), for their 'Sound Understanding Team'. As Stewart Brand once put it, 'The world of the Media Lab and the media lab of the world are busily shaping each other.'[^Brand]

Google's 'Sound Understanding Team' is responsible, among other things, for [AudioSet](https://research.google.com/audioset/about.html), a collection of over 2 million ten-second YouTube excerpts totaling some 6 thousand hours of audio, all labelled with a 'vocabulary of 527 sound event categories'. AudioSet's purpose is to train Google’s 'Deep Learning systems' in the vast and expanding YouTube archive, so that, eventually, it will be able to 'label hundreds or thousands of different sound events in real-world recordings with a time resolution better than one second – just as human listeners can recognize and relate the sounds they hear'.[^Audioset]

AudioSet includes 7,000 examples tagged as 'Classical music', nearly 5,000 of 'jazz', some 3,000 examples of 'accordion music' and another 3,000 files tagged 'music of Africa'. There are 6,000 videos of 'exciting music', and 1,737 that are labelled 'scary'.

In [AudioSet's 'ontology'](http://research.google.com/audioset/ontology/index.html), 'human sounds', for instance, is broken down into categories like 'respiratory sounds', 'human group action', 'heartbeats', and, of course, 'speech', which can be 'shouted', 'screamed' or 'whispered'. AudioSet includes 1,500 examples of 'crumpling and crinkling', 127 examples of toothbrushing, 4,000 examples of 'gunshots' and 8,500 'sirens'.

This is the world of machine listening we inhabit today: distributed across proliferating smart speakers, voice assistants, and other interactive listening systems, it attempts to understand and analyse not just what we say, but how and where we say it, along with the sonic and musical environments we move through and are moved by. Machine listening is not only becoming ubiquitous, but increasingly omnivorous too.

Jessica Feldman's essay, "The Problem of the Adjective," describes a further frontier.[^Feldman] Affective listening software tunes in to barely perceptible vocal inflections, which are "uncontrollable, unintended, and habitual" -- but for the machine signify the "emotions, intentions, desires, fears... of the speaker—in short, the soul." Can the machine listen to our soul? Of course not, but what does it hear, what does it do when it tries? And how will we act when confronted with an instrument intent on listening so deeply?

## Rainbow Family
# Resources

[^Negroponte]: ![](bib:7cd09072-5282-441f-b30a-6d869488ecd8)
[^Rowephd]: Robert Rowe, [_Machine Listening and Composing: Making Sense of Music with Cooperating Real-Time Agents_](https://dspace.mit.edu/handle/1721.1/13835), doctoral thesis (MIT, 1991)
[^Rowe]: Robert Rowe, [_Interactive Music Systems: Machine Listening and Composing_](https://wp.nyu.edu/robert_rowe/text/interactive-music-systems-1993/) (MIT Press, 1993)
[^Ellis]: Archived email exchange between Dan Ellis and [Michael Casey](https://music.dartmouth.edu/people/michael-casey), 28 March 1994. According to Casey, "Dan suggested "Machine Audition", to which I responded that the term "audition" was not widely used outside of hearing sciences and medicine, and that it could be a confusing name for a group that was known for working on music--think "music audition". I believe we discussed the word hearing, but I--we?--thought it implied passivity as in "hearing aid", and instead I suggested the name "machine listening" because it had connotations of attention and intelligence, concepts that were of interest to us all at that time. That is what I remember."
[^Brand]: ![](bib:f840b2fa-8e2a-48b3-8ad7-1f138313d2b3)
[^Audioset]: Gemmeke et al., [_Audio Set: An Ontology and Human-Labeled Dataset for Audio Events_](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45857.pdf) (ICASSP, 2017)
[^Feldman]: ![](bib:284b6cc8-1fe8-4d3f-b0b0-d53d4117370b)
