
PLAYING WITH TENSION

PRASHANTH THATTAI RAVIKUMAR

NATIONAL UNIVERSITY OF SINGAPORE
2015


PLAYING WITH TENSION
GENERATING MULTIPLE VALID ACCOMPANIMENTS FOR THE
SAME LEAD PERFORMANCE

PRASHANTH THATTAI RAVIKUMAR
B.Tech, National Institute of Technology, Trichy,
2012

A THESIS SUBMITTED
FOR THE DEGREE OF MASTER OF ARTS
COMMUNICATIONS AND NEW MEDIA
NATIONAL UNIVERSITY OF SINGAPORE
2015



Acknowledgment
Foremost, I would like to express my sincere gratitude to my supervisors
Prof. Lonce Wyse and Prof. Kevin McGee for their continuous support,
patience, motivation, enthusiasm and immense knowledge in guiding me
to learn and do research. "To define is to limit" – I cannot quantify the
knowledge that I have gained from them in the past two years. Their constant guidance, support, and dedication have been an immense inspiration for me to finish this dissertation.
Besides my supervisors, I would like to thank Dr. Srikumar Karaikudi
Subramanian, who has been a friend, a mentor, and a person to look up to. I will long cherish the memorable coffee chats that have led to so many new insights about the thesis, music, and varied things in life.
I thank my fellow lab mates from the Partner Technologies group, Dr.
Alex Mitchell, Teong Leong, Chris, Jing, Evelyn, and Kakit, for their stimulating discussions every week. Our weekly meetings were a ton of fun, and a chance to discuss and learn diverse perspectives on doing research.
I thank the faculty, the staff and the graduate students of the Communications and New Media department for supporting and housing me as a
graduate student for the last two years.
I thank the musicians, Dr. Ghatam Karthik, Mr. Trichur Narendran,
Mr. Arun Kumar, Mr. Sumesh Narayan, Mr. Sriram, Mr. Hari, Mr.
Shrikanth, Mr. Santosh and all others who have imparted their musical
knowledge to help my understanding of the genre.
This thesis could not have progressed as much as it has, if not for
the musical insights and inspirations that I drew from our group music
jamming sessions. I take this moment to thank my close friends and music collaborators – Vinod, Vishnu, Lakshmi Narasimhan, Prasanna, and Arun – who have enhanced my musical growth and helped me achieve the
insights that I have in this thesis.
I thank my close friends Shyam and Kameshwari, who have been a constant source of support during the tough times. I thank my friend Akshay
for the intellectually stimulating conversations. I also thank him for his
timely help during the thesis revisions. I thank Spatika Narayanan for her
help in proof-reading the document.
Last but not least, I would also like to thank my family.
March 20, 2015


Name: Prashanth Thattai Ravikumar
Degree: Master of Arts
Supervisor(s): Associate Professor Kevin McGee, Associate Professor Lonce Wyse
Department: Communications and New Media
Thesis Title: Playing with Tension: Generating multiple valid accompaniments for the same lead performance

Abstract
One area of research interest in computational creativity is the development of interactive music systems that are able to perform variant, valid
accompaniment for the same lead performance.
Although previous work has tried to solve the problem of generating multiple valid accompaniments for the same lead input, success has
been limited. Broadly, retrieval-based music systems use static databases
and produce accompaniment that is too repetitive; generation-based music systems that use hand-coded grammars are less repetitive, but have
a more limited range of pre-defined accompaniment options; and finally,
transformation-based music systems produce accompaniment choices which
are predictably valid for only a few cases.
This work goes beyond the existing work by proposing a model of
choice generation and selection that generates multiple valid accompaniment choices given the same input. The model is applied to generate secondary percussive accompaniment to a lead percussionist in a Carnatic
improvisational ensemble.
The central insight – the main original contribution – is that the generation of valid alternate variations of secondary accompaniment can be
accomplished by formally representing the relationship between lead and
accompaniment in terms of musical tension. By formalizing tension ranges
for acceptable accompaniment, an algorithmic system is able to generate alternate accompaniment choices that are acceptable in terms of a restricted
notion of sowkhyam (roughly, musical consonance). In the context of this
thesis, restricted sowkhyam refers to the sowkhyam of accompaniment considered independent of the secondary performer (and his creativity).
The research proceeded in three stages. First, Carnatic music performances were analyzed in order to model the performance structures and improvisation rules that provide the freedom and constraints in secondary percussion playing. Second, based on the resulting tension model, a software
synthesis system was implemented that can take a transcribed selection
of a Carnatic musical performance and algorithmically generate new performances, each with different secondary percussion accompaniment that
meet the criteria of restricted sowkhyam. Third, a study was conducted
with six expert participants to evaluate the results of the synthesis.
The main contribution of this thesis is the development and validation
of a tension model that, assuming restricted sowkhyam, is able to generate
alternate variations of secondary accompaniment that are as valid as the
original accompaniment.

Keywords: Carnatic rhythmic improvisation, Improvisational accompaniment


Contents

1 Introduction
  1.1 Structure of this document

2 Related work
  2.1 Retrieval-based music systems
    2.1.1 Retrieval from a database
    2.1.2 Retrieval using dynamic learning models
    2.1.3 Generation-based music systems
  2.2 Hand-coded grammars
    2.2.1 Online learning of grammars
  2.3 Transformation-based music systems
    2.3.1 Transformation function is pre-given
    2.3.2 User selects the transformation function

3 Research problem
  3.1 Summary of the related work
  3.2 Proposed solution

4 Method
  4.1 Analysis of the Carnatic musical performances
  4.2 Model development
  4.3 Evaluating the tension model
  4.4 System development

5 Background: Carnatic quartet performance
  5.1 Overview
  5.2 Musical structures
  5.3 Choices in different styles of accompaniment playing
  5.4 Musical actions in the improvisation
    5.4.1 Major variations
    5.4.2 Minor variations

6 System: design criteria & constraints
  6.1 Research/Implementation model
  6.2 Lead percussionist: improvisation and variation
  6.3 Secondary percussionist: accompaniment and variation

7 Possible approaches
  7.1 The Direct Mapping model
  7.2 The Horizontal Continuity model

8 The tension model
  8.1 Tension model applied to secondary playing
  8.2 Tension model applied to generate multiple accompaniments

9 Tension synthesis protocol
  9.1 Choose Carnatic performance recording
  9.2 Choose a sixteen-bar sample of performance recording
  9.3 Transcribe the sixteen-bar selection
    9.3.1 Transcribing double hits
    9.3.2 Transcribing hit loudness
    9.3.3 Transcribing rhythmic repetition of bars
  9.4 Compute tension scores for each hit
  9.5 Compute tension scores for each beat
  9.6 Compute tension range for each bar
  9.7 Generate all viable accompaniment sequences
    9.7.1 Enumerate all unique triplet values for each beat
    9.7.2 Collect all viable 8-beat (1-bar) sequences
    9.7.3 Collect secondary sequences that meet tension constraints
  9.8 Construct secondary transcription for entire piece
  9.9 Synthesize performance

10 Tension synthesis: practical details
  10.1 Separating tracks from original recording
  10.2 Storing the transcript
  10.3 Sequencing audio from a transcript
  10.4 Creating a new recording

11 Study protocol
  11.1 Participants
  11.2 Materials
    11.2.1 Documents
    11.2.2 Equipment
    11.2.3 Recordings (original)
    11.2.4 Recordings (with new accompaniment)
  11.3 Study Disclaimer
  11.4 Study Session Protocol
    11.4.1 Gather demographic information
    11.4.2 Explain evaluation criteria
    11.4.3 Sequencing the recordings
    11.4.4 Evaluate recordings
  11.5 Evaluation

12 Study results
  12.1 RQ1: does system produce acceptable accompaniment
    12.1.1 Recording 1
    12.1.2 Recording 2
    12.1.3 Recording 3
  12.2 RQ2: are accompaniments inside the range better?
    12.2.1 Recording 1
    12.2.2 Recording 2
    12.2.3 Recording 3
  12.3 RQ3: do ratings decrease as a function of distance
    12.3.1 Recording 1
    12.3.2 Recording 2
    12.3.3 Recording 3
  12.4 Summary

13 Potential objections

14 Discussion
  14.1 Algorithmic limitations
  14.2 Transcription limitations
  14.3 System limitations

15 Future work

Appendices

A Key Terms
  A.1 Terms: tension model
  A.2 Terms: Carnatic music

B Enumerating the accompaniment sequences

C Assigning perceptual scores
  C.1 Diction
  C.2 Loudness
  C.3 Note duration

D Transcription: internal representation
  D.1 Transcription: internal representation

E Results
  E.1 Complete results for recordings
  E.2 Complete results for variants

F Study documents
  F.1 Session checklist
  F.2 Demographic questionnaire
  F.3 Participant variant sequence
  F.4 Evaluation sheet
  F.5 Participant observation form
  F.6 Participant definition sheet


List of Tables

9.1 Rhythmic repetition of bars
9.2 Tension scores for each hit
9.3 Tension scores for each beat
9.4 Computing TZP and tension range for a bar
9.5 Computing TZP and tension range for a bar
9.6 Computing TZP and tension range for a bar
9.7 Lookup table for 2-beats
9.8 Possible 2-beat diction combinations
9.9 Possible 2-beat diction combinations
9.10 Valid 3-beat diction combination
9.11 Two bars (average tension scores)
9.12 Two bars of valid sequences
9.13 Rhythmic repetition of bars, with accompaniment
11.1 Participant data
11.2 Two bars of valid sequences
11.3 Two bars of valid sequences
11.4 Variants by distance value
11.5 Distance of variants used for recording 1
11.6 Distance of variants used for recording 2
11.7 Distance of variants used for recording 3
11.8 Recording sequences for participants
11.9 Variant sequences for participant
12.1 Average accompaniment rating per recording
12.2 Average rating for variants of recording 1
12.3 Average rating for variants of recording 2
12.4 Average rating for variants of recording 3
12.5 Accompaniment ratings for variants of recording 1
12.6 Accompaniment ratings for variants of recording 2
12.7 Accompaniment ratings for variants of recording 3
12.8 Accompaniment ratings for different variants
12.9 Accompaniment ratings for different variants
12.10 Accompaniment ratings for different variants
C.1 Weights for lead strokes
C.2 Weights for secondary strokes
C.3 Perceived loudness of lead and secondary hits
C.4 Weights for loudness
C.5 Weights for note duration
D.1 Transcription of recording 1, bars 1-16
D.2 Transcription of recording 2, bars 1-16
D.3 Transcription of recording 3, bars 1-16
E.1 Accompaniment ratings for recordings 1, 2, and 3
E.2 Accompaniment ratings for variants 0-6
F.1 Recording and variant sequences


List of Figures

5.1 The Carnatic quartet (from left): lead percussionist, secondary, vocalist, Tambura (provides the background drone), and violinist
5.2 Two bars of lead and secondary playing
5.3 Different minor variations
7.1 Direct Mapping
7.2 Horizontal Continuity: secondary follows the lead changes
8.1 Tension-relaxation visualization
8.2 Tension between lead and secondary


List of Algorithms

1 Hit tension score calculation
2 Beat tension score calculation
3 Unique 1-hit and 2-hit triplets
4 Unique 1-beat triplets
5 Unique 8-beat triplets


Chapter 1
Introduction
This chapter introduces the research area of musical improvisational accompaniment systems and highlights an important
problem in this field. Improvisational accompaniment systems
differ from score-following, solo-trading, and tap-along systems
in that they are able to produce multiple valid musical alternatives for the same performance. Developing musical accompaniment systems that generate multiple valid accompaniments
by modeling the constraints of accompaniment playing is the
problem of interest in this thesis.
Computational creativity is an emerging field of research in artificial
intelligence, cognitive psychology, philosophy, and the arts. The goal of
computational creativity is to model, simulate or replicate human creativity using a computer. One area of research interest in computational creativity is the development of improvisational music systems that are able
to perform variant, valid accompaniment for the same lead performance.
Developing musical accompaniment systems that generate multiple valid
accompaniments by modeling the constraints of accompaniment playing is
the problem of interest in this thesis.
Although previous work has tried to solve the problem of generating
multiple valid accompaniments for the same lead input, success has been
limited. Broadly, retrieval-based music systems that use static databases
produce accompaniment that is too repetitive; generation-based music systems that use hand-coded grammars are less repetitive, but have

a more limited range of pre-defined accompaniment options; and finally,
transformation-based music systems produce accompaniment choices which
are predictably valid for only a few cases.



This work goes beyond the existing work by proposing a model of
choice generation and selection that generates multiple valid accompaniment choices given the same input.

1.1 Structure of this document

The remainder of this document is structured as follows:
• Related work This chapter summarizes the previous work on improvisational accompaniment systems developed for generating multiple
valid accompaniments by modeling the constraints of accompaniment
playing.
• Research problem This chapter identifies a significant problem left
open by previous work and presents the research focus: to develop a
model of rhythmic accompaniment for Carnatic ensemble music that
produces multiple musically valid accompaniments, given the same
input.
• Method This chapter provides a brief overview of the method used
during this thesis research. The method included the analysis of
Carnatic music performances, development of different models of accompaniment playing, their implementation as computer programs,
and their evaluation.
• Background This chapter describes the roles and activities of the
lead and secondary percussionist within a Carnatic quartet performance. It further describes the musical structure and provides examples of different scenarios of lead and secondary percussion playing in
a performance ensemble.

• System design criteria This chapter describes the narrow subset
of constraints that guided the research and development of the secondary accompaniment system. The structural constraints separate
the music into improvisational cycles made of eight bars in a 4/4 time
signature. The input constraints restrict the lead to minor bar variations. The output constraints restrict the scope of secondary accompaniment to playing compliant accompaniment to the lead. Within
these constraints, the secondary system still has the freedom to play
a variety of valid accompaniments in a given situation.
• Possible approaches This chapter describes two seemingly reasonable
approaches – Direct Mapping and Horizontal Continuity – and shows
why they will not effectively solve the central research problem.



• The tension model This chapter describes the tension model that
was developed to address the shortcomings of the previous models.
Applied to the activity of secondary accompaniment playing in a
Carnatic performance, the tension model is used as a constraint satisfaction mechanism to generate multiple accompaniments given the
same lead.
• Tension synthesis protocol This chapter describes the main steps
involved in synthesizing recordings with variant valid accompaniment.
• Tension synthesis: practical details This chapter describes the
different steps in the synthesis process in terms of the different technologies used to implement them.
• Study protocol This chapter describes the study conducted with
musical experts for evaluating the ability of the system to produce
alternate valid secondary accompaniments for a Carnatic musical performance.
• Study results This chapter describes the main results from the user
study and uses them to answer the research questions.
• Potential objections This chapter highlights the aspects of the
study design that could raise objections about the claims made from
this work.

• Discussion This chapter identifies the main limitations of the research reported here and discusses their impact on the findings from
the study.
• Future work This chapter proposes directions for future work.
The next chapter reviews work on developing improvisational accompaniment systems that generate multiple valid accompaniments by modeling
the constraints of accompaniment playing.



Chapter 2
Related work
This chapter summarizes the previous work on improvisational
accompaniment systems developed for generating multiple valid
accompaniments by modeling the constraints of accompaniment
playing. Previous work has developed retrieval-based music
systems, generation-based music systems, and transformation-based music systems to solve the problem. Retrieval-based
music systems use dynamic learning models to produce different sequence continuations given the same input, but at any
given point in the performance they produce deterministic output. Generation-based music systems dynamically update the
production rules of a grammar that are used to generate different accompaniments, but at any given point in the performance the production rules produce deterministic output.
Transformation-based music systems generate permutations of
a source rhythm representation to generate multiple accompaniments, but the generated choices are not always musically
valid.
Previous work that has tried to solve the research problem can be classified into retrieval-based, generation-based, and transformation-based music
systems. This chapter reviews the systems and highlights the problems they
solve.

2.1 Retrieval-based music systems

Retrieval-based music systems use musical parameters to retrieve the best
possible accompaniment from a set of accompaniment patterns. The focus
is on optimizing the parameters for efficient representation and real-time retrieval. There are two variations of retrieval-based music systems, based
on the type of data structure used to store the accompaniment: retrieval
from a database and retrieval using dynamic learning models.

2.1.1 Retrieval from a database

The first type of retrieval-based music system stores the accompaniments in a database, which is queried to retrieve the accompaniment. The accompaniments in the database are organized by their musical features. Retrieval
systems extract the necessary musical features from the input, package
them into a data format which is suitable to query the database, and retrieve the accompaniment. The best matching accompaniment is retrieved
and played.
Impact is an accompaniment system that uses case-based reasoning and
production rules to retrieve accompaniment from a database of accompaniment patterns (Ganascia, Ramalho, and Rolland, 1999). It extracts meta-level descriptions of musical scenarios (such as the beginning and end of a
bar), fills in the sections and the duration of chords, and uses the result to
form a query. This query is used to retrieve the best matching accompaniment from the database. The best accompaniment is selected according
to a measure of mathematical distance between the query (called target
case) and each of the patterns in the database. Given a single input, the
system always returns one accompaniment (the best matching accompaniment) as output. Cyber-Joao is an adaptation of the Impact system that
optimizes the number of parameters used for the retrieval (Dahia et al., 2004). It ranks the different musical features based on expert knowledge
data, and uses the ranking to determine the important musical features in
a given performance situation. Each rhythm is distinctly characterized by
a single set of accompaniment values and the musical features are used to
query and retrieve the accompaniment pattern from the database. Since
each rhythm in the database is distinctly characterized by a single set of
accompaniment values, there is always only one accompaniment available
for any given musical scenario.
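Both systems reduce retrieval to a nearest-pattern lookup over musical features. Here is a minimal sketch of that retrieval-by-distance idea, assuming Euclidean distance (the description above says only "a measure of mathematical distance") over small numeric feature vectors; the pattern names and features are illustrative, not taken from Impact or Cyber-Joao.

```python
import math

# Hypothetical accompaniment database: each pattern is indexed by a few
# numeric musical features (e.g. density, syncopation, loudness). Names,
# features, and the Euclidean metric are assumptions for illustration.
DATABASE = {
    "pattern_a": [0.8, 0.2, 0.5],
    "pattern_b": [0.3, 0.7, 0.4],
    "pattern_c": [0.6, 0.6, 0.9],
}

def retrieve(target_case):
    """Return the single best-matching pattern for the query (target case)."""
    return min(DATABASE, key=lambda name: math.dist(target_case, DATABASE[name]))

# The same query always retrieves the same pattern: a static database
# yields exactly one accompaniment per musical scenario.
print(retrieve([0.7, 0.3, 0.5]))  # -> pattern_a
```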

2.1.2 Retrieval using dynamic learning models

In order to overcome the limitations of statically stored accompaniment
options, systems were developed with capabilities to model the input rather
than statically store it.


One of the earlier systems that retrieved accompaniment using Markov
models was the M system (Zicarelli, 1987). It listens to a musician's performance as streams of MIDI data and builds a Markov chain representation
on the fly. It traverses over the representation in order to send the output.
Another well known example is the Continuator system (Pachet, 2002a;
Pachet, 2002b). The Continuator uses Markov modeling to build possible
sequence continuations of musical sequences played earlier in the performance. For any given sequence of musical notes, the accompaniment is
retrieved by selecting the longest sequence continuation. A later version of
the Continuator system models the trade-offs between adaptation and continuity of the retrieved accompaniment (Cabral, Briot, and Pachet, 2006).
Apart from finding a continuation sequence, the system constantly reviews
the relationship between the retrieved accompaniment and the harmonic
context to retrieve a new continuation in case of any mismatch. Another system, Omax, in addition to listening to the lead, listens to its own past improvisations (Assayag et al., 2006). In a special self-listening state, the
system listens to its own outputs to bias its Markov model. This results
in a variety of possible choices for future accompaniment, depending on
whether the system was listening to itself or to the lead.
The second variation of retrieval systems also uses Markov models to
produce sequence continuations of accompaniment. These systems model
the music as sequence continuations, based on listening to the improviser’s
input. Given a starting note or a sequence, the model is traversed to produce the musical continuation. As the system listens to more of the input
it changes the Markov model and the sequence continuations. Thus it is
able to produce multiple alternate accompaniments for different situations.
Although the use of modeling approaches improves performance over the
static database approach, at any point in the performance these systems
retrieve and play only one valid accompaniment.
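The following is a minimal sketch of this retrieval-by-continuation idea, in the spirit of M and the Continuator but not their actual implementations. Rebuilding the model as more input is heard is what changes the continuations over a performance; the most-frequent-successor rule is an illustrative stand-in for the Continuator's longest-continuation choice, and it makes the determinism critique visible: for a fixed model, the same input always yields the same output.

```python
from collections import Counter, defaultdict

def build_markov(heard):
    """First-order Markov model: note -> counts of observed successor notes.
    Rebuilding this as more input arrives is what changes the continuations
    over the course of a performance."""
    model = defaultdict(Counter)
    for current, nxt in zip(heard, heard[1:]):
        model[current][nxt] += 1
    return model

def continue_from(model, start, length):
    """Traverse the model, always taking the most frequent successor.
    For a fixed model, the same start note yields the same continuation."""
    out, current = [], start
    for _ in range(length):
        successors = model.get(current)
        if not successors:
            break  # no observed continuation from this note
        current = successors.most_common(1)[0][0]
        out.append(current)
    return out

heard = ["C", "D", "E", "C", "D", "G", "E", "C"]
print(continue_from(build_markov(heard), "C", 4))  # ['D', 'E', 'C', 'D']
```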
There is, however, one non-accompaniment system that falls broadly
into this category, but which generates musically-valid variations that do
meet the musical constraints of a given melodic line (Donze et al., 2013).
This “control improvisation system” generates variations of a lead melody
in jazz. Specifically, given a reference melodic and harmonic sequence, the
system builds a probabilistic model of all state transitions between the
notes of the melody. The probability values assigned to the transitions
determine the variations of the main melody produced. Assigning a high
probability to transitions of the reference melody (called direct transitions),
it produces melodic sequences similar to the reference melody. Assigning low probability to the direct transitions, it produces melodic sequences different from the reference line. Thus, given the same harmonic progression
and a reference melodic line, the system produces variations by controlling
a single parameter, the probability value of transitions.
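A toy version of this single-parameter control is sketched below, assuming the reference melody is a note list and p is the probability of taking the direct (reference) transition; the note alphabet and the substitution rule are illustrative assumptions, not the actual control improvisation system.

```python
import random

def vary_melody(reference, alphabet, p, rng):
    """Follow the reference (direct) transition with probability p; otherwise
    substitute another note. High p stays close to the reference line."""
    out = [reference[0]]
    for direct_next in reference[1:]:
        if rng.random() < p:
            out.append(direct_next)  # direct transition
        else:
            out.append(rng.choice([n for n in alphabet if n != direct_next]))
    return out

rng = random.Random(1)
reference = ["C", "E", "G", "E", "D", "C"]
print(vary_melody(reference, ["C", "D", "E", "F", "G"], 0.9, rng))  # near reference
print(vary_melody(reference, ["C", "D", "E", "F", "G"], 0.2, rng))  # far from it
```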
Although it is not an accompaniment system, the approach could conceivably be used as the basis for one, but not without significant modification. This is because the generation part of the system is entirely influenced by itself, by what it played earlier. Without modification, this would result in an odd accompaniment scenario, one in which the choices of the
accompanist are based on his own decisions rather than being based on the
changes played by an improviser. And if the goal was to transform this
into an accompaniment system, it would not be sufficient to simply modify
the system so that it listened to the lead performer; many of the challenges
and limitations described in future chapters would still appear.

2.1.3 Generation-based music systems

Generation-based music systems use musical grammars to generate accompaniment. The grammars contain production rules that associate the characteristics of the input rhythm with an output accompaniment rhythm.
The grammars are either hand-coded by a human expert or automatically
induced by listening to performances. There have been several systems
developed using each type of grammar.

2.2 Hand-coded grammars

Voyager (Lewis, 2000) and Cypher (Rowe, 1992) are examples of accompaniment systems that use hand-coded grammars to generate accompaniment responses. They contain pre-defined sub-routines that are triggered
by specific conditions to generate the different accompaniment responses.
However, the rules of these grammars are rigid and unchanging, and as a
result, these systems are limited in their ability to respond to the same
input with alternative outputs.

2.2.1 Online learning of grammars


One improvement over hand-coded grammars is the development of grammars that are more flexible and learn on the fly.
ImprovGenerator is an example of an accompaniment system that learns
musical grammars online (Kitani and Koike, 2010). It listens to the variations of a base rhythm and generates production rules corresponding to the
variations. The different production rules are assigned a probability value
that changes over the course of a performance. FILTER is another system
that employs an online learning approach (Van Nort, Braasch, and Oliveros, 2009; Van Nort, Braasch, and Oliveros, 2012). It is an improvising
instrument system that reacts in novel and interesting ways by recognizing
the gestures of a performer. The system comes pre-loaded with 20 gestures
and the transitions between the gestures are modeled by a Markov model.
Over the course of a performance, it varies the transition probabilities of the
gestures to produce interesting and varied responses. However, the relation
between the gesture and the output parameters itself remains constant. In
other words, it does a better job of generating different responses over the
course of a performance, but at any given point in the performance, it will
produce the same accompaniment given the same input.1
Grammar systems that use online learning are more flexible and generate more varied responses compared to systems developed using hand-coded grammars. However, in both cases, the grammars are modeled deterministically, and once the grammar is induced, the same input will produce the same output.

1 One notable thing about FILTER is that it models the interplay between lower-level audio features and higher-level gestural parameters. This will be discussed in more detail in later chapters.
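The determinism point can be seen in a toy sketch of an online-induced rule grammar; the string encoding of bar patterns ('x' for a hit, '.' for a rest) and the one-rule-per-input format are illustrative assumptions, not the ImprovGenerator or FILTER implementation.

```python
# Toy rule grammar: rules may be induced or updated over a performance,
# but at any given moment the mapping from input to output is fixed.
rules = {}  # lead bar pattern -> accompaniment bar pattern

def observe(lead_bar, accompaniment_bar):
    """Induce (or overwrite) a production rule from a heard variation."""
    rules[lead_bar] = accompaniment_bar

def respond(lead_bar, fallback="x...x..."):
    """Apply the grammar: same input, same output, until a rule changes."""
    return rules.get(lead_bar, fallback)

observe("x.x.x.x.", "x...x...")
print(respond("x.x.x.x."))  # x...x...
print(respond("x.x.x.x."))  # identical every time until a rule changes
```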

2.3 Transformation-based music systems

Transformation-based music systems apply a transformation function on
the input to generate the output. The transformation function is usually a
mathematical operation that is applied on each of the input parameters to
produce the output accompaniment values. Multiple accompaniments are
generated by permuting a representation of the input parameters. There are two kinds of transformation systems, based on how the transformations are generated: systems where the transformation function is pre-given and systems where the user selects the transformation function.

2.3.1 Transformation function is pre-given

In pre-given transformation systems, a target accompaniment value is given as input to the system. The transformation function is computed as a function of this target accompaniment and is applied to the input values to generate the accompaniment.
Ambidrum is one system that uses a statistical measure of rhythmic
ambiguity to generate rhythmic accompaniment (Gifford and Brown, 2006).
It measures rhythmic ambiguity using a statistical correlation between the
rhythmic metre and three rhythmic variables: the beat velocity, pitch,
and duration. The system is given target correlation values, which it uses to transform the input into output that can be either metrically coherent or metrically ambiguous. Metrically coherent rhythms are musically valid as accompaniment and are generated by the Ambidrum system when its target correlation matrix (transformation function) is an identity function. When the transformation function is not an identity
function, the rhythms generated by Ambidrum are metrically ambiguous
and their musical appropriateness varies widely.
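As a sketch of the pre-given transformation idea: suppose the input rhythm is reduced to its correlations with the metre along the three variables named above (velocity, pitch, duration), and the target correlation matrix is applied as a linear map. The vector-and-matrix framing and the values below are assumptions for illustration, not Ambidrum's actual code.

```python
def transform(metre_correlations, target_matrix):
    """Apply the pre-given target correlation matrix (the transformation
    function) to the input's per-variable metre correlations."""
    return [
        sum(m * r for m, r in zip(row, metre_correlations))
        for row in target_matrix
    ]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # output stays metrically coherent
scramble = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # non-identity: ambiguity varies

rhythm = [0.9, 0.4, 0.7]  # (velocity, pitch, duration) correlations with metre
print(transform(rhythm, identity))  # [0.9, 0.4, 0.7] -> unchanged, coherent
print(transform(rhythm, scramble))  # [0.4, 0.7, 0.9] -> metrically ambiguous
```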
Another system, Clap-along, uses values from the target accompaniment
to move the input towards the target (Young and Bown, 2010). The system uses four musical features to compute the distance between the source
and the target accompaniment and progressively modifies the source towards the target. For each generation, the system generates 20 choices and
finds the closest rhythm to the target by computing the Euclidean distance.
When the performer repeatedly claps the exact same pattern, the system is
able to slowly evolve its output towards the target accompaniment. However, variations in the performer's rhythms cause unstable changes in the
system’s output, often resulting in inappropriate accompaniment.
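Clap-along's generate-and-select loop can be sketched as follows. The pool of 20 candidates per generation, the four features, and the Euclidean selection come from the description above; the feature values and the mutation step are illustrative assumptions.

```python
import random

def clap_along_step(source, target, rng, pool_size=20):
    """One generation: mutate the source into pool_size candidate rhythms
    (as 4-feature vectors) and keep the one with the smallest Euclidean
    distance to the target accompaniment."""
    def mutate(features):
        return [f + rng.uniform(-0.1, 0.1) for f in features]
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min((mutate(source) for _ in range(pool_size)),
               key=lambda candidate: dist(candidate, target))

rng = random.Random(0)
source, target = [0.2, 0.9, 0.1, 0.5], [0.8, 0.4, 0.6, 0.5]
# With a stable input, the output slowly evolves toward the target; a
# varying source (an inconsistent performer) would destabilize this loop.
for _ in range(40):
    source = clap_along_step(source, target, rng)
print(source)  # close to the target after enough generations
```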
The main limitation of both these systems (and systems like these) is
that there are very few cases when the accompaniment generated by the
systems is predictably valid (musically).

2.3.2 User selects the transformation function

In order to get transformation systems to generate musically predictable output, systems have been created in which user preference is used to generate transformation functions. For example, NeatDrummer generates drum tracks by transforming the other musical parts in the song (Hoover, Szerlip,
and Stanley, 2011; Hoover, Rosario, and Stanley, 2008). The different accompaniment tracks are generated by giving different input tracks (like
piano, violin, and vocal) to an Artificial Neural Network, called a CPPN,
that generates the output rhythms. The CPPN is initially trained by using
the input from the different audio tracks. In the successive generations, the