Studi Linguistici e Filologici Online 10 Dipartimento di Linguistica–Università di Pisa
www.humnet.unipi.it/slifo
Studi Linguistici e Filologici Online ISSN 1724-5230 Volume 10 (2013) – pagg. 183-218 G. Marotta, L. Iacoponi, A. Idone – “Asymmetries between Perception and Production. Pitch and Length in two varieties of Italian (Pisan and Crotonese)”
ASYMMETRIES BETWEEN PERCEPTION AND PRODUCTION.
PITCH AND LENGTH IN TWO VARIETIES OF ITALIAN (PISAN AND
CROTONESE)
GIOVANNA MAROTTA, LUCA IACOPONI, ALICE IDONE1
1 Introduction
In the last decade, a series of empirical studies on speech
perception has focused on the perceptual effects produced by the
interaction of the two leading acoustic parameters: duration and F0
[cf. Pisoni & Remez, 2005].
The relevance of a multidimensional approach becomes
especially clear in dealing with phenomena like prominence, because
the perceptual salience of an auditory object depends on the peculiar
combination of different physical elements (e.g. frequency, duration,
intensity, voice quality), and cannot be derived from any single one of them
[Niebuhr, 2009].
In this study, we would like to present original experimental data
regarding the relevance of two prosodic parameters, length and tone
modulation, in the perception of prominent vowels by Italian listeners.
In particular, our focus will be twofold: on the one hand, the
impact of the native linguistic variety on perception, and on
the other, the role of music training in tasks involving recognition of
prosodic parameters like pitch and length.
1 All authors contributed extensively to the work presented in this paper: G. M.
conceived the study, supervised the experiment and wrote §1 and §5; L. I. wrote
code, analysed output data and wrote §3 and §4; A. I. administered the experiment,
edited the manuscript and wrote §2.
2 State of the art
2.1 F0 Modulation and Length
The influence of fundamental frequency (F0) on the perception
of vowel duration is a vexata quaestio. Previous findings are
conflicting. In spite of the widely accepted opinion, according to
which a dynamic F0 contour lengthens perceived duration
[Gussenhoven, 2004; Yu, 2006; Galloway, 2008], there is also
experimental evidence challenging this claim [Rosen, 1977a, 1977b;
van Dommelen, 1993; Lehnert-LeHouillier, 2007].
The lack of consistent results probably stems from the procedural
differences among the experiments: the effect of dynamic F0 in an
accentual language like Swedish [Rosen, 1977a; 1977b]; the rating of
synthetic monosyllables on a 7-point duration scale [Yu, 2006]; the
inclusion of syllabic structure as a relevant variable [van Dommelen,
1993]; articulatory explanations and correlations between degree of
vowel length and degree of height [Gussenhoven, 2004].
For the purpose of this work, given the previous mixed findings
about the possible influence of F0 on the perception of length, the
ABX test was chosen because it is less subjective than explicit
qualitative judgments.
2.2 Language dependence
The relevance of the native language in the perception of
prosodic parameters like duration and F0 variation has recently been
questioned. By and large, previous studies dealing with language
dependence do not allow conclusive explanations. Both sources of
variability, namely the native language of the listener and the
language chosen as the source of the test stimuli, have been called
into doubt from time to time, depending on the acoustic parameter
considered.
In the specific case of Intrinsic Pitch (IP), Pape & Mooshammer
[2006] demonstrated its dependence on the native language of the
listener, but not on the language of the stimuli proposed. In a more
recent and exhaustive study, Pape [2008] delved into the
phenomenon of pitch and F0 insensitivity among Romance languages.
The data collected seemed to indicate that Romance listeners tended
to be pitch-insensitive, relying on vowel quality rather than on
F0 variations because of their small vowel inventory. The
peculiarities of the native language are thus supposed to influence
the perception of acoustic parameters.
Lehnert-LeHouillier [2007] also confirms the importance of
this variable by demonstrating that the lengthening effect of dynamic
F0 occurs only in listeners of some languages (Japanese) but
not of others (Thai, German and Spanish); Galloway
[2008], by contrast, challenges the dependence of this perceptual
effect on native language, since in her experiment the lengthening
effect was displayed in rhythmically different languages: the
syllable-timed French and the stress-timed Swiss German.
In conclusion, the evidence supporting the influence of one's native
language on the perception of length and melodic contour is
controversial.
To limit the number of linguistic variables involved in the
experiment, two varieties of the same language rather than two
different languages were compared.
2.3 The role of musical training
Several psycholinguistic studies have recently focused on the
connection between the domains of music and speech perception. In
particular, the basic issue concerns the possible influence of musical
education on the processing of speech acoustic parameters.
In a recent paper, Schön et al. [2004] contributed to the debate
by using brain imaging techniques. They manipulated the F0 in the
final part of word and note sequences, involving French musicians and
non-musicians. The results revealed that within the domains, language
prosody and music, musicians detected weak F0 manipulations better
and faster than non-musicians. They, together with other scholars
[Deutsch et al., 2004], demonstrated that this evidence also has a
neural counterpart that can finally establish the connection between
music and speech. They claimed that musical training makes it easier
to detect pitch changes not only in music, but in language as well,
calling into play similar cognitive processes.
Furthermore, it was demonstrated that musical training not only
influences and improves perceptual processing, but that the perceptual
abilities in discrimination are specific to the domains that music
training emphasizes [Rauscher & Hinton, 2003].
Moreover, the variable ‘musician’ is claimed to be independent
of the native language: all professional musicians, for example, are
pitch-sensitive, with nearly identical results across all languages
[Pape, 2008].
Nevertheless, musical and linguistic processing are not
completely comparable, and the typology of the language involved
can slightly modify the perspective, as the experiments involving tone
languages demonstrate [Stevens et al., 2004; Schwanhäußer &
Burnham, 2005; Bidelman, Gandour & Krishnan, 2011].
The aforementioned experiments depict a general frame in
which, in spite of differences in the processing phase, musical
competence can be an important element for a better perception of
acoustic parameters. Nevertheless, other scholars [Niebuhr, 2009]
avoid explanations of this kind by attributing the better perceptual
performance of musicians to a matter of metalanguage: the poor
performance of non-musicians is due to the fact that they are less
familiar with the conceptualization of acoustic parameters.
For the experiment, the sample included professional or
semi-professional musicians and subjects who had no explicit knowledge
that could influence the perception of sound stimuli. Contrary to some
experiments, students from the Linguistics Department or from the
Laboratory of Phonetics were excluded from the experiment, as they
satisfied the criterion of explicit knowledge only vaguely.
2.4 Pisan and Crotonese Italian
Two varieties of the same language were chosen rather than two
different languages to narrow down the number of prosodic variables
that could influence the perception of the stimuli.
The stimuli refer to two regional varieties of Italian, and not to
dialects. Italian dialects vary greatly and mutual intelligibility is
often low among major groups. Different varieties of Italian may differ
in many respects influenced by local dialects, especially at the
phonological level, but they maintain mutual intelligibility. Most
features of standard Italian are shared by regional varieties.
The comprehension of the stimuli was therefore guaranteed and
it was possible to include among the participants listeners belonging to
the macro-linguistic areas of Tuscany and Southern Calabria.
Typical features of Pisan are the consonantal lenition
phenomenon known as Tuscan Gorgia [see Marotta, 2001; 2008], the
deaffrication of /ʧ/ and /ʤ/ in intervocalic position [see Bertinetto and
Loporcaro 2005] and the lowering of the middle vowels /ɛ/ and /ɔ/
[see Calamai 2004]. Crotonese, like most Southern dialects, is
characterised by consonantal fortition [see Loporcaro, 2009] and,
being an extreme Southern Italian dialect [Pellegrini, 1977], by the
neutralisation of the phonological opposition between /e/ ~ /ɛ/ and /o/
~ /ɔ/ in tonic position [see Fanciullo, 1994].
Pisan and Crotonese varieties have been chosen by virtue of
their use of the two prosodic parameters (i.e. duration and frequency)
to convey prominence. In Pisan, long and modulated vowels occur in
case of prominence [cf. Marotta et al. 2004; Marotta et al. 2011]. In
detail, with respect to Florence, in the areas of Pisa there is a stronger
increase of stressed vowel duration and of frequency range; at the
same time, on the perception side, longer vowels and higher
modulation are clearly identified as distinctive features for Italian
spoken in Pisa [Calamai & Ricci, 2005a; 2005b]. On the other hand,
Crotonese, together with most Calabrian dialects, shows very long
prominent vowels with a minor F0 modulation with respect to Pisan
[Romito & Trumper, 1989; Mendicino & Romito, 1991; Romito,
1993; Marotta & Sardelli, 2007, 2009].
The geographic position of the varieties chosen for the
experiment is shown in Figure 1.
Figure 1: Geographical location of Pisa and Crotone with reference to their linguistic area, i.e. Tuscan and Southern Calabrian.
3 Method
3.1 Stimulus Set
The auditory stimuli are part of the set of stimuli used in a previous
experiment [see Marotta et al., 2011]. Eight words containing
prominent vowels in open stressed penultimate syllable – a context
where tonic lengthening occurs in Italian [see Marotta 1999;
D’Imperio & Rosenthal 1999] - were extracted from a semi-
spontaneous conversation held at different times by a Pisan and a
Crotonese speaker.
Prominent vowels are here defined as segments with a special
degree of perceptual salience in an utterance. As is well known, in a
phonetic string, a segment, as well as a syllable, can be perceived as
being prominent after the relevant modification of the three basic
acoustic parameters, i.e. length, intensity and frequency, which are
perceived as changes in length, volume and tone [cf. Rietveld &
Gussenhoven, 1985; Kohler, 2008]. On perceptual grounds, we refer to
prominence as in the definition of Terken [1991]: “the prominence is a
property by which linguistic units are perceived as standing out from
their environment”. For a more detailed discussion of the criteria
adopted for selecting prominent vowels used in the stimulus set, we
refer the reader to Marotta et al. [2011].
The varieties under scrutiny were chosen as they display
asymmetrical behavior in the use of duration and pitch to convey
prominence: Pisan displays pitch modulation in stressed vowels, while
phonetic lengthening is observed for the Crotonese variety (cf. §2.4)
[Marotta et al., 2004; 2011]. To limit the number of variables,
intensity variation was not considered in the stimulus modification
set. Finally, the target vowel in all words is low or mid-low. It was
in fact observed early on [Grammont, 1993] that low vowels are longer
than high vowels, a tendency confirmed by recent measurements of the
varieties used in this experiment [Marotta et al. 2011].
The Pisan speaker is a 51-year-old woman, born and raised in
Pisa by Pisan parents. The Crotonese speaker is a 25-year-old student,
living in Pisa at the time of the interview, but a native and fluent
speaker of the Crotonese dialect. Both accents were easily perceivable
as regional. Each speaker was recorded in the Laboratory of Phonetics
at the University of Pisa, using a Marantz PMD671 digital solid-state
recorder equipped with a Sennheiser MKE 40-EW microphone.
All stimuli were sampled at 44.1 kHz, with a bit rate of 1411 kbps and
a sample size of 16 bit. The eight words were then extracted from the
recording, and each stressed vowel was modified for duration and/or F0
using the Praat software (http://www.fon.hum.uva.nl/praat). The vowel
was first shortened by 30 ms (D1) and then shortened by a further
30 ms (D2 = -60 ms). Three modifications were made to the pitch: F0
was levelled to its maximum (HP), levelled to its minimum (LP), or
inverted (IP). All stimuli were then normalized at the
beginning and at the end of the modification to avoid pitch smearing.
Figure 2 shows, as an example, the spectrogram of the original Pisan
stimulus ‘dottorato’; Figure 3 shows the same word after the pitch
contour modifications. Table 1 and Table 2 contain the list of the
recorded words with indication of the relevant acoustic parameters,
i.e. segment duration and F0 value in the onset, peak (or valley) and
end points of the prominent vowel.2
Figure 2: Spectrogram for the unmodified stimulus [dotːoˈraːθo] ‘doctorate’ with the pitch curve drawn in blue
Figure 3: The three pitch modifications to the word [dotːoˈraːθo] ‘doctorate’
2 For further details about the speech stimuli, we refer the reader to Marotta et al.
[2011].
Selected words                          Original duration   Original F0 (onset – peak/valley – end)
Bene [ˈbɛːne] ‘good’                    257 ms              233 Hz – 202 Hz – 194 Hz
Dottorato [dotːoˈraːθo] ‘doctorate’     242 ms              232 Hz – 341 Hz – 335 Hz
Emiliano [emiˈljaːno] ‘Emiliano’ PR-N   189 ms              209 Hz – 245 Hz – 203 Hz
Valerio [vaˈlɛːrjo] ‘Valerio’ PR-N      272 ms              256 Hz – 302 Hz – 283 Hz
Table 1: Stimuli recorded by the Pisan speaker
Selected words                              Original duration   Original F0 (onset – peak/valley – end)
Bene [ˈbɛːne] ‘good’                        146 ms              99 Hz – 95 Hz – 89 Hz
Cucinare [kuʧiˈnaːre] ‘to cook’             186 ms              146 Hz – 196 Hz – 153 Hz
Lezione [leˈʦːjɔːne] ‘lesson’               149 ms              103 Hz – 130 Hz – 105 Hz
Prestigioso [prestiˈʤːɔːso] ‘prestigious’   174 ms              152 Hz – 195 Hz – 162 Hz
Table 2: Stimuli recorded by the Crotonese speaker
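The manipulations described in §3.1 can be made concrete with a small sketch that derives, for each word in Tables 1 and 2, the target vowel durations for D1 and D2 and the level F0 values for HP and LP. This is only an illustration, not the Praat procedure actually used. The function and variable names are hypothetical, and the sketch assumes that the three tabulated F0 points (onset, peak or valley, end) bracket the range of the contour. The inverted condition (IP) is left out because the text does not specify how the inversion was computed.

```python
# Vowel duration (ms) and F0 at onset, peak/valley and end (Hz),
# copied from Tables 1 and 2.
STIMULI = {
    ("Pisa", "bene"): (257, (233, 202, 194)),
    ("Pisa", "dottorato"): (242, (232, 341, 335)),
    ("Pisa", "Emiliano"): (189, (209, 245, 203)),
    ("Pisa", "Valerio"): (272, (256, 302, 283)),
    ("Crotone", "bene"): (146, (99, 95, 89)),
    ("Crotone", "cucinare"): (186, (146, 196, 153)),
    ("Crotone", "lezione"): (149, (103, 130, 105)),
    ("Crotone", "prestigioso"): (174, (152, 195, 162)),
}

STEP_MS = 30  # shortening step reported in section 3.1


def manipulation_targets(duration_ms, f0_points):
    """Return target values for the D1, D2, HP and LP conditions."""
    return {
        "D1": duration_ms - STEP_MS,       # shortened once
        "D2": duration_ms - 2 * STEP_MS,   # shortened twice (-60 ms)
        # Assumption: the tabulated points bracket the contour's range.
        "HP": max(f0_points),              # F0 levelled to its maximum
        "LP": min(f0_points),              # F0 levelled to its minimum
    }


for (variety, word), (duration, f0) in STIMULI.items():
    print(variety, word, manipulation_targets(duration, f0))
```

Because the Crotonese vowels are shorter to begin with, the same absolute 60 ms reduction removes a larger share of their duration than it does for the Pisan vowels.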
3.2 Sampling
In total, 60 participants were recruited for the experiment. The
sampling frame was divided between the two variables of musical
training and variety of Italian. The sample was composed of Pisans
(n=20), Crotonians (n=20), and a control group (n=20) whose
variety of Italian was neither Tuscan nor Calabrian. To be selected for
the experiment, the listeners of the first two groups had to match the
following criteria: 1) they had to have been born and raised in one of
the two linguistic areas of interest; 2) they had to show native fluency
in their variety; 3) they had not lived outside their group region for
more than three consecutive years, and only during their adulthood.
With respect to the last prerequisite, 3 Crotonians and 14 participants
in the control group were alumni of the University of Pisa. Half of the
participants in each group were professional musicians (n=30) with at
least 5 years of training and an average of 4 years of experience in
their primary instrument. Of those, 12 had undergone formal training
in Italian conservatories.
None of the non-musicians had ever had any formal musical
training, training in phonetics or phonology or any other sound-related
skills. The mean age of participants was 28, with most participants
being undergraduate students between 19 and 25 years old (n=27). A
high school diploma was the minimum education
requirement. All participants declared that they had no visual, hearing
or cognitive impairments.
The total number of participants within the groups is
summarized in Table 3.
N=60 Pisa Crotone Control
Musician 10 10 10
Non-Musician 10 10 10
Table 3: Total number of participants divided into the 6 groups
3.3 Design and Procedure
The experiment was divided into two blocks, each containing a
set of trials with stimuli from the same variety. For each word in the
stimulus set, three stimuli were chosen to form the block’s triplets.
The triplets contained all possible pairings of original and modified
stimuli of the same word and of the same modification group, plus a
third stimulus identical to the first, to the second, or to both when
the first two were identical. The order of the stimuli in the pair did
not matter for the generation of the combinations.
All the experiments involving the Pisan and Crotonese groups were
conducted in a calm and silent environment in the Laboratory of
Phonetics at Pisa University. To record the input, a standard Italian
keyboard was used, with a coloured label applied to each of the keys
corresponding to a possible response. The stimuli were delivered using
high definition MDR-XD20 headphones. The experiment ran on a P4
dual-core machine equipped with a Realtek high definition audio card; a
Samsung R522 was used for some participants of the Crotonese group
(N=17).
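The triplet construction described at the start of this section can be sketched as follows. This is one possible reading, not the script actually used. The names are hypothetical, and the sketch assumes that each pair of different stimuli was presented with both possible choices of the third stimulus, which the text does not state. The resulting trial counts are therefore illustrative only.

```python
from itertools import combinations_with_replacement

# Modification groups from section 3.1; "orig" is the unmodified recording.
GROUPS = {
    "duration": ["orig", "D1", "D2"],
    "pitch": ["orig", "HP", "LP", "IP"],
}


def abx_triplets(versions):
    """Yield (A, B, X) triplets for one word and one modification group."""
    # Order inside the pair is ignored, so unordered pairs are enough;
    # pairs of the form (A, A) are the "identical" trials.
    for a, b in combinations_with_replacement(versions, 2):
        if a == b:
            yield (a, b, a)  # X matches both A and B
        else:
            yield (a, b, a)  # X matches the first stimulus
            yield (a, b, b)  # X matches the second stimulus (assumption)


def block_triplets(words):
    """Build all triplets for the words of one variety (one block)."""
    return [
        (word, group, triplet)
        for word in words
        for group, versions in GROUPS.items()
        for triplet in abx_triplets(versions)
    ]


pisan_block = block_triplets(["bene", "dottorato", "Emiliano", "Valerio"])
print(len(pisan_block), "triplets in the Pisan block under this reading")
```

Keeping the pairs within one word and one modification group means that every trial varies a single parameter, so a response can be attributed to either duration or pitch.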
Between December 2010 and March 2011 each person who met
the study inclusion criteria was called in to undergo the experiment.
In order to optimize the environmental conditions, for the comfort of
the participant and to avoid any external conditioning factors (peer
pressure, background noise, operator disposal, etc.) the participants
were tested one at a time, assured confidentiality and given the
opportunity to decline to participate in the study. The purpose of the
study was stated only after the experiment had concluded.
All operators followed the same standardized
procedure. First, all sensitive information was collected by the
operator, who filled in a sociolinguistic form; the participant was then
given the following instructions:
"You will hear a series of three words. You will have to press the
labeled key '1' if you recognize that the third word is identical to the
first, '2' if identical to the second and 'Don't know' if you can't hear
any difference. You can skip a word if you have trouble hearing the
stimulus because of external distractions such as noise, or a
temporary lack of attention."
A five-stimuli trial session was then delivered to familiarize the
participants with the task. If the participant did not feel comfortable
enough or did not understand the task, the trial session was run again
(this happened only for p=13). The stimuli were grouped into two
consecutive blocks, one including the Pisan stimuli and the other the
Crotonese. Between the two blocks, the participants were allowed a
short pause. The order of the blocks as well as the order of the stimuli
within each block was randomized in each session. The procedure for
the stimuli delivery was ABX, with a 1 second interval between each
stimulus, and a 2 second time lapse to answer. The stimuli could only
be listened to once. The experiment lasted about 30 minutes. The
delivery of the stimuli, the recording of the responses and part
of the data analysis were carried out using the Presentation® software
(Version 14.7, www.neurobs.com).
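The response scheme given in the instructions can be sketched as a small scoring routine. This is a hypothetical illustration and not the Presentation® scenario used in the experiment. It assumes that 'Don't know' counts as the correct answer when the first two stimuli are identical, and that skipped trials (marked here as None) are left out of the proportion; the text states neither point explicitly.

```python
def expected_response(a, b, x):
    """Return the response that counts as correct for an ABX triplet."""
    if a == b:
        return "dont_know"  # assumption: no audible difference between A and B
    if x == a:
        return "1"          # third stimulus identical to the first
    if x == b:
        return "2"          # third stimulus identical to the second
    raise ValueError("X must match A or B")


def proportion_correct(trials):
    """Proportion of correct answers; skipped trials (None) are ignored."""
    answered = [(triplet, resp) for triplet, resp in trials if resp is not None]
    if not answered:
        return 0.0
    correct = sum(expected_response(*triplet) == resp for triplet, resp in answered)
    return correct / len(answered)


# Hypothetical responses from one listener, for illustration only.
example = [
    (("orig", "HP", "orig"), "1"),       # correct
    (("orig", "orig", "orig"), "1"),     # wrong: A and B were identical
    (("orig", "D1", "D1"), None),        # skipped
]
print(proportion_correct(example))
```

Scoring the identical trials separately from the others is what makes it possible to compare the A=B and A≠B conditions.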
4 Results
Data were analyzed using Neurobs Analyzer® (Version 14.7)
and R (Version 2.14.1). A t-test among the different
groups was computed on the number of correct responses for each of
the groups, divided by stimulus modification, stimulus source and
speaker location, and then used to investigate the correlation among
speaker and listener groups. The mean values of correct responses for
all groups are given in Table 4.
Stimuli           Listeners
Mod.  Source      Pisan   m-Pisan   Crotonese   m-Crotonese   control   m-control
HP    Pisa        0.55    1         0.83        0.95          0.6       0.9
HP    Crotone     0.67    0.93      0.67        0.97          0.6       0.93
LP    Pisa        0.6     0.8       0.82        0.78          0.57      0.88
LP    Crotone     0.41    0.69      0.55        0.66          0.5       0.73
IP    Pisa        0.71    0.91      0.78        0.91          0.6       0.94
IP    Crotone     0.13    0.6       0.4         0.6           0.24      0.71
D1    Pisa        0.41    0.56      0.33        0.55          0.42      0.51
D1    Crotone     0.52    0.81      0.59        0.75          0.5       0.7
D2    Pisa        0.58    0.73      0.39        0.65          0.4       0.58
D2    Crotone     0.58    0.68      0.56        0.72          0.47      0.73
Table 4: Mean values of correct responses for modification, stimulus source and speaker location. The first two columns indicate the stimulus modification and location, the other columns the location of the participants. The m-prefix on the participant location row indicates that the group is composed of musicians
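The cell means in Table 4 can be aggregated in a few lines to inspect the overall tendencies by musical training and by stimulus source. The sketch below is only a descriptive summary, not the statistical analysis reported in this section. It takes unweighted averages of the tabulated cell means, which assumes that every cell rests on a comparable number of trials, and all names in it are hypothetical.

```python
from statistics import mean

GROUPS = ["Pisan", "m-Pisan", "Crotonese", "m-Crotonese", "control", "m-control"]

# (modification, stimulus source) -> proportions in the column order above,
# copied from Table 4.
TABLE4 = {
    ("HP", "Pisa"): [0.55, 1.00, 0.83, 0.95, 0.60, 0.90],
    ("HP", "Crotone"): [0.67, 0.93, 0.67, 0.97, 0.60, 0.93],
    ("LP", "Pisa"): [0.60, 0.80, 0.82, 0.78, 0.57, 0.88],
    ("LP", "Crotone"): [0.41, 0.69, 0.55, 0.66, 0.50, 0.73],
    ("IP", "Pisa"): [0.71, 0.91, 0.78, 0.91, 0.60, 0.94],
    ("IP", "Crotone"): [0.13, 0.60, 0.40, 0.60, 0.24, 0.71],
    ("D1", "Pisa"): [0.41, 0.56, 0.33, 0.55, 0.42, 0.51],
    ("D1", "Crotone"): [0.52, 0.81, 0.59, 0.75, 0.50, 0.70],
    ("D2", "Pisa"): [0.58, 0.73, 0.39, 0.65, 0.40, 0.58],
    ("D2", "Crotone"): [0.58, 0.68, 0.56, 0.72, 0.47, 0.73],
}

PITCH = {"HP", "LP", "IP"}  # the remaining modifications are durational


def cell_mean(keep):
    """Unweighted mean of the cells selected by keep(mod, source, group)."""
    return mean(
        value
        for (mod, source), row in TABLE4.items()
        for group, value in zip(GROUPS, row)
        if keep(mod, source, group)
    )


musicians = cell_mean(lambda m, s, g: g.startswith("m-"))
non_musicians = cell_mean(lambda m, s, g: not g.startswith("m-"))

pitch_by_source = {
    src: cell_mean(lambda m, s, g, src=src: m in PITCH and s == src)
    for src in ("Pisa", "Crotone")
}
duration_by_source = {
    src: cell_mean(lambda m, s, g, src=src: m not in PITCH and s == src)
    for src in ("Pisa", "Crotone")
}

print(f"musicians {musicians:.3f}, non-musicians {non_musicians:.3f}")
print("pitch by stimulus source:", pitch_by_source)
print("duration by stimulus source:", duration_by_source)
```

Under these assumptions the averages are higher for musicians than for non-musicians, higher for pitch modifications in the Pisan stimuli, and higher for duration modifications in the Crotonese stimuli.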
Figure 4 shows the average numbers of correct responses by all
speakers divided into musicians and non-musicians.
Figure 4: Mean values of correct responses for musicians and non-musicians. The y-axis represents the number of subjects who answered correctly, while the x-axis shows the number of correct responses
As repeatedly reported in the literature (see § 2.3), musicians
uniformly perform better than non-musicians in both the pitch
(p<0.001) and duration discrimination tasks (p<0.001; see Figure 5).
By further splitting the data, though, we unexpectedly observed an
over-recognition effect on stimulus identification when the stimuli
were identical (see Figure 6). Although musicians do better than
non-musicians in the other tasks, they seem to fail to recognize when
stimulus A is the same as stimulus B. The difference is statistically
significant (p<0.001).
Figure 5: Percentage of incorrect responses for pitch and duration discrimination tasks, divided for musical training
Figure 6: Percentage of incorrect responses when the first stimulus is the same as the second (A=B) and when it is different (A≠B)
Central to the design of the experiment is the variable
‘location’, i.e. the relevance of the native variety of the listeners. The
impact of the source of the stimulus in association with the subject
location in recognizing prosodic differences was analyzed among the
different groups. The mean values and their standard deviations are
shown in Table 5. The first effect observed concerns the
discrimination of pitch variations. All three groups of speakers
(Pisans, Crotonians and controls) recognized pitch differences
occurring in Pisan stimuli significantly better than those occurring in
Crotonese stimuli (p<0.001). A similar but mirror-image correlation was
found for duration differences: the Crotonese stimuli were better
recognized than their Pisan counterparts (p=0.008). The different
significance levels could be due to the fact that in conveying
prominence the relevance of duration in Crotonese is less evident than
that of pitch in Pisan.
However, a direct comparison between pitch and duration is
obviously not possible: the two parameters as well as their measures
are intrinsically different (cf. Jones & Munhall, 2000; 2005).
Therefore, the two variables can be only indirectly compared with
reference to the geographical origin of the three groups of listeners.
The weight of the prosodic features in a phonetic production varies as
a continuum where different factors play a role and the impact of
pitch, duration and intensity on the indication of prominence may vary
from language to language. In our case, duration in the Crotonese
stimuli appears to be less overwhelming than pitch in Pisan.
Stimulus location (N=30)   Pitch              Duration
Pisa                       136.4 (SD 18.35)   94. (SD 5.01)
Crotone                    62.2 (SD 9.78)     105. (SD 6.51)
p-value                    < 0.001            0.008
Table 5: Correct responses for pitch and duration recognition tasks in Pisan and Crotonese stimuli
No correlation was found when the subject location was the
variable at stake. Speakers of a particular variety do not perform better
in recognizing a particular stimulus modification (pitch, p=0.091;
duration, p=0.062; see Table 6). Similarly, Pisan and Crotonese
speakers are not better at recognizing stimuli from their own varieties
and the controls exhibit no preference for a particular location: no
correlation could be found when considering speaker location
(p=0.303; Table 7).3
3 The high mean recorded for Pisan stimuli is probably due to the fact that pitch modified stimuli were 1/3 more numerous than duration stimuli coupled with the observation that pitch recognition is easier within the Pisan variety (see Table 5).
Listener location (N=20)   Pitch             Duration
Pisans                     44. (SD 8.62)     35. (SD 5.74)
Crotonians                 48.8 (SD 4.21)    29.8 (SD 8.26)
Controls                   43.6 (SD 11.30)   29.6 (SD 9.54)
p-value                    0.0911            0.0617
Table 6: Mean values of correct responses for pitch and duration tasks according to listener location
Listener location (N=20)   Pisa              Crotone
Pisans                     74. (SD 16.97)    49.6 (SD 14.02)
Crotonians                 71.2 (SD 13.33)   54.4 (SD 9.41)
Control                    68.4 (SD 19.79)   48.6 (SD 13.71)
p-value                    0.581             0.303
Table 7: Mean values of correct responses for Pisan and Crotonese stimuli in all listener groups
Therefore, in our opinion the most original result of our analysis
concerns the fact that listeners do not recognize the stimuli relative to
their own language variety better than the others.
5 Discussion
The results of the ABX discrimination test indicate that the
native variety of the speakers/listeners does not influence the
recognition of prosodic modifications to speech stimuli. No significant
difference was found in Pisan and Crotonese groups when listening to
their variety-specific prominent prosodic features. Pisan listeners did
not perform significantly better than Crotonians at recognizing pitch
variation (see Figure 7) and Crotonians did not discriminate duration
differences better than Pisans (see Figure 8). Similar percentages of
error were also found for the control subjects. The result is also
confirmed by the fact that Pisan and Crotonese listeners do not
discriminate between the prosodic differences in the stimuli relative to
their own variety better than the stimuli from the other variety (Table
4, §4.2).
Figure 7: Boxplot of correct responses for pitch modified stimuli
Figure 8: Boxplot of correct responses for duration modified stimuli
On the other hand, stimulus origin seems to be the variable
responsible for the variation found in perception. All participants
uniformly performed better in recognizing pitch variation in Pisan
stimuli (see Figure 9) and duration in Crotonese (see Figure 10), no
matter what their native language was (Figure 11). Intrinsic phonetic
cues used to mark prominence then are not only evident acoustically,
but are also more easily perceived by all listeners. The primary
prominence features are different in the two specific varieties of
Italian here considered: duration for Crotonese and pitch for Pisan [cf.
Marotta et al. 2011]. However, they are recognized with the same
degree of accuracy no matter what the native variety of the listener is.
This suggests that the distribution of language specific prosodic
features observed in production is not always mirrored by an
equivalent and symmetrical behavior in perception.
Figure 9: Incorrect responses by all listeners for Pisan and Crotonese stimuli modified only in F0
Figure 10: Incorrect responses by all listeners for Pisan and Crotonese stimuli modified only in duration
Figure 11: Incorrect responses by all listeners for Pisan and Crotonese stimuli modified for duration and F0
Our ABX discrimination experiment adds new evidence to the
controversial debate on the differences in perception among musicians
and non-musicians. The analysis of the responses obtained where the
first stimulus had to be recognized as different from the second
confirmed and reproduced the results previously obtained by
experiments using the same setting. Musicians scored a considerably
higher number of correct responses both in duration and pitch
recognition tasks (Figure 5). The experiment provides fresh data when
the variable A=B is considered, that is when the two stimuli have to be
recognized as being identical. Unexpectedly, the analysis of the data
revealed that in this task the percentage of correct answers is switched
between the groups. Musicians scored worse than subjects with no
explicit sound knowledge, by a margin that is highly statistically
significant (Figure 6).
6 Conclusion
In previous sections of the article (cf. § 2.4 and passim) we
observed that Pisan and Crotonese speakers consistently differ in the
production of non-distinctive prosodic features: pitch and vowel
length have a different weight in marking prominent vowels
depending on the language variety. This datum could suggest that a
difference in production is reflected in a difference in perception.
Previous studies showed that prosodic features, when distinctive in a
language, can improve the perception accuracy of the corresponding
acoustic correlates in recognition tasks. Speakers of tonal languages
are better at recognizing pitch differences [Stevens et al., 2004], and if
vowel duration is distinctive in a language, the speakers show an
improved ability to recognize small differences in vowel length. The
stimuli used in this experiment differ from those of the
aforementioned studies in two respects: first, instead of sampling
speakers and stimuli from two languages, two varieties of the same
language were used, i.e. Pisan and Crotonese; second, duration and
pitch have no distinctive status in either of the Italian varieties
considered.
The results obtained from the ABX discrimination test do not
confirm any perceptual impact of the native linguistic variety on
the judgement of stimuli manipulated for duration and F0: listeners
perform no differently, and no better, when listening to their own
variety. At the same time, listeners appear to be sensitive to both
prosodic parameters taken into account. In particular, the sensitivity
is driven
by the feature that is specific to each of the two varieties
considered: pitch for Pisan and length for Crotonese. The native
variety of the listeners does not influence the discrimination of
prosodic variations in speech stimuli, because no significant
difference was found between the two groups of listeners (Pisans and
Crotonians) in their ability to recognize their variety-specific
prosodic feature of prominence: Pisans did not perform significantly
better than Crotonians in recognizing pitch variation, and Crotonians
did not discriminate duration differences better than Pisans.
On the other hand, the results of our experiment suggest that
non-distinctive prosodic parameters, like F0 and duration in Italian
varieties, though used as features of prominence in production, do not
systematically affect the perception of listeners in a symmetric way.
Finally, the results of our perception experiment confirm the
relevance of the variable ‘musical training’: in line with previous
studies (Rauscher & Hinton [2003], Schön et al. [2004], Pape &
Mooshammer [2006], Pape [2008] among others), subjects with good
musical competence perform better in prosodic perception tasks.
However, the interpretation of these results is not straightforward.
Is the better performance of musicians directly derived from their
competence in the musical code, or does it depend indirectly on their
readier mastery of the discrimination task? As a matter of fact, the
subjects listened to speech stimuli, not to music or psychoacoustic
stimuli. Their explicit knowledge of music may not be
related to any special linguistic skill, and may even influence
perception negatively. When the two stimuli were identical (A = B),
our listeners with a musical education showed a higher percentage of
errors than listeners without any musical training, possibly because
of their oversensitivity to pitch changes.
Our data appear to support the results obtained by Schön et al.
[2004], who observed that the scalp negativity measured during a
similar recognition task was different for musicians (temporal sites
bilaterally) and for non-musicians (centrally, left temporal sites),
suggesting that there is no improvement in the specific abilities
investigated in the experiments and that the two groups simply used
different strategies, not directly comparable. Future research will shed
more light on the interaction between competence in music and
prosodic perception.
GIOVANNA MAROTTA, LUCA IACOPONI, ALICE IDONE
DEPARTMENT OF LINGUISTICS, UNIVERSITY OF PISA
7 References
Bidelman, G. M.; Gandour, J. T.; Krishnan, A.: Musicians demonstrate
experience-dependent brainstem enhancement of musical scale
features within continuously gliding pitch. Neuroscience
Letters 503(3): 203-207 (2011).
Bertinetto, P. M.; Loporcaro, M.: The sound pattern of Standard
Italian, as compared with the varieties spoken in Florence,
Milan and Rome. Journal of the International Phonetic
Association 35:131-151 (2005)
Calamai, S.: Il vocalismo tonico dell’area pisana e livornese. Aspetti
storici, percettivi e acustici, (Edizioni dell’Orso, Alessandria
2004).
Calamai, S.; Ricci, I.: Sulla percezione dei confini vocalici in Toscana:
primi risultati. In Cosi, P. (ed.), Atti del I Convegno Nazionale
AISV (EDK Editore, Torriana 2005a).
Calamai, S.; Ricci, I.: Un esperimento di matched-guise in Toscana.
Studi Linguistici e Filologici Online 3.1, pp. 63-105 (2005b)
(www.humnet.unipi.it/slifo.htlm).
Deutsch, D.; Henthorn, T.; Dolson, M.: Absolute Pitch, Speech, and
Tone Language: Some Experiments and a Proposed
Framework. Music Perception 21: 339-356 (2004).
D'Imperio, M.; Rosenthall, S.: Phonetics and Phonology of Main
Stress in Italian. Phonology 16(1): 1-27 (1999).
Fanciullo, F.: Fra Oriente e Occidente. Per una storia linguistica
dell'Italia meridionale (ETS, Pisa 1997).
Galloway, R.E.: Should rhythm metrics take account of fundamental
frequency? Poster presented at the Workshop on Empirical
Approaches to Speech Rhythm (EASR08), 28th March 2008,
(University College of London 2008).
Grammont, M.: Traité de phonétique (Delagrave, Paris 1933).
Gussenhoven, C.: Perceived vowel duration. In H. Quené & V. van
Heuven (eds.), On Speech and Language: Studies for Sieb G.
Nooteboom, LOT. 65-71 (Utrecht 2004).
Jones, A.J.; Munhall, K.G.: Perceptual calibration of F0 production:
evidence from feedback perturbation. Journal of the Acoustical
Society of America 108:1246-1251 (2000).
Jones, A.J.; Munhall, K.G.: Remapping Auditory-Motor
Representations in Voice Production. Current Biology
15:1768-1772 (2005).
Kohler, K.J.: The Perception of Prominence Patterns. Phonetica 65:
257-269 (2008).
Lehnert–Le Houillier, H.: The influence of dynamic F0 on the
perception of vowel duration: cross linguistic evidence. In
Proceedings of the 16th International Congress of Phonetic
Sciences: 757 – 760 (Saarland University, Saarbrücken 2007).
Loporcaro, M.: Profilo linguistico dei dialetti italiani (Laterza,
Roma Bari 2009).
Marotta, G.: Modelli e misure ritmiche. La durata vocalica in italiano
(Zanichelli, Bologna 1985).
Marotta, G.: Degenerate Feet nella fonologia metrica dell'italiano. In
P. Benincà, A. Mioni & L. Vanelli (eds.), Fonologia e
morfologia dell'italiano e dei dialetti d'Italia. Atti del XXXI
Congresso S.L.I.: 97-116 (Bulzoni, Roma 1999).
Marotta, G.: Non solo spiranti. La ‘gorgia toscana’ nel parlato di Pisa.
L’Italia Dialettale 62: 27-60 (2001).
Marotta, G.: Lenition in Tuscan Italian (gorgia toscana). In J. Brandao
de Carvalho, T. Scheer & Ph. Ségéral (eds.), Lenition and
Fortition: 235-272 (Mouton-de Gruyter, Berlin 2008).
Marotta, G.; Calamai, S.; Sardelli, E.: Non di sola lunghezza. La
modulazione di F0 come indice socio-fonetico. In A. De
Dominicis, L. Mori, M. Stefani (eds.), Costituzione, gestione e
restauro di corpora vocali. Atti delle XIV Giornate del GFS:
210-215 (Esagrafica, Roma 2004).
Marotta, G.; Molino, A.; Bertini, C.: Lunghezza o frequenza: quale
parametro per la prominenza? In B. Gili Fivela, A. Stella, L.
Garrapa, M. Grimaldi (eds.), Contesto comunicativo e
variabilità nella produzione e percezione della lingua. Atti del
VII Convegno AISV: 31-42 (Bulzoni, Roma 2011).
Marotta, G.; Sardelli, E.: Sulla prosodia della domanda con soggetto
postverbale in due varietà di italiano toscano. In P. Cosi, E.
Magno Caldognetto, A. Zamboni (eds.), Studi di fonetica in
ricordo di F. Ferrero: 205-212 (Unipress, Padova 2003).
Marotta, G.; Sardelli, E.: Prosodic parameters for the detection of
regional varieties in Italian. In Proceedings of the 16th
International Congress of Phonetic Sciences: 682-704
(Saarland University, Saarbrücken 2007).
Marotta, G.; Sardelli, E.: Prosodiatopia: parametri prosodici per un
modello di riconoscimento diatopico. In G. Ferrari, R. Benatti
& M. Mosca (eds.), Linguistica e modelli tecnologici di
ricerca, Atti del XL Congresso SLI: 411-436 (Bulzoni, Roma
2009).
Mendicino, A.; Romito, L.: Isocronia e base di articolazione: uno
studio su alcune varietà meridionali. Quaderni del
Dipartimento di Linguistica, Università della Calabria, Serie
Linguistica 3: 49 – 67 (1991).
Niebuhr, O.: F0 – based rhythm effects on the perception of local
syllable prominence. Phonetica 66: 95 – 112 (2009).
Pape, D.: The native language influence on perceptual Intrinsic Pitch:
Cross-linguistic data from German, Italian, Portuguese, and
Spanish. In Proceedings of the 4th Conference on Speech
Prosody, pp. 743-746 (Campinas, Brazil 2008).
Pape, D.; Mooshammer, C.: Is Intrinsic pitch language-dependent?
Evidence from a cross-linguistic vowel pitch perception
experiment. In Proceedings of the ISCA International
Workshop on Multilinguistic MULTILING (Stellenbosch,
South Africa 2006).
Pellegrini, G. B.: La Carta dei Dialetti d’Italia. (Pacini editore, Pisa
1977).
Pisoni, D.; Remez, R. E.: The Handbook of Speech Perception
(Blackwell, Oxford 2005).
Rauscher, F.H.; Hinton, S.C.: Type of music training selectively
influences perceptual processing. In Proceedings of the
European Society for the Cognitive Sciences of Music
(Hannover, Germany 2003).
Rietveld, A. C. M.; Gussenhoven, C.: On the relation between speech
excursion size and pitch prominence. Journal of Phonetics 13:
299 – 308 (1985).
Romito, L.: Cenni sui correlati elettroacustici dell’accento in alcune
varietà di italiano. In Atti delle IV Giornate di Studio del GFS,
pp. 107–119 (1993).
Romito, L.; Trumper, J.: Un problema della coarticolazione:
l’isocronia rivisitata. In Atti del XVII Convegno
dell’Associazione Italiana di Acustica, pp. 449 – 455 (1989).
Romito, L.; Turano, T.; Loporcaro, M.; Mendicino, A.: Micro e
Macrofenomeni di centralizzazione nella variazione diafasica:
rilevanza dei dati fonetico-acustici per il quadro dialettologico
del calabrese. In Atti delle VII Giornate di Studio del
Gruppo di Fonetica Sperimentale (G.F.S.), Napoli, novembre
1996, pp. 157 – 175 (1997).
Rosen, S. M.: The Effect of Fundamental Frequency patterns on
perceived duration. Speech Transmission Laboratory Quarterly
Progress and Status Report 1:17 – 30 (1977a).
Rosen, S. M.: Fundamental frequency patterns and the long-short
vowel distinction in Swedish. Speech Transmission Laboratory
Quarterly Progress and Status Report 1: 31-37 (1977b).
Schön, D.; Magne, M.; Besson, M.: The music of speech: Music
training facilitates pitch processing in both music and
language. Psychophysiology 41 (3): 341-349 (2004).
Schwanhäußer, B.; Burnham, D.: Lexical Tone and Pitch Perception in
Tone and Non-Tone Language Speakers. In Proceedings of the
9th European Conference on Speech Communication and
Technology ISCA: 1701–1704 (Bonn, 2005).
Stevens, C.; Keller, P. E.; Tyler, M. D.: Language tonality and its
effect on the perception of contour in short spoken and musical
items. In Proceedings of the 8th International Conference on
Music Perception and Cognition, Evanston, IL, pp. 713-716
(Evanston 2004).
Terken, J.: Fundamental Frequency and Perceived Prominence.
Journal of the Acoustical Society of America 89: 1768 – 1776
(1991).
Van Dommelen, W.: Does dynamic F0 increase perceived duration?
New light on an old issue. Journal of Phonetics 21:367 – 386
(1993).
Yu, A. C. L.: Tonal effects on perceived vowel duration. Laboratory
Phonology 10 (Paris, France 2006).