O. Krivnova (MSU)

DYNAMIC APPROACH TO RHYTHMIZATION AND INTONATION PHRASING

(theoretical and applied problems)[1]

In keeping with the cognitive orientation of modern linguistics, the focus
of researchers' attention is gradually shifting from the isolated phrase or
sentence to the systemic analysis of connected text, and from the text to
the person who generates and comprehends it. These methodological changes
are accompanied by a re-evaluation of the linguistic status and functional
role of specific language means, especially those directly connected with
the communicative intention of a speaker and the process of its
realization. Prosodic characteristics of speech and their linguistic
correlates are a vivid example of such re-evaluation: 10-15 years ago one
had to prove that speech prosody belonged in the field of linguistic
analysis at all, whereas now problems of the prosodic organization of
speech are not only central in phonetics but are also taken up in such
branches of linguistics as syntax and semantics. At the same time many
problems of phrase and text prosody remain unsolved. They call for
theoretical interpretation and experimental research within an integral
model of speech generation, which must explain, in particular, the sound
patterns of a speaker's activity. The opposite relation is also important:
prosodic studies can lead to better grounded and more concrete conceptions
of speech generation and understanding.
My report is devoted to the processes of Rhythmization and Intonation
Phrasing in connected speech that conveys complex information content. Both
processes result in a division of the text which I will call
"Rhythmo-Intonational Phrasing", or simply prosodic phrasing for short.
More concretely, prosodic phrasing is the division of a text into fragments
of different size (from a rhythmical period or phonological phrase, called
a syntagma in Russian phonetics, up to a paragraph or supraphrasal unit).
This division is performed by a speaker with specific sound means on the
basis of text semantics and syntax and in accordance with the universal
principles of the rhythmic organization of speech.
As an illustration let us consider an example from Russian (Fig. 1). It
is taken from R. I. Avanesov's book "Russian literary pronunciation" (1972,
p. 382), with the author's transcription, which reflects the different
degrees of breakness (discontinuity) of speech at prosodic boundaries. As
we see, Avanesov distinguishes 4 degrees of prosodic breakness, marked |,
/, // and /// in order of increasing breakness.
Capital Russian letters in this example designate syntagmas (or
rhythmic periods).
This Russian sentence illustrates the hierarchical nature of prosodic
phrasing. The idea of the hierarchy is that each unit is made up of some
number of units from the next lower level (Nespor, Vogel 1986). The
example also shows that at least two basic layers can be distinguished: a
rhythmic layer, whose basic unit is the syntagma, and an intonation layer
proper, whose basic unit is the intonation phrase.

* * *

Like many phenomena in language and speech, Rhythmo-Intonational Phrasing
can be viewed and analyzed statically and dynamically. Under the static
approach the researchers' attention is concentrated on revealing the
inventory of prosodic means which create the division, and on the nature
of the correspondence between prosodic constituents and the semantic and
syntactic structure of an already "made" utterance (sentence). The static
approach is preserved even in generative phonology, where this
correspondence is described by a set of special mapping rules which operate
within the limits of a whole, already made sentence. Acknowledging that
there is a good deal of variability in a speaker's choice of prosodic
phrasing, some authors working in the generative tradition propose special
restructuring rules that derive the variants of phrasing from an initial
prosodic structure, called the basic or neutral prosodic form of the
sentence (Nespor, Vogel 1986). These restructuring rules take into account
such relevant phrasing factors as the length of prosodic constituents, the
presence of contrastive prominence, speech style, speaking rate and so on.

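    А                      Б                   В
/// Все это водное племя / обладало навыками | долголетних плаваний //

Г                            Д                  Ж
в большинстве прошло войну / и самой природой | было словно выделено |

З
для пребывания на судах ///

English gloss: All this seafaring tribe / had the skills | of long years of
voyages // for the most part had been through the war / and by nature
itself | seemed to have been set apart | for life on ships.

Level  Unit                          Constituents
4      whole sentence                X + Y
3      intonation phrase complexes   X = X1 + X2;  Y = Y1 + Y2
2      intonation phrases            X1 = А;  X2 = Б + В;  Y1 = Г;
                                     Y2 = Д + Ж + З
1      syntagmas (rhythmic periods)  А  Б  В  Г  Д  Ж  З

/// А / Б | В // Г / Д | Ж | З ///

Fig. 1. Hierarchical nature of prosodic phrasing.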

The level numbers correspond to the degree of breakness at the prosodic
boundaries. The boundary of level 2 separates the rhythmic level from the
higher prosodic levels, which are intonational proper. There are 7
syntagmas, 4 intonation phrases and 2 intonation phrase complexes in this
sentence.
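
The counts given for Fig. 1 can be recovered mechanically from the boundary
transcription alone. The following Python sketch is only an illustration
under stated assumptions: the function name units and the numeric mapping
of the four marks to levels 1-4 are introduced here for convenience and are
not part of Avanesov's notation or of the model discussed in this paper.

```python
# Boundary marks in order of increasing breakness (assumed mapping to levels 1..4).
LEVEL = {"|": 1, "/": 2, "//": 3, "///": 4}


def units(transcription, level):
    """Group syntagma labels into units closed by boundaries of at least `level`."""
    groups, current = [], []
    for token in transcription.split():
        if LEVEL.get(token, 0) >= level:
            # A boundary strong enough for this level closes the current unit.
            if current:
                groups.append(current)
            current = []
        elif token not in LEVEL:
            # A syntagma label; weaker boundaries are simply skipped.
            current.append(token)
    if current:
        groups.append(current)
    return groups


fig1 = "/// А / Б | В // Г / Д | Ж | З ///"
syntagmas = units(fig1, 1)   # 7 units
phrases = units(fig1, 2)     # 4 units: [А], [Б, В], [Г], [Д, Ж, З]
complexes = units(fig1, 3)   # 2 units
sentence = units(fig1, 4)    # 1 unit
```

Each higher level is obtained by ignoring the weaker boundaries, which is
one way to express the idea that every unit consists of units of the next
lower level.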

We think that the static approach to prosodic phrasing is in principle
insufficient to explain it. Prosodic phrasing belongs to those text events
which can be adequately understood only within an integrated, dynamic model
of speech generation. Under the dynamic approach the focus of investigation
is on the on-going nature of prosodic phrasing and its integration into the
whole process of text generation by a speaker.
As many linguists now emphasize, speech communication is a complex
human activity involving both text production and perception, language
competence and performance. Performance draws not only on language but also
on such psychological mechanisms as working memory, attention, planning and
so on. The functioning of these mechanisms and their properties influence
both the process of text generation and the characteristics of the
utterance under construction.
In spite of the obvious difficulties awaiting anybody who wants to model
speech behavior (even in its restricted aspects), it is useful to know some
fundamental principles emerging in recent models of speech generation.
Below I would like to draw your attention to those which are important for
prosodic phrasing. I will discuss them with the help of a simplified
functional scheme of the speech generation process (Fig. 2).


conceptual and emotive sphere
    intention and global settings
    conceptualizator
    -> conceptualized speech message

language sphere
    verbalizator
    grammatical processor
    -> lexico-syntactical structure of an utterance
       (grammatical characteristic)
    phonetical processor
    -> phonetic characteristic of an utterance

motor sphere
    articulator
    -> articulation stream
    -> acoustic signal

Components shown alongside the main chain:
    selfmonitoring (beside the conceptualizator)
    lexicon (beside the grammatical and phonetical processors)
    system of understanding and perception of speech (spanning all stages)

Fig. 2. Functional scheme of the speech generation process.

Essential features of the process:
0. It starts from an intention and global settings.
1. It develops in two dimensions simultaneously:
A. from intention to articulation through the stages of
Conceptualization and Verbalization (top-down dimension);
B. from the beginning of an utterance to its end (left-to-right
dimension).
2. It relies on a chunking strategy with selfmonitoring.
3. It proceeds in on-going fashion without much looking ahead.
0. The initial point of the text generation process is the communicative
intention of a speaker. Its emergence is accompanied by the choice of some
prosodically relevant global settings: speech style (formal, didactic,
casual), loudness and speaking rate, degree of expressiveness and so on.
Taking the global initial settings into account makes it unnecessary to use
any artificial restructuring rules. The parameters on which the size and
expressiveness of prosodic phrasing depend must be set at the very
beginning of text generation. It is noteworthy that these settings are
flexible: they can be changed locally in the course of speaking (if, for
example, a speaker finds that a listener fails to hear or understand him),
and they may directly influence the work of the phonetic processor of the
model without affecting its grammatical component. Modelling this
flexibility is a difficult problem, because the global settings depend
entirely on the speaker's current intention.
1. An utterance is constructed in the course of a complex generation
process which develops in two dimensions simultaneously: in the "top-down"
dimension, from intention to articulation through the stages of
Conceptualization and Verbalization, and in the "left-to-right" dimension,
from the beginning of an utterance to its end. The phonetic characteristic
of an utterance is constructed at the Verbalization stage by a special
phonetic processor and is viewed as an abstract representation. Information
about prosodic phrasing must be reflected in it by symbolic labels, which
are inserted by the Rhythmization and Intonational Phrasing procedures.
As to the acoustic correlates of these labels (such as patterns of
fundamental frequency, energy, duration and vowel quality), they are
programmed at the articulation stage on the basis of the abstract prosodic
patterns.
It may be noted here that the problem of a prosodic transcription
that meets the needs of phonetic theory and computational models is now
widely discussed in the literature and at workshops on language processing
technology. Regrettably, the proposed prosodic transcriptions are far from
universal and complete. The best-known system among those developed for
computer speech processing is ToBI (Tones and Break Indices) for American
English.
2. Speech intention activates all the mechanisms operating at all
stages of the generation process. It is supposed that text construction is
based on a chunking strategy, in which some text fragments are cognitively
planned and conceptualized as a whole, then verbalized and uttered as
integrated speech acts. Such chunks of speech activity give rise to
important text events, which correspond to the completion of definite
verbal fragments. Some of these text events (or maybe even each of them, as
some researchers suppose) are marked intonationally by intonation markers
inserted into the terminal part of the constructed verbal fragment.
These markers function not as linking or segmentation means but as
speech phase devices which signal, in on-going fashion, the realization of
prosodically relevant text events and their relation to the whole
integrated act of constructing the utterance (finality or nonfinality of
different degrees). At the same time the markers function as phrase
terminals which separate the already constructed verbal fragments from the
intended continuation of the text (if any). It is worth noting that
interpreting intonation markers as signals of text events explains why
intonation phrases sometimes do not form semantic or syntactic units. For
example:
I think that Peter and Mary // never come to us again.
One more important feature of speech generation with its presupposed
chunking strategy is selfmonitoring. A speaker hears himself and can
continuously compare what he intended to say with what he is actually
saying. Selfmonitoring is closely connected with psychological mechanisms
of attention which are not yet fully understood. At the same time it is
clear that selfmonitoring strategies differ and depend on the
communicative situation, the speech skills of a speaker, his knowledge of
the text topic and so on. Under hypercontrol a speaker can pronounce an
utterance word by word, with each word as a separate intonation phrase.
The more usual strategy is the selective one, when a speaker "trusts"
speech automatisms and controls the current phonetic output only at some
points in the linear sequence. Studies of speech errors (Levelt 1983) show,
for example, that errors are detected and corrected by a speaker much
better at the main syntactic boundaries (sentence clauses, subject NP and
VP). It is also well known that these points are often (but not
obligatorily) marked intonationally and are the most likely candidates for
pausing.
The results of experiments on listeners' behavior are also of interest,
because speech generation strategies develop in close connection with the
strategies of speech decoding. Special experiments (Krivnova 1987) show
that for a listener intonation phrase markers have an important text
orientation function, organizing and unifying the listener's analysis of
the text. One example of such experiments is the study of listeners'
responses in simulated telephone conversations (Dittmann, Llewellyn 1967).
It was
observed that while a speaker is talking, a listener often makes audible
interjections such as "uh huh, yes, I see, really" and visible gestures
such as head-nods. These reactions signal that the listener is paying
attention and successfully decoding the speaker's utterance. It turned out
that 80% of responses were synchronized with the speaker's pauses occurring
after intonation phrases.
3. The last fact about intonation phrasing that I would like to mention
is the following. At the very point in the construction process when a
speaker intonationally marks the end of a constructed verbal fragment, he
knows only in general what he is going to say next. For a speaker, the
insertion of an intonation marker takes place "here and now", without the
possibility of looking far ahead into the specific grammatical form of the
continuation. For a listener, the detection of an intonation marker signals
the closure of the current information chunk, without the necessity of
looking ahead to verify the closure hypothesis. This is a very important
communicative function of intonation phrasing, which makes it possible to
avoid backtracking in speech decoding.
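
The listener's side of this "here and now" behaviour can be pictured as a
streaming procedure that closes a chunk the moment a marker arrives. The
Python sketch below is a toy illustration only: the name close_chunks, the
representation of speech as a stream of word tokens and the use of
Avanesov-style marks as intonation markers are all assumptions made for the
example, not a claim about how listeners actually decode speech.

```python
def close_chunks(stream, markers=("/", "//", "///")):
    """Yield (chunk, marker) as soon as a marker is detected.

    The stream is consumed token by token: nothing after the marker is
    inspected before the chunk is closed, and a closed chunk is never revised.
    """
    current = []
    for token in stream:
        if token in markers:
            yield current, token   # closure is signalled by the marker itself
            current = []
        else:
            current.append(token)
    if current:
        yield current, None        # material not yet closed by any marker


utterance = "I think that Peter and Mary // never come to us again"
chunks = list(close_chunks(iter(utterance.split())))
# chunks[0] == (["I", "think", "that", "Peter", "and", "Mary"], "//")
# chunks[1] == (["never", "come", "to", "us", "again"], None)
```

The first chunk is delivered before the continuation has been seen at all,
although it is not a syntactic unit, which mirrors the example given above.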
Now about the problems.
Admitting that the chunking strategy, selfmonitoring and the speaker's
intention to highlight some text event may be the main motivating factors
of intonation phrasing, we still need to answer a lot of questions, such
as: what kinds of text events can be intonationally marked; what is the
probability of their marking by different speakers in different texts; what
factors decrease or increase this probability; what is the relation between
voluntary and automatic aspects of phrasing; what factors regulate the
choice of intonation markers, and so on. It would be wrong to say that we
have no answers to these questions, but even for the reading of a text our
knowledge is rather incomplete. First of all we need large databases
containing speech corpora from different domains, with each utterance
described on all relevant levels of linguistic structure (prosodic,
syntactic, semantic, pragmatic).
Developing such databases requires the joint efforts of many
specialists in different branches of science. These databases are also
necessary for such applications as text-to-speech systems. The central
problem here is to create an automatic prosodic transcriber which inserts
abstract prosodic labels into the text to be spoken. Up to now punctuation
marks have been the main written cues for the location and choice of
intonation markers. It is obvious that these cues are not enough; moreover,
the relation between punctuation and intonation markers has its own
problems.
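
To make the limitation concrete, here is a minimal sketch of a transcriber
that uses punctuation as its only cue. It is a deliberately naive baseline
written for illustration: the function name label_breaks and, above all,
the mapping from punctuation marks to break symbols are assumptions of this
example and are not taken from any existing text-to-speech system.

```python
import re

# Hypothetical mapping from punctuation to break marks (an assumption).
PUNCT_TO_BREAK = {
    ",": "/",
    ";": "//",
    ":": "//",
    ".": "///",
    "!": "///",
    "?": "///",
}


def label_breaks(text):
    """Replace each punctuation mark with a break label; words pass through."""
    tokens = re.findall(r"[^\s,;:.!?]+|[,;:.!?]", text)
    return " ".join(PUNCT_TO_BREAK.get(token, token) for token in tokens)


with_comma = label_breaks("If it rains, we stay home. Otherwise we go.")
# "If it rains / we stay home /// Otherwise we go ///"

no_cue = label_breaks("I think that Peter and Mary never come to us again.")
# "I think that Peter and Mary never come to us again ///"
```

In the second sentence the baseline finds no internal boundary, because the
break after "Mary" discussed earlier has no punctuation mark to signal it.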
What was said above concerned Intonation Phrasing; I now turn to
Rhythmization. By rhythm I mean the pattern of alternation of metrically
strong and weak stressed syllables which results in the formation of
rhythmical periods, or syntagmas. This process is based on the metrical
schemes of words as units of the Mental Lexicon. When including the current
word in the utterance, a speaker has to define its metrical prominence: if
the word is considered strong, it becomes the rhythmical center of the
period, that is, the bearer of the so-called syntagmatic stress. The
motivating factor of Rhythmization lies in the Motor Sphere of speech
generation: it is the necessity to regulate the degree of muscular effort
and articulatory control during the pronunciation of the syllable chain.
The main eurhythmical tendency presupposes one or two weak stresses (on
content words) between two strong ones. The Rhythmical Procedure based on
this tendency is carried out during the construction of the abstract
phonetic characteristic of an utterance and is coordinated with its
grammatical characteristic.
This means that a word which may be the rhythmic center according to
phonetic principles must in addition satisfy definite syntactic
conditions. Often it is a noun, whose potential to become the rhythmic
center increases when there is no direct syntactic link with the next
word.
Let us consider the following examples from Russian, where the
rhythmical characteristic of the initial noun phrase is of special
interest to us.
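     2         2       3
1. В описаниях русской морфологии / обычно используется орфографическая
   запись.
   (In descriptions of Russian morphology / orthographic notation is
   usually used.)
     2            3         2       3
2. В существующих описаниях русской морфологии / .
   (In the existing descriptions of Russian morphology / .)
        2            2(1) 3         2       3
3. a) В существующих ныне описаниях русской морфологии / .
        2            3    2         2       3
   b) В существующих ныне описаниях русской морфологии / .
   (In the descriptions of Russian morphology existing today / .)
     2              2         3     2         2       3
4. В существующих в настоящее время описаниях русской морфологии / .
   (In the descriptions of Russian morphology existing at the present
   time / .)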

As we can see, the above-mentioned tendency really exists. However,
observations show that the degree of syntactic freedom in rhythmically
relevant choices is rather great, and a formal description of the
Rhythmical Procedure has to be based on a statistically sufficient body of
data, which unfortunately we do not have at present.
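
The eurhythmical tendency itself is easy to state as a check on stress
patterns. The Python sketch below is an illustration under assumptions: it
reads the digits in the Russian examples as 3 for a strong (syntagmatic)
stress and 2 for a weak one, it simplifies the variant "2(1)" to 2, and the
function names are introduced here only for the example.

```python
STRONG = 3  # assumed reading of the digit 3: strong (syntagmatic) stress


def gaps_between_strong(pattern):
    """Count the weaker stresses between each pair of neighbouring strong ones."""
    strong = [i for i, degree in enumerate(pattern) if degree == STRONG]
    return [b - a - 1 for a, b in zip(strong, strong[1:])]


def is_eurhythmic(pattern):
    """True if every gap between strong stresses holds one or two weak stresses."""
    return all(1 <= gap <= 2 for gap in gaps_between_strong(pattern))


examples = {
    "1": [2, 2, 3],
    "2": [2, 3, 2, 3],
    "3a": [2, 2, 3, 2, 3],    # "2(1)" on the second word simplified to 2
    "3b": [2, 3, 2, 2, 3],
    "4": [2, 2, 3, 2, 2, 3],
}
results = {name: is_eurhythmic(p) for name, p in examples.items()}
# every example satisfies the tendency (example 1 trivially: one strong stress)
```

A pattern such as [3, 2, 2, 2, 3], with three weak stresses in a row, would
be rejected by the same check. The check says nothing about the syntactic
conditions on the rhythmic center, which are the harder part of the problem.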
So a lot of work in this field still lies ahead.

References
1. Avanesov R. I. Russian literary pronunciation. M., 1972 (in Russian).
2. Nespor M., Vogel I. Prosodic phonology. Dordrecht, 1986.
3. Levelt W. Monitoring and selfrepair in speech // Cognition. 1983. V.14,
pp.41-104.
4. Krivnova O. F. Intonational Phrasing and its Role in Speech
Communication // Proc. of the XI-th Int. Congr. of Ph. Sc. Tallinn. 1987.
5. Dittmann A. T., Llewellyn L. G. The phonemic clause as a unit of speech
decoding // Journal of Personality and Social Psychology. 1967. N6, 341-9.

-----------------------
[1] This work has been supported by CEU\RSS fund, grant No:1063\94.