Contents:
This schema describes the SAPI 5.0 TTS XML grammar format. The SAPI TTS XML
schema is included in the TTS XML parser. Hence, it is not necessary to
include the schema in the XML file when authoring a grammar. NOTE: This schema
is based on the Microsoft schema language and is not fully W3C compliant. This
schema will be rewritten and will be compliant with the W3C standard once it
has been approved by the W3C.
This schema describes the following elements and attributes:
Document conventions:
- [] - optional
- []* - zero or more times
- + - one or more times
SAPI Elements
Inserts a bookmark into the input stream using the bookmark element. If an
application specifies interest in bookmark events, it will receive an event
when synthesis has passed this element in an input stream. If the audio
output destination supports handling of events, then an application will
receive this event once the synthesized speech up to this bookmark has been
output. Otherwise, an application receives a bookmark event when the voice
implementation has synthesized speech up to this bookmark.
| syntax: |
<BOOKMARK
/> |
| content: |
empty |
| order: |
many
(default) |
| parents: |
SAPI |
| children: |
(none) |
| attributes: |
MARK |
| model: |
closed |
| source: |
<ElementType name="BOOKMARK" content="empty" model="closed">
<description>Inserts a bookmark into the input stream using the bookmark element. If an application specifies interest in bookmark events, it will receive an event when synthesis has passed this element in an input stream. If the audio output destination supports handling of events, then an application will receive this event once the synthesized speech up to this bookmark has been output. Otherwise, an application receives a bookmark event when the voice implementation has synthesized speech up to this bookmark. </description>
<attribute type="MARK"/>
</ElementType>
|
The context can specify the type of normalization rules which should be
applied to the scoped text. SAPI does not guarantee any predefined contexts.
| syntax: |
<CONTEXT
>
</CONTEXT> |
| content: |
mixed |
| order: |
many
(default) |
| parents: |
SAPI |
| children: |
(none) |
| attributes: |
ID |
| model: |
closed |
| source: |
<ElementType name="CONTEXT" content="mixed" model="closed">
<description>The context can specify the type of normalization rules which should be applied to the scoped text. SAPI does not guarantee any predefined contexts. </description>
<attribute type="ID"/>
</ElementType>
|
Places emphasis on the words contained by this element.
| syntax: |
<EMPH /> |
| content: |
empty |
| order: |
many
(default) |
| parents: |
SAPI |
| children: |
(none) |
| attributes: |
(none) |
| model: |
closed |
| source: |
<ElementType name="EMPH" content="empty" model="closed">
<description>Places emphasis on the words contained by this element. </description>
</ElementType>
|
Changes the LANGID of the scoped text. When the LANGID is changed, SAPI will
try to detect if the current voice can handle the new language. If the voice
does not speak the specified language, then an engine must choose another
language it speaks as a best attempt. Using the VOICE tag and REQUIRED
attribute, this fallback path can be prevented if not desirable.
| syntax: |
<LANG
>
</LANG> |
| content: |
mixed |
| order: |
many
(default) |
| parents: |
SAPI |
| children: |
(none) |
| attributes: |
LANGID |
| model: |
closed |
| source: |
<ElementType name="LANG" content="mixed" model="closed">
<description>Changes the LANGID of the scoped text. When the LANGID is changed, SAPI will try to detect if the current voice can handle the new language. If voice does not speak the specified language, then an engine must choose another language it speaks as a best attempt. Using the VOICE tag and REQUIRED attribute, this fall back path can be prevented if not desirable.
</description>
<attribute type="LANGID"/>
</ElementType>
|
The part of speech of contained word(s). The PARTOFSP tag is used to force a
particular pronunciation of a word (for example, the word record as a noun
versus the word record as a verb).
| syntax: |
<PARTOFSP
>
</PARTOFSP> |
| content: |
mixed |
| order: |
many
(default) |
| parents: |
SAPI |
| children: |
(none) |
| attributes: |
PART |
| model: |
closed |
| source: |
<ElementType name="PARTOFSP" content="mixed" model="closed">
<description>The part of speech of contained word(s). The PARTOFSP tag is used to force a particular pronunciation of a word (for example, the word record as a noun versus the word record as a verb). </description>
<attribute type="PART"/>
</ElementType>
|
The scoped/global element PITCH modifies the underlying numerical values of
a speech block. Relative attribute values, those preceded by a dash (-) or a
plus sign (+), increment the underlying numerical value by the specified
amount. SAPI compliant engines have the option of supporting only the
guaranteed range of values and behaving as -10 for adjustments below -10 and
behaving as +10 for values above +10.
| syntax: |
<PITCH
>
</PITCH> |
| content: |
mixed |
| order: |
many
(default) |
| parents: |
SAPI |
| children: |
(none) |
| attributes: |
ABSMIDDLE,
MIDDLE |
| model: |
closed |
| source: |
<ElementType name="PITCH" content="mixed" model="closed">
<description>The scoped/global element PITCH modifies the underlying numerical values of a speech block. Relative attribute values, those preceded by a dash (-) or a plus sign (+), increment the underlying numerical value by the specified amount. SAPI compliant engines have the option of supporting only the guaranteed range of values and behaving as -10 for adjustments below -10 and behaving as +10 for values above +10.</description>
<attribute type="MIDDLE"/>
<attribute type="ABSMIDDLE"/>
</ElementType>
|
Pronounces the contained text (possibly empty) according to the provided
Unicode string.
| syntax: |
<PRON
>
</PRON> |
| content: |
mixed |
| order: |
many
(default) |
| parents: |
SAPI |
| children: |
(none) |
| attributes: |
SYM |
| model: |
open |
| source: |
<ElementType name="PRON" content="mixed" model="open">
<description>Pronounces the contained text (possibly empty) according to the provided Unicode string.
</description>
<attribute type="SYM"/>
</ElementType>
|
Set the relative speed adjustment at which words are synthesized.
| syntax: |
<RATE
>
</RATE> |
| content: |
mixed |
| order: |
many
(default) |
| parents: |
SAPI |
| children: |
(none) |
| attributes: |
ABSSPEED,
SPEED |
| model: |
closed |
| source: |
<ElementType name="RATE" content="mixed" model="closed">
<description>Set the relative speed adjustment at which words are synthesized.</description>
<attribute type="SPEED"/>
<attribute type="ABSSPEED"/>
</ElementType>
|
At the beginning of the SAPI tag, the state of the voice is the same state
as the insertion point of the SAPI tag. At the close of the SAPI tag, the
voice returns to the same state as that of the insertion point. SAPI tags
may be nested. When a nested SAPI tag is closed, the voice state returns to
what it was at the insertion point of the nested tag.
| syntax: |
<SAPI >
</SAPI> |
| content: |
mixed |
| order: |
many
(default) |
| parents: |
No parents found. This is
probably the document element. |
| children: |
BOOKMARK,
CONTEXT, EMPH,
LANG, PARTOFSP,
PITCH, PRON,
RATE, SILENCE,
SPELL, VOICE,
VOLUME |
| attributes: |
(none) |
| model: |
open |
| source: |
<ElementType name="SAPI" content="mixed" model="open">
<description>At the beginning of the SAPI tag, the state of the voice is the same state as the insertion point of the SAPI tag. At the close of the SAPI tag, the voice returns to the same state as that of the insertion point. SAPI tags may be nested. When a nested SAPI tag is closed, the voice state returns to what it was at the insertion point of the nested tag. </description>
<element type="BOOKMARK"/>
<element type="SILENCE"/>
<element type="EMPH">
<description> Place emphasis on the words contained by this element. It is up to the engine implementation to design what emphasis is for the engine. </description>
</element>
<element type="SPELL">
<description>Spell out words letter by letter contained by this element. NOTE: The engine should not normalize the text scoped in the SPELL tag. This includes numbers, words, etc. Words which contain punctuation, such as U.S.A should spell out the letters as well as the punctuation scoped within the tag. </description>
</element>
<element type="PARTOFSP"/>
<element type="PRON">
<description>String representing a phoneme for a language supported by the voice implementing synthesized speech. </description>
</element>
<element type="LANG"/>
<element type="VOICE"/>
<element type="RATE"/>
<element type="VOLUME">
<description>0 to 100 (no overflow allowed)</description>
</element>
<element type="PITCH">
<description>Set the relative pitch adjustment of synthesized speech.</description>
</element>
<element type="CONTEXT"/>
</ElementType>
|
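The scoping behaviour described for the SAPI tag can be pictured as a walk over the document tree in which every tag works on its own copy of the voice state. The sketch below is an illustration only, not how the SAPI parser is implemented: the function name `effective_rates`, the choice to track just the rate, and the rule that SPEED adds to the rate in effect while ABSSPEED replaces it are all assumptions.

```python
import xml.etree.ElementTree as ET


def effective_rates(xml_text, base_rate=0):
    """Return (text, rate) pairs for each run of text in the document.

    Assumed rules (not taken from the schema): SPEED is added to the
    rate in effect, ABSSPEED replaces it, and the rate that was in
    effect before a tag opened applies again once the tag closes.
    """
    root = ET.fromstring(xml_text)
    runs = []

    def walk(node, rate):
        # 'rate' is a local copy, so changes never leak to the caller.
        if node.tag == "RATE":
            if "ABSSPEED" in node.attrib:
                rate = int(node.attrib["ABSSPEED"])
            elif "SPEED" in node.attrib:
                rate += int(node.attrib["SPEED"])
        if node.text and node.text.strip():
            runs.append((node.text.strip(), rate))
        for child in node:
            walk(child, rate)
            # Text after a closing tag is spoken with the enclosing state.
            if child.tail and child.tail.strip():
                runs.append((child.tail.strip(), rate))

    walk(root, base_rate)
    return runs


if __name__ == "__main__":
    sample = (
        '<SAPI>one <RATE SPEED="+3">two</RATE> three '
        '<SAPI><RATE ABSSPEED="-2">four</RATE></SAPI> five</SAPI>'
    )
    for text, rate in effective_rates(sample):
        print(text, rate)
```

Because the rate is passed down by value, the text that follows a closing RATE or nested SAPI tag is reported with the rate of the insertion point, which is the behaviour the description asks for.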
Produces silence for a specified number of milliseconds to the output audio
stream.
| syntax: |
<SILENCE
/> |
| content: |
empty |
| order: |
many
(default) |
| parents: |
SAPI |
| children: |
(none) |
| attributes: |
MSEC |
| model: |
closed |
| source: |
<ElementType name="SILENCE" content="empty" model="closed">
<description>Produces silence for a specified number of milliseconds to the output audio stream. </description>
<attribute type="MSEC"/>
</ElementType>
|
Spells out words letter by letter contained by this element. Note: The
engine should not normalize the text scoped in the SPELL tag. This includes
numbers, words, etc. Words that contain punctuation, such as
"U.S.A." should spell out the letters as well as the punctuation
scoped within the tag.
| syntax: |
<SPELL /> |
| content: |
empty |
| order: |
many
(default) |
| parents: |
SAPI |
| children: |
(none) |
| attributes: |
(none) |
| model: |
closed |
| source: |
<ElementType name="SPELL" content="empty" model="closed">
<description>Spells out words letter by letter contained by this element.
Note: The engine should not normalize the text scoped in the SPELL tag. This includes numbers, words, etc. Words that contain punctuation, such as "U.S.A." should spell out the letters as well as the punctuation scoped within the tag. </description>
</ElementType>
|
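To make the SPELL rule concrete, the sketch below splits scoped text into the units that would be spoken one at a time, leaving digits and punctuation untouched. The function name `spell_units` is hypothetical, and skipping whitespace is an assumption; the schema does not say how an engine should treat spaces inside the tag.

```python
def spell_units(text):
    """Split text into single characters without normalizing anything.

    Letters, digits and punctuation are all kept as separate units.
    Whitespace is dropped, which is an assumption of this sketch.
    """
    return [ch for ch in text if not ch.isspace()]


if __name__ == "__main__":
    print(spell_units("U.S.A."))
```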
Sets which voice implementation is used for synthesis of associated input
stream text. The best voice implementation given the required and optional
attributes will be selected by SAPI.
| syntax: |
<VOICE
>
</VOICE> |
| content: |
mixed |
| order: |
many
(default) |
| parents: |
SAPI |
| children: |
(none) |
| attributes: |
OPTIONAL,
REQUIRED |
| model: |
closed |
| source: |
<ElementType name="VOICE" content="mixed" model="closed">
<description>Sets which voice implementation is used for synthesis of associated input stream text. The best voice implementation given the required and optional attributes will be selected by SAPI. </description>
<attribute type="REQUIRED"/>
<attribute type="OPTIONAL"/>
</ElementType>
|
The scoped/global element VOLUME modifies the underlying numerical values of
a speech block. The underlying value can never be below zero or exceed 100.
All negative value entries will result in zero and all values above 100 will
result in 100. VOLUME may also receive an absolute value (no '-' or '+'
character) of an integer between zero and 100.
| syntax: |
<VOLUME
>
</VOLUME> |
| content: |
mixed |
| order: |
many
(default) |
| parents: |
SAPI |
| children: |
(none) |
| attributes: |
LEVEL |
| model: |
closed |
| source: |
<ElementType name="VOLUME" content="mixed" model="closed">
<description>The scoped/global elements VOLUME modify the underlying numerical values of a speech block. The underlying value can never be below zero or exceed 100. All negative value entries will result in zero and all values above 100 will result in 100. VOLUME may also receive an absolute value (no '-' or '+' character) of an integer between zero and 100. </description>
<attribute type="LEVEL"/>
</ElementType>
|
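One possible reading of the VOLUME rules is sketched below: a value with a leading sign adjusts the volume currently in effect, a value without a sign replaces it, and the result is always limited to the range 0 to 100. The function name `resolve_volume` and the idea of passing the current volume in explicitly are assumptions made for illustration; an engine may interpret signed entries differently.

```python
def resolve_volume(current, level):
    """Return the volume in effect after applying a LEVEL value.

    'level' is the attribute text. A leading '+' or '-' is treated as
    an adjustment relative to 'current' (an assumed interpretation);
    anything else is treated as an absolute value.
    """
    level = level.strip()
    if level.startswith(("+", "-")):
        new_value = current + int(level)
    else:
        new_value = int(level)
    # The underlying value can never be below zero or exceed 100.
    return max(0, min(100, new_value))


if __name__ == "__main__":
    for entry in ("+20", "-80", "30", "150"):
        print(entry, resolve_volume(50, entry))
```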
SAPI Attributes
The value can range from -10 to +10. A value of 0 sets a voice to speak at
its default pitch. A value of -10 sets a voice to speak at three-fourths
(or ¾) of its default pitch. A value of +10 sets a voice to speak at
four-thirds (or 4/3) of its default pitch. Each increment between -10 and
+10 is logarithmically distributed such that incrementing/decrementing by 1
is multiplying/dividing the pitch by the 24th root of 2 (about 1.03). Values
more extreme than -10 and +10 will be passed to an engine but SAPI
5-compliant engines may not support such extremes and instead may clip the
pitch to the maximum or minimum pitch it supports. Values of -24 and +24
must lower and raise pitch by 1 octave respectively. All
incrementing/decrementing by 1 must multiply/divide the pitch by the 24th
root of 2. When scoped, this attribute is absolute.
| syntax: |
[ ABSMIDDLE = int
] |
| required: |
no (default) |
| datatype: |
int |
| elements: |
PITCH |
| source: |
<AttributeType name="ABSMIDDLE" dt:type="int">
<description> The value can range from -10 to +10. A value of 0 sets a voice to speak at its default pitch. A value of -10 sets a voice to speak at three-fourths (or ¾) of its default pitch. A value of +10 sets a voice to speak at four-thirds (or 4/3) of its default pitch. Each increment between -10 and +10 is logarithmically distributed such that incrementing/decrementing by 1 is multiplying/dividing the pitch by the 24th root of 2 (about 1.03). Values more extreme than -10 and +10 will be passed to an engine but SAPI 5-compliant engines may not support such extremes and instead may clip the pitch to the maximum or minimum pitch it supports. Values of -24 and +24 must lower and raise pitch by 1 octave respectively. All incrementing/decrementing by 1 must multiply/divide the pitch by the 24th root of 2. When scoped, this attribute is absolute.</description>
</AttributeType>
|
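The pitch scale can be checked with a little arithmetic: each step is a factor of the 24th root of 2, so 24 steps make one octave and 10 steps come out close to 4/3. The following is a minimal sketch, not SAPI code; the function name `pitch_multiplier` and the optional clipping to the guaranteed range are assumptions made for illustration.

```python
def pitch_multiplier(middle, clip=False):
    """Return the factor applied to the default pitch for an adjustment.

    Each step multiplies the pitch by 2 ** (1 / 24). With clip=True the
    adjustment is first limited to -10..+10, which is one way an engine
    might handle values outside the guaranteed range.
    """
    if clip:
        middle = max(-10, min(10, middle))
    return 2.0 ** (middle / 24.0)


if __name__ == "__main__":
    for value in (-24, -10, 0, 1, 10, 24):
        print(f"{value:+d} -> x{pitch_multiplier(value):.4f}")
```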
The value can range from -10 to +10. A value of 0 sets a voice to speak at
its default rate. A value of -10 sets a voice to speak at one-third (or
1/3) of its default rate. A value of +10 sets a voice to speak at 3 times
its default rate. Each increment between -10 and +10 is logarithmically
distributed such that incrementing/decrementing by 1 is multiplying/dividing
the rate by the 10th root of 3 (about 1.12). Values more extreme than -10
and +10 will be passed to an engine, but SAPI 5-compliant engines may not
support such extremes and instead may clip the rate to the maximum or
minimum rate it supports. When scoped, this attribute is absolute.
| syntax: |
[ ABSSPEED = int
] |
| required: |
no (default) |
| datatype: |
int |
| elements: |
RATE |
| source: |
<AttributeType name="ABSSPEED" dt:type="int">
<description>The value can range from -10 to +10. A value of 0 sets a voice to speak at its default rate. A value of -10 sets a voice to speak at one-third (or 1/3) of its default rate. A value of +10 sets a voice to speak at 3 times its default rate. Each increment between -10 and +10 is logarithmically distributed such that incrementing/decrementing by 1 is multiplying/dividing the rate by the 10th root of 3 (about 1.12). Values more extreme than -10 and +10 will be passed to an engine, but SAPI 5-compliant engines may not support such extremes and instead may clip the rate to the maximum or minimum rate it supports. When scoped, this attribute is absolute.</description>
</AttributeType>
|
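For the rate scale, each step is a factor of the 10th root of 3, so a passage that takes a given time at the default rate should take roughly a third of that time at +10 and three times as long at -10. The sketch below works this out; the names `rate_multiplier` and `scaled_duration` are hypothetical, and treating duration as inversely proportional to rate is a simplifying assumption.

```python
def rate_multiplier(speed):
    """Return the factor applied to the default speaking rate."""
    return 3.0 ** (speed / 10.0)


def scaled_duration(default_seconds, speed):
    """Estimate how long speech lasts at the given rate adjustment.

    Assumes duration is inversely proportional to the speaking rate.
    """
    return default_seconds / rate_multiplier(speed)


if __name__ == "__main__":
    for value in (-10, -1, 0, 1, 10):
        print(f"{value:+d}: x{rate_multiplier(value):.3f}, "
              f"30 s becomes {scaled_duration(30.0, value):.1f} s")
```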
This specifies the type of context. Refer to the SAPI documentation for the
various context IDs.
| syntax: |
ID = string |
| required: |
yes |
| datatype: |
string |
| elements: |
CONTEXT |
| source: |
<AttributeType name="ID" dt:type="string" required="yes">
<description>This specifies the type of context. Refer to the SAPI documentation for the various context ids.</description>
</AttributeType>
|
Language identifier. The language identifier is specified as a hexadecimal
value. For example, the LANGID for English (US) expressed in the hexadecimal
form is 409.
| syntax: |
LANGID = int |
| required: |
yes |
| datatype: |
int |
| elements: |
LANG |
| source: |
<AttributeType name="LANGID" dt:type="int" required="yes">
<description>Language identifier. The language identifier is specified as a hexadecimal value. For example, the LANGID for English (US) expressed in the hexadecimal form is 409. </description>
</AttributeType>
|
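Since the LANGID is written in hexadecimal, the attribute text 409 stands for the number 1033, not four hundred and nine. A short sketch of the conversion follows; the helper names `parse_langid` and `split_langid` are hypothetical, and the split into primary language and sublanguage follows the usual Windows layout (low 10 bits and remaining high bits) rather than anything stated in this schema.

```python
def parse_langid(value):
    """Interpret LANGID attribute text as a hexadecimal number."""
    return int(value, 16)


def split_langid(langid):
    """Return (primary language, sublanguage) for a Windows LANGID."""
    return langid & 0x3FF, langid >> 10


if __name__ == "__main__":
    langid = parse_langid("409")
    print(langid, split_langid(langid))
```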
This specifies the volume as a percentage of the maximum volume of the current
voice. Each voice implementation has its own maximum volume. This value
must be between 0 and 100 inclusive. Values above 100 or below 0 are clipped to
100 and 0 respectively.
| syntax: |
LEVEL = int |
| required: |
yes |
| datatype: |
int |
| elements: |
VOLUME |
| source: |
<AttributeType name="LEVEL" dt:type="int" required="yes">
<description> This specifies the volume as percent of the maximum volume of the current voice. Each voice implementation has its own maximum volume. This value must be between 0 and 100 inclusive. Values above 100 or below 0 are clipped to 100 and 0 respectively.</description>
</AttributeType>
|
The value of a bookmark may be any string or integer.
| syntax: |
MARK = int |
| required: |
yes |
| datatype: |
int |
| elements: |
BOOKMARK |
| source: |
<AttributeType name="MARK" dt:type="int" required="yes">
<description>The value of a bookmark may be any string or integer. </description>
</AttributeType>
|
The value can range from -10 to +10. A value of 0 sets a voice to speak at
its default pitch. A value of -10 sets a voice to speak at three-fourths
(or ¾) of its default pitch. A value of +10 sets a voice to speak at
four-thirds (or 4/3) of its default pitch. Each increment between -10 and
+10 is logarithmically distributed such that incrementing/decrementing by 1
is multiplying/dividing the pitch by the 24th root of 2 (about 1.03). Values
more extreme than -10 and +10 will be passed to an engine but SAPI
5-compliant engines may not support such extremes and instead may clip the
pitch to the maximum or minimum pitch it supports. Values of -24 and +24
must lower and raise pitch by 1 octave respectively. All
incrementing/decrementing by 1 must multiply/divide the pitch by the 24th
root of 2. When scoped, this attribute is relative.
| syntax: |
MIDDLE = int |
| required: |
yes |
| datatype: |
int |
| elements: |
PITCH |
| source: |
<AttributeType name="MIDDLE" dt:type="int" required="yes">
<description>The value can range from -10 to +10. A value of 0 sets a voice to speak at its default pitch. A value of -10 sets a voice to speak at three-fourths (or ¾) of its default pitch. A value of +10 sets a voice to speak at four-thirds (or 4/3) of its default pitch. Each increment between -10 and +10 is logarithmically distributed such that incrementing/decrementing by 1 is multiplying/dividing the pitch by the 24th root of 2 (about 1.03). Values more extreme than -10 and +10 will be passed to an engine but SAPI 5-compliant engines may not support such extremes and instead may clip the pitch to the maximum or minimum pitch it supports. Values of -24 and +24 must lower and raise pitch by 1 octave respectively. All incrementing/decrementing by 1 must multiply/divide the pitch by the 24th root of 2. When scoped, this attribute is relative.</description>
</AttributeType>
|
Number of milliseconds, from zero to 65535, of silence. Value entries that
exceed this range should be limited to 65535. Value entries that are below
this range (negative values) should be set to zero.
| syntax: |
MSEC = int |
| required: |
yes |
| datatype: |
int |
| elements: |
SILENCE |
| source: |
<AttributeType name="MSEC" dt:type="int" required="yes">
<description>Number of milliseconds, from zero to 65535, of silence. Value entries that exceed this range should be limited to 65535. Value entries that are below this range (negative values) should be set to zero. </description>
</AttributeType>
|
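The range rule for MSEC amounts to a clamp applied before the tag is written. The sketch below builds a SILENCE tag with Python's standard `xml.etree.ElementTree` module; the function name `silence_tag` is hypothetical, and clamping on the authoring side is a choice of this example rather than something the schema requires.

```python
import xml.etree.ElementTree as ET


def silence_tag(msec):
    """Return a SILENCE tag whose MSEC value is limited to 0..65535."""
    clamped = max(0, min(65535, msec))
    element = ET.Element("SILENCE", {"MSEC": str(clamped)})
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    for requested in (-5, 500, 70000):
        print(requested, silence_tag(requested))
```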
The XML parser selects the first voice registered containing all of the
specified attributes. A string that contains semicolon-delimited sub-strings
is used to specify the attributes. The speak call will fail if the parser
cannot find the required tags.
| syntax: |
[ OPTIONAL = string
] |
| required: |
no (default) |
| datatype: |
string |
| elements: |
VOICE |
| source: |
<AttributeType name="OPTIONAL" dt:type="string">
<description>The XML parser selects the first voice registered containing all of the specified attributes. A string that contains semicolon-delimited sub-strings is used to specify the attributes. The speak call will fail if the parser cannot find the required tags.
</description>
</AttributeType>
|
String name of part of speech. Valid SAPI parts of speech are noun, verb,
modifier, function, interjection and unknown.
| syntax: |
PART = enumeration:
noun|verb|modifier|function|interjection|unknown |
| required: |
yes |
| datatype: |
enumeration |
| values: |
noun|verb|modifier|function|interjection|unknown |
| elements: |
PARTOFSP |
| source: |
<AttributeType name="PART" dt:type="enumeration" dt:values="noun|verb|modifier|function|interjection|unknown" required="yes">
<description> String name of part of speech. Valid SAPI parts of speech are noun, verb, modifier, function, interjection and unknown. </description>
</AttributeType>
|
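Because PART is an enumeration, a value outside the six listed names is not valid against the schema. The sketch below checks the value before wrapping a word in a PARTOFSP tag; the function name `partofsp_tag` is hypothetical, and raising `ValueError` for an unknown name is a choice of this example.

```python
import xml.etree.ElementTree as ET

# The values listed in the PART enumeration.
VALID_PARTS = ("noun", "verb", "modifier", "function", "interjection", "unknown")


def partofsp_tag(word, part):
    """Wrap a word in a PARTOFSP tag after validating the PART value."""
    if part not in VALID_PARTS:
        raise ValueError(f"PART must be one of {VALID_PARTS}, got {part!r}")
    element = ET.Element("PARTOFSP", {"PART": part})
    element.text = word
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    print(partofsp_tag("record", "noun"))
    print(partofsp_tag("record", "verb"))
```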
The XML parser selects the first voice registered containing all of the
specified attributes. A string that contains semicolon-delimited sub-strings
is used to specify the attributes. The speak call will fail if the parser
cannot find the required tags.
| syntax: |
[ REQUIRED = string
] |
| required: |
no (default) |
| datatype: |
string |
| elements: |
VOICE |
| source: |
<AttributeType name="REQUIRED" dt:type="string">
<description>The XML parser selects the first voice registered containing all of the specified attributes. A string that contains semicolon-delimited sub-strings is used to specify the attributes. The speak call will fail if the parser cannot find the required tags.
</description>
</AttributeType>
|
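The selection rule for REQUIRED (take the first registered voice that has every listed attribute, and fail if there is none) can be sketched as follows. Everything concrete here is an assumption for illustration: the `Name=Value` form of each sub-string, the attribute names, the voice entries, and the function names `parse_attributes` and `select_voice`. Returning `None` stands in for the failing speak call.

```python
def parse_attributes(spec):
    """Parse 'Name=Value;Name=Value' text into a dictionary."""
    pairs = {}
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        pairs[name.strip()] = value.strip()
    return pairs


def select_voice(voices, required):
    """Return the first voice that has all required attributes, or None."""
    wanted = parse_attributes(required)
    for voice in voices:
        if all(voice.get(name) == value for name, value in wanted.items()):
            return voice
    return None


if __name__ == "__main__":
    # Hypothetical registry, listed in registration order.
    registry = [
        {"Name": "Voice A", "Gender": "Male", "Language": "409"},
        {"Name": "Voice B", "Gender": "Female", "Language": "409"},
        {"Name": "Voice C", "Gender": "Female", "Language": "409"},
    ]
    print(select_voice(registry, "Gender=Female;Language=409"))
    print(select_voice(registry, "Gender=Female;Language=411"))
```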
The value can range from -10 to +10. A value of 0 sets a voice to speak at
its default rate. A value of -10 sets a voice to speak at one-third (or
1/3) of its default rate. A value of +10 sets a voice to speak at 3 times
its default rate. Each increment between -10 and +10 is logarithmically
distributed such that incrementing/decrementing by 1 is multiplying/dividing
the rate by the 10th root of 3 (about 1.12). Values more extreme than -10
and +10 will be passed to an engine, but SAPI 5-compliant engines may not
support such extremes and instead may clip the rate to the maximum or
minimum rate it supports. When scoped, this attribute is relative.
| syntax: |
[ SPEED = int
] |
| required: |
no (default) |
| datatype: |
int |
| elements: |
RATE |
| source: |
<AttributeType name="SPEED" dt:type="int">
<description>The value can range from -10 to +10. A value of 0 sets a voice to speak at its default rate. A value of -10 sets a voice to speak at one-third (or 1/3) of its default rate. A value of +10 sets a voice to speak at 3 times its default rate. Each increment between -10 and +10 is logarithmically distributed such that incrementing/decrementing by 1 is multiplying/dividing the rate by the 10th root of 3 (about 1.12). Values more extreme than -10 and +10 will be passed to an engine, but SAPI 5-compliant engines may not support such extremes and instead may clip the rate to the maximum or minimum rate it supports. When scoped, this attribute is relative.</description>
</AttributeType>
|
String representing a phoneme for a language supported by the voice
implementing synthesizing speech. Refer to SAPI Phoneme Spec.
| syntax: |
SYM = char |
| required: |
yes |
| datatype: |
char |
| elements: |
PRON |
| source: |
<AttributeType name="SYM" dt:type="char" required="yes">
<description>String representing a phoneme for a language supported by the voice implementing synthesizing speech. Refer to SAPI Phoneme Spec.</description>
</AttributeType>
|
SAPI Source
<Schema name="SAPI" xmlns="urn:schemas-microsoft-com:xml-data" xmlns:dt="urn:schemas-microsoft-com:datatypes">
<description> This schema describes the SAPI 5.0 TTS XML grammar format. The SAPI TTS XML schema is included in the TTS XML parser. Hence, it is not necessary to include the schema in the XML file when authoring a grammar. NOTE: This schema is based on the Microsoft schema language and is not fully W3C compliant. This schema will be rewritten and will be compliant with the W3C standard once it has been approved by the W3C.</description>
<!-- Attribute definitions -->
<AttributeType name="ID" dt:type="string" required="yes">
<description>This specifies the type of context. Refer to the SAPI documentation for the various context ids.</description>
</AttributeType>
<AttributeType name="SYM" dt:type="char" required="yes">
<description>String representing a phoneme for a language supported by the voice implementing synthesizing speech. Refer to SAPI Phoneme Spec.</description>
</AttributeType>
<AttributeType name="LANGID" dt:type="int" required="yes">
<description>Language identifier. The language identifier is specified as a hexadecimal value. For example, the LANGID for English (US) expressed in the hexadecimal form is 409. </description>
</AttributeType>
<AttributeType name="LEVEL" dt:type="int" required="yes">
<description> This specifies the volume as percent of the maximum volume of the current voice. Each voice implementation has its own maximum volume. This value must be between 0 and 100 inclusive. Values above 100 or below 0 are clipped to 100 and 0 respectively.</description>
</AttributeType>
<AttributeType name="MARK" dt:type="int" required="yes">
<description>The value of a bookmark may be any string or integer. </description>
</AttributeType>
<AttributeType name="MIDDLE" dt:type="int" required="yes">
<description>The value can range from -10 to +10. A value of 0 sets a voice to speak at its default pitch. A value of -10 sets a voice to speak at three-fourths (or ¾) of its default pitch. A value of +10 sets a voice to speak at four-thirds (or 4/3) of its default pitch. Each increment between -10 and +10 is logarithmically distributed such that incrementing/decrementing by 1 is multiplying/dividing the pitch by the 24th root of 2 (about 1.03). Values more extreme than -10 and +10 will be passed to an engine but SAPI 5-compliant engines may not support such extremes and instead may clip the pitch to the maximum or minimum pitch it supports. Values of -24 and +24 must lower and raise pitch by 1 octave respectively. All incrementing/decrementing by 1 must multiply/divide the pitch by the 24th root of 2. When scoped, this attribute is relative.</description>
</AttributeType>
<AttributeType name="MSEC" dt:type="int" required="yes">
<description>Number of milliseconds, from zero to 65535, of silence. Value entries that exceed this range should be limited to 65535. Value entries that are below this range (negative values) should be set to zero. </description>
</AttributeType>
<AttributeType name="OPTIONAL" dt:type="string">
<description>The XML parser selects the first voice registered containing all of the specified attributes. A string that contains semicolon-delimited sub-strings is used to specify the attributes. The speak call will fail if the parser cannot find the required tags.
</description>
</AttributeType>
<AttributeType name="REQUIRED" dt:type="string">
<description>The XML parser selects the first voice registered containing all of the specified attributes. A string that contains semicolon-delimited sub-strings is used to specify the attributes. The speak call will fail if the parser cannot find the required tags.
</description>
</AttributeType>
<AttributeType name="SPEED" dt:type="int">
<description>The value can range from -10 to +10. A value of 0 sets a voice to speak at its default rate. A value of -10 sets a voice to speak at one-third (or 1/3) of its default rate. A value of +10 sets a voice to speak at 3 times its default rate. Each increment between -10 and +10 is logarithmically distributed such that incrementing/decrementing by 1 is multiplying/dividing the rate by the 10th root of 3 (about 1.12). Values more extreme than -10 and +10 will be passed to an engine, but SAPI 5-compliant engines may not support such extremes and instead may clip the rate to the maximum or minimum rate it supports. When scoped, this attribute is relative.</description>
</AttributeType>
<AttributeType name="PART" dt:type="enumeration" dt:values="noun|verb|modifier|function|interjection|unknown" required="yes">
<description> String name of part of speech. Valid SAPI parts of speech are noun, verb, modifier, function, interjection and unknown. </description>
</AttributeType>
<AttributeType name="ABSMIDDLE" dt:type="int">
<description> The value can range from -10 to +10. A value of 0 sets a voice to speak at its default pitch. A value of -10 sets a voice to speak at three-fourths (or ¾) of its default pitch. A value of +10 sets a voice to speak at four-thirds (or 4/3) of its default pitch. Each increment between -10 and +10 is logarithmically distributed such that incrementing/decrementing by 1 is multiplying/dividing the pitch by the 24th root of 2 (about 1.03). Values more extreme than -10 and +10 will be passed to an engine but SAPI 5-compliant engines may not support such extremes and instead may clip the pitch to the maximum or minimum pitch it supports. Values of -24 and +24 must lower and raise pitch by 1 octave respectively. All incrementing/decrementing by 1 must multiply/divide the pitch by the 24th root of 2. When scoped, this attribute is absolute.</description>
</AttributeType>
<AttributeType name="ABSSPEED" dt:type="int">
<description>The value can range from -10 to +10. A value of 0 sets a voice to speak at its default rate. A value of -10 sets a voice to speak at one-third (or 1/3) of its default rate. A value of +10 sets a voice to speak at 3 times its default rate. Each increment between -10 and +10 is logarithmically distributed such that incrementing/decrementing by 1 is multiplying/dividing the rate by the 10th root of 3 (about 1.12). Values more extreme than -10 and +10 will be passed to an engine, but SAPI 5-compliant engines may not support such extremes and instead may clip the rate to the maximum or minimum rate it supports. When scoped, this attribute is absolute.</description>
</AttributeType>
<!-- Definition of SAPI Element -->
<ElementType name="SAPI" content="mixed" model="open">
<description>At the beginning of the SAPI tag, the state of the voice is the same state as the insertion point of the SAPI tag. At the close of the SAPI tag, the voice returns to the same state as that of the insertion point. SAPI tags may be nested. When a nested SAPI tag is closed, the voice state returns to what it was at the insertion point of the nested tag. </description>
<element type="BOOKMARK"/>
<element type="SILENCE"/>
<element type="EMPH">
<description> Place emphasis on the words contained by this element. It is up to the engine implementation to design what emphasis is for the engine. </description>
</element>
<element type="SPELL">
<description>Spell out words letter by letter contained by this element. NOTE: The engine should not normalize the text scoped in the SPELL tag. This includes numbers, words, etc. Words which contain punctuation, such as U.S.A should spell out the letters as well as the punctuation scoped within the tag. </description>
</element>
<element type="PARTOFSP"/>
<element type="PRON">
<description>String representing a phoneme for a language supported by the voice implementing synthesized speech. </description>
</element>
<element type="LANG"/>
<element type="VOICE"/>
<element type="RATE"/>
<element type="VOLUME">
<description>0 to 100 (no overflow allowed)</description>
</element>
<element type="PITCH">
<description>Set the relative pitch adjustment of synthesized speech.</description>
</element>
<element type="CONTEXT"/>
</ElementType>
<!-- Definition of elements -->
<!-- Definition of BOOKMARK Element -->
<ElementType name="BOOKMARK" content="empty" model="closed">
<description>Inserts a bookmark into the input stream using the bookmark element. If an application specifies interest in bookmark events, it will receive an event when synthesis has passed this element in an input stream. If the audio output destination supports handling of events, then an application will receive this event once the synthesized speech up to this bookmark has been output. Otherwise, an application receives a bookmark event when the voice implementation has synthesized speech up to this bookmark. </description>
<attribute type="MARK"/>
</ElementType>
<!-- Definition of SILENCE Element -->
<ElementType name="SILENCE" content="empty" model="closed">
<description>Inserts a specified number of milliseconds of silence into the output audio stream. </description>
<attribute type="MSEC"/>
</ElementType>
<!-- Definition of EMPH Element -->
<ElementType name="EMPH" content="empty" model="closed">
<description>Places emphasis on the words contained by this element. </description>
</ElementType>
<!-- Definition of SPELL Element -->
<ElementType name="SPELL" content="empty" model="closed">
<description>Spells out words letter by letter contained by this element.
Note: The engine should not normalize the text scoped in the SPELL tag. This includes numbers, words, etc. Words that contain punctuation, such as "U.S.A." should spell out the letters as well as the punctuation scoped within the tag. </description>
</ElementType>
<!-- Definition of PARTOFSP Element -->
<ElementType name="PARTOFSP" content="mixed" model="closed">
<description>The part of speech of contained word(s). The PARTOFSP tag is used to force a particular pronunciation of a word (for example, the word record as a noun versus the word record as a verb). </description>
<attribute type="PART"/>
</ElementType>
<!--Definition of PRON Element-->
<ElementType name="PRON" content="mixed" model="open">
<description>Pronounces the contained text (possibly empty) according to the provided Unicode string.
</description>
<attribute type="SYM"/>
</ElementType>
<!-- Definition of LANG Element -->
<ElementType name="LANG" content="mixed" model="closed">
<description>Changes the LANGID of the scoped text. When the LANGID is changed, SAPI will try to detect whether the current voice can handle the new language. If the voice does not speak the specified language, then an engine must choose another language it speaks as a best attempt. This fallback path can be prevented, if it is not desirable, by using the VOICE tag and its REQUIRED attribute.
</description>
<attribute type="LANGID"/>
</ElementType>
<!-- Definition of VOICE Element -->
<ElementType name="VOICE" content="mixed" model="closed">
<description>Sets which voice implementation is used for synthesis of associated input stream text. The best voice implementation given the required and optional attributes will be selected by SAPI. </description>
<attribute type="REQUIRED"/>
<attribute type="OPTIONAL"/>
</ElementType>
<!-- Definition of RATE Element -->
<ElementType name="RATE" content="mixed" model="closed">
<description>Set the relative speed adjustment at which words are synthesized.</description>
<attribute type="SPEED"/>
<attribute type="ABSSPEED"/>
</ElementType>
<!-- Definition of VOLUME Element -->
<ElementType name="VOLUME" content="mixed" model="closed">
<description>The scoped/global element VOLUME modifies the underlying numerical values of a speech block. The underlying value can never be below zero or exceed 100. All negative values result in zero and all values above 100 result in 100. VOLUME may also receive an absolute value (no '-' or '+' character), which is an integer between zero and 100. </description>
<attribute type="LEVEL"/>
</ElementType>
<!-- Definition of PITCH Element -->
<ElementType name="PITCH" content="mixed" model="closed">
<description>The scoped/global element PITCH modifies the underlying numerical values of a speech block. Relative attribute values, those preceded by a dash (-) or a plus sign (+), increment the underlying numerical value by the specified amount. SAPI-compliant engines have the option of supporting only the guaranteed range of values, behaving as -10 for adjustments below -10 and as +10 for adjustments above +10.</description>
<attribute type="MIDDLE"/>
<attribute type="ABSMIDDLE"/>
</ElementType>
<!-- Definition of CONTEXT Element -->
<ElementType name="CONTEXT" content="mixed" model="closed">
<description>The context can specify the type of normalization rules which should be applied to the scoped text. SAPI does not guarantee any predefined contexts. </description>
<attribute type="ID"/>
</ElementType>
</Schema>
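Example:
The following fragment is a minimal sketch of input text that uses several of the elements defined above. It is not part of the original schema. All attribute values (bookmark names, durations, levels) are illustrative assumptions, and the rate comment assumes that the rate description above applies to the ABSSPEED attribute. The elements are kept as direct children of SAPI because that is the only nesting the schema above declares explicitly.

```xml
<SAPI>
  <!-- Hypothetical bookmark name; an interested application receives an event here. -->
  <BOOKMARK MARK="intro_start"/>
  <!-- ABSSPEED="5" would correspond to 3^(5/10), about 1.73 times the default rate. -->
  <RATE ABSSPEED="5">This sentence is spoken faster than the default rate.</RATE>
  <!-- Half a second of silence in the output audio stream. -->
  <SILENCE MSEC="500"/>
  <!-- Absolute volume; a value above 100 would be treated as 100. -->
  <VOLUME LEVEL="80">This sentence is spoken at volume 80.</VOLUME>
  <!-- Relative pitch adjustment, marked by the leading minus sign. -->
  <PITCH MIDDLE="-3">This sentence is spoken at a lower pitch.</PITCH>
  <BOOKMARK MARK="intro_end"/>
</SAPI>
```

At the close of each element the voice returns to the state it had at the insertion point, so the rate, volume, and pitch changes above do not affect one another.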
Schema Attributes Reference:
- model="open" - The element can contain elements, attributes, and text not
  specified in the content model. This is the default value.
- model="closed" - The element cannot contain elements, attributes, or text
  other than those specified in the content model. DTDs use a closed model.
- content="textOnly" - The element can contain only text, not elements. Note
  that if the model attribute is set to "open", the element can contain text
  and additional elements.
- content="eltOnly" - The element can contain only the specified elements, not
  free text. Note that if the model attribute is set to "open", the element
  can contain text and additional elements.
- content="empty" - The element cannot contain text or elements. Note that if
  the model attribute is set to "open", the element can contain text and
  additional elements.
- content="mixed" - The element can contain a mix of named elements and text.
  This is the default value.
- order="one" - Permits only one of a set of elements.
- order="seq" - Requires the elements to appear in the specified sequence.
- order="many" - Permits the elements to appear (or not appear) in any order.
  This is the default.
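Example:
As a hedged illustration of how the model, content, and order attributes combine, the fragment below declares a small element type in the same Microsoft schema language. The schema name and the element names NOTE, TITLE, and BODY are hypothetical and are not part of the SAPI schema.

```xml
<Schema name="ExampleSchema" xmlns="urn:schemas-microsoft-com:xml-data">
  <!-- Hypothetical leaf elements that hold only text. -->
  <ElementType name="TITLE" content="textOnly" model="closed"/>
  <ElementType name="BODY" content="textOnly" model="closed"/>
  <!-- eltOnly + seq + closed: only the listed children, in this order, no free text. -->
  <ElementType name="NOTE" content="eltOnly" order="seq" model="closed">
    <element type="TITLE"/>
    <element type="BODY"/>
  </ElementType>
</Schema>
```

Under these declarations, a NOTE containing TITLE followed by BODY would be valid, while a NOTE containing free text or BODY before TITLE would not. Changing model to "open" on NOTE would additionally permit undeclared content.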
Datatype Reference:
- bin.base64 - MIME-style Base64 encoded binary BLOB.
- bin.hex - Hexadecimal digits representing octets.
- boolean - 0 or 1, where 0 == "false" and 1 == "true".
- char - String, one character long.
- date - Date in a subset of ISO 8601 format, without the time data. For
  example: "1994-11-05".
- dateTime - Date in a subset of ISO 8601 format, with optional time and no
  optional zone. Fractional seconds can be as precise as nanoseconds. For
  example: "1988-04-07T18:39:09".
- dateTime.tz - Date in a subset of ISO 8601 format, with optional time and
  optional zone. Fractional seconds can be as precise as nanoseconds. For
  example: "1988-04-07T18:39:09-08:00".
- entity - Represents the XML ENTITY type.
- entities - Represents the XML ENTITIES type.
- enumeration - Represents an enumerated type (supported on attributes only).
- fixed.14.4 - Same as "number" but no more than 14 digits to the left of the
  decimal point, and no more than 4 to the right.
- float - Real number, with no limit on digits; can potentially have a leading
  sign, fractional digits, and optionally an exponent. Punctuation as in U.S.
  English. Values range from 1.7976931348623157E+308 to
  2.2250738585072014E-308.
- id - Represents the XML ID type.
- idref - Represents the XML IDREF type.
- idrefs - Represents the XML IDREFS type.
- int - Number, with optional sign, no fractions, and no exponent.
- nmtoken - Represents the XML NMTOKEN type.
- nmtokens - Represents the XML NMTOKENS type.
- notation - Represents a NOTATION type.
- number - Number, with no limit on digits; can potentially have a leading
  sign, fractional digits, and optionally an exponent. Punctuation as in U.S.
  English. (Values have same range as most significant number, R8,
  1.7976931348623157E+308 to 2.2250738585072014E-308.)
- string - Represents a string type.
- time - Time in a subset of ISO 8601 format, with no date and no time zone.
  For example: "08:15:27".
- time.tz - Time in a subset of ISO 8601 format, with no date but optional
  time zone. For example: "08:15:27-05:00".
- i1 - Integer represented in one byte. A number, with optional sign, no
  fractions, no exponent. For example: "1, 127, -128".
- i2 - Integer represented in one word. A number, with optional sign, no
  fractions, no exponent. For example: "1, 703, -32768".
- i4 - Integer represented in four bytes. A number, with optional sign, no
  fractions, no exponent. For example: "1, 703, -32768, 148343, -1000000000".
- r4 - Real number, with seven digit precision; can potentially have a leading
  sign, fractional digits, and optionally an exponent. Punctuation as in U.S.
  English. Values range from 3.40282347E+38F to 1.17549435E-38F.
- r8 - Real number, with 15 digit precision; can potentially have a leading
  sign, fractional digits, and optionally an exponent. Punctuation as in U.S.
  English. Values range from 1.7976931348623157E+308 to
  2.2250738585072014E-308.
- ui1 - Unsigned integer. A number, unsigned, no fractions, no exponent. For
  example: "1, 255".
- ui2 - Unsigned integer, two bytes. A number, unsigned, no fractions, no
  exponent. For example: "1, 255, 65535".
- ui4 - Unsigned integer, four bytes. A number, unsigned, no fractions, no
  exponent. For example: "1, 703, 3000000000".
- uri - Universal Resource Identifier (URI). For example:
  "urn:schemas-microsoft-com:Office9".
- uuid - Hexadecimal digits representing octets, optional embedded hyphens
  that are ignored. For example: "333C7BC4-460F-11D0-BC04-0080C7055A83".
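Example:
The sketch below shows one way datatypes like those listed above might be attached to attributes in the Microsoft schema language, assuming the datatype names ui4, dateTime.tz, and enumeration. The schema name, the element name SAMPLE, and the attribute names DELAY, CREATED, and LEVEL_NAME are hypothetical. The SAPI schema's own attribute datatypes are not restated here.

```xml
<Schema name="DatatypeExample"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- Hypothetical attributes, each typed through the dt:type attribute. -->
  <AttributeType name="DELAY" dt:type="ui4"/>
  <AttributeType name="CREATED" dt:type="dateTime.tz"/>
  <!-- An enumeration lists its permitted values, separated by spaces. -->
  <AttributeType name="LEVEL_NAME" dt:type="enumeration" dt:values="low medium high"/>
  <ElementType name="SAMPLE" content="empty" model="closed">
    <attribute type="DELAY"/>
    <attribute type="CREATED"/>
    <attribute type="LEVEL_NAME"/>
  </ElementType>
</Schema>
```

An instance that should conform to these hypothetical declarations is <SAMPLE DELAY="500" CREATED="1988-04-07T18:39:09-08:00" LEVEL_NAME="low"/>.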