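--  Author: duxiong
--  Posted: 8/15/2007 10:13:00 AM

--  A question for the XML experts
The XML standard defines a document as:
[1]    document    ::=    ( prolog element Misc* ) - ( Char* RestrictedChar Char* )

My questions:
1. What does the * in Misc*, Char*, etc. mean?
2. Does ( prolog element Misc* ) mean prolog + element + Misc*?
3. How should ( Char* RestrictedChar Char* ) be read? Is it also Char* + RestrictedChar + Char*?
4. Is there a defined syntax for this kind of expression, with its own grammar rules? Or is it just a textual convention used in the XML standard?

5. Many thanks!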


--  Author: duxiong
--  Posted: 8/24/2007 5:16:00 PM

--  Found the answer myself
Nobody answered, but fortunately I found the answer myself. Sharing it here:

document ::= prolog element Misc*

This production says that the symbol named document (which represents a well-formed XML document), consists simply of one prolog followed by one element followed by zero or more Miscs. Each of these symbols is defined in terms of other symbols and character sequences.

Note that the XML 1.0 Recommendation refers to UCS characters by their Unicode scalar values, using a notation of #x followed by only as many hex digits as needed. So #x9 in the EBNF productions means the abstract character that would be represented in Unicode 3.1's "U+" notation as U+0009. It does not necessarily mean a byte with hex value 9.
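To make the difference between an abstract character and a byte concrete, here is a small Python sketch. The variable names and the two encodings chosen (UTF-8 and UTF-16BE) are only illustrative and do not come from the quoted text.

```python
# #x9 in the XML grammar names the abstract character U+0009 (TAB),
# not a particular byte or byte sequence.
tab = chr(0x9)

# The same character serializes to different bytes depending on the encoding.
utf8_bytes = tab.encode("utf-8")        # one byte: 0x09
utf16_bytes = tab.encode("utf-16-be")   # two bytes: 0x00 0x09

print(f"U+{ord(tab):04X}", utf8_bytes, utf16_bytes)
```

In UTF-8 the character happens to be the single byte 0x09, but in UTF-16 it takes two bytes, which is why the grammar talks about characters rather than bytes.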

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
S ::= (#x20 | #x9 | #xD | #xA)+

The first line means that Char is any single character whose code point is one of the listed values or falls in one of the listed ranges. Note that characters U+0000 through U+0008 and several other ranges are not Chars and are not allowed in XML documents. The second line shows that S is a sequence of one or more of the four whitespace characters (space, tab, carriage return, line feed). The definition of a Comment is given as:
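As a rough illustration of the Char and S productions, they can be written as two small Python checks. The names CHAR_RANGES, is_char, S_RE and is_s are made up for this sketch; only the code point ranges come from the productions quoted above.

```python
import re

# Code point ranges copied from the Char production quoted above.
CHAR_RANGES = [
    (0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
    (0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF),
]

def is_char(ch: str) -> bool:
    """Return True if the single character ch matches the Char production."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CHAR_RANGES)

# S ::= (#x20 | #x9 | #xD | #xA)+  written as a Python regular expression.
S_RE = re.compile(r"[\x20\x09\x0D\x0A]+")

def is_s(text: str) -> bool:
    """Return True if the whole string matches the S production."""
    return S_RE.fullmatch(text) is not None
```

For example, is_char("\t") is True while is_char("\x08") is False, and is_s("") is False because + requires at least one whitespace character.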

Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

This means that Comment is the 4 characters <!-- and the 3 characters -->, in between which are 0 or more instances of either a Char that is not -, or the character - followed by a Char that is not -.
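The Comment production can be approximated with a Python regular expression. This is only a sketch: regular expressions have no direct equivalent of the grammar's "A - B" difference operator, so "Char - '-'" is emulated here with a negative lookahead, and the names CHAR, CHAR_NOT_HYPHEN, COMMENT_RE and is_comment are invented for the example.

```python
import re

# Char as a regex character class (ranges from the Char production).
CHAR = r"[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"

# "Char - '-'": any Char except the hyphen, emulated with a negative lookahead.
CHAR_NOT_HYPHEN = rf"(?!-){CHAR}"

# '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
COMMENT_RE = re.compile(
    rf"<!--(?:{CHAR_NOT_HYPHEN}|-{CHAR_NOT_HYPHEN})*-->"
)

def is_comment(text: str) -> bool:
    """Return True if the whole string matches the Comment production."""
    return COMMENT_RE.fullmatch(text) is not None
```

With this pattern, "<!-- a-b -->" is accepted, while "<!-- a--b -->" and "<!-- a --->" are rejected, because every hyphen inside the comment body must be followed by a non-hyphen character.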

Misc ::= Comment | PI | S

This means that Misc is one of Comment, PI, or S. The definition of PI is too lengthy to include here, so we'll just leave it as it is.

Since Comment and S have been defined, it would be just as accurate to say:

Misc ::= '<!--' (((#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]) - '-') | ('-' ((#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]) - '-')))* '-->' | PI | (#x20 | #x9 | #xD | #xA)+

The other components of document are defined in the same way. It follows that a well-formed XML document is a UCS character sequence that follows certain patterns.


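--  Author: cndev
--  Posted: 9/23/2007 11:33:00 AM

--  
1. What does the * in Misc*, Char*, etc. mean?
2. Does ( prolog element Misc* ) mean prolog + element + Misc*?
3. How should ( Char* RestrictedChar Char* ) be read? Is it also Char* + RestrictedChar + Char*?
4. Is there a defined syntax for this kind of expression, with its own grammar rules? Or is it just a textual convention used in the XML standard?
This is regular-expression syntax.
a* means zero or more a
a+ means one or more a
ab means a followed by b (concatenation)
a|b means either a or b
For the details, you can look up regular-expression syntax online.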
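The four operators listed above behave the same way in Python's re module, so they can be tried out directly. The helper name matches is made up for this sketch.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# a*  : zero or more a
print(matches(r"a*", ""), matches(r"a*", "aaa"))      # True True
# a+  : one or more a
print(matches(r"a+", ""), matches(r"a+", "aaa"))      # False True
# ab  : a followed by b
print(matches(r"ab", "ab"), matches(r"ab", "ba"))     # True False
# a|b : either a or b, but not both in sequence
print(matches(r"a|b", "a"), matches(r"a|b", "b"), matches(r"a|b", "ab"))  # True True False
```

Note that the "A - B" operator in the document production (anything matching A that does not also match B) has no single-character counterpart among these regular-expression operators.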