Chinese XML Forum -- SAX cannot correctly handle escaped entities for special characters?

Topic: SAX cannot correctly handle escaped entities for special characters?

I have just started learning to parse XML with SAX and have run into two problems.

The complete program is as follows:
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SimpeXmlHandler extends DefaultHandler {

  private String str = null;

  private String element = null;

  public void startElement(String namespaceURI, String localName,
      String fullName, Attributes attributes) throws SAXException {
    element = fullName;
    for (int i = 0; i < attributes.getLength(); i++) {
      String qName = attributes.getQName(i);
      if (qName.equals("id")) {
        System.out.println("id=" + attributes.getValue(qName).trim());
        break;
      }
    }
  }

  public void endElement(String uri, String localName, String qName)
      throws SAXException {
    if (str != null) {
      if (element.equalsIgnoreCase("title")) {
        System.out.println("title=" + str);
      } else if (element.equalsIgnoreCase("href")) {
        System.out.println("href=" + str);
      } else if (element.equalsIgnoreCase("content")) {
        System.out.println("content=" + str);
      }
    }
  }

  public void characters(char[] chars, int start, int length)
      throws SAXException {
    str = new String(chars, start, length).trim();
  }
}

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class SimpleXmlTest {

  public static void main(String[] args) throws Exception {
    SimpeXmlHandler handler = new SimpeXmlHandler();
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setValidating(false);
    SAXParser parser = factory.newSAXParser();
    XMLReader xmlReader = parser.getXMLReader();
    xmlReader.setContentHandler(handler);
    InputSource source = new InputSource("config/sample.xml");
    xmlReader.parse(source);
  }
}

Running SimpleXmlTest parses the following XML file:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <articles>
    <article id="00001">
      <title>titleValue</title>
      <href>hrefValue</href>
      <publishtime>timeValue</publishtime>
      <content>contentValue</content>
      <tag>0</tag>
    </article>
  </articles>
</root>
The result is as follows:
id=00001
title=titleValue
href=hrefValue
content=contentValue

Changing the content of the XML file to the following:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <articles>
    <article id="00001">
      <title>titleValue</title>
      <href>hrefValue</href>
      <publishtime>timeValue</publishtime>
      <content>start&gt;end</content>
      <tag>0</tag>
    </article>
  </articles>
</root>
After running the program, I get the following result:
id=00001
title=titleValue
href=hrefValue
content=end

Changing the content of the XML file once more, to the following:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <articles>
    <article id="00001">
      <title>titleValue</title>
      <href>hrefValue</href>
      <publishtime>timeValue</publishtime>
      <content>start>end</content>
      <tag>0</tag>
    </article>
  </articles>
</root>
After running the program again, I get the following result:
id=00001
title=titleValue
href=hrefValue
content=start>end

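It seems that SAX automatically converts "&gt;" into ">", and that this is what causes the error.
Normally, using special characters such as ">", "<", "&", ... directly in an XML file causes errors, so the escaped entities "&gt;", "&lt;", "&amp;", ... are used instead.
But in my program the situation seems to be exactly the opposite.
How can this be explained?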
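
One possible explanation, offered as a sketch rather than a confirmed diagnosis: the SAX characters() callback is allowed to deliver one run of text in several chunks, and parsers commonly split the text around an entity reference. Since the handler above overwrites str on every call, only the last chunk ("end") would survive. The sketch below appends the chunks to a buffer instead. The names BufferingHandlerDemo, BufferingHandler and parse are hypothetical and are not part of the original program.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; an illustration, not the original poster's code.
class BufferingHandlerDemo {

  /** Collects "name=text" for title, href and content elements. */
  static class BufferingHandler extends DefaultHandler {
    final List<String> lines = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName,
        Attributes attributes) {
      text.setLength(0); // start with an empty buffer for every element
    }

    @Override
    public void characters(char[] chars, int start, int length) {
      // characters() may fire several times for one text node,
      // so append each chunk instead of overwriting the previous one.
      text.append(chars, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
      // Decide based on the element that is closing, not the last one opened.
      if (qName.equals("title") || qName.equals("href") || qName.equals("content")) {
        lines.add(qName + "=" + text.toString().trim());
      }
      text.setLength(0);
    }
  }

  static List<String> parse(String xml) throws Exception {
    BufferingHandler handler = new BufferingHandler();
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return handler.lines;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<article id=\"00001\"><title>titleValue</title>"
        + "<content>start&gt;end</content></article>";
    // Prints:
    // title=titleValue
    // content=start>end
    parse(xml).forEach(System.out::println);
  }
}
```

Under this reading, the parser decodes "&gt;" correctly; the text is lost in the handler, not in SAX.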

In addition, changing the content of the XML file to the following:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <articles>
    <article id="00001">
      <title>titleValue</title>
      <href>hrefValue</href>
      <publishtime>timeValue</publishtime>
      <content>contentValue</content>
      
    </article>
  </articles>
</root>
After running the test program, I get the following result:
id=00001
title=titleValue
href=hrefValue
content=contentValue
content=
content=
content=
I cannot understand the last three "content=" lines.

I hope someone can clear this up for me. Thanks!
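
The three extra "content=" lines can plausibly be traced to two details of the handler: element remembers the last element that was opened, not the one being closed, and the whitespace between tags is also reported through characters(), which leaves str as an empty string rather than null. The trace below is a minimal sketch of that behaviour. LastOpenedTraceDemo, TraceHandler, lastOpened and lastChunk are hypothetical names, with lastOpened and lastChunk standing in for element and str.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; it only records what the original handler would see.
class LastOpenedTraceDemo {

  static class TraceHandler extends DefaultHandler {
    final List<String> trace = new ArrayList<>();
    private String lastOpened; // plays the role of `element`
    private String lastChunk;  // plays the role of `str`

    @Override
    public void startElement(String uri, String localName, String qName,
        Attributes attributes) {
      lastOpened = qName;
    }

    @Override
    public void characters(char[] chars, int start, int length) {
      // Whitespace between tags arrives here too and trims to "".
      lastChunk = new String(chars, start, length).trim();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
      trace.add("</" + qName + "> lastOpened=" + lastOpened
          + " lastChunk=[" + lastChunk + "]");
    }
  }

  static List<String> trace(String xml) throws Exception {
    TraceHandler handler = new TraceHandler();
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return handler.trace;
  }

  public static void main(String[] args) throws Exception {
    // Same shape as the last XML file above: <content> is the last element opened.
    String xml = "<root>\n <articles>\n  <article id=\"00001\">\n"
        + "   <content>contentValue</content>\n"
        + "  </article>\n </articles>\n</root>";
    trace(xml).forEach(System.out::println);
  }
}
```

With this input, the trace entries for </article>, </articles> and </root> each show lastOpened=content and an empty lastChunk, which matches the three "content=" lines. In the earlier files, tag was the last element opened, so none of the branches in endElement matched for those closing tags.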

