
I am using DocumentBuilder to convert XHTML (XML) fetched from the internet into an org.w3c.dom.Document, but the content contains "--" inside comments. Is there any way to bypass this? I have already called setIgnoringComments(true) and setValidating(false).

I know that "--" is not permitted inside comments according to the W3C XML specification.

Any suggestions for preprocessing the XML before the conversion?

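```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XmlUtil {
    public static Document convertXmlStrToDocument(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
        documentBuilderFactory.setIgnoringComments(true);
        documentBuilderFactory.setValidating(false);
        DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
        // Read the String directly; xml.getBytes() would use the platform default charset.
        return documentBuilder.parse(new InputSource(new StringReader(xml)));
    }
}
```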

It throws this exception:

```
org.xml.sax.SAXParseException; lineNumber: 914; columnNumber: 17; The string "--" is not permitted within comments.
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
    at com.techoffice.util.XmlUtil.convertXmlStrToDocument(XmlUtil.java:41)
    at com.techoffice.util.XmlUtil.evaluateXpath(XmlUtil.java:46)
    at com.techoffice.jc.horse.service.web.ResultWebService.raceDateSelect(ResultWebService.java:41)
    at com.techoffice.jc.horse.service.web.ResultWebServiceTest.retrieveXml(ResultWebServiceTest.java:35)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:75)
    at org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:86)
    at org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:84)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:252)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:94)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
    at org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:191)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
```
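The error can be reproduced without the web page. The self-contained sketch below uses the same factory settings as the question. The class name DashDashRepro and the sample strings are hypothetical, and the exact message wording depends on which Xerces build is on the classpath. setIgnoringComments(true) only controls whether comment nodes end up in the DOM. The comment is still scanned, so the well-formedness check still fires.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class DashDashRepro {
    // Returns true if the string parses with the question's factory settings.
    static boolean parses(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(true); // drops comment nodes, but comments are still checked
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            // The default error handler also prints a "[Fatal Error]" line to stderr.
            System.out.println("Rejected: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parses("<html><!-- fine - comment --><body/></html>"));
        System.out.println(parses("<html><!-- bad -- comment --><body/></html>"));
    }
}
```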
  • Thanks. I got the definitive answer: no. I would like to know any way to get past it, including preprocessing the XML content. Commented Dec 13, 2016 at 0:36
  • I found that HTML Tidy is a console application and a C library, but my application is written in Java. Commented Dec 13, 2016 at 0:55
  • Then look at the Java version of HTML Tidy (answer updated), but note that this question seems to be morphing into a tool/library request, which is off-topic here. Commented Dec 13, 2016 at 1:22

2 Answers


No, the string "--" must not appear within an XML comment:

For compatibility, the string " -- " (double-hyphen) must not occur within comments.
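The spaces in that quoted sentence are easy to misread. The grammar production in the same section of the XML 1.0 specification, Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->', rules out any "--" in the comment body, spaced or not. It also rules out a body that ends in "-". The small checker below sketches that rule. The class and method names are hypothetical, and it checks only the hyphen rule, not the allowed Char ranges.

```java
public class XmlCommentRule {
    /**
     * Checks a comment body (the text between "<!--" and "-->") against the
     * hyphen rule of the XML 1.0 Comment production: every '-' must be
     * followed by a character other than '-'. The Char range is not checked.
     */
    static boolean isValidCommentBody(String body) {
        // Any "--" is forbidden, with or without surrounding spaces.
        if (body.contains("--")) {
            return false;
        }
        // A trailing '-' would run into the closing delimiter and form "--->".
        return !body.endsWith("-");
    }

    public static void main(String[] args) {
        String[] bodies = {" ok ", " a - b ", " a -- b ", "a--b", " ends with -"};
        for (String b : bodies) {
            System.out.println("<!--" + b + "--> valid? " + isValidCommentBody(b));
        }
    }
}
```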

This is not configurable: a conforming XML parser must treat such a comment as a fatal well-formedness error. Anything's hackable, but you'll be working against the grain, without any support from the XML parser. Not recommended.
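For completeness, here is a minimal sketch of that kind of hack: a regex pass that rewrites every comment body before parsing. It assumes that comments are properly closed with "-->" and that "<!--" never appears inside CDATA sections or attribute values. A regex cannot tell those contexts apart. The names CommentSanitizer, sanitizeComments and parse are hypothetical.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CommentSanitizer {
    // Lazily matches "<!--" ... "-->"; DOTALL lets a comment span several lines.
    // Assumption: "<!--" never occurs inside CDATA sections or attribute values.
    private static final Pattern COMMENT = Pattern.compile("<!--(.*?)-->", Pattern.DOTALL);

    /** Rewrites each comment body so it contains no "--" and does not end with "-". */
    static String sanitizeComments(String xml) {
        Matcher m = COMMENT.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1).replaceAll("-{2,}", "-"); // collapse runs of hyphens
            if (body.endsWith("-")) {
                body = body + " "; // avoid "--->" at the closing delimiter
            }
            m.appendReplacement(out, Matcher.quoteReplacement("<!--" + body + "-->"));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Parses the sanitized text with the question's settings (comments ignored). */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(true);
        factory.setValidating(false);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(sanitizeComments(xml))));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><!-- old -- markup ---><body/></html>";
        System.out.println(sanitizeComments(xml));
        System.out.println(parse(xml).getDocumentElement().getFirstChild().getNodeName());
    }
}
```

Collapsing the hyphens keeps the comment text readable. Because the question ignores comments anyway, replacing each match with an empty string would work just as well. For real-world HTML, a proper cleaner such as HTML Tidy remains the sturdier option.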

Try HTML Tidy to clean up the HTML first. There is also a Java version of HTML Tidy (JTidy).


1 Comment

I tried running the XML through JTidy before parsing, and it works. Thanks.

If this is the situation:

```javascript
function escape(input) {
    input = input.replace(/->/g, '_');
    return '<!-- ' + input + ' -->';
}
```

If you want to break out of the HTML comment through the input, use

--!>

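After this, an HTML parser treats the comment as closed ("--!>" is accepted as an incorrectly closed comment), so you can write whatever you want.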
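To see why the filter misses this, here is a hypothetical Java port of the escape function above. The class name CommentEscapeDemo is illustrative. Java's String.replace(CharSequence, CharSequence) replaces every occurrence, like the /g flag. The filter mangles "-->" because it contains "->". "--!>" contains no "->", so it passes through untouched. The "<b>closed</b>" payload is just a harmless marker.

```java
public class CommentEscapeDemo {
    // Port of the JavaScript escape(): String.replace replaces every "->", like /->/g.
    static String escape(String input) {
        input = input.replace("->", "_");
        return "<!-- " + input + " -->";
    }

    public static void main(String[] args) {
        // "-->" is neutralised because it contains "->".
        System.out.println(escape("--><b>closed</b>"));
        // "--!>" contains no "->", so it survives and closes the comment in an HTML parser.
        System.out.println(escape("--!><b>closed</b>"));
    }
}
```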
