I was reading a textbook to learn XPath, and I found the following passage in it:

How does XPath handle text in XML CDATA sections? Each character within a CDATA section is treated as character data. In other words, a CDATA section is treated as if the <![CDATA[ and ]]> were removed and every occurrence of markup like < and & was replaced by the corresponding character entities like &lt; and &amp;.

But the book didn't give any examples to explain this. Can anyone help me understand what the author means by the following?

a CDATA section is treated as if the <![CDATA[ and ]]> were removed and every occurrence of markup like < and & was replaced by the corresponding character entities like &lt; and &amp;.

  • Book's probably badly typeset, and someone ripped out the encoded entities and put in the rendered equivalents: CDATA means that while within the CDATA block, < is treated as if it was &lt;, and & as if it was &amp;, i.e. the XML metacharacters lose their "meta-ness". Commented Jul 15, 2013 at 14:45
  • Actually it was the question which was badly formatted. :) Commented Jul 15, 2013 at 15:11
  • @JensErat thanks for the edit. I don't know why that part got hidden :( Commented Jul 15, 2013 at 15:14
  • @Priti That happens a lot. Stack Overflow uses Markdown for formatting posts (which I regard as superior anyway), but also allows HTML. This leads to lots of XML input going missing, as browsers ignore tags that aren't HTML... Commented Jul 15, 2013 at 17:34

1 Answer

I think of it the other way round - everything between a <![CDATA[ and the next ]]> is treated as literal text: entity references are not decoded, and < signs don't start tags. So

<something><![CDATA[<foo>text&more</foo>]]></something> 

is the same as

<something>&lt;foo>text&amp;more&lt;/foo></something> 

whereas

<something><foo>text&more</foo></something> 

is not well-formed XML (because the & is treated as the start of an entity reference but there's no corresponding ; to end it).
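
If you want to check the three cases yourself, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The parser choice and the variable names are my own assumptions for illustration, not something from the book; any conforming XML parser should behave the same way.

```python
import xml.etree.ElementTree as ET

# The three documents from the answer above.
cdata_doc = "<something><![CDATA[<foo>text&more</foo>]]></something>"
escaped_doc = "<something>&lt;foo>text&amp;more&lt;/foo></something>"
broken_doc = "<something><foo>text&more</foo></something>"

# Parse the first two and read the text content of <something>.
cdata_root = ET.fromstring(cdata_doc)
escaped_root = ET.fromstring(escaped_doc)
cdata_text = cdata_root.text
escaped_text = escaped_root.text

print(cdata_text)                  # the literal characters <foo>text&more</foo>
print(cdata_text == escaped_text)  # both documents carry the same text
print(len(cdata_root))             # number of child elements: no <foo> element exists

# The third document is rejected because of the bare & in "text&more".
try:
    ET.fromstring(broken_doc)
    broken_parses = True
except ET.ParseError as err:
    broken_parses = False
    print("not well-formed:", err)
```

Both parses yield the same single text node, so an XPath expression such as string(/something) cannot tell the CDATA version from the escaped version.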

1 Comment

+1 good explanation and example (although I think the first paragraph may be more confusing than helpful). I guess you're saying that you think of it in terms of the text inside the CDATA not getting parsed, whereas the book describes it as being escaped before being parsed. I think of it the same way as you, though the book is right too.
