Why is a CDATA section treated as if the <![CDATA[ and ]]> were removed?

Question

I was reading a textbook to learn XPath and found the following passage in it:

How does XPath handle text in XML CDATA sections? Each character within a CDATA section is treated as character data. In other words, a CDATA section is treated as if the <![CDATA[ and ]]> were removed and every occurrence of markup like < and & was replaced by the corresponding character entities like < and &.

But the book didn't give any examples to explain these sentences. Can anyone help me understand what the author meant by the following:

a CDATA section is treated as if the <![CDATA[ and ]]> were removed and every occurrence of markup like < and & was replaced by the corresponding character entities like < and &.

The book is probably badly typeset, and someone ripped out the encoded entities and put in the rendered equivalents. CDATA means that within the CDATA block, < is treated as if it was &lt;, and & as if it was &amp;, i.e. the XML metacharacters lose their "meta-ness".
– Marc B, Commented Jul 15, 2013 at 14:45
@JensErat thanks for the edit. I don't know why that part got hidden :(
– Arup Rakshit, Commented Jul 15, 2013 at 15:14
@Priti That happens a lot. Stack Overflow uses Markdown for formatting posts (which I regard as superior anyway), but also allows HTML. This leads to missing XML input all the time, as browsers ignore tags that are not HTML...
– Jens Erat, Commented Jul 15, 2013 at 17:34

Ian Roberts · Accepted Answer · 2013-07-15 14:45:50Z

I think of it the other way round: everything between a <![CDATA[ and the next ]]> is treated as literal text. It is not subject to the usual decoding of entity references, and < signs don't start tags. So

<something><![CDATA[<foo>text&more</foo>]]></something>

is the same as

<something>&lt;foo>text&amp;more&lt;/foo></something>

whereas

<something><foo>text&more</foo></something>

is not well-formed XML (because the & is treated as the start of an entity reference but there's no corresponding ; to end it).
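The three cases above can be checked with any XML parser. Here is a minimal sketch using Python's standard xml.etree.ElementTree module (my choice for illustration, it is not something the question or the book uses). The variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

# The three documents from the answer above.
cdata_doc = "<something><![CDATA[<foo>text&more</foo>]]></something>"
escaped_doc = "<something>&lt;foo>text&amp;more&lt;/foo></something>"
broken_doc = "<something><foo>text&more</foo></something>"

cdata_root = ET.fromstring(cdata_doc)
escaped_root = ET.fromstring(escaped_doc)

# Both forms produce the same character data for <something>.
cdata_text = cdata_root.text
escaped_text = escaped_root.text
print(cdata_text)                  # <foo>text&more</foo>
print(cdata_text == escaped_text)  # True

# Neither form creates a <foo> child element: it is all just text.
child_count = len(cdata_root)
print(child_count)                 # 0

# The unescaped version is rejected, because "&more<" is not a valid
# entity reference.
try:
    ET.fromstring(broken_doc)
    well_formed = True
except ET.ParseError:
    well_formed = False
print(well_formed)                 # False
```

After parsing, nothing in the tree records whether the text came from a CDATA section or from escaped entities, which is why the two descriptions (the book's and mine) end up equivalent.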

+1, good explanation and example (although I think the first paragraph may be more confusing than helpful). I guess you're saying that you think of it in terms of the text inside the CDATA section not getting parsed, whereas the book describes that text as being escaped before being parsed. I think of it the same way you do, though the book is right too.
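To connect this back to XPath: a query for elements never matches markup that sits inside a CDATA section, because that markup is only text. A small sketch, assuming Python's xml.etree.ElementTree, which supports only a limited subset of XPath (the document and the names `note` and `foo` are invented for this illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the first <note> holds CDATA that merely looks
# like markup, the second holds a real <foo> element.
doc = """<root>
  <note><![CDATA[<foo>1 < 2 & 3</foo>]]></note>
  <note><foo>real element</foo></note>
</root>"""

root = ET.fromstring(doc)

# Only the real element is found by the path expression.
foo_elements = root.findall(".//foo")
print(len(foo_elements))     # 1
print(foo_elements[0].text)  # real element

# The CDATA content is the plain text of the first <note>.
first_note_text = root.find("note[1]").text
print(first_note_text)       # <foo>1 < 2 & 3</foo>
```

A full XPath engine should treat the CDATA content the same way, as character data of the enclosing element, though the API for running the query will differ.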
