Try xml_pp with style "-s cvs"
You asked for a libxml2-based solution. I can't help with that part. But if you are willing to use a different tool, read on.
xml_pp is part of the XML::Twig library and has a bunch of different preconfigured styles.
You can specify a style via the "-s" (style) parameter.
If you pass "-s" without a value, xml_pp prints a usage message that lists all available styles. (It actually generates that list on the fly, so it's guaranteed to be current.)
    $ xml_pp -s
    Use of uninitialized value $opt{"style"} in hash element at /usr/bin/xml_pp line 100.
    usage: /usr/bin/xml_pp [-v] [-i<extension>] [-s (none|nsgmls|nice|indented|indented_close_tag|indented_c|wrapped|record_c|record|cvs|indented_a)] [-p <tag(s)>] [-e <encoding>] [-l] [-f <file>] [<files>] at /usr/bin/xml_pp line 100.
Here's the same list in a nicer format. The version I have installed supports 11 styles out of the box:
    $ xml_pp -s 2>&1 | grep -Po '(?<=\[-s \()[^)]*' | tr '|' '\n' | nl
         1  none
         2  nsgmls
         3  nice
         4  indented
         5  indented_close_tag
         6  indented_c
         7  wrapped
         8  record_c
         9  record
        10  cvs
        11  indented_a
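If your grep lacks `-P` (PCRE support), plain `sed` can pull out the same list. This is a minimal sketch: the function name `list_xml_pp_styles` is hypothetical, and it assumes the usage message keeps the `[-s (style1|style2|...)]` shape shown above, all on one line.

```shell
# Hypothetical helper: read xml_pp's usage text on stdin, print one style per line.
# Assumes the styles appear as "[-s (a|b|c)]" on a single line of the usage text.
list_xml_pp_styles() {
    # Keep only the text between "[-s (" and the next ")", then split it on "|".
    sed -n 's/.*\[-s (\([^)]*\)).*/\1/p' | tr '|' '\n'
}

# Usage (needs XML::Twig's xml_pp on PATH):
#   xml_pp -s 2>&1 | list_xml_pp_styles | nl
```

Lines without the `[-s (...)]` pattern (such as the "Use of uninitialized value" warning) are dropped by `sed -n ... p`.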
So let's try them all.
This is our input file:
    $ cat in.xml
    <a attr="one" bttr="two" tttr="three" fttr="four"/>
And these are all the styles:
    $ for STYLE in none nsgmls nice indented indented_close_tag indented_c wrapped record_c record cvs indented_a; do
          echo; echo "==> Style: xml_pp -s $STYLE <=="
          xml_pp -s "$STYLE" < in.xml | tee "out.xml_pp.$STYLE.xml"
          echo
      done

    ==> Style: xml_pp -s none <==
    <a attr="one" bttr="two" fttr="four" tttr="three"/>

    ==> Style: xml_pp -s nsgmls <==
    <a attr="one" bttr="two" fttr="four" tttr="three" />

    ==> Style: xml_pp -s nice <==
    <a attr="one" bttr="two" fttr="four" tttr="three"/>

    ==> Style: xml_pp -s indented <==
    <a attr="one" bttr="two" fttr="four" tttr="three"/>

    ==> Style: xml_pp -s indented_close_tag <==
    <a attr="one" bttr="two" fttr="four" tttr="three"/>

    ==> Style: xml_pp -s indented_c <==
    <a attr="one" bttr="two" fttr="four" tttr="three"/>

    ==> Style: xml_pp -s wrapped <==
    <a attr="one" bttr="two" fttr="four" tttr="three"/>

    ==> Style: xml_pp -s record_c <==
    <a attr="one" bttr="two" fttr="four" tttr="three"/>

    ==> Style: xml_pp -s record <==
    <a attr="one" bttr="two" fttr="four" tttr="three"/>

    ==> Style: xml_pp -s cvs <==
    <a attr="one" bttr="two" fttr="four" tttr="three" />

    ==> Style: xml_pp -s indented_a <==
    <a attr="one" bttr="two" fttr="four" tttr="three" />
For this small input file, several of these styles produce byte-identical output:
    $ sha256sum * | sort
    452f5c19177d9cc6a54589168dbb1ee790c783a963110662e7dfae170bf997e4  out.xml_pp.cvs.xml
    452f5c19177d9cc6a54589168dbb1ee790c783a963110662e7dfae170bf997e4  out.xml_pp.indented_a.xml
    8e119bb50bcbf3d72159c96139cf328f46a0de259410acdd344f26e52f033996  out.xml_pp.nsgmls.xml
    d1ed9a4d1ebf8b9f1d012577809909e91e1ba0fc01b5afc8ff1302ca9dced617  out.xml_pp.record_c.xml
    d1ed9a4d1ebf8b9f1d012577809909e91e1ba0fc01b5afc8ff1302ca9dced617  out.xml_pp.record.xml
    e0d13f80ddc48876678c62e407abd3ab1eac8481a82d5aabb1514e24aee4717c  in.xml
    ea90003eab0ba71936a8a329a87b079b4fb120fe6873d4fa9bc8f986e8654b45  out.xml_pp.indented_close_tag.xml
    ea90003eab0ba71936a8a329a87b079b4fb120fe6873d4fa9bc8f986e8654b45  out.xml_pp.indented_c.xml
    ea90003eab0ba71936a8a329a87b079b4fb120fe6873d4fa9bc8f986e8654b45  out.xml_pp.indented.xml
    ea90003eab0ba71936a8a329a87b079b4fb120fe6873d4fa9bc8f986e8654b45  out.xml_pp.nice.xml
    ea90003eab0ba71936a8a329a87b079b4fb120fe6873d4fa9bc8f986e8654b45  out.xml_pp.none.xml
    ea90003eab0ba71936a8a329a87b079b4fb120fe6873d4fa9bc8f986e8654b45  out.xml_pp.wrapped.xml
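Eyeballing a sorted hash list works, but GNU `uniq` can print just the duplicate groups. A sketch assuming GNU coreutils (`uniq -w` and `--all-repeated` are GNU extensions); `show_identical` is a hypothetical name.

```shell
# Hypothetical helper: print only files whose contents are byte-identical,
# one blank-line-separated group per distinct SHA-256 digest.
# GNU-only: `uniq -w64` compares just the 64 hex digits at the start of each line.
show_identical() {
    sha256sum -- "$@" | sort | uniq -w64 --all-repeated=separate
}

# Usage:
#   show_identical out.xml_pp.*.xml
```

Files with a unique digest (here `in.xml` and `out.xml_pp.nsgmls.xml`) are left out of the result.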
None of these styles is exactly what you wanted.
But "cvs" is pretty close. (And "indented_a" produces identical output.)
Afterthoughts: Output feels a little dirty.
(a) Some of the files just start with a blank line for no good reason...
    $ grep -n '^$' *
    out.xml_pp.record_c.xml:1:
    out.xml_pp.record.xml:1:
(b) ... and some of the files just have no line terminators at all:
    $ file *
    in.xml:                            ASCII text
    out.xml_pp.cvs.xml:                ASCII text
    out.xml_pp.indented_a.xml:         ASCII text
    out.xml_pp.indented_close_tag.xml: ASCII text, with no line terminators
    out.xml_pp.indented_c.xml:         ASCII text, with no line terminators
    out.xml_pp.indented.xml:           ASCII text, with no line terminators
    out.xml_pp.nice.xml:               ASCII text, with no line terminators
    out.xml_pp.none.xml:               ASCII text, with no line terminators
    out.xml_pp.nsgmls.xml:             ASCII text
    out.xml_pp.record_c.xml:           ASCII text
    out.xml_pp.record.xml:             ASCII text
    out.xml_pp.wrapped.xml:            ASCII text, with no line terminators
The cause seems to be that xml_pp does not write a newline after the last line. So if the output is only ONE line long, the file contains no newline byte at all. Quite weird.
`wc --lines` shows the same thing:
    $ wc --lines *
     5 out.xml_pp.cvs.xml
     5 out.xml_pp.indented_a.xml
     0 out.xml_pp.indented_close_tag.xml
     0 out.xml_pp.indented_c.xml
     0 out.xml_pp.indented.xml
     0 out.xml_pp.nice.xml
     0 out.xml_pp.none.xml
     5 out.xml_pp.nsgmls.xml
     1 out.xml_pp.record_c.xml
     1 out.xml_pp.record.xml
     0 out.xml_pp.wrapped.xml
    17 total
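These counts make sense once you recall that `wc --lines` counts newline (LF) bytes, not visible lines, so a final line without a trailing LF is not counted. A quick demonstration using `printf`, which, like xml_pp here, writes no trailing newline unless asked:

```shell
# wc -l counts LF bytes, so an unterminated final line adds nothing to the count.
printf '<a/>'   | wc -l   # one visible line, no LF    -> 0
printf '<a/>\n' | wc -l   # same line, LF-terminated   -> 1
printf '<a\n/>' | wc -l   # two visible lines, one LF  -> 1
```

That is why the single-line outputs report 0 and the six-line cvs output reports 5.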
This is how I like to append a trailing LF (0x0A byte) to every file that lacks one:
    $ mkdir 1; mv out.*.xml 1/; cp -r 1/ 2/
    $ pcregrep -LMr '\n\Z' 2/ | xargs -n1 --no-run-if-empty -- sed -i -e '$a\' --
    $ diff --recursive 1/ 2/ | head
    diff --recursive 1/out.xml_pp.cvs.xml 2/out.xml_pp.cvs.xml
    6c6
    < />
    \ No newline at end of file
    ---
    > />
    diff --recursive 1/out.xml_pp.indented_a.xml 2/out.xml_pp.indented_a.xml
    6c6
    < />
    \ No newline at end of file
Afterwards, every count has gone up by exactly one:
    $ cd 2/
    $ wc --lines *
     6 out.xml_pp.cvs.xml
     6 out.xml_pp.indented_a.xml
     1 out.xml_pp.indented_close_tag.xml
     1 out.xml_pp.indented_c.xml
     1 out.xml_pp.indented.xml
     1 out.xml_pp.nice.xml
     1 out.xml_pp.none.xml
     6 out.xml_pp.nsgmls.xml
     2 out.xml_pp.record_c.xml
     2 out.xml_pp.record.xml
     1 out.xml_pp.wrapped.xml
    28 total
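If `pcregrep` isn't installed, a plain POSIX-shell loop can do the same job. `tail -c1` grabs the last byte, and because `$(...)` strips trailing newlines, the result is empty exactly when the file already ends in LF. A sketch; `ensure_trailing_newline` is a hypothetical name, and empty files are deliberately left alone.

```shell
# Hypothetical helper: append an LF to each given file unless it already ends with one.
# Empty files are skipped (-s is false for them), so they stay empty.
ensure_trailing_newline() {
    for f in "$@"; do
        # $(tail -c1 ...) is empty when the last byte is LF, since $(...) drops trailing newlines.
        if [ -s "$f" ] && [ -n "$(tail -c1 -- "$f")" ]; then
            printf '\n' >> "$f"
        fi
    done
}

# Usage:
#   ensure_trailing_newline 2/*.xml
```

Running it twice is harmless: on the second pass every file already ends in LF, so nothing is appended.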