27

I have a formatted XML file, and I want to convert it to a one-line string. How can I do that?

Sample xml:

<?xml version="1.0" encoding="UTF-8"?>
<books>
    <book>
        <title>Basic XML</title>
        <price>100</price>
        <qty>5</qty>
    </book>
    <book>
        <title>Basic Java</title>
        <price>200</price>
        <qty>15</qty>
    </book>
</books>

Expected output

<?xml version="1.0" encoding="UTF-8"?><books><book> <title>Basic XML</title><price>100</price><qty>5</qty></book><book><title>Basic Java</title><price>200</price><qty>15</qty></book></books> 
2
  • @Tomalak I need that to be pass to a cgi as an input and that cgi only accepts xml in one-line form. Commented Apr 4, 2011 at 14:32
  • @All, thanks a lot for all the answers Commented Apr 4, 2011 at 14:34

11 Answers

48
// filename is the file path string
StringBuilder sb = new StringBuilder();
// try-with-resources closes the reader even if reading fails
try (BufferedReader br = new BufferedReader(new FileReader(new File(filename)))) {
    String line;
    while ((line = br.readLine()) != null) {
        sb.append(line.trim());
    }
}

Using StringBuilder is more efficient than string concatenation: http://kaioa.com/node/59


3 Comments

This will not remove leading/trailing spaces, no?
This doesn't respect the encoding mentioned in the XML document, does it?
Sorry for the off-topic comment, but that link has expired and redirects users to irrelevant domains.
8

Run it through an XSLT identity transform with <xsl:output indent="no"> and <xsl:strip-space elements="*"/>

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="no" />
    <xsl:strip-space elements="*"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

It will remove any of the non-significant whitespace and produce the expected output that you posted.

2 Comments

This seems to be a nice approach, but you did not mention how to run this XSLT in Java.
6
// 1. Read xml from file to StringBuilder (StringBuffer)
// 2. call s = stringBuffer.toString()
// 3. remove all "\n" and "\t" (Strings are immutable, so the result must be reassigned):
s = s.replaceAll("\n", "");
s = s.replaceAll("\t", "");

edited:

I made a small mistake; it is better to use StringBuilder in your case (I suppose you don't need the thread-safe StringBuffer).

4 Comments

What if there was whitespace between a content element e.g. <text>foo (newline) bar</text>?
About the spaces: look at the expected result, where we have e.g. <book> <title>, with a space after <book>. I don't think @sprenna wants to do anything with spaces.
It looks like an error in the example, b/c the other <book><title> combinations have no space in between
That is a typo; there shouldn't be any space in between. Sorry for that.
5

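In Java 1.8 and above:

String content;
try (BufferedReader br = new BufferedReader(new FileReader(filePath))) {
    // joining("\n") keeps the line breaks; Collectors.joining() with no
    // argument would concatenate the lines into a single line instead
    content = br.lines().collect(Collectors.joining("\n"));
}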

1 Comment

If the OP wants to minify the XML, something like this might work for most documents: reader.lines().map(String::trim).collect(Collectors.joining());. Note: it would likely fail in cases where element attributes are split over multiple lines.
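
The one-liner from the comment above can be sketched as a small, complete program. This is only an illustration under stated assumptions: the class name LineMinify and the method name minify are made up for the example, and the input comes from an in-memory string rather than a file. As the comment warns, trimming line by line can glue together attributes that were split across lines, and it also alters text content that spans several lines.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.stream.Collectors;

// Hypothetical helper class, named only for this example.
class LineMinify {

    // Trims every line and concatenates the results without a separator.
    static String minify(BufferedReader reader) {
        return reader.lines()
                .map(String::trim)
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        String xml = "<books>\n"
                + "  <book>\n"
                + "    <title>Basic XML</title>\n"
                + "  </book>\n"
                + "</books>\n";
        // A StringReader stands in for the FileReader used in the answer.
        System.out.println(minify(new BufferedReader(new StringReader(xml))));
    }
}
```

For a real file, the same method could be called with a reader opened on that file, ideally with the charset named in the XML declaration.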
4

Using this answer which provides the code to use Dom4j to do pretty-printing, change the line that sets the output format from: createPrettyPrint() to: createCompactFormat()

public String unPrettyPrint(final String xml) {
    if (StringUtils.isBlank(xml)) {
        throw new RuntimeException("xml was null or blank in unPrettyPrint()");
    }

    final StringWriter sw;
    try {
        final OutputFormat format = OutputFormat.createCompactFormat();
        final org.dom4j.Document document = DocumentHelper.parseText(xml);
        sw = new StringWriter();
        final XMLWriter writer = new XMLWriter(sw, format);
        writer.write(document);
    } catch (Exception e) {
        throw new RuntimeException("Error un-pretty printing xml:\n" + xml, e);
    }
    return sw.toString();
}


4

Open and read the file.

BufferedReader r = new BufferedReader(new FileReader(filename));
String ret = "";
String s;
// readLine() returns null at end of file
while ((s = r.readLine()) != null) {
    ret += s;
}
r.close();
return ret;

3 Comments

ret +=s :(( don't do that, better use StringBuffer
@smas :P it's not real code, I still haven't figured out how to properly format on this site so I went for the most concise way. The idea still holds (if you import the relevant libraries, set up the variables like filename, and set up try{} catch{} blocks)
don't use string concat or stringbuffer as smas suggests, use StringBuilder kaioa.com/node/59
3

The underscore-java library has a static method, U.formatXml(xmlstring). Live example

import com.github.underscore.U;
import com.github.underscore.Xml;

public class MyClass {
    public static void main(String[] args) {
        System.out.println(U.formatXml("<a>\n <b></b>\n <b></b>\n</a>",
                Xml.XmlStringBuilder.Step.COMPACT));
    }
}
// output: <a><b></b><b></b></a>


1

I guess you want to read in, ignore the white space, and write it out again. Most XML packages have an option to ignore white space. For example, the DocumentBuilderFactory has setIgnoringElementContentWhitespace for this purpose.

Similarly, if you are generating the XML by marshalling an object, JAXB has the JAXB_FORMATTED_OUTPUT property.
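
The "read in, drop the white space, write it out again" idea can be sketched with only the JDK's DOM and Transformer classes. Note that this is a swapped-in technique, not the setIgnoringElementContentWhitespace option itself: that option relies on the parser being in validating mode with a content model (for example a DTD), so the sketch instead removes whitespace-only text nodes by hand. The class name DomCompact and its method names are made up for the example, and omitting the XML declaration is a choice made here to keep the output short. Be aware that this also removes a text node consisting only of spaces inside a leaf element such as <a> </a>.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, named only for this example.
class DomCompact {

    // Recursively removes text nodes that contain nothing but whitespace.
    static void stripBlankText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getTextContent().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripBlankText(child);
            }
            child = next;
        }
    }

    // Parses the XML, strips blank text nodes, and serializes without indentation.
    static String compact(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            stripBlankText(doc);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "no");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not compact XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books>\n  <book>\n    <title>Basic XML</title>\n  </book>\n</books>";
        System.out.println(compact(xml));
    }
}
```

Because the document is actually parsed, this approach is not confused by attributes split across lines, unlike line-based trimming.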


1

The above solutions work if you are compressing all white space in the XML document. Other quick options are JDOM (using Format.getCompactFormat()) and dom4j (using OutputFormat.createCompactFormat()) when outputting the XML document.

However, I had a unique requirement to preserve the white space contained within the element's text value and these solutions did not work as I needed. All I needed was to remove the 'pretty-print' formatting added to the XML document.

The solution I came up with is the following three-step regex process, written out step by step to make the algorithm easier to follow.

String regex, updatedXml;

// 1. remove all white space preceding a begin element tag:
regex = "[\\n\\s]+(\\<[^/])";
updatedXml = originalXmlStr.replaceAll( regex, "$1" );

// 2. remove all white space following an end element tag:
regex = "(\\</[a-zA-Z0-9-_\\.:]+\\>)[\\s]+";
updatedXml = updatedXml.replaceAll( regex, "$1" );

// 3. remove all white space following an empty element tag
// (<some-element xmlns:attr1="some-value".... />):
regex = "(/\\>)[\\s]+";
updatedXml = updatedXml.replaceAll( regex, "$1" );

NOTE: The pseudo-code is in Java ... the '$1' is the replacement string which is the 1st capture group.

This will simply remove the white space used when adding the 'pretty-print' format to an XML document, yet preserve all other white space when it is part of the element text value.
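
The three steps can be wrapped into a small runnable sketch to see the effect on text that contains its own white space. The class name PrettyPrintStripper and the method name stripPrettyPrint are made up for the example, and the hyphen in the second character class has been moved to the end so that it is unambiguously a literal; otherwise the patterns are the ones above. One limitation worth keeping in mind: step 1 also removes white space in front of an inline start tag inside mixed content, so "hello <b>world</b>" would become "hello<b>world</b>".

```java
// Hypothetical helper class, named only for this example.
class PrettyPrintStripper {

    static String stripPrettyPrint(String originalXmlStr) {
        // 1. Remove all white space preceding a begin element tag.
        String updatedXml = originalXmlStr.replaceAll("[\\n\\s]+(\\<[^/])", "$1");

        // 2. Remove all white space following an end element tag.
        updatedXml = updatedXml.replaceAll("(\\</[a-zA-Z0-9_\\.:-]+\\>)[\\s]+", "$1");

        // 3. Remove all white space following an empty element tag.
        updatedXml = updatedXml.replaceAll("(/\\>)[\\s]+", "$1");

        return updatedXml;
    }

    public static void main(String[] args) {
        String xml = "<note>\n"
                + "  <to>Tove</to>\n"
                + "  <body>  keep  this  </body>\n"
                + "  <empty/>\n"
                + "</note>\n";
        // The spaces inside <body> are part of the text value and survive.
        System.out.println(stripPrettyPrint(xml));
    }
}
```

For the sample input in main, the result is a single line in which the indentation and line breaks are gone while the spaces inside the body element are left untouched.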


1

Below I present the prepared solution. Only the standard library of Java 1.8 was used.

XSLT:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="no"/>
    <xsl:strip-space elements="*"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Java:

public static String convertXmlToOneLine(String xml) throws TransformerException {
    final String xslt =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n" +
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">\n" +
            "    <xsl:output indent=\"no\"/>\n" +
            "    <xsl:strip-space elements=\"*\"/>\n" +
            "    <xsl:template match=\"@*|node()\">\n" +
            "        <xsl:copy>\n" +
            "            <xsl:apply-templates select=\"@*|node()\"/>\n" +
            "        </xsl:copy>\n" +
            "    </xsl:template>\n" +
            "</xsl:stylesheet>";

    /* prepare XSLT transformer from String */
    Source xsltSource = new StreamSource(new StringReader(xslt));
    TransformerFactory factory = TransformerFactory.newInstance();
    Transformer transformer = factory.newTransformer(xsltSource);

    /* where to read the XML? */
    Source source = new StreamSource(new StringReader(xml));

    /* where to write the XML? */
    StringWriter stringWriter = new StringWriter();
    Result result = new StreamResult(stringWriter);

    /* transform XML to one line */
    transformer.transform(source, result);

    return stringWriter.toString();
}

Sample output:

<?xml version="1.0" encoding="UTF-8"?><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"><xsl:output indent="no"/><xsl:strip-space elements="*"/><xsl:template match="@*|node()"><xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy></xsl:template></xsl:stylesheet> 

License: The MIT License


-2
FileUtils.readFileToString(fileName); 

link

1 Comment

The link even states that the method is deprecated. I wouldn't recommend using this method when a simple buffered read with trim would suffice.
