We have a web application that integrates with a third-party service: our application sends a request and the service returns a response. We are troubleshooting an issue where special or accented characters in the service's response (e.g. Latin accented characters) appear as encoding artifacts in our WildFly logs. We also see an extra non-printable character (shown as a '?') added to the end, which makes the string length incorrect for our reader class.
For example, for the intended string "PARAŇAQUE", the web application logs the response with the Ň rendered as shown in the image below:
Note: We tried to replicate the encoding artifact with online tools but could not reproduce it locally, because it was spotted in our UAT test environment, which is on a different network. We also tried exporting the file and transferring it to a local machine, but the character is displayed differently when the file is opened there.
I've checked with the third-party service provider and they confirmed that they are returning the value as PARAŇAQUE in their response file.
First question: is it correct to consider this a double-encoding issue between the ISO-8859-1 and UTF-8 character sets?
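To make the hypothesis testable without access to UAT, here is a minimal sketch that simulates it locally. It assumes (unconfirmed) that the provider sends UTF-8 bytes and that something on our side decodes them as ISO-8859-1. The class and method names (`MojibakeDemo`, `garble`, `repair`) are hypothetical and not part of our application.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    // Simulates the suspected fault: UTF-8 bytes decoded as ISO-8859-1.
    static String garble(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the fault; only meaningful if the assumption above holds.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+0147 is the accented N from the example string.
        String original = "PARA\u0147AQUE";
        String garbled = garble(original);

        System.out.println(original.length()); // 9
        System.out.println(garbled.length());  // 10

        // Print the code points of every non-ASCII char in the garbled string.
        for (char c : garbled.toCharArray()) {
            if (c > 0x7F) {
                System.out.println(String.format("U+%04X", (int) c)); // U+00C5, then U+0087
            }
        }

        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

Under this assumption the single accented character becomes two characters, one of which (U+0087) is a non-printable control character. That would match both the garbled output and the length being off by one, but it does not prove that this is what happens in UAT.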
We are also looking at ways to resolve this issue. To keep the fix generic and performance-oriented, we are considering these two methods:
```java
// Requires: import java.nio.charset.StandardCharsets;

private String fixResponseEncoding(String responseString) {
    try {
        // Try the most common fix: ISO-8859-1 → UTF-8
        String fixed = new String(responseString.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);

        // Only use the fix if it reduces encoding artifacts
        if (countEncodingArtifacts(fixed) < countEncodingArtifacts(responseString)) {
            LOGGER.info("Applied double-encoding fix (ISO-8859-1 → UTF-8)");
            return fixed;
        }

        // If no improvement, return original
        return responseString;
    } catch (Exception e) {
        LOGGER.warn("Encoding fix failed: {}", e.getMessage());
        return responseString;
    }
}

private int countEncodingArtifacts(String text) {
    int count = 0;

    // Count common double-encoding patterns (Ã followed by high bytes)
    for (int i = 0; i < text.length() - 1; i++) {
        char c1 = text.charAt(i);
        char c2 = text.charAt(i + 1);

        // Ã (0xC3) followed by 0x80-0xBF range
        if (c1 == 'Ã' && c2 >= 0x80 && c2 <= 0xBF) {
            count++;
        }

        // Â (0xC2) followed by 0x80-0xBF range
        if (c1 == 'Â' && c2 >= 0x80 && c2 <= 0xBF) {
            count++;
        }
    }

    // Count replacement characters
    count += text.length() - text.replace("�", "").length();
    return count;
}
```

Second question: is our fix method a correct and effective way to address this? We are aware that the third-party provider's service is also used by other clients.
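To make the second question concrete, here is a small harness that runs the artifact counter on the sample string. It is only a sketch: the class name `ArtifactCheck` is hypothetical, the counter is a copy of the one above written with Unicode escapes (so it does not depend on the source file encoding), and the garbled input is produced under the unconfirmed assumption that UTF-8 bytes are being read as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

class ArtifactCheck {

    // Same logic as countEncodingArtifacts above, using escapes instead of literals.
    static int countEncodingArtifacts(String text) {
        int count = 0;
        for (int i = 0; i < text.length() - 1; i++) {
            char c1 = text.charAt(i);
            char c2 = text.charAt(i + 1);
            // U+00C3 or U+00C2 followed by a char in the 0x80-0xBF range
            if ((c1 == '\u00C3' || c1 == '\u00C2') && c2 >= 0x80 && c2 <= 0xBF) {
                count++;
            }
        }
        // Count U+FFFD replacement characters
        count += text.length() - text.replace("\uFFFD", "").length();
        return count;
    }

    public static void main(String[] args) {
        String original = "PARA\u0147AQUE"; // U+0147 is the accented N
        // Assumed fault: UTF-8 bytes decoded as ISO-8859-1.
        String garbled = new String(
                original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        System.out.println(countEncodingArtifacts(garbled));  // 0
        System.out.println(countEncodingArtifacts(original)); // 0
    }
}
```

In this simulation the accented character encodes to the bytes 0xC5 0x87, so the garbled string contains U+00C5 followed by U+0087. The counter only looks for U+00C3 and U+00C2 as the leading character, so both counts are 0 and `fixResponseEncoding` would return the garbled string unchanged. If the simulation matches what happens in UAT, the heuristic would miss our own example, which is part of why we are unsure about this approach.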
