Is a hex code with a non-printable character a double-encoded character? [closed]

Question

We have a web application that integrates with a third-party service: our application creates a request, and the service sends back a response. We are currently troubleshooting an issue where special or accented characters in the service's response (e.g. Latin accented characters) appear as encoding artifacts in our WildFly logs. We also see an extra non-printable character added at the end (displayed as a '?'), which causes our reader class to compute an incorrect string length.

For example, the intended string is "PARAŇAQUE", but the web application logs the response with the Ň rendered as shown in the following image:

Note: We tried to replicate the encoding artifact using online tools, but could not reproduce it locally. It was spotted in our UAT environment, which is on a different network. We also tried exporting the log file and transferring it to a local machine, but the character is displayed differently when the file is opened there.

I've checked with the third-party service provider and they confirmed that they are returning the value as PARAŇAQUE in their response file.

First question: is it correct that this is also considered a double-encoding issue between ISO-8859-1 and UTF-8 encoding character sets?
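One way to reason about the first question is to simulate the suspected mistake. The sketch below assumes (this is not confirmed by the logs) that the provider sends UTF-8 and that something on the receiving side decodes those bytes as ISO-8859-1. The class and method names are hypothetical. Under that assumption, the two UTF-8 bytes of Ň (0xC5 0x87) become two characters: Å (U+00C5) and U+0087, a non-printable C1 control character. That would account for both the artifact and the extra character in the length count, and strictly speaking it is a single wrong decode rather than a double encoding.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Simulates a reader that assumes the wrong charset: the UTF-8 bytes of
    // the input are decoded as ISO-8859-1 (an assumption about the real bug).
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String intended = "PARA\u0147AQUE"; // PARAŇAQUE
        String garbled = misdecode(intended);

        System.out.println(intended.length()); // 9
        System.out.println(garbled.length());  // 10

        // Print every UTF-16 code unit so the hidden control character is visible.
        for (int i = 0; i < garbled.length(); i++) {
            System.out.printf("U+%04X ", (int) garbled.charAt(i));
        }
        System.out.println();
    }
}
```

If the hex dump of the real logged string shows U+00C5 followed by U+0087 at the position of the Ň, the single-misdecode explanation fits. A different pair of values would point to a different charset combination.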

We are also looking at different ways to resolve this issue. To keep the fix generic and performance-oriented, we are considering the following two methods:

    private String fixResponseEncoding(String responseString) {
        try {
            // Try the most common fix: ISO-8859-1 → UTF-8
            String fixed = new String(responseString.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);

            // Only use the fix if it reduces encoding artifacts
            if (countEncodingArtifacts(fixed) < countEncodingArtifacts(responseString)) {
                LOGGER.info("Applied double-encoding fix (ISO-8859-1 → UTF-8)");
                return fixed;
            }

            // If no improvement, return original
            return responseString;
        } catch (Exception e) {
            LOGGER.warn("Encoding fix failed: {}", e.getMessage());
            return responseString;
        }
    }

    private int countEncodingArtifacts(String text) {
        int count = 0;

        // Count common double-encoding patterns (Ã followed by high bytes)
        for (int i = 0; i < text.length() - 1; i++) {
            char c1 = text.charAt(i);
            char c2 = text.charAt(i + 1);

            // Ã (0xC3) followed by 0x80-0xBF range
            if (c1 == 'Ã' && c2 >= 0x80 && c2 <= 0xBF) {
                count++;
            }

            // Â (0xC2) followed by 0x80-0xBF range
            if (c1 == 'Â' && c2 >= 0x80 && c2 <= 0xBF) {
                count++;
            }
        }

        // Count replacement characters
        count += text.length() - text.replace("�", "").length();

        return count;
    }

Second question: Is our fix correct, and is it an effective way to address this? We are aware that the third-party provider's service is also used by other clients.
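One limitation of the heuristic is worth noting: countEncodingArtifacts only looks for Ã and Â as lead characters. The UTF-8 encoding of Ň starts with 0xC5, which shows up as Å when misread as ISO-8859-1, so the counter would report zero artifacts for this example and the fix would not be applied. A possible alternative, sketched below with hypothetical names, skips pattern counting and instead asks the UTF-8 decoder to reject anything malformed. It assumes the damage is exactly "UTF-8 bytes decoded as ISO-8859-1" and leaves the string untouched otherwise.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictRepair {

    // Returns the re-decoded string only when every char fits in ISO-8859-1
    // and the resulting bytes are well-formed UTF-8; otherwise returns the
    // input unchanged.
    static String repairIfMisdecoded(String s) {
        if (!StandardCharsets.ISO_8859_1.newEncoder().canEncode(s)) {
            return s; // has chars above U+00FF, so it is not Latin-1 mojibake
        }
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strict.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return s; // bytes are not valid UTF-8, so leave the text alone
        }
    }
}
```

This is still a repair after the fact. A legitimately Latin-1 string that happens to form valid UTF-8 byte sequences (for example the literal text "Ã©") would be altered, so decoding the response bytes with the right charset in the first place is generally the more reliable route.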

I'm guessing that there's just not enough information here for anyone to determine what is happening. Even your example doesn't look like real code. We like short examples here, but you also need to reproduce the actual problem, and I don't see how the code does that. The actual combination of sub-systems and environments is beyond my pay grade, but I wanted to add that thought if it helps you improve the question.
– markspace, Commented Nov 11 at 18:04
Whatever the problem is, it occurs where bytes or an InputStream are being used. You need to make sure the proper encoding is used before any Strings are created. "Converting" a String is an antipattern and is not reliable, since not all byte values are guaranteed to be preserved when a String is created.
– VGR, Commented Nov 11 at 18:11
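As a rough illustration of the point in the comment above, the sketch below decodes the raw response bytes with an explicitly chosen charset at the moment the String is created. The names ResponseReader and readBody are hypothetical, and UTF-8 is only an assumption about what the provider sends. InputStream.readAllBytes requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ResponseReader {

    // Reads the whole stream and decodes it with the given charset, so the
    // platform default charset never gets a chance to interfere.
    static String readBody(InputStream in, Charset charset) throws IOException {
        return new String(in.readAllBytes(), charset);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the provider's response body (assumed to be UTF-8).
        byte[] wire = "PARA\u0147AQUE".getBytes(StandardCharsets.UTF_8);

        String body = readBody(new ByteArrayInputStream(wire), StandardCharsets.UTF_8);
        System.out.println(body.length()); // 9
    }
}
```

With the charset fixed at this boundary, no later String-to-String conversion is needed.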
Instead of, or in addition to, the image, add a textual hex dump of the string to the question.
– aled, Commented Nov 11 at 19:32
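A textual hex dump like the one requested can be produced with a few lines of Java. The helper below is a minimal sketch with hypothetical names. It prints the code points of a String and, separately, the hex values of a byte array, which is useful when the raw response bytes are still available.

```java
import java.nio.charset.StandardCharsets;

public class HexDump {

    // Formats each Unicode code point of the string as U+XXXX.
    static String codePointsHex(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // Formats each byte as two uppercase hex digits.
    static String bytesHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u0147"; // Ň
        System.out.println(codePointsHex(s));                               // U+0147
        System.out.println(bytesHex(s.getBytes(StandardCharsets.UTF_8)));   // C5 87
    }
}
```

Logging codePointsHex for the suspicious string in the UAT environment avoids the problem of the character changing appearance when the log file is copied to another machine.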
"We have a web application that integrates with a third-party service where the service will send a response after receiving a request that is created from our application." The encoding of this response must be specified. So why not find that out and simply use that encoding when reading it?
– g00se, Commented Nov 11 at 21:14
