
So, I have an issue that really bothers me. I have a simple parser that I made in java. Here is the piece of relevant code:

    while ((line = br.readLine()) != null) {
        String splitted[] = line.split(SPLITTER);
        int docNum = Integer.parseInt(splitted[0].trim());
        // do something
    }

The input file is a CSV file whose first entry is an integer. When I start parsing, I immediately get this exception:

    Exception in thread "main" java.lang.NumberFormatException: For input string: "1"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:580)
        at java.lang.Integer.parseInt(Integer.java:615)
        at dipl.parser.TableParser.parse(TableParser.java:50)
        at dipl.parser.DocumentParser.main(DocumentParser.java:87)

I checked the file; it indeed has 1 as its first value (no other characters are in that field), but I still get the message. I suspect the file encoding may be the cause: it is UTF-8 with Unix line endings, and the program runs on Ubuntu 14.04. Any suggestions on where to look for the problem are welcome.

1 Comment

  • Nice one using copy and paste to put the error in the question! Commented Sep 26, 2016 at 11:11

1 Answer


You have a BOM in front of that number; if I copy what looks like "1" in your question and paste it into vim, I see that you have a FE FF (i.e., a BOM) in front of it. From that link:

The exact bytes comprising the BOM will be whatever the Unicode character U+FEFF is converted into by that transformation format.

So that's the issue: consume the file with a reader appropriate for the transformation format the file is actually encoded in (UTF-8, UTF-16 big-endian, UTF-16 little-endian, etc.). See also this question and its answers for more about reading Unicode files in Java.
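As a concrete illustration, here is one way to handle it in Java: read the file with an explicit charset and strip a leading U+FEFF from the first line before splitting. This is a minimal sketch, assuming a comma delimiter and the class/method names (`BomSafeParser`, `stripBom`, `parse`) are made up for the example, not from the original code:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BomSafeParser {
    private static final String SPLITTER = ","; // assumed delimiter

    /** Removes a leading U+FEFF (the BOM after decoding) if present. */
    static String stripBom(String line) {
        if (!line.isEmpty() && line.charAt(0) == '\uFEFF') {
            return line.substring(1);
        }
        return line;
    }

    static void parse(Path file) throws IOException {
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            boolean first = true;
            while ((line = br.readLine()) != null) {
                if (first) {
                    line = stripBom(line); // only the first line can carry a BOM
                    first = false;
                }
                String[] splitted = line.split(SPLITTER);
                int docNum = Integer.parseInt(splitted[0].trim());
                // do something with docNum
            }
        }
    }
}
```

For files that might be UTF-16, swap in the matching `StandardCharsets` constant (or a BOM-detecting reader such as Apache Commons IO's `BOMInputStream`) rather than hard-coding UTF-8.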


4 Comments

@Doval: Thank you, I was absolutely wrong to say it was a UTF-8 BOM, and you're quite right that on-the-wire, the BOM for UTF-8 is EF BB BF. But what we're looking at is the end result of reading the file and then seeing the output in the error message. The file might be in any transformation; all BOMs end up being FE FF once read.
But if it was read raw, then...oh, I don't know. :-) Could well have been UTF-16. :-) It'll all depend on how the file was read into the stream.
"all BOMs end up being FE FF once read" - Not quite. All BOMs end up being U+FEFF (which is not the same as 0xFE 0xFF since it's a code point rather than a sequence of bytes) once decoded. Before decoding, all you have is bytes, which may be in any encoding that can represent Unicode characters (mostly UTF-8 and UTF-16 but others exist).
@Kevin: Yes, that's what I meant.
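The distinction the comments draw between the code point U+FEFF and the bytes on disk can be checked with a short snippet (a sketch; the class name `BomDemo` is invented for the example): the same BOM-prefixed string encodes to different byte sequences under UTF-8 and UTF-16BE, yet both decode back to a string beginning with the single character U+FEFF.

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    public static void main(String[] args) {
        String s = "\uFEFF1"; // BOM code point followed by "1"

        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);       // EF BB BF 31
        byte[] utf16be = s.getBytes(StandardCharsets.UTF_16BE); // FE FF 00 31

        // Different byte sequences on the wire...
        System.out.println(utf8.length + " vs " + utf16be.length);

        // ...but both decode to the same string, starting with U+FEFF.
        String fromUtf8 = new String(utf8, StandardCharsets.UTF_8);
        String fromUtf16 = new String(utf16be, StandardCharsets.UTF_16BE);
        System.out.println(fromUtf8.equals(fromUtf16));          // true
        System.out.println(fromUtf8.charAt(0) == '\uFEFF');      // true
    }
}
```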
