
I am having an issue with "umlauts" (letters ä, ü, ö, ...) and ifstream in C++.

I use curl to download an html page and ifstream to read in the downloaded file line by line and parse some data out of it. This goes well until I have a line like one of the following:

te="Olimpija Laibach - Tromsö";
te="Burghausen - Münster";

My code parses these lines and outputs them as follows:

Olimpija Laibach vs. Troms?
Burghausen vs. M?nster

Things like outputting umlauts directly from the code work:

cout << "öäü" << endl; // This works fine 

My code looks somewhat like this:

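ifstream fin("file");
string line;
// Loop on getline itself; testing eof() first would process the last line twice.
while (getline(fin, line)) {
    string::size_type pos = line.find("te=");
    if (pos != string::npos) {   // find() returns npos, not a negative int
        pos = line.find(" - ");
        string team1 = line.substr(4, pos - 4);
        string team2 = line.substr(pos + 3, line.length() - pos - 6);
        cout << team1 << " vs. " << team2 << endl;
    }
}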

Edit: The weird thing is that the same code (the only changed things are the source and the delimiters) works for another text input file (same procedure: download with curl, read with ifstream). Parsing and outputting a line like the following is no problem:

<span id="...">Fernwärme Vienna</span> 
  • Once you know what the encoding of the input is, some of the examples at cppreference may help, e.g. here Commented Jul 23, 2012 at 8:25
  • possible duplicate of does (w)ifstream support different encodings Commented Jul 23, 2012 at 8:26
  • I just edited and extended my question. I don't understand why the (nearly) same code is working with another input. Commented Jul 23, 2012 at 8:44
  • Usually std::cout << "öäü" << std::endl; also does not work. Commented Sep 7, 2021 at 11:50

1 Answer

What's the locale embedded in fin? In the code you show, it would be the global locale, which if you haven't reset it, is "C".

If you're anywhere outside the Anglo-Saxon world (and the strings you show suggest that you are), one of the first things you do in main should be

std::locale::global( std::locale( "" ) ); 

This sets the global locale (and thus the default locale for any streams opened later) to the locale being used in the surrounding environment. (Formally, to an implementation-defined native environment, but in practice, to whatever the user is using.) In the "C" locale, the encoding is almost always ASCII; ASCII doesn't recognize umlauts, and according to the standard, illegal encodings in input should be replaced with an implementation-defined character (IIRC; it's been some time since I've actually reread this section). In output, of course, you're not supposed to have any unknown characters, so the implementation doesn't check for them, and they go through.

Since std::cin, etc. are opened before you have a chance to set the global locale, you'll have to imbue them with std::locale( "" ) specifically.
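
A minimal sketch of the two steps above: set the global locale, then imbue the standard streams that already exist. The helper name use_native_locale is made up for illustration, and the try/catch is an added assumption: std::locale("") can throw std::runtime_error when the environment names a locale that is not installed.

```cpp
#include <iostream>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: install the user's native locale globally and on the
// standard streams, which were constructed before main() started.
// Returns the name of the locale that ended up in use.
std::string use_native_locale()
{
    std::locale loc;  // copy of the current global locale ("C" at startup)
    try {
        loc = std::locale("");  // implementation-defined native locale
    } catch (const std::runtime_error&) {
        // The environment names a locale that is not installed here;
        // keep the current global locale instead.
    }
    std::locale::global(loc);  // default for streams created from now on
    std::cin.imbue(loc);       // existing streams must be imbued explicitly
    std::cout.imbue(loc);
    std::cerr.imbue(loc);
    return loc.name();
}
```

Calling use_native_locale() as the first statement in main, before any ifstream is constructed, means later file streams pick up the same locale by default.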

If this doesn't work, you might have to find some specific locale to use.
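
One way to look for a specific locale, sketched under the assumption that you have a short list of candidate names to try. The helper first_available_locale is hypothetical, and locale names are platform-specific, so a name that works on one system may not exist on another.

```cpp
#include <initializer_list>
#include <locale>
#include <stdexcept>

// Hypothetical helper: return the first locale from `names` that this system
// can construct, or the classic "C" locale if none of them is installed.
std::locale first_available_locale(std::initializer_list<const char*> names)
{
    for (const char* name : names) {
        try {
            return std::locale(name);
        } catch (const std::runtime_error&) {
            // Not installed on this system; try the next candidate.
        }
    }
    return std::locale::classic();
}
```

For example, std::locale::global(first_available_locale({"de_DE.UTF-8", "en_US.UTF-8"})) would fall back to "C" on a machine with neither installed. On Linux, running locale -a lists the names that are actually available.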


11 Comments

Figuring out the encoding of HTML is non-trivial. (In the best case, you find a line like <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">.) Using the user's locale is only a slightly better guess.
Unfortunately this did not help. I included std::locale::global( std::locale( "de_DE.UTF-8" ) ); as the first line in main, but the output stays the same. Worth mentioning that I am using an Amazon EC2 instance in the US to compile and run the code.
@mike: Is UTF-8 actually the input encoding? (It could be ISO-8859-1 or ISO-8859-15, or something completely different.) Is de_DE.UTF-8 actually supported on the system you're using?
Found the following line in the HTML of the page that is not working for me: <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">. Changed the locale to std::locale::global( std::locale( "de_DE.iso88591" ) ); but the problem stays the same. No difference with std::locale::global( std::locale( "de_DE.iso885915@euro" ) ); either.
@MSalters If you're reading HTML, then the header should contain an indication of the encoding, and you can imbue the corresponding locale.
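
The comments above point at reading the declared charset from the page itself. A rough sketch of that idea follows, together with a byte-level conversion that could be applied if the page really is ISO-8859-1 and the terminal expects UTF-8. That last part is an assumption; the thread does not confirm it. Both helper names, sniff_charset and latin1_to_utf8, are hypothetical, and the charset scan is a crude string search, not an HTML parser.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: return the lowercased value after the first "charset="
// in `html`, or an empty string if there is none.
std::string sniff_charset(const std::string& html)
{
    std::string lower(html.size(), '\0');
    std::transform(html.begin(), html.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "charset=";
    std::string::size_type pos = lower.find(key);
    if (pos == std::string::npos) return "";
    pos += key.size();
    // Skip an opening quote, as in <meta charset="...">.
    if (pos < lower.size() && (lower[pos] == '"' || lower[pos] == '\'')) ++pos;

    std::string::size_type end = pos;
    while (end < lower.size() &&
           (std::isalnum(static_cast<unsigned char>(lower[end])) ||
            lower[end] == '-' || lower[end] == '_'))
        ++end;
    return lower.substr(pos, end - pos);
}

// Convert ISO-8859-1 bytes to UTF-8. Every Latin-1 byte equals the Unicode
// code point of the same value, so bytes >= 0x80 become two-byte sequences.
std::string latin1_to_utf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);                  // plain ASCII
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}
```

With these, the parsing loop could check sniff_charset once per file and, when it reports iso-8859-1, pass team1 and team2 through latin1_to_utf8 before writing them to cout. This works on raw bytes and does not depend on which locales are installed.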