8

I need to read all attributes of one Stern (English: Star) object from a text file that looks like the example below. I need to replace the string "leer" with "", but the same field can also hold a valid string, which must not be replaced.

I.e., for another Stern object there could be "leer" instead of "Sol" as well.

Problem:
The problem is that it doesn't replace "leer" with "". It seems to save "leer\\r" in the object instead of just "leer", but I tried to replace "leer\\r" as well and it still doesn't work.

This is one Stern in the text file that should be read:

0 Sol 0.000005 0.000000 0.000000 leer 1 0 

And this is my operator >> to read it:

    istream& operator>>(istream& is, Stern& obj)
    {
        string dummy;
        is >> obj.m_ID;
        getline(is, dummy);
        getline(is, obj.m_Bez);
        if (obj.m_Bez == "leer")
            obj.m_Bez = "";
        is >> obj.m_xKoord >> obj.m_yKoord >> obj.m_zKoord;
        getline(is, dummy);
        getline(is, obj.m_Sternbild);
        if (obj.m_Sternbild == "leer")
            obj.m_Sternbild = "";
        is >> obj.m_Index >> obj.m_PrimID;
        return is;
    }

Stern.h:

    #ifndef STERN_H
    #define STERN_H

    #include <string>
    #include <iostream>
    using namespace std;

    class Stern {
    public:
        Stern();
        // 2.a)
        //Stern(int m_ID, string m_Bez, float m_xKoord, float m_yKoord, float m_zKoord, string m_Sternbild, int m_Index, int m_PrimID);
        virtual ~Stern();
        void print() const;
        // 1.b)
        friend ostream& operator<<(ostream& os, const Stern& obj);
        // 1.b)i.
        friend istream& operator>>(istream& is, Stern& obj);
    private:
        int m_ID;
        string m_Bez;
        float m_xKoord;
        float m_yKoord;
        float m_zKoord;
        string m_Sternbild;
        int m_Index;
        int m_PrimID;
    };

    #endif /* STERN_H */
28
  • And what is the problem with the code you show? Commented Aug 30, 2017 at 9:08
  • The problem is that it doesn't replace "leer" with "". It seems to save "leer\\r" in the object instead of just "leer", but I tried to replace "leer\\r" as well and it still doesn't work. Commented Aug 30, 2017 at 9:09
  • If the input is the same as the one in your description, then I guess it's because of the whitespace before the word "leer"? Don't forget you are using getline(is, obj.m_Bez); and that doesn't remove whitespace. Try trimming the string first, then check for equality. Commented Aug 30, 2017 at 9:11
  • The exact values saved are: m_ID: 0 m_Bez: "Sol\\r" m_xKoord:4.99999987e-06 Commented Aug 30, 2017 at 9:13
  • And if you step through the code line by line in a debugger, what do you notice then? Are the values you read the correct ones, the ones you expect? Commented Aug 30, 2017 at 9:14

3 Answers

3

You could use this to remove any unwanted characters returned by std::getline.

    // requires #include <algorithm> for std::remove
    // std::string s;
    // std::getline(input, s);
    s.erase(std::remove(s.begin(), s.end(), '\r'), s.end());
    s.erase(std::remove(s.begin(), s.end(), '\n'), s.end());

This works on Linux systems where the input file uses CRLF line endings. On Linux, std::getline searches for the \n character and nothing translates the line endings, so it returns an extra \r at the end of each line.

The behaviour on other systems differs in where the \r comes from, not in what getline searches for:

  • std::getline searches for \n by default on every platform. macOS uses \n line endings like Linux, so a CRLF file read there also leaves a trailing \r. (The above still works, because it erases the \r.)
  • On Windows, a stream opened in text mode translates \r\n to \n before getline sees the data, so no \r is left over, and files with plain \n endings are still split into lines correctly. A stream opened in binary mode does no translation, so getline returns the trailing \r just as on Linux.
  • I haven't tested the macOS or Windows cases myself, because I don't happen to have an OS X system available, or a Windows system set up for development work.

6 Comments

With C++20: std::erase(s, '\r'); std::erase(s, '\n');
I'm using Visual Studio 2022 in Windows and "getline" is returning "\r". "getline" should have an option for choosing multiple line endings (strings and characters). I landed on this page hoping someone had made such a function. Guess I'm out of luck... 😒
@andrepacheco you could use the above two lines and wrap them in a function to do that
A big correction: I think I'm getting the "\r" because I created the stream in binary mode, I think text mode should work for same-OS text files. Anyway, I think I've found my wanted function in the selected answer of https://stackoverflow.com/questions/6089231/getting-std-ifstream-to-handle-lf-cr-and-crlf. Haven't tried it though...
@FreelanceConsultant Thanks for your answer but that would not have the performance I am looking for. 🙂
2

The problem is that on Windows a newline is represented as CR + LF, which is "\r\n", while on Unix it is LF, which is just "\n".
Your std::getline(...) call reads up to the "\n" in "leer\r\n" and discards the "\n", so your resulting string will be:

"leer\r" 

To solve this problem and convert files between Unix and Windows formats, there are two tools: dos2unix and unix2dos. The Ubuntu equivalents are fromdos and todos; you will need fromdos to convert your Windows text file to a Unix text file.

To test whether a file uses CR + LF or LF, you can run:

dos2unix < myfile.txt | cmp -s - myfile.txt 

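cmp exits with status 0 if dos2unix changed nothing (the file already uses LF) and with status 1 if the file contains CR + LF line endings. This was answered on the Unix & Linux StackExchange site.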


And it seems like it saves "leer\\r" in the object instead of only "leer" but I tried to replace "leer\\r" as well and it still doesn't work. I still can't understand why my if (obj.m_Sternbild == "leer\\r") didn't work, because IMO it should have worked?

It should be:

if (obj.m_Sternbild == "leer\r") 

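without escaping the backslash \. The literal "leer\\r" ends in two characters, a backslash followed by r, whereas the string read from the file ends in a single carriage-return character, which is written "\r" in source code.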
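
To make the difference between the two literals concrete, here is a small sketch that reads "leer\r\n" from an in-memory stream and compares the result against both spellings. The function names are hypothetical and exist only for illustration; a std::istringstream stands in for the file because it never translates line endings, which mimics reading a CRLF file on Linux.

```cpp
#include <sstream>
#include <string>

// Reads the first line of an in-memory buffer with std::getline.
std::string first_line(const std::string& data) {
    std::istringstream in(data);
    std::string line;
    std::getline(in, line);  // stops at '\n' and discards it
    return line;
}

// "leer\r" is 5 characters and ends in a carriage return.
bool matches_cr() {
    return first_line("leer\r\n") == "leer\r";
}

// "leer\\r" is 6 characters: l, e, e, r, backslash, r.
bool matches_backslash() {
    return first_line("leer\r\n") == "leer\\r";
}
```

Under these assumptions matches_cr() is true and matches_backslash() is false, which is why the comparison against "leer\\r" never fired.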

Edit:

As @FreelanceConsultant writes in the comment below, the above is not a general solution, because a binary compiled on either Windows or Unix should work with text files from both platforms.

There are two solutions for that.

The obvious one is to compare against both versions of the input. With std::getline, a Windows-formatted file gives "leer\r" and a Unix-formatted file gives "leer".

if (obj.m_Sternbild == "leer\r" || obj.m_Sternbild == "leer") 

Another solution is to normalize the newline representation to one form and check only against that. Which one you pick is a matter of taste and performance, because normalizing may require creating new strings. See his answer for an example.
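
A minimal sketch of the normalizing approach, assuming the only variation is a trailing \r left over by std::getline. normalize_field is a hypothetical helper name; it folds the "leer" check into the same step so the comparison is written only once.

```cpp
#include <string>

// Hypothetical helper: normalizes one field read with std::getline.
// First strips a trailing '\r' (left over from CRLF files), then maps
// the placeholder "leer" to an empty string.
std::string normalize_field(std::string field) {
    if (!field.empty() && field.back() == '\r') {
        field.pop_back();
    }
    if (field == "leer") {
        field.clear();
    }
    return field;
}
```

In the operator>> from the question this could be applied right after each getline, e.g. `obj.m_Sternbild = normalize_field(obj.m_Sternbild);`. Taking the argument by value keeps the helper simple at the cost of one copy or move per field.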

2 Comments

This isn't really a solution. Some text files are created on Unix systems, some are created on Windows systems. Regardless of which platform a binary is compiled for, it should work with both text files created on Windows systems or Linux systems.
@FreelanceConsultant: Yes, you are right. It should work on Unix or Windows without any additional steps on the input file. To achieve that, you need either to compare against both string versions (if (obj.m_Sternbild == "leer\r" || obj.m_Sternbild == "leer")) or to normalize the string in your own code before the check, without an external program.
2

And it seems like it saves "leer\r" in the object instead of only "leer"

You can either trim the string you get from getline or use getline in combination with a stringstream:

    std::string line;
    getline(is, line);
    std::stringstream ss(line);
    std::string trimmed_string;
    ss >> trimmed_string;

Now trimmed_string will contain only the desired string, with no line ending, no trailing or leading whitespace, and no other stuff.

PS: this only works if the string you want to read does not itself contain whitespace. If it does, you have to resort to a bit more involved massaging of the string you get from getline, or choose some special character that you can replace with whitespace after reading (e.g. read "Alpha_Centauri" and then replace "_" with " " to get "Alpha Centauri").
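
For names that contain spaces (the comments mention "96 G. Psc"), trimming only the two ends of the line avoids cutting off data. A minimal sketch, where trim is a hypothetical helper name and the set of characters treated as whitespace is an assumption:

```cpp
#include <string>

// Hypothetical helper: removes leading and trailing whitespace
// (including '\r' and '\n') but keeps whitespace inside the string.
std::string trim(const std::string& s) {
    const char* const ws = " \t\r\n\f\v";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) {
        return "";  // empty or all whitespace
    }
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}
```

Applied to the line returned by getline, this turns "96 G. Psc\r" into "96 G. Psc" and "leer\r" into "leer", so the equality check against "leer" can stay as it is.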

6 Comments

Yes, the problem is that there are m_Bez values (basically the name of the star) that look like this: "96 G. Psc". And I wouldn't be allowed to change the txt file in any way.
@CraigHarrison then unfortunately my answer does not help. Maybe I will edit it later....
Thanks for trying to help me. I really appreciate it!
Just note that std::getline() will read an entire line as-is up to the line break, whereas ss >> will skip leading whitespace and then read up to the first whitespace or end-of-string, whichever occurs first. So, ss >> is not just trimming if the line has any non-leading/trailing space in it before the line break. You would be chopping off actual data. Trimming involves scanning and removing only leading + trailing whitespace, not any whitespace in the middle.
@RemyLebeau that's what my PS is about. I was hoping that a simple solution could help and didn't have time yet to improve the answer.
