6
\$\begingroup\$

I have started to practice programming by doing exercises and I want to get feed back to improve my self in nice clean coding and I found this website. The program I wrote now is a word transformation map.

It has a text file contains a pair of words in each line (transferor.txt), and an input text file(input.txt).


transferor.txt contains:

 'em them cuz because gratz grateful i I nah no pos supposed sez said tanx thanks wuz was 

and input.txt file contains:

 nah i sez tanx cuz i wuz pos to not cuz i wuz gratz 

The result is an output file (output.txt) containing:

 no I said thanks because I was supposed to not because I was grateful 

Here is the code:

#include <iostream> #include <map> #include <fstream> #include <string> #include <sstream> int main() { std::fstream fi; std::fstream fo; const char* transferor_dir = "/mnt/c/Users/meysa/source/repos/CPP_VIM_PRACTICE/ourplan/transferor.txt"; fi.open(transferor_dir,std::fstream::in); if(!fi.is_open()) std::cerr << "Problem in opening the transferor file" << std::endl; std::string key,mapped; std::map<std::string,std::string> maptransferor; while(fi >> key >> mapped ) { maptransferor.insert(std::make_pair(key,mapped)); } fi.close(); const char* input = "/mnt/c/Users/meysa/source/repos/CPP_VIM_PRACTICE/ourplan/input.txt"; const char* output = "/mnt/c/Users/meysa/source/repos/CPP_VIM_PRACTICE/ourplan/output.txt"; fi.open(input,std::fstream::in); fo.open(output,std::fstream::out); if(!fi.is_open()) std::cerr << "There is a problem in opening the input file! " << std::endl; std::string line; std::string word; while(std::getline(fi,line)) { std::stringstream ss(line,std::ios_base::in); while( ss >> word) { if(maptransferor.count(word)) fo << (maptransferor.find(word))->second << " "; else fo << word << " "; } fo << std::endl; } fi.close(); fo.close(); return 0; } 
\$\endgroup\$
2
  • 1
    \$\begingroup\$ Use more functions, use exceptions rather then fi.is_open() and have a look std::unordered_map it will have faster lookup then map. And to add key/value to the map use emplace_back({key,mapped}) it will avoid some unecessary temporaries. \$\endgroup\$ Commented Dec 19, 2021 at 9:32
  • 1
    \$\begingroup\$ This is a really good learning/practice project, and your swing at it is also quite good. I’m not going to review, because the reviews you already have are perfect. However, I will suggest the next stage of this learning challenge: try making it so that instead of single words, multiple words can be replaced. For example, “alot” could be replaced with “a lot” (easy), and “non existent” could be replaced with “nonexistent” (very hard, especially if words can be broken across lines (“non\nexistent”) or with multiple spaces, and if there are common first words). \$\endgroup\$ Commented Dec 19, 2021 at 22:57

5 Answers 5

6
\$\begingroup\$

Your code is currently tied to very specific pathnames, and would need to be recompiled to work with a different mapping dictionary, or different input and output files.

I recommend using int main(int argc, char **argv) so that you can use a command argument to specify the mapping file, and once that's successfully read, then the program can operate as a filter (take input from std::cin and write output to std::cout).

You would then invoke the program as

translate transferor.txt <input.txt >output.txt 

It's also a good idea to divide the program into separately testable functions. I'd start with

std::map<std::string,std::string> make_mapping(std::istream& input); 

and

void map_text(const std::map<std::string,std::string>& mapping, std::istream& input, std::ostream& output); 

I'm pleased that you check that the files can be opened:

if(!fi.is_open()) std::cerr << "There is a problem in opening the input file! " << std::endl; 

However, if the opening failed, then there's really no point reading from it (or even opening an output file):

fi.open(input); if (!fi.is_open()) { std::cerr << "There is a problem in opening the input file!\n"; return EXIT_FAILURE; } fo.open(output, std::ofstream::trunc); 

We do two lookups in the map here:

 if(maptransferor.count(word)) fo << (maptransferor.find(word))->second << " "; else 

We can remember the result of find() and use it like this:

 auto tr = maptransferor.find(word); if (tr != maptransferor.end()) { fo << it->second << " "; } else { 

We should be checking the return value from the close() operation on each stream - several kinds of failure make themselves apparent that way.

(If we wanted to ignore close() failures, we could just omit those calls because the streams' destructors will do exactly that.)

\$\endgroup\$
0
6
\$\begingroup\$

In addition to Toby Speight's answer:

Use '\n' instead of std::endl

Prefer using '\n' instead of std::endl; the latter is equivalent to the former, but also forces the output to be flushed, which is usually not necessary and will lower performance.

Missing error checking

You check whether a file has succesfully been opened, but errors can happen at any point during I/O operations. I suggest that you do the following:

  • For every input stream, after you reached the end, check that fi.eof() is true. If it is not, then an error occured before you reached the end.
  • For every output stream, call fo.flush() and check that the result of that is true. For every file output stream, call fo.close() and check that the result of that is true.

Use std::ifstream and/or std::ofstream instead of std::fstream

Rarely do you need to open a file for both reading and writing. Use the input and output-specific file stream objects when you know that a file only has to either be read from or written to. This will catch errors if you accidentily do write to an input stream or read from an output stream.

\$\endgroup\$
1
  • \$\begingroup\$ Very help full, Thank you. Yes you are write '\n' is shorter and do kind of the same in my case. \$\endgroup\$ Commented Dec 19, 2021 at 15:08
5
\$\begingroup\$

Here are some things that may help you improve your program.

Combine declaration and initialization where practical

The code currently contains separate lines to declare the various fstreams and then to open the file. Better would be to initialize variables when they are declared where practical. See ES.20.

Consider the user

Instead of having a hardcoded filename, it might be nice to allow the user to control the name and location of the input and output files. For this, it would make sense to use a command line argument and then pass the filename to the functions as needed.

Prefer specialized objects

Instead of std::fstream, a more specialized std::ifstream could be used. This has the advantage of making it clear to readers of the code which files are input files and which are outputs.

Decompose your program into functions

All of the logic here is in main but it would be better to decompose this into separate functions. There are many ways to do this, but for example, see the next suggestion.

Consider separating I/O from the algorithm

In this program, there are two separate things that are happening. The first thing is to create the map for use in translation, and then the second thing is to process words as they are read using that map. I'd suggest creating two functions for that, and then use them in main.

Don't use std::endl if you don't really need it

The difference betweeen std::endl and '\n' is that '\n' just emits a newline character, while std::endl actually flushes the stream. This can be time-consuming in a program with a lot of I/O and is rarely actually needed. It's best to only use std::endl when you have some good reason to flush the stream and it's not very often needed for simple programs such as this one. Avoiding the habit of using std::endl when '\n' will do will pay dividends in the future as you write more complex programs with more I/O and where performance needs to be maximized.

Use efficient data structures

The std::map that you're using works, but it provides logarithmic complexity for lookup. There's no real need to have it sorted, so you could use std::unordered_map instead for a more efficient program with no real work on your part.

Avoid doing the same operation twice

The algorithm currently uses count and then find with the same key. Better would be to do the operation once instead. I'd suggest using find once and then comparing the returned value to the .end() iterator.

Use a more modern style for inserting objects

The code currently contains this line:

maptransferor.insert(std::make_pair(key,mapped)); 

There is nothing wrong with that usage, but one can make it slightly clearer, in my opinion, by letting the compiler implicitly create the pair:

maptransferor.insert({key,mapped}); 

Consider using the program as a pipe

A common operation for text processing programs like this one, is to chain them together so that the output of one file becomes the input to the next. This is particularly common in Linux, but is well supported on Windows and Mac as well. To do this, the program need only take its input from std::cin and write its output to std::cout.

Results

Here is a rewrite incorporating these ideas:

#include <fstream> #include <iostream> #include <sstream> #include <string> #include <unordered_map> std::unordered_map<std::string,std::string> createDict(std::istream& dictfile) { std::string key,mapped; std::unordered_map<std::string,std::string> dict; while(dictfile >> key >> mapped) { dict.insert({key,mapped}); } return dict; } std::ostream& translate(std::istream& in, std::ostream& out, const std::unordered_map<std::string, std::string>& dict) { std::string line; while (std::getline(in, line)) { std::string word; std::stringstream liness{line}; while (liness >> word) { auto dictEntry{dict.find(word)}; if (dictEntry != dict.end()) { out << dictEntry->second << ' '; } else { out << word << ' '; } } out << '\n'; } return out; } int main(int argc, char *argv[]) { if (argc != 2) { std::cerr << "Usage: translate dictfile\n"; return 1; } std::ifstream dictfile{argv[1]}; if(!dictfile) { std::cerr << "Error: could not open the dictionary file" << argv[1] << "\n"; return 2; } auto dict{createDict(dictfile)}; dictfile.close(); translate(std::cin, std::cout, dict); } 
\$\endgroup\$
10
  • 1
    \$\begingroup\$ Oh my GOSH, I love this website, I will try to implement all your comments. This is really wonderful.(Sorry, I am a bit excited because I've never had feed back to improve my coding style), and it's very helpful to have your idea and feedbacks \$\endgroup\$ Commented Dec 19, 2021 at 17:51
  • \$\begingroup\$ The dictFile.close() line is redundant here. The ifstream destructor will close it at the end of main(). \$\endgroup\$ Commented Dec 19, 2021 at 19:26
  • \$\begingroup\$ @MarkH it is not required but it is not redundant. It was a choice to make sure that the file handle was released before calling translate. Since file handles are typically a finite resource, it is sensible to release them as soon as practical. \$\endgroup\$ Commented Dec 19, 2021 at 19:29
  • \$\begingroup\$ I completely missed the use of std::fstream rather than specific in and out streams (perhaps because I see what I expect to see). Thanks for commenting on that one! BTW, ignoring errors from dictfile.close() is probably not such a good choice (if it was, you'd omit it entirely, and let the destructor close the file). \$\endgroup\$ Commented Dec 19, 2021 at 20:02
  • \$\begingroup\$ @TobySpeight I didn't read the other reviews until I had already completed mine. I think you and G. Sliepen covered about everything. As for ignoring any error from dictfile.close() it's a void function that has no return value, so I'm not sure what you mean. \$\endgroup\$ Commented Dec 19, 2021 at 20:16
2
\$\begingroup\$

Use functions for all your steps it makes code more readable and you can add error handling better.

  • pass arguments by reference/const reference.
  • use exceptions for your benefit (this works best with RAII,exception safe programming techniques, may not work so well on embedded systems)
  • about std::endl, is "\n" + also force a flush of the stream. (lost performance mostly).
  • unordered_map is faster in lookup

#include <stdexcept> #include <string_view> #include <iostream> #include <fstream> #include <sstream> #include <string> #include <unordered_map> // simulated file content for demonstation purposes std::istringstream file_content { "'em them\n" "cuz because\n" "gratz grateful\n" "i I\n" "nah no\n" "pos supposed\n" "sez said\n" "tanx thanks\n" "wuz was\n" }; // load the map from any input stream, not just a file // later in your development as engineer you will find // this can be very useful in unit test (since you can // use both files and stringstreams as this example // shows you) auto load_dictionary(std::istream& is) { std::unordered_map<std::string, std::string> dictionary; std::string key; std::string value; while (is >> key >> value) dictionary.insert({ key,value }); return dictionary; } // convert a string using a dictionary // pass dictionary by const reference since it should not be copied and not be modified. auto convert(const std::unordered_map<std::string,std::string> dictionary, const std::string& input) { // change the input string to a stringstream // this makes it easy to pick out the words. std::istringstream is{ input }; // build up the output string in a stringstream // its more efficient then concatenating std:string std::ostringstream os; std::string word; bool space = false; while (is >> word) { if ( space ) os << " "; auto it = dictionary.find(word); os << ((it != dictionary.end()) ? it->second : word); space = true; } return os.str(); } void save(const std::string& output) { try { std::ofstream ofile{ "output.txt" }; // if opening file failed an exception will be thrown ofile << output << std::endl; } catch (const std::exception& e) { std::cout << "writing output file failed : " << e.what() << "\n"; } } int main(int argc, char** argv) { try { // std::ifstream file{ argv[1] }; // todo input error handling // auto dictionary = load_from_stream(file); auto dictionary = load_dictionary(file_content); auto output = convert(dictionary, "nah i sez tanx cuz i wuz pos to. not cuz i wuz gratz"); std::cout << output << "\n"; save(output); // todo use a commandline parameter for output file name as well // no need to close your files manually // files will be closed automatically when they go out of scope return 0; } catch (const std::exception& e) { std::cout << "something unexpected went wrong : " << e.what() << "\n"; } } 
\$\endgroup\$
1
\$\begingroup\$

This is a nice little program. There are a few tweaks I’d make, some of them optimizations and others stylistic changes. Unusually for this site, people are rewriting your code themselves, so I’ll do the same. Then, I’ll explain the changes.

#include <cstdlib> #include <fstream> #include <iostream> #include <string> #include <sstream> #include <unordered_map> #include <utility> /* Many people prefer to use std:: consistently. */ using std::cerr; using std::cin; using std::cout; using std::endl; using std::exit; using std::getline; /* This boilerplate would really go in a separate header file. */ #define fatal_error(s) fatal_error_handler( __FILE__, __LINE__, (s) ) [[noreturn]] static void fatal_error_handler( const std::string_view file, const int line, const std::string_view msg ) { cout.flush(); // Don't mix the streams! cerr << msg << " (at " << file << ':' << line << ")\n"; exit(EXIT_FAILURE); } // We’ve already changed the type of this data structure once. using WordMap = std::unordered_map< std::string, std::string >; WordMap read_word_map( const char* const filename ) /* Reads a mapping of word pairs from the specified file into a hash map. * FIXME: Does not validate the syntax. */ { std::ifstream fi(filename); // atd::ios_base::in is the default mode. if(!fi.is_open()) { // This error is not recoverable. fatal_error("Problem in opening the dictionary file"); } WordMap word_map; // Will be returned by guaranteed copy elision. std::string key, mapped; while( fi >> key >> mapped ) { // Use the move constructor to construct in-place without copying. word_map.emplace( std::move(key), std::move(mapped) ); } return word_map; // fi is closed automatically on return, by RAII. } const std::string& lookup_word( const WordMap& dictionary, const std::string& word ) /* Returns the dictionary substitution for the word if one exists, or else * the unchanged word. */ { WordMap::const_iterator it = dictionary.find(word); return ( it == dictionary.end() ) ? word : it->second; } std::string map_line( const WordMap& dictionary, std::string&& line ) /* Returns a string with all words in line that are keys in the dictionary * replaced with their corresponding values. It replaces all whitespace * other than newlines with a single space. */ { std::stringstream input( std::move(line), std::stringstream::in ); /* The line string object, having been moved to the stream buffer, is now * invalid and cannot be used. This saves us from copying it. */ std::string output; // Will be returned by guaranteed copy elision. std::string word; // A temporary buffer. /* The first time this algorithm runs, it should not output a leading * space. */ if ( input >> word ) { output += lookup_word( dictionary, word ); } while ( input >> word ) { output += ' '; output += lookup_word( dictionary, word ); } return output; } int main( const int argc, const char* const argv[] ) { constexpr char default_map_path[] = "./transferor.txt"; const char* const transferor_path = (argc > 1) ? argv[1] : default_map_path; const WordMap dictionary = read_word_map(transferor_path); std::string line; while(getline( cin, line, '\n' )) { cout << map_line( dictionary, std::move(line) ) << endl; } return EXIT_SUCCESS; } 

I Made This a Pipe

Hardcoding paths like you did isn’t ideal, because the code will break when you (or your collaborators, or your instructor) compile it on a different machine. I had to change it anyway, so I followed the suggestion of other answers and made the dictionary file the first input argument, defaulting to transferor.txt in the current working directory.

I’ve heard computer science instructors say that most of their new students are used to searching for the files they need instead of working in a “current directory,” and often don’t find it intuitive at all that a program behaves differently when it’s run in some other working directory. I used an approach that makes sense to me, and to the people who wrote the API you’re using. But I grew up typing in line numbers on my 8-bit computer. An approach that might make more sense to you—and better matches how real-world apps these days are written—might be to set the configuration directory based on OS-specific environment variables, such as a subdirectory of %LOCALAPPDATA% on Windows, or $HOME/.local/share/ on Linux.

Abort When an Error is Not Recoverable

In your original program, you check that your input file was successfully opened. If not, you print an error message to cerr. This is a great idea. However, after you do that, you keep running the program, even though it cannot recover or do anything useful.

I added some boilerplate I often use (which in the real world would actually go into a header file) to abort with an error message that includes the line of source that threw the error. Another variation of this I often use saves the value of errno and includes the system error message from perror together with the line of source.

Although this uses STL types, it’s very C-style. A good alternative, especially when you’d like to be able to only ever return valid results, is to throw an exception on any error. You can then catch it if the error is recoverable, or let it crash the program if it isn’t.

Consider Using Type Aliases

I followed the excellent advice further up the thread to use a hash map instead of an ordered search tree. While I was at it, I added the type alias

using WordMap = std::unordered_map< std::string, std::string >; 

Thereafter, I always refer to Wordmap, WordMap::const_iterator, and so on. If we ever wanted to switch to another data structure, we could update the definition of WordMap in one place, and the code would still run. If we ever want to change it to hold wstring instead of string—that actually would still be a hassle, because I didn’t change all the instances of std::string to the key and value types of WordMap, but at least we’d change the definition in one place and then get a big list of all the instances of std::string we need to update.

Know When to Use and Not to Use endl

I am going to respectfully dissent in part from the common advice here to use \n instead of endl. Changing this program to pipe and transform standard input made it run in interactive mode. In interactive mode, you do want to flush your output after you process each input line, so the user typing at the keyboard sees the results and can type in the next line of input. If you do not, your program might hang in interactive mode. If you are writing to a file instead of interacting with a user, a flush after you complete your update makes it much less likely that you have saved a partial write in an inconsistent state to disk.

Also, the savings from not flushing are usually negligible to nonexistent. In some cases they can add up. Some early Linux versions of Firefox had problems because they flushed too often. But this is one of those things I would only worry about if profiling shows the app spends a lot of time flushing its output streams unnecessarily.

Always Add Braces to Control Structures

There have been some famous bugs that look like

if (weCheckedSee) doSomethingConditionally(); thisWasNotSupposedToRunUnconditionally(oopsMyBad); 

This gets harder and harder to eyeball for correctness the more levels of control structures you nest. If I’m writing nested control structures, I will typically annotate with comments like \\ end if-else or \\ end column loop

A good compiler nowadays might at least give you a warning when this happens, but you can’t count on it. I’m of the school that thinks, it’s better to make this a coding-style violation and make someone take a look at it anyway.

Refactor into Helper Functions

You can see in the code sample above that, in my refactoring above, no function has more than a single level of indentation. Anything more complicated than that was moved to a helper function.

Some coding guidelines do recommend that you always do this. it’s not something I get religious about. However, I think it makes a lot of sense here. Not only does the logic of the program become much easier to follow, it greatly facilitates several of the steps after this.

Use Static Single Assignments

Having variables that aren’t const is a major bug attractor. It’s hard to analyze complicated code that branches and prove that every variable is initialized correctly before it is used, and never modified spuriously. Back in the last century, C didn’t allow you to declare your variables any other way, but that is a thing of the past.

You make the program much easier to debug when you initialize your variables once, and only once, when they are declared, and never modify them again.

This is often going to make you move initialization into a helper function that returns an object. In my version, you can see two examples of this in the lines

constexpr char default_map_path[] = "./transferor.txt"; const char* const transferor_path = (argc > 1) ? argv[1] : default_map_path; const WordMap dictionary = read_word_map(transferor_path); 

These helper functions, which are used to initialize static single assignments, are called phi functions. My preferred style of using ternary ? : expressions for short conditionals doesn’t seem to have caught on widely, but it makes a lot of sense to me.

There are several advantages to moving the initialization of the dictionary into a helper function, read_word_map. One is that, although the dictionary obviously has to be modifiable when it is created, it would be a bug to modify it afterward, and now the compiler will detect it. Another is that you were manually closing your input file when you were done with it. By creating it as a local temporary ifstream object within read_map_file, the file will now be closed automatically, when that function returns and the ifstream destructor cleans up the object.

You’ll notice there’s one major category of variables I couldn’t make into static single assignments, and that was loop control variables. For example,

std::string line; while(getline( cin, line, '\n' )) { // ... 

In many cases, I refactor loops into tail-recursive functions, which are just as efficient. It’s often possible to use a new-style for or while loop to rewrite the loop-control variable as a static single assignment that expires at the end of each iteration of the loop, such as

for ( const auto& x : a_container ) 

or

while ( const auto it = a_map.find(key); it != a_map.end() ) 

The cin >> and getline syntax were not written to be used with this style, though, so they’re exceptions.

Use RIIA for Resource Management

Or maybe you’d rather call it RAII. Manually closing files and memory is s huge source of bugs and leaks. It’s so hard to guarantee that everything that should be cleaned up does get cleaned up on every possible branch of control flow that the Linux kernel and other C projects now make every single function have only a single exit point. With RIIA, the compiler does this for you, regardless of how you exit the function.

This will often require you to declare smart pointers for dynamically-allocated objects, and to create smaller scopes in which to declare your temporary resources.

Eliminate Unnecessary Copies

One of the most expensive things you can do is make a deep copy of a string, vector or array. In this refactoring, you can see several different techniques to optimize copies out.

  • fatal_error_handler takes const string_view arguments, which are much more efficient to copy than std_string, and know their own length, unlike const char*. Most of the time, this is how you should pass your input strings.
  • Both read_word_map and lookup_word pass their arguments by const reference, not by copying them. In the case of read_map, this is because the filename is a dummy parameter that is passed between two older APIs that both use C-style strings. In the case of lookup_word, it is because one possible path through the function returns the word parameter by const reference. Both of these could have been modified to use string_view.
  • Both read_word_map and map_line return their arguments by guaranteed copy elision. When you return a local object declared inside the function, the compiler is smart enough to realize that the local variable is an alias for the return object, and optimize it away to create the return value in place.
  • Objects returned from functions can be moved instead of copied, although I don’t think this actually enables any optimizations here.
  • The lookup_word function returns its value by const reference. Careful, though! If you do this, you have to be sure a dangling reference will not outlive the object it aliases! Here, we know that it won’t, because one possible return value is passed in from the caller and another is stored in the dictionary, and either will be used immediately.
  • The map_line function and the parse-line function both move instead of copy their data. You will notice that the argument to parse_line is, other than the temporary strings I mentioned before, the only variable in the program not declared const. This is because the buffer of the string is moved to become the buffer of the stringstream, a trivial operation, rather than making a deep copy of its data. There’s also a gotcha here: moving from an object leaves it in an unspecified state, so we cannot use the original variable again until it is reinitialized.

You can also see that I replaced the line

maptransferor.insert(std::make_pair(key,mapped)); 

with

word_map.emplace( std::move(key), std::move(mapped) ); 

This constructs the hash-table entry in place, and importantly, moves the key and mapped strings to their new locations, without copying them. Again, this leaves both the input variables in an unusual state, until they are reinitialized at the start of the loop.

Little Personal Quirks

I mention these cosmetic tweaks mainly so nobody thinks they’re important to copy. These include returning RXIT_SUCCESS instead of 0 and dropping stl:: in front of some identifiers that I was using before the STL existed and are never going to make anyone wonder, “Is this the standard library version of cout or some custom class?” the way vector or especially string would.

These are just matters of personal coding style, so do what works for you.

\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.