This is a nice little program. There are a few tweaks I’d make, some of them optimizations and others stylistic changes. Unusually for this site, people are rewriting your code themselves, so I’ll do the same. Then, I’ll explain the changes.
#include <cstdlib> #include <fstream> #include <iostream> #include <string> #include <sstream> #include <unordered_map> #include <utility> /* Many people prefer to use std:: consistently. */ using std::cerr; using std::cin; using std::cout; using std::endl; using std::exit; using std::getline; /* This boilerplate would really go in a separate header file. */ #define fatal_error(s) fatal_error_handler( __FILE__, __LINE__, (s) ) [[noreturn]] static void fatal_error_handler( const std::string_view file, const int line, const std::string_view msg ) { cout.flush(); // Don't mix the streams! cerr << msg << " (at " << file << ':' << line << ")\n"; exit(EXIT_FAILURE); } // We’ve already changed the type of this data structure once. using WordMap = std::unordered_map< std::string, std::string >; WordMap read_word_map( const char* const filename ) /* Reads a mapping of word pairs from the specified file into a hash map. * FIXME: Does not validate the syntax. */ { std::ifstream fi(filename); // atd::ios_base::in is the default mode. if(!fi.is_open()) { // This error is not recoverable. fatal_error("Problem in opening the dictionary file"); } WordMap word_map; // Will be returned by guaranteed copy elision. std::string key, mapped; while( fi >> key >> mapped ) { // Use the move constructor to construct in-place without copying. word_map.emplace( std::move(key), std::move(mapped) ); } return word_map; // fi is closed automatically on return, by RAII. } const std::string& lookup_word( const WordMap& dictionary, const std::string& word ) /* Returns the dictionary substitution for the word if one exists, or else * the unchanged word. */ { WordMap::const_iterator it = dictionary.find(word); return ( it == dictionary.end() ) ? word : it->second; } std::string map_line( const WordMap& dictionary, std::string&& line ) /* Returns a string with all words in line that are keys in the dictionary * replaced with their corresponding values. It replaces all whitespace * other than newlines with a single space. */ { std::stringstream input( std::move(line), std::stringstream::in ); /* The line string object, having been moved to the stream buffer, is now * invalid and cannot be used. This saves us from copying it. */ std::string output; // Will be returned by guaranteed copy elision. std::string word; // A temporary buffer. /* The first time this algorithm runs, it should not output a leading * space. */ if ( input >> word ) { output += lookup_word( dictionary, word ); } while ( input >> word ) { output += ' '; output += lookup_word( dictionary, word ); } return output; } int main( const int argc, const char* const argv[] ) { constexpr char default_map_path[] = "./transferor.txt"; const char* const transferor_path = (argc > 1) ? argv[1] : default_map_path; const WordMap dictionary = read_word_map(transferor_path); std::string line; while(getline( cin, line, '\n' )) { cout << map_line( dictionary, std::move(line) ) << endl; } return EXIT_SUCCESS; }
I Made This a Pipe
Hardcoding paths like you did isn’t ideal, because the code will break when you (or your collaborators, or your instructor) compile it on a different machine. I had to change it anyway, so I followed the suggestion of other answers and made the dictionary file the first input argument, defaulting to transferor.txt in the current working directory.
I’ve heard computer science instructors say that most of their new students are used to searching for the files they need instead of working in a “current directory,” and often don’t find it intuitive at all that a program behaves differently when it’s run in some other working directory. I used an approach that makes sense to me, and to the people who wrote the API you’re using. But I grew up typing in line numbers on my 8-bit computer. An approach that might make more sense to you—and better matches how real-world apps these days are written—might be to set the configuration directory based on OS-specific environment variables, such as a subdirectory of %LOCALAPPDATA% on Windows, or $HOME/.local/share/ on Linux.
Abort When an Error is Not Recoverable
In your original program, you check that your input file was successfully opened. If not, you print an error message to cerr. This is a great idea. However, after you do that, you keep running the program, even though it cannot recover or do anything useful.
I added some boilerplate I often use (which in the real world would actually go into a header file) to abort with an error message that includes the line of source that threw the error. Another variation of this I often use saves the value of errno and includes the system error message from perror together with the line of source.
Although this uses STL types, it’s very C-style. A good alternative, especially when you’d like to be able to only ever return valid results, is to throw an exception on any error. You can then catch it if the error is recoverable, or let it crash the program if it isn’t.
Consider Using Type Aliases
I followed the excellent advice further up the thread to use a hash map instead of an ordered search tree. While I was at it, I added the type alias
using WordMap = std::unordered_map< std::string, std::string >;
Thereafter, I always refer to Wordmap, WordMap::const_iterator, and so on. If we ever wanted to switch to another data structure, we could update the definition of WordMap in one place, and the code would still run. If we ever want to change it to hold wstring instead of string—that actually would still be a hassle, because I didn’t change all the instances of std::string to the key and value types of WordMap, but at least we’d change the definition in one place and then get a big list of all the instances of std::string we need to update.
Know When to Use and Not to Use endl
I am going to respectfully dissent in part from the common advice here to use \n instead of endl. Changing this program to pipe and transform standard input made it run in interactive mode. In interactive mode, you do want to flush your output after you process each input line, so the user typing at the keyboard sees the results and can type in the next line of input. If you do not, your program might hang in interactive mode. If you are writing to a file instead of interacting with a user, a flush after you complete your update makes it much less likely that you have saved a partial write in an inconsistent state to disk.
Also, the savings from not flushing are usually negligible to nonexistent. In some cases they can add up. Some early Linux versions of Firefox had problems because they flushed too often. But this is one of those things I would only worry about if profiling shows the app spends a lot of time flushing its output streams unnecessarily.
Always Add Braces to Control Structures
There have been some famous bugs that look like
if (weCheckedSee) doSomethingConditionally(); thisWasNotSupposedToRunUnconditionally(oopsMyBad);
This gets harder and harder to eyeball for correctness the more levels of control structures you nest. If I’m writing nested control structures, I will typically annotate with comments like \\ end if-else or \\ end column loop
A good compiler nowadays might at least give you a warning when this happens, but you can’t count on it. I’m of the school that thinks, it’s better to make this a coding-style violation and make someone take a look at it anyway.
Refactor into Helper Functions
You can see in the code sample above that, in my refactoring above, no function has more than a single level of indentation. Anything more complicated than that was moved to a helper function.
Some coding guidelines do recommend that you always do this. it’s not something I get religious about. However, I think it makes a lot of sense here. Not only does the logic of the program become much easier to follow, it greatly facilitates several of the steps after this.
Use Static Single Assignments
Having variables that aren’t const is a major bug attractor. It’s hard to analyze complicated code that branches and prove that every variable is initialized correctly before it is used, and never modified spuriously. Back in the last century, C didn’t allow you to declare your variables any other way, but that is a thing of the past.
You make the program much easier to debug when you initialize your variables once, and only once, when they are declared, and never modify them again.
This is often going to make you move initialization into a helper function that returns an object. In my version, you can see two examples of this in the lines
constexpr char default_map_path[] = "./transferor.txt"; const char* const transferor_path = (argc > 1) ? argv[1] : default_map_path; const WordMap dictionary = read_word_map(transferor_path);
These helper functions, which are used to initialize static single assignments, are called phi functions. My preferred style of using ternary ? : expressions for short conditionals doesn’t seem to have caught on widely, but it makes a lot of sense to me.
There are several advantages to moving the initialization of the dictionary into a helper function, read_word_map. One is that, although the dictionary obviously has to be modifiable when it is created, it would be a bug to modify it afterward, and now the compiler will detect it. Another is that you were manually closing your input file when you were done with it. By creating it as a local temporary ifstream object within read_map_file, the file will now be closed automatically, when that function returns and the ifstream destructor cleans up the object.
You’ll notice there’s one major category of variables I couldn’t make into static single assignments, and that was loop control variables. For example,
std::string line; while(getline( cin, line, '\n' )) { // ...
In many cases, I refactor loops into tail-recursive functions, which are just as efficient. It’s often possible to use a new-style for or while loop to rewrite the loop-control variable as a static single assignment that expires at the end of each iteration of the loop, such as
for ( const auto& x : a_container )
or
while ( const auto it = a_map.find(key); it != a_map.end() )
The cin >> and getline syntax were not written to be used with this style, though, so they’re exceptions.
Use RIIA for Resource Management
Or maybe you’d rather call it RAII. Manually closing files and memory is s huge source of bugs and leaks. It’s so hard to guarantee that everything that should be cleaned up does get cleaned up on every possible branch of control flow that the Linux kernel and other C projects now make every single function have only a single exit point. With RIIA, the compiler does this for you, regardless of how you exit the function.
This will often require you to declare smart pointers for dynamically-allocated objects, and to create smaller scopes in which to declare your temporary resources.
Eliminate Unnecessary Copies
One of the most expensive things you can do is make a deep copy of a string, vector or array. In this refactoring, you can see several different techniques to optimize copies out.
fatal_error_handler takes const string_view arguments, which are much more efficient to copy than std_string, and know their own length, unlike const char*. Most of the time, this is how you should pass your input strings. - Both
read_word_map and lookup_word pass their arguments by const reference, not by copying them. In the case of read_map, this is because the filename is a dummy parameter that is passed between two older APIs that both use C-style strings. In the case of lookup_word, it is because one possible path through the function returns the word parameter by const reference. Both of these could have been modified to use string_view. - Both
read_word_map and map_line return their arguments by guaranteed copy elision. When you return a local object declared inside the function, the compiler is smart enough to realize that the local variable is an alias for the return object, and optimize it away to create the return value in place. - Objects returned from functions can be moved instead of copied, although I don’t think this actually enables any optimizations here.
- The
lookup_word function returns its value by const reference. Careful, though! If you do this, you have to be sure a dangling reference will not outlive the object it aliases! Here, we know that it won’t, because one possible return value is passed in from the caller and another is stored in the dictionary, and either will be used immediately. - The
map_line function and the parse-line function both move instead of copy their data. You will notice that the argument to parse_line is, other than the temporary strings I mentioned before, the only variable in the program not declared const. This is because the buffer of the string is moved to become the buffer of the stringstream, a trivial operation, rather than making a deep copy of its data. There’s also a gotcha here: moving from an object leaves it in an unspecified state, so we cannot use the original variable again until it is reinitialized.
You can also see that I replaced the line
maptransferor.insert(std::make_pair(key,mapped));
with
word_map.emplace( std::move(key), std::move(mapped) );
This constructs the hash-table entry in place, and importantly, moves the key and mapped strings to their new locations, without copying them. Again, this leaves both the input variables in an unusual state, until they are reinitialized at the start of the loop.
Little Personal Quirks
I mention these cosmetic tweaks mainly so nobody thinks they’re important to copy. These include returning RXIT_SUCCESS instead of 0 and dropping stl:: in front of some identifiers that I was using before the STL existed and are never going to make anyone wonder, “Is this the standard library version of cout or some custom class?” the way vector or especially string would.
These are just matters of personal coding style, so do what works for you.
non\nexistent”) or with multiple spaces, and if there are common first words). \$\endgroup\$