How to lexicographically compare the kth word of two strings

Question

I'm trying to write a C++ function that lexicographically compares the kth word of two strings. Here is my function:

```cpp
bool kth_lexo() {
    int k = 2;
    str1 = "123 300 60009";
    str2 = "1500 10002";
    // to store the kth word of the first string in ptr1
    char *ptr1 = strtok((char*)str1.c_str(), " ");
    for (int i = 1; i < k; i++) {
        ptr1 = strtok(NULL, " ");
    }
    // to store the kth word of the second string in ptr2
    char *ptr2 = strtok((char*)str2.c_str(), " ");
    for (int i = 1; i < k; i++) {
        ptr2 = strtok(NULL, " ");
    }
    string st1 = ptr1;
    string st2 = ptr2;
    return st1 > st2;
}
```

In this function the lexicographical comparison works fine: it returns 1 because 300 (the 2nd word of str1) is lexicographically greater than 10002 (the 2nd word of str2).

My problem: I slightly modified the function by replacing its last line with return ptr1 > ptr2;

Now my new function looks like this:

```cpp
bool kth_lexo() {
    int k = 2;
    str1 = "123 300 60009";
    str2 = "1500 10002";
    // to store the kth word of the first string in ptr1
    char *ptr1 = strtok((char*)str1.c_str(), " ");
    for (int i = 1; i < k; i++) {
        ptr1 = strtok(NULL, " ");
    }
    // to store the kth word of the second string in ptr2
    char *ptr2 = strtok((char*)str2.c_str(), " ");
    for (int i = 1; i < k; i++) {
        ptr2 = strtok(NULL, " ");
    }
    // modified line compared to the previous function
    return ptr1 > ptr2;
}
```

With this modified function the output is consistently 0, no matter whether the kth word of str1 (pointed to by ptr1) is lexicographically greater or smaller than the kth word of str2 (pointed to by ptr2).

Changing the return statement to return (*ptr1) > (*ptr2); doesn't help much either.

So what is wrong with each of these two return statements in my modified function for comparing the kth words of the two strings?

return ptr1 > ptr2 ;

OR

return (*ptr1) > (*ptr2) ;

Why the "c" tag? Please read its description!

– Ulrich Eckhardt, Commented Jan 24, 2021 at 9:32

n314159 · Accepted Answer · 2021-01-24 15:24:25Z

You are writing very C-like code. Modern C++ makes this much simpler and easier to read, because it offers much more expressive tools:

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string_view>

// Returns the position of the kth occurrence of c after pos (npos if there is none).
auto find_kth_char(std::string_view to_search, char c, std::size_t k, std::size_t pos = 0) {
    for (; pos < std::string_view::npos && k > 0; --k) {
        pos = to_search.find(c, pos + 1);
    }
    return pos;
}

auto get_kth_word(std::string_view to_search, std::size_t k) {
    // k is 1-based: k == 1 is the first word
    assert(k > 0);
    auto start = find_kth_char(to_search, ' ', k - 1);
    if (start == std::string_view::npos) {
        return std::string_view{};
    }
    if (k > 1) {
        ++start;  // skip the separating space so it is not part of the word
    }
    auto end = find_kth_char(to_search, ' ', 1, start);
    return to_search.substr(start, end - start);
}

auto compare_kth(std::string_view lhs, std::string_view rhs, std::size_t k) {
    auto l_word = get_kth_word(lhs, k);
    auto r_word = get_kth_word(rhs, k);
    // the sign of the result matches lhs <=> rhs: negative, zero or positive
    return l_word.compare(r_word);
}

int main() {
    auto str1 = "123 300 60009";
    auto str2 = "1500 10002";
    for (std::size_t k = 1; k < 4; ++k) {
        std::cout << k << ":\t" << compare_kth(str1, str2, k) << '\n';
    }
}
```

I am using C++17's std::string_view because we never modify the strings, and operations such as taking substrings are very cheap on a view. The find and compare member functions do the real work.

The function returns an int that tells us whether the left-hand side is smaller than (negative result), equal to (0), or greater than (positive result) the right-hand side.

A M · Accepted Answer · 2021-01-24 12:09:41Z

If you stopped using C constructs and used C++ consistently, this problem would not occur.

Here you are mixing C++ std::string with char* or const char*. For strings, std::string is so superior to old-style C char arrays and char* that from now on you should never use anything other than std::string.

A char pointer is an address in memory where your char data is stored. Dereferencing the pointer with * gives you the element stored at that address: only one element, not a string or anything else, exactly one character.

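Comparing ptr1 > ptr2 does not compare strings. It compares the addresses at which the strings are stored in memory. ptr1 could be 0x578962574 and ptr2 could be 0x95324782, or anything else; we cannot know these addresses in advance, because they depend on where the strings happen to be placed in memory. The C++ standard does not even guarantee any particular ordering between pointers to unrelated objects.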

And if you compare (*ptr1) > (*ptr2), you compare only two single characters (the first character of each word), which can also give you the wrong result.

Comparing two std::string objects, on the other hand, compares their contents character by character, which always works as expected.

So, the simple answer: use std::string for all strings.

Okay, so far I have understood everything you wrote. But if I execute cout << ptr1; this line does print the whole kth word of string1. Why is that?
The inserter operator << for std::ostream has an overload for const char*. It takes the char pointer, interprets it as a C string (characters up to the terminating '\0'), and prints it as such.
