2

Trying to compare strings using:

!(stringvector[i]).compare(vector[j][k]) 

only works for some entries of

vector[j][k] 

-- namely ones that are a case sensitive string match.

How do I get case-insensitive matching from this functionality?

Here is a bit of code I was working on

    #include <iostream>
    #include <vector>
    #include <string>
    using namespace std; // poor form

    int main()
    {
        vector<string> stringvector = {"Yo", "YO", "babbybabby"};
        vector<string> vec1 = {"yo", "Yo", "these"};
        vector<string> vec2 = {"these", "checked", "too", "Yo", "babbybabby"};
        vector<vector<string>> vvs = {vec1, vec2};

        for (int v = 0; v < vvs.size(); v++) // first index of vector
        {
            for (int s = 0; s < vvs[v].size(); s++) // second index of vector
            {
                for (int w = 0; w < stringvector.size(); w++)
                {
                    if (stringvector[w] == vvs[v][s]) { cout << "******FOUND******"; }
                }
            }
        }
    }

This doesn't print out FOUND for the case-insensitive matches.

stringvector[w] == vvs[v][s] does not perform a case-insensitive comparison. Is there an easy way to add this functionality?

--Prof D

12
  • You should show the declaration of vector and stringvector. Commented Feb 16, 2017 at 9:36
  • 2
    In the initializer list of vec2, the "Yo," should be "Yo", instead? Commented Feb 16, 2017 at 9:56
  • 2
    There is nothing obvious wrong with the code, that is how you compare strings. Make a full compilable example, show the output, say why you expect a different output. Commented Feb 16, 2017 at 9:59
  • 2
    Also, declaring a variable called vector is very confusing - particularly when you obviously have a using namespace std; in effect. (Don't do that.) Commented Feb 16, 2017 at 10:04
  • 2
    Final comment: range-based for loops plus auto remove a lot of the clutter and make this a lot easier to read: for (const auto& vec : vec_vec) for (const auto& str : vec) for (const auto& target : stringvector) if (target == str) { ... } - with some newlines, obviously! Commented Feb 16, 2017 at 10:07

3 Answers

2

tl;dr

Use the ICU library.


"The easy way", when it comes to natural language strings, is usually fraught with problems.

As I pointed out in my answer to the "lowercase conversion" question that @Armando linked to, if you want to actually do it right, you're currently best off using the ICU library, because nothing in the standard gives you actual Unicode support at this point.

If you look at the docs for std::tolower, as used by @NutCracker, you will find that...

Only 1:1 character mapping can be performed by this function, e.g. the Greek uppercase letter 'Σ' has two lowercase forms, depending on the position in a word: 'σ' and 'ς'. A call to std::tolower cannot be used to obtain the correct lowercase form in this case.

If you want to do this correctly, you need full Unicode support, and that means the ICU library until some later revision of the C++ standard actually introduces that to the standard library.

Using icu::UnicodeString -- clunky as it might be at first -- for storing your language strings gives you access to caseCompare(), which does a proper case-insensitive comparison.

1

You can implement a function for this purpose, example:

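    #include <cctype>
    #include <string>

    // Case-insensitive equality for single-byte strings.
    bool areEqualsCI(const std::string &x1, const std::string &x2)
    {
        if (x1.size() != x2.size())
            return false;
        for (std::size_t i = 0; i < x2.size(); ++i)
            // Cast to unsigned char: tolower is undefined for negative values.
            if (std::tolower((unsigned char)x1[i]) != std::tolower((unsigned char)x2[i]))
                return false;
        return true;
    }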

I also recommend reading this post: How to convert std::string to lower case?
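As a rough sketch of how a helper like this could be plugged into the nested loops from the question: the names equalsIgnoreCase and countMatches below are hypothetical, chosen only for illustration, and the comparison is byte-wise, so it is only meaningful for single-byte text.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Byte-wise, case-insensitive equality (hypothetical name).
// Only meaningful for single-byte character sets.
bool equalsIgnoreCase(const std::string& a, const std::string& b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        // Cast to unsigned char first: tolower is undefined for negative values.
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

// Hypothetical helper: counts every (target, candidate) pair that matches
// when case is ignored, mirroring the three nested loops in the question.
std::size_t countMatches(const std::vector<std::string>& targets,
                         const std::vector<std::vector<std::string>>& groups)
{
    std::size_t found = 0;
    for (const auto& group : groups)
        for (const auto& candidate : group)
            for (const auto& target : targets)
                if (equalsIgnoreCase(target, candidate))
                    ++found;
    return found;
}
```

With the data from the question, countMatches(stringvector, vvs) should return 7, whereas a plain == comparison finds only 3 matching pairs.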

3 Comments

It should be either tolower((unsigned char)x1[i]) or tolower(x1[i], std::locale()); the C library version has undefined behavior for negative values.
Also, it doesn't actually work right for MBCS.
Thanks for the recommendations, @M.M
0

First, I took the liberty of tidying up your code a bit: I replaced the ordinary for loops with range-based for loops and renamed the variables. The names are still not ideal, since I don't know the purpose of the code. Here is the refactored version:

    #include <iostream>
    #include <vector>
    #include <string>

    int main()
    {
        std::vector<std::string> vec1 = { "Yo", "YO", "babbybabby" };
        std::vector<std::string> vec2 = { "yo", "Yo", "these" };
        std::vector<std::string> vec3 = { "these", "checked", "too", "Yo", "babbybabby" };
        std::vector<std::vector<std::string>> vec2_vec3 = { vec2, vec3 };

        for (auto const& i : vec2_vec3)
        {
            for (auto const& j : i)
            {
                for (auto const& k : vec1)
                {
                    if (k == j)
                    {
                        std::cout << k << " == " << j << std::endl;
                    }
                }
            }
        }

        return 0;
    }

Now, if you want to compare strings case-insensitively and you have access to the Boost library, you can use boost::iequals as follows:

    #include <boost/algorithm/string.hpp>

    std::string str1 = "yo";
    std::string str2 = "YO";
    if (boost::iequals(str1, str2))
    {
        // identical strings
    }

On the other hand, if you don't have access to the Boost library, you can write your own iequals function using STL algorithms (C++14 required):

    #include <algorithm>
    #include <locale>
    #include <string>

    bool iequals(const std::string& a, const std::string& b)
    {
        // The four-iterator overload of std::equal also checks the lengths.
        return std::equal(a.begin(), a.end(), b.begin(), b.end(),
                          [](char x, char y) {
                              return std::tolower(x, std::locale()) == std::tolower(y, std::locale());
                          });
    }

    std::string str1 = "yo";
    std::string str2 = "YO";
    if (iequals(str1, str2))
    {
        // identical strings
    }

Note that this would only work for Single-Byte Character Sets (SBCS).
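To illustrate that limitation, here is a small sketch that assumes the strings are UTF-8 encoded and that the program runs in the default "C" locale. The name iequalsBytes is hypothetical, and the function uses the C library tolower rather than the locale overload, purely for demonstration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Byte-wise comparison in the style shown above (hypothetical name).
// Taking the bytes as unsigned char keeps std::tolower well defined.
bool iequalsBytes(const std::string& a, const std::string& b)
{
    return std::equal(a.begin(), a.end(), b.begin(), b.end(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}

// "É" and "é" spelled out as UTF-8 byte sequences (two bytes each).
const std::string upperEAcute = "\xC3\x89";
const std::string lowerEAcute = "\xC3\xA9";
```

Under those assumptions, iequalsBytes("YO", "yo") is true, but iequalsBytes(upperEAcute, lowerEAcute) is false: each byte is folded on its own, and the "C" locale leaves bytes outside the ASCII range unchanged, so the two-byte sequences never compare equal.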

3 Comments

should be either tolower((unsigned char)a) or tolower(a, std::locale()) , the C library version is undefined for negative values
@M.M you are right. Thanks for reviewing
You should also mention that it only works for single-byte character sets.
