I am parsing a text file using Boost.Regex in C++. I am looking for '\' characters in the file. The file also contains some unicode '\u' escape sequences. Is there a way to tell '\' and '\u' apart? The following is the content of test.txt, the file I am parsing:
"ID": "\u01FE234DA - this is id ", "speed": "96\/78", "avg": "\u01FE234DA avg\83" Following is my try

    #include <boost/regex.hpp>
    #include <cstdlib>
    #include <string>
    #include <iostream>
    #include <fstream>

    using namespace std;

    const int BUFSIZE = 500;

    int main(int argc, char** argv) {
        if (argc < 2) {
            cout << "Pass the input file" << endl;
            exit(0);
        }

        boost::regex re("\\\\+");    // one or more backslashes
        string file(argv[1]);
        char buf[BUFSIZE];
        boost::regex uni("\\\\u+");  // a backslash followed by one or more 'u'
        ifstream in(file.c_str());

        // Loop on getline itself instead of testing in.eof() before each read.
        while (in.getline(buf, BUFSIZE - 1)) {
            if (boost::regex_search(buf, re)) {
                cout << buf << endl;
                cout << "(\\) found" << endl;
                if (boost::regex_search(buf, uni)) {
                    cout << buf << endl;
                    cout << "unicode found" << endl;
                }
            }
        }
    }

When I run the code above, it prints the following:
"ID": "\u01FE234DA - this is id ", (\) found "ID": "\u01FE234DA - this is id ", unicode found "speed": "96\/78", (\) found "avg": "\u01FE234DA avg\83" (\) found "avg": "\u01FE234DA avg\83" unicode found Instead of I want following
"ID": "\u01FE234DA - this is id ", unicode found "speed": "96\/78", (\) found "avg": "\u01FE234DA avg\83" (\) and unicode found I think the code is not able to distinguish '\' and '\u' separately but I am not sure where to change what.
\\\u123 testing (well, give or take a few more backslashes). Is there a particular reason to do this with regexes? As I said, iterating over the backslashes ought to be simple, straightforward, and robust.
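
A rough sketch of what iterating over the backslashes could look like, without any regex. It assumes JSON-style escaping, where a backslash always escapes the character that follows it, so '\\u' is read as an escaped backslash followed by a plain 'u'. That rule is an assumption about the input, and BackslashCounts and count_backslashes are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: how many escapes of each kind a line contains.
struct BackslashCounts {
    std::size_t unicode = 0;  // backslash followed by 'u'
    std::size_t other = 0;    // backslash followed by anything else, or at end of line
};

// Walks the line once and classifies each backslash by the character after it.
BackslashCounts count_backslashes(const std::string& line) {
    BackslashCounts counts;
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (line[i] != '\\') continue;
        if (i + 1 < line.size() && line[i + 1] == 'u') {
            ++counts.unicode;
        } else {
            ++counts.other;
        }
        ++i;  // skip the escaped character so it is not examined again
    }
    return counts;
}
```

A line then has "unicode found" when unicode is non-zero, "(\) found" when other is non-zero, and both when both are non-zero.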