4

I am facing some issues with non-ASCII characters in C++. I have a file containing non-ASCII characters, which I read in C++ using file streams. After reading the file (say 1.txt), I store the data in a string stream and write it to another file (say 2.txt).

Assume 1.txt contains:

ação 

In 2.txt I should get the same output, but the non-ASCII characters are printed as their hex values.

Also, I am quite sure that C++ is handling ASCII characters as ASCII only.

Please help me print these characters correctly in 2.txt.

EDIT:

First, pseudo-code for the whole process:

1. A shell script reads one value from the DB and stores it in 11.txt.
2. C++ code (a.cpp) reads 11.txt and writes to f.txt.

Data Present in DB which is being read: Instalação

File 11.txt contains: Instalação

File F.txt Contains: Instalação

Output of a.cpp on screen: Instalação

a.cpp

    #include <iterator>
    #include <iostream>
    #include <algorithm>
    #include <sstream>
    #include <fstream>
    #include <iomanip>
    using namespace std;

    int main()
    {
        ifstream myReadFile;
        ofstream f2;
        myReadFile.open("11.txt");
        f2.open("f2.txt");
        string output;
        if (myReadFile.is_open())
        {
            while (!myReadFile.eof())
            {
                myReadFile >> output;
                //cout<<output;
                cout<<"\n";
                std::stringstream tempDummyLineItem;
                tempDummyLineItem <<output;
                cout<<tempDummyLineItem.str();
                f2<<tempDummyLineItem.str();
            }
        }
        myReadFile.close();
        return 0;
    }

Locale says this:

    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=
  • So exactly what is your question? "How do I identify ASCII characters, and print non-ASCII as hex?" Commented Jul 15, 2013 at 7:37
  • Post your actual code (the smallest sample that exhibits your problem) and then we can tell you what minimal changes have to be made. Commented Jul 15, 2013 at 7:38
  • I want to get non-ascii chars printed as non-ascii only in 2.txt and not as their hex values Commented Jul 15, 2013 at 7:38
  • @chris - Sorry but I can't post actual C++ code due to copyright issues. Commented Jul 15, 2013 at 7:39
  • @MayankJain, The posted code should be about the same length as that pseudocode. There's no way that an SSCCE of this could be copyrighted. Commented Jul 15, 2013 at 7:40

2 Answers

3

At least if I understand what you're after, I'd do something like this:

    #include <iterator>
    #include <iostream>
    #include <algorithm>
    #include <sstream>
    #include <iomanip>

    std::string to_hex(char ch) {
        std::ostringstream b;
        b << "\\x" << std::setfill('0') << std::setw(2) << std::setprecision(2)
          << std::hex << static_cast<unsigned int>(ch & 0xff);
        return b.str();
    }

    int main(){
        // for test purposes, we'll use a stringstream for input
        std::stringstream infile("normal stuff. weird stuff:\x01\xee:back to normal");

        infile << std::noskipws;

        // copy input to output, converting non-ASCII to hex:
        std::transform(std::istream_iterator<char>(infile),
            std::istream_iterator<char>(),
            std::ostream_iterator<std::string>(std::cout),
            [](char ch) {
                return (ch >= ' ') && (ch < 127) ?
                    std::string(1, ch) :
                    to_hex(ch);
            });
    }

0

Sounds to me like a UTF-8 issue. Since you didn't tag your question with c++11, here is an excellent article on Unicode and C++ streams.

From your updated code, let me explain what is happening. You create a file stream to read your file. Internally the file stream only recognizes chars, until you tell it otherwise. A char, on most machines, can only hold 8 bits of data, but the characters in your file are using more than 8 bits. To be able to read your file correctly, you NEED to know how it is encoded. The most common encoding is UTF-8, which uses between 1 and 4 chars for each character.
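To make the difference between chars and characters concrete, here is a minimal sketch that counts UTF-8 code points by skipping continuation bytes. It assumes the input is already valid UTF-8 and does no validation; `utf8_code_points` is a name chosen for illustration only.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 encoded string.
// Continuation bytes have the bit pattern 10xxxxxx, so every byte that
// does NOT match that pattern starts a new code point.
// Assumption: the input is valid UTF-8; malformed input is not detected.
std::size_t utf8_code_points(const std::string& bytes) {
    std::size_t count = 0;
    for (unsigned char byte : bytes) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the UTF-8 encoding of ação, `size()` reports 6 chars while this function counts 4 code points, because ç and ã take two bytes each.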

Once you know your encoding, you can use a wifstream and imbue() it with a locale for that encoding, so that the bytes are converted to wide characters as they are read.
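Locale names are system-dependent, and constructing a std::locale from a name the system does not provide throws std::runtime_error. A minimal sketch of a guard around imbue(), where `try_imbue` is a hypothetical helper and not part of the standard library:

```cpp
#include <fstream>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: imbues `in` with the named locale if the system
// provides it. Returns false and leaves the stream's locale unchanged
// when the name is not recognized.
bool try_imbue(std::wifstream& in, const std::string& locale_name) {
    try {
        in.imbue(std::locale(locale_name.c_str()));
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Call it before open() and check the result, so that a missing locale is reported instead of terminating the program with an uncaught exception.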

Update: If your file is ISO-8859-1 (from your comment above), try this:

    wifstream myReadFile;
    myReadFile.imbue(std::locale("en_US.iso88591"));
    myReadFile.open("11.txt");
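If the output should end up as UTF-8, which the en_US.UTF-8 locale in the question suggests, an alternative that avoids wide streams is to convert the bytes by hand. This is a minimal sketch under the assumption that the input really is ISO-8859-1, where each byte is the Unicode code point of the same value; `latin1_to_utf8` is a hypothetical helper name.

```cpp
#include <string>

// Converts ISO-8859-1 (Latin-1) bytes to UTF-8.
// Bytes below 0x80 are ASCII and are copied unchanged. Bytes from 0x80 to
// 0xFF map to code points U+0080..U+00FF, which UTF-8 encodes as two bytes:
// 110000xx 10xxxxxx.
// Assumption: the input is ISO-8859-1; other encodings give wrong results.
std::string latin1_to_utf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size());
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            utf8.push_back(static_cast<char>(byte));
        } else {
            utf8.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return utf8;
}
```

Each word read from 11.txt with a plain ifstream could be passed through this function before being written to the output file.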
