1

This problem has troubled me for a year. R has trouble opening my CSV file, which contains Simplified Chinese characters. I believe the data is encoded as GBK. I have three computers with different languages and operating systems, and I get mixed results when opening the same Chinese CSV file. Could someone tell me why the results differ?

  • (1) Windows + English OS + English R and RStudio: UNABLE to read my CSV, even if I encode it as UTF-8, GBK, or any other encoding for Chinese you can name.
  • (2) Mac + English OS + English R: ABLE to read the Chinese CSV without forcing the encoding (update: after I reinstalled the operating system with El Capitan, it could no longer open my CSV correctly).
  • (3) Windows + Chinese OS + Chinese R: ABLE to read the CSV, either without forcing the encoding or with GBK.
  • (4) Windows + English OS + Chinese R: UNABLE.
  • (5) Ubuntu + English OS + English R: ABLE.
  • On Windows (both English and Chinese OS), Notepad can open the CSV correctly, but Excel cannot on the English OS. Whenever Excel cannot open my CSV, R cannot either.
  • If I convert the CSV with Google Sheets, Excel can open it, but R still cannot.

How does encoding work in R, and why do the results change with the OS language?

 read.csv(...,encoding=) 
1
  • Thanks for pointing out that GBK is one of the possibilities. I had trouble opening a CSV file in Simplified Chinese downloaded from an online bank. I tried latin1, iso-8859-1, and cp1252, all to no avail. But gbk simply does the job! Commented Nov 3, 2019 at 5:58

2 Answers

1

It could be related to how Excel handles CSV encoding. If your Windows operating system is English, Excel might not open the CSV correctly. A workaround is to use Google Sheets or a spreadsheet program installed on Ubuntu to convert the file to CSV again, and then try opening it with R.

0

I have figured out how to solve this. The approach works for large files (under 800 MB) containing Simplified Chinese characters. The key is to know the default Chinese encoding of your operating system.

Ubuntu uses UTF-8 as its default encoding for Chinese, so you should encode the file as UTF-8 rather than GB18030 or another GB-family encoding.
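To see which default applies on a given machine, you can ask R about the session locale. This is only a minimal sketch using base R; the variable names `ctype` and `info` are mine, and the exact strings returned differ between operating systems.

```r
# Character-type locale of the current R session, e.g. "en_US.UTF-8" on Ubuntu
# or a code-page based name on Windows.
ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)

# l10n_info() reports whether the session's native encoding is UTF-8.
info <- l10n_info()
print(info[["UTF-8"]])
```

If the native encoding is not UTF-8 and is not a Chinese code page, text that is converted to the native encoding while being read may be unable to hold Chinese characters, which would be consistent with the English Windows failures described in the question.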

  • (1) Download OpenOffice (free and quick to install; it handles larger files than Calc on Ubuntu).

  • (2) Detect your CSV's encoding. Simply open the CSV in OpenOffice and choose the encoding that displays your Chinese characters correctly.

  • (3) Save the CSV in the encoding that matches your operating system. The default for Chinese is GBK on Windows and UTF-8 on Ubuntu.
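Steps (2) and (3) can also be sketched in R itself, assuming you already know (or have guessed) the source encoding. The helper `convert_encoding` below is hypothetical, not part of any package; it relies only on base R's `readBin`, `iconv`, and `writeBin`, and the demo file is generated on the fly.

```r
# Hypothetical helper: re-encode a whole file from one encoding to another.
convert_encoding <- function(src, dest, from, to) {
  raw_bytes <- readBin(src, what = "raw", n = file.info(src)$size)
  # iconv() accepts a list of raw vectors; with toRaw = TRUE it returns
  # NULL for input that is not valid in the `from` encoding.
  converted <- iconv(list(raw_bytes), from = from, to = to, toRaw = TRUE)[[1]]
  if (is.null(converted)) stop("could not convert from ", from, " to ", to)
  writeBin(converted, dest)
  invisible(dest)
}

# Demo: build a small GBK file, then convert it to UTF-8.
# "\u4e2d\u6587" is a two-character Chinese string written as Unicode escapes.
txt <- "name,value\n\u4e2d\u6587,1\n"
src <- tempfile(fileext = ".csv")
writeBin(iconv(txt, from = "UTF-8", to = "GBK", toRaw = TRUE)[[1]], src)
dest <- tempfile(fileext = ".csv")
convert_encoding(src, dest, from = "GBK", to = "UTF-8")
```

A failed conversion (the `stop()` branch) is a hint that the guessed source encoding is wrong, which plays roughly the same role as trying encodings in the OpenOffice import dialog.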

This should solve both the file size problem and the encoding problem. You do not even need to force the encoding; a plain read.csv call will work.
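If re-saving the file is not an option, read.csv has two related arguments that are easy to confuse. As far as I understand the documentation, `fileEncoding` declares how the bytes on disk are encoded so they can be re-encoded while reading, whereas `encoding` only marks the resulting strings as Latin-1 or UTF-8 without converting anything. The sketch below uses made-up temporary files and variable names (`utf8_path`, `gbk_path`) purely for illustration.

```r
# Build the same two-row table as a UTF-8 file and as a GBK file.
txt <- "name,value\n\u4e2d\u6587,1\n"
utf8_path <- tempfile(fileext = ".csv")
writeBin(charToRaw(txt), utf8_path)
gbk_path <- tempfile(fileext = ".csv")
writeBin(iconv(txt, from = "UTF-8", to = "GBK", toRaw = TRUE)[[1]], gbk_path)

# UTF-8 file: `encoding` marks the strings as UTF-8; no conversion happens.
df_utf8 <- read.csv(utf8_path, encoding = "UTF-8", stringsAsFactors = FALSE)

# GBK file: `fileEncoding` converts the bytes to the session's native encoding.
# In a locale that cannot represent Chinese characters this conversion can
# warn or fail, so the result is NULL in that case.
df_gbk <- tryCatch(
  read.csv(gbk_path, fileEncoding = "GBK", stringsAsFactors = FALSE),
  warning = function(w) NULL,
  error = function(e) NULL
)
```

Whether `df_gbk` comes back as a data frame or as NULL depends on the locale of the machine running the code, which may be one reason the same file behaves differently across the setups listed in the question.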
