Skip to content

Conversation

@hshimizu77
Copy link
Contributor

fopen() in Windows doesn't accept utf-8 encoded filename with multibyte characters, so need to convert it to filesystem encoding.
Set 'utf-8' as default in case sys.getfilesystemencoding() return None.
sys.getfilesystemencoding() will return 'mbcs' in Windows, and will 'utf-8' or user setting in other systems.

@jreback
Copy link
Contributor

jreback commented Apr 5, 2014

thanks

can u write a test that fails without this change and works with it ?

@jreback jreback added this to the 0.15.0 milestone Apr 22, 2014
@jreback
Copy link
Contributor

jreback commented Apr 22, 2014

@hshimizu77 tests?

@hshimizu77
Copy link
Contributor Author

I have added a test but I'm wondering it's right place or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just do:

source = source.encode(sys.getfilesystemencoding() or 'utf-8')

@jreback
Copy link
Contributor

jreback commented Apr 23, 2014

that's the correct place

@jreback
Copy link
Contributor

jreback commented Apr 23, 2014

@TomAugspurger can you test this on OSX?

@hshimizu77 ok...let me test on windows; I think we may put this in w/o the test (but don't change the test ATM).

as I don't think pandas will install on some systems because of the encoding of the filename (in theory it should always work, but don't want to bork older systems).

@jreback jreback modified the milestones: 0.14.0, 0.15.0 Apr 23, 2014
@TomAugspurger
Copy link
Contributor

The test errors

(pandas-dev) ~/E/p/l/p/s/p/p/i/tests (hshimizu77-bug-fix=) nosetests test_parsers:ParserTests.test_read_csv_example_in_windows_filesystem E ====================================================================== ERROR: test_parsers.ParserTests.test_read_csv_example_in_windows_filesystem ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/admin/Envs/pandas-dev/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/Users/admin/Envs/pandas-dev/lib/python2.7/site-packages/pandas/pandas/io/tests/test_parsers.py", line 1827, in test_read_csv_example_in_windows_filesystem self.read_csv(path) File "/Users/admin/Envs/pandas-dev/lib/python2.7/site-packages/pandas/pandas/io/tests/test_parsers.py", line 55, in read_csv raise NotImplementedError NotImplementedError ---------------------------------------------------------------------- Ran 1 test in 0.067s FAILED (errors=1) 

Seems to work from the interpreter though:

In [1]: from test_parsers import * In [2]: path = tm.get_data_path(u'日本語ファイル名テスト_read_csv_in_win_filesystem.csv') In [3]: pd.read_csv(path) Out[3]: A B C 0 100 200 300 1 aaa bbb ccc [2 rows x 3 columns]
@TomAugspurger
Copy link
Contributor

I think you want read_csv not self.read_csv

@jreback
Copy link
Contributor

jreback commented Apr 23, 2014

@hshimizu77

for this

path = tm.get_data_path(u'日本語ファイル名テスト_read_csv_in_win_filesystem.csv') 

use u(.....) instead of the a literal u (this is translated by the compat module as needed for py2/3

@jreback
Copy link
Contributor

jreback commented Apr 30, 2014

merged via 79df67a

thanks!

@jreback jreback closed this Apr 30, 2014
@hshimizu77 hshimizu77 deleted the bug-fix branch May 3, 2014 13:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

IO CSV read_csv, to_csv Unicode Unicode strings Windows Windows OS

3 participants