Code Sample, a copy-pastable example if possible
    import pandas as pd

    df = pd.DataFrame({'a': ['丆']})
    df.to_stata('test.dta')
    # UnicodeEncodeError: 'latin-1' codec can't encode character '\u4e06' in position 0: ordinal not in range(256)

I picked an arbitrary CJK character to test this with.
Problem description
It would be possible to write Unicode strings to a Stata file by implementing a writer according to version 118 of the dta format.
I'd be interested in trying to submit a PR for this. (Edit: I don't use Stata anymore)
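To illustrate the gap, here is a minimal standard-library sketch that does not use pandas. It assumes the current writer encodes strings as latin-1 (as the traceback suggests) and that a version 118 writer would store them as UTF-8. The helper name `latin1_failures` and the sample column are hypothetical and exist only for this illustration.

```python
def latin1_failures(values):
    """Return the strings in `values` that cannot be encoded as latin-1."""
    failures = []
    for value in values:
        try:
            value.encode("latin-1")
        except UnicodeEncodeError:
            # Code points above U+00FF have no latin-1 representation.
            failures.append(value)
    return failures


# Hypothetical column contents: ASCII, a latin-1 character, and a CJK character.
column = ["abc", "é", "丆"]

# Values that would trigger the UnicodeEncodeError shown in the code sample.
bad = latin1_failures(column)

# UTF-8 can encode every value; note that the byte length can exceed the
# character count, which matters for any fixed-width string field.
utf8_bytes = [value.encode("utf-8") for value in column]

print(bad)
print([len(b) for b in utf8_bytes])
```

A check like this could be run on string columns before calling `to_stata` to find the values that would fail.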
Expected Output
Stata file written to disk.
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-696.18.7.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: 0.10.0
xarray: None
IPython: 7.0.1
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.1.6
pandas_gbq: None
pandas_datareader: None