[EHN] pandas.DataFrame.to_orc #44554
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -226,7 +226,7 @@ def test_orc_reader_snappy_compressed(dirpath): | |
| tm.assert_equal(expected, got) | ||
| | ||
| | ||
| def test_orc_roundtrip(dirpath): | ||
| def test_orc_roundtrip_file(dirpath): | ||
| # GH44554 | ||
| # PyArrow gained ORC write support with the current argument order | ||
| pytest.importorskip("pyarrow", minversion="7.0.0") | ||
| ||
| | @@ -248,3 +248,26 @@ def test_orc_roundtrip(dirpath): | |
| got = read_orc(outputfile) | ||
| | ||
| tm.assert_equal(expected, got) | ||
| | ||
| | ||
| def test_orc_roundtrip_bytesio(): | ||
| # GH44554 | ||
| # PyArrow gained ORC write support with the current argument order | ||
| pytest.importorskip("pyarrow", minversion="7.0.0") | ||
| ||
| data = { | ||
| "boolean1": np.array([False, True], dtype="bool"), | ||
| "byte1": np.array([1, 100], dtype="int8"), | ||
| "short1": np.array([1024, 2048], dtype="int16"), | ||
| "int1": np.array([65536, 65536], dtype="int32"), | ||
| "long1": np.array([9223372036854775807, 9223372036854775807], dtype="int64"), | ||
| "float1": np.array([1.0, 2.0], dtype="float32"), | ||
| "double1": np.array([-15.0, -5.0], dtype="float64"), | ||
| "bytes1": np.array([b"\x00\x01\x02\x03\x04", b""], dtype="object"), | ||
| "string1": np.array(["hi", "bye"], dtype="object"), | ||
| } | ||
| expected = pd.DataFrame.from_dict(data) | ||
| | ||
| bytesio = expected.to_orc() | ||
| got = read_orc(bytesio) | ||
| | ||
| tm.assert_equal(expected, got) | ||
Does `write_table` support `os.PathLike`, (fsspec) URLs, and strings indicating compression? If not, it might be more consistent across `to_*` to have something like this:
It does for everything, with the possible exception of fsspec URLs, which still need to be tested.
Here is the API doc for the function:
https://arrow.apache.org/docs/python/generated/pyarrow.orc.write_table.html
I don't think we necessarily have fsspec yet, so let's use the approach you mentioned.
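For anyone following the thread: the snippet the reviewer proposed is not preserved above, so the following is only a rough sketch of the general idea of routing every kind of target (no path, an open binary handle, a `str`, or an `os.PathLike`) through one code path before handing a handle to the writer. The names `write_to_target` and `fake_orc_writer` are hypothetical and are not pandas or PyArrow APIs; `fake_orc_writer` merely stands in for a call that writes ORC bytes to a handle. fsspec URLs and compression strings are deliberately out of scope here.

```python
import io
import os
import tempfile
from pathlib import Path
from typing import BinaryIO, Callable, Optional, Union

# Kinds of targets this sketch accepts (assumption: no URL or compression handling).
Target = Union[str, os.PathLike, BinaryIO, None]


def write_to_target(
    writer: Callable[[BinaryIO], None], path: Target = None
) -> Optional[bytes]:
    """Hypothetical helper: give `writer` a binary handle for any target type."""
    if path is None:
        # No target given: write into memory and return the raw bytes.
        buffer = io.BytesIO()
        writer(buffer)
        return buffer.getvalue()
    if hasattr(path, "write"):
        # Already an open binary handle: write to it and leave it open.
        writer(path)
        return None
    # str or os.PathLike: normalise to a plain path, then open and close the file.
    with open(os.fspath(path), "wb") as handle:
        writer(handle)
    return None


def fake_orc_writer(handle: BinaryIO) -> None:
    # Stand-in for a real ORC writer; it only writes a few marker bytes.
    handle.write(b"ORC")


# No path: the bytes come back to the caller.
in_memory = write_to_target(fake_orc_writer)

# os.PathLike target: the file is created, written, and closed by the helper.
with tempfile.TemporaryDirectory() as tmpdir:
    target = Path(tmpdir) / "example.orc"
    write_to_target(fake_orc_writer, target)
    on_disk = target.read_bytes()
```

Keeping the dispatch in one helper means the writer itself only ever sees a binary handle, which is what makes the behaviour easy to keep consistent across the different `to_*` methods.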