1 file changed: +10 -3 lines changed
@@ -2877,9 +2877,16 @@ def to_parquet(
 
         Notes
         -----
-        This function requires either the `fastparquet
-        <https://pypi.org/project/fastparquet>`_ or `pyarrow
-        <https://arrow.apache.org/docs/python/>`_ library.
+        * This function requires either the `fastparquet
+          <https://pypi.org/project/fastparquet>`_ or `pyarrow
+          <https://arrow.apache.org/docs/python/>`_ library.
+        * When saving a DataFrame with categorical columns to parquet,
+          the file size may increase due to the inclusion of all possible
+          categories, not just those present in the data. This behavior
+          is expected and consistent with pandas' handling of categorical data.
+          To manage file size and ensure a more predictable roundtrip process,
+          consider using :meth:`Categorical.remove_unused_categories` on the
+          DataFrame before saving.
 
         Examples
         --------
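
The categorical note added in this hunk can be illustrated with a minimal sketch. The column name, sample values, and output file name below are hypothetical and not part of the change. The sketch assumes the unused categories are dropped per column through the `Series.cat` accessor, and it skips the parquet write when neither `pyarrow` nor `fastparquet` is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data: a categorical column that declares three categories.
df = pd.DataFrame({"color": pd.Categorical(["red", "green", "blue", "red"])})

# Filtering rows keeps every declared category, even those no longer present.
subset = df[df["color"] == "red"].copy()
categories_before = list(subset["color"].cat.categories)

# Drop categories that do not occur in the remaining rows.
subset["color"] = subset["color"].cat.remove_unused_categories()
categories_after = list(subset["color"].cat.categories)

print(categories_before)  # ['blue', 'green', 'red']
print(categories_after)   # ['red']

# Write the trimmed frame to parquet; this needs pyarrow or fastparquet.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "subset.parquet")  # hypothetical file name
    try:
        subset.to_parquet(path)
    except ImportError:
        # No parquet engine available; the category trimming above still applies.
        pass
```

Trimming categories before the write is a per-column step, so a frame with several categorical columns would need the same call applied to each of them.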