

@zhengfeiwang
Contributor

The function pandas.read_sql reads a SQL query or database table into a DataFrame. However, for nullable integer columns, if even one row contains a NULL, pandas infers the column as float. This can cause problems downstream: in our case, we use the DataFrame for ETL, and integers converted to floats do not match the schema of our sink database table.
Using the nullable integer dtype is one solution we adopted, but it may use more memory. Therefore, I added a parameter to pandas.read_sql that defaults to False, so the behavior is not enabled by default; to enable it, set the parameter to True.
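A minimal sketch of the behavior described above, assuming an in-memory SQLite table with hypothetical names `t` and `qty`. It shows the float64 upcast caused by a single NULL, then a cast to the nullable `Int64` dtype applied after loading. This is only a post-hoc workaround, not the parameter proposed in this PR (whose name is not given here).

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory table: "qty" is an INTEGER column with one NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, qty INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, None), (3, 30)])
conn.commit()

df = pd.read_sql("SELECT id, qty FROM t", conn)
conn.close()

# Default inference: the NULL becomes NaN, so "qty" is upcast to float64.
print(df["qty"].dtype)  # float64

# Workaround after loading: cast to the nullable integer dtype "Int64",
# which keeps integer values and stores the missing value as pd.NA.
df_nullable = df.astype({"qty": "Int64"})
print(df_nullable["qty"].dtype)  # Int64
```

The cast succeeds here because every non-missing float in `qty` is integral; the extra mask that the nullable dtype carries is the source of the additional memory use mentioned above.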

@pep8speaks

pep8speaks commented Mar 5, 2021

Hello @zhengfeiwang! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-03-05 03:26:58 UTC
@simonjayhawkins simonjayhawkins added IO SQL to_sql, read_sql, read_sql_query Performance Memory or execution speed performance labels Mar 14, 2021
@jorisvandenbossche jorisvandenbossche added the NA - MaskedArrays Related to pd.NA and nullable extension arrays label Apr 1, 2021
@github-actions
Contributor

github-actions bot commented May 2, 2021

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label May 2, 2021
@lithomas1
Contributor

@zhengfeiwang Can you implement this for all nullable dtypes (IntegerArray, BooleanArray, FloatingArray, and StringArray) under the parameter use_nullable_dtypes, and resolve the conflicts?
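For reference, a sketch of which nullable dtype backs each of the four extension arrays named above, using the existing `DataFrame.convert_dtypes()` method (the Float64 conversion assumes pandas 1.2 or later). The column names are hypothetical, and this is not an implementation of `use_nullable_dtypes`; it only illustrates the target dtypes such a parameter would produce.

```python
import pandas as pd

# Hypothetical columns, each with one missing value, as a DB cursor might return them.
df = pd.DataFrame({
    "i": [1, None, 3],         # inferred as float64 with NaN
    "b": [True, None, False],  # inferred as object
    "f": [0.5, None, 2.5],     # inferred as float64 with NaN
    "s": ["a", None, "c"],     # inferred as object (or str in newer pandas)
})

# convert_dtypes() picks the nullable extension dtype for each column:
# IntegerArray -> Int64, BooleanArray -> boolean,
# FloatingArray -> Float64, StringArray -> string.
out = df.convert_dtypes()
print(out.dtypes)
```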

@lithomas1 lithomas1 removed the Stale label May 2, 2021
@zhengfeiwang
Contributor Author

@zhengfeiwang Can you implement this for all nullable dtypes(IntegerArray, BooleanArray, FloatingArray, and StringArray) as the parameter use_nullable_dtypes and resolve conflicts?

OK, I will try to work on it and finish it ASAP.

@lithomas1 lithomas1 removed the Performance Memory or execution speed performance label May 3, 2021
@github-actions
Contributor

github-actions bot commented Jun 3, 2021

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jun 3, 2021
@zhengfeiwang
Contributor Author

I will finish this ASAP; please keep this thread on hold for now.

@simonjayhawkins
Member

I will finish this perf ASAP, please hold this thread currently.

Thanks, @zhengfeiwang. I will close this to clear the queue; ping when you are ready to continue.


Labels

IO SQL to_sql, read_sql, read_sql_query NA - MaskedArrays Related to pd.NA and nullable extension arrays Stale

5 participants