

@milkshakeiii (Contributor) commented Oct 26, 2023

  • Emulates most aspects of the pandas get_dummies interface.
  • Adds tests and doctest examples.
  • In most cases the performance bottleneck is the BigQuery column count, since every distinct value of an encoded column becomes its own output column.
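To illustrate why output width depends on the number of distinct values, here is a minimal pure-Python sketch of get_dummies-style one-hot encoding. It is not the bigframes implementation: `get_dummies_sketch` is a hypothetical helper, the table is a plain dict of columns, and unlike pandas (which by default encodes only object/string/category columns) it encodes every column it is given. Output columns follow the pandas default naming of "<column><prefix_sep><value>", with an optional "<column>_nan" indicator when `dummy_na=True`.

```python
from typing import Dict, Hashable, List, Optional, Sequence

# Hypothetical column-oriented table: column name -> values (None = missing).
Table = Dict[str, Sequence[Optional[Hashable]]]


def get_dummies_sketch(
    data: Table, prefix_sep: str = "_", dummy_na: bool = False
) -> Dict[str, List[bool]]:
    """Hypothetical helper: one-hot encode every column of `data`.

    Assumes the non-null values within a column are mutually comparable,
    so they can be sorted.
    """
    result: Dict[str, List[bool]] = {}
    for column, values in data.items():
        # One output column per distinct non-null value, in sorted order.
        uniques = sorted({v for v in values if v is not None})
        for value in uniques:
            result[f"{column}{prefix_sep}{value}"] = [v == value for v in values]
        # Optionally add one more column flagging missing values.
        if dummy_na:
            result[f"{column}{prefix_sep}nan"] = [v is None for v in values]
    return result


dummies = get_dummies_sketch({"color": ["red", "blue", None, "red"]}, dummy_na=True)
for name, col in dummies.items():
    print(name, col)
```

Encoding two columns with 3 and 2 distinct values yields 5 output columns. The width grows with total cardinality, which is why the number of columns BigQuery has to handle dominates the cost.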
@milkshakeiii milkshakeiii requested review from a team as code owners October 26, 2023 06:39
@milkshakeiii milkshakeiii requested a review from shobsi October 26, 2023 06:39
@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Oct 26, 2023
@milkshakeiii milkshakeiii requested a review from shobsi October 27, 2023 19:19
@milkshakeiii (Contributor, Author) commented:

Thanks for the review! Working on addressing these comments today.

@TrevorBergeron (Contributor) left a comment

Logic looks good; just a few suggestions on cutting down the method length a bit.

@milkshakeiii milkshakeiii merged commit d8baad5 into main Nov 1, 2023
@milkshakeiii milkshakeiii deleted the b297352026-get-dummies branch November 1, 2023 20:41
ashleyxuu pushed a commit that referenced this pull request Nov 1, 2023
* feat: add pd.get_dummies
* remove unneeded prefix case
* param/documentation fixes
* be stricter about types in test
* be stricter about types in series test
* remove unneeded comment
* adjust for type difference in pandas 1
* add example code (tested)
* fix None columns and add test cases
* variable names and _get_unique_values per-column
* account for pandas 1 behavior difference
* remove already_seen set
* avoid unnecessary join/projection
* fix column ordering edge case
* adjust for picky examples checker
* example tweak
* make part of the example comments
* use ellipsis in doctest comment
* add <BLANKLINES> to doctest string
* extract parameter standardization
* extract submethods

---------

Co-authored-by: Henry J Solberg <henryjsolberg@google.com>


4 participants