[Kernel][Type Widening] 9/ Implement TypeChange Parsing#4593
Merged
vkorukanti merged 2 commits intodelta-io:masterfrom Jun 5, 2025
Merged
[Kernel][Type Widening] 9/ Implement TypeChange Parsing#4593vkorukanti merged 2 commits intodelta-io:masterfrom
vkorukanti merged 2 commits intodelta-io:masterfrom
Conversation
This was referenced May 21, 2025
8e24e94 to 14e49de Compare 5f89234 to 64652bf Compare 5 tasks
583909c to 0377d14 Compare ddad757 to 538ec9c Compare vkorukanti pushed a commit that referenced this pull request May 28, 2025
…schemas (#4560) ## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta/pull/4560/files) to review incremental changes. - [**stack/tw_iterable**](#4560) [[Files changed](https://github.com/delta-io/delta/pull/4560/files)] - [stack/tw_cleanup_signature](#4561) [[Files changed](https://github.com/delta-io/delta/pull/4561/files/d85f1ee710f766b82228f81e0face2518ccaaa0c..96edf0378c81d9a156571fa1fd2f688d0d04ef27)] - [stack/tw_make_update_real](#4566) [[Files changed](https://github.com/delta-io/delta/pull/4566/files/96edf0378c81d9a156571fa1fd2f688d0d04ef27..66fd80cd94099869489ef2fded10b04a2977da35)] - [stack/tw_proper_addressing](#4567) [[Files changed](https://github.com/delta-io/delta/pull/4567/files/66fd80cd94099869489ef2fded10b04a2977da35..9adb29923a50f91785ac1d2527ac8fa05d52d1e2)] - [stack/tw_type_changes](#4568) [[Files changed](https://github.com/delta-io/delta/pull/4568/files/9adb29923a50f91785ac1d2527ac8fa05d52d1e2..ddff25fa0456d1a4022aea7749b47833b315bdaa)] - [stack/tw_type_changes_to_field_metadata](#4588) [[Files changed](https://github.com/delta-io/delta/pull/4588/files/ddff25fa0456d1a4022aea7749b47833b315bdaa..ff43b8e6af15ee95ced406fbd39d13d104fc7a66)] - [stack/tw_type_changes_to_field_metadata_impl](#4589) [[Files changed](https://github.com/delta-io/delta/pull/4589/files/ff43b8e6af15ee95ced406fbd39d13d104fc7a66..3642c6851822446c5be59dc0a0104a09ccbb551d)] - [stack/tw_serde_refactor](#4592) [[Files changed](https://github.com/delta-io/delta/pull/4592/files/3642c6851822446c5be59dc0a0104a09ccbb551d..b819ff738c45f8899f57408107c4fc41de045e25)] - [stack/add_type_change_parsing](#4593) [[Files changed](https://github.com/delta-io/delta/pull/4593/files/b819ff738c45f8899f57408107c4fc41de045e25..538ec9c5bf6afbf3473715bf94eb324b237138da)] - [stack/tw_upgrade_downgrade](#4603) [[Files changed](https://github.com/delta-io/delta/pull/4603/files/538ec9c5bf6afbf3473715bf94eb324b237138da..a6fcb20a3c4caf8e74dc0ca254beadc766fd0083)] --------- #### Which Delta project/connector is this regarding? - [ ] Spark - [ ] Standalone - [ ] Flink - [x] Kernel - [ ] Other (fill in here) ## Description Adds a utility class for iterating over Schemas and modifying them. This will be used in follow-ups to help with type widing, but can also be used to avoid recursion code which has been duplicated in many places. ## How was this patch tested? Unit tests. ## Does this PR introduce _any_ user-facing changes? No. Internal helper only.
0a5fdcb to 985006f Compare vkorukanti pushed a commit that referenced this pull request May 28, 2025
…nstead of map (#4561) ## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta/pull/4561/files) to review incremental changes. - [**stack/tw_cleanup_signature**](#4561) [[Files changed](https://github.com/delta-io/delta/pull/4561/files)] - [stack/tw_make_update_real](#4566) [[Files changed](https://github.com/delta-io/delta/pull/4566/files/dc9493f8ecf74cd647c1056ca018017a96b36ee3..eb36c872f0beecf592a30269009b6acf06490bcb)] - [stack/tw_proper_addressing](#4567) [[Files changed](https://github.com/delta-io/delta/pull/4567/files/eb36c872f0beecf592a30269009b6acf06490bcb..7a825be9356dd2cdae2add3d76e890b0f09a403f)] - [stack/tw_type_changes](#4568) [[Files changed](https://github.com/delta-io/delta/pull/4568/files/7a825be9356dd2cdae2add3d76e890b0f09a403f..b67836ee303012f759b8be1be64b8a59477597ad)] - [stack/tw_type_changes_to_field_metadata](#4588) [[Files changed](https://github.com/delta-io/delta/pull/4588/files/b67836ee303012f759b8be1be64b8a59477597ad..952fb08f703c880cbd652fe23fb147e28b37a040)] - [stack/tw_type_changes_to_field_metadata_impl](#4589) [[Files changed](https://github.com/delta-io/delta/pull/4589/files/952fb08f703c880cbd652fe23fb147e28b37a040..a988cd6201ca97c39ef0d7f27700a67922c929d7)] - [stack/tw_serde_refactor](#4592) [[Files changed](https://github.com/delta-io/delta/pull/4592/files/a988cd6201ca97c39ef0d7f27700a67922c929d7..4c587b511087a7852b6eb9098ca329595106d372)] - [stack/add_type_change_parsing](#4593) [[Files changed](https://github.com/delta-io/delta/pull/4593/files/4c587b511087a7852b6eb9098ca329595106d372..985006f94d99909ea1d5aba9cc26b24f823a8250)] - [stack/tw_upgrade_downgrade](#4603) [[Files changed](https://github.com/delta-io/delta/pull/4603/files/985006f94d99909ea1d5aba9cc26b24f823a8250..03747ccc7e85a5bf7fff2b62c446a7cb7eb615ac)] --------- #### Which Delta project/connector is this regarding? - [ ] Spark - [ ] Standalone - [ ] Flink - [x] Kernel - [ ] Other (fill in here) ## Description This changes the signature of computeSchemaChangesById to take schemas instead of Maps. This makes it possible to change the implementation of from not requiring the specific maps in question, and provides better testing by passing in real schemas instead of transformed values. ## How was this patch tested? Updated unit tests. ## Does this PR introduce _any_ user-facing changes? No
985006f to fc3da02 Compare vkorukanti pushed a commit that referenced this pull request May 28, 2025
## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta/pull/4566/files) to review incremental changes. - [**stack/tw_make_update_real**](#4566) [[Files changed](https://github.com/delta-io/delta/pull/4566/files)] - [stack/tw_proper_addressing](#4567) [[Files changed](https://github.com/delta-io/delta/pull/4567/files/10c63b52905094fab67fffdbed2c89689b56dc10..a6c5a4b29316b9fe7ac04bf15200ce5f647fafb8)] - [stack/tw_type_changes](#4568) [[Files changed](https://github.com/delta-io/delta/pull/4568/files/a6c5a4b29316b9fe7ac04bf15200ce5f647fafb8..f7f67a943ab6e196a5ae70fa50959d0ab1b25654)] - [stack/tw_type_changes_to_field_metadata](#4588) [[Files changed](https://github.com/delta-io/delta/pull/4588/files/f7f67a943ab6e196a5ae70fa50959d0ab1b25654..6046391b6586de29a907bd2150f6d799ecfc236f)] - [stack/tw_type_changes_to_field_metadata_impl](#4589) [[Files changed](https://github.com/delta-io/delta/pull/4589/files/6046391b6586de29a907bd2150f6d799ecfc236f..2e41a183d4ff76203d503fe5857aa77a807e3316)] - [stack/tw_serde_refactor](#4592) [[Files changed](https://github.com/delta-io/delta/pull/4592/files/2e41a183d4ff76203d503fe5857aa77a807e3316..6d1a8a3cd96a14393e7be9342fdf1e5ad2aa6d7d)] - [stack/add_type_change_parsing](#4593) [[Files changed](https://github.com/delta-io/delta/pull/4593/files/6d1a8a3cd96a14393e7be9342fdf1e5ad2aa6d7d..fc3da02a407bc1d08cd97928e21fda01565fc85e)] - [stack/tw_upgrade_downgrade](#4603) [[Files changed](https://github.com/delta-io/delta/pull/4603/files/fc3da02a407bc1d08cd97928e21fda01565fc85e..943921b9b1d8ec4f648461d6e617cc8d62083f67)] --------- #### Which Delta project/connector is this regarding? - [ ] Spark - [ ] Standalone - [ ] Flink - [x] Kernel - [ ] Other (fill in here) ## Description This pull request adds a specific class instead of a Tuple for tracking Schema updates. ## How was this patch tested? Existing unit tests ## Does this PR introduce _any_ user-facing changes? No, this is private API
c28d9da to 1da4e2f Compare vkorukanti pushed a commit that referenced this pull request May 30, 2025
…#4567) ## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta/pull/4567/files) to review incremental changes. - [**stack/tw_proper_addressing**](#4567) [[Files changed](https://github.com/delta-io/delta/pull/4567/files)] - [stack/tw_type_changes](#4568) [[Files changed](https://github.com/delta-io/delta/pull/4568/files/7a07781a45a757873a5dbf519f84c3e25f3af4e3..2c2f71a4a6b775abb4da3b141058ed20196cd099)] - [stack/tw_type_changes_to_field_metadata](#4588) [[Files changed](https://github.com/delta-io/delta/pull/4588/files/2c2f71a4a6b775abb4da3b141058ed20196cd099..d8f0abb884b33b12333f1b53b6d1aed124d3277a)] - [stack/tw_type_changes_to_field_metadata_impl](#4589) [[Files changed](https://github.com/delta-io/delta/pull/4589/files/d8f0abb884b33b12333f1b53b6d1aed124d3277a..1426562ddfbcb0954b481ecf8432785b84059e84)] - [stack/tw_serde_refactor](#4592) [[Files changed](https://github.com/delta-io/delta/pull/4592/files/1426562ddfbcb0954b481ecf8432785b84059e84..9a8462ad2abd1eb920b021fc5de9497223e844a4)] - [stack/add_type_change_parsing](#4593) [[Files changed](https://github.com/delta-io/delta/pull/4593/files/9a8462ad2abd1eb920b021fc5de9497223e844a4..1da4e2fe660ebd2d4d9eb9aa3702a9ec6d4ed837)] - [stack/tw_upgrade_downgrade](#4603) [[Files changed](https://github.com/delta-io/delta/pull/4603/files/1da4e2fe660ebd2d4d9eb9aa3702a9ec6d4ed837..6148941b110f389616819cf3e64ec10243f32776)] --------- #### Which Delta project/connector is this regarding? - [ ] Spark - [ ] Standalone - [ ] Flink - [x] Kernel - [ ] Other (fill in here) ## Description This PR uses the new SchemaIterable class introduced in a prior PR to calculate schema changes. It takes the opportunity to eliminate recursive methods by doing change comparison at every element in a Schema tree (by creating a custom ID object and finding changes at ever level first). This trades off some memory for, IMO, cleaner code. In theory in a future iteration we can also avoid the implicit recursion on `DataType.equivelant` and `StructField.equal` but this can be done as a follow-up. ## How was this patch tested? Existing tests. ## Does this PR introduce _any_ user-facing changes? No.
bec387c to 582ff88 Compare vkorukanti pushed a commit that referenced this pull request Jun 3, 2025
…4568) ## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta/pull/4568/files) to review incremental changes. - [**stack/tw_type_changes**](#4568) [[Files changed](https://github.com/delta-io/delta/pull/4568/files)] - [stack/tw_type_changes_to_field_metadata](#4588) [[Files changed](https://github.com/delta-io/delta/pull/4588/files/3cc7f9fce24cf8d1c3cd13fff9cb9ac8b2aa3f61..cd9d0a002af5ae669ff0f5951053fc9f0f696509)] - [stack/tw_type_changes_to_field_metadata_impl](#4589) [[Files changed](https://github.com/delta-io/delta/pull/4589/files/cd9d0a002af5ae669ff0f5951053fc9f0f696509..61dd9cf9cc4620c2f2123acae907eb3df1b56326)] - [stack/tw_serde_refactor](#4592) [[Files changed](https://github.com/delta-io/delta/pull/4592/files/61dd9cf9cc4620c2f2123acae907eb3df1b56326..071397255cb5b4ef6e1c4521169de5fac4ab3397)] - [stack/add_type_change_parsing](#4593) [[Files changed](https://github.com/delta-io/delta/pull/4593/files/071397255cb5b4ef6e1c4521169de5fac4ab3397..582ff88d12131a235e9898df84ce8d120011618f)] - [stack/tw_upgrade_downgrade](#4603) [[Files changed](https://github.com/delta-io/delta/pull/4603/files/582ff88d12131a235e9898df84ce8d120011618f..e321508cfa79786493b3500cab62f36f4581d6fa)] --------- #### Which Delta project/connector is this regarding? - [ ] Spark - [ ] Standalone - [ ] Flink - [x] Kernel - [ ] Other (fill in here) ## Description This PR updates schema validation to take into account type widening rules. It does the following: 1. Add switch logic to appropriately validate type changes and not throw if they type widening is enabled and the change is allowed under type widening (also accounting for iceberg ability. 2. Updates the new schema by copying over existing type widening values if none are present on the new schema, and adds in the type changes detected as changes. ## How was this patch tested? <!-- If tests were added, say they were added here. Please make sure to test the changes thoroughly including negative and positive cases if possible. If the changes were tested in any way other than unit tests, please clarify how you tested step by step (ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future). If the changes were not tested, please explain why. --> ## Does this PR introduce _any_ user-facing changes? <!-- If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible. If possible, please also clarify if this is a user-facing change compared to the released Delta Lake versions or within the unreleased branches such as master. If no, write 'No'. -->
vkorukanti pushed a commit that referenced this pull request Jun 4, 2025
…rable and refactor collation fetching (#4588) ## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta/pull/4588/files/3cc7f9fce24cf8d1c3cd13fff9cb9ac8b2aa3f61..cd9d0a002af5ae669ff0f5951053fc9f0f696509) to review incremental changes. - [stack/tw_type_changes](#4568) [[Files changed](https://github.com/delta-io/delta/pull/4568/files)] - [**stack/tw_type_changes_to_field_metadata**](#4588) [[Files changed](https://github.com/delta-io/delta/pull/4588/files/3cc7f9fce24cf8d1c3cd13fff9cb9ac8b2aa3f61..cd9d0a002af5ae669ff0f5951053fc9f0f696509)] - [stack/tw_type_changes_to_field_metadata_impl](#4589) [[Files changed](https://github.com/delta-io/delta/pull/4589/files/cd9d0a002af5ae669ff0f5951053fc9f0f696509..61dd9cf9cc4620c2f2123acae907eb3df1b56326)] - [stack/tw_serde_refactor](#4592) [[Files changed](https://github.com/delta-io/delta/pull/4592/files/61dd9cf9cc4620c2f2123acae907eb3df1b56326..071397255cb5b4ef6e1c4521169de5fac4ab3397)] - [stack/add_type_change_parsing](#4593) [[Files changed](https://github.com/delta-io/delta/pull/4593/files/071397255cb5b4ef6e1c4521169de5fac4ab3397..582ff88d12131a235e9898df84ce8d120011618f)] - [stack/tw_upgrade_downgrade](#4603) [[Files changed](https://github.com/delta-io/delta/pull/4603/files/582ff88d12131a235e9898df84ce8d120011618f..e321508cfa79786493b3500cab62f36f4581d6fa)] --------- #### Which Delta project/connector is this regarding? - [ ] Spark - [ ] Standalone - [ ] Flink - [x] Kernel - [ ] Other (fill in here) ## Description This PR adds the ability for SchemaIterable to avoid recursion on specific nested types by passing in the classes to skip. It uses this updated API to refactor collecting collations using the iterable, this is in preparation for doing the same logic with TypeChanges in the follow-up PR. ## How was this patch tested? Unit tests (both new and existing). ## Does this PR introduce _any_ user-facing changes? No
320751c to cd75097 Compare vkorukanti pushed a commit that referenced this pull request Jun 4, 2025
## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta/pull/4589/files) to review incremental changes. - [**stack/tw_type_changes_to_field_metadata_impl**](#4589) [[Files changed](https://github.com/delta-io/delta/pull/4589/files)] - [stack/tw_serde_refactor](#4592) [[Files changed](https://github.com/delta-io/delta/pull/4592/files/ff20437822a7ab56ef64e739a846de4edb252e89..0f4aad0f937fad59f017c3c150cb721af4bfa566)] - [stack/add_type_change_parsing](#4593) [[Files changed](https://github.com/delta-io/delta/pull/4593/files/0f4aad0f937fad59f017c3c150cb721af4bfa566..cd75097216be5bf7070e5bd46b7e0ec67f0d1ff7)] - [stack/tw_upgrade_downgrade](#4603) [[Files changed](https://github.com/delta-io/delta/pull/4603/files/cd75097216be5bf7070e5bd46b7e0ec67f0d1ff7..a0aa5e0af91c89c646de81b944187ef36d78b03c)] --------- #### Which Delta project/connector is this regarding? - [ ] Spark - [ ] Standalone - [ ] Flink - [x] Kernel - [ ] Other (fill in here) ## Description This PR adds logic to collect type changes from leaf fields and appropriately populate FieldMetadata on the nearest StructField, following the pattern already established by collations. Because this is on the StructField it means extra work is performed for nested maps/elements, fixing this is outside the scope of the PR> ## How was this patch tested? New unit tests. ## Does this PR introduce _any_ user-facing changes? <!-- If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible. If possible, please also clarify if this is a user-facing change compared to the released Delta Lake versions or within the unreleased branches such as master. If no, write 'No'. -->
cd75097 to 48e303c Compare vkorukanti approved these changes Jun 5, 2025
| return structField; | ||
| } | ||
| | ||
| private static boolean isNested(DataType dataType) { |
Collaborator
There was a problem hiding this comment.
may be we should move this under DataType class
Collaborator Author
There was a problem hiding this comment.
I think this has been sprinkled in a few places, I'll do a cleanup to consolidate in a separate PR to keep things focused.
vkorukanti pushed a commit that referenced this pull request Jun 5, 2025
…intertwined with type parsing (#4592) ## 🥞 Stacked PR Use this [link](https://github.com/delta-io/delta/pull/4592/files) to review incremental changes. - [**stack/tw_serde_refactor**](#4592) [[Files changed](https://github.com/delta-io/delta/pull/4592/files)] - [stack/add_type_change_parsing](#4593) [[Files changed](https://github.com/delta-io/delta/pull/4593/files/6e5bbbeb27f0b6d0418c22e0e0b39907cbd2cc4e..48e303c7921df89414e75f9b45d78defcd17847a)] - [stack/tw_upgrade_downgrade](#4603) [[Files changed](https://github.com/delta-io/delta/pull/4603/files/48e303c7921df89414e75f9b45d78defcd17847a..2d6fffbe062fb6aab01d4ce59d0736caf8c13a87)] --------- <!-- Thanks for sending a pull request! Here are some tips for you: 1. If this is your first time, please read our contributor guidelines: https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md 2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP] Your PR title ...'. 3. Be sure to keep the PR description updated to reflect all changes. 4. Please write your PR title to summarize what this PR proposes. 5. If possible, provide a concise example to reproduce the issue for a faster review. 6. If applicable, include the corresponding issue number in the PR title and link it in the body. --> #### Which Delta project/connector is this regarding? <!-- Please add the component selected below to the beginning of the pull request title For example: [Spark] Title of my pull request --> - [ ] Spark - [ ] Standalone - [ ] Flink - [x] Kernel - [ ] Other (fill in here) ## Description This refactors the population of collation on StringTypes to be separate from parsing. This simplifies flows/arguments or parsing and consolidates the logic in a single place using StreamIterable. ## How was this patch tested? Existing tests. ## Does this PR introduce _any_ user-facing changes? No.
48e303c to 76e99c5 Compare This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
🥞 Stacked PR
Use this link to review incremental changes.
Which Delta project/connector is this regarding?
Description
This PR implements parsing of type changes from the JSON schema to Kernel's schema model.
How was this patch tested?
Add tests for parsing as well as updated schema evolution tests to ensure this works end-to-end.
Does this PR introduce any user-facing changes?
Type changes are now appropriately parsed, and can be round-tripped conforming to the delta spec.