[Kernel] Incrementally build Crc from previous CRC #5
Open
huan233usc wants to merge 84 commits into stack/crc-full-improve from
… not yet implemented (delta-io#4678)

Currently, support covers only metadata updates to column-mapping-enabled tables. The data path is not yet implemented. Block it so that connectors won't write invalid data files.
…bles (delta-io#4670)

#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [x] Kernel
- [ ] Other (fill in here)

## Description

This PR improves row tracking materialized column name assignment by:

- Making it explicit that row tracking column names are assigned only when creating new tables.
- Adding validation to ensure that row tracking column names already exist on existing tables when row tracking is enabled. If the configs are missing in this case (which should not happen), we now throw an error instead of silently assigning them.

## How was this patch tested?

New unit tests.

## Does this PR introduce _any_ user-facing changes?

No.
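The assign-on-create, validate-on-existing rule described above can be illustrated with a minimal sketch. This is not the Kernel implementation: the class name, the config key, and the generated column name format are all hypothetical placeholders chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Illustrative sketch only; names and keys are hypothetical, not Kernel's API. */
final class RowTrackingColumnNameSketch {
    // Hypothetical config key standing in for the real materialized column name config.
    static final String ROW_ID_COLUMN_KEY = "example.rowTracking.materializedRowIdColumnName";

    /**
     * Returns a config that contains the row tracking column name.
     * New tables get a freshly generated name; existing tables must already have one.
     */
    static Map<String, String> resolve(
            Map<String, String> config, boolean isNewTable, boolean rowTrackingEnabled) {
        if (!rowTrackingEnabled || config.containsKey(ROW_ID_COLUMN_KEY)) {
            return config; // nothing to assign or validate
        }
        if (!isNewTable) {
            // Existing table with row tracking enabled but no column name: fail loudly
            // instead of silently assigning one.
            throw new IllegalStateException(
                "Row tracking is enabled but " + ROW_ID_COLUMN_KEY + " is missing");
        }
        Map<String, String> updated = new HashMap<>(config);
        updated.put(ROW_ID_COLUMN_KEY, "_row-id-col-" + UUID.randomUUID());
        return updated;
    }
}
```

Calling `resolve(config, true, true)` on an empty config returns a copy with a generated name, while `resolve(config, false, true)` on the same input throws.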
…#4683)

#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

For RESTORE TABLE and the CLONE command, we currently don't mark a DomainMetadata as removed when the target snapshot has the DomainMetadata but the source snapshot does not.

The fix is to mark a DomainMetadata as removed during a RESTORE or CLONE command if:

1. The DomainMetadata is included in the list of DomainMetadata to remove for REPLACE TABLE.
2. The DomainMetadata is not present in the list of DomainMetadata that would be committed by the CLONE command anyway.

## How was this patch tested?

See test changes.

## Does this PR introduce _any_ user-facing changes?

No.
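The two conditions above amount to a set difference over domain names. The following is a minimal sketch under that reading; the class and method names are hypothetical and do not correspond to the actual Delta Spark code.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative sketch only; not the actual RESTORE/CLONE implementation. */
final class DomainMetadataRemovalSketch {
    /**
     * Domains to mark as removed: those eligible for removal on REPLACE TABLE
     * (condition 1) minus those the command would commit anyway (condition 2).
     */
    static Set<String> domainsToMarkRemoved(
            Set<String> removableOnReplace, Set<String> committedByCommand) {
        Set<String> result = new LinkedHashSet<>(removableOnReplace);
        result.removeAll(committedByCommand);
        return result;
    }
}
```

For example, if the removable domains are `a` and `b` and the command already commits `b`, only `a` is marked as removed.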
## Description

Consider both keys, 'notebook_id' and 'notebookID'.
#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

This PR cleans up the repeated catalog name in Insert tests.

## How was this patch tested?

Existing UTs

## Does this PR introduce _any_ user-facing changes?

No
…ss (delta-io#4682)

#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [x] Kernel
- [ ] Other (fill in here)

## Description

This PR moves the assertion to the constructor of LogSegment. Closes delta-io#4639.

## How was this patch tested?

Two tests are added to LogSegmentSuite.scala. One includes invalid checkpoint files (from a different table) and one includes invalid delta JSON files. We expect these invalid inputs to be caught by the assertion in LogSegment.

## Does this PR introduce _any_ user-facing changes?

No.
#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

Make sure RewriteHelper always uses the correct snapshot ID when validating the snapshot.

## How was this patch tested?

Existing UT

## Does this PR introduce _any_ user-facing changes?

No
…pl; Include parsedLogData when constructing LogSegment (delta-io#4615)

## 🥞 Stacked PR

Use this [link](https://github.com/delta-io/delta/pull/4615/files) to review incremental changes.

- [**stack/kernel_catalog_managed_2c**](delta-io#4615) [[Files changed](https://github.com/delta-io/delta/pull/4615/files)]
- [stack/kernel_catalog_managed_3](delta-io#4644) [[Files changed](https://github.com/delta-io/delta/pull/4644/files/9c5b1dc063b429057a8f479e8bac78f2aced645c..f4a65dcb07281c02fbe368e9865a38c2b7044280)]
- [stack/kernel_catalog_managed_4](delta-io#4663) [[Files changed](https://github.com/delta-io/delta/pull/4663/files/f4a65dcb07281c02fbe368e9865a38c2b7044280..e007d5778d98bd5946243495ea8cccf39e3e16bf)]

---------

#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [X] Kernel
- [ ] Other (fill in here)

## Description

This PR adds a few classes and some new logic:

- ResolvedTableInternalImpl --> our actual implementation of resolved table
- ResolvedTableBuilderInternalImpl --> our actual implementation of the builder
- ResolvedTableFactory --> we want the builder to get input; the class should remain small and succinct; we want all of the "now that we have the input, go and create the actual resolved table" logic to be in a separate module (just cleaner) -- that's what this factory is. You can think of it as similar to SnapshotManager
- Update SnapshotManager::getLogSegmentForVersion to take in the parsed log data

## How was this patch tested?

New UTs

## Does this PR introduce _any_ user-facing changes?
…lta-io#4690)

## 🥞 Stacked PR

Use this [link](https://github.com/delta-io/delta/pull/4690/files) to review incremental changes.

- [**stack/kernel_log_replay_refactor_lazy**](delta-io#4690) [[Files changed](https://github.com/delta-io/delta/pull/4690/files)]

---------

#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [X] Kernel
- [ ] Other (fill in here)

## Description

Change LogReplay to take in a lazy LogSegment. This will simplify the ResolvedTableFactory code as commented here: https://github.com/delta-io/delta/pull/4644/files#r2121995895

## How was this patch tested?

Just a refactor. Existing UTs

## Does this PR introduce _any_ user-facing changes?

No
… only as needed (delta-io#4644)

## 🥞 Stacked PR

Use this [link](https://github.com/delta-io/delta/pull/4644/files) to review incremental changes.

- [**stack/kernel_catalog_managed_3**](delta-io#4644) [[Files changed](https://github.com/delta-io/delta/pull/4644/files)]
- [stack/kernel_catalog_managed_5_builder_validation](delta-io#4664) [[Files changed](https://github.com/delta-io/delta/pull/4664/files/0b4bdd9a76defb12a692789fbcacc3c4662d7ac4..3089c3e234e6f9ea451f57d25893e39920cf71b8)]

---------

#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [X] Kernel
- [ ] Other (fill in here)

## Description

Add the ability to load the protocol and metadata (P & M) -- but do so only as needed. For example, if we got the P & M in the ResolvedTableBuilder, then we don't need to perform LogReplay at all to load them.

In this PR, we _only_ eagerly provide the P & M. In a follow-up PR, we should add better mock test capabilities to easily provide a P & M to, for example, the JSON handler.

## How was this patch tested?

New simple UT.

## Does this PR introduce _any_ user-facing changes?

No.
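The load-only-as-needed idea above can be sketched with a memoizing supplier: if a value was provided up front it is returned directly, and otherwise the expensive load runs at most once on first access. This is a minimal illustration, not the Kernel code; `Lazy`, `resolve`, and the `logReplay` parameter are hypothetical names.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Illustrative sketch only; not the ResolvedTableFactory implementation. */
final class LazyProtocolMetadataSketch {
    /** Memoizing supplier: runs the loader at most once, on first access. */
    static final class Lazy<T> implements Supplier<T> {
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;

        Lazy(Supplier<T> loader) {
            this.loader = loader;
        }

        @Override
        public synchronized T get() {
            if (!loaded) {
                value = loader.get();
                loaded = true;
            }
            return value;
        }
    }

    /**
     * Uses the eagerly provided value when present; otherwise falls back to the
     * (hypothetical) log replay, which is deferred until the value is requested.
     */
    static <T> Lazy<T> resolve(Optional<T> provided, Supplier<T> logReplay) {
        return new Lazy<>(() -> provided.orElseGet(logReplay));
    }
}
```

With this shape, a caller that already holds the P & M never triggers the replay supplier, and a caller that does not pays for it only once.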
…elta-io#4568)

## 🥞 Stacked PR

Use this [link](https://github.com/delta-io/delta/pull/4568/files) to review incremental changes.

- [**stack/tw_type_changes**](delta-io#4568) [[Files changed](https://github.com/delta-io/delta/pull/4568/files)]
- [stack/tw_type_changes_to_field_metadata](delta-io#4588) [[Files changed](https://github.com/delta-io/delta/pull/4588/files/3cc7f9fce24cf8d1c3cd13fff9cb9ac8b2aa3f61..cd9d0a002af5ae669ff0f5951053fc9f0f696509)]
- [stack/tw_type_changes_to_field_metadata_impl](delta-io#4589) [[Files changed](https://github.com/delta-io/delta/pull/4589/files/cd9d0a002af5ae669ff0f5951053fc9f0f696509..61dd9cf9cc4620c2f2123acae907eb3df1b56326)]
- [stack/tw_serde_refactor](delta-io#4592) [[Files changed](https://github.com/delta-io/delta/pull/4592/files/61dd9cf9cc4620c2f2123acae907eb3df1b56326..071397255cb5b4ef6e1c4521169de5fac4ab3397)]
- [stack/add_type_change_parsing](delta-io#4593) [[Files changed](https://github.com/delta-io/delta/pull/4593/files/071397255cb5b4ef6e1c4521169de5fac4ab3397..582ff88d12131a235e9898df84ce8d120011618f)]
- [stack/tw_upgrade_downgrade](delta-io#4603) [[Files changed](https://github.com/delta-io/delta/pull/4603/files/582ff88d12131a235e9898df84ce8d120011618f..e321508cfa79786493b3500cab62f36f4581d6fa)]

---------

#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [x] Kernel
- [ ] Other (fill in here)

## Description

This PR updates schema validation to take into account type widening rules. It does the following:

1. Adds switch logic to appropriately validate type changes and not throw if type widening is enabled and the change is allowed under type widening (also accounting for iceberg ability).
2. Updates the new schema by copying over existing type widening values if none are present on the new schema, and adds in the type changes detected as changes.

## How was this patch tested?

## Does this PR introduce _any_ user-facing changes?
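The first step of the description above (accept a type change only when type widening is enabled and the change is an allowed widening) can be sketched as follows. This is an assumption-laden illustration, not the Kernel validation code: the class and method names are hypothetical, types are modeled as plain strings, and only a small subset of widenings (integral chain and float to double) is covered.

```java
import java.util.List;

/** Illustrative sketch only; covers a small, assumed subset of widening rules. */
final class TypeWideningSketch {
    // Integral types ordered from narrowest to widest.
    private static final List<String> INTEGRAL_ORDER =
        List.of("byte", "short", "integer", "long");

    /** True if changing {@code from} to {@code to} strictly widens the type. */
    static boolean isAllowedWidening(String from, String to) {
        int fromIdx = INTEGRAL_ORDER.indexOf(from);
        int toIdx = INTEGRAL_ORDER.indexOf(to);
        if (fromIdx >= 0 && toIdx >= 0) {
            return fromIdx < toIdx;
        }
        return from.equals("float") && to.equals("double");
    }

    /** Throws unless the type is unchanged or the change is a permitted widening. */
    static void validateTypeChange(String from, String to, boolean typeWideningEnabled) {
        if (from.equals(to)) {
            return; // no change
        }
        if (typeWideningEnabled && isAllowedWidening(from, to)) {
            return; // permitted widening
        }
        throw new IllegalArgumentException(
            "Cannot change type from " + from + " to " + to);
    }
}
```

Here `validateTypeChange("integer", "long", true)` passes, while the same change with type widening disabled, or a narrowing such as `long` to `integer`, throws.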
huan233usc pushed a commit that referenced this pull request Jul 10, 2025
#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [x] Kernel
- [ ] Other (fill in here)

## Description

Exposes the deletion vector descriptor field as a parameter of AddFile for generating v3 add actions.

## How was this patch tested?

Unit tests.

## Does this PR introduce _any_ user-facing changes?

No.