
Conversation

@pR0Ps
Contributor

@pR0Ps pR0Ps commented Apr 26, 2023

This commit fixes an issue where adding a small file to a ZipFile object while forcing zip64 extensions causes an extra Zip64 record to be added to the zip, but doesn't update the min_version or file sizes.

To create a file that reproduces the issue (copied from #103861):

    import zipfile

    with zipfile.ZipFile("out.zip", mode="w", allowZip64=True) as zf:
        with zf.open("text.txt", mode="w", force_zip64=True) as zi:
            zi.write(b"some data")

Diff of the information that zipdetails extracts from the archive produced by the above script, before and after this commit:

     0000 LOCAL HEADER #1       04034B50
    -0004 Extract Zip Spec      14 '2.0'
    +0004 Extract Zip Spec      2D '4.5'
     0005 Extract OS            00 'MS-DOS'
     0006 General Purpose Flag  0000
     0008 Compression Method    0000 'Stored'
     000A Last Mod Time         00210000 'Mon Dec 31 19:00:00 1979'
     000E CRC                   D9C2E91E
    -0012 Compressed Length     00000009
    -0016 Uncompressed Length   00000009
    +0012 Compressed Length     FFFFFFFF
    +0016 Uncompressed Length   FFFFFFFF
     001A Filename Length       0008
     001C Extra Length          0014
     001E Filename              'text.txt'
     0026 Extra ID #0001        0001 'ZIP64'
     0028   Length              0010
     002A   Uncompressed Size   0000000000000009
     0032   Compressed Size     0000000000000009
     003A PAYLOAD               some data

     0043 CENTRAL HEADER #1     02014B50
    -0047 Created Zip Spec      14 '2.0'
    +0047 Created Zip Spec      2D '4.5'
     0048 Created OS            03 'Unix'
    -0049 Extract Zip Spec      14 '2.0'
    +0049 Extract Zip Spec      2D '4.5'
     004A Extract OS            00 'MS-DOS'
     004B General Purpose Flag  0000
     004D Compression Method    0000 'Stored'
     004F Last Mod Time         00210000 'Mon Dec 31 19:00:00 1979'
     0053 CRC                   D9C2E91E
     0057 Compressed Length     00000009
     005B Uncompressed Length   00000009
     005F Filename Length       0008
     0061 Extra Length          0000
     0063 Comment Length        0000
     0065 Disk Start            0000
     0067 Int File Attributes   0000
          [Bit 0]               0 'Binary Data'
     0069 Ext File Attributes   01800000
     006D Local Header Offset   00000000
     0071 Filename              'text.txt'

     0079 END CENTRAL HEADER    06054B50
     007D Number of this disk   0000
     007F Central Dir Disk no   0000
     0081 Entries in this disk  0001
     0083 Total Entries         0001
     0085 Size of Central Dir   00000036
     0089 Offset to Central Dir 00000043
     008D Comment Length        0000

A test has also been added that checks that these properties are correctly set.
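For readers who want to check these properties without zipdetails, here is a minimal sketch that parses the first local file header with `struct` and reports whether the version and size fields agree with the ZIP64 extra record. It is not the test added in this PR; the helper names (`read_first_local_header`, `local_header_is_consistent`, `build_forced_zip64_archive`) are made up for illustration, and the field layout is taken from the zipdetails output above.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a local file header (little-endian): signature,
# version needed, flags, method, mod time, mod date, CRC, compressed size,
# uncompressed size, filename length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def read_first_local_header(data):
    """Parse the local file header at the start of `data` (illustrative helper)."""
    (sig, version, _flags, _method, _mtime, _mdate, _crc,
     csize, usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, 0)
    if sig != b"PK\x03\x04":
        raise ValueError("data does not start with a local file header")
    name_end = LOCAL_HEADER.size + name_len
    extra = data[name_end:name_end + extra_len]

    # Walk the extra field looking for the ZIP64 record (header ID 0x0001).
    zip64_sizes = None
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        if header_id == 0x0001 and size >= 16:
            # (uncompressed size, compressed size), 8 bytes each
            zip64_sizes = struct.unpack_from("<QQ", extra, pos + 4)
        pos += 4 + size

    return {
        "name": data[LOCAL_HEADER.size:name_end].decode("ascii"),
        "version_needed": version & 0xFF,  # low byte is the spec version
        "compressed": csize,
        "uncompressed": usize,
        "zip64_sizes": zip64_sizes,
    }


def local_header_is_consistent(hdr):
    """True if a ZIP64 extra record comes with version 4.5 and 0xFFFFFFFF sizes."""
    if hdr["zip64_sizes"] is None:
        return True
    return (hdr["version_needed"] >= 45
            and hdr["compressed"] == 0xFFFFFFFF
            and hdr["uncompressed"] == 0xFFFFFFFF)


def build_forced_zip64_archive():
    """Same as the reproduction script, but written to memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w", allowZip64=True) as zf:
        with zf.open("text.txt", mode="w", force_zip64=True) as zi:
            zi.write(b"some data")
    return buf.getvalue()


header = read_first_local_header(build_forced_zip64_archive())
print(header, local_header_is_consistent(header))
```

On an interpreter without this change the consistency check should report `False` (version 2.0 and real sizes next to a ZIP64 record); with it, `True`.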

Potential reviewer based on the git blame of the changed lines: @serhiy-storchaka (182d7cd)
Potential reviewers based on the experts index: @Yhg1s, @gpshead

@gpshead
Member

gpshead commented Apr 26, 2023

I think this is the issue you were talking about at lunch today? (yay pycon sprints!)

@gpshead gpshead added the sprint label Apr 26, 2023
@pR0Ps
Contributor Author

pR0Ps commented Apr 26, 2023

Yep, that's this one!

@pR0Ps pR0Ps force-pushed the bugfix/force-zip64 branch 2 times, most recently from bd02290 to 03a58a6 Compare April 27, 2023 23:26
This commit fixes an issue where adding a small file to a `ZipFile` object while forcing zip64 extensions causes an extra Zip64 record to be added to the zip, but doesn't update the `min_version` or file sizes. Fixes python#103861
@pR0Ps pR0Ps force-pushed the bugfix/force-zip64 branch from 03a58a6 to c42700d Compare April 27, 2023 23:31
This fixes an issue where if data requiring zip64 extensions was added to an unseekable stream without specifying `force_zip64=True`, zip64 extensions would not be used and a RuntimeError would not be raised when closing the file (even though the size would be known at that point). This would result in successfully writing corrupt zip files. Deciding if zip64 extensions are required outside of the `FileHeader` function means that both `FileHeader` and `_ZipWriteFile` will always be in sync. Previously, the `FileHeader` function could enable zip64 extensions without propagating that decision to the `_ZipWriteFile` class, which would then not correctly write the data descriptor record or check for errors on close.
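The idea in the commit message above, making the zip64 decision once and sharing it between the header writer and the per-file writer, can be sketched roughly as follows. This is a simplified model, not the `zipfile` module's code: `decide_zip64`, `EntryWriter`, the 5% headroom factor, and the locally defined `ZIP64_LIMIT` are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed threshold for this sketch: sizes above this need zip64 extensions.
ZIP64_LIMIT = (1 << 31) - 1


def decide_zip64(expected_size, force_zip64):
    """Decide once, up front, whether an entry uses zip64 (hypothetical helper)."""
    # Leave some headroom in case compression grows the data slightly.
    return force_zip64 or expected_size * 1.05 > ZIP64_LIMIT


@dataclass
class EntryWriter:
    """Stand-in for a per-file writer that receives the decision, never re-derives it."""
    zip64: bool
    written: int = 0

    def header_uses_zip64(self):
        # The header is built from the same flag that close() checks.
        return self.zip64

    def write(self, nbytes):
        self.written += nbytes

    def close(self):
        # With a shared flag, oversized data without zip64 is caught here
        # instead of silently producing a corrupt archive.
        if not self.zip64 and self.written > ZIP64_LIMIT:
            raise RuntimeError("File size too large, try using force_zip64")


# Unseekable-stream case: the size is unknown up front, so the decision is False.
writer = EntryWriter(zip64=decide_zip64(expected_size=0, force_zip64=False))
writer.write(ZIP64_LIMIT + 1)
try:
    writer.close()
except RuntimeError as exc:
    print("refused to finish entry:", exc)
```

Because the header and the close-time check read the same flag, they cannot disagree, which is the property the commit message describes.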
@pR0Ps pR0Ps force-pushed the bugfix/force-zip64 branch from 2ee333f to b68e70f Compare May 1, 2023 03:13
@pR0Ps pR0Ps requested a review from gpshead May 1, 2023 03:19
@pR0Ps
Contributor Author

pR0Ps commented May 11, 2023

@gpshead Addressed review comments

gpshead added 2 commits May 15, 2023 23:26
…ompatibility in the API. Code within this module always passes an explicit zip64, so the overall bug fix remains valid. We just don't want to break existing user code constructing their own ZipInfo objects and calling zi.FileHeader themselves for whatever reasons. _(hopefully rare, but it isn't a protected or private API so we can't make assumptions)_
@gpshead
Member

gpshead commented May 16, 2023

The GitHub Actions CI infrastructure seems broken at the moment, but I think this PR is ready.

I undid one part of your change so it'd remain a pure bugfix: ZipInfo.FileHeader() is technically a public API - even if it seems like nothing should use it we must assume someone does. So getting rid of the old zip64=None default behavior would be an API change, so I restored that. But it should be a no-op as FileHeader is always called with an explicit zip64 bool supplied within the zipfile module itself.
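As a rough illustration of the two calling styles discussed here, the sketch below builds a `ZipInfo` by hand and calls `FileHeader()` both with the default `zip64=None` and with an explicit `zip64=True`. The file name, payload, and the `extra_length` helper are made up for the example; `CRC`, `file_size`, and `compress_size` are set manually because nothing else populates them outside of `ZipFile`.

```python
import struct
import zipfile
import zlib

data = b"some data"

# A hand-built ZipInfo, as user code calling FileHeader() directly might do.
zi = zipfile.ZipInfo("text.txt")
zi.file_size = len(data)
zi.compress_size = len(data)   # stored, so both sizes match
zi.CRC = zlib.crc32(data)      # FileHeader() reads this attribute

# Default zip64=None: FileHeader decides from the sizes alone.
auto_header = zi.FileHeader()

# Explicit bool: the caller has already made the decision.
forced_header = zi.FileHeader(zip64=True)


def extra_length(header):
    """Read the 2-byte 'extra length' field at offset 0x1C of a local header."""
    return struct.unpack_from("<H", header, 28)[0]


print(extra_length(auto_header), extra_length(forced_header))
```

The forced header carries a 20-byte ZIP64 extra record that the automatic one omits for a 9-byte file. Code that relies on the `None` default and tracks zip64 state separately would have to keep the two in agreement itself.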

I'll take another look later and rerun CI after it's healthy again.

@gpshead gpshead merged commit 798bcaa into python:main May 16, 2023
@gpshead gpshead added the type-bug and needs backport to 3.11 labels May 16, 2023
@miss-islington
Contributor

Thanks @pR0Ps for the PR, and @gpshead for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

@miss-islington
Contributor

Sorry, @pR0Ps and @gpshead, I could not cleanly backport this to 3.11 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 798bcaa1eb01de7db9ff1881a3088603ad09b096 3.11

gpshead pushed a commit to gpshead/cpython that referenced this pull request May 16, 2023
…ed in some cases (pythonGH-103863)

Fix Zip64 extensions not being properly applied in some cases:

Fixes an issue where adding a small file to a `ZipFile` object while forcing zip64 extensions causes an extra Zip64 record to be added to the zip, but doesn't update the `min_version` or file sizes in the primary central directory header.

Also fixed an edge case in checking if zip64 extensions are required:

This fixes an issue where if data requiring zip64 extensions was added to an unseekable stream without specifying `force_zip64=True`, zip64 extensions would not be used and a RuntimeError would not be raised when closing the file (even though the size would be known at that point). This would result in successfully writing corrupt zip files.

Deciding if zip64 extensions are required outside of the `FileHeader` function means that both `FileHeader` and `_ZipWriteFile` will always be in sync. Previously, the `FileHeader` function could enable zip64 extensions without propagating that decision to the `_ZipWriteFile` class, which would then not correctly write the data descriptor record or check for errors on close.

If anyone is actually using `ZipInfo.FileHeader` as a public API without explicitly passing True or False in for zip64, their own code may still be susceptible to that kind of bug unless they make a similar change to where the zip64 decision happens.

Fixes pythonGH-103861

---------

Co-authored-by: Gregory P. Smith <greg@krypto.org>.
(cherry picked from commit 798bcaa)

Co-authored-by: Carey Metcalfe <carey@cmetcalfe.ca>
@bedevere-bot

GH-104534 is a backport of this pull request to the 3.11 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.11 label May 16, 2023
carljm added a commit to carljm/cpython that referenced this pull request May 16, 2023
* main:
  pythonGH-104510: Fix refleaks in `_io` base types (python#104516)
  pythongh-104539: Fix indentation error in logging.config.rst (python#104545)
  pythongh-104050: Don't star-import 'types' in Argument Clinic (python#104543)
  pythongh-104050: Add basic typing to CConverter in clinic.py (python#104538)
  pythongh-64595: Fix write file logic in Argument Clinic (python#104507)
  pythongh-104523: Inline minimal PGO rules (python#104524)
  pythongh-103861: Fix Zip64 extensions not being properly applied in some cases (python#103863)
  pythongh-69152: add method get_proxy_response_headers to HTTPConnection class (python#104248)
  pythongh-103763: Implement PEP 695 (python#103764)
  pythongh-104461: Run tkinter test_configure_screen on X11 only (pythonGH-104462)
  pythongh-104469: Convert _testcapi/watchers.c to use Argument Clinic (python#104503)
  pythongh-104482: Fix error handling bugs in ast.c (python#104483)
  pythongh-104341: Adjust tstate_must_exit() to Respect Interpreter Finalization (pythongh-104437)
  pythonGH-102613: Fix recursion error from `pathlib.Path.glob()` (pythonGH-104373)
gpshead added a commit that referenced this pull request May 17, 2023
…some cases (GH-103863) (#104534)

Fix Zip64 extensions not being properly applied in some cases:

Fixes an issue where adding a small file to a `ZipFile` object while forcing zip64 extensions causes an extra Zip64 record to be added to the zip, but doesn't update the `min_version` or file sizes in the primary central directory header.

Also fixed an edge case in checking if zip64 extensions are required:

This fixes an issue where if data requiring zip64 extensions was added to an unseekable stream without specifying `force_zip64=True`, zip64 extensions would not be used and a RuntimeError would not be raised when closing the file (even though the size would be known at that point). This would result in successfully writing corrupt zip files.

Deciding if zip64 extensions are required outside of the `FileHeader` function means that both `FileHeader` and `_ZipWriteFile` will always be in sync. Previously, the `FileHeader` function could enable zip64 extensions without propagating that decision to the `_ZipWriteFile` class, which would then not correctly write the data descriptor record or check for errors on close.

If anyone is actually using `ZipInfo.FileHeader` as a public API without explicitly passing True or False in for zip64, their own code may still be susceptible to that kind of bug unless they make a similar change to where the zip64 decision happens.

Fixes GH-103861

---------

. (cherry picked from commit 798bcaa)

Co-authored-by: Carey Metcalfe <carey@cmetcalfe.ca>
@pR0Ps pR0Ps deleted the bugfix/force-zip64 branch May 23, 2023 01:52

Labels

sprint, type-bug

4 participants