59

Any ideas on how to unzip a piped zip file like this:

wget -qO- http://downloads.wordpress.org/plugin/akismet.2.5.3.zip 

I want to unzip the file into a directory, as one would with a normal file:

wget -qO- http://downloads.wordpress.org/plugin/akismet.2.5.3.zip | unzip -d ~/Desktop 
  • While the question is valid, if you are using Git to work with WordPress, there is now a Git mirror of each plugin. Ignore my comment if it's not your case :) Otherwise, save yourself the trouble of figuring out how to use such a path to automate your installation and use a Git submodule/Composer with github.com/wp-plugins instead. Commented Dec 17, 2014 at 18:07
  • zip requires random access to work. It cannot read incrementally from a pipe -- which is why the zsh-based answer creates a temporary file, not trying to work as a pipe. Commented May 13, 2022 at 19:52
  • Usually you only want to write a successful response to stdout. See also: write http error body to stderr. Commented Mar 9, 2024 at 19:31

8 Answers

74

The ZIP file format includes a directory (index) at the end of the archive. This directory records where each file is located within the archive, and thus allows quick random access without reading the entire archive.

This would appear to pose a problem when attempting to read a ZIP archive through a pipe, in that the index is not accessed until the very end and so individual members cannot be correctly extracted until after the file has been entirely read and is no longer available. As such it appears unsurprising that most ZIP decompressors simply fail when the archive is supplied through a pipe.

The directory at the end of the archive is not the only location where file meta information is stored in the archive. In addition, individual entries also include this information in a local file header, for redundancy purposes.
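As a rough illustration of the local file header described above: each entry in a ZIP archive begins with the four-byte signature PK\003\004 (hex 50 4b 03 04), which is what a streaming reader can latch onto before the index arrives. The sketch below only inspects the first four bytes of stdin. The function name starts_with_local_header is made up for this example, and the check does not validate the archive in any way.

```shell
# Hypothetical helper (name invented for this sketch): succeeds if stdin
# begins with the ZIP local file header signature "PK\003\004".
starts_with_local_header() {
    # Read only the first 4 bytes as hex, then strip spaces and newlines.
    sig=$(od -An -tx1 -N4 | tr -d ' \n')
    [ "$sig" = "504b0304" ]
}
```

It could be used, for instance, as `wget -qO- <url> | starts_with_local_header && echo "looks like a ZIP stream"`.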

Although not every ZIP decompressor will use local file headers when the index is unavailable, the tar and cpio front ends to libarchive (a.k.a. bsdtar and bsdcpio) can and will do so when reading through a pipe, meaning that the following is possible:

wget -qO- http://downloads.wordpress.org/plugin/akismet.2.5.3.zip | bsdtar -xvf- -C ~/Desktop 
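One practical detail with the command above is that the directory given to -C has to exist already. A minimal sketch of a wrapper, assuming bsdtar is installed; the function name extract_zip_stream is hypothetical and not part of libarchive:

```shell
# Hypothetical helper: extract a ZIP archive arriving on stdin into DEST
# (default: current directory), creating DEST first because bsdtar's -C
# expects the target directory to exist.
extract_zip_stream() {
    dest=${1:-.}
    mkdir -p "$dest" && bsdtar -xf - -C "$dest"
}
```

With that defined, the download from the question could be piped as `wget -qO- http://downloads.wordpress.org/plugin/akismet.2.5.3.zip | extract_zip_stream ~/Desktop`.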

4 Comments

I have a .zip-file here that contains files with executable permissions. When I download and pipe into bsdtar, the exec bits get thrown away. When I download to disk and extract with bsdtar or unzip then, the exec bits are honoured.
What is the rationale behind including a directory (index) at the end of the archive? Where can I read about that?
@pmor Look up the history of the ZIP filetype. It's because when creating a zip file, you may not know until the end where all the files have come from. Going back to insert a header at the start of a file you've already written is a challenge I suspect Phil Katz may have preferred to avoid.
@pmor it allows you to add/remove/view individual files easily. Zip files are never solid archives like tar.* or the default options of RAR and 7Z, each file is compressed separately and you can extract only the single file you need
28

BusyBox's unzip can take stdin and extract all the files.

wget -qO- http://downloads.wordpress.org/plugin/akismet.2.5.3.zip | busybox unzip - 

The dash after unzip is to use stdin as input.

You can even do this:

cat file.zip | busybox unzip - 

But that is just a redundant way of writing unzip file.zip.

If your distro uses BusyBox by default (e.g. Alpine), just run unzip -.

4 Comments

Busybox 1.22.0 fails with Archive: - unzip: lseek: Illegal seek in Debian. What version of Busybox did you use?
v1.27.2 on Ubuntu 18.10
This didn't work for me on Alpine 3.10 (via Docker). (Not ragging on you, I think it's a useful answer and that comments about working/non-working versions are also helpful)
unzip in some versions of BusyBox (e.g. 1.27.2) doesn't support Zip64, thus it works only for member files smaller than 4 GiB.
19

Just use zcat:

wget -qO- http://downloads.wordpress.org/plugin/akismet.2.5.3.zip | zcat >> myfile.txt 
  • This will only extract the first file. You will see the error message "gzip: stdin has more than one entry--rest ignored" after the first file is extracted.

4 Comments

This is an O <-- Remember
This was what I was looking for. Some files I curl now and again are just single files zipped (don't know why, they're not particularly large) and I don't have control over them being in this format. Using zcat was the solution for me here!
Annoying gotcha - this only works with GNU zcat/gzip, NOT BSD gzip
zcat works perfectly.
14

While the following will not work in bash, it will work in zsh. Since many zsh users may end up here, it may still be useful:

% unzip =( wget -qO- http://downloads.wordpress.org/plugin/akismet.2.5.3.zip )
Archive:  /tmp/zshLCod6x
   creating: akismet/
  inflating: akismet/admin.php
  inflating: akismet/akismet.css
  inflating: akismet/akismet.gif
  inflating: akismet/akismet.js
  inflating: akismet/akismet.php
  inflating: akismet/legacy.php
  inflating: akismet/readme.txt
  inflating: akismet/widget.php
%

As you can see, the temporary zip file is deleted as soon as the command finishes:

% ls /tmp/zshLCod6x
ls: cannot access '/tmp/zshLCod6x': No such file or directory
%
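In shells without zsh's =( ... ) substitution, a similar effect can be approximated by hand. The following is a minimal sketch, assuming mktemp and an unzip that accepts -q, -o and -d (Info-ZIP's does) are available. The function name unzip_stdin is hypothetical. Like the zsh form, it buffers the whole archive in a temporary file before extracting, so it is not true streaming.

```shell
# Hypothetical helper: buffer a ZIP archive from stdin in a temporary file,
# extract it into DEST (default: current directory), then delete the file
# whether or not extraction succeeded.
unzip_stdin() {
    dest=${1:-.}
    tmp=$(mktemp) || return 1
    # unzip needs a seekable file, so the whole stream is saved first.
    cat > "$tmp" && unzip -q -o "$tmp" -d "$dest"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example: `wget -qO- http://downloads.wordpress.org/plugin/akismet.2.5.3.zip | unzip_stdin ~/Desktop`.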

4 Comments

Note that this will anyway download the full file before running unzip, which is not the original question.
True. Unfortunately, the zip file format puts its "central directory" at the end of the file, and the unzipping algorithm first reads that directory before processing the files. Hence, a true piping solution that correctly unzips isn't really a possibility. (This is also a problem for web applications that want to process large uploaded zip files - it cannot be done in a streaming fashion.)
While it is true that there is an index at the end of the file, containing "authoritative" information on which files have been deleted from the archive (without the need to regenerate it at each deletion), I can successfully extract a simple ZIP in a pipelined way with bsdtar, because there are headers indeed preceding each file. bsdtar would probably give bad results in case the archive has been modified ("phantom" files would appear, since it is not known till the end of the archive which ones are the latest version).
Very neat - I had never seen that form of process substitution in zsh before: zsh.sourceforge.io/Intro/intro_7.html
11
wget -q -O tmp.zip http://downloads.wordpress.org/plugin/akismet.2.5.3.zip && unzip tmp.zip && rm tmp.zip 

4 Comments

Using && is better, since the next command only starts if the previous one finished successfully. Thanks
This is not extracting the zip in a piped manner. With your proposal you need to use more disk space, and wear it out (important on an SSD if the files are big). It is also more efficient to run the download and the extraction in parallel.
Also, -qO- -O tmp.zip is redundant: you pass -O - and then -O tmp.zip, which is pointless here.
The question specifically asks for unzip from pipe. This answer uses temporary files instead, which may not work on read-only filesystems or other specific use-cases
5

I'd take a look at funzip (http://www.info-zip.org/mans/funzip.html). The man page for it notes,

...filter for extracting from a ZIP archive in a pipe 

Sorry I don't have an example, but it looks like it does come with the Linux unzip utility.

1 Comment

It only dumps the FIRST FILE. funzip without a file argument acts as a filter; that is, it assumes that a ZIP archive (or a gzip'd(1) file) is being piped into standard input, and it extracts the first member from the archive to stdout.
2

Reposting my answer:

I wrote a Python (2.x) script to do streaming extraction of ZIP archives; you can get it from here: https://raw.githubusercontent.com/pts/unzip_scan/master/unzip_scan.py . Usage: cat file.zip | sh unzip_scan.py -.

Comments

2

Another option: if you already have unzip, you should also have funzip, which comes from the same [unzip] package.

This utility is made for reading from pipes/stdin. However it seems very primitive and can apparently extract only the first file from the archive (as per the manpage).

Anyway, for the sake of completeness, here is a way I found to do it with funzip for a single-file .zip archive:

curl -sL https://<url_to_my_archive>.zip | funzip - > <my_extracted_file> 

Replace <url_to_my_archive> and <my_extracted_file> with your values.

2 Comments

Because to extract other files you have to wait for the directory entry at the end. Just avoid piping zip files to unzip. And funzip was already answered previously.
Where do you see something piping zip files to unzip? Secondly, no, an actual funzip usage example was never provided in this thread; it was simply mentioned.
