I have a script that generates my XML sitemap and writes it to sitemap.xml.gz, i.e. an XML file compressed with gzip. The file is definitely written correctly: when I download it via FTP, it is fine.
However, when I download the file directly from the site (over HTTP), the resulting file appears to be doubly compressed. When I unzip it, the resulting sitemap.xml is a binary file. If I rename that to sitemap2.xml.gz and unzip again, I get the true XML file.
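To confirm the double compression without renaming files by hand, here is a minimal Python sketch that counts how many gzip layers wrap a byte string. The helper name `count_gzip_layers` and the sample XML are illustrative assumptions, not part of my actual script.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def count_gzip_layers(data: bytes, max_layers: int = 5) -> int:
    """Return how many times `data` has been gzip-compressed (0 = not gzip)."""
    layers = 0
    while data[:2] == GZIP_MAGIC and layers < max_layers:
        try:
            data = gzip.decompress(data)
        except (OSError, EOFError, zlib.error):
            break  # magic bytes matched by coincidence, or the stream is damaged
        layers += 1
    return layers


if __name__ == "__main__":
    # Hypothetical stand-in for the real sitemap contents.
    xml = b'<?xml version="1.0" encoding="UTF-8"?><urlset></urlset>'
    once = gzip.compress(xml)    # what the script writes / what FTP returns
    twice = gzip.compress(once)  # what the HTTP download appears to contain
    print(count_gzip_layers(xml), count_gzip_layers(once), count_gzip_layers(twice))
```

Running the same function on the bytes of the FTP copy and of the HTTP copy should show whether the second one really carries an extra layer.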
So I think the server (Apache2) is for some reason taking the .gz file and serving it with gzip compression again. The headers for the file come back as this:
    Status: HTTP/1.1 200 OK
    Date: Mon, 16 Jul 2012 00:00:47 GMT
    Server: Apache
    Last-Modified: Sun, 15 Jul 2012 23:35:26 GMT
    ETag: "89fff2-3bc46-4c4e6c48deb80"
    Accept-Ranges: bytes
    Vary: Accept-Encoding
    Content-Encoding: gzip
    Connection: close
    Transfer-Encoding: chunked
    Content-Type: application/x-gzip

In my httpd.conf I have this:
    # compress all text & html:
    AddOutputFilterByType DEFLATE text/html text/plain text/xml application/xml text/css text/javascript application/x-javascript application/javascript

My VirtualHost declaration only has some mod_rewrite stuff.
Anyone have any ideas why Apache might be sending the gzip header for this file?
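The suspicious part of the response is the combination of a gzip media type with `Content-Encoding: gzip`. As a rough sketch, that check can be automated in Python; the function names `parse_headers` and `looks_double_gzipped` are made up for illustration, and the rule is only a heuristic, not a statement about how Apache decides anything.

```python
# Header block as quoted in the question.
SAMPLE = """\
Status: HTTP/1.1 200 OK
Date: Mon, 16 Jul 2012 00:00:47 GMT
Server: Apache
Vary: Accept-Encoding
Content-Encoding: gzip
Transfer-Encoding: chunked
Content-Type: application/x-gzip
"""

GZIP_MEDIA_TYPES = {"application/x-gzip", "application/gzip"}


def parse_headers(raw: str) -> dict:
    """Turn 'Name: value' lines into a dict keyed by lower-cased header name."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def looks_double_gzipped(headers: dict) -> bool:
    """Heuristic: the body is already a gzip file AND is gzip-encoded in transit."""
    media_type = headers.get("content-type", "").split(";")[0].strip().lower()
    encoding = headers.get("content-encoding", "").strip().lower()
    return encoding == "gzip" and media_type in GZIP_MEDIA_TYPES


if __name__ == "__main__":
    print(looks_double_gzipped(parse_headers(SAMPLE)))
```

A response for a plain .xml file with `Content-Type: application/xml` and `Content-Encoding: gzip` would not be flagged, which is the behaviour I actually want to keep.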
UPDATE: I removed the application/xml entry from the AddOutputFilterByType line, and the file now downloads normally like any other binary file. However, the problem now is that regular .xml files are no longer sent gzipped.
So it seems the server decides that .xml.gz files should be treated as application/xml when choosing output filters, even though it sends them with the Content-Type header application/x-gzip.
Also, I checked the /etc/mime-types file. It doesn't have an entry for gzip, and it has this comment at the top:
Note: Compression schemes like "gzip", "bzip", and "compress" are not actually "mime-types". They are "encodings" and hence must not have entries in this file to map their extensions. The "mime-type" of an encoded file refers to the type of data that has been encoded, not the type of encoding.
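The type-versus-encoding split described in that comment can be seen in Python's standard `mimetypes` module, which resolves a double extension into a media type plus an encoding. This is only an analogy: it is not Apache's mod_mime, and whether Apache resolves sitemap.xml.gz the same way on my server is an assumption. The helper `describe` is a made-up name.

```python
import mimetypes


def describe(filename: str) -> tuple:
    """Return (media_type, encoding) as guessed from the file name alone."""
    return mimetypes.guess_type(filename)


if __name__ == "__main__":
    for name in ("sitemap.xml", "sitemap.xml.gz", "file.txt.gz"):
        media_type, encoding = describe(name)
        # The exact XML media type depends on the local MIME tables.
        print(f"{name}: type={media_type}, encoding={encoding}")
```

Here the .gz suffix contributes only the encoding, while the media type comes from the extension underneath it, which would fit the symptom that the application/xml filter rule matches a .xml.gz file.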
The only site-specific directives are in sites-enabled/mysite.conf, in a VirtualHost declaration. As mentioned above, they are only rewrites/redirects, nothing that would affect encoding.

I tested in several browsers with the same result.

When I put a file.txt.gz on the server and download it via the browser, it appears to end up as a regular gzip file (1.4KB) containing the original text file (3KB).