5

I would like to create a shell script that will check to make sure all files in a directory that appear superficially to be image files (e.g. have typical image file extensions like .jpg, .bmp etc.) are actually image files.

We recently had an issue where a hacker was able to generate a file in a directory and mask it as a .jpg file. I would like to create a shell script to check all files in the directory to make sure they are real jpg, gif or png files.

5
  • 2
    Use file * to check file types Commented Mar 10, 2015 at 20:53
  • Checking file contents sounds like a bad solution to a security problem. What happened after the attacker created their non-JPEG .jpg file? Could there be a way of preventing them from doing the part that is actually harmful? Commented Mar 10, 2015 at 21:06
  • What if you just configure your web server to not execute any code in the upload directory? That way an attacker can upload malware all day long, but nothing will be executed. Commented Mar 11, 2015 at 2:02
  • The upload directory is not the problem, it was where they saved the file. Commented Mar 11, 2015 at 2:14
  • Not directly answering your question, but given the background of the issue you may find this useful. github.com/maxlabelle/WebMalwareScanner Commented Feb 21, 2018 at 9:30

3 Answers

9

I think you want to be very careful about using file in a circumstance where you give it completely untrusted input. For instance, RHEL 5 file will identify this:

GIF87a <?php echo "Hello from PHP!\n"; ?> 

as "GIF image data, version 87a, 15370 x 28735". The PHP interpreter has no trouble executing that input, and that is the basis for "local file inclusion" (LFI) problems.

Second, file (and even strings) actually parse input files to tell you what you want to know. These parsers are complicated and have problems.

I'm going to suggest the identify command out of the ImageMagick suite. It isn't fooled by my simple example above, and it parses only image formats, so it should be less prone to security flaws than file.

1
  • 1
    See my answer for an implementation using ImageMagick identify … and two examples of how to beat it. Commented Nov 15, 2017 at 15:34
8

As a quick first pass, the file command can quickly detect image headers:

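    # -b omits the file name from the output, so a name that merely
    # contains "image" or "bitmap" cannot cause a match.
    if file -b "$FILE" | grep -qE 'image|bitmap'; then
      echo "File '$FILE' has the headers of an image"
    fi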

(The second alternation for bitmap is needed if you want to recognize Windows BMP files since libmagic does not use the word "image" to describe bitmap images.)
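To make concrete what a header check of this kind looks at, here is a minimal sketch of signature sniffing in plain shell. It is not how file is implemented (libmagic uses a far larger rule database); the function name sniff_image_type is made up for illustration, and only the well-known leading bytes of JPEG, PNG, GIF and BMP are compared.

```shell
# sniff_image_type FILE
# Hypothetical helper: guess an image type from the first 8 bytes only.
sniff_image_type() {
    # Dump the first 8 bytes as hex and strip all whitespace.
    sig=$(od -An -tx1 -N8 "$1" | tr -d '[:space:]')
    case $sig in
        ffd8ff*)                     echo jpeg ;;   # FF D8 FF
        89504e470d0a1a0a)            echo png ;;    # \x89 P N G \r \n \x1a \n
        474946383761*|474946383961*) echo gif ;;    # "GIF87a" or "GIF89a"
        424d*)                       echo bmp ;;    # "BM"
        *)                           echo unknown; return 1 ;;
    esac
}

# Example use:
#   sniff_image_type picture.png
```

Because only the first few bytes are inspected, any file that begins with GIF87a is reported as gif, whatever follows.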

However, we can trick file with the PHP-based fake image from Bruce Ediger's answer:

    $ echo 'GIF87a<?php echo "Hello from PHP!"; ?>' > fake.gif
    $ file fake.gif && echo image detected || echo no image detected
    fake.gif: GIF image data, version 87a, 16188 x 26736
    image detected

Using ImageMagick identify

The ImageMagick suite includes an identify command that prints metadata for a given image. It exits with an error when the expected metadata is not present, which makes it well suited to this purpose:

    $ identify fake.gif && echo image detected || echo no image detected
    identify-im6.q16: negative or zero image size `fake.gif' @ error/gif.c/ReadGIFImage/1402.
    no image detected

For faster scanning of a large collection of files, I recommend putting both together:

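    # -b omits the file name from the output, so a name that merely
    # contains "image" or "bitmap" cannot cause a match.
    if file -b "$FILE" | grep -qE 'image|bitmap' \
        && ! identify "$FILE" >/dev/null 2>&1; then
      echo "File '$FILE' is a fake image!"
    fi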

(This redirects the output of identify into oblivion since we only care about whether it was able to complete successfully, which is captured by its exit code.)
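Applied to a whole directory, as the question asks, the two checks could be wrapped roughly like this. It is a sketch only: the names is_real_image and scan_images are hypothetical, the extension list is an assumption, and it assumes file and ImageMagick's identify are installed (if either is missing, every candidate is reported). Unlike the snippet above, it reports any file with an image extension that fails either check, so a plain text file renamed to .jpg is flagged too.

```shell
# is_real_image FILE (hypothetical helper)
# Succeeds only if file(1) reports an image AND identify can parse it.
is_real_image() {
    file -b "$1" 2>/dev/null | grep -qE 'image|bitmap' &&
        identify "$1" >/dev/null 2>&1
}

# scan_images DIR (hypothetical helper)
# Prints one "SUSPECT: path" line for every file under DIR that has an
# image extension but does not pass is_real_image.
# Assumption: file names do not contain newlines.
scan_images() {
    find "$1" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \
        -o -iname '*.gif' -o -iname '*.png' -o -iname '*.bmp' \) -print |
    while IFS= read -r path; do
        is_real_image "$path" || printf 'SUSPECT: %s\n' "$path"
    done
}

# Example use (the directory is a placeholder):
#   scan_images ./uploads
```

As the rest of this answer shows, passing both checks does not prove a file is harmless; it only filters out the crudest fakes.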

Even this can still be tricked

The following example uses a simple 1x1 white GIF with the same PHP code appended to the end. I don't know PHP and I'm not sure this will actually run, but since PHP is a template language that passes through any text outside its <?php … ?> tags as literal output, I assume it will run the given code as-is, with merely some garbage printed before the payload.

    $ {
        echo 'R0lGODdhAQABAIAAAP///////ywAAAAAAQABAAACAkQBAD'
        echo 's8P3BocCBlY2hvICJIZWxsbyBmcm9tIFBIUCEiOyA/Pgo='
      } | base64 -d > fake2.gif
    $ strings fake2.gif
    GIF87a
    ;<?php echo "Hello from PHP!"; ?>
    $ file fake2.gif
    fake2.gif: GIF image data, version 87a, 1 x 1
    $ identify fake2.gif
    fake2.gif GIF 1x1 1x1+0+0 8-bit sRGB 2c 68B 0.000u 0:00.000

This can also be done with a GIF comment to be fully valid as an image:

    $ hd fake3.gif
    00000000  47 49 46 38 39 61 01 00  01 00 80 00 00 ff ff ff  |GIF89a..........|
    00000010  ff ff ff 21 fe 20 3c 3f  70 68 70 20 65 63 68 6f  |...!. <?php echo|
    00000020  20 22 48 65 6c 6c 6f 20  66 72 6f 6d 20 50 48 50  | "Hello from PHP|
    00000030  21 22 3b 20 3f 3e 00 2c  00 00 00 00 01 00 01 00  |!"; ?>.,........|
    00000040  00 02 02 44 01 00 3b                              |...D..;|
    00000047
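For anyone who wants to reproduce this, the dump above can be turned back into a file with printf and octal escapes. This is one possible reconstruction from the hex dump, not necessarily how the original file was made, and the function name make_fake3 is invented here.

```shell
# make_fake3 OUTFILE (hypothetical helper)
# Writes the 71 bytes shown in the hex dump to OUTFILE.
make_fake3() {
    {
        # "GIF89a", 1x1 logical screen, 2-entry global color table,
        # then a comment extension (21 fe) announcing a 32-byte block (20).
        printf 'GIF89a\001\000\001\000\200\000\000\377\377\377\377\377\377\041\376\040'
        # The 32-byte comment body: the PHP payload.
        printf '%s' '<?php echo "Hello from PHP!"; ?>'
        # Comment terminator, image descriptor (2c), pixel data, trailer (3b).
        printf '\000\054\000\000\000\000\001\000\001\000\000\002\002\104\001\000\073'
    } > "$1"
}

# Example use:
#   make_fake3 fake3.gif
```

The payload sits inside a regular GIF comment block, which is why the result is still a well-formed image.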

I've picked on GIF and taken advantage of its comment system, but just concatenating a payload after any image should also work to bypass this detection technique. It's merely harder than fooling file and (depending on the implementation) it might leave some evidence behind (the garbage from the image).
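Since all of the fake files above still contain a literal <?php, one crude complementary check is to search supposed images for that byte sequence. This is offered as an illustrative heuristic rather than a complete defense: it misses other opening tags such as <?= and payloads in other languages, and the function name has_php_tag is hypothetical.

```shell
# has_php_tag FILE (hypothetical helper)
# Succeeds if FILE contains the literal bytes "<?php" anywhere.
#   -a  treat binary data as text
#   -F  match a fixed string, not a regular expression
#   -q  print nothing; report the result via the exit status
has_php_tag() {
    grep -aqF '<?php' "$1"
}

# Example use:
#   has_php_tag fake2.gif && echo "fake2.gif contains a PHP opening tag"
```

A match is a strong hint that something is wrong, but the absence of a match proves nothing.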

8
  • There is just one little problem. BMP files: 1.bmp: PC bitmap, Windows 3.x format, 1345 x 620 x 24. Other files like .jpeg, .png, .tif and .rgb will have the image data string on the file command return. Commented Mar 10, 2015 at 21:01
  • @nwildner: brentwpeterson only mentioned jpg, gif or png. Commented Mar 10, 2015 at 21:05
  • @nwildner, .bmp is easily added. See my revision above. IIRC, it's the only image format that doesn't have the word "image" in its libmagic description. Commented Mar 10, 2015 at 21:09
  • 1
    Great add, even with .bmp not being part of the scope :). grep 'image\|bitmap' shall work too. Commented Mar 10, 2015 at 21:15
  • 1
    @Blauhirn – File magic is great, but only examines the first few characters of each file and can therefore be tricked. That's the whole point of the question. My answer uses it as a pre-filter and then runs supposed images through imagemagick's identify in order to determine whether it has more expected image metadata. This can be tricked too, but it's harder to do. Commented Oct 16, 2017 at 17:40
0

Following the commands put together by @Adam Katz, I found that my system always failed because the -q option used to suppress the output from grep made it always give a return code of zero. Removing this allows it to work OK, but it means that the output of the identify command is strewn across the screen.

I am using GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu) and GNU grep 3.7, which ship with Xubuntu 22.04.1 LTS by default.

My solution works the same way as Adam's, running the necessary commands from within the test.

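    # identify reports errors on stderr, so grepping its stdout for "error"
    # never matches; test its exit status instead.
    if [[ -n $(file "$file" | grep -E 'image|bitmap') ]] \
        && identify "$file" >/dev/null 2>&1; then
      echo "File $file appears to be an image"
    else
      echo "File $file appears to be a fake"
    fi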

I hope this helps someone in the same way that Adam's post helped me.
