
I have the following JPEG files:

$ ls -l
-rw-r--r-- 1 user group 384065 janv. 21 12:10 CamScanner 01-10-2022 14.54.jpg
-rw-r--r-- 1 user group 200892 janv. 10 14:55 CamScanner 01-10-2022 14.55.jpg
-rw-r--r-- 1 user group 283821 janv. 21 12:10 CamScanner 01-10-2022 14.56.jpg

I use img2pdf to transform each image into a PDF file. To do that:

$ find . -type f -name "*.jpg" -exec img2pdf "{}" --output $(basename {} .jpg).pdf \; 

Result:

$ ls -l *.pdf
-rw-r--r-- 1 user group 385060 janv. 21 13:06 CamScanner 01-10-2022 14.54.jpg.pdf
-rw-r--r-- 1 user group 201887 janv. 21 13:06 CamScanner 01-10-2022 14.55.jpg.pdf
-rw-r--r-- 1 user group 284816 janv. 21 13:06 CamScanner 01-10-2022 14.56.jpg.pdf

How can I remove the .jpg part of the PDF filenames? I.e., I want CamScanner 01-10-2022 14.54.pdf and not CamScanner 01-10-2022 14.54.jpg.pdf.

Used alone, basename filename .extension prints the filename without the extension, e.g.:

$ basename CamScanner\ 01-10-2022\ 14.54.jpg .jpg
CamScanner 01-10-2022 14.54

But it seems that syntax doesn't work in my find command. Any idea why?

Note: if you replace img2pdf by echo, the result is the same; basename doesn't get rid of the .jpg part:

$ find . -type f -name "*.jpg" -exec echo $(basename {} .jpg).pdf \;
./CamScanner 01-10-2022 14.56.jpg.pdf
./CamScanner 01-10-2022 14.55.jpg.pdf
./CamScanner 01-10-2022 14.54.jpg.pdf
  • Crystal ball guess: $() is shell expansion, that's done before find even gets called. At that point, basename tries to work on the literal {}. Commented Jan 21, 2022 at 7:12

3 Answers


The issue with your find command is that the command substitution around basename is executed by the shell before it even starts running find (as a step in evaluating what the arguments to find should be).
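The order of evaluation can be checked without running find at all. The following is a small sketch (the variable name arg is only for illustration) that performs the same command substitution the shell does while building find's argument list:

```shell
# The shell expands $(basename {} .jpg) on its own, before find starts.
# basename is given the literal string "{}", which has no .jpg suffix to strip.
arg=$(basename {} .jpg).pdf

# This is the word that find actually receives as the argument after --output.
printf '%s\n' "$arg"
```

This prints {}.pdf. find (at least GNU find, judging by the output in the question) then replaces the {} inside that argument with each found pathname, which is where names such as ./CamScanner 01-10-2022 14.54.jpg.pdf come from.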

Whenever you need to run anything other than a simple utility with optional arguments for a pathname found by find, for example if you need to do any piping, redirections or expansions (as in your question), you will need to employ a shell to do those things:

find . -type f -name '*.jpg' \
    -exec sh -c 'img2pdf --output "$(basename "$1" .jpg).pdf" "$1"' sh {} \;

Or, more efficiently (each call to sh -c would handle a batch of found pathnames),

find . -type f -name '*.jpg' -exec sh -c '
    for pathname do
        img2pdf --output "$(basename "$pathname" .jpg).pdf" "$pathname"
    done' sh {} +
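The difference between \; and + can be observed with a harmless stand-in for img2pdf. This is only a sketch: the three file names and the temporary directory are made up, and each sh -c simply prints one line when it starts, so counting lines counts shells.

```shell
# Create three empty, hypothetical .jpg files in a scratch directory.
tmpdir=$(mktemp -d)
touch "$tmpdir/a 1.jpg" "$tmpdir/b 2.jpg" "$tmpdir/c 3.jpg"

# With \; find starts one sh per file; with + it passes a batch to one sh.
per_file=$(( $(find "$tmpdir" -type f -name '*.jpg' \
    -exec sh -c 'echo started' sh {} \; | wc -l) ))
batched=$(( $(find "$tmpdir" -type f -name '*.jpg' \
    -exec sh -c 'echo started' sh {} + | wc -l) ))

echo "shells started with \\;: $per_file"
echo "shells started with +: $batched"

rm -rf "$tmpdir"
```

With only three files, the first count is 3 and the second is 1. For very long lists of pathnames, find may split the work into more than one batch.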

Or, with zsh,

for pathname in ./**/*.jpg(.DN); do
    img2pdf --output $pathname:t:r.pdf $pathname
done

This uses the globbing qualifier .DN to only match regular files (.), to allow matching of hidden names (D), and to remove the pattern if no matches are found (N). It then uses the :t modifier to extract the "tail" (filename component) of $pathname, :r to extract the "root" (no filename suffix) of the resulting base name, and then adds .pdf to the end.
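For readers without zsh, the effect of the :t and :r modifiers can be approximated with standard parameter expansions. This is a rough sketch, not an exact equivalent in every edge case; the path and the variable names tail, root and output are made up for illustration.

```shell
# Hypothetical pathname in a subdirectory.
pathname='./sub/dir/CamScanner 01-10-2022 14.54.jpg'

# Roughly :t, drop everything up to and including the last slash.
tail=${pathname##*/}

# Roughly :r, drop the shortest trailing ".something" suffix.
root=${tail%.*}

# Append the new suffix, as the zsh loop does.
output=$root.pdf

printf '%s\n' "$tail" "$root" "$output"
```

Note that only the last suffix is removed, so the 14.54 part of the name is left alone.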

Note that all of the above variations would write the output to the current directory, regardless of where the JPEG file was found. If all your JPEG files are in the current directory, there is absolutely no need to use find, and you could use a simple loop over the expansion of the *.jpg globbing pattern:

for pathname in ./*.jpg; do
    img2pdf --output "${pathname%.jpg}.pdf" "$pathname"
done

The parameter substitution ${pathname%.jpg} removes .jpg from the end of the value of $pathname. You may possibly want to use this substitution in place of basename if you want to write the output to the original directories where the JPEG files were found, in the case that you use find over multiple directories, e.g., something like

find . -type f -name '*.jpg' -exec sh -c '
    for pathname do
        img2pdf --output "${pathname%.jpg}.pdf" "$pathname"
    done' sh {} +
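To make the difference between the two approaches concrete, here is a short comparison for a file found in a subdirectory. The directory name scans and both variable names are assumptions made for the example.

```shell
# Hypothetical pathname as find would report it.
pathname='./scans/CamScanner 01-10-2022 14.54.jpg'

# basename drops the directory part, so the PDF lands in the current directory.
with_basename="$(basename "$pathname" .jpg).pdf"

# ${pathname%.jpg} keeps the directory part, so the PDF lands next to the JPEG.
with_expansion="${pathname%.jpg}.pdf"

printf '%s\n' "$with_basename" "$with_expansion"
```

The first line printed is CamScanner 01-10-2022 14.54.pdf and the second is ./scans/CamScanner 01-10-2022 14.54.pdf.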


  • Great and thorough answer. Thanks! Commented Jan 22, 2022 at 4:34
  • To write the output to the original directories where the JPEG files were found, the for loop is not necessary: find . -type f -name '*.jpg' -exec sh -c 'img2pdf --output "${1%.jpg}.pdf" "$1"' sh {} \; Commented Jan 30, 2022 at 5:13
  • @ChennyStar No, the for loop is not necessary, but it makes it much more efficient as only a single shell is spawned for many found images. Without the loop, you run not only img2pdf for each found file, but also sh -c, which makes it slower. This would be significant if you have many images. Commented Jan 30, 2022 at 6:56
  • OK, understood! Thanks! Commented Jan 30, 2022 at 6:58
  • While I agree in theory, in practice spawning a shell for each file doesn't seem to have that much of an impact. I ran img2pdf on each .jpg file found in /usr (890 in my case; I made a copy of /usr in /tmp for these tests). Roughly 4'58" in both cases (with or without a for loop). You can try it yourself (make a copy of /usr to /tmp first): time sudo find /tmp/usr -name '*.jpg' -exec sh -c 'img2pdf --output "${1%.jpg}.pdf" "$1"' sh {} \; vs time sudo find /tmp/usr -name '*.jpg' -exec sh -c 'for f do img2pdf --output "${f%.jpg}.pdf" "$f"; done' sh {} \+ Commented Jan 30, 2022 at 10:39

@Ulrich Schwarz's comment is apt. To bring it full circle, let's assume that you don't have any filenames with quotes or single-quotes in them.

Adapt your find syntax to simply output the basename sans the .jpg, and then use awk perhaps, to reconstruct the img2pdf syntax utilizing the .jpg and .pdf extensions where appropriate:

This find command will output the bare basename:

$ find . -type f -name "*.jpg" -exec basename {} .jpg \;
CamScanner 01-10-2022 14.56
CamScanner 01-10-2022 14.55
CamScanner 01-10-2022 14.54

Now pass those basenames to awk and let awk construct the correct syntax for img2pdf:

$ find . -type f -name "*.jpg" -exec basename {} .jpg \; | \
    awk '{print "img2pdf '\''" $0 ".jpg'\'' --output '\''" $0 ".pdf'\''"}'
img2pdf 'CamScanner 01-10-2022 14.56.jpg' --output 'CamScanner 01-10-2022 14.56.pdf'
img2pdf 'CamScanner 01-10-2022 14.55.jpg' --output 'CamScanner 01-10-2022 14.55.pdf'
img2pdf 'CamScanner 01-10-2022 14.54.jpg' --output 'CamScanner 01-10-2022 14.54.pdf'

If that syntax looks okay, then pipe it to your favorite shell.
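The nested '\'' quoting in the awk program is easy to get wrong. One possible variation, shown here as a sketch with a single hard-coded name instead of find output, passes the single-quote character to awk as a variable. The variable names q and generated are only illustrative, and the same caveat about file names containing single quotes still applies.

```shell
# Hand awk a single quote through -v so the program text needs no '\'' escapes.
q="'"

# Feed one sample basename to awk and capture the generated command line.
generated=$(printf '%s\n' 'CamScanner 01-10-2022 14.54' |
    awk -v q="$q" '{ print "img2pdf " q $0 ".jpg" q " --output " q $0 ".pdf" q }')

# Inspect the result before piping anything to a shell.
printf '%s\n' "$generated"
```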

  • Double quotes in filenames wouldn't be a problem but newline characters or leading dashes would be. With some shells, backslash or exclamation marks would be. With yash, so would sequences of bytes not forming valid characters in the locale. Commented Jan 21, 2022 at 8:57

From a pragmatic point of view, once you have reached the state you describe at the end of your question:

./CamScanner 01-10-2022 14.56.jpg.pdf
./CamScanner 01-10-2022 14.55.jpg.pdf
./CamScanner 01-10-2022 14.54.jpg.pdf

You have the option of using rename to get the final file names you want:

$ rename 's/\.jpg\.pdf$/.pdf/' ./*.jpg.pdf
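If rename is not installed, or the installed rename does not accept Perl expressions, a plain loop with mv should give the same result. The sketch below works on a made-up file in a temporary directory so that it can be tried safely; for real use, the glob would be ./*.jpg.pdf in the directory that holds the PDFs.

```shell
# Scratch directory with one hypothetical, wrongly named PDF.
tmpdir=$(mktemp -d)
touch "$tmpdir/CamScanner 01-10-2022 14.54.jpg.pdf"

# Strip the trailing .jpg.pdf and append .pdf instead.
for f in "$tmpdir"/*.jpg.pdf; do
    mv -- "$f" "${f%.jpg.pdf}.pdf"
done

# Show what the directory contains now, then clean up.
renamed=$(ls "$tmpdir")
printf '%s\n' "$renamed"
rm -rf "$tmpdir"
```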
