
I'm trying to write a script that lists the files in one directory and then searches for each of them, one by one, in another directory. To deal with spaces and special characters like "[" or "]", I'm using $(printf %q "$FILENAME") as input for the find command: find /directory/to/search -type f -name $(printf %q "$FILENAME"). It works like a charm for every file name except in one case: when there are multibyte (UTF-8) characters. In that case the output of printf is an ANSI-C quoted string, i.e. $'file name with blank spaces and quoted characters in the form of \NNN\NNN', and that string is not expanded (the $'' quoting is never removed), so find searches for a file whose name includes the quoting itself: «$'filename'».

Is there an alternative way to pass any kind of file name to find?

My script is as follows (I know some lines, like the "RESNAME=" one, can be deleted):

```bash
#!/bin/bash
if [ -d $1 ] && [ -d $2 ]; then
    IFSS=$IFS
    IFS=$'\n'
    FILES=$(find $1 -type f )
    for FILE in $FILES; do
        BASEFILE=$(printf '%q' "$(basename "$FILE")")
        RES=$(find $2 -type f -name "$BASEFILE" -print )
        if [ ${#RES} -gt 1 ]; then
            RESNAME=$(printf '%q' "$(basename "$RES")")
        else
            RESNAME=
        fi
        if [ "$RESNAME" != "$BASEFILE" ]; then
            echo "FILE NOT FOUND: $FILE"
        fi
    done
else
    echo "Directories do not exist"
fi
IFS=$IFSS
```

As one answer suggested, I've tried associative arrays, but with no luck. Maybe I'm not using the arrays correctly, but echoing the array (array[@]) prints nothing. This is the script I've written:

```bash
#!/bin/bash
if [ -d "$1" ] && [ -d "$2" ]; then
    declare -A files
    find "$2" -type f -print0 | while read -r -d $'\0' FILE; do
        BN2="$(basename "$FILE")"
        files["$BN2"]="$BN2"
    done
    echo "${files[@]}"
    find "$1" -type f -print0 | while read -r -d $'\0' FILE; do
        BN1="$(basename "$FILE")"
        if [ "${files["$BN1"]}" != "$BN1" ]; then
            echo "File not found: "$BN1""
        fi
    done
fi
```
  • use double quotes to enclose $IFS: IFSS="$IFS", then IFS="$IFSS" Commented Oct 27, 2013 at 16:21
  • use -v switch in printf instead of fork: printf -v BASEFILE "%q" "${file##*/}" Commented Oct 27, 2013 at 16:25
  • try adding export LANG=C at the top of your script (and wipe all the printf "%q" calls, as they are useless when you enclose all variables in double quotes: BASEFILE="$(...)" and if [ -f $2/"${BASEFILE##*/}" ]...), and maybe take care of a leading newline \n: STRING="${STRING//$'\n'}" Commented Oct 27, 2013 at 16:34
  • @F.Hauri Quoting in assignments is not necessary. But quoting of command arguments as in find "$2" is necessary. Commented Oct 27, 2013 at 17:09
  • btw where is the regex? Commented Oct 27, 2013 at 17:19

5 Answers


Don't use for loops. First, they are slower: your find has to complete before the rest of your program can run. Second, it is possible to overload the command line: the entire for command must fit in the command line buffer.

Most importantly of all, for sucks at handling funky file names. You're going through all sorts of contortions trying to get around this. However:

```bash
find "$1" -type f -print0 | while IFS= read -r -d $'\0' FILE; do
    # ... do something with "$FILE" here ...
done
```

will work much better. It handles any file name, even one that contains \n characters. The -print0 tells find to separate file names with the NUL character, and the read -r -d $'\0' FILE part reads each file name (separated by the NUL character) into $FILE.
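A minimal, self-contained sketch of this pattern. It feeds the loop through process substitution (< <(...)) instead of a pipe, so an array filled inside the loop is still set once the loop ends; collect_files and FOUND are hypothetical names chosen for illustration.

```bash
#!/usr/bin/env bash
# collect_files DIR: fill the global array FOUND with every regular file
# under DIR, one array element per file name, whatever characters it contains.
collect_files() {
  local file
  FOUND=()
  # IFS= keeps leading/trailing blanks, -r keeps backslashes literal,
  # -d '' stops each read at a NUL byte (same effect as -d $'\0').
  while IFS= read -r -d '' file; do
    FOUND+=("$file")
  done < <(find "$1" -type f -print0)
}

# Demo with awkward names: a space, brackets, and an embedded newline.
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/with space [x].mkv" "$demo/new
line.txt"
collect_files "$demo"
printf 'found %d files\n' "${#FOUND[@]}"    # prints: found 3 files
for f in "${FOUND[@]}"; do printf '<%s>\n' "${f##*/}"; done
rm -rf -- "$demo"
```

Because the names never go through word splitting or a newline-delimited stage, the file with the embedded newline ends up as a single array element.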

If you put quotes around the file name in the find command, the shell won't split or glob it, so spaces and similar characters are no longer a problem (find's -name still treats *, ?, and [...] as pattern characters, though).

Your script is running find once for each file found. If you have 100 files in your first directory, you're running find 100 times.

Do you know about associative (hash) arrays in Bash? You are probably better off using them. Run find on the first directory, and store those file names in an associative array.

Then, run find (again using the find | while read syntax) on your second directory. For each file you find in the second directory, check whether you have a matching entry in your associative array. If you do, you know that file is in both directories.
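A sketch of this two-pass approach, assuming bash 4.0 or later (for associative arrays); missing_in is a hypothetical helper name. Both loops read from process substitution rather than from a pipe: in bash each part of a pipeline runs in a subshell by default, so an array filled inside a piped while loop is empty again once the loop finishes.

```bash
#!/usr/bin/env bash
# missing_in DIR1 DIR2: print the basename of every regular file under DIR1
# for which no file with the same basename exists anywhere under DIR2.
missing_in() {
  local file name
  local -A seen=()

  # Pass 1: remember each basename found under DIR2 (one find run).
  while IFS= read -r -d '' file; do
    seen["${file##*/}"]=1
  done < <(find "$2" -type f -print0)

  # Pass 2: one hash lookup per file under DIR1 (a second find run).
  while IFS= read -r -d '' file; do
    name=${file##*/}
    [[ -n ${seen["$name"]} ]] || printf 'File not found: %s\n' "$name"
  done < <(find "$1" -type f -print0)
}

d1=$(mktemp -d); d2=$(mktemp -d)
mkdir -p "$d2/sub"
touch "$d1/both [1].mkv" "$d1/only here.txt" "$d2/sub/both [1].mkv"
missing_in "$d1" "$d2"    # prints: File not found: only here.txt
rm -rf -- "$d1" "$d2"
```

The file names are only ever used as hash keys and compared as plain strings, so no pattern matching is involved and brackets or multibyte characters need no escaping.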


Addendum

I've been looking at the find command. It appears there's no real way to prevent it from using pattern matching except through a lot of work (like you were doing with printf). I've tried using -regex matching with \Q and \E to remove the special meaning of pattern characters, but I haven't been successful.
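For the record, the pattern-matching problem can be approached in shell by escaping the name instead of quoting it with printf %q. This is a sketch under the assumption that find -name follows the usual fnmatch() rules, where a backslash makes the next character literal; glob_escape and its REPLY result variable are hypothetical names. Unlike printf %q, it never switches to $'...' quoting, so multibyte (UTF-8) names pass through unchanged.

```bash
#!/usr/bin/env bash
# glob_escape NAME: store in REPLY a copy of NAME in which the glob
# metacharacters * ? [ ] and the backslash are each preceded by a backslash,
# so that find -name "$REPLY" matches NAME literally. The result goes into a
# variable instead of stdout because $(...) would strip trailing newlines.
glob_escape() {
  local s=$1 c i
  REPLY=
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      '*' | '?' | '[' | ']' | '\') REPLY+="\\$c" ;;
      *) REPLY+=$c ;;
    esac
  done
}

# Demo with a bracketed movie file name.
dir=$(mktemp -d)
name='(.HDG) Infierno blanco (2011) [BDRip 720p x264 DTS Dual Esp-Eng].mkv'
touch "$dir/$name"
glob_escape "$name"
printf '%s\n' "$REPLY"
# prints: (.HDG) Infierno blanco (2011) \[BDRip 720p x264 DTS Dual Esp-Eng\].mkv
find "$dir" -type f -name "$REPLY"    # matches the file literally
rm -rf -- "$dir"
```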

There comes a time that you need something a bit more powerful and flexible than shell to implement your script, and I believe this is the time.

Perl, Python, and Ruby are three fairly ubiquitous scripting languages found on almost all Unix systems and are available on other non-POSIX platforms (cough! ...Windows!... cough!).

Below is a Perl script that takes two directories and searches them for matching files. It walks each directory only once (using Perl's File::Find module) and uses associative arrays (called hashes in Perl). I key the hash on the file name. In the value portion of the hash, I store an array of the directories where I found this file.

I only need to run the find command once per directory. Once that is done, I can print out all the entries in the hash that contain more than one directory.

I know it's not shell, but this is one of the cases where you can spend a lot more time trying to figure out how to get shell to do what you want than it's worth.

```perl
#! /usr/bin/env perl

use strict;
use warnings;
use feature qw(say);

use File::Find;

use constant DIRECTORIES => qw( dir1 dir2 );

my %files;

#
# Perl version of the find command. You give it a list of
# directories and a subroutine for filtering what you find.
# I am basically rejecting all non-file entries, then pushing
# them into my %files hash as an array.
#
find (
    sub {
        return unless -f;
        $files{$_} = [] if not exists $files{$_};
        push @{ $files{$_} }, $File::Find::dir;
    },
    DIRECTORIES
);

#
# All files are found and in %files hash. I can then go
# through all the entries in my hash, and look for ones
# with more than one directory in the array reference.
# IF there is more than one, the file is located in multiple
# directories, and I print them.
#
for my $file ( sort keys %files ) {
    if ( @{ $files{$file} } > 1 ) {
        say "File: $file: " . join ", ", @{ $files{$file} };
    }
}
```

8 Comments

I'll try that, but I think something is wrong with my terminal, because if I run find on a file with "[" and "]" in its name, double-quoting it, it doesn't return any results, although the file exists and is where I'm searching. For example: find . -type f -name "(.HDG) Infierno blanco (2011) [BDRip 720p x264 DTS Dual Esp-Eng].mkv" No results, but if I try: find . -type f -name "$(printf %q "(.HDG) Infierno blanco (2011) [BDRip 720p x264 DTS Dual Esp-Eng].mkv")" Then the file is found.
I've updated the question with a script that I think implements what you said, but it does not work.
Hey, your perl script does exactly what my bash script (the one with hashes) does :). So I wouldn't quite say it's time to switch to another tool ;-p
I've been trying various ways to get this to work in shell. There is a hard limit on the value of the key in associative arrays in shell. I'm going to give the answer in Perl
@gniourf_gniourf I've been futzing around with find and associative arrays in shell and haven't been able to get it to work. I have file names that are too long for keys and contain characters not allowed in keys. That's why I switched to Perl.

Try something like this:

```bash
find "$DIR1" -printf "%f\0" | xargs -0 -I '{}' find "$DIR2" -name '{}'
```

Comments


How about this one-liner?

```bash
find dir1 -type f -exec bash -c 'read < <(find dir2 -name "${1##*/}" -type f)' _ {} \; -printf "File %f is in dir2\n" -o -printf "File %f is not in dir2\n"
```

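Safe regarding files with spaces, newlines and other funny symbols in their names, since nothing is word-split along the way. (One caveat: -name still treats *, ?, and [...] in the name as pattern characters, so a name like a[1].txt has to be escaped to be matched literally.)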

How does it work?

find (the main one) will scan through directory dir1 and for each file (-type f) will execute

```bash
read < <(find dir2 -name "${1##*/}" -type f)
```

with the name of the current file, given by the main find, as its argument. This argument is at position $1. ${1##*/} removes everything up to and including the last /, so that if $1 is path/to/found/file, the find statement is:

```bash
find dir2 -name "file" -type f
```

This outputs something if the file is found, and nothing otherwise. That output is what the read builtin reads. read's exit status is true if it was able to read something, and false if there was nothing to read (i.e., when nothing was found). This exit status becomes bash's exit status, which becomes -exec's status. If it is true, the next -printf action is executed; if it is false, the -o -printf part is executed.
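The read trick can be tried on its own. A minimal sketch, with has_output as a hypothetical helper name: it succeeds only when the given command prints at least one complete line. Note that read also reports failure when the input ends without a trailing newline; that is not a problem with find, because it ends every printed path with one.

```bash
#!/usr/bin/env bash
# has_output CMD [ARGS...]: exit status 0 if CMD prints at least one full
# line, non-zero if it prints nothing (read hits end of input first).
has_output() {
  local line
  read -r line < <("$@")
}

has_output printf 'hit\n' && echo "some output"    # prints: some output
has_output true || echo "no output"                # prints: no output

# The same kind of test as the inner find of the one-liner:
d=$(mktemp -d); touch "$d/x y.txt"
has_output find "$d" -name 'x y.txt' -type f && echo "found"    # prints: found
rm -rf -- "$d"
```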

If your directories are given in the variables $dir1 and $dir2, do this instead, so as to be safe regarding spaces and funny symbols that could occur in $dir2:

```bash
find "$dir1" -type f -exec bash -c 'read < <(find "$0" -name "${1##*/}" -type f)' "$dir2" {} \; -printf "File %f is in $dir2\n" -o -printf "File %f is not in $dir2\n"
```

Regarding efficiency: this is of course not an efficient method at all! The inner find is executed once for every file found in dir1. This is terrible, especially if the directory tree under dir2 is deep and has many branches (you can rely a little on caching, but there are limits!).

Regarding usability: you have fine-grained control over how both finds work and over the output, and it's very easy to add more tests.


So, hey, how do you compare files from two directories? Well, if you're willing to lose a little control, this is the shortest and most efficient answer:

```bash
diff dir1 dir2
```

Try it, you'll be amazed!
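If a plain listing of differences is enough, diff's -r (recurse into subdirectories) and -q (only report whether files differ) options make it cover whole trees. A small sketch; compare_dirs is a hypothetical wrapper name. Keep in mind that diff pairs files by their relative path, not by basename, so a file that sits in different subdirectories of the two trees is reported as "Only in" on both sides.

```bash
#!/usr/bin/env bash
# compare_dirs DIR1 DIR2: list files that differ or exist on one side only.
# diff exits with status 1 when it finds differences, so guard it if the
# calling script uses `set -e`.
compare_dirs() {
  diff -rq -- "$1" "$2"
}

d1=$(mktemp -d); d2=$(mktemp -d)
printf 'a\n' > "$d1/same.txt";    printf 'a\n' > "$d2/same.txt"
printf 'b\n' > "$d1/changed.txt"; printf 'c\n' > "$d2/changed.txt"
touch "$d1/only here [1].mkv"
compare_dirs "$d1" "$d2"
# With GNU diff this reports a "Files ... differ" line for changed.txt
# and an "Only in ...: only here [1].mkv" line.
rm -rf -- "$d1" "$d2"
```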

1 Comment

Thanks for your answer. That one-liner seems to do the job. It's more or less clear how it works; I'll analyse it carefully and improve it (it's too verbose). The diff command does not work well for my purpose, but all possibilities are welcome.

Since you are only using find for its recursive directory traversal, it will be easier to simply use the globstar option in bash. (You're using associative arrays, which, like globstar, need bash 4.0 or later, so your bash is new enough.)

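```bash
#!/bin/bash
shopt -s globstar   # ** recurses into subdirectories (bash 4.0 or later)
shopt -s dotglob    # also match hidden files, as find does
declare -A files
if [[ -d $1 && -d $2 ]]; then
    for f in "$2"/**/*; do
        [[ -f "$f" ]] || continue
        BN2=$(basename "$f")
        files["$BN2"]=$BN2
    done
    echo "${files[@]}"
    for f in "$1"/**/*; do
        [[ -f "$f" ]] || continue
        BN1=$(basename "$f")
        # Quote the right-hand side: unquoted, != treats $BN1 as a glob
        # pattern, so names containing [ ] * ? would compare incorrectly
        if [[ ${files[$BN1]} != "$BN1" ]]; then
            echo "File not found: $BN1"
        fi
    done
fi
```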

** will match zero or more directories, so $1/**/* will match all the files and directories in $1, all the files and directories in those directories, and so forth all the way down the tree. (Names starting with a dot are matched only when the dotglob shell option is set.)
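A quick way to see what "$1"/**/* expands to is to count the regular files it yields. In this sketch (count_files is a hypothetical name), dotglob is enabled so that hidden files are included as they are with find, and nullglob makes an empty directory produce no loop iterations; both are optional additions.

```bash
#!/usr/bin/env bash
shopt -s globstar   # ** matches zero or more directory levels (bash >= 4.0)
shopt -s dotglob    # also match names that start with '.'
shopt -s nullglob   # a pattern that matches nothing expands to nothing

# count_files DIR: print how many regular files live anywhere under DIR.
count_files() {
  local f n=0
  for f in "$1"/**/*; do
    [[ -f $f ]] && n=$((n + 1))
  done
  printf '%d\n' "$n"
}

t=$(mktemp -d)
mkdir -p "$t/a/b"
touch "$t/top.txt" "$t/a/.hidden" "$t/a/b/deep [x].mkv"
count_files "$t"    # prints: 3
rm -rf -- "$t"
```

The directories a and a/b are also produced by the glob, which is why the loop keeps only entries that pass the -f test.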

Comments


If you want to use associative arrays, here's one possibility that will work well with files with all sorts of funny symbols in their names (this script does more than is needed to make the point, but it is usable as is: just remove the parts you don't want and adapt it to your needs):

```bash
#!/bin/bash

die() {
    printf "%s\n" "$@"
    exit 1
}

[[ -n $1 ]] || die "Must give two arguments (none found)"
[[ -n $2 ]] || die "Must give two arguments (only one given)"

dir1=$1
dir2=$2

[[ -d $dir1 ]] || die "$dir1 is not a directory"
[[ -d $dir2 ]] || die "$dir2 is not a directory"

declare -A dir1files
declare -A dir2files

while IFS=$'\0' read -r -d '' file; do
    dir1files[${file##*/}]=1
done < <(find "$dir1" -type f -print0)

while IFS=$'\0' read -r -d '' file; do
    dir2files[${file##*/}]=1
done < <(find "$dir2" -type f -print0)

# Which files in dir1 are in dir2?
for i in "${!dir1files[@]}"; do
    if [[ -n ${dir2files[$i]} ]]; then
        printf "File %s is both in %s and in %s\n" "$i" "$dir1" "$dir2"
        # Remove it from the dir2 hash. The single quotes let unset expand $i
        # itself, so [ and ] in the file name don't break the subscript.
        unset 'dir2files[$i]'
    else
        printf "File %s is in %s but not in %s\n" "$i" "$dir1" "$dir2"
    fi
done

# Which files in dir2 are not in dir1?
# Since I unset them from the dir2files hash table, the only keys remaining
# correspond to files in dir2 but not in dir1
for i in "${!dir2files[@]}"; do
    printf "File %s is in %s but not in %s\n" "$i" "$dir2" "$dir1"
done
```

Remark. The identification of files is only based on their filenames, not their contents.

2 Comments

I used your script, but it couldn't cope with the [ and ] characters. Bash confuses them with the associative array subscript brackets.
@user2925396 I don't understand where these characters can cause problems... is it in a filename? can you provide an example of failure so that I can fix it?
